This article was written in German, automatically translated into other languages and editorially reviewed. We welcome feedback at the end of the article.
How to start your project with Konfuzio
To learn the basics of the Konfuzio platform, we recommend this tutorial, which shows you how to train your own AI in just a few minutes using only 5 documents. You can watch the video on YouTube or follow the step-by-step tutorial below.
Document AI Step-by-Step Guide
- Create a new project
Click HOME > Projects > Add Project + to create a new project. Name your project. In our example, it is called "Quittungen" (receipts). Save the project by clicking "Save". You can invite additional users to your project via HOME > Project Invitations > Add +.
- Create a label
Click HOME > Labels > Add Label + to create a label. Name your label. In our example, it is called "Bruttobetrag" (gross amount). Assign it to your project using the project tab (here: "Quittungen") and click "Save".
Click HOME > Templates to access the templates. Click the template that has the name of your project (here: "Quittungen"). Add the label you created to the template by using the arrow buttons to move it from "available Labels" to "chosen Labels". Save by clicking "Save". In the next tutorial, you will learn how to use templates to read complex documents.
- Upload documents
Click DOCUMENTS. Here you can upload your local files by drag and drop or via the browser's file dialog. After the upload, click the Reload button to refresh the page. The OCR process then starts; depending on the file size, this may take a moment. In our example, we upload 9 receipts (5 training and 4 test documents).
- Labeling
Once the OCR process is complete, you can open your document via "Smartview". OCR divides the information in your document into entities. "Entities" are individual words or pieces of information outlined with dashed lines; when you click on one, its background turns green. "Annotations" are the relevant pieces of information in a document that should be extracted or used. They are entities that have been assigned a label, either manually by a human or automatically by the AI. Use our lasso if you want to assign multiple entities to one label: hold down the mouse button and drag the red lasso that appears over the entities you want to select.
Click the entity you want to mark (here, e.g., "48,60"). In the annotation bar on the right, you can see the content of the entity as read by OCR. Click "Save" to assign the label you created (here: "Bruttobetrag") to the entity and thus convert it into an annotation.
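To make the difference between entities and annotations concrete, here is a minimal Python sketch of the data model described above. It is not Konfuzio's internal representation or API: the classes `Entity` and `Annotation`, the `annotate` helper, and the page and bounding-box fields are hypothetical, chosen only to show that an annotation is an entity plus a label, created either by a human or by the AI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A word or piece of information that OCR outlined in the document (hypothetical model)."""
    text: str                                # text read by OCR, e.g. "48,60"
    page: int                                # hypothetical field: page number
    bbox: tuple[float, float, float, float]  # hypothetical field: (x0, y0, x1, y1)


@dataclass(frozen=True)
class Annotation:
    """An entity that has been assigned a label (hypothetical model)."""
    entity: Entity
    label: str   # e.g. "Bruttobetrag"
    source: str  # "human" (manual labeling) or "ai" (model suggestion)


def annotate(entity: Entity, label: str, source: str = "human") -> Annotation:
    """Turn an entity into an annotation by assigning it a label."""
    if source not in ("human", "ai"):
        raise ValueError(f"unknown source: {source!r}")
    return Annotation(entity=entity, label=label, source=source)


# The bounding-box coordinates are made up for illustration.
amount = Entity(text="48,60", page=1, bbox=(410.0, 620.0, 455.0, 634.0))
annotation = annotate(amount, "Bruttobetrag")
print(annotation.label, annotation.entity.text, annotation.source)  # Bruttobetrag 48,60 human
```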
In a more complex project, you would now also need to select the template type and the section of the document the information belongs to; this is what the tab at the top is for. In this tutorial, however, we only cover the basics, which is why there is only one label to choose from.
Repeat step 4 for all uploaded documents. Use the arrows to switch between documents.
- Division into training and test data
After all documents have been labeled, you can now split them into training and test data.
The training data set contains manually labeled documents from which the AI learns how to label documents itself. The test data set also contains manually labeled documents; here, the AI attempts to label them based on what it learned from the training data set. Afterwards, the annotations created by the AI are compared with those created by humans and evaluated statistically.
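The statistical comparison described above can be illustrated with precision, recall and F1, a common way to score extracted annotations against human ones. This is a sketch under stated assumptions, not Konfuzio's evaluation code: the tutorial does not say which metrics the platform reports, and the `evaluate` function, the `(document, label, text)` tuple format and the sample values are hypothetical.

```python
def evaluate(human: set[tuple[str, str, str]], ai: set[tuple[str, str, str]]) -> dict[str, float]:
    """Compare AI annotations with human annotations on the test set.

    Each annotation is a hypothetical (document, label, text) tuple; an AI
    annotation counts as correct only if an identical human annotation exists.
    """
    true_positives = len(human & ai)
    precision = true_positives / len(ai) if ai else 0.0
    recall = true_positives / len(human) if human else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 4 receipts with one gross amount each.
human = {
    ("receipt_6", "Bruttobetrag", "48,60"),
    ("receipt_7", "Bruttobetrag", "12,99"),
    ("receipt_8", "Bruttobetrag", "7,50"),
    ("receipt_9", "Bruttobetrag", "103,20"),
}
ai = {
    ("receipt_6", "Bruttobetrag", "48,60"),
    ("receipt_7", "Bruttobetrag", "12,99"),
    ("receipt_8", "Bruttobetrag", "7,05"),  # wrong value
    # receipt_9: no prediction at all
}
scores = evaluate(human, ai)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Here 2 of the AI's 3 suggestions match the human annotations (precision 2/3), and 2 of the 4 human annotations were found (recall 1/2). Requiring an exact text match is a deliberately strict choice for the sketch.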
In the document view, select documents by checking the box to the left of each file name. In our example, we select 5 documents, choose the action "Add to training data set" in the action tab at the bottom, and click "Go". Then we select the remaining 4 documents and repeat the step with the action "Add to test data set".
- Start retraining and evaluate results
Click HOME > Projects. Find your project and select it with a check mark. In the action tab, select "Retrain AI Model" and click "Go". A banner appears saying "AI model re-training has been started. This may take up to 24 hours." For a small project like this example, training should be complete after just a few minutes.
To check whether the newly trained AI model is ready, click HOME > AI models. The AI model is listed there, together with its quantitative evaluation on the test data.
- Give feedback
Upload a new document as described in step 3. Once it has gone through the OCR process, click "Smartview". Here you can revise the annotations produced by the AI: confirm correct suggestions by clicking the green tick and reject incorrect ones by deleting them with the red "X". Also add any missing annotations.
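The review step amounts to a simple rule: keep the suggestions you confirm, drop the ones you delete, and add what the AI missed. The following Python sketch shows that rule with a hypothetical `revise` function and a `(label, text)` tuple as a stand-in for an annotation; it only models what you do by hand in Smartview and does not call any Konfuzio interface.

```python
# Hypothetical stand-in for an annotation: (label, text).
Annotation = tuple[str, str]


def revise(
    suggestions: list[Annotation],
    decisions: dict[Annotation, bool],
    missing: list[Annotation],
) -> list[Annotation]:
    """Apply a reviewer's decisions to the AI's suggestions for one document.

    decisions maps each suggestion to True (green tick: confirm) or
    False (red "X": delete); missing holds annotations added by hand.
    """
    unreviewed = [s for s in suggestions if s not in decisions]
    if unreviewed:
        raise ValueError(f"suggestions not reviewed yet: {unreviewed}")
    confirmed = [s for s in suggestions if decisions[s]]
    # Add the reviewer's own annotations, skipping exact duplicates.
    return confirmed + [m for m in missing if m not in confirmed]


# Hypothetical values: the AI picked the wrong amount and missed the gross amount.
suggestions = [("Bruttobetrag", "40,84")]
decisions = {("Bruttobetrag", "40,84"): False}
missing = [("Bruttobetrag", "48,60")]
revised = revise(suggestions, decisions, missing)
print(revised)  # [('Bruttobetrag', '48,60')]
```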
You can now add this document to the training data set as in step 5 to enlarge it and thus improve the AI model, or you can export the information. If you get no results or very poor results, check that you completed steps 4 to 6 correctly, or increase the number of training documents.
- Export your results
Select the documents whose data you want to download by checking their boxes. If you select multiple documents, their data is combined into one CSV file. Choose the action "Get human revised data as a CSV file" in the action tab and click "Go". The download of the CSV file should start automatically. CSV files can be opened in spreadsheet programs such as Microsoft Excel or Google Sheets.
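If you want to process the export in code instead of a spreadsheet, Python's standard `csv` module is enough. The sketch below sums the gross amounts of the exported receipts and converts German-formatted numbers such as "48,60" to decimals. The column names `document`, `label` and `value` are assumptions, not the documented layout of Konfuzio's export; check the header row of your downloaded file and adjust them, and pass `delimiter=";"` to `csv.DictReader` if your file uses semicolons.

```python
import csv
import io
from decimal import Decimal


def parse_german_amount(text: str) -> Decimal:
    """Convert a German-formatted amount such as "1.234,56" to Decimal("1234.56")."""
    return Decimal(text.strip().replace(".", "").replace(",", "."))


def gross_amounts(csv_file) -> dict[str, Decimal]:
    """Map each document to its "Bruttobetrag" value.

    Assumes hypothetical columns "document", "label" and "value";
    adjust them to the header row of your actual export.
    """
    amounts = {}
    for row in csv.DictReader(csv_file):
        if row["label"] == "Bruttobetrag":
            amounts[row["document"]] = parse_german_amount(row["value"])
    return amounts


# Stand-in for the downloaded file; in practice use
# open("export.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "document,label,value\n"
    'receipt_1.pdf,Bruttobetrag,"48,60"\n'
    'receipt_2.pdf,Bruttobetrag,"1.234,56"\n'
)
amounts = gross_amounts(sample)
total = sum(amounts.values(), Decimal("0"))
print(total)  # 1283.16
```

`Decimal` is used instead of `float` so that sums of currency amounts stay exact.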
Any questions? We are constantly working to improve our instructions so that you can use Konfuzio as quickly and easily as possible. Please let us know if you have any unanswered questions so we can provide you with the best possible solution. Thank you!
Photo by Brandon Montrone on Pexels
About me
I am a digital detective, a scout in the digital transformation. With a critical eye, I sift through the promises of innovation to separate truth from noise. My values? Objectivity amid hype, transparency amid complexity, and an ever-watchful eye for the monstrosities of digitalization. I invite everyone to join me on this quest: with every click, we reveal a bit more of the reality behind the dazzling sales claims.