Classification of documents with AI for your document management

You are reading an auto-translated version of the original German post.

Implement document management online

This tutorial is about the classification of documents with AI. Unlike Tutorials 1, 2 and 3, it is not about extracting information from a document but about automatically assigning documents to predefined categories. You can use this to file documents and to optimize your document management and back-office tasks.

As always, we use a practical example to show you how to implement this in an AI project for your online document management. Once again, we use receipts. As in Tutorials 1 and 2, we obtain our documents from a public dataset, which you can access here. Our goal is to classify receipts into five industries (café, restaurant, hotel, retail, and public transportation). Without any hand-written rules, the AI learns from examples which industry a new receipt belongs to.

In this documentation, some elements are marked as beta. If any of these features do not work yet, please contact us via our contact form. Our support team will take care of your problem immediately and, of course, free of charge.

Train AI to categorize or tag documents

  1. Create project

    You can create a new project or use an existing one. Tutorial 1 shows how to create a project.

  2. Create Default Templates

    Next, you need a default template for each document category. A default template is a template that is not subordinate to a so-called "Parent Default Template". Create one via HOME > Templates > ADD DEFAULT TEMPLATE+. Enter the name of your category (here: "Café", "Restaurant", "Hotel", "Retail" and "Public Transport") and select your project. To create several default templates, as we do, repeat this step from the template view with the ADD DEFAULT TEMPLATE+ button.

  3. Create training data

    Now click on DOCUMENTS to open the document view. Here you can use your existing documents or upload new ones. Training the AI is especially easy if the file names indicate which category each document belongs to. Next, we show the AI which documents belong to which category: for each document, select the corresponding category in the "CATEGORY TEMPLATE" column and click "Save" in the bottom right corner. You can also assign categories to all documents on a page and then click "Save" once for all of them. This is only possible if the documents are not in the training, test or preparation dataset. If they are, first remove them from the dataset with the action "Remove from dataset" and then assign the category. When you are done with this step, add the documents back to the training dataset.
    To get high-quality results suitable for dark processing (fully automated processing without manual review), you should have at least 50 documents per category. With our 5 categories, we therefore use a training dataset of 250 documents. You can add further files to the test dataset to evaluate the AI model later (beta). It is essential that no document appears in both the training and the test dataset. If a single file contains documents of several categories, split it beforehand and upload the parts individually so that you can assign each of them its own category.

  4. Activate retraining

    To start retraining, go to HOME > Projects, select your project, choose the action "Retrain category ai model" in the action menu and click "Go".
    The AI now looks for patterns, similarities and differences between the categories, based solely on the category assignments you made. You can read more about this in our article on the technical aspects of our classification.

  5. Test

    To check whether your Category AI Model has finished training, click HOME > "Category ai model". Here you can also see a statistical evaluation of your AI model (beta). In addition, you can simply upload new documents as a test to see whether they are classified correctly. The AI should now automatically display the correct category in the "CATEGORY TEMPLATE" column. In our project, for example, the AI should automatically assign an uploaded hotel invoice to the category "Hotel".

  6. Export

    You can integrate classification into your company's knowledge management in many ways.
    First, it lets you manage documents online by filtering by category on the right side of the document view. This makes it easy, for example, to export only the contents of a certain document category.
    In addition, the category is included in the CSV export. If you export the data of all documents, for example, you get a file list sorted by category that you can use for your document management in Excel. You can also integrate the classification into your existing systems (e.g. SAP) through an API integration and adapt it to your input management. This helps you file your documents correctly and can considerably streamline your back-office work.
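
Before starting the retraining in step 4, you could sanity-check your training data locally. The following Python sketch assumes that each file name starts with its category (for example "hotel_0042.pdf"), as suggested in step 3; the prefix mapping and the function names are our own illustration, not part of Konfuzio. It reports files that appear in both the training and the test dataset, files without a recognizable category, and categories with fewer than the recommended 50 training documents.

```python
from collections import Counter
from typing import List, Optional

# Assumed naming scheme: the file name starts with the category prefix.
PREFIX_TO_CATEGORY = {
    "cafe": "Café",
    "restaurant": "Restaurant",
    "hotel": "Hotel",
    "retail": "Retail",
    "public_transport": "Public Transport",
}
MIN_PER_CATEGORY = 50  # minimum recommended in step 3


def category_from_filename(filename: str) -> Optional[str]:
    """Return the category encoded in the file name, or None if no prefix matches."""
    name = filename.lower()
    for prefix, category in PREFIX_TO_CATEGORY.items():
        if name.startswith(prefix):
            return category
    return None


def check_datasets(train: List[str], test: List[str]) -> List[str]:
    """Return human-readable problems found in the training and test file lists."""
    problems = []
    # No document may be in both the training and the test dataset.
    overlap = sorted(set(train) & set(test))
    if overlap:
        problems.append(f"in both datasets: {overlap}")
    # Every file needs a recognizable category.
    unlabeled = [f for f in train + test if category_from_filename(f) is None]
    if unlabeled:
        problems.append(f"no category: {unlabeled}")
    # Each category should have enough training examples.
    counts = Counter(category_from_filename(f) for f in train)
    for category in PREFIX_TO_CATEGORY.values():
        if counts[category] < MIN_PER_CATEGORY:
            problems.append(f"{category}: {counts[category]} training documents")
    return problems


training_files = [f"{p}_{i:03d}.pdf" for p in PREFIX_TO_CATEGORY for i in range(50)]
test_files = ["hotel_900.pdf", "retail_000.pdf"]
for problem in check_datasets(training_files, test_files):
    print(problem)  # prints: in both datasets: ['retail_000.pdf']
```

An empty result means the lists satisfy the three rules from step 3; it says nothing about whether the assigned categories are actually correct.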
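
The statistical evaluation in step 5 is computed by Konfuzio itself. If you also want to check test results on your own, for example after noting which category the AI assigned to each uploaded test document, a minimal sketch like the following computes overall accuracy, per-category recall and the misclassifications from (true category, predicted category) pairs. How you collect those pairs is left open, the function names are hypothetical, and this is not necessarily how Konfuzio calculates its own evaluation.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (true category, predicted category)


def evaluate(pairs: List[Pair]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-category recall."""
    if not pairs:
        raise ValueError("need at least one test document")
    totals = Counter(true for true, _ in pairs)
    hits = Counter(true for true, pred in pairs if true == pred)
    accuracy = sum(hits.values()) / len(pairs)
    # Recall: share of the documents of a category that were classified correctly.
    recall = {cat: hits[cat] / n for cat, n in totals.items()}
    return accuracy, recall


def confusions(pairs: List[Pair]) -> Counter:
    """Count misclassifications as (true, predicted) pairs."""
    return Counter((true, pred) for true, pred in pairs if true != pred)


pairs = [
    ("Hotel", "Hotel"),
    ("Hotel", "Restaurant"),
    ("Café", "Café"),
    ("Retail", "Retail"),
]
accuracy, recall = evaluate(pairs)
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.75
print(recall)
print(confusions(pairs).most_common(3))
```

Per-category recall is worth looking at alongside overall accuracy: a category with few test documents can be classified poorly even when the overall figure looks good.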
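
Because the category is part of the CSV export (step 6), you can also process the export with a script instead of Excel. This sketch uses Python's standard csv module to group exported file names by category. The column names file_name and category are assumptions; check the header row of your actual export and adjust the two constants.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumed column names: adjust them to the header row of your own CSV export.
FILE_COLUMN = "file_name"
CATEGORY_COLUMN = "category"


def group_by_category(csv_text: str) -> Dict[str, List[str]]:
    """Map each category to the sorted file names exported for it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Documents without a category end up in their own group.
        category = (row.get(CATEGORY_COLUMN) or "").strip() or "Uncategorized"
        groups[category].append(row[FILE_COLUMN])
    # Sort categories and file names so the listing is stable.
    return {cat: sorted(files) for cat, files in sorted(groups.items())}


export = """file_name,category
receipt_17.pdf,Hotel
receipt_03.pdf,Café
receipt_09.pdf,Hotel
receipt_11.pdf,
"""
for category, files in group_by_category(export).items():
    print(category, files)
```

To use a real export, pass in the file contents, e.g. read with encoding="utf-8". The encoding and delimiter of the actual export may differ (exports intended for Excel sometimes use semicolons), in which case csv.DictReader needs a matching delimiter argument.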

Any questions? We are constantly working to improve our instructions so that you can use Konfuzio as quickly and easily as possible. Please let us know if you have any unanswered questions so we can provide you with the best possible solution. Thank you!

Photo by Karolina Grabowska from Pexels

Maximilian Schneider
