Splitting AI: AI-controlled document splitting with Konfuzio

Scanned documents often involve a considerable amount of manual processing. But with the innovative Splitting AI feature from Konfuzio, you can say goodbye to these time-consuming tasks. Scan multiple documents at the same time and then process them individually.

The splitting is carried out by artificial intelligence, which makes scanning each document separately unnecessary. There is no need for barcodes or separator pages. Our web-based interface (REST API) makes it simple to connect Splitting AI to your projects.

Splitting AI saves you valuable time and allows you to focus your human resources on core activities within the company.

Features

Physical incoming mail is often scanned in bundles, resulting in one long PDF file that contains multiple documents and must be split before processing. The following functions of the splitting feature enable the automatic separation of your documents.

  • AI-based splitting - Artificial intelligence enables the accurate separation of your documents in the right places, eliminating the need for manual sorting.
  • Automated processing - The scanning process also eliminates the need for extensive manual intervention, which further speeds up the entire process.
  • Seamless integration - Integrate Splitting AI seamlessly into your existing projects via the Konfuzio REST API and process the PDFs automatically.
  • Download individual PDFs - Download each processed document directly as a PDF.
  • Data Privacy - Benefit from technology that is fully compliant with the GDPR and thus guarantees the legal security and privacy of your data.
  • Self-learning technology - Our AI is continuously improving for even greater precision over time.

Benefits

Our AI can be trained to recognize and automatically separate individual documents, enabling separate filing and processing. Here are your benefits at a glance:

  • Manual page separation no longer required
  • No need for separator pages or barcodes
  • Manual checking and adjustment possible at any time
  • Simplification of the document workflow without impacting upload limits
  • Download the separate documents as individual PDFs
  • Seamless integration through REST API
  • Full GDPR compliance
  • Increased operational efficiency through automated processing

Use Cases

Batch scans on end device

When a logistics service provider loads parcels at the airport and prepares them for air transportation, there are two types of documents that are important: Air Waybill and Arrival Notification. For each shipment, multiple instances of these documents are scanned together at the airport. Scanning them together saves the logistics company time, but results in a huge compiled file. In order to make the individual air waybills and arrival notifications accessible and easy to find for further processing, the huge compiled file must be split up.

This is where Splitting AI comes into play. The AI receives the compiled file, splits it into its constituent documents and categorizes each one as either an air waybill or an arrival notification. In addition, the documents are renamed based on their identification number, which is extracted using Konfuzio.
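
The splitting and renaming steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the splitter yields one flag per page saying whether that page starts a new document, and the function names and sample values are hypothetical, not part of Konfuzio's API.

```python
from typing import List, Tuple


def split_ranges(first_page_flags: List[bool]) -> List[Tuple[int, int]]:
    """Turn per-page 'starts a new document' flags into inclusive, 0-based page ranges."""
    ranges: List[Tuple[int, int]] = []
    start = 0
    for page, is_first in enumerate(first_page_flags):
        # A flagged page closes the previous document (page 0 has nothing to close).
        if is_first and page > start:
            ranges.append((start, page - 1))
            start = page
    if first_page_flags:
        # The last document runs to the end of the compiled file.
        ranges.append((start, len(first_page_flags) - 1))
    return ranges


def file_name(category: str, identifier: str) -> str:
    """Build a file name from the category and the extracted identification number."""
    return f"{category.lower().replace(' ', '_')}_{identifier}.pdf"


# Hypothetical six-page batch scan: pages 0, 2 and 3 each start a new document.
flags = [True, False, True, True, False, False]
print(split_ranges(flags))
print(file_name("Air Waybill", "123-45678901"))
```

Each resulting page range corresponds to one document that can then be categorized and saved under its new name.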

Various documents within one application

An insurance company has combined all documents relevant to a potential policyholder's life insurance application into a single document. These documents may include, for example, the "main application cover sheet", "insurability statement", "medical reports", etc. Within the further evaluation of the application, several steps can be automated through extraction and categorization.

The first step towards comprehensive automation of document processing is to split the compiled application file into its constituent documents. This can be achieved with Konfuzio's Splitting AI. Konfuzio can then categorize each split document and extract the necessary information, providing a complete automation solution.

Important notes - Using Splitting AI

Users with a Konfuzio SaaS Pro subscription on app.konfuzio.com as well as users with a self-hosted installation can use the Splitting AI feature directly.

For users with a Konfuzio SaaS Pro subscription on app.konfuzio.com, the splitting feature is NOT available in pre-trained projects created from a marketplace listing.

For users with a Konfuzio SaaS Basic subscription on app.konfuzio.com, the feature is also NOT available.

-> Please contact our support team to activate the Splitting AI and achieve the best results

Implementation

Setup

Step 1 - Activation

Activate Splitting AI in your project settings.

-> Instructions for step 1 - Activation

Step 2 - Selecting the mode

Choose your preferred splitting mode: manual or automatic, based on AI training.

-> Instructions for step 2 - Selecting the mode

Step 3 - Selecting the Splitting AI type

Select the most suitable Splitting AI type for your documents: Textual (default), Contextual or Multimodal. Details on the types can be found in the table below:

Textual (default) | Contextual | Multimodal
Best suited for a wide range of documents. | Ideal for similar document collections with a focus on fast processing. | Analyzes both text and images.
Creates a good balance between accuracy and speed. | May not work optimally with small data sets. | Suitable for different document types, but slower than textual Splitting AI.

Training preparation and implementation

Step 1 - Categorization

Define categories for the document types to be separated.

-> Instructions for step 1 - Categorization

Step 2 - Document upload

Upload your documents for the training.

-> Instructions for step 2 - document upload

Step 3 - Training preparation of the Splitting AI model

Prepare your documents by assigning them to the respective categories.

Would you like to make the Konfuzio Document Validation UI (DVUI) the default option for opening all documents?

-> How to proceed

Further information on the benefits of the Konfuzio DVUI can be found further down in this article.

Step 4 - Splitting the documents into training and test data sets

Divide the documents into a training data set and a test data set. As a rule of thumb, place around 80% of the documents in each category in the training data set and the remaining 20% in the test data set.

-> Instructions for step 4 - splitting the documents
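
If you prepare the assignment outside the Konfuzio interface, the 80/20 rule of thumb can be applied per category as in the following Python sketch. The function name, the input format of (document ID, category) pairs and the sample IDs are illustrative assumptions and not part of Konfuzio.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_per_category(
    docs: List[Tuple[str, str]], train_share: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split (doc_id, category) pairs so that about train_share of each category is training data."""
    by_category: Dict[str, List[str]] = defaultdict(list)
    for doc_id, category in docs:
        by_category[category].append(doc_id)

    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    train: List[str] = []
    test: List[str] = []
    for category in sorted(by_category):
        ids = by_category[category]
        rng.shuffle(ids)
        cut = round(len(ids) * train_share)
        # Keep at least one test document when a category has two or more documents.
        if len(ids) > 1:
            cut = min(cut, len(ids) - 1)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


# Hypothetical project with 10 air waybills and 5 arrival notifications.
docs = [(f"awb-{i}", "Air Waybill") for i in range(10)]
docs += [(f"arr-{i}", "Arrival Notification") for i in range(5)]
train_ids, test_ids = split_per_category(docs)
print(len(train_ids), len(test_ids))
```

Splitting within each category, rather than across the whole collection, keeps rare document types represented in both data sets.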

Step 5 - Categorization AI training

Train an AI to categorize your documents.

-> Instructions for step 5 - Categorization AI training

Creation of the Splitting AI

Create a new Splitting AI model: Visit the Splitting AI page and click on the button as shown in the following image.

[Image: Create new Splitting AI model]

Start the training process by clicking on the Save button. It is not mandatory to give the model a name.

[Image: Start training Splitting AI]

As soon as the training is complete, the Splitting AI model you have trained is ready for use. Successful training is indicated by the status "Training finished".

Validation through human-in-the-loop

Human-in-the-loop validation is an effective way to optimize the performance of Splitting AI after training. The approach addresses the challenges of document processing in many industries by automatically assigning new uploads to defined categories while keeping a human reviewer in control. This makes document management much easier.

Konfuzio's Document Validation UI (DVUI) provides a valuable interface. It allows you to intervene in the process by reviewing the document splitting and classification suggested by the AI and correcting it if necessary. This interactive element allows the AI to adapt and evolve based on human feedback, increasing the accuracy of document processing.

By using the DVUI, you as a user can directly influence the performance of the AI by training the system to recognize new document types and thus ensure the precision of future automations. The continuous improvement of the AI through user feedback further reduces the need for manual intervention over time and makes document processing more efficient on an ongoing basis.
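
One way to track how much manual correction is still needed is to compare the split points suggested by the AI with those that remain after review. The following Python sketch illustrates the idea. The function name and the representation of split points as sets of page numbers are assumptions made for illustration, not a Konfuzio feature or API.

```python
from typing import Dict, Set


def review_summary(suggested: Set[int], corrected: Set[int]) -> Dict[str, float]:
    """Compare AI-suggested split pages with the split pages a reviewer confirmed."""
    kept = suggested & corrected     # suggestions the reviewer accepted
    removed = suggested - corrected  # splits the reviewer deleted
    added = corrected - suggested    # splits the reviewer had to add
    # Share of suggestions that were right, and share of true splits that were found.
    precision = len(kept) / len(suggested) if suggested else 1.0
    recall = len(kept) / len(corrected) if corrected else 1.0
    return {
        "removed": len(removed),
        "added": len(added),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical review: the AI proposed a split at page 3, the reviewer moved it to page 5.
print(review_summary({0, 2, 3, 7}, {0, 2, 5, 7}))
```

Falling "removed" and "added" counts across review rounds would suggest that the model needs less manual intervention.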

-> AI changes the rules of the game when it comes to document classification and separation. Get more in-depth insights in our related technical article.

Conclusion and prospects

Konfuzio's Splitting AI not only improves efficiency in document processing, but also provides the basis for future developments. This technology promises to simplify data processing by reducing manual work and speeding up processes. In short, this segmentation function represents a step towards smarter and faster information processing.

Human-in-the-loop validation with the Konfuzio DVUI is an important complementary step towards improving automated document processing. It enables more precise categorization and organization of documents through direct user feedback and continuous adaptation of AI, making it much easier to manage documents in different industries.

Contact our experts today to learn more about the powerful splitting feature and activate it for your Konfuzio projects.

Maximilian Schneider
