
Amazon Textract and Konfuzio DVUI - Data Extraction without Training

Elizaveta Ezhergina

Want a way to quickly extract information from any document without having to train a model first? There is a solution for that:

Use the built-in Amazon Textract integration, which is available on the Konfuzio Marketplace. In combination with Konfuzio's Document Validation UI (DVUI), it lets you process documents efficiently and visualize or modify the results.

In this blog post, we describe Amazon Textract, how it works, and how it can be extended by integrating it with Konfuzio.

In addition, we take a closer look at Amazon A2I, compare it with Konfuzio's DVUI, and give you tips on how to make good use of the Amazon Textract integration on the Konfuzio Marketplace.

What is Amazon Textract?

Amazon Textract is a service developed by Amazon Web Services (AWS). It extracts text and structured data from many types of documents. The information Textract can process includes text, table data, form data, OMR (Optical Mark Recognition) marks, handwriting, and signatures. When a user calls the API, the results of the document processing are returned as nested JSON that captures the hierarchical relationships between the extracted objects, such as key-value pairs.
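To make the nested JSON more concrete, here is a minimal Python sketch of how key-value pairs could be read from a Textract-style response. It relies on the block layout Textract uses for form data (`KEY_VALUE_SET` blocks linked through `VALUE` and `CHILD` relationships). The helper names `block_text` and `key_value_pairs` and the tiny `SAMPLE_RESPONSE` are our own illustrations, not part of any AWS or Konfuzio API.

```python
from typing import Dict, List


def block_text(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of a block's CHILD blocks (words and checkboxes)."""
    parts: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes carry a status instead of text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(response: dict) -> Dict[str, str]:
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = [
            block_text(blocks_by_id[value_id], blocks_by_id)
            for rel in block.get("Relationships", []) if rel["Type"] == "VALUE"
            for value_id in rel["Ids"]
        ]
        pairs[block_text(block, blocks_by_id)] = " ".join(values)
    return pairs


# Hand-written, heavily shortened response used only for illustration.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Erika"},
    ]
}

print(key_value_pairs(SAMPLE_RESPONSE))  # {'Name': 'Erika'}
```

A real response contains many more block types and fields (geometry, confidence scores, pages), which this sketch ignores.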

Amazon Textract and Konfuzio - The Integration

Amazon Textract is a powerful tool in its own right, but in conjunction with Konfuzio it becomes even more capable. You can wrap an API call to Textract in a custom Extraction AI and use it to build a document processing pipeline on Konfuzio.
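As a rough sketch of what such a wrapper could look like, the function below sends one document to Textract and returns the result blocks. The function name `analyze_with_textract` and the `StubClient` class are hypothetical and exist only so the example runs offline; the `analyze_document` call with `Document` and `FeatureTypes` follows the Textract API as exposed by boto3.

```python
from typing import Iterable, List


def analyze_with_textract(client, document_bytes: bytes,
                          features: Iterable[str] = ("FORMS", "TABLES")) -> List[dict]:
    """Send one document to Textract and return the list of result blocks."""
    response = client.analyze_document(
        Document={"Bytes": document_bytes},  # raw bytes of an image or PDF page
        FeatureTypes=list(features),         # request form fields and tables
    )
    return response["Blocks"]


class StubClient:
    """Offline stand-in for a Textract client, used only for this demo."""

    def analyze_document(self, Document: dict, FeatureTypes: List[str]) -> dict:
        return {"Blocks": [{"Id": "p1", "BlockType": "PAGE"}]}


blocks = analyze_with_textract(StubClient(), b"%PDF-1.4 ...")
print([b["BlockType"] for b in blocks])  # ['PAGE']
```

With AWS credentials configured, the stub could be replaced by `boto3.client("textract")`. How the returned blocks are turned into annotations inside a Konfuzio custom Extraction AI is not covered by this sketch.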

The integration uses the data returned by Textract to generate annotations in the processed documents. These annotations are then grouped into logical units called Key-Value Annotation Sets. When a form offers multiple options to choose from, the options are labeled as "Selected" or "NotSelected". Typical examples of such groups are:

  • Key: Name
  • Value: Erika
  • Key: Marital status
  • Selected: Single
  • NotSelected: Married
  • NotSelected: Divorced
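The grouping in the list above can be sketched with a small data model. The classes `Annotation` and `AnnotationSet` and the function `choice_annotation_set` are simplified, hypothetical stand-ins and not the Konfuzio SDK classes; only the status values `SELECTED` and `NOT_SELECTED` come from Textract's selection elements.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Textract reports checkbox state as SELECTED / NOT_SELECTED.
STATUS_TO_LABEL = {"SELECTED": "Selected", "NOT_SELECTED": "NotSelected"}


@dataclass
class Annotation:
    label: str
    text: str


@dataclass
class AnnotationSet:
    annotations: List[Annotation] = field(default_factory=list)


def choice_annotation_set(key: str, options: List[Tuple[str, str]]) -> AnnotationSet:
    """Group a form key and its checkbox options into one annotation set."""
    result = AnnotationSet([Annotation("Key", key)])
    for text, status in options:
        result.annotations.append(Annotation(STATUS_TO_LABEL[status], text))
    return result


marital = choice_annotation_set(
    "Marital status",
    [("Single", "SELECTED"), ("Married", "NOT_SELECTED"), ("Divorced", "NOT_SELECTED")],
)
for a in marital.annotations:
    print(f"{a.label}: {a.text}")
```

Run as written, this prints the same four lines as the marital status example in the list.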

Table data is usually grouped into a single set of annotations for the values contained in the table. Once a document is processed and annotated, a user can invoke the DVUI to check the correctness of the annotations and possibly add new annotations - if information is missing.
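For table data, a comparable sketch rebuilds the rows of one table from Textract's `TABLE` and `CELL` blocks, which carry 1-based `RowIndex` and `ColumnIndex` values. The helper name `table_rows` and the sample blocks are illustrative assumptions; how Konfuzio stores the resulting values in its annotation set is not shown here.

```python
from typing import Dict, List, Tuple


def table_rows(table: dict, blocks_by_id: Dict[str, dict]) -> List[List[str]]:
    """Rebuild a table as a list of rows from its CELL child blocks."""
    cells: Dict[Tuple[int, int], str] = {}
    for rel in table.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cell_id in rel["Ids"]:
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks_by_id[word_id]["Text"]
                for cell_rel in cell.get("Relationships", [])
                if cell_rel["Type"] == "CHILD"
                for word_id in cell_rel["Ids"]
                if blocks_by_id[word_id]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    # Missing cells become empty strings so every row has the same length.
    return [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]


# Hand-written 2x2 table used only for illustration.
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Internet"},
    {"Id": "w4", "BlockType": "WORD", "Text": "29.99"},
]

by_id = {b["Id"]: b for b in SAMPLE_BLOCKS}
rows = table_rows(by_id["t1"], by_id)
print(rows)  # [['Item', 'Price'], ['Internet', '29.99']]
```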

Amazon also offers an interface for validating automatic annotations, called Amazon Augmented AI (A2I). We'll take a closer look at both interfaces in the next section.

Konfuzio DVUI vs. Amazon A2I - A comparison

To check the correctness of annotations or to edit documents before annotation - for example, to split a stacked document consisting of multiple sub-documents - the user needs a validation interface. Both Konfuzio and Amazon provide their own interfaces for this purpose. Let's take a look at what they have in common and where they differ.

Feature | Amazon A2I | Konfuzio DVUI
Human-in-the-loop
Accessibility for external annotators ✔*
Processing of different data types and tasks
Display of multiple annotations of different annotation groups
Adding new annotations on the fly
Modification of documents (splitting, rotating, rearranging)
Possibility of integration into customer-specific solutions
* provided the user has access

As the table above shows, Amazon A2I can flexibly handle different types of processed data and tasks for human validators, while Konfuzio DVUI offers features tailored specifically to document annotation.

Possible use cases


The Amazon Textract and Konfuzio integration can be applied to all types of documents.

As of 09/2023, the combined solution of Amazon Textract and Konfuzio supports the following languages:

  • English
  • German
  • Spanish
  • French
  • Italian
  • Turkish

The United States as a demand area for Amazon Textract and Konfuzio DVUI

Let's take a look at one of the largest areas of demand for document processing: the US market. Currently, there are several main groups of documents that businesses and individuals deal with on a daily basis. These include, first and foremost:

  • Invoices and receipts
  • Tax documents
  • Contracts
  • Account statements and transactions
  • Documents related to healthcare and insurance
  • Vehicle-related forms

All of these documents are potential use cases for the Amazon Textract and Konfuzio integration, since automating the work with them is much faster than processing them manually.

Common forms with automation suitability

Some of the most common forms that are suitable for automation include:

  • CMS-40B: Application for enrollment in Medicare Part B
  • CMS-R-285: Request for retirement benefit information
  • IRS Form 4506-C: Request for a tax return transcript
  • Medicare insurance ID card
  • CMS-1500: Health insurance claim form
  • CMS-100: Application for employment
  • W-3 and W-4 forms: Wage and tax withholding forms

How Amazon Textract works with Konfuzio - An example

This real-world example illustrates how Amazon Textract works in combination with Konfuzio and how a user can gain access to the integration and test it.

Let's take a closer look at a sample document from Amazon Textract: a receipt from an Internet provider. It contains all three data types, namely text, checkboxes, and tables. The document has already been uploaded to Konfuzio and is being processed.

[Figure: Amazon Textract sample document]

After the processing is complete, we find that all three data types have been correctly annotated:

[Figure: Amazon Textract sample document annotated in Konfuzio]

Explanation of the example

  • Black fields: These represent the key-value pairs of text data.
  • Red fields: These stand for table data.
  • Green fields: These mark OMR checkboxes and form data.

Want to try the combination of Amazon Textract and Konfuzio for yourself? The integration is currently available on the Konfuzio Marketplace.

To use it, you need to create an account on app.konfuzio.com and request access to the Forms listing. Our experts will contact you once your access request has been received.

Conclusion

Amazon Textract is easy to integrate and works best in conjunction with Konfuzio and its DVUI, providing fast and accurate data extraction without prior training of models. You can test the integration on the Konfuzio Marketplace and try out the extraction on any type of document yourself. Amazon Textract handles text and table data, as well as forms and checkboxes, and you can validate the results in the Konfuzio DVUI.

Do you have questions about Amazon Textract or Konfuzio's Document Validation UI? Contact us anytime via the contact form.
