Extract data


Do you want to extract data from PDF files?

PDFs are widely used for sending and presenting information. It is not only suppliers who send invoices and payment advice as PDFs; almost all companies, private individuals and public sector bodies do so as well.

Of course, you can easily view, save and print PDF files. The problem is that PDF is designed to preserve the integrity of the file, not to make its contents easy to reuse. It behaves more like "electronic paper" that ensures the content looks the same on any computer at any time.

A lot of important information often has to be laboriously extracted from a PDF. Thanks to AI and OCR, typing this information in manually is no longer necessary.

Simple OCR software has long found it very difficult to reliably recognize the information contained in a PDF file, extract the data and export it in a structured way. Konfuzio lets you train your own AI to extract data from PDF and image files.
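To illustrate why hand-written rules on top of plain OCR text are brittle, here is a minimal sketch. It assumes the OCR step has already produced plain text; the sample texts, field names and regular expressions are hypothetical and are not part of Konfuzio.

```python
import re

# Hypothetical OCR output of two invoices with different layouts.
INVOICE_A = "Invoice No: 2021-0042\nTotal: 1,250.00 EUR"
INVOICE_B = "Rechnung 2021-0043\nAmount due EUR 980.50"

# Hand-written rules that were tuned to the layout of INVOICE_A only.
RULES = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}


def extract_fields(text):
    """Return field name -> matched value, or None where a rule does not match."""
    result = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


print(extract_fields(INVOICE_A))  # both fields are found
print(extract_fields(INVOICE_B))  # same information, other layout: rules fail
```

The second invoice carries the same kind of information, but because its wording differs, every rule returns None. This is the gap that a model trained on heterogeneous examples is meant to close.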

How to extract data from a PDF?

Time needed: 5 minutes.

  1. Upload documents

    Upload sample documents. Make sure that they are as heterogeneous as possible. You need 5 documents to get the first results.

  2. Create fields

    Define the fields you want to read out. To do this, create labels in Konfuzio.

  3. Mark examples

    In the documents from step 1, mark all the texts that you want the AI to extract in the future.

  4. Train the AI to extract data from PDFs

    Start the training via the web interface.

  5. Upload new documents

    From now on, the AI takes over the extraction of the data. After uploading new documents, data will be extracted automatically.

  6. Download extractions

    Use the export via CSV or download the information via API.
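Once the extractions have been downloaded as CSV (step 6), they can be processed with a few lines of code. The following is a minimal sketch only: the column names `document`, `label` and `value` and the sample rows are assumptions made for illustration, and the actual export layout may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export with one row per extracted value.
SAMPLE_EXPORT = """document,label,value
invoice_001.pdf,Invoice number,2021-0042
invoice_001.pdf,Total,1250.00
invoice_002.pdf,Invoice number,2021-0043
invoice_002.pdf,Total,980.50
"""


def group_by_document(csv_text):
    """Collect the label/value pairs of each document into one dict."""
    documents = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents[row["document"]][row["label"]] = row["value"]
    return dict(documents)


records = group_by_document(SAMPLE_EXPORT)
for name, fields in records.items():
    print(name, fields)
```

To read a real export from disk, replace the `io.StringIO` wrapper with an opened file and adjust the column names to match the export.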

What does PDF mean?

PDF stands for Portable Document Format. It is a platform-independent file format developed by Adobe so that electronic documents can be faithfully reproduced regardless of the original application program, operating system or hardware platform.

What is the cost of document extraction?

There are different price models. Provider prices range from up to 1 € per page down to below 0.01 € per page for high volumes.
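As a rough worked example of what this price range means in practice, the sketch below multiplies a page count by a per-page price. The two price points are the bounds mentioned above; the page counts are illustrative assumptions and do not reflect any provider's actual price list.

```python
def estimate_cost(pages, price_per_page):
    """Return the total price in euros for a given number of pages."""
    if pages < 0 or price_per_page < 0:
        raise ValueError("pages and price_per_page must be non-negative")
    return pages * price_per_page


# Illustrative volumes only; real price tiers depend on the provider.
low_volume = estimate_cost(100, 1.00)
high_volume = estimate_cost(1_000_000, 0.01)

print(f"100 pages at 1.00 EUR per page: {low_volume:.2f} EUR")
print(f"1,000,000 pages at 0.01 EUR per page: {high_volume:.2f} EUR")
```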

What alternatives to Konfuzio exist?

If you only have a few PDF documents from which you want to extract data, manual copy and paste is a quick alternative. Open each document, select the text you want to extract, copy it and paste it into an Excel file.


