Extract data



Do you want to extract data from PDF files?

PDFs are widely used for sending and presenting information. It is not only suppliers who send invoices or payment advices as PDFs; almost all companies, private individuals and public-sector bodies do so as well.

Of course, you can easily view, save and print PDF files. The problem is that PDF is designed to preserve the integrity of the document, not to make its data easy to reuse. It is more like "electronic paper" that ensures the content looks the same on any computer at any time.

As a result, a lot of important information often has to be laboriously extracted from a PDF. Thanks to AI and OCR, typing this information out by hand is no longer necessary.

Simple OCR software has traditionally struggled to recognize the information contained in a PDF file, extract the data and export it in a structured way. Konfuzio offers you the option of training your own AI to automate the extraction of data from PDF and image files.
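To illustrate why fixed rules on top of OCR text are brittle, here is a minimal sketch. It is not how Konfuzio works internally; the sample texts, the pattern and the function name `extract_invoice_number` are all made up for illustration.

```python
import re

# Hypothetical OCR output of two invoices with different wording.
TEXT_A = "Invoice No: RE-2021-0042\nTotal: 119.00 EUR"
TEXT_B = "Rechnungsnummer RE-2021-0043\nAmount due EUR 59.50"

# A hand-written rule that only knows one layout.
INVOICE_NO = re.compile(r"Invoice No:\s*(\S+)")

def extract_invoice_number(text):
    """Return the invoice number, or None if the fixed pattern does not match."""
    match = INVOICE_NO.search(text)
    return match.group(1) if match else None

print(extract_invoice_number(TEXT_A))  # matches the known layout
print(extract_invoice_number(TEXT_B))  # different wording, so the rule finds nothing
```

Every new layout would need another hand-written rule, which is the kind of effort a trained model is meant to remove.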

How to extract data from a PDF?

Time needed: 5 minutes

  1. Upload documents

    Upload sample documents. Make sure that they are as heterogeneous as possible. To get the first results, you need 5 documents.

  2. Create fields

    Define the fields you want to read out. To do this, create labels in Konfuzio.

  3. Mark examples

    In the documents from step 1, mark all the texts that you want the AI to extract in the future.

  4. Train the AI to extract data from PDFs

    Start the training via the web interface.

  5. Upload new documents

    From now on, the AI takes over the extraction of the data. After uploading new documents, data will be extracted automatically.

  6. Download extractions

    Export the extracted data as CSV or download it via the API.
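Once the data has been exported as CSV, it can be processed with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names `document`, `label` and `value` and the sample rows are assumptions made for illustration; the actual export format may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export content; real column names and values may differ.
SAMPLE_EXPORT = """document,label,value
invoice_001.pdf,Invoice number,RE-2021-0042
invoice_001.pdf,Total,119.00
invoice_002.pdf,Invoice number,RE-2021-0043
invoice_002.pdf,Total,59.50
"""

def group_by_document(csv_text):
    """Return {document: {label: value}} from a one-row-per-field CSV export."""
    records = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        records[row["document"]][row["label"]] = row["value"]
    return dict(records)

extractions = group_by_document(SAMPLE_EXPORT)
for name, fields in extractions.items():
    print(name, fields)
```

For a real file, the same function could be fed the file's text, for example the result of `open(path, newline="", encoding="utf-8").read()`.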

What does PDF mean?

PDF (Portable Document Format) is a platform-independent file format developed by Adobe so that electronic documents can be reproduced faithfully, regardless of the original application program, operating system or hardware platform.

What is the cost of document extraction?

There are different pricing models. Providers charge up to 1 € per page at the low-volume end, and prices drop below 0.01 € per page for high volumes.
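To get a feeling for how volume pricing affects the total, here is a rough sketch. The tier boundaries and the intermediate price are purely hypothetical; only the approximate range (about 1 € down to below 0.01 € per page) is taken from the text above.

```python
# Hypothetical volume tiers: (minimum pages per month, price per page in EUR).
# These numbers are illustrative assumptions, not any provider's price list.
TIERS = [
    (0, 1.00),
    (1_000, 0.10),
    (100_000, 0.009),
]

def price_per_page(pages):
    """Return the per-page price of the highest tier the volume reaches."""
    price = TIERS[0][1]
    for min_pages, tier_price in TIERS:
        if pages >= min_pages:
            price = tier_price
    return price

def monthly_cost(pages):
    """Total cost when every page is billed at the reached tier's price."""
    return pages * price_per_page(pages)

for volume in (100, 5_000, 250_000):
    print(f"{volume:>7} pages: {monthly_cost(volume):.2f} EUR")
```

Real price models may bill differently, for example per document or with graduated tiers, so treat this only as a way to compare orders of magnitude.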

What alternatives to Konfuzio exist?

If you only have a few PDF documents from which you want to extract data, manual copy and paste is a quick option. Just open each document, select the text you want to extract, copy it and paste it into an Excel file.
