It's no longer a common sight, but it still happens: bank customers fill out a remittance slip and submit it to the bank. Bank employees have long since stopped typing the data from these forms into the internal system by hand. Just as the downstream processing is automated, so is the reading of the forms themselves. The underlying technology is called OCR (Optical Character Recognition). For industries with high processing volumes, implementing such solutions pays off. But what about companies without their own development department, or with a different business focus? This is where online OCR solutions come into play.
This article was written in German, automatically translated into other languages and editorially reviewed.
Online OCR solutions at a glance
Text recognition is often used unobtrusively, as a supporting function within more complex processes. OCR is relevant both for customer-facing applications and for internal company processes.
Online text recognition for consumers
If you are a tech-savvy consumer, you have most likely used online OCR applications before, for example to quickly convert data from images or PDF files into a form you can keep working with, whether for household accounting, genealogy or other hobbies. Online platforms offer the fastest way to extract text and data from such documents. Users do not have to install any software on their computer and can often choose from various output formats. Depending on layout and content, PDF files can be converted to Excel or Word, for example; image files such as JPG or TIFF are accepted as well. The files are uploaded to the chosen platform and processed on the provider's server. As soon as the output file (e.g. an Excel file) is ready, the user can download it.
Integrate OCR in the enterprise
Companies often have different requirements. For them, manually uploading files to public platforms is not a reliable way to process documents online. What matters much more is that OCR processing is integrated into the company's existing processes. For this purpose, providers of online OCR services offer technical interfaces, known as APIs (application programming interfaces). Through an API, enterprise software applications can access the OCR service directly over the Internet.
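To make the idea of an OCR API more concrete, here is a minimal sketch in Python using only the standard library. The endpoint URL, the bearer-token header and the JSON fields (`file_name`, `content`, `output_format`) are hypothetical placeholders; every real provider defines its own API, so the actual request format has to come from the provider's documentation.

```python
import base64
import json
import urllib.request

# Hypothetical values: replace with the endpoint and credentials of your provider.
OCR_ENDPOINT = "https://ocr.example.com/v1/documents"
API_TOKEN = "YOUR_API_TOKEN"


def build_ocr_request(file_name: str, content: bytes, output_format: str = "json") -> urllib.request.Request:
    """Package a document as a JSON POST request for a (hypothetical) OCR API."""
    payload = {
        "file_name": file_name,
        # Binary files are base64-encoded so they can travel inside JSON.
        "content": base64.b64encode(content).decode("ascii"),
        "output_format": output_format,
    }
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_document(request: urllib.request.Request) -> dict:
    """Send the request and decode the provider's JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

In an enterprise system, the software that receives documents would call `build_ocr_request`, and the decoded response from `submit_document` would be handed on to downstream processes. Some providers expect multipart file uploads instead of JSON; the principle stays the same.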
Incoming invoices are one example. Many arrive as PDF files by e-mail directly in the company's mailbox; others still arrive the classic way, on paper. Paper invoices are scanned page by page to produce digital JPG, PNG or TIFF images. From that point on, the process is the same in both cases: via the API, the documents are automatically forwarded to the online service for text recognition. The recognized text is then processed further to read out, e.g., account details, price lists or recipient details.
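The last step, reading out account data from the recognized text, can be illustrated with IBANs. The following Python sketch assumes the OCR service has already returned plain text. It finds IBAN-like strings with a regular expression and keeps only those that pass the ISO 13616 mod-97 checksum, which filters out many misread characters. The function names are illustrative, and real invoice extraction usually needs far more than a regex.

```python
import re

# IBAN-like candidates: country code, two check digits, then groups of
# up to four alphanumerics, optionally separated by single whitespace.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,3})?\b")


def is_valid_iban(candidate: str) -> bool:
    """Check an IBAN with the mod-97 algorithm (ISO 13616 / ISO 7064)."""
    iban = re.sub(r"\s", "", candidate).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move the country code and check digits to the end ...
    rearranged = iban[4:] + iban[:4]
    # ... and replace letters with numbers (A=10, ..., Z=35).
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_ibans(ocr_text: str) -> list[str]:
    """Return all checksum-valid IBANs found in OCR output, without spaces."""
    results = []
    for match in IBAN_CANDIDATE.finditer(ocr_text):
        if is_valid_iban(match.group()):
            results.append(re.sub(r"\s", "", match.group()))
    return results
```

A misread check digit or account digit almost always breaks the checksum, so such candidates are dropped instead of being passed on as wrong account data.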
A converted Word document can also serve as the basis for new content. And if, e.g., a JPG image was processed, the recognized text can be used directly for annotations, titles or image descriptions when the image is displayed on the Internet.
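For web display, recognized text might, for example, become the alternative text of an image. A minimal Python sketch, assuming the OCR step has already produced a text string; the helper name and the 125-character limit are illustrative choices, not a standard.

```python
import html


def image_tag_with_ocr_text(src: str, ocr_text: str, max_length: int = 125) -> str:
    """Build an HTML <img> tag whose alt attribute holds the recognized text."""
    # Collapse line breaks and repeated spaces from the OCR output.
    alt = " ".join(ocr_text.split())
    # Keep alt text short; very long descriptions belong in the page body.
    if len(alt) > max_length:
        alt = alt[: max_length - 3].rstrip() + "..."
    # Escape quotes and markup characters so the text cannot break the tag.
    return f'<img src="{html.escape(src, quote=True)}" alt="{html.escape(alt, quote=True)}">'
```

Escaping matters here because OCR output is untrusted text: a stray quote or angle bracket in a scanned flyer would otherwise corrupt the HTML.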
Advantages of online solutions
Delegating OCR tasks to an online service has several advantages, starting with infrastructure. Complex processing, especially at high document volumes, requires additional hardware within the company. Both the hardware and the software must be maintained and updated regularly. This adds to the IT team's workload or makes it necessary to hire additional staff or service providers.
In-house solutions also have a harder time absorbing load peaks. If more computing power is needed for a short time, companies can build up internal resources only slowly. Online OCR solutions, by contrast, work with variable resources and, thanks to modern cloud architectures, can respond immediately to higher demand. Software updates, improvements to the OCR algorithms and the like become available to users of online services right away, without manual updates.
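On the client side, an elastic online service means a company can send a burst of documents in parallel instead of queuing them on its own hardware. The sketch below uses Python's `concurrent.futures`. `recognize` is a hypothetical placeholder for a function that uploads one document to the provider and returns its text, and the default worker limit is an assumption meant to stay within a provider's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def recognize_batch(
    documents: Iterable[str],
    recognize: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Send many documents to an OCR service concurrently.

    `recognize` is a hypothetical callable that uploads one document and
    returns its text. The threads mostly wait on the network, so the local
    machine needs little computing power of its own.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(recognize, doc): doc for doc in documents}
        for future in as_completed(futures):
            # .result() re-raises any exception from the worker thread.
            results[futures[future]] = future.result()
    return results
```

Threads are a reasonable choice here because the work is I/O-bound; the heavy recognition itself runs on the provider's scalable infrastructure.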
Should I use open source or commercial OCR solutions?
For small and medium-sized companies, the primary question is whether freely available open source software is sufficient for OCR, or whether a commercial platform is the more efficient route. The answer depends strongly on the skills available within the company. Open source OCR solutions such as Tesseract involve no acquisition costs for the software itself. Tesseract reads common image formats such as PNG, JPEG and TIFF (PDF files must first be converted to images) and recognizes many languages. However, integrating it as an online component into the existing IT infrastructure requires technical expertise. There is no official technical support, but instructions and documentation are available on the Internet, which IT administrators can use to set it up for the company.
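To illustrate what self-hosting involves, the following Python sketch calls the Tesseract command-line tool via `subprocess`. It assumes Tesseract and the required language data (here `eng` and `deu`) are installed and on the `PATH`; the wrapper function names are our own. Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a file.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, languages: List[str]) -> List[str]:
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    # "stdout" as output base: Tesseract writes the text to standard output.
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_tesseract(image_path: str, languages: List[str]) -> str:
    """Run Tesseract locally and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not on PATH")
    completed = subprocess.run(
        tesseract_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `run_tesseract("scan.png", ["eng", "deu"])` returns the text of a scanned page. Turning such a call into a reliable online component (queueing, scaling, monitoring, updates) is the part that requires in-house expertise.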
Commercial providers, by contrast, offer direct customer service, and hardware and the cloud connection for the software are often already included. It is worthwhile to compare services and prices across providers in detail. Questions to ask include:
- Does the online solution provide all the OCR functions I need?
- How easy is it to integrate the API into my system?
- Can I use the platform's API with my established input formats (do I mainly process PDF files or images, long texts or complex forms)?
- Can the tools convert my documents into the required output formats (Excel, Word, JSON, XML, etc.)?
- Will the solution keep up with our future growth?
- Am I possibly paying for a lot of services that I will never need?
This analysis helps narrow down the relevant candidates.
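One simple way to turn these questions into a shortlist is a weighted scoring matrix. The Python sketch below is purely illustrative: the criteria mirror the checklist above, but the weights, provider names and scores (1 to 5) are made-up placeholders that each company would fill in from its own evaluation.

```python
from typing import Dict, List, Tuple

# Illustrative weights per criterion (they sum to 1.0); adjust to your priorities.
WEIGHTS: Dict[str, float] = {
    "ocr_features": 0.25,
    "api_integration": 0.20,
    "input_formats": 0.15,
    "output_formats": 0.15,
    "scalability": 0.15,
    "price_fit": 0.10,  # high score = little paid-for but unused functionality
}


def weighted_score(scores: Dict[str, int]) -> float:
    """Combine per-criterion scores (1-5) into one weighted value."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS), 2)


def rank_providers(candidates: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Return (provider, score) pairs, best first."""
    ranked = [(name, weighted_score(s)) for name, s in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

A provider scoring 4 on every criterion ends up at 4.0, ahead of one with excellent OCR features (5) but weak integration and price fit, which lands at 3.85 with these example weights.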
Also pay attention to the security of online OCR solutions
Despite all the advantages, security must not be forgotten. In many cases, sensitive data is transmitted over the Internet, so when choosing a provider, pay close attention to how the data is transferred. A PDF or image can contain very personal information, such as financial data, birthdates, home addresses or medical details. Files must therefore always be transferred with strong encryption so that no one can intercept the data in transit between systems. The OCR provider's own systems must also be well protected against external attacks. This should be a key criterion when choosing a provider.
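On the client side, strong encryption in transit can be enforced rather than assumed. A minimal Python sketch with the standard `ssl` module: it requires certificate verification and hostname checking and refuses anything older than TLS 1.2. That minimum version is a common baseline, not a requirement taken from any particular provider; the resulting context can be passed to `urllib.request.urlopen(..., context=...)`.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Create a client TLS context suitable for uploading sensitive documents."""
    # create_default_context() already enables certificate and hostname checks.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Reject outdated protocol versions such as TLS 1.0 and 1.1.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Make the security-relevant settings explicit.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

This only protects the connection itself; how the provider secures its servers still has to be assessed separately.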
Besides data security, data protection also plays a major role. In Europe in particular, it is strictly regulated by the GDPR (General Data Protection Regulation). The online provider's servers should be located within Europe if possible (More information about third countries). Data should not be transferred to other regions. Neither the original document (e.g. the invoice scan or the PDF file) nor the converted files (e.g. the finished Excel spreadsheet) should remain on the OCR provider's server after processing.
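Whether documents are actually erased after processing depends on the provider, but the client can at least request deletion every time, even when processing fails. The Python sketch below uses hypothetical `fetch_result` and `delete_document` callables (for example, wrappers around a provider's result and deletion endpoints); the `try`/`finally` guarantees the deletion request is made.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def process_then_delete(
    document_id: str,
    fetch_result: Callable[[str], T],
    delete_document: Callable[[str], None],
) -> T:
    """Fetch the OCR result, then always ask the provider to delete the document.

    Both callables are hypothetical wrappers around a provider's API.
    """
    try:
        return fetch_result(document_id)
    finally:
        # Runs on success and on error, so no upload is left behind by accident.
        delete_document(document_id)
```

If `fetch_result` raises, the exception still propagates to the caller, but only after the deletion request has been issued.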
Text recognition accuracy
[Image: Conventional online OCR usually does not recognize all text elements. Recognized text elements are grayed out.]
[Image: Konfuzio OCR recognizes all words.]
Example: online scan of an identity card
[Image: Conventional OCR recognizes the letters only with errors.]
[Image: The identity card processed with Konfuzio online OCR.]
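Accuracy differences like these can be quantified. A common metric is the character error rate (CER): the Levenshtein edit distance between the OCR output and a manually verified reference text, divided by the length of the reference. The following Python sketch is a plain dynamic-programming implementation; the sample strings in the note below are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """CER = edit distance / reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)
```

For example, reading the surname "MUSTERMANN" as "MU5TERMAN" costs one substitution and one missing letter, so `character_error_rate("MU5TERMAN", "MUSTERMANN")` is 0.2. Comparing providers on the same set of reference documents gives a far more reliable picture than individual screenshots.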
Mobile and intelligent - The future of OCR
Once your own system is connected to the online solution, many new possibilities open up for improving internal processes and communication with customers or business partners. Because the computing power is provided remotely, text recognition also works easily from mobile devices. As OCR platforms are continuously expanded, new services become available quickly. Algorithms based on artificial intelligence (AI) enable the recognition of handwriting and of information fragmented across multiple pages.
Video sources are also coming into focus. Tools for text recognition in videos can convert content without burdening the performance of the company's own systems. The extracted text can then be used, for example, to make videos more accessible. And not only OCR itself but also the subsequent processing steps can gradually be moved to the cloud, allowing the company to focus on its core business without having to worry about server technology and IT administration.
- Regulations on third countries in the General Data Protection Regulation (GDPR, German: DSGVO): https://dsgvo-gesetz.de/themen/drittland/
- Documentation of the open source OCR solution Tesseract: https://tesseract-ocr.github.io/tessdoc/
The difference between online and local OCR applications is that documents are transmitted over the Internet. This transmission must be demonstrably secure. In addition, the provider must protect the data on its own servers against misuse. Reputable providers make more information about their measures available on request.
Home users can choose from various platforms that convert PDF files directly. The PDF file needs to be stored on the local device (desktop, tablet or smartphone) and is then uploaded to the provider's server via a web form. Additional options let you select the output format. Once OCR processing is complete, the resulting document is available for download.
The quality of OCR results from online platforms is not fundamentally different from that of applications running on the user's own computer. The algorithms used matter far more, and here there are often visible differences, especially with complex content and hard-to-read documents. Local applications that have been installed for a few years are often no longer up to date with the latest technology. Online platforms, by contrast, always offer the latest version, because they are updated centrally for all users. Machine learning can also work better on online platforms, as far more data is available to train the algorithms.