Analyzing documents does not always have to be technically complicated. Sometimes it is just a matter of capturing simple markings in large quantities. This calls for a solution that is as easy to use as it is efficient: Optical Mark Recognition (OMR) provides a remedy and transforms ballpoint pen crosses into analyzable data.
This approach should not be confused with optical character recognition (OCR), a technique specialized in text recognition. We will look at the exact differences between OMR and OCR later in this blog post.
What is Optical Mark Recognition?
Optical Mark Recognition is a technology for the machine recognition of marks in documents, especially in forms. Typically, handwritten ticks, crosses or circles are captured in this way. Because the forms often exist on paper, OMR requires appropriate hardware, such as a scanner. There are special OMR devices that already have the necessary functionalities integrated. Further analysis and interpretation steps are usually performed by OMR software, although the division of tasks between the two components can vary. In some cases, a scanner is superfluous because the documents are prepared entirely digitally.
Due to the speed at which data can be obtained, OMR is well suited to processing large volumes of paper from surveys, elections or multiple choice tests. As the technology is largely limited to capturing specific marks, it is often combined with other approaches such as barcode or text recognition.
How does Optical Mark Recognition work?
The details of the process and the allocation of tasks to the individual components vary depending on the implementation. In general, however, classic OMR hardware is losing importance in favor of powerful software.
OMR devices usually have a special scanner that irradiates a form with light. Wherever marks are expected, e.g. around the checkboxes of a questionnaire, the reflective contrast is then measured. Reduced reflection indicates that a mark is positioned there. Some devices instead work with transparent paper and measure the amount of light that passes through. Due to the comparatively high costs, however, these approaches are increasingly becoming a thing of the past.
Visually, the devices often resemble a mixture of scanner, printer and fax machine. Typical examples include the OMR 23E or AXM960 from DATAWIN.
OMR software has become far more capable in recent years and is increasingly replacing the need for additional hardware. Nowadays, commercially available scanners are often sufficient and all further steps are carried out digitally. The first step is to define a template: a blank form in which the expected positions of the marks are specified. An anchor that appears in the same place in every copy, e.g. a logo, serves as the reference point. Relative to this anchor, the position of the marks can be estimated and then determined more precisely by counting pixels. Examples of such software are QS-Beleg and Evasys.
A significantly higher degree of flexibility in form analysis is achieved through the use of artificial intelligence. Using neural networks such as a Convolutional Neural Network (CNN), automated and accurate recognition of marks and even text is possible from just a few training examples. Technologies built on this basis already exist, but they should not be confused with Optical Mark Recognition.
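The template and anchor approach described above can be sketched in a few lines. This is a minimal illustration, not how QS-Beleg, Evasys or any other product works internally: the `Box` class, the functions `dark_ratio` and `read_marks`, the darkness cut-off of 128 and the 20% fill threshold are all assumptions chosen for the example, and the anchor position is assumed to have been located already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Checkbox region from the template, relative to the anchor's top-left corner."""
    name: str
    dx: int      # horizontal offset from the anchor, in pixels
    dy: int      # vertical offset from the anchor, in pixels
    width: int
    height: int


def dark_ratio(page, anchor_xy, box, dark_below=128):
    """Share of pixels inside the box that are darker than `dark_below`.

    `page` is a grayscale image as a list of rows (0 = black, 255 = white).
    The cut-off of 128 is an assumed value, not taken from the article.
    """
    ax, ay = anchor_xy
    left, top = ax + box.dx, ay + box.dy
    dark = 0
    for y in range(top, top + box.height):
        for x in range(left, left + box.width):
            if page[y][x] < dark_below:
                dark += 1
    return dark / (box.width * box.height)


def read_marks(page, anchor_xy, template, min_ratio=0.2):
    """Return {box name: marked?} for every box in the template (assumed threshold)."""
    return {b.name: dark_ratio(page, anchor_xy, b) >= min_ratio for b in template}


# Tiny synthetic page: 10 rows x 20 columns of white pixels.
page = [[255] * 20 for _ in range(10)]
for y in range(3, 7):            # draw a filled 4 x 4 mark in the first box
    for x in range(6, 10):
        page[y][x] = 0

# Hypothetical template with two checkboxes; the anchor was found at (2, 1).
template = [
    Box("yes", dx=4, dy=2, width=4, height=4),
    Box("no", dx=12, dy=2, width=4, height=4),
]
result = read_marks(page, anchor_xy=(2, 1), template=template)
print(result)
```

Because all box positions are stored relative to the anchor, a scan that is shifted by a few pixels only changes `anchor_xy`, while the template itself stays untouched.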
Differences between OMR and OCR
Optical character recognition (OCR) is also used to analyze documents and forms, but focuses on the recognition of characters. It is therefore a technically similar but significantly more complex approach which, unlike OMR, is not limited to capturing the position of simple marks. Instead, the identification of individual (handwritten) letters enables the conversion of optical texts into digital formats.
| | OMR | OCR |
|---|---|---|
| Function | Automatic detection of markings | Optical character recognition |
| Input | Standardized documents, forms and questionnaires | Text in PDFs, images or other optical formats |
| Output | | Searchable text in a digital format such as JSON |
| Use cases | Evaluation of interviews, election certificates and medical tests | Automated data extraction from documents such as invoices, delivery bills or payment advice notes |
Strengths of OMR
The fact that OCR is generally the more powerful technology does not make OMR superfluous. A look at the strengths of mark recognition makes it clear when and why this technology is preferable.
When analyzing forms, Optical Mark Recognition is a very precise method that can achieve an accuracy of up to 99.9%. This is mainly due to the particularly consistent data capture, which is made possible by tuning the technology uniformly to specific forms. In addition, OMR avoids the human errors that occur with manual transcription.
Compared to many other analysis methods, Optical Mark Recognition has lower system requirements and requires less know-how for implementation and application. Adaptation to special form types is therefore uncomplicated and handled through an interface.
OMR allows thousands of forms to be analyzed per hour, making it one of the fastest data collection methods. In addition, personnel and logistical costs are reduced to a minimum, resulting in significant resource savings.
Optical Mark Recognition is therefore the most obvious choice for simple data acquisition based on markings. Similar performance can also be achieved with other technologies, but these require more resources. Only when mark recognition reaches its limits - namely in the identification of characters - is the use of optical character recognition, for example, indispensable.
Combine marker and character recognition
Companies often have various documents that contain both specific marks and handwritten notes. Analyzing these is a costly and time-consuming process. Attempts to capture the various elements with separate tools instead often fail due to integration problems or, given the effort involved, lead to marginal savings at best.
The solution is a platform that flexibly combines OMR and OCR: Konfuzio. In addition, the AI-based document software features computer vision, barcode recognition and intelligent character recognition (ICR), which focuses particularly on handwriting. Various image formats, PDFs and e-mail attachments are supported, as is integration with service providers for external scanning. This means that no OMR device is necessary.
All relevant forms of data can be digitally captured, extracted and processed using the appropriate technical approaches. This includes handwriting in over 130 languages, printed text in over 200 languages, 2D and 1D barcodes and various markings. A look at an exemplary, heterogeneous document illustrates the necessity of this versatile approach.
Example - Declaration of official duties
A typical use case is the declaration of employment obligations, in which employers provide insurers with important information about the work or sickness status of employees. The forms therefore contain the following elements, to each of which a data capture technology can be assigned:
- Checkboxes with markings: Optical Mark Recognition (OMR)
- Printed text: Optical Character Recognition (OCR)
- Handwriting (e.g. names and contract numbers): Intelligent Character Recognition (ICR)
Information on the data set
# Fields: 17
of which # checkbox: 6
of which # handwriting: 11
# Documents: 50
# Pages: 100
In order to use the various technologies in a targeted and precise manner, the AI from Konfuzio was trained on 50 DOE forms. Each of these comprises 2 pages, resulting in a total of 100 pages. The forms each contain 17 fields, including 6 checkboxes and 11 fillable text fields (handwriting). During the analysis in the training run, the software correctly read 422 of a total of 600 checkboxes. This corresponds to an accuracy of over 70% - an exceptionally good result. During further use, Konfuzio learns from the user's individual documents and adaptations in order to achieve maximum accuracy.
# of all checkboxes: 600
Correctly read: 422
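The accuracy figure follows directly from the two counts reported above. As a quick check of the arithmetic (the variable names are chosen for this example only):

```python
correct = 422   # checkboxes read correctly, as reported above
total = 600     # all checkboxes, as reported above

accuracy = correct / total
print(f"{accuracy:.1%}")  # 70.3%
```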
When forms are filled in by hand, precision can vary greatly. A mark does not always end up exactly in the checkbox intended for it. To compensate for this, the template first defines a marking area that is larger than the printed box. Fields that were corrected, for example by being painted over, can be excluded on the basis of their heavier pigmentation, provided the next field is filled in correctly. The marking sensitivity can also be set individually. A correction algorithm uses these criteria to check whether a reliable classification is possible or whether the case in question should be submitted for re-examination.
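The criteria above can be illustrated with a small decision sketch. It is not Konfuzio's actual correction algorithm: the names `Verdict`, `expand`, `classify` and `resolve_group` as well as all threshold values are hypothetical and only show how an enlarged marking area, an adjustable sensitivity and an overpainting check might interact.

```python
from enum import Enum


class Verdict(Enum):
    EMPTY = "empty"
    MARKED = "marked"
    CORRECTED = "corrected"   # painted over, treated as withdrawn
    REVIEW = "review"         # no reliable classification possible


def expand(box, margin):
    """Grow a (left, top, width, height) box by `margin` pixels on every side."""
    left, top, width, height = box
    return (left - margin, top - margin, width + 2 * margin, height + 2 * margin)


def classify(ratio, empty_max=0.05, mark_min=0.15, overpaint_min=0.70):
    """Classify one field by its share of dark pixels (all thresholds are assumed)."""
    if ratio <= empty_max:
        return Verdict.EMPTY
    if ratio >= overpaint_min:
        return Verdict.CORRECTED   # unusually heavy pigmentation
    if ratio >= mark_min:
        return Verdict.MARKED
    return Verdict.REVIEW          # between "empty" and "marked"


def resolve_group(ratios, **thresholds):
    """Resolve a single-choice group of checkboxes.

    Returns the name of the chosen box, or None if the form should be
    re-examined. A painted-over box is ignored only when exactly one
    other box in the group is cleanly marked.
    """
    verdicts = {name: classify(r, **thresholds) for name, r in ratios.items()}
    if Verdict.REVIEW in verdicts.values():
        return None
    marked = [name for name, v in verdicts.items() if v is Verdict.MARKED]
    return marked[0] if len(marked) == 1 else None


# "a" was painted over and "b" ticked instead: the correction is accepted.
choice = resolve_group({"a": 0.90, "b": 0.30, "c": 0.01})
# Only a painted-over box and nothing else: send to re-examination.
unclear = resolve_group({"a": 0.90, "b": 0.02})
print(choice, unclear)
```

Passing the thresholds as keyword arguments mirrors the idea that the marking sensitivity can be tuned per form type without touching the decision logic.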
Integrated action rules and data comparisons help with further validation. In the event of deviations, Konfuzio provides a verification station to ensure accuracy at all times by comparing with the original. If required, this can also be done manually via the web-based interface. This allows users to maintain constant control despite the high degree of automation.
Areas of application and use cases
There are countless use cases for the specific application of OMR alone - some can be found in almost every company. The technology is particularly valuable for the following areas.
The flagship application of mark recognition is the precise and fast evaluation of election results. Ballot papers are normally anonymous and contain only a few crosses. OMR can capture these easily, ideally thousands of them before the first projections.
Medical questionnaires often consist largely of checkboxes that ask, for example, about the presence of certain pre-existing conditions or allergies. This is relevant before an MRI examination, for example, but also for general patient data. OMR can make it much easier to analyze and manage this data.
Multiple-choice tests are another use case that occurs in many educational institutions and universities. Mark recognition makes it possible to determine the results quickly. Assessments are the counterpart in the business world.
Optical Mark Recognition is a precise, efficient and easy-to-use technology for the automatic evaluation of marks. It can be used to analyze forms based on crosses, tick marks or circles in a short time and in large quantities. This can now increasingly be done digitally using software, meaning that traditional OMR devices are no longer a must. This also makes it possible to combine OMR with supplementary techniques such as optical character recognition for heterogeneous documents. An AI-based complete solution that provides all the necessary functionalities and control instances is an obvious choice. It leads to high-quality results and data-based findings, and it saves valuable resources.
Would you like to find out more about the benefits of OMR for companies? Please leave us a message. Our experts will get in touch with you as soon as possible.