Document data capture with Microtask Digest

Whether your documents are medical claims, genealogy records or invoices, the Microtask Digest service provides an automated pipeline that processes them and extracts the relevant data in the format you need. The pipeline combines human labor with machine intelligence to digest scanned material efficiently and deliver high-quality results quickly.

Batch and stream processing

Sometimes you already have all the documents that need processing, and sometimes new documents keep arriving. The cloud-based Digest service handles both cases with a pipeline customized for your project.

Contact us for more information and a quote for your project!


Digest OCR

Most OCR solutions are intended to capture the full text of a document and produce a Word or Excel file as a result. When you need to extract specific data in a specific format, these products are of little help. Digest OCR is a unique solution that uses human-assisted machine intelligence to extract just the data you need, with high quality.

Unlike regular OCR software, which operates on individual pages, Digest OCR looks at all the pages at once, performing statistical analysis, dynamic language modelling and custom dictionary building. This allows it to reconstruct broken words and determine the correct characters even when the original document is damaged or the scan quality is poor.
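To illustrate the idea of a corpus-wide dictionary, here is a minimal sketch in Python. It is not the Digest OCR implementation: the function names (build_dictionary, edit_distance, correct), the min_count and max_distance parameters, and the sample pages are all assumptions made for this example. It counts words across every page, keeps the ones that recur, and replaces a misread word with the most frequent recurring word within one character edit.

```python
from collections import Counter
import re


def build_dictionary(pages, min_count=2):
    """Count words across ALL pages and keep those seen at least min_count times."""
    counts = Counter(
        word
        for page in pages
        for word in re.findall(r"[a-z]+", page.lower())
    )
    return {word: count for word, count in counts.items() if count >= min_count}


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def correct(word, dictionary, max_distance=1):
    """Return the most frequent dictionary word within max_distance edits.

    Falls back to the (lowercased) input when nothing is close enough.
    """
    word = word.lower()
    if word in dictionary:
        return word
    candidates = [w for w in dictionary if edit_distance(word, w) <= max_distance]
    if not candidates:
        return word
    # Prefer the highest corpus frequency; break ties alphabetically for determinism.
    return min(candidates, key=lambda w: (-dictionary[w], w))


# Hypothetical OCR output for three pages of the same batch.
pages = ["Invoice total due", "invoice number 42", "lnvoice total paid"]
dictionary = build_dictionary(pages)
print(correct("lnvoice", dictionary))
print(correct("tota1", dictionary))
```

A page-by-page approach has no way to know that "lnvoice" is a misreading, whereas the batch as a whole shows that "invoice" is the recurring word. A production system would presumably use richer statistics than plain word counts.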

Download our whitepaper comparing Digest OCR to industry-leading OCR products to see the difference!


Digest ICR

For handwritten material, Digest provides an ICR solution based on machine-assisted human keying. Digest breaks the document down into small segments, and these microtasks are then completed by a cloud-based workforce. The ICR system provides assistance such as dictionaries, dynamic rules and reference information to ensure high-quality, efficient data entry.
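The segmentation step can be pictured with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the Microtask class, the field names, the pixel boxes and the validation rules are assumptions, not the Digest data model. Each field region of a page becomes one microtask, and a simple rule checks the value a worker keys in.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Microtask:
    """One small unit of keying work: a single field region on one document."""
    document_id: str
    field: str
    box: tuple  # (left, top, width, height) in pixels


# Hypothetical per-field rules used to assist and validate keying.
RULES = {
    "birth_year": re.compile(r"\d{4}"),
    "surname": re.compile(r"[A-Za-z' -]+"),
}


def split_into_microtasks(document_id, layout):
    """Turn a page layout (field name -> box) into one microtask per field."""
    return [Microtask(document_id, field, box) for field, box in layout.items()]


def accept(task, keyed_value):
    """Accept a keyed value if its field has no rule or the whole value matches it."""
    rule = RULES.get(task.field)
    return rule is None or rule.fullmatch(keyed_value.strip()) is not None


# Hypothetical layout of a handwritten genealogy record.
layout = {
    "surname": (40, 120, 300, 40),
    "birth_year": (360, 120, 120, 40),
}
tasks = split_into_microtasks("doc-001", layout)
for task in tasks:
    print(task.field, task.box)
```

Because each microtask carries only one small region, it can be handed to any available worker, and a rejected value can be re-queued without redoing the rest of the document.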

For documents containing both typewritten and handwritten text, Digest automatically uses both OCR and ICR to deliver the best results cost-effectively.


Frequently Asked Questions (FAQ)

I run a BPO business. Can I use Digest to improve our performance?
Most certainly! Digest is a cloud-based technology solution suitable for integration into BPO processes.
What is your minimum size for a data extraction project?
Since configuring the Digest pipeline for a project takes some effort, we typically work on projects with tens of thousands of documents.
What languages does Digest OCR support?
In principle, Digest OCR can support any language given enough data, since it builds a dynamic language model from the input. Currently, however, we support only Western alphabets.
My data is sensitive and cannot be sent outside country X. Can you handle that?
In most cases yes, depending on the country. We run Digest on various cloud platforms with data centers around the world. If a human workforce is required, it can be sourced from that country as well.
My BPO company has its own workforce. Can Digest make use of it?
Yes! Digest can use any workforce, because the microtasks are performed through a web interface that works anywhere. You will also get detailed reports on worker performance and accuracy.
What kind of pricing models do you use?
Pricing is based on the amount of data extracted from the documents. We give a quote based on the sample data you provide.
Can I license Digest and run it on my own servers?
Digest is a cloud-based SaaS product and is not available for licensing.