OCR Agent

The OCR agent converts images and scanned documents to text. Typical inputs are PDF and TIFF files produced by a scanner.

The OCR agent ships with a built-in OCR engine called IronOCR. It can also run other OCR engines, such as ABBYY FineReader. The enrichment pipeline invokes the OCR agent through the OcrRest enrichment step.

For light OCR loads, this agent can share a server with other agents, such as the enrichment and source agents. If you need to OCR a large volume of content, run this agent on a dedicated server. You may want to increase the compute power for the initial load and then reduce it once you are processing only deltas.
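
The sizing advice above can be turned into a rough back-of-the-envelope estimate. The sketch below is not part of the product. The function names, the per-page OCR time, and the page counts are all hypothetical, and it assumes throughput scales linearly with the number of parallel OCR workers, which real hardware will only approximate. Measure the per-page time on a sample of your own content before relying on the result.

```python
import math


def ocr_hours(pages: int, seconds_per_page: float, workers: int) -> float:
    """Estimate wall-clock hours to OCR `pages` pages.

    Assumes (hypothetically) that `workers` parallel OCR workers
    scale linearly, with no queueing or I/O overhead.
    """
    if pages < 0 or seconds_per_page <= 0 or workers < 1:
        raise ValueError("pages must be >= 0, seconds_per_page > 0, workers >= 1")
    return pages * seconds_per_page / workers / 3600


def workers_needed(pages: int, seconds_per_page: float, deadline_hours: float) -> int:
    """Smallest worker count that finishes `pages` within `deadline_hours`."""
    if pages < 0 or seconds_per_page <= 0 or deadline_hours <= 0:
        raise ValueError("pages must be >= 0, other arguments must be > 0")
    total_seconds = pages * seconds_per_page
    return max(1, math.ceil(total_seconds / (deadline_hours * 3600)))


if __name__ == "__main__":
    # Hypothetical figures: measure seconds_per_page on your own documents.
    initial_pages = 1_000_000   # one-off initial load
    daily_delta_pages = 5_000   # pages changed per day afterwards
    seconds_per_page = 2.0

    print(f"Initial load, 8 workers: {ocr_hours(initial_pages, seconds_per_page, 8):.1f} h")
    print(f"Workers for a 72 h initial load: {workers_needed(initial_pages, seconds_per_page, 72)}")
    print(f"Workers for daily deltas in 24 h: {workers_needed(daily_delta_pages, seconds_per_page, 24)}")
```

With these made-up numbers, the initial load needs several workers to finish within a few days, while the daily deltas fit comfortably on a single worker. That gap is the reason to scale compute up for the initial load and back down afterwards.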
