OCR Agent

The OCR agent converts images and scanned documents into text and image-over-text PDF files, so that Aiimi Insight Engine can make full use of images and PDFs. This means you can:

  • Search the contents of a PDF or image file.

  • See hit highlighting for a PDF or image file result.

  • Easily find entities and metadata for a PDF or image file within preview.

The OCR agent ships with a built-in OCR engine called IronOCR. It can also run other OCR engines, such as ABBYY FineReader. The enrichment pipeline invokes the OCR agent through the OcrRest enrichment step.

For light OCR loads, this agent will share a server with the other agents, such as the enrichment and source agents. If you are running OCR on a large volume of content, run this agent on a dedicated server. You may want to increase the compute power for the initial load and then reduce it once you are only processing deltas.
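
A rough way to reason about sizing the initial load against ongoing deltas is a back-of-envelope throughput estimate. The sketch below is illustrative only: the function names, the pages-per-minute figure, and the assumption that throughput scales linearly with the number of OCR workers are all hypothetical and are not taken from Aiimi Insight Engine. Measure real throughput on your own content before relying on any numbers.

```python
import math


def estimate_ocr_hours(pages: int, pages_per_minute_per_worker: float, workers: int) -> float:
    """Estimate wall-clock hours to OCR `pages` pages.

    Assumes (hypothetically) that throughput scales linearly with worker count.
    """
    if pages < 0 or pages_per_minute_per_worker <= 0 or workers <= 0:
        raise ValueError("pages must be >= 0; rate and workers must be > 0")
    pages_per_hour = pages_per_minute_per_worker * 60 * workers
    return pages / pages_per_hour


def workers_needed(pages: int, pages_per_minute_per_worker: float, deadline_hours: float) -> int:
    """Estimate how many workers are needed to finish within `deadline_hours`."""
    if pages < 0 or pages_per_minute_per_worker <= 0 or deadline_hours <= 0:
        raise ValueError("pages must be >= 0; rate and deadline must be > 0")
    pages_per_worker = pages_per_minute_per_worker * 60 * deadline_hours
    # Always keep at least one worker, even for a tiny delta.
    return max(1, math.ceil(pages / pages_per_worker))


if __name__ == "__main__":
    # Hypothetical figures: 1.2 million pages for the initial load,
    # 5,000 pages per day of deltas, 20 pages per minute per worker.
    initial = workers_needed(1_200_000, 20, deadline_hours=7 * 24)
    delta = workers_needed(5_000, 20, deadline_hours=8)
    print(f"Initial load: {initial} workers; daily delta: {delta} worker(s)")
```

With these made-up figures the initial load needs several workers while the daily delta fits on one, which is the pattern described above: scale compute up for the first pass, then scale it back down.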
