Tika Agent
The Tika agent converts documents from their native format to a text representation.
Tika is an open-source component licensed under the Apache License. It supports a wide range of file formats, such as PDF and Microsoft Office documents.
It is most efficient to co-host a Tika agent on each enrichment server. The Tika agent works closely with the enrichment agent, and shipping binary files between servers is inherently network intensive, so co-hosting keeps that traffic local. The Tika agent can be heavy on CPU utilisation and memory consumption, so allow for this load on each enrichment server.
The Tika code base includes protection against timeouts and long-running processes. This keeps the conversion process stable when a document is slow to convert or fails to convert.
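To illustrate the general idea behind timeout protection, the following is a minimal Python sketch. It is not Tika's own implementation; it only shows the common pattern of running a conversion in a child process and abandoning it if it takes too long. The function name run_with_timeout and its parameters are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command in a child process.

    Returns the child's standard output, or None if the child
    exceeds timeout_seconds or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising,
        # so a stuck conversion does not keep consuming CPU or memory.
        return None
    except subprocess.CalledProcessError:
        # The child process reported a failure.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a long-running conversion: sleeps longer than the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, timeout_seconds=1))
```

Isolating the work in a separate process means that a hung or crashed conversion can be stopped without bringing down the caller.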
Tika's temporary files can build up when errors occur during enrichment. If this happens, you need to clear the Tika temp folder.
A script called delete-temp-files.ps1 in the scripts folder (e.g. c:\InsightMaker\scripts) cleans up the temp folder. It can be run manually as needed or on a schedule using Windows Task Scheduler.
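For readers who want to see what this kind of cleanup involves, here is a minimal Python sketch. It does not show the contents of delete-temp-files.ps1; it assumes a cleanup that removes files older than a given age. The function name delete_stale_temp_files, the max_age_hours default, and the temp folder path passed in are all hypothetical.

```python
import time
from pathlib import Path


def delete_stale_temp_files(temp_dir, max_age_hours=24.0):
    """Delete files under temp_dir last modified more than max_age_hours ago.

    Returns the list of paths that were deleted.
    """
    cutoff = time.time() - max_age_hours * 3600
    deleted = []
    # Take a snapshot of the tree first so deletions do not affect iteration.
    for path in list(Path(temp_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            if path.stat().st_mtime < cutoff:
                path.unlink()
                deleted.append(path)
        except OSError:
            # The file is locked (for example, still in use by a running
            # conversion) or has already been removed; skip it.
            continue
    return deleted
```

Calling delete_stale_temp_files with the path of the Tika temp folder removes only files older than the threshold, which should leave files belonging to conversions that are still in progress untouched.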