Tika Agent

The Tika agent converts documents from their native format to a text representation. Apache Tika is an open-source component licensed under the Apache License, and it supports a wide range of file formats. See the Apache Tika site for more information.
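To illustrate the kind of conversion involved, here is a minimal sketch that sends a document to a standalone Tika Server and reads back plain text. It is not a description of how the Tika agent itself is wired: it assumes a Tika Server is running locally on its default port (9998), and the function names `build_extract_request` and `extract_text` are hypothetical.

```python
import urllib.request

# Assumption: a standalone Tika Server is listening on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_extract_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that asks Tika Server for a plain-text rendering."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(data: bytes, url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the raw document bytes and return the extracted text."""
    request = build_extract_request(data, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A caller would read the file in binary mode and pass the bytes to `extract_text`. The whole binary travels over the network on every call, which is why the placement of the converter relative to the data matters.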

The Tika agent can be CPU and memory intensive. It is most efficient to co-host a Tika agent on each enrichment server, because it works closely with the enrichment agent and shipping binary files between servers is inherently network intensive. We have extended the Tika code base to include timeout and long-running process protection. This keeps the agent stable when a conversion stalls or runs too long.
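The general idea behind timeout protection can be sketched as follows. This is an illustration of the technique, not the extension made to the Tika code base: it assumes the conversion runs as a separate child process, and the name `convert_with_timeout` is hypothetical.

```python
import subprocess
import sys
from typing import Optional, Sequence


def convert_with_timeout(command: Sequence[str], timeout_seconds: float) -> Optional[str]:
    """Run a conversion command in a child process.

    Returns the command's standard output as text, or None if the command
    exceeds the timeout or exits with a non-zero status.
    """
    try:
        completed = subprocess.run(
            list(command),
            capture_output=True,
            timeout=timeout_seconds,  # the child is killed once this elapses
            check=True,               # a non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None
    except subprocess.CalledProcessError:
        return None
    return completed.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in for a real converter: a child Python process that prints text.
    print(convert_with_timeout([sys.executable, "-c", "print('converted')"], 10.0))
```

With a real converter, the command might look like `["java", "-jar", "tika-app.jar", "--text", "report.pdf"]`, where the jar and file names are placeholders. Running the conversion in its own process means a stalled or runaway document can be terminated without taking down the caller.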
