Trie Entity Extractor

This step extracts keywords from text based on either regular expressions or dictionaries of terms (single or multi-word).

It is called the 'Trie' entity extractor because it uses a trie (prefix tree) data structure to store the text and lookup values, which makes entity extraction fast and efficient.
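
To illustrate why a trie suits multi-word dictionary lookup, here is a minimal Python sketch. It is not the step's actual implementation: the names `TrieNode`, `build_trie` and `extract`, the sample terms, and the choice to lower-case the text and split it on white space are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node per token; `entity` is set where a dictionary term ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entity: Optional[str] = None


def build_trie(terms: Dict[str, str]) -> TrieNode:
    """Insert each term, split into lower-cased tokens, into the trie."""
    root = TrieNode()
    for term, entity in terms.items():
        node = root
        for token in term.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.entity = entity
    return root


def extract(text: str, root: TrieNode) -> List[Tuple[str, str]]:
    """Return (matched text, entity) pairs, preferring the longest match."""
    tokens = text.lower().split()
    results: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node = root
        best_end, best_entity = None, None
        j = i
        # Walk down the trie for as long as the next token continues a term.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.entity is not None:
                best_end, best_entity = j, node.entity
        if best_end is not None:
            results.append((" ".join(tokens[i:best_end]), best_entity))
            i = best_end  # continue after the matched term
        else:
            i += 1
    return results


# Hypothetical dictionary of terms mapped to entity names.
terms = {"new york": "City", "new york times": "Publication", "london": "City"}
trie = build_trie(terms)
matches = extract("She moved from London to New York last year", trie)
print(matches)
```

Because terms that share a prefix ("new york" and "new york times") share a path in the trie, each position in the text is checked against every term in a single walk, rather than once per term.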

This step works in tandem with entity definitions that you set up in Control Hub. You will need to familiarise yourself with these configurations before using this step.

  • Normalise White Space - Normalise all white space to spaces.

  • Proximity Disabled File Types - Proximity words can be used to help validate terms extracted by regular expressions. Proximity checking is not performed for the file types listed here.

  • Include File Name - Whether to extract entities from the file name.

  • Include File Location - Whether to extract entities from the file location.

  • Whitespace Characters - The characters that should be treated as white space.

  • Entities - Select the entities to extract. What is extracted depends on each entity's definition in the entities section of Control Hub.
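
The white space and proximity settings above can be pictured with the following Python sketch. It is only an illustration under stated assumptions, not how the step is implemented: the function names, the `window` size, the sample pattern and the sample file types are all hypothetical, and the real step may normalise or check proximity differently.

```python
import re
from typing import FrozenSet, List


def normalise_whitespace(text: str, whitespace_chars: str = " \t\r\n\u00a0") -> str:
    """Replace each configured white space character with a plain space."""
    table = {ord(ch): " " for ch in whitespace_chars}
    return text.translate(table)


def extract_with_proximity(
    text: str,
    pattern: str,
    proximity_words: List[str],
    file_type: str,
    disabled_types: FrozenSet[str] = frozenset(),
    window: int = 20,  # assumed: characters of context checked on each side
) -> List[str]:
    """Keep regex matches that have a proximity word nearby.

    Proximity checking is skipped when `file_type` is in `disabled_types`.
    """
    check = file_type.lower() not in disabled_types
    lowered = text.lower()
    hits = []
    for m in re.finditer(pattern, text):
        if check:
            start = max(0, m.start() - window)
            context = lowered[start:m.end() + window]
            if not any(word.lower() in context for word in proximity_words):
                continue  # no validating word close enough, so drop the match
        hits.append(m.group())
    return hits


raw = "Invoice\tnumber:\u00a0AB-10234.\nThe storage locker in the basement is labelled ZZ-99999."
clean = normalise_whitespace(raw)
pattern = r"\b[A-Z]{2}-\d{5}\b"

validated = extract_with_proximity(
    clean, pattern, ["invoice"], file_type="docx", disabled_types=frozenset({"csv"})
)
unchecked = extract_with_proximity(
    clean, pattern, ["invoice"], file_type="csv", disabled_types=frozenset({"csv"})
)
print(validated, unchecked)
```

In this sketch the second code is dropped for the "docx" file because no proximity word appears near it, while both codes are kept for the "csv" file, where proximity checking is disabled.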
