Summary
This step condenses text or documents into a multi-sentence summary that helps users understand what a document is about. Several algorithms are available, along with configuration parameters that control the step's behaviour. Summary length is computed on a logarithmic scale from the number of sentences in the document.
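The exact length formula is not documented here. As a rough illustration only, a logarithmic length rule might look like the sketch below. The function name summary_sentence_count, the base-2 logarithm, the clamping to the minimum_sentence_count and maximum_sentence_count settings, and the default bounds are all assumptions, not the step's actual implementation.

```python
import math


def summary_sentence_count(document_sentences: int,
                           minimum_sentence_count: int = 2,
                           maximum_sentence_count: int = 10) -> int:
    """Hypothetical logarithmic length rule, clamped to the configured bounds."""
    if document_sentences <= 0:
        return 0
    # Assumption: base-2 logarithm; the real step may use another base or scaling.
    raw = round(math.log2(document_sentences))
    # Keep the result inside the configured minimum and maximum.
    clamped = max(minimum_sentence_count, min(maximum_sentence_count, raw))
    # A summary cannot contain more sentences than the document itself.
    return min(clamped, document_sentences)


for n in (1, 4, 64, 100000):
    print(n, summary_sentence_count(n))
```

Under these assumptions the summary grows slowly with document size: a document sixteen times longer gains only four extra summary sentences until the maximum is reached.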
The Aiimi Insight Engine Enterprise Search user interface looks for the summary in the ‘metadata.summary’ field, so create this field in Control Hub before using this step if it does not already exist.
You will also need to configure the step in its configuration file, which can be found at:
\PythonRestService\config\endpoints\summary.json
Settings:
nmax_text_size – the maximum number of characters to use. If the text is longer than this value, only the first nmax_text_size characters are used.
percentage_of_numbers_allowed_in_sentence – the maximum percentage of numbers allowed in a sentence for it to be considered in summary generation. This setting helps exclude sentences that consist largely of numbers.
minimum_sentence_count – the minimum number of sentences in a generated summary.
maximum_sentence_count – the maximum number of sentences in a generated summary.
language – the language to use. Leave this set to English.
algorithm – the algorithm to use. The default is text-rank; other options include:
luhn
edmundson
lsa
lex-rank
sum-basic
kl
metadata_field_for_summary – the field in which to store the summary. This should be set to summary.
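As an illustration, the settings above could be assembled and sanity-checked like this before being saved to summary.json. This is a sketch only: the flat JSON layout, the example values, the spelling of the language value, the 0 to 100 range for the percentage, and the helper name validate_summary_settings are assumptions. Check the shipped summary.json for the actual structure and defaults.

```python
import json

# Algorithm names as listed in the settings above.
ALGORITHMS = {"text-rank", "luhn", "edmundson", "lsa", "lex-rank", "sum-basic", "kl"}

# Example values only; they are assumptions, not the product defaults.
settings = {
    "nmax_text_size": 100000,
    "percentage_of_numbers_allowed_in_sentence": 30,
    "minimum_sentence_count": 2,
    "maximum_sentence_count": 10,
    "language": "english",
    "algorithm": "text-rank",
    "metadata_field_for_summary": "summary",
}


def validate_summary_settings(cfg: dict) -> list:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if cfg["algorithm"] not in ALGORITHMS:
        problems.append(f"unknown algorithm: {cfg['algorithm']}")
    if cfg["minimum_sentence_count"] > cfg["maximum_sentence_count"]:
        problems.append("minimum_sentence_count exceeds maximum_sentence_count")
    # Assumption: the percentage is expressed on a 0 to 100 scale.
    if not 0 <= cfg["percentage_of_numbers_allowed_in_sentence"] <= 100:
        problems.append("percentage_of_numbers_allowed_in_sentence must be between 0 and 100")
    if cfg["nmax_text_size"] <= 0:
        problems.append("nmax_text_size must be positive")
    return problems


problems = validate_summary_settings(settings)
rendered = json.dumps(settings, indent=2)
# Print the JSON if the settings pass the checks, otherwise print the problems.
print(rendered if not problems else problems)
```

Running a check like this before restarting the service catches simple mistakes, such as a misspelt algorithm name or a minimum that exceeds the maximum.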
The endpoint for this enrichment step is ‘summary’.