Summary

This step condenses text or documents into a multi-sentence summary that helps users understand what a document is about. Several summarisation algorithms are available, and a number of configuration settings control the step's behaviour. The length of the summary is calculated on a logarithmic scale from the number of sentences in the document, within the limits set by minimum_sentence_count and maximum_sentence_count.
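
The exact logarithmic formula is not documented here. As a rough illustration only, the sketch below assumes the summary length is the ceiling of the natural logarithm of the document's sentence count, clamped to the minimum_sentence_count and maximum_sentence_count settings. summary_length is a hypothetical helper, not part of the product, and the real step may use a different base or scaling.

```python
import math


def summary_length(sentence_count: int,
                   minimum_sentence_count: int,
                   maximum_sentence_count: int) -> int:
    """Hypothetical log-scale summary length, clamped to configured bounds.

    ASSUMPTION: length = ceil(ln(sentence_count)); the real formula may differ.
    """
    if sentence_count <= 0:
        return 0
    # Grows slowly with document length: 10 sentences -> 3, 100 -> 5, 1000 -> 7.
    raw = math.ceil(math.log(sentence_count))
    # Clamp to the configured minimum and maximum summary sizes.
    length = max(minimum_sentence_count, min(raw, maximum_sentence_count))
    # Never request more sentences than the document contains.
    return min(length, sentence_count)
```

Under these assumptions, with a minimum of 2 and a maximum of 5, a 10-sentence document gets a 3-sentence summary, while a 1,000-sentence document is capped at 5.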

The Aiimi Insight Engine Enterprise Search user interface reads the summary from the ‘metadata.summary’ field, so if this field does not already exist, create it in Control Hub before using this step.

You will also need to configure the step in its configuration file, which is located at:

  • \PythonRestService\config\endpoints\summary.json
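
The settings described below are stored in summary.json. The following sketch loads that file and checks a few constraints implied by this section: every setting is present, the minimum sentence count does not exceed the maximum, and the algorithm is one of those listed. It assumes, hypothetically, that the file is a flat JSON object with the setting names as top-level keys; the real file's layout may differ. load_summary_config and validate_summary_config are illustrative names, not part of the product.

```python
import json
from pathlib import Path

# Setting names taken from the Settings list in this section.
REQUIRED_SETTINGS = {
    "nmax_text_size",
    "percentage_of_numbers_allowed_in_sentence",
    "minimum_sentence_count",
    "maximum_sentence_count",
    "language",
    "algorithm",
    "metadata_field_for_summary",
}

# Algorithm names as listed in this section (text-rank is the default).
SUPPORTED_ALGORITHMS = {
    "text-rank", "luhn", "edmundson", "lsa", "lex-rank", "sum-basic", "kl",
}


def validate_summary_config(config: dict) -> dict:
    """Check a parsed configuration and raise ValueError if it looks wrong."""
    missing = REQUIRED_SETTINGS.difference(config)
    if missing:
        raise ValueError(f"missing settings: {sorted(missing)}")
    if config["minimum_sentence_count"] > config["maximum_sentence_count"]:
        raise ValueError("minimum_sentence_count exceeds maximum_sentence_count")
    if config["algorithm"] not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {config['algorithm']!r}")
    return config


def load_summary_config(path: str) -> dict:
    """Read summary.json (ASSUMED to be a flat JSON object) and validate it."""
    with Path(path).open(encoding="utf-8") as f:
        return validate_summary_config(json.load(f))
```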

Settings:

  • nmax_text_size – the maximum number of characters of text to process. If the text is longer than this, only the first nmax_text_size characters are used.

  • percentage_of_numbers_allowed_in_sentence – the maximum percentage of numbers a sentence may contain and still be considered when generating the summary. This prevents sentences made up largely of numbers from appearing in the summary.

  • minimum_sentence_count – the minimum number of sentences in a generated summary.

  • maximum_sentence_count – the maximum number of sentences in a generated summary.

  • language – the language of the text. Leave this set to English.

  • algorithm – the summarisation algorithm to use. The default is text-rank; the other available values are:

    • luhn

    • edmundson

    • lsa

    • lex-rank

    • sum-basic

    • kl

  • metadata_field_for_summary – the metadata field in which the summary is stored. Set this to summary so that the summary is written to ‘metadata.summary’, where the user interface expects it.
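
The sketch below illustrates how nmax_text_size and percentage_of_numbers_allowed_in_sentence could be applied when selecting candidate sentences. It rests on assumptions not stated in this section: sentences are split with a naive punctuation rule, and the number percentage is measured over whitespace-separated tokens rather than characters. The function names are hypothetical and do not describe the step's actual implementation.

```python
import re

# Matches number-like tokens such as 42, -7, 3.14, 1,000 or 12%.
_NUMBER = re.compile(r"^[-+]?\d[\d,]*(\.\d+)?%?$")


def truncate_text(text: str, nmax_text_size: int) -> str:
    """Keep only the first nmax_text_size characters."""
    return text[:nmax_text_size]


def numeric_percentage(sentence: str) -> float:
    """ASSUMPTION: percentage of whitespace-separated tokens that are numbers."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    # Strip surrounding punctuation so "2021." still counts as a number.
    numeric = sum(1 for t in tokens if _NUMBER.match(t.strip(".,;:()")))
    return 100.0 * numeric / len(tokens)


def candidate_sentences(text: str,
                        nmax_text_size: int,
                        percentage_of_numbers_allowed_in_sentence: float) -> list:
    """Truncate the text, split it into sentences and drop numeric-heavy ones."""
    text = truncate_text(text, nmax_text_size)
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        s for s in sentences
        if numeric_percentage(s) <= percentage_of_numbers_allowed_in_sentence
    ]
```

For example, with a threshold of 50, a sentence such as "2019 2020 2021 12% 15%." (100% numbers) is dropped, while "Sales rose 12% in 2021." (40% numbers) is kept.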

The endpoint for this enrichment step is ‘summary’.