Language Detection
The language detection step determines the language of a document and stores it in a metadata field called ‘language’. If this metadata field does not exist, create it with type keyword.
Settings:
max_text_size – the maximum number of characters to use. If the text is longer than this value, only the first max_text_size characters are used.
percentage_of_numbers_allowed_in_sentence – this is the maximum percentage of numbers allowed in a sentence for it to be considered in summary generation. This setting helps avoid including sentences that are largely numbers.
metadata_field – the metadata field in which the detected language is stored. Leave this set to language.
language_map – a map from each detected language code to a friendly name. You can change the friendly names if you wish.
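As a rough illustration of how the settings above fit together, the following Python sketch truncates the text, maps the detected code to a friendly name, and writes the result to the metadata field. It is not the product's implementation: the function apply_language_step, the detect_code callback, and all setting values shown here are hypothetical, and percentage_of_numbers_allowed_in_sentence is left out.

```python
from typing import Callable, Dict

# Hypothetical settings. The keys come from the list above;
# the values are illustrative assumptions, not product defaults.
settings = {
    "max_text_size": 1000,
    "metadata_field": "language",
    "language_map": {"en": "English", "fr": "French"},
}


def apply_language_step(
    text: str,
    metadata: Dict[str, str],
    settings: dict,
    detect_code: Callable[[str], str],
) -> Dict[str, str]:
    """Return a copy of metadata with the detected language added.

    detect_code is a stand-in for whatever detector is actually used;
    it takes a text sample and returns a code such as "en".
    """
    # Only the first max_text_size characters are passed to the detector.
    sample = text[: settings["max_text_size"]]
    code = detect_code(sample)

    # Fall back to the raw code if the map has no friendly name for it.
    friendly = settings["language_map"].get(code, code)

    updated = dict(metadata)
    updated[settings["metadata_field"]] = friendly
    return updated
```

For example, calling apply_language_step("Bonjour tout le monde", {}, settings, lambda sample: "fr") with a stub detector that always returns "fr" yields a metadata dictionary whose language entry is French.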
The endpoint for this enrichment step is ‘language’.