Related Areas of Text Analytics

Named Entity Recognition

Named Entity Recognition (NER) is the task of locating and labeling specific types of information within a larger text [19]. For example, an email address could constitute an entity, and a NER solution could find and label email addresses in contact information. NER has become especially important in enterprise GDPR implementations, in which personally identifiable information (PII) must be located across an organization’s information repositories in order to minimize risk and process data-subject access requests. Other applications of NER include enterprise search filtering, clustering and classification of documents, and processing recommendations. NER can also be a useful supplementary step for summarization when the application or workstream depends on entities of known importance, such as codes or quantities.
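The email-address example above can be sketched with a simple pattern matcher. This is a minimal illustration only, not a production NER system: the pattern, the EMAIL label, and the function name find_email_entities are assumptions introduced here, and real email syntax is broader than this pattern allows.

```python
import re

# Assumption: a deliberately simple pattern for illustration only.
# It will miss some valid addresses and accept some invalid ones.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_entities(text):
    """Return (label, matched_text, start, end) for each email-like span."""
    return [
        ("EMAIL", match.group(), match.start(), match.end())
        for match in EMAIL_PATTERN.finditer(text)
    ]


sample = "Contact Jane at jane.doe@example.com or sales@example.org."
entities = find_email_entities(sample)
for label, value, start, end in entities:
    print(label, value, start, end)
```

Returning character offsets alongside each label lets a downstream step, such as redaction for a data-subject access request, act on the exact span rather than searching for the string a second time.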

Topic Extraction and Sentiment Analysis

Sentiment analysis is a close relative of topic and concept extraction, which is itself a limited form of text summarization that makes no attempt at comprehensiveness or readability. The key difference among the three is the amount and format of information presented back to the user. Summarization offers the most complete distillation of the data, generating a fully readable and coherent paragraph. Topic and concept extraction present less information, as simple nouns or a single sentence. Sentiment analysis outputs none of the original text, only a description of the text’s overall tone or the author’s intent. The most commonly inferred pair of sentiments is “positive”/“negative”, for example when analyzing online product reviews or customer feedback. This gives an organization a quick and useful way to gauge the response to a product or strategy. Many methods for topic extraction are identical to those for summarization; they simply exit the algorithm before the processing for sentence linkage and grammar. Sentiment analysis operates similarly but includes additional logic that accounts for the typical linguistic means of conveying positive or negative messages.
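As a rough sketch of the positive/negative case, the following scores a text against two small word lists and flips the polarity of a word that directly follows a negator, which is one simple example of the additional linguistic logic mentioned above. The word lists, the negation rule, and the function name classify_sentiment are all assumptions made for illustration; practical systems rely on much larger lexicons or trained models.

```python
import re

# Assumption: tiny hand-picked lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}
NEGATORS = {"not", "never", "no"}


def classify_sentiment(text):
    """Label text as 'positive', 'negative', or 'neutral' by lexicon score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip polarity when the word immediately follows a negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "The battery life is great and the build is reliable.",
    "Not good, the screen arrived broken.",
    "It arrived on Tuesday.",
]
labels = [classify_sentiment(review) for review in reviews]
print(labels)
```

Note that the output contains none of the original text, only a label per review, which matches the distinction drawn above between sentiment analysis and summarization.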
