What’s in a Text Analysis tool?

Behind the scenes in NLP

Megaputer Intelligence
Nov 7, 2019
A word cloud produced in PolyAnalyst.

If you ask five different people, “What does a Text Analysis tool do?”, it is very likely you will get five different responses. The term Text Analysis is used to cover a broad range of tasks that include identifying important information in text: from a low, structural level to more complicated, high-level concepts. Included in this very broad category are also tools that convert audio to text and perform Optical Character Recognition (OCR); however, the focus of these tools is on the input, rather than the core tasks of text analysis.

Text Analysis tools not only perform different tasks but also target different user bases. For example, a researcher studying how people react on Twitter during election debates may need different Text Analysis tasks than a healthcare specialist building a model to predict sepsis from medical records. Additionally, some of these tools require knowledge of a programming language like Python or Java, whereas other platforms offer a Graphical User Interface (GUI).

Let’s take a look at some of the most popular types of Text Analysis tools one might encounter.

Part-of-Speech Taggers / Syntactic Parsers

Two of the most basic Text Analysis tasks are part-of-speech (POS) tagging and syntactic parsing. POS tagging adds part-of-speech labels to words, such as noun, adjective, and verb. Syntactic parsing identifies the underlying syntactic relationships among words in a sentence; these relationships are often visualized in a tree structure for easier interpretation. Rather than being the end goal of an analysis, these two tasks are usually an intermediate step that supports further analysis. Thus, they are more likely to be used by researchers in academic institutions or R&D departments in industry, and such analysis often requires programming knowledge.

Verbatim:

Working with my team leader , who is a vast reservoir of information and knowledge .

Tagged:

Working/Verb with/Preposition my/Pronoun_Singular team/Noun_Singular leader/Noun_Singular ,/Punctuation who/Pronoun is/Verb_Singular a/Article vast/Adjective reservoir/Noun_Singular of/Preposition information/Noun_Singular and/Conjunction knowledge/Noun_Singular ./Punctuation

Chunked:

Working [with]/PP [my team leader]/NP , [who]/NP [is]/VP [a vast reservoir]/NP [of]/PP [information and knowledge]/NP .

Dependency Parsed:

A sentence parsed with PolyAnalyst’s dependency parsing technology.
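
To make the tagged form above a little more concrete, here is a minimal Python sketch that reads the word/Tag string into pairs and groups runs of noun-phrase tags into phrases. The tag names are copied from the tagged example; the grouping rule and the names NP_TAGS, parse_tagged, and noun_phrases are assumptions made for illustration, and this is not how PolyAnalyst or any particular chunker works.

```python
# Tag names follow the tagged example in this article. The grouping rule is an
# illustrative assumption, not the chunker of any specific tool.
TAGGED = (
    "Working/Verb with/Preposition my/Pronoun_Singular team/Noun_Singular "
    "leader/Noun_Singular ,/Punctuation who/Pronoun is/Verb_Singular a/Article "
    "vast/Adjective reservoir/Noun_Singular of/Preposition "
    "information/Noun_Singular and/Conjunction knowledge/Noun_Singular "
    "./Punctuation"
)

# Tags that this sketch allows inside a noun phrase (an assumption).
NP_TAGS = {"Article", "Adjective", "Pronoun", "Pronoun_Singular", "Noun_Singular"}


def parse_tagged(text):
    """Split 'word/Tag' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def noun_phrases(pairs):
    """Group maximal runs of NP_TAGS tokens into phrases."""
    phrases, current = [], []
    for word, tag in pairs:
        if tag in NP_TAGS:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


pairs = parse_tagged(TAGGED)
print(noun_phrases(pairs))
```

This simple rule recovers "my team leader", "who", and "a vast reservoir", but it splits "information" and "knowledge" into two phrases because it does not handle coordination, whereas the chunked example above keeps them together. Real chunkers use richer grammars or trained models.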

Concordance / Keyword Tools

Concordance tools are used to create alphabetical lists of the words in text and their immediate context. They provide statistics regarding the frequency of words and how often they co-occur with other words, as well as the identification of important keywords in the text. These tools usually include a graphical interface for viewing the words in the text and are used both in academia and industry.

A word cloud generated by PolyAnalyst.
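
As a rough illustration of what a concordance tool computes, the sketch below builds an alphabetical word list with frequencies, counts adjacent word pairs, and prints each occurrence of a keyword with its immediate context. It uses only the Python standard library; the sample text and the function names are made up for this example.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
TEXT = (
    "The service was quick and the staff was friendly. "
    "The staff answered every question about the service."
)


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = tokenize(TEXT)
frequencies = Counter(tokens)
cooccurrences = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs

for word in sorted(frequencies):  # alphabetical word list with counts
    print(word, frequencies[word])
for left, key, right in concordance(tokens, "staff"):
    print(f"{left:>20} [{key}] {right}")
```

The same frequency counts are the kind of input a word cloud is drawn from. Note that this sketch counts adjacent pairs across sentence boundaries; a real tool would typically let the user choose the context window.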

Text Annotation Tools

Text Annotation tools may be used for manual or automated annotation. These tools tag certain parts of the text based on a pre-existing schema or categorization model. As with other tagging tasks, the annotated text is then used for further analysis in a more structured format. Text annotation tools are very useful for classification tasks and are used widely in both academia and industry.
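
One way to picture automated annotation against a schema is the sketch below, which tags spans of text and records them as offsets plus a label. The schema here (three labels with simple regular expressions) and the sample sentence are hypothetical; a real schema would be designed for the project and is often applied by human annotators or a trained model rather than patterns.

```python
import re

# Hypothetical annotation schema: label -> regular expression.
SCHEMA = {
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "EMAIL": r"\b[\w.]+@[\w.]+\.\w+\b",
    "AMOUNT": r"\$\d+(?:\.\d{2})?",
}


def annotate(text, schema):
    """Return annotations as (start, end, label, covered_text), sorted by start."""
    spans = []
    for label, pattern in schema.items():
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), label, match.group()))
    return sorted(spans)


text = "Invoice sent to pat@example.com on 2019-11-07 for $120.50."
annotations = annotate(text, SCHEMA)
for start, end, label, covered in annotations:
    print(f"{label:<7} {start:>3}-{end:<3} {covered}")
```

Keeping the annotations separate from the text, as offsets and labels, leaves the original untouched and gives later steps such as classification a structured input.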

Entity Recognition Tools

Entity Recognition tools help identify entities such as people, companies, organizations, and locations in the text. They are often connected to resources such as knowledge graphs, which allow the enrichment of these entities with additional information about them and their relationships with other entities. Most of these tools target business applications, where they automate entity identification for large organizations that have an abundance of data. These tools often support conversational AI agents.

A link terms graph produced in PolyAnalyst. This graph shows connections between related concepts.
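
The idea of recognizing entities and enriching them from a knowledge graph can be sketched with a simple name lookup. Everything in this example is hypothetical: the gazetteer entries, the tiny knowledge graph, and the function names. Production tools generally rely on statistical models and much larger resources rather than exact string matching.

```python
import re

# Hypothetical gazetteer (name -> entity type) and knowledge graph.
GAZETTEER = {
    "Jane Doe": "PERSON",
    "Acme Corp": "COMPANY",
    "Springfield": "LOCATION",
}
KNOWLEDGE_GRAPH = {
    "Jane Doe": {"works_for": "Acme Corp"},
    "Acme Corp": {"headquartered_in": "Springfield"},
}


def find_entities(text):
    """Return (start, name, type) for each gazetteer name found in text."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [(m.start(), m.group(), GAZETTEER[m.group()]) for m in pattern.finditer(text)]


def enrich(entities):
    """Attach knowledge-graph relations to each recognized entity."""
    return [
        {"name": name, "type": etype, "relations": KNOWLEDGE_GRAPH.get(name, {})}
        for _, name, etype in entities
    ]


text = "Jane Doe joined Acme Corp after moving to Springfield."
records = enrich(find_entities(text))
for record in records:
    print(record)
```

The relations attached to each entity are what allow a tool to draw connections between entities, in the spirit of the link terms graph above.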

Topic Identification/Modeling Tools

Topic Identification and Modeling tools employ text clustering methods to identify emerging themes or high-level topics in text. This type of task is useful in both academic and business settings. Most text clustering tools require programming knowledge for both the analysis and the visualization of results, but a few provide a graphical user interface.
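
To give a feel for text clustering, here is a minimal single-pass clustering sketch based on cosine similarity between bag-of-words vectors. The documents, the stopword list, and the 0.2 threshold are assumptions chosen for the example. This is a deliberately simple stand-in and not a topic model such as LDA, which dedicated tools and libraries provide.

```python
import math
import re
from collections import Counter

# Illustrative stopword list and documents (assumptions for this sketch).
STOPWORDS = {"the", "a", "is", "was", "and", "to", "on", "for", "of", "in"}
DOCS = [
    "The battery drains fast and the battery is hot",
    "Battery life is short",
    "Delivery was late and the package was damaged",
    "The package arrived late",
]


def vectorize(text):
    """Bag-of-words term counts without stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.2):
    """Single pass: join the most similar cluster, or start a new one."""
    clusters = []  # each cluster holds summed term counts and member indices
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts stand in for a centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


for c in cluster(DOCS):
    top_terms = [word for word, _ in c["centroid"].most_common(2)]
    print(c["members"], top_terms)
```

With these four documents, the battery complaints end up in one cluster and the delivery complaints in another. The most frequent terms of each cluster serve as a crude topic label.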

Sentiment Analysis Tools

While Sentiment Analysis or Opinion Mining is a task that is relevant to both academic and business settings, most of the tools available are targeted to businesses. Sentiment Analysis tools allow users to identify positive and negative sentiment in their data; depending on how advanced the tool is, it may also identify higher-level topics associated with the sentiment, as well as different sentiment degrees. Because of the business orientation of such tools, they usually include visualizations of the results for easier reporting.

A link terms graph with sentiment analysis produced in PolyAnalyst, displaying positive sentiments in green and negative sentiments in red.
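
A very small lexicon-based scorer shows the basic mechanics of identifying positive and negative sentiment, including different degrees. The lexicon, its scores, and the negation rule below are assumptions made for illustration; commercial tools use far larger resources and more careful linguistic analysis.

```python
import re

# Hypothetical lexicon: word -> polarity score (magnitude expresses degree).
LEXICON = {"great": 2, "good": 1, "helpful": 1, "slow": -1, "bad": -1, "terrible": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon scores, flipping a word's sign when it directly follows a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "The support team was great and helpful",
    "The app is not good and the sync is terrible",
    "The package arrived on Monday",
]:
    score = sentiment_score(review)
    print(score, label(score), "|", review)
```

The three sample reviews score 3, -3, and 0, so they are labeled positive, negative, and neutral respectively. The size of the score gives a rough sense of sentiment degree.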

Query Search Tools

Another category of Text Analysis tools allows users to search text for instances of a word or a phrase. Some tools use simple queries with keywords and Boolean operators, while others offer advanced query languages to target more complex patterns in the text; a few tools even allow users to ask questions using natural language. Query Search tools may or may not have a GUI, depending on the target audience: tools oriented to businesses are more likely to have a GUI, whereas tools without a GUI usually require programming knowledge and are used as a precursor step for further data analysis.
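
The keyword-and-Boolean-operator style of query can be sketched in a few lines. In this example a query is either a keyword or a nested tuple such as ("AND", "refund", ("NOT", "delivery")); that representation, the sample documents, and the function names are assumptions for illustration and do not reflect the query language of any specific product.

```python
import re

# Made-up documents for illustration.
DOCUMENTS = [
    "Refund requested after the delivery was late",
    "Delivery arrived early and the customer was happy",
    "Customer asked for a refund",
]


def tokens_of(text):
    """Return the set of lowercase word tokens in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, tokens):
    """Evaluate a keyword or an ('AND' | 'OR' | 'NOT', operands...) tuple."""
    if isinstance(query, str):
        return query.lower() in tokens
    operator, *operands = query
    if operator == "AND":
        return all(matches(q, tokens) for q in operands)
    if operator == "OR":
        return any(matches(q, tokens) for q in operands)
    if operator == "NOT":
        return not matches(operands[0], tokens)
    raise ValueError(f"unknown operator: {operator}")


def search(query, documents):
    """Return the indices of the documents that satisfy the query."""
    return [i for i, doc in enumerate(documents) if matches(query, tokens_of(doc))]


print(search(("AND", "refund", ("NOT", "delivery")), DOCUMENTS))
print(search(("OR", "late", "early"), DOCUMENTS))
```

The first query returns only the third document (index 2), and the second returns the first two (indices 0 and 1). More advanced query languages add features such as proximity and pattern operators on top of this basic logic.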

Summarization Tools

Summarization tools are particularly popular in tasks associated with long, well-organized text, such as legal documents or scientific articles. These tools provide a brief summary including key points of the text and are used both for business and academic purposes.
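
One classic and very simple approach to summarization is extractive: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below assumes that approach, with a made-up text and stopword list. It is only one of many techniques and not necessarily what any given tool uses.

```python
import re
from collections import Counter

# Illustrative stopword list and text (assumptions for this sketch).
STOPWORDS = {"the", "a", "is", "are", "on", "of", "and", "to", "in"}
TEXT = (
    "The contract covers software licensing. "
    "Licensing fees are due each quarter. "
    "The office is closed on Friday."
)


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Keep the n sentences whose words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequencies = Counter(content_words(text))

    def score(sentence):
        return sum(frequencies[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return [sentences[i] for i in keep]


print(summarize(TEXT, n_sentences=1))
```

Here the sentence about licensing fees is selected because "licensing" is the most frequent content word. Summing word frequencies favors longer sentences, which is one reason practical tools normalize scores or use other methods.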

The Combo Please

Most available text analysis software platforms offer one or two of the above tools within an integrated system. However, Text Analysis projects usually require a combination of the above tasks and techniques, which means that many users need multiple tools for their analysis needs. Integrated text mining tools like PolyAnalyst cover the whole workflow, from data loading to the visualization of results, through an intuitive GUI.

Originally published at https://www.megaputer.com on November 7, 2019.

Megaputer Intelligence

A data and text analysis firm that specializes in natural language processing