Various estimates place the share of data held in unstructured formats such as e-mail, social media feeds, private messages, and documents (including contracts, memos, clinical notes, and legal briefs) as high as 85% of all data in existence. A great deal of value can be extracted from this kind of data, provided we have the right tools to do so.

Fortunately, such tools exist. Known collectively as text analytics, they help automate the process of extracting insights from unstructured data. The field is still evolving, but studying the current state of the art gives us a better sense of where it might go. To that end, this article discusses five use cases for text analytics.

1-Data exploration

Data exploration aims to answer the question, “What is this text (a comment, a tweet, or a store of texts) all about?” Exploring your data at a general level is an essential first step: it shows you what the data is about, how the concepts and topics discussed in it relate to one another, and what kinds of categorization or analysis the data can support. A data exploration tool typically provides a UI that lets you search for common terms, see relationships between similar terms and concepts, and identify which messages, tweets, or documents discuss those concepts. Think of researching common themes in university application essays to understand what people most often choose to talk about in their admissions statements.
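As a rough illustration of what sits behind such a UI, the sketch below counts common terms across a handful of documents and looks up which documents mention a given term. The function names, the stop-word list, and the sample essays are all made up for this example; a real exploration tool does far more.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real tools use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "my", "i",
              "was", "is", "for", "on", "me", "at"}

def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def top_terms(documents, n=3):
    """Return the n most frequent terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(n)

def documents_mentioning(documents, term):
    """Return the indices of the documents that contain the given term."""
    return [i for i, doc in enumerate(documents) if term in tokenize(doc)]

# Invented sample data standing in for admissions essays.
essays = [
    "Volunteering at the hospital taught me resilience.",
    "My family moved often, and resilience became a habit.",
    "Robotics club was where I learned teamwork.",
]
print(top_terms(essays, n=1))                       # most common theme
print(documents_mentioning(essays, "resilience"))   # which essays discuss it
```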

2-Trend analysis and signal detection

Instead of depending on frequency metrics (such as the total number of likes or shares of a social post) to make decisions or draw insights, you can use the signal detection and trend analysis functions of text analysis tools to pick up recurring themes and patterns. These functions attempt to answer the question, “What is relevant to, or specific about, my unstructured data?” Signal detection resembles data exploration but goes further, picking up more precise attributes of individual sentences. While data exploration looks at what the data contains as a whole, signal detection can pick up specific details, even facts that run counter to the overall trend or theme of an entire sentence or comment. Think of trying to pinpoint which parts of a successful ad campaign users liked the most, and which parts may have been ‘over the top’ or unappreciated by viewers, based on their comments about each specific part of the campaign.
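One simple way to approximate “what is specific about this data” is to compare how often a term appears in a target set of comments against a background set. The sketch below does this with smoothed frequency ratios. It is an assumption chosen for illustration, not how any particular product works, and the function names and sample comments are invented.

```python
import re
from collections import Counter

def term_counts(comments):
    """Count lowercase word occurrences across a list of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(re.findall(r"[a-z']+", comment.lower()))
    return counts

def distinctive_terms(target, background, n=3):
    """Rank terms by how much more often they occur in target than in background.

    Add-one smoothing keeps terms that never appear in the background
    from causing a division by zero.
    """
    t_counts, b_counts = term_counts(target), term_counts(background)
    t_total, b_total = sum(t_counts.values()), sum(b_counts.values())
    vocab = len(set(t_counts) | set(b_counts))
    scores = {}
    for term, count in t_counts.items():
        t_rate = (count + 1) / (t_total + vocab)
        b_rate = (b_counts[term] + 1) / (b_total + vocab)
        scores[term] = t_rate / b_rate
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Invented comments: one part of a campaign versus the campaign overall.
ending_comments = [
    "the ending was over the top",
    "loved the music but the ending was over the top",
]
campaign_comments = [
    "loved the music",
    "the music was great",
    "loved the colors and the music",
]
print(distinctive_terms(ending_comments, campaign_comments))
```

Words such as “the” and “music” appear in both sets and score low, while words that are concentrated in the target comments rise to the top.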

3-Content-based profiling and clustering

Content-based profiling and clustering is used to answer the question, “Which strings of unstructured data belong together?” It is useful when you want to group documents, messages, or texts that fit a certain profile. An online bookshop might use such a solution to recommend books that a reader might like based on past choices. A news app might use it to suggest articles to read.
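To make the bookshop example concrete, here is a minimal sketch that treats each description as a bag of words and recommends the catalog entry closest to a reader's history by cosine similarity. The catalog, the titles, and the function names are hypothetical, and production systems typically weight terms (for example with TF-IDF) rather than using raw counts.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query, catalog):
    """Return the catalog title whose description is closest to the query text."""
    query_vec = vectorize(query)
    return max(
        catalog,
        key=lambda title: cosine_similarity(query_vec, vectorize(catalog[title])),
    )

# Invented catalog and reading history for illustration.
catalog = {
    "Deep Space": "a crew explores a distant planet aboard a starship",
    "Sourdough Basics": "baking bread at home with flour water and salt",
    "The Last Detective": "a detective investigates a murder in a small town",
}
history = "a starship crew stranded on a hostile planet"
print(most_similar(history, catalog))
```

The same similarity score can drive clustering: documents whose pairwise similarity exceeds a chosen threshold are placed in the same group.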

4-Extracting, categorizing, and mapping data

This process tries to answer the question, “What elements am I able to recognize in my unstructured data?” Think about clothing, recipes, or the specs of any type of device: the everyday language people use to talk about them is full of abbreviations, ranges, and measurements.

Think about comments or a product description on a fashion website. “The cuff is more than 2″ long,” “The cuff should be between 2″ and 3″ long,” and “The cuff should be half the length of your palm” are all acceptable ways to describe the ideal length of the cuff of a dress shirt. However, working out what is being said, determining the ideal length, and tabulating this information in a way that is easy for a user to understand is not so simple. This is the challenge that the extraction, categorization, and data-mapping features of text analysis tools attempt to overcome.
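A small sketch shows both what pattern-based extraction can do with the cuff sentences and where it runs out. The function name and the two patterns are assumptions made for this example; they cover only the phrasings quoted above.

```python
import re

# Matches a number followed by an inch mark (straight quote or double prime).
INCHES = r'(\d+(?:\.\d+)?)\s*(?:"|″)'

def extract_length_range(text):
    """Return (min_inches, max_inches) from a sentence, or None if nothing matches.

    A None in either slot means that bound is not stated in the sentence.
    """
    between = re.search(rf"between\s+{INCHES}\s+and\s+{INCHES}", text, re.IGNORECASE)
    if between:
        return float(between.group(1)), float(between.group(2))
    more_than = re.search(rf"more than\s+{INCHES}", text, re.IGNORECASE)
    if more_than:
        return float(more_than.group(1)), None
    return None

for sentence in [
    "The cuff is more than 2″ long",
    "The cuff should be between 2″ and 3″ long",
    "The cuff should be half the length of your palm",
]:
    print(sentence, "->", extract_length_range(sentence))
```

The first two sentences map cleanly onto a minimum and maximum that can be tabulated. The third returns nothing, because a relative description such as “half the length of your palm” needs more than pattern matching to interpret.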

5-Sentiment analysis

Also known as opinion mining, sentiment analysis tries to answer the question, “What is the attitude or opinion of the speaker?” A basic approach involves building a library of positive and negative words and incorporating simple logic into the text analysis tool (for example, making it understand that the words “but” or “however” between two statements act as qualifiers).
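The word-library approach can be sketched in a few lines. The lexicons below are tiny and invented, and counting the clause after a qualifier twice is only one possible way to encode the “but”/“however” rule, chosen here for illustration.

```python
import re

# Tiny hypothetical lexicons; a real library would hold thousands of entries.
POSITIVE = {"good", "great", "fast", "friendly", "delicious", "happy"}
NEGATIVE = {"bad", "slow", "oily", "rude", "cold", "greasy"}
QUALIFIERS = {"but", "however"}

def clause_score(words):
    """Number of positive words minus number of negative words."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def sentiment(text):
    """Score a comment; above zero is positive, below zero is negative.

    Assumption of this sketch: the clause after the first qualifier carries
    the speaker's main point, so it is weighted twice as heavily.
    """
    words = re.findall(r"[a-z']+", text.lower())
    for i, word in enumerate(words):
        if word in QUALIFIERS:
            return clause_score(words[:i]) + 2 * clause_score(words[i + 1:])
    return clause_score(words)

print(sentiment("The service was fast but the food was oily and cold"))
print(sentiment("The food was oily, however the staff were friendly and fast"))
```

In the first comment, the praise for the service is outweighed by the complaint that follows “but”, so the overall score is negative. In the second, the order is reversed and the score comes out positive.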

Putting it all together

To better understand how some of these tools work together, think about how social interactions with a restaurant might be analyzed. General data exploration could establish that people talk about visiting the restaurant at specific times. Trend analysis and signal detection might highlight that oily food is a common theme. Content-based profiling and clustering would separate comments about wait times from comments about food quality. And sentiment analysis could show that many people complain about too much oil in their food while, in general, customers are happy with the speed of service, even if negative impressions outnumber positive ones overall. That last point is easy to miss if you rely only on numerical or aggregate data such as likes and shares. Combined, this kind of analysis is a powerful tool, and it is becoming an increasingly important and widely used component of corporate branding and marketing strategies.

About this sample

A long-form post written for a text analytics consultancy, outlining five use cases for text analytics within data mining

About the Client

Niche consultancy providing text analytics services

Content audience

Early stage prospects looking to get a general understanding of text analytics use cases