Language fuels the enterprise. We find it in everything from emails to videos to business documents and beyond. However, as pervasive as language data is to the enterprise, organizations struggle to maximize its value. Not only is there an incredible amount of language data available to and contained within organizations, but an exponentially increasing volume of it, as well.
There is no ignoring the importance of language to the enterprise ecosystem. Organizations are listening, as 42% have already adopted natural language processing (NLP) systems while 26% plan to within the next year, according to IBM’s Global AI Adoption Index 2021.
Organizations need a method for leveraging the copious amounts of unstructured data available to them. So while this momentum towards NLP is a good start, it is just that…a start. Natural language understanding (NLU) is where the real difference is made for the enterprise.
What Is Natural Language Understanding?
NLU is a branch of artificial intelligence and subset of NLP. Where NLP breaks down language into a machine-readable format, NLU goes a step further to help the machines understand, interpret, and emulate that language. It provides structure to unstructured data (e.g., contracts, emails, social media, and other enterprise documents), which allows organizations to scale the reading, organizing, and quantifying of text data for easier analysis.
NLU fills the gap between human communication and machine understanding. We can leverage it to automatically understand the meaning of words in context via disambiguation and extract valuable information from text data.
Your AI Approach to NLU
There are many different approaches to building NLU capabilities. Each of them offers its own set of pros and cons. The most common approaches include:
Rule- and Knowledge Graph-based (Symbolic) Approach
A symbolic approach is based upon pre-established linguistic rules. A knowledge graph provides an explicit representation of knowledge complete with rich, expressive and actionable descriptions of concepts, both general and specific to a domain. This information supports the logical explanations of reasoning outcomes.
This approach is human-driven, as it relies on linguistic rules and the knowledge embedded in the knowledge graph to examine linguistic and semantic relationships to interpret language and its parts (e.g., grammar, sentence structure, etc.). This process enables you to analyze language, extract data, and categorize text.
Subject matter experts (SMEs) and/or knowledge engineers (KEs) are critical to this process as you often require a high level of control and the ability to adjust rules as needed. This approach is well-suited for task-oriented experiences, complex document analysis or search.
Machine Learning Model Approach
Supervised learning (SL) is the machine learning (ML) task of learning a function that maps an input to an output. The function is inferred based on labeled training data consisting of a set of training examples. Each example consists of an input object and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples.
An optimal scenario allows for the algorithm to correctly determine the class labels for unseen instances. This requires the algorithm to apply generalizations from the training data to new and unseen situations in a “reasonable” way (see inductive bias). The statistical quality of this algorithm is measured by the so-called generalization error.
The Hybrid Approach
Machine learning and symbolic have long been considered the only viable approaches to natural language understanding. They have been pitted against each other as mutually exclusive options. This has forced organizations to compromise one way or another. In a hybrid approach, organizations can use both ML and symbolic in tandem, enabling them to realize the core benefits of each.
One point to clarify is that a hybrid approach does not mandate that ML and symbolic work in parallel. In fact, a hybrid approach can take any of the following three forms:
Symbolic techniques in support of a machine learning model. A primary example of this hybrid relationship can be seen in the features engineering process. This process is arguably the most important aspect of building a machine learning model as it establishes the features (i.e., attributes) with which you train your machine learning algorithms.
In an ML-only approach, this process is typically done manually by domain experts (tedious and time-consuming) or is automated using an open-source NLP library or API (limited language comprehension capabilities). However, a symbolic approach enables your domain experts to establish a rule-based structure to identify elements from your textual data that can become features of the input data. This is the best and fastest way to scale your expertise and maintain flexibility when you need to retrain your model.
Machine learning techniques in support of a symbolic model. A symbolic approach is ideal for efficiently classifying and extracting text from content in a highly accurate and explainable way. However, this technique can be less scalable due to the complex and time-consuming nature of rule writing, especially when subject matter experts are starting with a blank slate.
Machine learning can accelerate the process by creating an initial set of rules through automated annotation of a document set. In doing this, you transform “black box” results into an explainable rule-based framework. These rules can then be easily extended and fine-tuned via a symbolic approach for unrivaled quality control.
Symbolic and machine learning working in parallel. Though one approach often supports another in hybrid, there are many instances in which they work more closely together to accomplish a task. A primary example of this is categorization of complex documents.
In many cases, a passage can appear multiple times in a document and imply something different in both instances. For example, a monetary amount (e.g., $50,000) found in an insurance policy could imply a reduction in risk for the insurer if it refers to a deductible cost or premium, or it could increase the risk if it refers to coverages.
In this example, a hybrid workflow that leverages a symbolic approach to assign specific roles and characteristics to document segments and makes machine learning aware of this information could prove beneficial.
The human language is a complex beast for which the enterprise has long sought an ideal solution. With a hybrid natural language understanding approach, the beast can finally be tamed. The hybrid approach is the only way for you to address the intrinsic limitations of each individual technique while also realizing the benefits of each. Leave compromise out of your vocabulary (unless you need it in your knowledge graph) and embrace the approach that will transform the present and future of your organization.