For systems to transform data into knowledge and insight that businesses can use for decision-making, process efficiency and more, machines need a deep understanding of text and, therefore, of natural language. Artificial intelligence (AI) technologies are a perfect partner for this activity at speed and scale, but when it comes to the core activity of making text available to other processes, “understanding” is not always the capability at work.
Let’s look at some common types of AI that work closely with language data to understand what they are and how they work: natural language processing (NLP), natural language understanding (NLU) and natural language generation (NLG).
What is NLP?
Natural language processing turns words into action. Human language, also referred to as natural language, is how humans communicate—most often in the form of text. For machines, this format is not readable; it’s known as unstructured data. It comprises the majority of enterprise data and includes everything from text contained in emails, to PDFs and other document types, to chatbot dialog, social media posts, etc.
A subfield of artificial intelligence and linguistics, NLP provides the advanced language analysis and processing that allows computers to make this unstructured human language data readable by machines. It can use many different methods to accomplish this, from tokenization and lemmatization to machine translation and natural language understanding.
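To make two of those methods concrete, here is a minimal, illustrative sketch of tokenization and lemmatization in plain Python. Real NLP libraries use dictionaries and trained models; the tiny suffix table below is an invented stand-in.

```python
# Illustrative only: a toy tokenizer and suffix-stripping lemmatizer.
# Real systems (e.g., NLTK, spaCy) handle far more cases than this.

PUNCT = ".,!?;:\"'()"
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def lemmatize(token):
    """Strip a known suffix to approximate the dictionary form of a word."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    return token

tokens = tokenize("The analysts studied emails, PDFs and chatbot dialogs.")
lemmas = [lemmatize(t) for t in tokens]
```

The output is no longer raw prose but a structured list of normalized tokens—the machine-readable form the surrounding text describes.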
What is NLU?
For a person, learning a language can take 2,000 hours or more. Grammar complexity and verb irregularity are just a few of the challenges that learners encounter. Now consider that this task is even more difficult for machines, which cannot understand human language in its natural form.
Instead, machines must know the definitions of words and sentence structure, along with syntax, sentiment and intent. This is where natural language understanding (NLU) creates order out of chaos. Concerned with the meaning of words, NLU is a subset of NLP that works within it to assign structure, rules and logic to language so machines can “understand” what is being conveyed in the words, phrases and sentences of a text.
Natural language understanding uses a range of techniques (parsing and sentence analysis, grammatical and logical analysis, and semantic disambiguation) to give a machine the ability to truly grasp what’s happening in a document: who did what, to whom, where, etc. This allows it to resolve ambiguity (meaning in context), to comprehend text (versus just reading it), and to understand semantics (meaning) and the relationships between words in a text.
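The “who did what, to whom” idea can be sketched with a toy subject-verb-object extractor. The verb list and sentences below are invented; a real NLU system would rely on full parsing and semantic disambiguation rather than a hand-written word list.

```python
# Illustrative only: naive "who did what, to whom" extraction.
# A real NLU pipeline uses grammatical parsing, not a fixed verb set.

KNOWN_VERBS = {"acquired", "hired", "sued", "approved"}

def extract_svo(sentence):
    """Return {'who', 'did', 'whom'} if the sentence matches subject-verb-object."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in KNOWN_VERBS and 0 < i < len(words) - 1:
            return {
                "who": " ".join(words[:i]),
                "did": word.lower(),
                "whom": " ".join(words[i + 1:]),
            }
    return None  # no recognized verb: no structure can be assigned

fact = extract_svo("Acme Corp acquired Widget Inc.")
```

The point is the output shape: instead of a string of characters, the machine now holds labeled roles it can reason over.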
NLU vs NLP: What’s the Difference?
Together, NLP and NLU are a powerful combination that can be used to transform unstructured data into information that can be leveraged for insight, intelligence, efficiency and automation for a number of real-world applications and use cases.
To differentiate between NLP and NLU, remember that:
- While both NLP and NLU focus on human language, their objectives are different.
- NLU fills the gap between human language and machine understanding.
- Where NLP breaks down language into a machine-readable format and processes language, NLU provides language comprehension.
- NLU learns language syntax, context, patterns, definitions, sentiment and intent.
What is NLG?
Where NLP helps machines read and process text and NLU helps them understand text, NLG, or natural language generation, helps machines write text.
The “suggested text” feature used in some email programs is an example of NLG, but the most well-known example today is ChatGPT, the generative AI model based on OpenAI’s GPT models, a type of large language model (LLM). Such applications can produce intelligent-sounding, grammatically correct content and write code in response to a user prompt.
An application’s ability to generate text depends largely on two things: the underlying AI approach used for processing, and the dataset the model is trained on. Therefore, while an NLG application can produce text that sounds correct, that does not mean the text it produces is accurate or factual.
In the case of ChatGPT or any application built on a large language model (45 terabytes of training data in the case of GPT-3), the model is trained with a machine learning approach that uses statistics and pattern matching to predict the words and phrases that come next. This is in contrast to NLU, which applies grammar rules (among other techniques) to “understand” the meaning conveyed in the text.
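The statistical “predict what comes next” idea scales down to a simple sketch: a bigram model that counts which word most often follows each word in a tiny invented corpus. Real LLMs use billions of parameters, but the principle of pattern-based prediction (rather than understanding) is the same.

```python
# Illustrative only: a bigram next-word predictor built from word counts.
from collections import Counter, defaultdict

corpus = (
    "the model predicts the next word "
    "the model matches patterns in the data"
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the word most often observed after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]
```

Note that the model knows nothing about what “model” or “data” mean; it only knows which sequences were frequent—exactly the limitation the paragraph above describes.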
This difference is important to keep in mind with NLG and generative AI, because the text they produce can sound so authoritative and correct that it’s easy for users to be convinced that the application actually understands the text it is producing.
To understand how NLG works, it’s important to remember that:
- NLG does not understand the text on its own—it needs the language rules and knowledge base integration with NLP and NLU to do that.
- The text that an NLG application produces is the result of pattern matching, not an understanding of facts.
- The text that NLG applications are able to produce depends on the data set that the underlying model/algorithm has been trained on.
The Success of Any Natural Language Technology Depends on AI
Artificial intelligence is critical to a machine’s ability to learn and process natural language. It’s what generates the algorithms and rules for learning. So, when building any program that works on your language data, it’s important to choose the right AI approach.
The two most common approaches are machine learning and symbolic or knowledge-based AI, but organizations are increasingly using a hybrid approach to take advantage of the best capabilities that each has to offer.
Machine Learning AI: Data Training
Machine learning uses computational methods to train models on data and to adjust (and, ideally, improve) those models as more data is processed. Rules are honed through repeated processing and learning.
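That adjust-on-data loop can be sketched with a toy word-weight sentiment model whose weights are nudged each time it misclassifies an example. The training examples and labels here are invented for illustration.

```python
# Illustrative only: a perceptron-style loop that adjusts word weights
# on mistakes, repeated over the data until the rules are "honed".

examples = [
    ("great product love it", 1),
    ("terrible support hate it", -1),
    ("love the great design", 1),
    ("hate the terrible battery", -1),
]

weights = {}

def score(text):
    """Sum the learned weights of the words in the text."""
    return sum(weights.get(w, 0.0) for w in text.split())

# Repeated passes over the data nudge the weights toward correct labels.
for _ in range(5):
    for text, label in examples:
        prediction = 1 if score(text) >= 0 else -1
        if prediction != label:  # adjust only when the model is wrong
            for w in text.split():
                weights[w] = weights.get(w, 0.0) + label
```

Notice that the resulting weights are just numbers: the “what” and “how” of the learning is not stated anywhere, which is the transparency problem the next paragraph raises.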
The computational methods used in machine learning result in a lack of transparency into “what” and “how” the machines learn. This creates a black box where data goes in, decisions go out, and there is limited visibility into how one impacts the other. This makes models highly susceptible to bias. What’s more, a great deal of computational power is needed to process the data, while large volumes of data are required to both train and maintain a model.
Both of these factors grow dramatically with large language models, which are trained on vast amounts of data scraped from the internet (data that can contain biased and toxic content) and are both energy-intensive and expensive to operate.
With natural language AI driven by machine learning alone:
- Acquired knowledge trains machines autonomously.
- Computational resources are needed to process data.
- Large data sets are needed to train machines.
Symbolic AI: Embedded Rules
Symbolic AI uses human-readable symbols that represent real-world entities or concepts. Logic is applied in the form of an IF-THEN structure embedded into the system by humans, who create the rules. This hard coding of rules can be used to manipulate the understanding of symbols.
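The IF-THEN structure described above can be shown directly. The rules, facts and action names below are invented, but the shape is the point: every condition and outcome is written by a human and readable by one.

```python
# Illustrative only: human-authored IF-THEN rules over readable symbols.

rules = [
    {"if": {"document_type": "invoice", "amount_over_limit": True},
     "then": "route_to_manager"},
    {"if": {"document_type": "invoice"},
     "then": "route_to_accounting"},
]

def apply_rules(facts):
    """Fire the first rule whose conditions all hold for the given facts."""
    for rule in rules:
        if all(facts.get(key) == value for key, value in rule["if"].items()):
            return rule["then"]
    return None  # no rule matched

action = apply_rules({"document_type": "invoice", "amount_over_limit": True})
```

Because the rules are explicit data, changing the system’s behavior means editing a rule, not retraining a model—the revisability the next paragraph highlights.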
Using symbolic AI, everything is visible, understandable and explained within a transparent box that delivers complete insight into how the logic was derived. This transparency makes symbolic AI an appealing choice for those who want the flexibility to change the rules in their NLP model. This is especially important for model longevity and reusability so that you can adapt your model as data is added or other conditions change.
Symbolic AI works differently than machine learning because:
- Humans write rules.
- Rules can be revised.
- Embedded knowledge trains machines.
Hybrid AI: A Best of All Worlds Approach
Understanding AI methodology is essential to ensuring excellent outcomes in any technology that works with human language. Hybrid natural language understanding platforms combine multiple approaches—machine learning, deep learning, LLMs and symbolic or knowledge-based AI. They improve the accuracy, scalability and performance of NLP, NLU and NLG technologies.
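One common hybrid pattern is to apply precise human-written rules first and fall back to a statistical guess only when no rule fires. The ticket-routing scenario, rule triggers and keyword scores below are invented; the score table stands in for a trained ML model.

```python
# Illustrative only: symbolic rules first, statistical fallback second.

RULES = {"refund": "billing_team", "password": "it_support"}
# Stand-in for an ML model: keyword -> (team, confidence).
KEYWORD_SCORES = {"invoice": ("billing_team", 0.7), "login": ("it_support", 0.6)}

def route_ticket(text):
    """Return (team, method): a symbolic rule if one matches, else a statistical guess."""
    words = text.lower().split()
    for trigger, team in RULES.items():
        if trigger in words:
            return team, "rule"
    best = max(
        (KEYWORD_SCORES[w] for w in words if w in KEYWORD_SCORES),
        key=lambda pair: pair[1],
        default=("triage_queue", 0.0),
    )
    return best[0], "statistical"
```

Returning the method alongside the decision keeps the rule-based path fully explainable while still covering inputs the rules never anticipated—the governance benefit listed below.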
With Hybrid AI, teams can:
- Leverage the flexibility of an approach that integrates multiple techniques
- Use less data by focusing strictly on your domain(s) of interest
- Ensure humans are in the loop during the development, training and fine-tuning phases
- Meet your governance goals through algorithm explainability, transparency and accountability