For thousands of years, our mastery of language has allowed us to communicate and cooperate with one another to an extent matched by no other species. Now, thanks to cutting-edge AI technology, we have entered the next stage in our evolution: communication and cooperation with machines.
Unlike humans, machines haven’t had millennia to learn the intricacies of language, so they need our help to understand it. A key step toward that understanding is word-sense disambiguation: determining which meaning of a word is intended in a given context. Progress on this problem is quickly becoming one of the most consequential advances in AI and natural language understanding (NLU).
NLP vs. NLU
Natural language processing (NLP) provides the foundation for NLU. In NLP, a computer system is trained to make sense of human language by analyzing text or audio. The first step, tokenization, breaks sentences down into smaller units called tokens, typically words. These tokens can then be reduced by stemming, which strips affixes to leave a common stem. For example, the words “consign”, “consigned”, “consigning” and “consignment” would all be reduced to the stem “consign”.
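To make tokenization and stemming concrete, here is a minimal Python sketch. It is a toy: the suffix list and the `tokenize` and `stem` helpers are hypothetical, and real stemmers such as the Porter stemmer apply many more ordered rules and conditions.

```python
import re

# Hypothetical suffix list for illustration, checked longest first.
# Real stemmers (e.g. Porter) use many more rules and conditions.
SUFFIXES = ("ment", "ing", "ed", "s")

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token

tokens = tokenize("Consign, consigned, consigning and consignment.")
print(tokens)
print([stem(t) for t in tokens])
```

The second line of output is `['consign', 'consign', 'consign', 'and', 'consign']`. An irregular form such as “ran” passes through `stem` unchanged, since no suffix rule can recover “run” from it.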
However, not all words reduce so neatly: irregular forms such as “ran” (run) or “better” (good) share no common stem with their base form. Lemmatization complements stemming in these cases. Instead of chopping off suffixes, it uses a vocabulary and morphological analysis, often guided by the word’s part of speech, to map each inflected form to its lemma, the dictionary form of the word.
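Lemmatization can be sketched as a dictionary lookup keyed on both the word and its part of speech. The `LEMMAS` table below is a hypothetical, hand-written stand-in for a real lexicon; the point is that irregular forms are handled by lookup rather than suffix stripping, and that the same surface form can map to different lemmas depending on its part of speech.

```python
# Hypothetical lemma dictionary keyed by (word, part of speech).
# Real lemmatizers rely on a full lexicon plus morphological rules.
LEMMAS = {
    ("ran", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("mice", "NOUN"): "mouse",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

def lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form of `word` for the given part of speech.

    Falls back to the lowercased word itself when no entry exists.
    """
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("ran", "VERB"))      # run
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("meeting", "NOUN"))  # meeting
```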
While both NLP and NLU focus on human language, their objectives differ. NLP focuses on the structural analysis of language, whereas NLU is concerned with semantics, the meaning behind words. To help computer systems decipher the intent behind a word or phrase, techniques such as part-of-speech (PoS) tagging, entity tagging (labeling dates, addresses, monetary amounts and the like) and parsing (analyzing the grammatical structure of a sentence) are applied.
These techniques give the computer system a principled way to interpret a word using known grammar rules. The rules are learned when the system is trained on language data, which it obtains through text mining (extraction), classification or search. Once a system can reliably identify the intended meaning of a word, that is, disambiguate it, it has achieved a core part of NLU.
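The entity tagging mentioned above can be illustrated with a rule-based sketch that labels monetary amounts and ISO-style dates. The patterns and the `tag_entities` helper are assumptions made for this example; production systems typically combine trained models with far more robust rules.

```python
import re

# Hypothetical patterns for two entity types, for illustration only.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),  # e.g. $1,250.00
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),           # e.g. 2024-03-15
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()  # order by start position in the text
    return [(label, span) for _, label, span in found]

print(tag_entities("Invoice of $1,250.00 is due on 2024-03-15."))
# [('MONEY', '$1,250.00'), ('DATE', '2024-03-15')]
```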
The Complexities of Language
Computers struggle with ambiguity because they lack the accumulated knowledge and contextual experience that humans have. For example, the word “leg” can have several meanings depending on the context in which it is used. It can refer to a human or animal limb, a stage of a race or journey, one of the supports of a table or either of the long parts of a pair of trousers. Similarly, without NLU, common idioms like “break a leg” are ambiguous to a machine, which may read them literally.
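One way a system can avoid reading “break a leg” literally is to look for known multiword expressions before assigning senses to individual words. The sketch below does a greedy longest-match lookup against a hypothetical `IDIOMS` lexicon; it ignores inflected variants such as “broke a leg”, which a real system would need to handle.

```python
# Hypothetical idiom lexicon: multiword expressions mapped to the meaning
# of the phrase as a whole.
IDIOMS = {
    ("break", "a", "leg"): "good luck",
    ("cost", "an", "arm", "and", "a", "leg"): "be very expensive",
}

def find_idioms(tokens: list[str]) -> list[tuple[int, str]]:
    """Return (start index, meaning) for each idiom, preferring longer matches."""
    hits = []
    phrases = sorted(IDIOMS, key=len, reverse=True)  # longest first
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(tokens[i : i + n]) == phrase:
                hits.append((i, IDIOMS[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no idiom starts here; move to the next token
    return hits

print(find_idioms("you will break a leg tonight".split()))  # [(2, 'good luck')]
```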
Reading vs. Comprehension
Reading and comprehension are two very different things. Take the following sentence:
The painting was found by the tree.
A computer system without disambiguation capabilities may read this sentence as meaning that the tree, which cannot walk or see, located the missing painting. A system with disambiguation capabilities will instead interpret “by” as indicating location: the painting was found beside the tree, not by the tree acting as the finder.
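The choice between the two readings of “by” can be sketched as a check on whether the noun can plausibly act as an agent. The `ANIMATE` set and `interpret_by_phrase` function are hypothetical simplifications; a real system would draw animacy from a knowledge base or learn it from data, and would weigh both readings rather than apply a single rule.

```python
# Hypothetical animacy lexicon: nouns that can plausibly act as agents.
ANIMATE = {"detective", "police", "child", "dog", "curator"}

def interpret_by_phrase(verb: str, noun: str) -> str:
    """Choose a reading for a passive '<verb> by the <noun>' phrase.

    Animate nouns are read as the agent; anything else as a location.
    """
    if noun in ANIMATE:
        return f"the {noun} {verb} it"
    return f"it was {verb} beside the {noun}"

print(interpret_by_phrase("found", "tree"))       # it was found beside the tree
print(interpret_by_phrase("found", "detective"))  # the detective found it
```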
Understanding the meaning of each word in a document or sentence is critical for comprehension, especially when there is more than one possible meaning. The word “run”, for example, could refer to the physical act (running), the action of leaving (“I have to run”), participation in an election (run for office) or a tear in hosiery (a run in the stocking).
Once lexical and syntactic analysis is complete (tokenization, stemming or lemmatization, part-of-speech tagging, parsing and so on), semantic analysis, including word-sense disambiguation, is applied to discern the contextual meaning of each word.
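A classic, simple technique for word-sense disambiguation is the simplified Lesk algorithm, which picks the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. It is shown here as a general illustration, not as the method any particular platform uses; the sense inventory for “run” and the stopword list are hypothetical.

```python
import re
from typing import Optional

# Small stopword list (an assumption for this sketch) so that function
# words like "a" and "in" do not count as meaningful overlap.
STOPWORDS = {"a", "an", "the", "in", "on", "or", "for", "to", "of", "as",
             "by", "from", "her", "he", "she", "i", "have", "than"}

# Hypothetical sense inventory for "run"; glosses written for this example.
RUN_SENSES = {
    "run/move": "move fast on foot by running faster than walking",
    "run/leave": "leave or go away from a place quickly",
    "run/election": "stand as a candidate in an election for office",
    "run/hosiery": "a tear or ladder in a stocking or tights",
}

def content_words(text: str) -> set[str]:
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(target: str, sentence: str, senses: dict[str, str]) -> Optional[str]:
    """Pick the sense whose gloss shares the most content words with the sentence.

    Returns None when no gloss overlaps the context at all.
    """
    context = content_words(sentence) - {target}
    best_sense, best_overlap = None, 0
    for sense, gloss in senses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("run", "She noticed a run in her stocking", RUN_SENSES))
print(simplified_lesk("run", "He will run for office in the next election", RUN_SENSES))
```

The two calls print `run/hosiery` and `run/election`. Exact word overlap is brittle (it would miss “stockings” without stemming or lemmatization), which is one reason the earlier analysis steps matter.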
Establishing clear relationships between words is a critical step in NLP and NLU. Knowledge graphs, which represent concepts as nodes connected by edges that describe how those concepts relate to one another, can put word relationships in a context that computer systems can readily process.
For example, the terms “Ferrari”, “sedan”, “pickup”, “station wagon” and “Beetle” would all be linked to the concept “automobile” in a knowledge graph, because each is a kind of automobile, and through that shared concept they are related to one another. Connections like these give machines a basic model of how words relate in human language.
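A knowledge graph can be sketched as a set of (subject, relation, object) triples. In the sketch below, which uses hypothetical triples and helper names, two terms count as related when they share a category reachable through `is_a` edges, so “Ferrari” and “pickup” meet at “automobile”.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object).
TRIPLES = [
    ("Ferrari", "is_a", "automobile"),
    ("sedan", "is_a", "automobile"),
    ("pickup", "is_a", "automobile"),
    ("station wagon", "is_a", "automobile"),
    ("Beetle", "is_a", "automobile"),
    ("automobile", "is_a", "vehicle"),
    ("leg", "part_of", "table"),
]

def build_graph(triples):
    """Map each subject to its set of outgoing (relation, object) edges."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
    return graph

def ancestors(graph, node):
    """All concepts reachable from `node` by following is_a edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for rel, obj in graph.get(current, ()):
            if rel == "is_a" and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

def shared_categories(graph, a, b):
    """Categories that both terms fall under."""
    return ancestors(graph, a) & ancestors(graph, b)

graph = build_graph(TRIPLES)
print(sorted(shared_categories(graph, "Ferrari", "pickup")))  # ['automobile', 'vehicle']
```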
Expert.ai’s Disambiguation Process
The quest to achieve NLU is widely considered to be a problem on par with making computer systems as intelligent as humans. While the latter has not yet been achieved, industry leaders have made significant progress toward NLU. Expert.ai’s NLP platform is one such example.
By applying multi-level text analysis (lexical, grammatical, syntactic and semantic) to linguistic data and combining it with symbolic and machine learning or deep learning algorithms, the platform translates text into real-world concepts that machines can understand. More importantly, it does so with a breadth and depth that approaches human understanding of language.
Finally, with a knowledge graph representation of language, computer systems can quickly make sense of text from a variety of sources, including search queries, documents, chatbot messages and emails. With disambiguation achieved, the benefits of the expert.ai process are clear.
What insights will you discover with the power of NLU and disambiguation?