A technology that strives to understand human communication must be able to interpret meaning in language. In this post, we take a deeper look at a core component of our expert.ai technology, the semantic disambiguator, and how it determines word meaning and sentence meaning through disambiguation.
To start, let’s clarify our definitions of words and sentences from a linguistic point of view.
Word Meaning and Sentence Meaning in Semantics
Semantics is the study of the meaning of words, phrases, sentences and text. It can be broken down into subcategories such as formal semantics (the logical aspects of meaning), conceptual semantics (the cognitive structure of meaning) and today’s focus, lexical semantics (word and phrase meaning).
A “word” is a string of characters that can have different meanings (jaguar: the car or the animal?; driver: one who drives a vehicle or the software that controls a computer device?; rows: the plural noun or the third-person singular form of the verb to row?). A “sentence” is a group of words that expresses a complete thought. To fully capture the meaning of a sentence, we need to understand how its words relate to one another.
Going Back to School
To understand word meaning and sentence meaning, our semantic disambiguator engine must be able to automatically resolve the ambiguity of any word in a text.
Let’s consider this sentence:
John Smith is accused of the murders of two police officers.
To understand the word meaning and sentence meaning in any text, the disambiguator performs four consecutive phases of analysis: lexical (tokenization), grammatical, syntactical and semantic.
During the first phase, the stream of text is broken up into meaningful elements called tokens. The sequence of “atomic” elements that results from this process is then passed to the next phase of analysis.
- John > human proper noun
- Smith > human proper noun
- is > verb
- accused > noun
- of > preposition
- the > article
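The tokenization step described above can be sketched with a simple regex-based tokenizer. This is a toy illustration of the general technique, not expert.ai's actual implementation:

```python
import re

def tokenize(text: str) -> list[str]:
    # Words become tokens; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("John Smith is accused of the murders of two police officers."))
# ['John', 'Smith', 'is', 'accused', 'of', 'the', 'murders',
#  'of', 'two', 'police', 'officers', '.']
```

A production tokenizer also has to handle abbreviations, contractions, numbers and multi-character symbols, which is why this phase is harder than it looks.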
During the second phase, each token in the text is assigned a part of speech. The semantic disambiguator can recognize inflected forms and conjugations, as well as identify nouns, proper nouns and so on.
Starting from a mere sequence of tokens, this elaboration produces a sequence of elements. Some of these have been grouped to form collocations (e.g., police officer), and every token or group of tokens is represented by a block that identifies its part of speech.
- John Smith > human proper noun
- is accused > nominal predicate
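The part-of-speech assignment and collocation grouping above can be sketched as a lexicon-based tagger that merges known multi-word expressions into single elements. The lexicon and tag names here are made up for illustration:

```python
# Toy lexicon mapping lowercased tokens to part-of-speech tags.
LEXICON = {
    "john": "PROPN", "smith": "PROPN", "is": "VERB", "accused": "VERB",
    "of": "ADP", "the": "DET", "murders": "NOUN", "two": "NUM",
    "police": "NOUN", "officers": "NOUN",
}
# Known collocations: adjacent tokens that should form one element.
COLLOCATIONS = {("police", "officers"): "NOUN"}

def tag(tokens):
    tagged, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in COLLOCATIONS:
            # Merge the collocation into a single tagged block.
            tagged.append((" ".join(tokens[i:i + 2]), COLLOCATIONS[pair]))
            i += 2
        else:
            tagged.append((tokens[i], LEXICON.get(tokens[i].lower(), "X")))
            i += 1
    return tagged

print(tag(["police", "officers"]))  # [('police officers', 'NOUN')]
```

A real tagger disambiguates tags statistically from context (e.g. rows as noun vs. verb) rather than from a fixed lexicon.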
During the third phase, the disambiguator performs several word-grouping operations at different levels to reproduce the way words link to one another to form sentences. Each sentence is then analyzed to attribute a logical role to each phrase (subject, predicate, object, complement, etc.) and, whenever possible, to identify the relationships among them. In our example, the sentence consists of a single independent clause, in which John Smith is recognized as the subject.
- John Smith > subject
- is accused > nominal predicate
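A heavily simplified version of this role-assignment step can be sketched as follows, using the tagged output of the previous phase. The heuristic (proper nouns before the first verb form the subject) is an assumption for illustration only:

```python
def assign_roles(tagged):
    # Naive sketch: proper nouns before the first verb form the subject;
    # the verb plus any adjacent verb form (e.g. "is accused") is the predicate.
    roles = {}
    for i, (tok, pos) in enumerate(tagged):
        if pos == "VERB":
            roles["subject"] = " ".join(t for t, p in tagged[:i] if p == "PROPN")
            j = i
            while j + 1 < len(tagged) and tagged[j + 1][1] == "VERB":
                j += 1
            roles["predicate"] = " ".join(t for t, _ in tagged[i:j + 1])
            break
    return roles

tagged = [("John", "PROPN"), ("Smith", "PROPN"), ("is", "VERB"), ("accused", "VERB")]
print(assign_roles(tagged))
# {'subject': 'John Smith', 'predicate': 'is accused'}
```

Real syntactic analysis builds a full parse tree and handles subordinate clauses, objects and complements, which this sketch deliberately leaves out.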
During the last and most complex phase, the tokens recognized during grammatical analysis are associated with a specific meaning. Though each token can correspond to several concepts, the choice is made by considering the base form of each token with respect to its part of speech, its grammatical and syntactical characteristics, its position in the sentence and its relation to the syntactical elements around it.
Like the human brain, the disambiguator eliminates all candidate meanings for each token except one, which is definitively assigned to the token. When it comes across an unknown element in a text (e.g., a human proper name), it tries to infer word meaning and sentence meaning from the context in which the token appears.
- is accused > to accuse > to blame
- police officer > policeman, police woman, law enforcement officer
Want to learn more about the disambiguation process? Take a deep dive in our brief, “Disambiguation: The Key to Contextualization”.
Originally published October 2016, updated May 2022.