Growing Explainable Knowledge Team - 18 August 2020

In the Summer of 2019, the term “Knowledge Graph” (KG) suddenly appeared in the Gartner Hype Cycle for Emerging Technologies, already close to the Peak of Inflated Expectations. That was quite a surprise, as it had not appeared at all in the previous year’s edition: it simply popped up halfway to the top as a relevant topic, together with many other AI-related technologies.

We have been developing and using Knowledge Graphs for about 20 years now, so we have always been aware of the great impact this knowledge model can have on language analysis and AI-based document understanding.

Our Knowledge Graphs, however, are more than just collections of entities with attributes. A KG is not merely a graph-shaped knowledge base of people’s names, companies, and so on. We do not use it simply as a gazetteer on steroids to pinpoint entities and retrieve their information, as most KG tools out there do.

For us, a KG is a key element in document analysis and understanding: it stores deep linguistic and conceptual knowledge that can be used for full Word Sense Disambiguation (WSD) and to find relations among document content elements.

Each KG contains hundreds of thousands of concepts and terms, with millions of links among them, plus information about synonyms, similarities, relationships, and linguistic phenomena, all in 14 languages and counting. The content includes carefully tuned weights and frequencies that allow the core technology to perform WSD with unrivaled quality, no matter the domain, industry, or document type. The core engine works much like a human reader, relying on learned rules and a large amount of language knowledge (stored in the KG).
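To make the idea concrete, here is a minimal, purely illustrative sketch of how sense weights and concept relations might combine for WSD. The class names, the additive scoring, and the toy data are all assumptions for illustration, not the actual KG schema or disambiguation algorithm:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: structure and scoring are assumptions,
# not the real KG format or WSD engine.

@dataclass
class Sense:
    concept: str                  # concept identifier, e.g. "bank/finance"
    frequency: float              # tuned prior weight for this sense
    related: set = field(default_factory=set)  # linked terms/concepts

@dataclass
class Lemma:
    text: str
    senses: list

def disambiguate(lemma: Lemma, context: set) -> str:
    """Pick the sense whose related terms best overlap the document
    context, biased by the sense's frequency prior."""
    def score(s: Sense) -> float:
        return s.frequency + len(s.related & context)
    return max(lemma.senses, key=score).concept

bank = Lemma("bank", [
    Sense("bank/finance", 0.7, {"money", "loan", "account"}),
    Sense("bank/river", 0.3, {"river", "shore", "water"}),
])

print(disambiguate(bank, {"river", "water", "fishing"}))  # bank/river
```

Even in this toy form, the key point carries over: disambiguation is driven by explicit, inspectable weights and relations rather than opaque vectors.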

How do you deal with such a huge amount of knowledge? How can you manage it, keep it updated and fine-tuned, and correct errors? You certainly don’t want to do it manually, not even with a large team of language experts: maintaining consistency and a uniform strategy would be virtually infeasible.

Moreover, quite often the KG needs to be extended and customized to account for specific domain or industry terminology and knowledge, or even to embed customer-specific language, product lists, terminology, and so on, for a perfect fit with business document language.

This is a complex and resource-intensive task that can’t be tackled with a manual approach, unless it’s very limited in scope.

That’s where the Cognitive Learning process enters the stage.

Boiling the ocean

One of our most powerful approaches to knowledge extension is based on using large document sets (called corpora) as knowledge sources for automatic learning.

For example, we regularly download the whole Wikipedia article set and process all the documents with a full Cognitive Analysis to create a huge “feature set”. We then run a set of advanced learning algorithms, carefully tuned to distill relevant knowledge, and use the results to correct, extend, and complete the existing KG. We exploit well-known techniques and have designed a powerful knowledge-learning framework that allows for sustainable management of multiple KGs over time.

The process is entirely unassisted and requires no annotation.
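The pipeline described above (analyze a corpus, aggregate features, promote well-supported candidates into the KG) can be sketched as follows. The feature extraction here is a deliberately crude stand-in for the full Cognitive Analysis, and the function names and support threshold are illustrative assumptions:

```python
from collections import defaultdict

def extract_features(doc: str) -> dict:
    """Stand-in for the full Cognitive Analysis: map each term to the
    set of other terms it co-occurs with in the document."""
    terms = doc.lower().split()
    return {t: set(terms) - {t} for t in terms}

def learn_extensions(corpus: list, known: set, min_support: int = 2) -> dict:
    """Aggregate contexts across documents and propose unknown terms
    seen in at least `min_support` documents as KG candidates."""
    support = defaultdict(int)
    contexts = defaultdict(set)
    for doc in corpus:
        for term, ctx in extract_features(doc).items():
            support[term] += 1
            contexts[term] |= ctx
    return {t: contexts[t] for t in support
            if t not in known and support[t] >= min_support}

corpus = [
    "quantum computing uses qubits",
    "qubits store quantum state",
    "classical bits differ from qubits",
]
known = {"quantum", "computing", "classical", "bits", "state",
         "uses", "store", "differ", "from"}
candidates = learn_extensions(corpus, known)
print(sorted(candidates))  # ['qubits']
```

No annotation is needed: support and context are accumulated directly from the documents, which is what makes the process unassisted.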

In fact, it somewhat emulates human learning. When we study, we encounter new terms and concepts and try to learn what they mean from the context in which we find them. The more often we encounter the same elements, the more context information we gather for them, and the more confidence we gain in the knowledge we build around them.

Every new occurrence reinforces what we know, be it the meaning of a term, the fact that it is equivalent to something we already know, the fact that a single term can mean more than one thing, or its relations to other terms and concepts.

A single casual mention of something may not help, but multiple occurrences in different documents do; that is how humans learn new concepts and words, constantly adapting and extending their world knowledge.

At the end of the process, we have a large amount of knowledge added to the KG, and each element carries a carefully weighted confidence value, determined by the amount and quality of the contexts in which we found it, representing “how much we trust what we know about it”. Even elements we have met only once may be added to the KG, with a low confidence: just as when we, as humans, encounter a term a single time, we get a rough idea of what it means but would not yet use it confidently.
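One simple way to model this kind of evidence accumulation is to let each sighting push confidence toward 1.0 by an amount proportional to its context quality, so that a single poor mention yields a low score while repeated, rich contexts saturate toward certainty. The update rule below is an assumption made for illustration, not the actual weighting formula:

```python
# Illustrative confidence model (an assumed formula, not the real one):
# confidence moves toward 1.0 on each sighting, scaled by how rich and
# unambiguous the context was (0.0 = uninformative, 1.0 = very clear).

def update_confidence(confidence: float, context_quality: float) -> float:
    """Close half of the remaining gap to 1.0, scaled by quality."""
    return confidence + (1.0 - confidence) * 0.5 * context_quality

conf = 0.0
for quality in [0.2, 0.8, 0.9]:   # three sightings of a new term
    conf = update_confidence(conf, quality)
print(conf)
```

Note that even one low-quality sighting produces a nonzero confidence, matching the idea that single-occurrence elements can still enter the KG, just with low trust.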

So, we can store and use knowledge about neologisms, slang, new companies and people, changes in the usage of known terms, etc. If we process industry-related documents, we learn the technical jargon and terminology, and optimize the analysis capabilities for that industry. By processing customer documents, we may learn about the corporate language, all the product names and what they are, etc.

A transparent box

The process above extends a KG with new concepts, entities, attributes, and relationships. This KG “tailoring” is a native feature: we have been using it for years to evolve our KGs and to customize them for specific industries and customers.

It helps create high-quality, low-noise knowledge extensions, thanks to the deep NLU analysis performed on all documents with the standard KG, backed by a top-quality knowledge extraction system.

Moreover, unlike virtually all automatic knowledge creation tools in use today (e.g., those based on word embeddings and the like), it produces an entirely structured and accessible knowledge format.

What is learned is completely explicit, and this is what makes this feature unique.

Everything can be navigated, checked, corrected, tuned, and extended, under clear and complete user control. We do not create opaque numeric vectors, but rather explicit concepts with their terms, their relationships to other concepts, and so on, just like manually crafted knowledge.

The whole knowledge generation framework includes no black-box components. Every single result detail, such as a concept, entity, relationship, or attribute, is directly accessible, fully explainable, and editable. Operators can perform manual validation to ensure the highest quality, pinpointing single elements and editing them to correct any value. This is one of our major differentiating factors with respect to competitors: our knowledge is always explicit, and the results are explainable, which addresses one of the major challenges of today’s AI.

Nico Lavarini
Chief Scientist