
A challenge for Semantic Intelligence

Luca Scagliarini - 20 March 2009

Last Wednesday, I visited a client to discuss semantic searches. He motioned for me to sit in the chair in front of his desk. Then, right off the bat, he asked, “Can you explain why, when I search for something in Google or Yahoo, sometimes the information I’m looking for is at the top of the list and other times it’s not there at all?”

His question sparked a lively discussion about the Semantic Web, which made me think about how much ground has been covered, but also about how much confusion still surrounds the subject.

People first began to talk about the Semantic Web in 2001. It was envisioned as a new kind of Web, in which web pages, files, images and the like would carry precise information about the data they contained. In this way, the Web would become Semantic: no longer a collection of documents to be searched manually, but rather an instrument capable of immediate and automatic data interpretation. I remember thinking, “Fantastic”. Many people still think it is just that: something more closely related to fantasy than reality.

The Web contains an enormous amount of information that is not always accessible. The pages that make up the Web are not “semantically” linked. The lack of any explanation of what the content and links mean, along with the exponential growth of the data the Web contains, is the main reason the precision of search results fluctuates so much.

In order to give meaning to web pages, each informational resource should be able to provide information about itself (this is called “metadata”, meaning data about data). Of course, all of this information needs to be expressed in a language that computers can process. The most feasible approach is to use a shared vocabulary, along with some XML-based formalisms (I won’t go into the details here; Wikipedia is a good starting point for further reading). Let’s just say that, in this way, we can obtain complete, objective, accurate data and therefore generate forms of analysis that are also exact. But who has the time to link each bit of information to its metadata? Usually, speed and simplicity are preferred, at the expense of precision and efficiency.
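To make the idea of metadata a little more concrete, here is a minimal sketch in Python that describes a single web page using RDF/XML and the Dublin Core vocabulary, one well-known example of a shared vocabulary. The page URL, title, author and subject are invented for illustration, and the helper name `describe_page` is an assumption of this sketch, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Real, published namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe_page(url, title, creator, subject):
    """Return an RDF/XML string describing one web page (hypothetical helper)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
    )
    # Each Dublin Core element states one fact about the page.
    for tag, value in (("title", title), ("creator", creator), ("subject", subject)):
        ET.SubElement(description, f"{{{DC_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


# All values below are made up for the example.
metadata = describe_page(
    "http://example.org/jaguar-care",
    "Caring for Jaguars in Captivity",
    "A. Zookeeper",
    "Jaguar (animal)",
)
print(metadata)
```

The `dc:subject` element is what would let a program tell that this page is about the animal rather than the car. Someone still has to write that value for every page, which is exactly the cost described above.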

There have been many attempts to free the Semantic Web from labels such as “interesting but (almost) impossible” and transform it into something “interesting but also useful and usable”. Some pioneers began to walk down the semantic road even before the theories about the Semantic Web took hold. For example, Semantic Intelligence (SI) aims to improve precision and recall in the search process by making computers able to automatically “understand what we’re talking about”. If SI makes it possible to automatically understand what a text is talking about, then it is reasonable to think that it can also be used to create metadata for the Semantic Web automatically. Today, we are way beyond the beta version frontier: Semantic Intelligence is a mature technology and is widespread in the business world.

We may not be that far away from that “fantastic” Web which is able to understand whether a jaguar is an animal or a car. A Web in which you can search for information on pop music from the Sixties and receive pages containing the keywords music, pop, and Sixties (for example), but also those about the Beatles and the Beach Boys and maybe even some useful tidbits about the next Rolling Stones tour.
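As a toy illustration of the two behaviours described above, the Python sketch below picks a sense for “jaguar” from the surrounding words and expands a query with related concepts. It is not how Semantic Intelligence or any real search engine works: the `SENSES` and `RELATED` tables are tiny, hand-built and hypothetical, whereas a real system would rely on a far larger semantic network and proper linguistic analysis.

```python
# Hypothetical, hand-built data for illustration only.
SENSES = {
    "jaguar": {
        "animal": {"cat", "jungle", "prey", "habitat", "species"},
        "car": {"engine", "dealer", "sedan", "drive", "model"},
    }
}

RELATED = {
    frozenset({"pop", "music", "sixties"}): ["Beatles", "Beach Boys", "Rolling Stones"],
}


def disambiguate(word, text):
    """Pick the sense whose cue words overlap most with the words in the text."""
    tokens = set(text.lower().split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & tokens))


def expand_query(query):
    """Return the query terms plus related concepts when a known topic matches."""
    terms = query.lower().split()
    tokens = set(terms)
    for concept, related in RELATED.items():
        if concept <= tokens:  # every word of the concept appears in the query
            terms.extend(related)
    return terms


print(disambiguate("jaguar", "the jaguar stalks its prey in the jungle"))
print(disambiguate("jaguar", "the new jaguar sedan has a quiet engine"))
print(expand_query("pop music sixties"))
```

The first call should settle on the animal sense and the second on the car sense, while the expanded query gains the three band names in addition to the original keywords.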

