09 May 2013

At Enterprise Data World, a Focus on Unstructured Information

Last week I attended EDW, one of the most relevant educational events for data management. The conference was very well attended, and it was a great opportunity for me to see whether unstructured information has finally reached the level of importance in strategic decision-making that industry analysts and the press have been reporting.

If I have to judge from the very limited understanding of unstructured information displayed by the data scientists, data architects and data consultants attending the presentations, I would have to disagree with Gartner and IDC. However, it is true that, compared to last year, more presenters paid attention to the unstructured portion of information, and some of them presented clear cases supporting this focus. So, if we aren’t all completely on the same page yet, at least we’re getting close.

Technical people who have spent their entire careers looking at data in rows and columns may find it difficult to develop an understanding or appreciation of the value of messy unstructured data. From an organizational point of view, it’s likely that data teams will expand to include crazy information analysts.

In order for this to happen, organizations only have to look as far as the business value of unstructured information. The good news is that, compared to raw data, unstructured information has the advantage of showing its business value quite clearly. For brands, the impact of a negative customer experience shared on Facebook, a leaked personal email or even an off-the-record comment, or the rediscovery of valuable research buried inside an unnamed folder, cannot be questioned. I believe that the time for the world of data modeling to include unstructured information is already here.

Knowledge-intensive sectors like publishing, oil and gas, and finance are already using “soft” unstructured data for a variety of purposes: to maximize the value of their assets (publishing), to monitor supply chains in real time against unexpected events that could create significant economic harm (oil and gas), and to include valuable contextual information in predictive models (finance). The advantage of these proof cases is that, once explained, they are very easy to understand. In the coming years, we’ll hear more and more real-world examples highlighting the importance of including analysis from unstructured information and data extracted from internal or external information streams.

So, even if some of the questions I heard at EDW from intelligent, serious and prepared data scientists are still good material for laughs over beers with friends, I think that “the times they are a-changin’”. Those who have begun to invest in semantics (the best way to ensure a smooth integration between the two worlds, btw) will laugh for a long time and all the way to the bank.

Author: Luca Scagliarini