
The reality of today’s big data problems?

Expert.ai Team - 19 July 2016

More than just a buzzword, big data has become a major disruptive force, and it is now (or at least should be) part of every organisation’s toolkit for gathering intelligence about its business and making decisions. However, no one said it would be easy. While the opportunity is clear, many organisations are still struggling with big data problems.

Big data doesn’t mean more accessible information

When you have a small amount of data, not only is it easier to manage and handle, it’s also easier to find what you’re looking for. However, the reverse is also true: the larger the scale of your big data, the noisier it will be and the more difficult it will be to distill the true signals from the noise and useless information. It’s not an uncommon problem; according to MIT Technology Review, “only 0.5 percent of all data is ever analyzed.”

Whatever the size of your data repository, the way to tackle this big data problem and reach the information that is useful for your business is to start with a clear and specific hypothesis of what you want to find.

Information is not the same as understanding

Another big data problem is that information does not equal understanding. Big data can reveal trends and correlations only among the variables you choose to examine; it tells you nothing about the bigger picture. There is also the problem of uncontrolled variables: an analysis covers only a limited number of variables, and it never tells us which correlations are meaningful. We have no idea whether a relationship is causal, driven by a hidden variable, or purely accidental. Big data can also easily be gamed to produce false positives, with no meaningful connection between the variables. This leads us to another big data problem.
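To make the point about accidental correlations concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method from any of the studies cited in this article: the data is purely random noise, and the function names (pearson, strongest_chance_correlation) and the parameters (n_vars, n_obs, seed) are hypothetical choices made for this example. The idea is that as the number of unrelated variables grows, the strongest correlation found between any two of them tends to grow as well, even though no real relationship exists.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_vars, n_obs, seed=0):
    """Largest |correlation| between any two columns of pure random noise.

    Assumption for illustration: every column is independent Gaussian
    noise, so any correlation found here is accidental by construction.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    # Same number of observations each time; only the variable count grows.
    for n_vars in (2, 20, 200):
        r = strongest_chance_correlation(n_vars, n_obs=20)
        print(f"{n_vars:>3} unrelated variables -> strongest |r| = {r:.2f}")
```

Because the same seed is reused, each larger run contains the columns of the smaller one, so the strongest correlation can only stay the same or increase as variables are added. In a real analysis, a correlation surfaced this way would still need a hypothesis and an independent check before it is treated as a signal.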

The bias in big data

Bias in data has recently emerged as another big data problem. Studies from the Federal Trade Commission and universities such as Harvard and Princeton have highlighted the tendency of information to carry the biases of those who compile it. As the authors of the report “Big Data’s Disparate Impact” from Princeton’s Center for Information Technology Policy write, “An algorithm is only as good as the data it works with.”

You’ll want to be as objective as possible so that your analysis does not carry forward company or personal biases, which can lead you to discount results or signals that contradict your experience or expectations. Similarly, you’ll want as much information as possible about all the variables in your business, both internally and across the markets you serve, so that you can better read the signals that you see in your analysis.

Big Data, big security issues

As organizations collect ever more data, its security and the compliance, legal, financial and reputational ramifications of a breach are top concerns. According to ZDNet, more than one billion personal records were illegally accessed in 2014. In 2015, the U.S. Office of Personnel Management alone had a number of breaches that resulted in the theft of data on 22 million employees, including the fingerprints of 5 million. The output of big data analysis is also at risk. According to the IDG Enterprise 2016 Data & Analytics Survey, 39% of respondents are protecting this output with alternate or additional security.

The contents of big data, from sensitive information about customers and R&D pipelines to the intelligence and personal data held in government databases, are simply too important not to protect with strict security measures that ensure the integrity and safety of people and information.

Big Data could make the difference with the right metrics

Big data isn’t going away. With 51% of organizations preparing to increase their big data investment in the next two to three years (according to Forbes), preparing in advance for these big data problems will give you a significant advantage in your ability to benefit from big data.