Fact vs. Fiction: How Does Machine Learning Actually Work?

Jay Selig - 18 May 2021

There are things that we hear so frequently (and without correction) that we understand them as fact. For instance, if you crack your knuckles too often, you will develop arthritis. However, we cannot take everything we hear at face value — because it is not always true. A perfect example of this is what we have been taught to believe about how machine learning works.

By now, there are a number of opinions about machine learning that many believe as fact. These include:

  1. Machine learning will eventually develop a human level of intelligence.
  2. Machine learning requires no human intervention to operate.
  3. Machine learning guarantees a high level of accuracy.
  4. Machine learning models automatically improve over time.

But are these statements fact or are they fiction? A simple breakdown of the artificial intelligence technique will tell you all you need to know.


The Truth About Machine Learning

Like many other technologies, machine learning (ML) offers great promise for businesses across several use cases — but not all of them. Despite the hype generated by the Big Tech marketing machine, it’s often not the best solution for analyzing unstructured information. For example, IBM’s Watson does not think or reason. Nor does GPT-3. Not today, nor in the near future. This is a fact. The problem is, no one wants to talk about it.

At its core, machine learning is “one way of programming a computer to execute a task.” So, before you dive into how ML works, it’s important that you set the right expectations about its potential impact on your business.

Fact: Machine Learning Cannot Deliver Human-Level Intelligence

It’s easy to get the impression that computers could become very intelligent. In a sense, they can and already are. Where people go astray is in believing that computers can reach (or even surpass) a human level of intelligence via AI and machine learning. The truth is machine learning has little to do with human intelligence. Rather, it is a technology that “learns” from training data to process specific inputs, for example to perform text analysis.

Although there are several machine learning techniques, they all share one common thread: they are driven by statistics and co-occurrences. In layman’s terms, ML does not have any embedded knowledge. Rather, it requires a set of documents to train the model — and usually the larger the set of documents, the better.
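To make the point concrete, here is a minimal, hypothetical sketch of what “learning from statistics and co-occurrences” means in practice. The documents, labels, and scoring rule below are invented for illustration; a real system would use far more data and a proper probabilistic model, but the principle is the same: the model has no embedded knowledge, only word counts gathered from labeled training documents.

```python
from collections import Counter

# Hypothetical toy training set: every document must be labeled by hand.
train_docs = [
    ("the team won the match last night", "sports"),
    ("the striker scored a late goal", "sports"),
    ("shares fell after the earnings report", "finance"),
    ("the bank raised interest rates again", "finance"),
]

# "Training" here is nothing more than counting word occurrences per label.
counts = {}
for text, label in train_docs:
    counts.setdefault(label, Counter()).update(text.split())

def classify(text):
    # Score each label by how often the new text's words appeared in
    # that label's training documents -- pure statistics, no understanding.
    words = text.split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

print(classify("the goal decided the match"))  # classified by word statistics alone
```

Feed this classifier a sentence whose vocabulary overlaps with neither label and it will still confidently pick one, which is exactly why larger and more representative training sets matter.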

Fact: Machine Learning Requires Significant Manual Labor

Training a machine learning algorithm is not a straightforward task. It requires not only that you obtain immense amounts of data, but also that you manually pre-tag every data set. Tagging is a very time- and labor-intensive process, and it is not a one-time deal. Regardless, it is necessary, as it provides your algorithm with the parameters needed to extract key information and categorize it to your needs.

Only after processing numerous documents and assessing both co-occurrences and keyword frequency will a system recognize the topic of a document. Even then, there is no guarantee you will achieve the results you set out for. Per a survey by Dimensional Research and Alegion, 96% of companies have run into training-related problems with data quality, the labeling required to train the AI, and building model confidence.

Fact: Machine Learning Does Not Guarantee Learned Accuracy

The accuracy level of a trained ML system is reliant on several factors, with the quality and volume of training data chief among them. However, it’s important to note that these two factors are not independent of each other. Quality determines how representative your training documents are of the specific jargon you wish to extract from them. Volume determines how frequently that jargon appears for the machine to learn from.

Selecting the right mix of training documents is key, but it is difficult to get right. More often than not, you will experience either:

  • Underfitting: The training documents cannot sufficiently train your model.
  • Overfitting: The model fits the training documents too closely, performing well only on documents like those it has seen. As a result, new documents fed to it cannot be accurately processed.
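The overfitting case can be caricatured with a hypothetical “model” that simply memorizes its training documents. Everything below is invented for illustration; real overfitting is subtler, but the symptom is the same: perfect accuracy on the training set and poor results on anything new.

```python
# A deliberately overfit "model": it memorizes the training documents verbatim.
# The documents and labels are hypothetical, for illustration only.
train = {
    "the team won the match": "sports",
    "shares fell after the report": "finance",
}

def memorizing_classify(text):
    # Perfect on documents it has already seen, useless on anything new.
    return train.get(text, "unknown")

# 100% accuracy on the training set...
train_acc = sum(memorizing_classify(t) == y for t, y in train.items()) / len(train)
print(train_acc)

# ...but the model has learned nothing that generalizes to new documents.
print(memorizing_classify("the striker scored a late goal"))
```

Measuring accuracy on held-out documents the model never saw during training is the standard way to catch this gap.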

Even if you do select the right mix of data, machine learning models must frequently be retrained to maintain their level of quality. Rather than data being consistent, it remains a variable that requires oversight.


Fact: Machine Learning Systems Do Not Automatically Improve

A famous article once noted that “with machine learning, the engineer never knows precisely how the computer accomplishes its tasks. The neural network’s operations are largely opaque and inscrutable. It is, in other words, a black box.” This means that there is a limit to the level of improvement possible, and it is often difficult to understand why the system has improved or how you can improve it further.

For machine learning systems, there are simply no tools with which to refine the algorithm. The only option you have is to feed the algorithm more examples. Unfortunately, this doesn’t guarantee improvement or that you will reach the required level of accuracy. Not to mention, if any mistakes are discovered or the trained system needs to be modified for any reason, the entire process resets to square one.


Moving From Pure Machine Learning to Hybrid AI

Machine learning easily supports an organization when two conditions hold true:

  1. A significant number of sample documents are used to train the algorithm.
  2. The model is designed to support a simple scenario.

Therefore, the ideal text analysis project for pure ML combines a low-complexity use case with a large training set that has a balanced distribution of all possible outputs. Unfortunately, most scenarios do not align with these conditions. Instead, they involve small, highly complex sample sets that are distributed in a non-uniform manner. Pure ML is not suited for this.
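Checking whether a training set meets the “balanced distribution” condition is straightforward. This is a minimal sketch with hypothetical labels; the skewed counts below are the kind of non-uniform distribution that makes pure ML struggle.

```python
from collections import Counter

# Hypothetical labels assigned by human annotators to a small training set.
labels = ["sports", "sports", "finance", "sports", "sports", "sports"]

# Tally how often each output class appears in the training data.
dist = Counter(labels)
total = sum(dist.values())
for label, n in dist.items():
    print(f"{label}: {n / total:.0%}")

# A heavily skewed distribution like this one is a warning sign for pure ML:
# the model sees too few "finance" examples to learn that class reliably.
```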

These use cases require a linguistic engine that is sophisticated enough to ensure a deep understanding of content as well as a set of tools powerful enough to ensure the development and effective application of advanced linguistic rules.

Expert.ai technology not only provides this unique combination of rule-based capabilities (symbolic AI) but combines it with ML-based algorithms in a hybrid AI approach. By combining the most advanced AI techniques, you gain a deeper understanding of your unstructured information that can unlock more efficient and more accurate business processes.

All of this is not to undermine the value of machine learning, but rather to put it in proper context. Some things are not better off on their own. Instead, they need something else to unlock their true potential. In the case of machine learning, that missing piece is knowledge. Hybrid AI was born to meet this need.

Join the Platform Early Access Program

Be among the first to know about new developments for the soon-to-release expert.ai Platform.
