NLP Stream: What is Hybrid NL?

There are two foundational approaches to addressing natural language (NL) solutions: a symbolic, rule-based approach and a pure machine learning approach. Both come with their own set of advantages and disadvantages. While many view these approaches as competing ideologies, they are actually ideal complements to one another.

In this NLP Stream, Anton Zykov will dispel the myth that the symbolic and machine learning approaches can only be used in isolation and dive into the world of hybrid NL. He will break down the various approaches to NL, explain the different configurations of a hybrid approach and share valuable use cases that you can leverage yourself. Watch here to learn more!

Brian Munz:

Hi, everyone. Welcome to the NLP stream, which is a weekly series dedicated to the latest and greatest in natural language processing. As usual, I’m your host, Brian Munz, the product manager at Expert.ai. And so, yeah, I survived last week’s debacle with all those people in the house. All I got was COVID from it, so you’re not going to hear me asking too many questions this week. But I wanted to make sure that I joined this week because we have someone from my team who I like quite a bit, named Anton Zykov, who is going to talk about what hybrid natural language is. So, without further ado.

Anton Zykov:

All right. Thanks, Brian. Yeah. Hi, everyone. My name is Anton Zykov. I’m the product manager for the Hybrid technology here at Expert.ai. And I want to talk today about what exactly Hybrid is. I don’t want to give away any spoilers, so why don’t we jump right in? Oh. Here we go. All right. So first I want to cover, just historically, some schools of thought around natural language. So the first school, at least 30 years old now, and it’s probably been researched for much longer than that, is the symbolic approach. You can think of this as solving language problems using rules: clearly explainable, human-readable and understandable rules. In their most basic form, you can think of them as: if A and B, then C. But although it has the advantages of being easily processable by a machine, very low latency, things of that nature, it does take a lot of time and human effort to create these rules, to sit down and write all of them, especially from scratch.
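
To make that concrete, here is a minimal sketch of what an “if A and B, then C” rule can look like in code. The rule, the function name and the output tag are hypothetical, not Expert.ai’s rule language:

```python
# A hypothetical "if A and B, then C" rule over a tokenized sentence.
def governing_law_rule(tokens):
    """If the text mentions 'governed' and 'laws', tag it as a governing-law clause."""
    lowered = {t.lower() for t in tokens}
    if "governed" in lowered and "laws" in lowered:  # if A and B ...
        return "GOVERNING_LAW_CLAUSE"                # ... then C
    return None

print(governing_law_rule("This Agreement is governed by the laws of Ireland .".split()))
```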

Anton Zykov:

I think the 80/20 rule applies very well here. You can spend 20% of your time writing the rules and you’ll cover 80% of the use cases. However, that last 20% of use cases is really going to take you a lot of time to cover. And in recent years, we’ve had a new approach come about with the advent of machine learning, where we’re using the machine’s ability to learn on its own and to predict on its own. And while this is great, you have the machine thinking for itself and being able to resolve a lot of these intricacies, it can be very expensive. I mean, just off the top of my head, the GPT-3 model, while a breakthrough in technology, can easily take hundreds of thousands, if not millions, of dollars just to train a single time.

Anton Zykov:

And this isn’t very pragmatic for a lot of companies to do. So, both have advantages and disadvantages. And I want to walk through just quickly some typical pipelines of both. So in the machine learning pipeline, usually you start off with some text parsing, some tokenization. This is really your standard NLP that’s happening, just a basic analysis of the text, setting up for the model. And then you move on to the feature encoding. This is kind of the heart of the model itself, where you have different encodings for the features. Keep in mind, each model will have its own set of encodings. And then you’re moving on to the model itself. And this I like to call the soul of the model. This is where you’re giving the model annotated examples to learn from, and it starts to form those predictions on its own. Pretty standard.
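
As a rough illustration of that pipeline, the sketch below uses scikit-learn as a stand-in (it is not the stack discussed in the talk): tokenization and feature encoding via TF-IDF, followed by a model that learns from a handful of annotated examples.

```python
# Minimal machine learning pipeline: parse/tokenize, encode features, train a model.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["book a flight to Rome", "I read a great book last night"]
labels = ["travel", "literature"]  # annotated examples the model learns from

pipeline = Pipeline([
    ("encode", TfidfVectorizer()),    # text parsing / tokenization / feature encoding
    ("model", LogisticRegression()),  # the model itself
])
pipeline.fit(texts, labels)
print(pipeline.predict(["please book a hotel"]))
```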

Anton Zykov:

Whereas on the flip side, for a symbolic model, the main point to note here is this big knowledge graph bar that we have in the middle, or any form of a knowledge base. So this is clearly defined information that we know to be true, that we can use throughout the entire process to create these rules, to make sure that we are very detailed in the rule-writing process and we’re getting exactly what we need out of those rules. So a typical pipeline will start off with the symbolic NLP analysis. This is very similar to the text parsing and tokenization; however, it’s connected to whatever knowledge base you’re using, so you know truths about this information right off the bat.

Anton Zykov:

And then when we move on to the custom rules, this is that piece that I was talking about, where it’s very time- and effort-intensive, where you’re creating the actual rules, the “if A and B, then C” rules. And then you move on and you can start to process your documents. And at the end, if you need to create more rules based on business logic or inference, you can do so. So the main point here is that every step is connected to your knowledge base and pulling information from that knowledge base. So how can we get the best of both worlds here? How can we harness the advantages of both, maybe alleviate the disadvantages, and create this sort of Hybrid NL idea that we’re talking about?
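
For a rough picture of how a rule can lean on a knowledge base at every step, here is a toy sketch. The knowledge base, the relation names and the inference are all hypothetical stand-ins:

```python
# A tiny, hypothetical knowledge base and an inferential rule that consults it.
KNOWLEDGE_BASE = {
    "acquisition": {"type": "event", "implies": "change_of_ownership"},
    "porsche": {"type": "company"},
}

def ownership_rule(tokens):
    """If an acquisition event and a known company both appear, infer a change of ownership."""
    concepts = [KNOWLEDGE_BASE.get(t.lower()) for t in tokens]
    has_event = any(c and c.get("implies") == "change_of_ownership" for c in concepts)
    has_company = any(c and c.get("type") == "company" for c in concepts)
    return "CHANGE_OF_OWNERSHIP" if has_event and has_company else None

print(ownership_rule("Volkswagen completed the acquisition of Porsche".split()))
```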

Anton Zykov:

So I’ll talk about a few approaches to Hybrid that we have implemented ourselves. The main ideas here are really: how can we use symbolic to support machine learning, how can we use machine learning to support symbolic, and is there anywhere else in the natural language process where we can harness either symbolic or machine learning to help in any step of the process? You can think of this as, for example, an annotations phase. Or can we maybe use combined models of both? I’ll talk about all of these in a little more detail. So first off, symbolic to support machine learning. This is really where we’re asking ourselves the question: where can we use symbolic knowledge to improve the quality of data being fed to ML? So right off the bat, an ML model has a lot of features that it takes into account. What if we could provide even more features to this ML model based off of our symbolic analysis? And what if we could use concepts from our knowledge base, like we did in the symbolic pipeline, and the relationships between those concepts as well?
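
A minimal sketch of that idea, assuming a toy concept lookup in place of a real symbolic engine: concepts found by the symbolic analysis are simply appended to the features the ML model sees.

```python
# Enriching ML features with symbolic concepts (all names and mappings are hypothetical).
from sklearn.feature_extraction import DictVectorizer

CONCEPT_MAP = {"acquisition": "EVENT/ownership_change", "porsche": "ENTITY/company"}

def features(text):
    words = text.lower().split()
    feats = {f"word={w}": 1 for w in words}  # plain lexical features
    feats.update({f"concept={CONCEPT_MAP[w]}": 1 for w in words if w in CONCEPT_MAP})  # symbolic extras
    return feats

X = DictVectorizer().fit_transform([features("Volkswagen announced the acquisition of Porsche")])
print(X.shape)  # one row; lexical + concept features as columns
```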

Anton Zykov:

So, a great example of this is: take the word book. It could be a noun, as in a physical book that you read, or it could be the verb, to book. For example, to book a trip. How do we disambiguate between those two terms when we come across that word in the text? And to extend that a little more, for the verb to book, for example, a trip, is that the same concept as to reserve or to… Yeah, I mean, we’ll stop at to reserve. But being able to disambiguate between all of those concepts and realize that they’re really talking about the same action, that’s the most important key here. On top of that, we can also use extra-textual information. This is usually in the form of some domain-specific knowledge that we can feed to this ML model to alleviate the need for the ML model to guess on its own.
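
As a toy illustration of that disambiguation, the mapping below sends the same surface form to different knowledge-base concepts depending on its part of speech, and collapses synonyms onto one concept. The concept names are invented for the example:

```python
# Hypothetical word-sense mapping: surface form + part of speech -> concept.
CONCEPTS = {
    ("book", "NOUN"): "printed_work",
    ("book", "VERB"): "make_reservation",
    ("reserve", "VERB"): "make_reservation",  # same action as the verb "book"
}

def disambiguate(word, pos):
    return CONCEPTS.get((word.lower(), pos))

print(disambiguate("book", "NOUN"))     # printed_work
print(disambiguate("book", "VERB"))     # make_reservation
print(disambiguate("reserve", "VERB"))  # make_reservation
```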

Anton Zykov:

So things like curated lists, if we have a full list of law firms or regulatory agencies that we know to be true. Custom patterns that we can identify, things like social security numbers, or car registration plates, or phone numbers. We can get a little more advanced with that as well. Can we use inferential rules to determine, for example, a change of ownership? Or can we even use semantics to isolate specific passages in the text? So if we have a hundred-page document and we know that the information we’re looking for is always found in the same section of the document with the same header, can we use symbolic to isolate that section and tell the ML to only look in that section? Moving on to the flip side, machine learning to support symbolic. And this is where I ask three questions. First, where can we use ML to improve the quality of data?
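
Here is a small sketch of the “custom patterns” and “isolate a section” ideas using plain regular expressions. The patterns and the sample document are illustrative only:

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")          # e.g. a US social security number pattern
PLATE = re.compile(r"\b[A-Z]{2}-\d{3}-[A-Z]{2}\b")  # a made-up registration-plate format

def section(document, header):
    """Return only the text under a given header, so downstream ML looks nowhere else."""
    match = re.search(rf"{re.escape(header)}\n(.*?)(?=\n[A-Z][^\n]*\n|\Z)", document, re.S)
    return match.group(1).strip() if match else ""

doc = "Parties\nACME Corp and Initech.\nGoverning Law\nThis agreement is governed by the laws of Ireland.\n"
print(section(doc, "Governing Law"))
print(SSN.findall("SSN: 123-45-6789"))
```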

Anton Zykov:

So we can take a lot of pre-processing steps, think automatic spell checking, or sentence detection, anything we can do to pre-process the text with machine learning, which generally machine learning is quite good at, to assist in the symbolic processing later on. Can we also use ML to create data? Can we put visual tags on the documents to pair with symbolic rules? So this is very similar to what I was talking about on the last slide, but on the flip side. Instead of using symbolic to find these sections, maybe there’s a scenario where machine learning works better for that. And then we can complete the problem using symbolic. And lastly, can we use machine learning to actually generate the symbolic rules themselves? So if you think of an engineer sitting down to write a set of rules to solve a use case, and they predict that it’ll take them a hundred days to write those rules, what if we had a machine learning model that could create at least some bootstrap set of rules that the engineer can then go through and revise? Maybe add a few new rules if they need to.
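
A minimal sketch of that bootstrapping idea under a deliberately simplistic assumption: keywords that only ever appear in one class of a small labelled set become candidate rules for an engineer to review. This is a toy stand-in for the principle, not a production rule generator:

```python
# Derive candidate symbolic rules from labelled examples (toy data, toy heuristic).
from collections import Counter

labelled = [
    ("this agreement is governed by the laws of france", "governing_law"),
    ("payment is due within thirty days of invoice", "payment_terms"),
    ("the governing law of this contract is german law", "governing_law"),
]

def candidate_rules(examples, label, limit=5):
    in_class = Counter(w for text, y in examples if y == label for w in text.split())
    out_class = {w for text, y in examples if y != label for w in text.split()}
    keywords = [w for w, _ in in_class.most_common() if w not in out_class]
    return [f"if '{w}' then {label}" for w in keywords[:limit]]

print(candidate_rules(labelled, "governing_law"))
```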

Anton Zykov:

But right away, we’re giving them a starting point to work with, significantly cutting down their time, however much that may be. And the answer is, yes, we can do that. And then, just to cover some other approaches as well, these are just a few examples, such as annotation improvements. So one of the biggest drawbacks of a machine learning model or process is the amount of annotation that you sometimes have to do. So can we use ML for what we call active learning on the annotations? If a user begins to annotate and, say, they’ve done maybe 50 or 100 annotations on their corpus, can we have an ML model that begins to learn from the annotations that the user has already done and predicts even more annotations on the rest of the document set?
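
For a concrete feel for that active-learning loop, the sketch below trains a small scikit-learn model on a few “manual” annotations, proposes labels for the unannotated remainder, and surfaces the least confident suggestions first. The data, labels and model choice are all illustrative:

```python
# Toy active-learning pass: learn from existing annotations, suggest the rest.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

annotated = [("book me a flight", "travel"), ("a novel worth reading", "literature"),
             ("reserve two seats to Milan", "travel"), ("the author signed my copy", "literature")]
unlabelled = ["please book a table", "her second book was better"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit([text for text, _ in annotated], [label for _, label in annotated])

suggestions = zip(unlabelled, model.predict(unlabelled), model.predict_proba(unlabelled))
for text, label, proba in sorted(suggestions, key=lambda s: s[2].max()):  # least confident first
    print(f"{text!r} -> suggested {label!r} (confidence {proba.max():.2f})")
```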

Anton Zykov:

Additionally, once the user’s completed all of their annotations, can we have a machine learning model analyze the annotations that have been completed and present a report back on those annotations? Maybe some of these annotations are inconsistent or are adversely affecting the results of the training. Those annotations should be pointed out, and the user should be able to go back and adjust their annotations as needed. Another example of Hybrid is if we think about the full picture, the sort of full pipeline of a use case. Take a classification project, for example, where we have to classify class A and class B. Maybe we’ve discovered that class A is very effectively classified using a symbolic approach, but not so well with a machine learning approach. And on the flip side, class B we can classify very well with a machine learning approach, but not so well with a symbolic approach.
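
A small sketch of that split, with both classifiers reduced to stand-ins: a crisp symbolic rule handles class A, a mocked machine learning prediction handles class B, and a thin wrapper concatenates the two decisions.

```python
# Two specialised classifiers combined into one pipeline (both are hypothetical stand-ins).
def symbolic_class_a(text):
    # Class A is crisp and rule-friendly in this scenario.
    return "governed by the laws" in text.lower()

def ml_class_b(text):
    # Stand-in for a trained model's fuzzier prediction for class B.
    return any(w in text.lower() for w in ("approximately", "estimated", "around"))

def classify(text):
    if symbolic_class_a(text):
        return "class A"
    if ml_class_b(text):
        return "class B"
    return "other"

print(classify("This contract is governed by the laws of Spain."))  # class A
```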

Anton Zykov:

Maybe we can build two separate models to classify those classes and then put those two models together, concatenate them in our full pipeline, to come to a conclusion on what the best answers would be. So I want to come back to the pipelines to show you what a Hybrid pipeline may look like, by combining the machine learning and the symbolic pipelines together. So, as a first step, we’ll remove the text parsing and tokenization, since this is pretty much your basic NLP step, and we believe it’s more effective when your NLP analysis is linked to the knowledge graph, as it is down here, or to any knowledge base. We then take the heart and soul of the ML model, the feature encoding and the model itself, and we slide them down right into our symbolic pipeline. So we begin with an NLP analysis, and we write our custom rules. Once we get to our feature encoding, can we also link the feature encoding to the knowledge graph, and provide even more factual information that we know to be true to that feature encoding step?
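
Schematically, the resulting Hybrid pipeline chains those stages in order. In the runnable sketch below every stage is a trivial stand-in, knowledge base included; only the ordering reflects the pipeline described above:

```python
# Hybrid pipeline order: KB-linked NLP analysis -> custom rules -> KB-linked encoding -> model.
KB = {"sec": "regulatory_agency", "fine": "penalty_event"}  # hypothetical knowledge base

def nlp_analysis(text):                      # symbolic NLP analysis linked to the KB
    return [(w, KB.get(w)) for w in text.lower().split()]

def apply_rules(tokens):                     # custom symbolic rules
    if any(concept == "regulatory_agency" for _, concept in tokens):
        tokens = tokens + [("<doc>", "mentions_regulator")]
    return tokens

def encode(tokens):                          # feature encoding, again using KB concepts
    return {concept for _, concept in tokens if concept}

def model_predict(features):                 # stand-in for the trained ML model
    return "enforcement_action" if {"regulatory_agency", "penalty_event"} <= features else "other"

print(model_predict(encode(apply_rules(nlp_analysis("The SEC imposed a fine on the company")))))
```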

Anton Zykov:

And once we do, we end up with a Hybrid pipeline. So really we’re trying to harness the best of both worlds here using custom rules when we need to, using machine learning when we can, when it improves our results. And in full, we get a Hybrid pipeline. So I’ll walk through a few examples now of why I personally believe that Hybrid works very well. The first example is very basic, just to enrich feature data using that knowledge base or that knowledge graph that we have. So we have a sentence here. “Heuking Kühn Lüer Wojtek, with a team led by Dr. John Smith and Dr. Frank Brown, advised Volkswagen on the acquisition of Porsche.” So just using our knowledge base, we can create a graphical representation of the sentence as you see on the right. So we have our subject, Heuking Kühn Lüer Wojtek. Our verb, advising.

Anton Zykov:

They’re advising with whom? A team. They’re advising whom? Volkswagen. They’re advising Volkswagen on what? An acquisition of Porsche. So here we’ve also included our custom list of known consulting firms, and Heuking Kühn Lüer Wojtek is on that list. If we feed this sentence to a machine learning model without any of the knowledge base attached, machine learning could have a very difficult time with this consulting company name. Four last names all together in a row is generally quite difficult for a machine learning model to figure out. But if we can tell the model explicitly that Heuking Kühn Lüer Wojtek is a consulting firm, that normalized and enriched input alleviates the need for the machine learning model to guess. So this is just a very basic, ground-level example. So let’s dive into a more complex one on using custom rules to support machine learning.
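
A toy version of that enrichment step: a curated list (here, a one-entry set) lets us replace the four-surname firm name with an explicit entity tag before the text ever reaches the ML model. The list and the tag format are illustrative.

```python
# Curated-list enrichment (hypothetical list and tag format).
KNOWN_CONSULTING_FIRMS = {"heuking kühn lüer wojtek"}

def enrich(sentence):
    lowered = sentence.lower()
    for firm in KNOWN_CONSULTING_FIRMS:
        start = lowered.find(firm)
        if start != -1:
            sentence = sentence[:start] + "[CONSULTING_FIRM]" + sentence[start + len(firm):]
    return sentence

print(enrich("Heuking Kühn Lüer Wojtek advised Volkswagen on the acquisition of Porsche."))
```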

Anton Zykov:

This is the big picture. So we have a use case with three separate problems here. First problem: we have input data that has various references to geographical locations. We’re looking, in this use case, to be able to detect governing law clauses and which country they’re connected to. So say we come across a section of the document where we have a governing law clause, but the only geographical reference we have is to Wales, for example. Wales doesn’t have its own governing law here; it comes hierarchically from the United Kingdom. So we need to write rules, for example, to specifically tell the system that Wales, England and Scotland are all part of the United Kingdom. So we create custom rules for normalizing any geographical reference to its country, in this case, since we know that countries are the ones with governing law.
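
A minimal sketch of those normalization rules, with an invented, far-from-complete mapping: any sub-national reference is lifted to the country whose law governs.

```python
# Hypothetical geography-normalization rules for governing-law detection.
REGION_TO_COUNTRY = {
    "wales": "United Kingdom",
    "england": "United Kingdom",
    "scotland": "United Kingdom",
    "bavaria": "Germany",
}

def normalize_geography(mention):
    return REGION_TO_COUNTRY.get(mention.lower(), mention.title())

print(normalize_geography("Wales"))   # United Kingdom
print(normalize_geography("France"))  # France (already a country)
```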

Anton Zykov:

Next, we need to detect the actual law clauses themselves. So we’ve done our analysis and we’ve realized that machine learning is the best fit for this. So we train on the data and use machine learning to extract the clauses themselves. And lastly, we need to check for multiple geographical references in these law clauses. Maybe we have multiple references that relate to the same clause or to different clauses, and we need to write additional rules to check for that. And once we have these, we can call them three separate models on their own, and we can combine those models together. So two symbolic approaches and one machine learning approach to solve the whole problem, the whole use case, that we need to solve. And that is really the essence of Hybrid. Where can we use symbolic? Where can we use machine learning? How can they work together and how can we arrive at this perfect harmony that we’re looking for? So, Brian, that’s it for me. Thank you, everyone. My contact information is right there in bold, as well as the company’s. If you’d like to reach out, please do. I look forward to hearing from you.

Brian Munz:

Yeah, no, thanks. That was very interesting. One question I did have was… I mean, you touched on some of it, but what are some of the most common areas? Because for Hybrid to come up, there had to be some areas where either ML or symbolic had a very difficult time. And so what are some struggles where Hybrid kind of pops in there and really fits?

Anton Zykov:

[inaudible 00:19:39]. I lost you for about 15 seconds there.

Brian Munz:

Oh, geez. Okay. Yeah, no, I was just saying, what are some areas that ML struggles in or symbolic struggles in, where they really complement each other?

Anton Zykov:

Yeah, absolutely. So symbolic is really good at clearly defined rules where a computer doesn’t have to think for itself. It’s where you can relate known factual knowledge and connect that knowledge together. Or, with symbolic, in a sense, when you have clearly defined text that you know you’re looking for, where there’s really no guessing involved, no guessing needed, that’s where symbolic is going to be at its strongest. On the flip side, machine learning. The whole idea of machine learning is that it can predict based off of some examples or some knowledge. It’s much better at being able to infer on its own, not a perfect match, but maybe an in-the-ballpark match. So we combine those two together, because sometimes, for the stuff that’s right in front of its face, machine learning begins to think too much on its own and starts coming up with these radical predictions when it’s really much simpler than that.

Anton Zykov:

And that’s what symbolic, I think, covers very well. And that’s outside of the actual processing part of it. As I mentioned, symbolic rules are very easy for a computer to process, whereas machine learning takes a lot more time. So anywhere you can alleviate the need for machine learning to guess, that’s going to be key. And anywhere you’re getting to those last annoying rules that you need to write, where it’s not so straightforward, where you need some computer inference in there, that’s where machine learning can step in and really help you a lot.

Brian Munz:

Yeah. Okay. Yeah, that definitely makes sense. So it’s like two sides of the human brain, really. Awesome. Well, that was great. And thanks for presenting, and hopefully we’ll see more from you in the future. Next week, we have a presentation on hate speech detection in NLP and ways to fight hate speech and cyberbullying and things like that. So until then, thanks again, Anton, and we’ll see you all next week.

Anton Zykov:

Thanks Brian.

 
