
NLP Stream: Pulling Back the Curtain on Sentence Complexity and NLP

The beauty of language lies in its complexity. And you can pack a lot of complexity into just one sentence, everything from relations and anaphora to logic and inference. Unfortunately, AI systems continue to struggle with many of these aspects.

Watch Antonio Linari as he explores how a good old-fashioned AI model compares to a modern large language model when processing the opening sentence of a classic book we should all be familiar with: L. Frank Baum’s The Wonderful Wizard of Oz.

 

Transcript:

Brian Munz:

Hey, everyone. Welcome again to the NLP Stream, a weekly series dedicated to the latest and greatest in natural language processing. I’m your host, Brian Munz. I’m a Product Manager at Expert AI. Last week, we dove into some real-world use cases of NLP and saw some on-the-ground applications. This week is going to be very interesting.

Brian Munz:

Also, we’re going to get more into some of the intricacies of core NLP challenges and see how these are approached. Without further ado, here to talk about that is Antonio Linari, Head of Innovation at Expert AI. Take it away.

Antonio Linari:

Hi, and thank you. Thank you, Brian. Let me share my screen. Thank you for making the time for this 20-25 minute talk about the complexity of language. Okay. The topic of today, as you can see, is The Wonderful Wizard of Oz, written in 1900, as you can see. This is actually the original cover. Specifically, we will analyze one single sentence, the first sentence of the book, and we will see how beautiful our language is and how difficult it is for modern deep learning language models to answer questions that are very simple for us to answer.

Antonio Linari:

The sentence we’ll use is the very first of the book. I don’t know who is familiar with this book, but okay. “Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer’s wife.” I have some questions here that, of course, as a human being, I answer very easily. Where does Dorothy live? Well, it’s pretty simple. She lives in Kansas. Where does Uncle Henry live? Okay. Here it’s a little bit more difficult, but because Uncle Henry lives with Dorothy, we know that Uncle Henry lives in Kansas.

Antonio Linari:

Can we say Dorothy lives in the United States? Yes, because we know that Kansas is in the United States. So we, as human beings, are able to infer information, even if that information is not directly present in the context, and specifically in the sentence. Again, we can have other things here. You see how much information we get from just one single sentence. With just one single sentence, you have a very good idea of what the situation is at the moment. By the way, “Dorothy lived.” Okay. We know that at the time of the book, she was living there. But without any spoilers, that’s something that happened in the past. Okay. That’s a very important clue for the rest of today’s topic.

Antonio Linari:

Other things that we can say. Take Uncle Henry. What’s the job of Uncle Henry? Oh, well, his job is farmer; he is a farmer. We can say that Uncle Henry is the husband of Aunt Em, even though there’s no explicit clue here that Uncle Henry is the husband. But we know that because Aunt Em is the wife of Uncle Henry; well, it’s true the other way around, that Uncle Henry is the husband of Aunt Em. Again, you can see that we infer that Aunt Em is the wife of Uncle Henry from the fact that we are talking about the farmer, and she is the farmer’s wife. Okay. Let’s see how these kinds of problems were tackled in the past, and we start from the very beginning.

Antonio Linari:

I don’t know if you’re very familiar with Eliza. Eliza appeared in the ’60s, and it was an interesting project to demonstrate how weird a conversation between a human and a machine would have been. You can see here, it’s basically kind of a robot that we ask questions. You see here, we have some questions. I’m not going into the Python code, don’t worry; we are not going to analyze the Python code. What’s important is to see what the interaction looked like in the ’60s and ’70s, and we’ll see the rest during today’s meeting. We see here, we politely greet Eliza, and then we ask where Dorothy lives. Okay. Eliza is going to answer in this way.

Antonio Linari:

Eliza says, “Okay. How do you do? Please tell me your problem.” We say, “Hi, Eliza.” And Eliza says, “Oh, I’m not sure I understand you fully.” I didn’t even say anything yet and Eliza already doesn’t understand. “Where does Dorothy live?” And she says, “Please go on.” Okay. At the time, it was a really interesting project, and a conversation could be really, really weird. Okay. Of course, some steps ahead were made in the ten years after. In 1972, a French guy invented an interesting logic programming language named Prolog. Here you can see a dialect adapted to Python. Okay. But basically the idea is to describe the world by facts and rules.

Antonio Linari:

By analyzing the sentence, the very first sentence of The Wizard of Oz, we have some facts. And you see that we can describe these facts here, like: lives where? Dorothy, in Kansas. We are saying Dorothy lives in Kansas. We can say Dorothy lives with Uncle Henry, and we can say that, of course, Dorothy lives with Aunt Em. We can say that the job of Uncle Henry is farmer, and Aunt Em is the wife of the farmer. So you see that we described our world. We don’t put any rules down yet, but we ask the system to answer a simple question: where does Dorothy live? You see that the system was capable of answering the question in the right way. Actually, it’s true: Dorothy lives in Kansas. Okay.
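To make the idea concrete, here is a minimal hand-rolled Python sketch of the same facts-and-query step. It is not the actual Prolog dialect shown on screen; the relation names and the helper function are invented for illustration.

```python
# A hand-rolled sketch of "describe the world by facts": relation tuples plus
# a tiny query helper. Relation names are invented for illustration.
facts = {
    ("lives_in", "dorothy", "kansas"),
    ("lives_with", "dorothy", "uncle_henry"),
    ("lives_with", "dorothy", "aunt_em"),
    ("job", "uncle_henry", "farmer"),
    ("wife_of", "aunt_em", "farmer"),
}

def where_lives(person):
    """Answer 'where does X live?' by looking for a matching lives_in fact."""
    return [place for rel, who, place in facts
            if rel == "lives_in" and who == person]

print(where_lives("dorothy"))  # ['kansas']
```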

Antonio Linari:

After the ’70s, there were many attempts to create AI, but AI went through a very dark winter. Many companies stopped seeing the business value, and so they kind of abandoned it, and AI remained a small niche in the academic world, mostly. Then, at a certain point, someone said, “Okay. Why can’t we represent reality through numbers?” You see here, someone said, “Okay. Let’s describe reality as vectors.” So as a sequence of numbers, or as a sequence of features, the characteristics of a specific object. You see here, I’m simplifying this topic, which is pretty hard from a mathematical perspective. But you see here, we have an apple, an orange, and a tangerine. Okay.

Antonio Linari:

Each of these is inside a square. Sorry, the three fruits are in square brackets, representing the three columns of a vector. We can say that the vector 100 represents an apple, the vector 010 represents an orange, and the vector 001 represents a tangerine. Okay. That was the very first way to represent objects. But you see that there is a problem here now. Because in the representation on the right, you see that these objects are basically represented with no features in common. And that’s not true, because if we want to be a little bit abstract, all three can be represented by a circle, because they have this rounded shape. And at least two of the three share mostly the same color.

Antonio Linari:

In fact, someone said, “Okay. Let’s try to represent this in a better way.” Instead of using only zeros and ones, let’s try to represent objects with features, with real features. In this case, we are using shape and color. You see that the drawback of using shape and color is that if you don’t use enough features, you end up confusing, for example, in this case, an orange with a tangerine. Okay. Because they share the same shape and the same color. Technically speaking, this is like a compression, where you basically have three objects, but in reality, you reduce these three objects to two: one red and one orange.
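As a rough illustration of the two representations just described, here is a small Python sketch; the numeric feature values are invented for the example.

```python
import numpy as np

# One-hot vectors: the three fruits share no features at all.
apple     = np.array([1, 0, 0])
orange    = np.array([0, 1, 0])
tangerine = np.array([0, 0, 1])
print(apple @ orange, orange @ tangerine)  # 0 0 -> no similarity anywhere

# Feature vectors (shape, color), with invented values:
# shape 1.0 = round; color 0.0 = red, 1.0 = orange.
apple_f     = np.array([1.0, 0.0])
orange_f    = np.array([1.0, 1.0])
tangerine_f = np.array([1.0, 1.0])
# With too few features, the orange and the tangerine collapse onto one point.
print(np.array_equal(orange_f, tangerine_f))  # True
```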

Antonio Linari:

Okay. Why is this important? This is important because this is the way modern language models understand reality: through features, through representations of specific characteristics of the objects that we want to send to these models. Of course, nowadays you are all familiar with the word Transformers. Unfortunately, not these Transformers, but mostly these Transformers. Okay. I think you are familiar with Sesame Street. We have Bert and Ernie, and Elmo in the middle. BERT and ERNIE represent the state of the art, kind of. You will see now, we have GPT-3, we have LaMDA.

Antonio Linari:

We have all these “sentient” things happening, but what’s mostly used in business are BERT and ERNIE. BERT comes from the United States, and ERNIE from China. It’s also a way of saying that AI at this moment is mostly a topic for the United States and China. And they are both doing great things. At this point, let’s jump a little bit and describe, very simply, how a neural network works, to understand how Transformers can try to answer the same questions that we asked Prolog and Eliza.

Antonio Linari:

A neural network is basically a very simple representation of a neuron in our brain. Okay. In our brain, we have dendrites, which are like pipes that bring the information inside the cell, and the axon is the tube that brings the information outside. You can imagine a neuron like a bucket with input pipes, and a hole with output pipes. You have water entering through these input pipes, which can have, for example, different radii. Okay. Once the bucket is filled enough to reach the hole, the water starts exiting from the other side. You can imagine that this height in the bucket represents the weights in our neurons.
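Here is a minimal Python sketch of the bucket analogy, with invented numbers: the pipe radii play the role of weights, and the water level needed to reach the hole acts as a threshold.

```python
# The bucket analogy as code: weighted inputs accumulate, and the neuron
# "overflows" (fires) once the total passes a threshold. Numbers are invented.
def neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))  # water entering the bucket
    return 1 if total >= threshold else 0                # 1 once it reaches the hole

weights = [1.2, 0.4]   # the "radii" of the two input pipes
print(neuron([0.5, 0.8], weights, threshold=1.0))  # 0 -> not enough to reach the hole
print(neuron([0.9, 0.9], weights, threshold=1.0))  # 1 -> the bucket overflows
```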

Antonio Linari:

Just to show you a very simple example, I think you are all familiar with the TensorFlow Playground. This is one way to show what kinds of problems neural networks can actually solve. In this case, imagine you have to classify the sentiment of movie reviews. Imagine that the orange dots on the right represent negative reviews, and the blue ones the positive reviews. The idea is to ask an AI to automatically determine which review is positive and which is negative, once the system has been trained on a certain number of examples.

Antonio Linari:

What we do here: we have our neural network, and you can see that each of the neurons detects one specific feature. In this case, X1, for example, tries to separate what is on the left from what is on the right, and X2 does the same from top to bottom, and so forth. You have diagonals and other features. If we run this very quickly, you see that the neural network is super fast and is capable of understanding that, in order to determine if a review is positive or negative, you just have to create a line that separates the two sets.

Antonio Linari:

Now for something a little bit more complicated: you have the same reviews, but this time they are arranged so that it’s not really easy to determine which one is positive and which one is negative just by drawing a line. We need something more complex. But still, you can see that with a little bit more effort, a neural network is capable of understanding that you have to draw more or less a circle around the blue points to separate the blue from the orange. Of course, there are more challenging problems. We are not here to talk about those now, but basically, neural networks try to find the best way to separate these kinds of sets.
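To give a feel for the two Playground situations, here is a rough scikit-learn sketch; the datasets and models are stand-ins I picked for illustration, not what the Playground actually runs.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Case 1: two well-separated groups of points -- a straight line is enough.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
print(LogisticRegression().fit(X, y).score(X, y))   # close to 1.0

# Case 2: one class surrounds the other -- a line fails, a small network copes.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
print(LogisticRegression().fit(X, y).score(X, y))   # around 0.5, i.e. chance level
mlp = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=2000, random_state=0)
print(mlp.fit(X, y).score(X, y))                    # close to 1.0
```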

Antonio Linari:

Going back to our topic. We know now that we have Transformers. Transformers are language models, and what language models basically do is, given a sequence of words, try to predict which word comes next. Here, you see, “Can you please come…?” and here is what the system is able to predict. Let’s now start asking BERT, in this case, where Dorothy lives. We see that for simple questions, it works. But when things get tough, not even too tough, at least for us, they start behaving weirdly. Okay.
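A small sketch of the “predict the missing word” behavior mentioned above, using the Hugging Face pipeline API; note that BERT-style models fill in a masked word rather than strictly the next one, and the model name here is an assumption, not necessarily what was used in the demo.

```python
from transformers import pipeline

# Fill in the missing word with a masked language model.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Can you please come [MASK]?")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
# Typical suggestions are words like "in", "here", "back": pattern completion,
# not reasoning about the sentence.
```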

Antonio Linari:

You see that we have a question here: where does Dorothy live? BERT has some limitations, so you have to provide it with the context. In this case, we are giving BERT the sentence. You see we are using a model for question answering. If we ask BERT, it has to think a little bit. This is running on a GPU, so it takes some time. With 37% confidence, it says, “Oh, Kansas prairies.” That is quite accurate. Okay. But again, the answer is pretty easy because it’s right here. Okay. Now, let’s go a little bit bigger. The Transformer most talked about now, after LaMDA, is GPT-3.
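Before moving on to GPT-3, here is a rough sketch of the extractive question-answering step just shown, using the Hugging Face pipeline API; the specific model name is an assumption, not necessarily the one from the demo.

```python
from transformers import pipeline

# Extractive QA: the model picks an answer span out of the provided context.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("Dorothy lived in the midst of the great Kansas prairies, with "
           "Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.")

result = qa(question="Where does Dorothy live?", context=context)
print(result["answer"], round(result["score"], 2))  # e.g. "Kansas prairies", with a modest score
```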

Antonio Linari:

You see here, I don’t know if you’re familiar with this picture. This picture is from The Hitchhiker’s Guide to the Galaxy, and this is Deep Thought, which answered the famous question about life, the universe, and everything. And the famous answer is 42. Okay. Let’s see how GPT-3 answers this question. You see, we don’t even give it any context. And we ask, where does Dorothy live? Okay. It just has to think, but here we have an even better answer, because we get kind of a sentence: “Dorothy lives in Kansas.” But there’s a drawback here, and it’s a big issue, in my opinion.
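Here is a minimal sketch of asking GPT-3 without any context, using the OpenAI completion endpoint as it existed around the time of this talk; the engine name and library version are assumptions on my part.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# No context is given: the model answers purely from what it saw in training.
response = openai.Completion.create(
    engine="text-davinci-002",   # engine name is an assumption
    prompt="Where does Dorothy live?",
    max_tokens=32,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
# In the demo, the reply is along the lines of "Dorothy lives in Kansas."
```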

Antonio Linari:

Why does GPT-3, out of every Dorothy that could exist in the world, answer specifically about the Dorothy from The Wizard of Oz? It’s clear there is a simple bias here, without going into political things. In this case, it’s even a simple one. And probably the only reason why “Dorothy lives in Kansas” is the answer is that The Wizard of Oz is a very famous book, and was probably used as one of the books in the training set of GPT-3. But this raises two important issues. The first is about bias. And I strongly suggest you read this interesting book, Weapons of Math Destruction, where you can actually see how biased datasets, and biased models trained on those datasets, can create real damage in terms of business or money, but also in terms of people’s lives and freedom.

Antonio Linari:

The other big problem is power consumption. You can see here that not only does training GPT-3 cost something between 10 and 30 million, but you basically consume the equivalent of two years of an average American adult’s energy consumption to train it. Okay. Now that we have this general idea of how the individual technologies, or algorithms, work, let’s challenge them a little bit more and ask a slightly tougher question: where does Uncle Henry live? Very simple for us to answer. We don’t ask Eliza, because we already know she was weird. Okay. Let’s try with logic programming.

Antonio Linari:

You see that here, we have some rules. Okay. And specifically, the rule that we are interested in is this one: we say that a person A lives in a place if person A lives with person B and person B lives in that place. In that way, we know that if someone lives with someone else, they basically live in the same place. If we ask where Uncle Henry lives, in this case, oh, we have an empty answer. The reason why we have an empty answer is that Prolog needs a little bit of help on reciprocity. Okay. The fact that A lives with B is not immediately recognized as B living with A.
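Continuing the earlier hand-rolled Python sketch (again, not the actual Prolog dialect from the demo), here is roughly how the lives-with rule behaves, including the reciprocity fix described in the next paragraph.

```python
# Facts: only "Dorothy lives with Uncle Henry" is stated, not the reverse.
facts = {
    ("lives_in", "dorothy", "kansas"),
    ("lives_with", "dorothy", "uncle_henry"),
}

def lives_in(person, seen=None):
    """Rule: A lives in P if A has a lives_in fact, or A lives with someone who does."""
    seen = set() if seen is None else seen
    if person in seen:                       # guard against going around in circles
        return []
    seen.add(person)
    places = [p for rel, who, p in facts if rel == "lives_in" and who == person]
    for rel, a, b in list(facts):
        if rel == "lives_with" and a == person:
            places += lives_in(b, seen)
    return places

print(lives_in("uncle_henry"))   # [] -- empty, just like the Prolog query

# The reciprocity fix described next: if A lives with B, then B lives with A.
facts |= {("lives_with", b, a) for rel, a, b in list(facts) if rel == "lives_with"}
print(lives_in("uncle_henry"))   # ['kansas']
```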

Antonio Linari:

What we have to do here is add this extra information, where we say that when person A lives with person B, then it’s true that person B lives with person A. With this in place, if we ask where Uncle Henry lives, you see that the answer is correct. Let’s now see the Transformers: same context, different question. Where does Dorothy, sorry, Uncle Henry live? Now, after again thinking a little bit, we get a weird answer: Dorothy. Now the Transformers start to have some issues, because it’s one thing to be trained on patterns, and quite another to use those patterns to infer data that the model is not predicting.

Antonio Linari:

It’s a different kind of task: inferring implies logic. Even if other models are capable of answering these questions, it’s probably because the answer to those questions was already present in the dataset. Okay. There is really little evidence that a system today can really infer, especially when tough logic is involved. Like: a box is in the car that is moving to London. Where is the box? Is it in the car? Is it in London? Is it moving to London? There are a lot of challenges there.

Antonio Linari:

The last thing, and then I’ll end my presentation, is OpenAI. If we ask GPT-3, you see that now it says, okay, he lives in the village. We don’t really know anything. Even GPT-3 is struggling to answer this question, because again, in this case, the context is so vast that it’s impossible to know exactly which Uncle Henry we are talking about. Okay. In the Innovation Department, what we are doing is using our technology that basically does word sense disambiguation: it assigns a meaning to every single token and extracts relations.

Antonio Linari:

We are trying to use the concepts and the relations together with Prolog to answer these questions, of course in kind of a Prolog way, but with concepts. Concepts allow us to handle a question like: is she going to the riverbank? Or is she going to the bank to get a salary? Okay. So bank and bank in this case have different meanings. Okay. Hopefully, it’s been interesting. I’m here to answer any questions. Thank you very much.

Brian Munz:

Yeah. No, thanks. I have a few questions. One thing I was thinking just a second ago is, when you were showing how you approach language and talking about inferring things through the relations of the tokens to each other, it seems to me, is that generally more how the human mind works than the previous models? Where you’re basically taking everything and trying to find attributes, and then when something new comes in, it’s about those attributes. It seems like the context is more how my mind would think. When I see that sentence, I would say Kansas. Or if you say Dorothy even, I would think, like you said, of The Wizard of Oz first.

Antonio Linari:

Yeah. I think that’s it.

Brian Munz:

Is there any thought?

Antonio Linari:

Yeah. Yeah. No, that’s an interesting question, because it introduces what we call the hybrid approach, where we solve the problem not just by trying to find a pattern, like deep learning usually does, but by also putting rules, logical rules, on top of those patterns. So, in our experience, for example, we learn that we can cross the street, but we also learn that we have to watch for cars before crossing the street. Okay. That’s a rule that we put on top of the fact that we are capable of crossing the street. Okay.

Antonio Linari:

Of course, if we want to generalize, it’s always our brain, and it’s always neurons that are firing and doing something. But those neurons are organized in structures. That gives us two different levels of reasoning. One is kind of an instinctive reasoning, and then on top of it, there are the rules. Rules come not only from things that we impose on ourselves; they are also social rules that we have to respect. It’s a very complex topic, and there’s a lot of fighting between the pure deep learning people and the pure symbolic people. Okay. I think the answer is somewhere in the middle.

Brian Munz:

Yeah. Yeah. No, thanks. Actually, I had one more question, which was around the efficiency or the green concerns with AI, where it takes two years’ worth of energy to train a model. Are there aspects within how this model is trained that will make it more green later on, in terms of once it’s in usage? Are there more efficiencies that can be gained if it’s done in a proper way?

Antonio Linari:

Language models are basically, to simplify (so don’t get offended, data scientists), big, big matrices. And there’s basically a lot of calculation between matrices. That’s the reason why we need GPUs: because in games, where GPUs were born at the beginning, you have a lot of sprites to move. And in order to move those sprites, you need a lot of matrix calculations. Okay. The same applies to AI, and that’s the reason why these models consume a lot of power.

Antonio Linari:

We are getting better. Okay. Many companies are making a lot of effort to reduce consumption, not only from a hardware perspective, trying to create GPUs that consume less energy, but also from a data science perspective: they’re trying to reduce the number of columns and rows in the matrices, so that you reduce the matrix size and the number of calculations you have to do. Okay.

Antonio Linari:

In the symbolic approach, on the contrary, we don’t use any GPU. Okay. We can do the same thing that Transformers do, basically generating better features, but with a far smaller carbon footprint. Okay. Symbolic AI has drawbacks, of course, because you have to do manual things. But again, also in deep learning you have to do manual things. Okay. You have to label data. And the one misconception that I would like to address here is that we want these models to be trained fast and we want this symbolic AI to be created fast.

Antonio Linari:

But we want software to be stable, and we want software to be reliable. And making software secure and reliable takes time. Okay. Sometimes years, too. Okay. We should start thinking about machine learning, and AI in general, as the process of generating those kinds of models, the same way we think about software. It’s not something that you just start on and have ready the day after. Okay. It takes time, with both approaches.

Brian Munz:

Yeah. Right. So the quality also informs.

Antonio Linari:

Absolutely.

Brian Munz:

Yeah. Great. No. Yeah. Thanks, again. This has been super interesting. It’s always mind-boggling to me how far we’ve come with understanding language, but also how powerful the human brain is. It’s always interesting to see. Yeah. Thanks again, for presenting and hopefully we’ll see more of you in the future.

Brian Munz:

Thanks, everyone, for joining this week. In next week’s episode, Jose Manuel will be speaking about misinformation detection in a way that is accurate and explainable. Hope to see you then. Until then, thanks.

Antonio Linari:

Thank you. Bye, guys.

 
