
NLP Stream: Accurate and Explainable Misinformation Detection: Too Good to be True

Many of the challenges we face in detecting online misinformation are related to our own cognitive limitations as human beings. We can only see a small part of the world at once, so we need to rely on others to pre-process part of that information for us. This makes us vulnerable to misinformation and points to AI as a necessary means of amplifying our ability to deal with it at scale.

Recent advances demonstrate that it is possible to build semi-automatic tools to detect online misinformation. However, many limitations remain. Join José Manuel Gómez Pérez as he explains why a partnership between humans and AI is necessary to deal with online misinformation detection.

Brian Munz:

Hey, everybody. Welcome to the NLP stream, which is our weekly live stream of all things related to NLP. As usual, I’m Brian Munz, the product manager at Expert.ai. If it looks like we’re doing this one from a bunker, it’s because I kind of am. I’m in a house with 26 of my in-laws, including 12 kids. So, tune in next week to see if I make it. But I’m very excited about this week’s topic. It’s something that, of course, is always in the news, and I believe you’re going to hear an interesting take on it today. Again, it’s a returning superstar, Jose Manuel Gomez-Perez, who is a… Can you remind me of your role? Because my…

Jose Manuel Gomez-Perez:

Yeah, I’m in charge of the language technology research lab here at Expert.ai, based in Madrid, Spain.

Brian Munz:

Okay.

Jose Manuel Gomez-Perez:

We work with all these kinds of technologies, innovations, research, and development.

Brian Munz:

So, you get to do the fun stuff.

Jose Manuel Gomez-Perez:

Yeah.

Brian Munz:

Yeah. So, without further ado, jump on in.

Jose Manuel Gomez-Perez:

Okay. Thank you, Brian. I’m going to share my screen.

Jose Manuel Gomez-Perez:

Yeah, here we are. Okay. Today, we’re going to talk about accurate and explainable misinformation detection and basically, about the work that we’ve been doing in this area for quite a few years already. So, the first thing I would like to do is a quick introduction to the topic of misinformation, and then we work from there.

Jose Manuel Gomez-Perez:

We see misinformation as a cognitive problem: you can only see part of the world, which is what happens to all of us every day. So, you need to rely on others, and those others can be friends or the media, to inform you. Now, in the context of the web, social media and the information deluge we live in, we have what we call misinformation 2.0.

Jose Manuel Gomez-Perez:

All of this is amplified by the web: there’s more content and it’s harder to curate. In the end, we suffer from gaslighting at scale, which is basically the manipulation of our perception of reality. This is something that we are all exposed to every day.

Jose Manuel Gomez-Perez:

Misinformation is also an asymmetric phenomenon: it’s easy to produce but hard to debunk. At the same time, it has a big impact, because being misinformed can lead you to take the wrong action, or to take no action at all, which in many cases is equally bad.

Jose Manuel Gomez-Perez:

Just to take a look at the different stakeholders in misinformation: we have the fact-checkers, the web giants and the researchers, and there should also be another type of stakeholder here, which is public administration and policy makers. In the end, fact-checkers are the ones that collect and investigate claims and publish reviews.

Jose Manuel Gomez-Perez:

The problem they have is that this doesn’t scale very well, because it’s just a limited group of individuals working on it. Funding to support these kinds of activities is limited, and you always have the question of who checks the checkers: can you really trust the fact-checking activity all the time or not? Then you have the web giants. Web giants need to do a lot of content moderation, and that’s something they cannot do only by employing people. They also need to apply artificial intelligence to detect all these misinformation issues automatically. They face a delicate balance between the business, which is basically engagement with their content, integrity with respect to the users, and transparency. And then, finally, you have researchers working on datasets and challenges, producing models and systems that solve, or try to solve, part of the problem.

Jose Manuel Gomez-Perez:

And here we always have a tension between the problem you want to solve, the lack of annotated data to train more and more sophisticated models, and reproducibility: can you really reproduce what this model is claimed to be doing, or not?

Jose Manuel Gomez-Perez:

Unfortunately, the collaboration between all these different stakeholders is not ideal. Earlier this year, in January, a letter was sent by the main fact-checking agencies in the world to YouTube, making four clear demands: meaningful transparency; providing context and offering debunks instead of just removing the misinforming content; acting against repeat offenders; and supporting other languages in addition to English. The fact-checkers said that this is the only way to establish real collaboration between fact-checkers and the large social media platforms.

Jose Manuel Gomez-Perez:

So, in this talk today, I’m going to talk about all these things from the perspective of the approach, the framework and system, that we propose to deal with misinformation. This was presented in 2020 at the International Semantic Web Conference, where it was awarded best paper of the conference. What is interesting about this work is that we are dealing with the combination of neural approaches and symbolic, or structured knowledge-based, approaches to AI. So, in this case, I like to think of it as a good case study of what we call Hybrid NLP. The first question we ask ourselves here is: why use structured knowledge at all? The thing is that automated misinformation detection is an extremely hard problem to solve.

Jose Manuel Gomez-Perez:

It has infinite corner cases. It combines many different disciplines of computer science and artificial intelligence, like information retrieval, large-scale data processing and wide-context natural language understanding, and you also have issues with multi-modality, not just text. And it has to do with freedom of speech: how do you tell free speech from hate speech, for example? What are the differences between those two types of language? All of this makes you think that you would need general intelligence to solve it. But detecting misinformation is not just about factuality, in the sense of saying whether something is true or false. In the end, you need to work in a way that lets you provide evidence for the predictions made by your model. And this is something that traditional [inaudible 00:08:00] or deep learning-based models, neural models in general, do not do very well. They tend to be powerful but opaque.

Jose Manuel Gomez-Perez:

So what we want to do here is have the models produce results, but these results should be linked to evidence, much of it provided by fact-checkers, and explicitly represented in a way that makes the information interoperable between different systems and platforms. It should also be explainable, which is another key goal of this work. We propose a conceptual model for what we call a credibility review, which is composed of four main elements: the data item under review; the rating that the system, or an agent in general, gives to that data item in terms of how credible it is; the confidence with which that rating is provided; and the provenance, that is, the sources and credibility signals used to produce the rating. In this framework, provenance is mandatory, while the rating and the confidence are subjective to the author.
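
To make the shape of a credibility review concrete, here is a minimal Python sketch of those four elements. The field names and value ranges are illustrative assumptions, not the exact schema from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CredibilityReview:
    """Minimal sketch of a credibility review; field names are illustrative."""
    data_item: str        # the item under review, e.g. a sentence or a tweet
    rating: float         # credibility rating, e.g. -1.0 (not credible) to 1.0 (credible)
    confidence: float     # how confident the reviewing agent is in the rating, 0.0 to 1.0
    provenance: List[str] = field(default_factory=list)  # sources and credibility signals used

review = CredibilityReview(
    data_item="U.S. Representatives agree to illicit UN gun-control.",
    rating=-0.8,
    confidence=0.7,
    provenance=["ClaimReview by PolitiFact", "website reputation signal"],
)
print(review)
```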

Jose Manuel Gomez-Perez:

The thing is that since all of this is explicitly represented and interpretable, others can see it and verify the rating, and the confidence you have in that rating, based on the provenance that comes along with it. How does this work? What is the process, or pipeline, that we are proposing here? Well, the first thing we do when we want to verify the claims made in a document is divide the document into sub-documents that can be processed individually. For a tweet, that means the main sentences in the tweet and the links to web pages; for a web page, it is the sentences published on it. Then, after the decomposition, what we do is link these candidate claims to similar items in our database, which contains many other fact-checked claims that have been collected over time.
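
As a rough illustration of the decomposition step, the sketch below splits a document into candidate claim items with a naive sentence splitter; acred's actual sentence extraction is more careful, so treat this purely as a stand-in.

```python
import re
from typing import List, Optional

def decompose(document: str, links: Optional[List[str]] = None) -> List[str]:
    """Split a document (tweet text or page text) into individually reviewable items.
    A naive regex sentence splitter is used purely for illustration."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # In the real pipeline, linked web pages would be fetched and decomposed too;
    # here we simply append the raw URLs to keep the sketch self-contained.
    return sentences + (links or [])

print(decompose(
    "Kids in cages. This is what happens when a government believes people are illegal.",
    links=["https://example.org/article"],
))
```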

Jose Manuel Gomez-Perez:

In doing this kind of comparison, there are two main tasks involved. One of them is semantic textual similarity, where we compare our candidate claim with the already fact-checked claims. The other is stance detection, which helps you see whether two claims are in agreement or disagreement: one supports the other, refutes it, or has nothing to do with it. The idea is that we look up evidence, which consists of credibility reviews for the items in our database. This can be human ratings from fact-checkers, like I said before, claims that have already been verified by fact-checkers, or signals based on the reputation of the websites we obtain the ground truth from. Finally, we aggregate all these results for the different sentences into which we divided the document and apply a number of heuristics to produce a unified score.
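
Here is a hedged sketch of how similarity, stance and fact-check ratings for one sentence might be combined into a single score. The aggregation heuristic (keep the best-supported match, flip the rating on disagreement) is a simplification for illustration, not the exact set of heuristics used in acred.

```python
from typing import List, Tuple

def review_sentence(candidate: str,
                    matches: List[Tuple[str, float, str, float]]) -> Tuple[float, float, str]:
    """Aggregate evidence for one candidate sentence.
    `matches` holds (fact_checked_claim, similarity, stance, fact_check_rating) tuples,
    with ratings in [-1.0, 1.0]. Returns (rating, confidence, provenance)."""
    best_rating, best_conf, provenance = 0.0, 0.0, "no evidence found"
    for claim, similarity, stance, fc_rating in matches:
        # Agreeing with a reviewed claim inherits its rating; disagreeing flips it.
        if stance == "agree":
            rating = fc_rating
        elif stance == "disagree":
            rating = -fc_rating
        else:
            continue  # unrelated evidence contributes nothing
        confidence = similarity  # trust the evidence only as much as the match quality
        if confidence > best_conf:
            best_rating, best_conf = rating, confidence
            provenance = f"matched fact-checked claim '{claim}' (stance: {stance})"
    return best_rating, best_conf, provenance

print(review_sentence(
    "U.S. Representatives agree to illicit UN gun-control.",
    [("The UN is not confiscating firearms in the United States.", 0.82, "disagree", 0.9)],
))
```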

Jose Manuel Gomez-Perez:

In this work, we provide an extension of Schema.org, a vocabulary developed together with a W3C community group so that information can be more easily found online, and one that is extensively used by search engines. What this extension does is extend the Review class with the classes that make up our main model, the credibility review. For example, we include the notion of confidence in the Rating class, we add sentence stance and sentence similarity as part of the review, alongside the credibility review itself, we model a sentence as a CreativeWork, and we add the concept of a bot, or agent, as the originator of the review.
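
To give a feel for what such markup might look like, here is a hand-written JSON-LD-style dictionary in Python. The core types (Review, Rating, CreativeWork, ClaimReview) are real Schema.org types, but the CredibilityReview subtype and the confidence property are approximations of the extension described here, not the published vocabulary.

```python
import json

# Illustrative JSON-LD-style markup; property names outside core Schema.org are assumptions.
credibility_review = {
    "@context": "https://schema.org",
    "@type": "CredibilityReview",        # assumed subtype of schema.org Review
    "itemReviewed": {
        "@type": "CreativeWork",         # sentences are modeled as creative works
        "text": "U.S. Representatives agree to illicit UN gun-control.",
    },
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": -0.8,             # credibility rating
        "confidence": 0.7,               # extension: confidence attached to the rating
    },
    "author": {
        "@type": "SoftwareApplication",  # a bot/agent as the originator of the review
        "name": "acred",
    },
    "isBasedOn": [                       # provenance: the evidence the review relies on
        {"@type": "ClaimReview", "author": {"@type": "Organization", "name": "PolitiFact"}},
    ],
}

print(json.dumps(credibility_review, indent=2))
```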

Jose Manuel Gomez-Perez:

The implementation of this system is called acred, and here we have its main building blocks. We have the part where we look up the ground credibility signals; for this, we have a database with several thousand claims, with claim reviews provided by fact-checking agencies like PolitiFact, Snopes or FactCheckNI. Then we have another part of the ground truth, which is extracted from news sites from reputable sources, and we obtain these sites from services like MisinfoMe, Web of Trust or NewsGuard. For the sentence linking part, as I said before, we have two main blocks, sentence similarity and stance detection, where we apply deep learning models based on transformer language models, fine-tuned for these particular tasks, semantic textual similarity and stance detection, respectively. And then we have utilities for sentence splitting, sentence extraction and site scraping. The code is publicly available on GitHub for those who want to look at it; you’re welcome to do so.
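
As a sketch of the sentence-similarity block, the snippet below scores a candidate claim against two fact-checked claims with an off-the-shelf sentence-transformers model. acred fine-tunes its own models for semantic textual similarity and stance detection; the model name here is just a convenient public checkpoint, not the one used in the paper.

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic public model, not acred's

candidate = "U.S. Representatives agree to illicit UN gun-control."
fact_checked = [
    "The UN is not confiscating firearms in the United States.",
    "Violators of quarantine in the Philippines were trapped in wooden fetters.",
]

# Encode and compare with cosine similarity; higher scores mean closer matches.
emb_candidate = model.encode(candidate, convert_to_tensor=True)
emb_facts = model.encode(fact_checked, convert_to_tensor=True)
scores = util.cos_sim(emb_candidate, emb_facts)[0]

for claim, score in zip(fact_checked, scores):
    print(f"{float(score):.2f}  {claim}")
```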

Jose Manuel Gomez-Perez:

Okay, by doing all this, what we obtain are results like these: credibility reviews that are evidence-based, because they rest on claims and evidence provided by reputable sources, and that are also explainable. For example, for the claim made in this tweet, “U.S. Representatives agree to illicit UN gun-control,” our system says that it is not credible, and it produces a textual explanation of why: its least credible sentence, “Many think they are hoping that the United States will help it track unlikely confiscated firearms in the country,” agrees with another sentence that is not credible according to a fact-check provided by PolitiFact, and this information is retrieved from the database by means of semantic similarity and stance detection.

Jose Manuel Gomez-Perez:

And not only are we able to provide these kinds of textual explanations, based on different templates that we created, but we can also provide an evidence graph that represents all the ground credibility signals explicitly, in a way that lets you follow where the information used for the prediction comes from. This gives you a lot of traceability for the result.
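
A tiny sketch of what such an evidence graph could look like, using networkx; the node names and relation labels are illustrative, not the actual vocabulary produced by acred.

```python
# Requires: pip install networkx
import networkx as nx

g = nx.DiGraph()
claim = "U.S. Representatives agree to illicit UN gun-control."
g.add_edge("tweet", f"sentence: {claim}", relation="decomposedInto")
g.add_edge(f"sentence: {claim}", "fact-checked claim (PolitiFact)", relation="agreesWith")
g.add_edge("fact-checked claim (PolitiFact)", "ClaimReview: rated not credible",
           relation="basedOn")

# Walk the edges to trace where the prediction's evidence comes from.
for source, target, data in g.edges(data=True):
    print(f"{source} --{data['relation']}--> {target}")
```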

Jose Manuel Gomez-Perez:

Okay. We produced a number of dashboards to help visualize all this, for example on COVID-19 topics, and this is also available through the link I provided here, associated with the Co-Inform project, which was funded by the European Union. And this is the evaluation. What we did in the evaluation was take a number of standard datasets and benchmarks and, just to summarize, because we don’t have time to go into details, we obtained state-of-the-art or above state-of-the-art results for these tasks on the CheckThat! factuality challenge and FakeNewsNet.

Jose Manuel Gomez-Perez:

Then, for Co-Inform 250, a dataset that we developed in the Co-Inform project, we saw that the results were not as good as on the other datasets. This has to do with the fact that Co-Inform 250 is a real-life dataset: it’s very hard, it has five classes, and by being real-life it surfaces many of the weaknesses of this type of approach. So, where is the problem? We did a manual error analysis on the FakeNewsNet dataset and found different types of errors. For example, there were fake news items predicted as highly credible, which doesn’t make sense. The problem is that for many of these claims, the evidence we found didn’t have a claim review associated with it, so we had to rely on website reputation, which is not optimal.

Jose Manuel Gomez-Perez:

In other cases, fake news was predicted with low confidence. This was due to the fact that many of the datasets used to train the models lack a lot of valid content because of constraints, for example related to the GDPR, which protects privacy, or advertising limitations that blocked the crawlers used by the people producing the dataset. But most importantly, the main problem came from mistakes related to semantic textual similarity: sentences that were considered similar really were similar to each other, talking about closely related topics, but they referred to different events or different entities. For example, this happened in Paris, and the claim I’m comparing it to talks about something that happened in New York. They may be textually, or even semantically, similar, but they’re speaking of different things.

Jose Manuel Gomez-Perez:

Okay, and this induces a lot of errors. Here is an example. If we want to verify the claim “this is what happens when a government believes people are illegal, kids in cages,” a semantic textual similarity system can return this other, already fact-checked claim, which is rated as very similar, 0.84: that violators of quarantine in the Philippines were trapped in wooden fetters. But of course, even if they are semantically similar and the text is similar in that sense, they’re talking about completely different events. Both were fact-checked by PolitiFact.org. So this is the problem we have. We did a deeper inspection of this and found many problems related to, for example, similar topic but different statements, facts or entities: the claims were talking about the same topic but making different statements, or about completely different facts.

Jose Manuel Gomez-Perez:

So, on balance, acred is a great system that validates the concept and the architecture. It is evidence-based and it provides ways to produce explanations as text and also as a graph of evidence. However, it’s not production-ready yet, for the reasons I just explained. We also found that there are many problems with the datasets used to train these kinds of models. At this point, they’re derived from fact-checking articles, they contain errors, and they cannot be reproduced, which makes it difficult to measure real accuracy. So better training data needs to be put in place, and we proposed a number of approaches for doing that in a paper we published at SEMIFORM a couple of years ago.

Jose Manuel Gomez-Perez:

So, some reflections about this. The first is that semantic textual similarity on its own is insufficient and can be a weak point for the entire pipeline, so we need to work on ways to extend the notion of semantic textual similarity so that it also takes into account information about events, entities and these kinds of things, as the sketch below illustrates. In some of the error cases, the cause was that we defaulted to website reputation, often because the ground truth is too small. So the question here is: how can we help fact-checkers amplify their ability to produce more of this ground truth, beyond an evidence-based approach like this one? And then also, how can we extract information not only from the evidence but also from the text itself, through linguistic signals that can give us a hint of whether or not it is misinforming? Finally, and this is something we’re not going to talk about today, we tend to study claims in an isolated way, and we miss elements of discourse at a higher level of abstraction, like narratives.
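
One possible way to act on that first reflection, sketched here under assumptions of my own rather than as part of acred, is to down-weight a similarity match when the two claims mention disjoint named entities, so that a Paris claim and a New York claim stop pairing up.

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_aware_similarity(claim_a: str, claim_b: str, sts_score: float) -> float:
    """Scale an externally computed STS score by the overlap of named entities.
    The 0.5 floor and Jaccard weighting are illustrative choices."""
    ents_a = {ent.text.lower() for ent in nlp(claim_a).ents}
    ents_b = {ent.text.lower() for ent in nlp(claim_b).ents}
    if not ents_a and not ents_b:
        return sts_score  # nothing to compare, fall back to plain STS
    union = ents_a | ents_b
    overlap = len(ents_a & ents_b) / len(union)
    return sts_score * (0.5 + 0.5 * overlap)

print(entity_aware_similarity(
    "Violators of quarantine in the Philippines were trapped in wooden fetters.",
    "This is what happens when a government believes people are illegal: kids in cages.",
    sts_score=0.84,
))
```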

Jose Manuel Gomez-Perez:

And this is something that matters a lot: narratives are very important, for example in the study of Islamic radicalization, or in the narratives that help spread misinformation about COVID-19, these kinds of things. So narrative is a very important topic that maybe we can touch on some other day. Thinking about how studying the language directly can help us detect misinformation, we did a study within the Co-Inform project based on the Undeutsch hypothesis, which says that the style of misinforming writing potentially differs from that of real news, and we found that it does. Misinforming language is very sensational, because it tries to catch people’s attention. It’s also colloquial, so that everybody can understand it. And it also discredits others a lot, in order to discourage people from checking the evidence. As part of misinforming language, deception as a goal is a key dimension that we need to look at.

Jose Manuel Gomez-Perez:

There are many studies out there that have characterized deception in misinforming language. For example, truthful text is more likely to focus on facts and contain optimistic words, while deceptive text is very assertive; there are psycholinguistic cues, a lot of subjectivity, and so on and so forth. It’s possible, and the scientific literature shows it, to identify lexical signals of deception in the language. In one study, for example, the authors showed that truthful news presents a much lower ratio of misinforming lexical elements than other types of news, like hoaxes, propaganda or satire. They saw that lexical markers like swearing, second-person pronouns, modal adverbs, action adverbs, these kinds of things, occur much more frequently in deceptive media than they do in truthful news.
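
As a toy illustration of counting such lexical markers in a piece of text: the marker lists below are tiny examples chosen for the sketch, not the lexicons used in the studies mentioned in the talk.

```python
import re

# Tiny illustrative marker lists; real lexicons are far larger.
MARKERS = {
    "second_person": {"you", "your", "yours"},
    "swearing": {"damn", "hell"},
    "absolutes_superlatives": {"always", "never", "worst", "best"},
}

def marker_ratios(text: str) -> dict:
    """Return, per category, the ratio of marker tokens to total tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = max(len(tokens), 1)
    return {name: sum(tok in vocab for tok in tokens) / total
            for name, vocab in MARKERS.items()}

print(marker_ratios("You will NEVER believe the worst thing they are hiding from you!"))
```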

Jose Manuel Gomez-Perez:

These kinds of conclusions are also incorporated in the Expert.ai platform. I would invite everyone to go to the URL shown here and try the linguistic analysis provided there, particularly for misinformation detection. You will find a lot of interesting features extracted from the text that look at these lexical markers, and also at the style with which people write a piece of text, which can help determine automatically whether something is deceptive or not. So, just to summarize acred: what we had is a system that was able to take a document, a tweet or a web page or whatever, and the first thing it did was analyze the check-worthiness of the claims it contained, extracting the sentences that were relevant to analyze. Then there was a semantic textual similarity task that compared these with the evidence from ground credibility signals, which allows us to analyze the stance, seeing whether or not this evidence supports the claim we are analyzing.

Jose Manuel Gomez-Perez:

And then sometimes what happens is that a claim can come out as credible, not credible or not verifiable, which is the case I want to highlight here. We can use these deceptiveness signals, these linguistic signals of deception, to give fact-checkers a hint that a claim which is not verifiable under an evidence-based scheme should be inspected by them. This way we can help them increase the ground truth that is necessary to support all these evidence-based scenarios.
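
Here is a minimal sketch of that triage idea, with score ranges and thresholds that are purely assumptions: claims that cannot be matched to evidence but that score high on linguistic deception signals get routed to human fact-checkers.

```python
def triage(evidence_confidence: float, deception_score: float,
           verifiable_threshold: float = 0.5, deception_threshold: float = 0.3) -> str:
    """Route a claim based on evidence confidence and a deception-style score (both 0.0 to 1.0).
    Thresholds are illustrative assumptions."""
    if evidence_confidence >= verifiable_threshold:
        return "verifiable: produce an evidence-based credibility review"
    if deception_score >= deception_threshold:
        return "not verifiable but deceptive-sounding: flag for human fact-checkers"
    return "not verifiable and no deception signals: leave unflagged"

print(triage(evidence_confidence=0.1, deception_score=0.45))
```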

Jose Manuel Gomez-Perez:

Okay. So, to conclude, some final remarks. To the social media platforms: please, instead of deleting, provide evidence, because this is the only way to create media literacy and train users and consumers of social media to recognize what is misinformation and what is not. Neural + symbolic AI can help by identifying and explicitly structuring fact-checking evidence in the form of text or graphs, and by extracting linguistic signals of deceptiveness, and it can help expand the ground truth necessary to automatically verify claims. Finally, accuracy is still, and probably always will be, an issue, but there is some light at the end of the tunnel, as our analysis showed. Some of the things we showed there can be addressed in the relatively short term; others will need further research and innovation work, but at least we’re working on identifying them.

Jose Manuel Gomez-Perez:

And that’s it. Thank you very much for listening.

Brian Munz:

Great, great. Thanks. That was super interesting. One thing that occurred to me while we were talking about this, in terms of just analyzing the text: it seems like it would be difficult to address articles that are about conspiracy theories. So if it was something like some of the COVID things or [inaudible 00:26:38] or whatever it might be, as they come up, and it seems like there are new ones each day, more and more gets written about them, and the articles may be accurate, but what they’re talking about is a very bad bit of misinformation.

Jose Manuel Gomez-Perez:

Yeah. There are different ways to do this. It’s very difficult to start with, and you can try to do it by just looking at the text, [inaudible 00:27:00] these linguistic signals that we were talking about during the talk, and this can give you an idea. The problem is that the style itself is not enough to determine whether or not something is misinforming. To be factual, you need evidence, which comes from some reputable source. And here again we come to the problem of what is reputable and what can be trusted. Something we found is that in some cases you can’t even trust scientific papers to see if a particular claim is true or not, because science is refuted continuously, and there is also a lot of controversy around that. So yeah, it is difficult to do.

Brian Munz:

Yeah. Yeah, exactly. It seems like there has to be a weighting between the source as well as the language. And at some point you’re looking for misinformation through the language, less through the facts, right? Like you highlighted in there, people using absolutes, like “always” and “never” and things like that, is a good indication, even above facts, I would guess.

Jose Manuel Gomez-Perez:

Yes, definitely. What we’ve seen is that this is a multi-faceted problem, so in the end you need to look at many different things in order to be able to make some kind of prediction. And sometimes it is not so important to make the right prediction as it is to ground whatever prediction you make on specific evidence, so that people can assess whether or not it is correct. At least there is some ground for discussion.

Brian Munz:

Right. Yeah, it’s not always a thumbs-up or thumbs-down, it’s a confidence level. So… Which is the case with a lot of NLP problems.

Jose Manuel Gomez-Perez:

Yeah. For example, if you train a deep learning model on these kinds of things, then if something related to Obama or Trump comes up in the claim you want to check, it may always carry some bias in one direction or the other, just because the corpus the model was trained on is also biased. For example, all the tweets, all the appearances of Donald Trump in the press, tend to take a stance in a very clear direction.

Brian Munz:

Yeah. Yeah, no, it’s a very interesting and very complex subject, but very important, and it seems like it gets more important every day. So thanks for this presentation, it was super interesting, and hopefully we’ll see you again someday with another one of your great topics. So yeah, that’s it for this week. Thanks, everyone, for joining. We’re here every Thursday. Next week we’re talking about what Hybrid NL is, which was touched on a little bit today. Hybrid NL is a mix of machine learning and symbolic approaches; we’re going to get into that and show some real-world stuff. But until next week, this has been Brian Munz, and I’ll talk to you next week. Thanks.

 
