An app will automatically mark incongruities in a text, along with other signs that warn us we may be reading false news
Miguel Ángel García Cumbreras, University of Jaén, and Estela Saquete Boro, University of Alicante
Fake news was defined by The New York Times as a “story fabricated with the intent to deceive, often for monetary gain as a motive.”
Its main objective is to manipulate public opinion in order to influence the socio-political behavior or belief systems of the masses, and it is usually generated by ideological or economic interests. Detecting it is increasingly a social priority, and artificial intelligence is the only tool that can contain the invasion of online hoaxes.
In the Living-Lang project, we at the research groups GPLSI (University of Alicante) and SINAI (University of Jaén) work on the automatic detection of false news. We are developing an artificial intelligence system that will automatically mark, in a text as it is being read, incongruities and other signals warning that the information is not reliable. We have tested the detection system on news about covid-19.
Defending ourselves in the post-truth era
Fake news feeds the “post-truth” era in which we live. Post-truth, chosen as the 2016 word of the year by Oxford Dictionaries, refers to a phenomenon of distortion in which objective facts influence the formation of public opinion less than appeals to emotion and personal beliefs.
Today the term has a much broader application in the news generation process, where “alternative facts” replace real facts, and feelings outweigh evidence.
The proliferation of fake news has been facilitated by the growth of personal blogs and of social media such as Twitter, Facebook or WhatsApp. Anyone can be a transmitter of information, and fact-checking is less of a priority than sharing news that may go viral.
Currently, information is mostly consumed online. Researchers at MIT have conducted a study that demonstrates the disturbing power of fake news, which spreads much further, faster and more widely than real news.
An additional problem is that fake news is structured and worded in such a way that it is difficult to distinguish between what is true and what is false. Detecting and tackling fake news quickly and effectively is crucial, as once false information spreads and permeates society, it is difficult to disprove.
The problem of false information is aggravated in times of emergency, such as the global covid-19 pandemic we are living through. According to the IFCN (International Fact-Checking Network), more than 6,000 hoaxes that spread around the world were verified during the pandemic.
The number of hoaxes going viral has reached such a level that automatic techniques are needed to detect false news before it is widely disseminated.
Detecting unreliable information
Artificial intelligence techniques in general, and natural language processing in particular, play a special role in improving and accelerating the detection process. Technologies such as machine learning and deep learning make it possible to detect characteristics of a piece of information that make it unreliable, and to do so across millions of data items.
Fact-checking technologies work in different ways. There are reference approaches, which look up a fact in a reference source; machine learning approaches, which try to learn signals of the probability of truth; and contextual approaches, which estimate the probability of veracity as a function of how long stories survive. Ideally, these three types should be combined.
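One simple way to picture such a combination is a weighted average of the three scores. The sketch below is only an illustration under that assumption: the function name, the equal default weights and the idea that each approach outputs a score between 0 and 1 are hypothetical and are not taken from the project.

```python
def combine_veracity_scores(reference, learned, contextual,
                            weights=(1.0, 1.0, 1.0)):
    """Weighted average of three veracity scores, each assumed to be in [0, 1].

    reference  -- score from looking the fact up in a reference source
    learned    -- score from a machine learning model
    contextual -- score derived from how long the story has survived
    """
    scores = (reference, learned, contextual)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Hypothetical example: the three approaches disagree about a story.
combined = combine_veracity_scores(1.0, 0.5, 0.0)
```

A real system would probably learn the weights from annotated data rather than fix them by hand.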
Due to the complexity of detecting a hoax, the task is not approached as a whole, but as small related subtasks that should end up being integrated into a single global detection system.
Errors in the structure and content
We have designed a system that checks the news on two levels: its structure and its content. To analyze the structure, we check whether the text follows two classic rules of journalism: the 5W1H rule and the inverted pyramid (a way of structuring a text used in journalism).
The 5W1H rule (five Ws and one H) states that any journalistic text has to answer six questions: What, Where, When, How, Who and Why. It is an effective method that has been adopted across the different media.
The inverted pyramid, in turn, consists of ranking the information by importance and putting the most relevant in the first paragraph. The artificial intelligence system detects whether the text it parses follows this rule; if it does not, the information it contains may not be reliable.
Regarding the content, we split a news item into its parts (title, subtitle, etc.) and use a fact-checking system to verify the stated facts against knowledge bases. We also automatically extract various linguistic features.
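As a rough illustration of these two structural checks, the sketch below assumes the text has already been annotated with 5W1H labels and the paragraph in which each one appears. The data format, the function name and the "lead coverage" measure are hypothetical simplifications, not the system's actual design.

```python
FIVE_W_ONE_H = ("WHAT", "WHERE", "WHEN", "HOW", "WHO", "WHY")


def structure_report(annotations):
    """Summarize 5W1H coverage for one news item.

    annotations -- list of (label, paragraph_index) pairs, where
                   paragraph_index 0 is the first paragraph.
    """
    found = {label for label, _ in annotations if label in FIVE_W_ONE_H}
    in_lead = {label for label, index in annotations
               if label in FIVE_W_ONE_H and index == 0}
    return {
        # Questions the text never answers (5W1H rule).
        "missing": [label for label in FIVE_W_ONE_H if label not in found],
        # Share of the six questions answered in the first paragraph
        # (a crude proxy for the inverted pyramid).
        "lead_coverage": len(in_lead) / len(FIVE_W_ONE_H),
    }


# Hypothetical annotation of a short news item.
report = structure_report([
    ("WHAT", 0), ("WHO", 0), ("WHEN", 0), ("WHERE", 1), ("WHY", 2),
])
```

A text with several missing questions, or with most answers buried in later paragraphs, would be one signal among others that the item deserves closer scrutiny.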
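The content check can be pictured with a toy example. In this sketch the `NewsItem` class, the tiny in-memory knowledge base and the exact-match lookup are all hypothetical; a real fact-checking system would extract individual claims from each part and match them against much larger knowledge bases.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    title: str
    subtitle: str
    body: str


# Toy knowledge base (hypothetical): normalized statement -> is it true?
KNOWLEDGE_BASE = {
    "covid-19 is caused by a virus": True,
    "covid-19 is caused by 5g": False,
}


def check_claim(claim, knowledge_base=KNOWLEDGE_BASE):
    """Look up one claim; return 'supported', 'refuted' or 'unknown'."""
    # Normalize case, whitespace and a trailing full stop before the lookup.
    key = " ".join(claim.lower().split()).rstrip(".")
    if key not in knowledge_base:
        return "unknown"
    return "supported" if knowledge_base[key] else "refuted"


def check_parts(item, knowledge_base=KNOWLEDGE_BASE):
    """Simplification: treat each part of the item as a single claim."""
    return {
        part: check_claim(getattr(item, part), knowledge_base)
        for part in ("title", "subtitle", "body")
    }
```

Most statements will come back as "unknown" in such a lookup, which is one reason why reference approaches are usually combined with other signals.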
How have we tested the system?
To test the effectiveness of our system, we have built a dataset of covid-19 news that contains both real and false news. The following is an example of a published news item that is false:
“Covid-19 is not a virus, it is an exosome. It is pollution that weakens the immune system, and as a consequence, people die from various causes, including seasonal flu, and all deaths are labeled as coronavirus. Is a gotcha. And it will get worse, when 5G is fully deployed on Earth and in space, billions of people will die and another pandemic will be blamed. It is not a virus, it is an electromagnetic weapon.”
In our research we have manually annotated a set of news items in terms of structure, content and veracity.
Using machine learning and deep learning algorithms, and with a relatively small dataset as input because of how complex the annotation is, we have obtained very promising results: 75% accuracy in predicting the veracity of a news item from plain text taken from the internet. The research has recently been published in a high-impact international journal.
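Assuming accuracy is meant in the usual classification sense, it is the share of news items whose predicted label matches the manually assigned one. The short sketch below shows the calculation; the labels in the example are made up for illustration and do not come from the dataset.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the annotated label."""
    if len(predicted) != len(actual):
        raise ValueError("both lists must have the same length")
    if not actual:
        raise ValueError("at least one item is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up example: the system gets three of four items right.
predicted_labels = ["fake", "real", "fake", "real"]
annotated_labels = ["fake", "real", "real", "real"]
score = accuracy(predicted_labels, annotated_labels)
```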
The following image shows an example of the labeling that is carried out on any paragraph:
Given these good results, the next step is to develop a final application that automatically marks the text of a news item while it is being read, alerts the reader to parts that may be false, and points to similar texts against which their veracity can be checked.
In this way, both an end user and a journalist could use this information to make an informed decision about the veracity of the news.
Miguel Ángel García Cumbreras, Doctor in Computer Engineering, Deputy Director of the Higher Polytechnic School of Jaén, University of Jaén, and Estela Saquete Boro, University of Computer Languages and Systems, University of Alicante
This article was originally published in The Conversation.