Social media has significantly changed the way people communicate with one another, and it has become a preferred way of obtaining news compared with traditional media sources such as printed newspapers. This increased adoption stems from the speed at which information spreads over social media and from the low cost of accessing it. One major implication of this increased consumption of information on social media sites such as Twitter, Facebook, YouTube, and Reddit is a rapid rise in the spread of false information on these sites. This is a matter of serious concern, as social media plays a vital role in influencing people in political, economic, and social domains. Therefore, detecting false information on social media is critical.
"False information" is an umbrella term that encompasses several related terms such as rumour, fake news, misinformation, hoax, and satire. These terms are closely related, yet they differ subtly in meaning. For example, a rumour is an unverified piece of information whose main purpose is to spread fear and anxiety, while fake news is untruthful information propagated to gain financial or political benefits.
Much of the research on false information detection has been done only in the area of fake news. This implies that further investigation is still needed in the other categories of false information, such as rumour, misinformation, and hoax, with hoax being the least addressed. A hoax has been defined as follows: A deliberate falsehood intentionally fabricated, especially using the joke, prank, humour or malicious deception to masquerade the truth.
Automatic hoax detection uses machine learning models to classify a piece of information as either a hoax or not a hoax. Automatic hoax detection using just the content itself, i.e., without comments, shares, user responses, and propagation patterns, is akin to an early detection task, as only a handful of features can be obtained during the initial stages of false information propagation. Since false information spreads very rapidly, early detection is needed so that its impact on users can be minimized.
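To make the content-only setting concrete, the following is a minimal sketch of a binary hoax classifier that sees nothing but the text of a post. It assumes a multinomial Naive Bayes model with add-one smoothing, chosen here only for brevity; it is not the model studied in this work. The class name `ContentOnlyNaiveBayes`, the labels `hoax` and `not_hoax`, and the four training posts are hypothetical and invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class ContentOnlyNaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is every word seen in any training post.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total_docs)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy posts, for illustration only (not real data).
train_texts = [
    "share this now or your account will be deleted tonight",
    "forward to ten friends to win a free phone",
    "city council approves budget for road repairs",
    "university publishes exam timetable for spring term",
]
train_labels = ["hoax", "hoax", "not_hoax", "not_hoax"]

model = ContentOnlyNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("forward this to your friends to win a free phone")
```

The classifier never looks at comments, shares, or propagation patterns, so it could in principle be applied the moment a post appears, which is what makes the content-only setting comparable to early detection.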
The majority of studies on false information detection have employed only traditional machine learning models, in which features are extracted manually. This is highly time-consuming, and it is impractical when only a few features are available to work with, as in early detection tasks. Deep learning models, in contrast, can extract both simple and complex features automatically. These models have achieved state-of-the-art performance in several text classification and analysis tasks. The deep learning models most commonly employed for false information detection are Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Convolutional Neural Networks (CNNs), and Multi-Layer Perceptrons (MLPs). These models differ from one another in their architectures, functionalities, underlying use cases, and more.
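As an illustration of what manual feature extraction involves, the sketch below computes a few surface features from the text of a post. The function name `handcrafted_features` and the particular features (punctuation counts, ratio of fully capitalised words, presence of a URL) are assumptions chosen for this example, not the feature set of any specific study. Each feature has to be designed and coded by hand, which is the cost that automatic feature learning avoids.

```python
import re


def handcrafted_features(text):
    """Compute a few surface features of a post from its text alone."""
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    # Words of two or more characters written entirely in capitals.
    n_shouted = sum(1 for w in words if len(w) > 1 and w.isupper())
    return {
        "n_chars": len(text),
        "n_words": n_words,
        "n_exclamations": text.count("!"),
        "n_questions": text.count("?"),
        "shouted_word_ratio": n_shouted / n_words if n_words else 0.0,
        "has_url": int(bool(re.search(r"https?://", text))),
    }


# Invented example post, for illustration only.
features = handcrafted_features("WARNING!!! Share this NOW or lose your account")
```

The resulting dictionary could be turned into a numeric vector and passed to a traditional classifier; a deep learning model would instead take the raw token sequence as input and learn its own representation.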
Generally, the data used for false information detection comes from only two social media sources: Twitter and Sina Weibo. Therefore, to extend the applicability of such studies, it is essential to collect data from other sources as well, such as YouTube, Facebook, and Reddit. Facebook is a major conduit for spreading false information. During the 2016 U.S. presidential election campaign, false information in the form of posts was shared with millions of users on the platform.
Furthermore, much of the research on false information detection was conducted only on English texts, which suggests a further need to complement findings by choosing some