This dissertation was written as a part of the MSc in Data Science at the International Hellenic University.
The ease with which information spreads and is accessed on the web can become a serious problem when it comes to disinformation. The term “fake news” describes the intentional propagation of false information with the intent to mislead and harm the public, and it has gained more attention since the 2016 U.S. elections. Recent studies have used machine learning techniques to tackle it. This thesis reviews the style-based machine learning approach, which relies on the textual information of news, such as lexical features manually extracted from the text (e.g. part-of-speech counts), and tests the performance of both classic and non-classic (artificial neural network) algorithms. Using information-based metrics, we identified a subset of best-performing linguistic features, which tends to agree with the existing literature. We also combined the Named Entity Recognition (NER) functionality of the spaCy library with the FP-Growth algorithm to gain a deeper perspective on the named entities used in the two classes. Both methods reinforce the claim that fake and real news differ very little in their content, which limits what style-based methods can achieve. The final results showed that a convolutional neural network achieved the best accuracy, outperforming the SVM by almost 2%.