This dissertation, written as part of the MSc in Data Science at the International
Hellenic University, focuses on Twitter sentiment analysis of COVID-19 vaccines
in the English and Greek languages.
The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, originated in China
in December 2019 [1]. According to the World Health Organization (WHO), the virus has
infected and killed thousands of people, and the WHO has declared the COVID-19 outbreak
a pandemic [2]. A worldwide vaccination campaign can bring this pandemic to an end.
However, vaccines have traditionally been met with public fear
and hesitancy. During the lockdowns imposed in many countries, people spent hours every
day on social media platforms sharing their opinions and expressing their feelings. As a
result, Twitter has become a valuable resource for gathering information about
people’s emotions towards SARS-CoV-2 vaccination. Extracting useful knowledge from
naturally written texts helps governments and health experts understand
people’s beliefs and design effective campaigns to increase vaccination
acceptance. Therefore, sentiment analysis, the process of classifying opinions towards
vaccines as “positive”, “negative” or “neutral”, can yield remarkable findings.
To be more precise, the goal of this study is to classify people as being in favor of or
against vaccination, and to identify people’s preferences among the three vaccines (Pfizer,
Moderna, AstraZeneca) that are available today. This task can be automated with
Machine Learning (ML) and Natural Language Processing (NLP). Twitter
data were retrieved in portions at different points in time over a period of seven
months using the Python programming language. After data preprocessing, the sentiment
analysis was conducted using TextBlob, Valence Aware Dictionary and sEntiment
Reasoner (VADER), AFINN and NRC tools. The results were then visualized, and the
classification performance of state-of-the-art models (Logistic Regression, Decision Tree,
Random Forest, XGBoost, and SVM Classifier) was evaluated on the tweets.
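To illustrate the lexicon-based labeling step, the following is a minimal sketch and not the code used in this study. The lexicon `TOY_LEXICON`, the helper names `polarity` and `label`, and the example tweets are all hypothetical; tools such as TextBlob, VADER and AFINN compute their own scores from much larger lexicons. The threshold of 0.05 follows the convention commonly applied to VADER's compound score and is an assumption here.

```python
# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {
    "safe": 1.0, "effective": 1.0, "grateful": 0.8,
    "afraid": -0.8, "dangerous": -1.0, "refuse": -0.6,
}


def polarity(text):
    """Average the lexicon scores of the known words; result lies in [-1, 1]."""
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.05):
    """Map a polarity score to a sentiment class (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical example tweets.
tweets = [
    "The vaccine is safe and effective!",
    "I am afraid, this looks dangerous.",
    "Second dose scheduled for Monday.",
]
labels = [label(polarity(t)) for t in tweets]
```

Labels produced this way can then serve as the target classes on which the ML classifiers are trained and evaluated.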
Our results indicate that, for the English ‘summer’ tweets labeled with TextBlob,
Decision Tree (DT) is the ML algorithm with the highest accuracy (97.99%) and
F1-Score (97.98%). In the autumn period, DT again demonstrates the best performance,
with an accuracy of 97.94%, a slight decrease of 0.05 percentage points. For the
Greek-language dataset, the algorithms distinguish positive, negative and neutral
tweets even better. DT was again the best performer, with 99.89% accuracy and
99.88% F1-Score. In the autumn period, the accuracy of DT improved by 0.03
percentage points, reaching 99.92%.
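For reference, the two reported metrics can be computed as in the sketch below. This is an illustration only: the abstract does not state how the F1-Score was averaged over the three classes, so macro-averaging is an assumption, and the function names and example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for four tweets.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Because macro-averaging weights every class equally, a class that is rarely predicted correctly lowers the F1-Score more than it lowers accuracy.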