This dissertation was written as a part of the MSc in Data Science at the International Hellenic
University.
Social media platforms such as YouTube are among the most popular websites in the world, giving
everyone a voice and the ability to express their thoughts and feelings. Sentiment analysis can
be used to retrieve and measure these thoughts and feelings. This study uses a hybrid approach
that combines lexicon-based and learning-based methodologies to achieve better results. The
comments are first labeled using general-purpose sentiment tools such as TextBlob, VADER, and
Flair. The sentiments of the comments are then classified using five learning models: Logistic
Regression (LR), Multinomial Naive Bayes (Multi.NB), Random Forest (RF), Support Vector Machine
(SVM), and Stochastic Gradient Descent Classifier (SGD Classifier). The algorithms' performance
is evaluated using accuracy, precision, recall, and F1-score. Results with TextBlob labels are
encouraging, with SVM reaching an accuracy of 91%. Finally, topic modeling was applied to extract
information about the content of the comments. Five dominant topics were identified, which
relate to how users feel about the commercials.
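
As an illustration of the four evaluation metrics named above, the following is a minimal sketch in plain Python. It is not the dissertation's code: the function name `evaluate`, the three class names, the toy labels, and the choice of macro-averaging across classes are all assumptions made for this example.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data (invented): lexicon-derived labels versus classifier predictions.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
scores = evaluate(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

In a hybrid setup of the kind described, `y_true` would hold the labels assigned by the sentiment tool and `y_pred` the output of a trained classifier on held-out comments, so the scores measure agreement with the automatic labels rather than with human annotation.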