heal.abstract
Analyzing data from microblogging websites to extract sentiment has become a popular approach to market prediction. However, many authors who have analyzed this kind of data stress the noise it contains and how difficult it is to distinguish truly valid information. In this dissertation we collected 782,459 tweets from 2018-11-01 to 2019-07-31. For each day, we built a graph (271 graphs in total) of the users who tweeted and their followers, and used this graph to obtain a PageRank score for each user. This score was then multiplied by the sentiment data. Our results indicate that using an importance-based measure, such as PageRank, can improve the predictive performance of the models, as the PageRank data set achieved, on average, a lower mean squared error than the economic data set and the sentiment data set. Lastly, we tested multiple machine learning models; the results show that XGBoost is the best model, with random forest second best and LSTM the worst.
en