dc.contributor.author
Michalakos, Marios Aristotelis
en
dc.date.accessioned
2015-05-29T18:53:55Z
dc.date.available
2015-09-27T05:56:26Z
dc.date.issued
2015-05-29
dc.identifier.uri
https://repository.ihu.edu.gr//xmlui/handle/11544/121
dc.rights
Default License
dc.title
Content-based Tweets Semantic Clustering and Propagation
en
heal.type
masterThesis
heal.language
en
heal.access
free
el
heal.license
http://creativecommons.org/licenses/by-nc/4.0
heal.recordProvider
School of Science and Technology, MSc in Information & Communication Technology Systems
heal.publicationDate
2014-11
heal.bibliographicCitation
Michalakos Marios Aristotelis, 2014, Content-based Tweets Semantic Clustering and Propagation, Master's Dissertation, International Hellenic University
en
heal.abstract
In this thesis, our goal was to develop a methodology for clustering a set of tweets based on their semantic context. We used probabilistic topic modeling techniques, such as Latent Dirichlet Allocation (LDA), to extract topics from our dataset, and then applied several natural language processing methods to automatically generate semantically meaningful and grammatically correct phrases as candidate labels for the extracted topics, aiming at an objective method for topic labeling. An essential part of our research was developing a scoring function that assigns the most semantically similar, and therefore most relevant, labels to each topic. We then generated the Twitter graph and used community detection algorithms to analyze each community's topics of interest. In this way we were able to record the propagation of certain topics through the graph and to analyze the topics of interest in each community. Visualization layout algorithms were also essential for providing meaningful visualizations of our networks. We created datasets populated through Twitter's API and used open source tools to develop the software implementation of this method, resulting in a fully working prototype. Our research can be a valuable asset for companies conducting market analysis.
en
heal.tableOfContents
1. INTRODUCTION
2. LITERATURE REVIEW
2.1 INTRODUCTION
2.2 SENTIMENT ANALYSIS
2.3 AUTOMATIC TOPIC LABELING
2.4 TREND DETECTION
2.5 KNOWLEDGE BASED TOPIC LABELING
2.6 ENTITY BASED TOPIC DISCOVERY
2.7 AUTOMATIC TOPIC LABELING
2.8 COMMUNITY DETECTION IN SOCIAL NETWORKS
3. PROBLEM DEFINITION
3.1 DATA GATHERING AND PREPARATION
3.2 TOPIC MODELING
3.2.1 Topic Extraction
3.2.2 Topic Labeling
3.3 KNOWLEDGE DISCOVERY
3.3.1 Network Analysis
3.3.2 Community Detection
3.3.3 Community Analysis
3.3.4 Visualization
4. METHODOLOGY
4.1 NATURAL LANGUAGE PROCESSING AND UNDERSTANDING
4.1.1 Data Transformation
4.1.2 Part-of-speech Tagging (POS)
4.1.3 Chunking/Shallow Parsing
4.2 TEXT AND DATA MINING
4.2.1 Topic Modeling
4.2.2 Generative Processes
4.2.3 Latent Dirichlet Allocation (LDA)
4.2.4 Topic Discovery and Classification
4.2.5 Probabilistic Topic Labeling
4.3 SOCIAL NETWORK ANALYSIS
4.3.1 Graph Generation
4.3.2 Community Discovery
4.3.3 Community Labeling
4.3.3.1 Degree
4.3.3.2 Influence
4.3.4 Community Analysis
4.3.5 Network Visualization
5. EXPERIMENTS AND RESULTS
5.1 PHRASE GENERATION
5.2 TOPIC EXTRACTION USING LDA
5.3 TOPIC LABELING USING ZERO-ORDER SCORING FUNCTION
5.4 UNFOLDING COMMUNITIES
5. CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY
APPENDIX
APPENDIX A’ PYTHON SCRIPTS
A.1 ConnectTweepyMongo.py
A.2 ConnectTweepyCSV.py
A.3 ChunkerPhraseGeneration.py
A.4 TopicExtractionLDAphraseRanking.py
A.5 GraphGenerationWithLabels.py
APPENDIX B’
en
heal.advisorName
Bassiliades, Nick
en
heal.committeeMemberName
Berberidis, Christos
en
heal.committeeMemberName
Bassiliades, Nick
en
heal.committeeMemberName
Tzortzis, Christos
en
heal.academicPublisher
School of Science & Technology, Master of Science (MSc) in Information and Communication Systems
en
heal.academicPublisherID
ihu
heal.numberOfPages
85
heal.fullTextAvailability
true

