dc.contributor.author
Valk, Tom
en
dc.date.accessioned
2020-05-20T10:28:00Z
dc.date.available
2020-05-21T00:00:33Z
dc.date.issued
2020-05-20
dc.identifier.uri
https://repository.ihu.edu.gr//xmlui/handle/11544/29437
dc.rights
Default License
dc.subject
Artificial Intelligence
en
dc.subject
Radio Productions
en
dc.subject
Pattern Classification
en
dc.subject
Pattern Annotation
en
dc.title
Pattern Annotation and Classification in Broadcast Content of Radio Productions
en
heal.type
masterThesis
en_US
heal.generalDescription
This study focuses on recognizing speech and non-speech patterns, using radio productions as input, and on optimizing the extraction of numerical features, the algorithms, and the methods in order to improve the accuracy of distinguishing between the different categories and labels.
en
heal.language
en
en_US
heal.access
free
en_US
heal.license
http://creativecommons.org/licenses/by-nc/4.0
en_US
heal.recordProvider
School of Science and Technology, MSc in Data Science
en_US
heal.publicationDate
2019-12-02
heal.abstract
In the current digital century, there are plenty of radio stations to choose from. However, the choice is usually based only on music genre, and listeners have to find out for themselves whether the program, schedule, and amount of talking suit their demands. The amount of music and talking on a radio station can be compared manually by listening, but it can also be automated with machine learning. This study focuses on recognizing speech and non-speech patterns, using radio productions as input, and on optimizing the extraction of numerical features, the algorithms, and the methods in order to improve the accuracy of distinguishing between the different categories and labels. To this end, the study combines knowledge from earlier research with recently introduced technologies and ideas, and experiments with a multi-layer classical machine learning setup. Numerical features are extracted from the audio input using existing research and technologies from the digital signal processing and audio processing fields, in combination with optimized parameters. Based on the literature review, the experimental setup extracts a set of features from the audio tracks, which are manually labeled to create ground-truth data. The experiments cover three algorithms and compare not only the algorithms but also the extraction methods, by tuning the hop and window sizes. Furthermore, two algorithms in the multi-layer setup are parameter-tuned using grid search to obtain an optimal setup specialized for the numerical data. The results indicate that the choice of hop and window size in the numerical extraction is one of the most critical parameters. Furthermore, the results indicate that both MLP and XGBoost perform very well and show similar results with negligible differences.
Further research and experiments are needed to optimize and increase the performance of the models, for example by focusing on silence periods and reducing the impact of background noise on performance.
en
heal.advisorName
Kotsakis, Rigas
el
heal.committeeMemberName
Kotsakis, Rigas
en
heal.committeeMemberName
Berberidis, Christos
en
heal.committeeMemberName
Baltatzis, Dimitrios
en
heal.academicPublisher
IHU
en
heal.academicPublisherID
ihu
en_US
heal.license.source-code
http://www.gnu.org/licenses/gpl-3.0.html
en_US

