This dissertation was written as a part of the MSc in Data Science at the International
Hellenic University. NLP applications often rely on text-to-text transformations, in which a
system receives a natural-language word sequence as input and is expected to generate an
alternative version of that text, also in natural language, as output. In Machine Translation
(MT), the quality of this output can be evaluated in two ways: first, by comparing the MT output
to one or more reference translations using distance-based evaluation metrics; second, by
building machine learning (ML) models, trained on large human-annotated datasets, that
predict the quality of MT output when no reference translations are available. Following the
second approach, the goal of this dissertation is to develop a Quality Estimation (QE) model
that predicts confidence scores for automated English-to-Greek translations. To that end,
several machine learning algorithms are explored and trained on a dataset of 77,720
human-annotated English-to-Greek translation tuples, each consisting of the source, the
target and the edited segment.
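As an illustration of the distance-based idea, the sketch below computes a word-level edit rate between a machine-translated target and its human-edited version. It is a minimal example under stated assumptions, not the method used in the dissertation: the function names `word_edit_distance` and `edit_rate` are hypothetical, tokenization is plain whitespace splitting, and the measure is a simplified edit rate without the block shifts that full TER allows. The abstract does not state how the confidence scores are derived from the tuples.

```python
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 hyp words and first j ref words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from i hyp words to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hyp word
                            curr[j - 1] + 1,      # insert a ref word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Word edits needed to turn mt_output into post_edited, per post-edited word."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        # Assumed convention for an empty reference: 0.0 if both are empty, else 1.0.
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)
```

A lower value means the target needed fewer edits. A QE model, by contrast, would have to predict such a score from the source and target alone, without seeing the edited segment.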