dc.contributor.author
Tompra, Konstantina Vasiliki
en
dc.date.accessioned
2024-06-14T10:48:18Z
dc.date.available
2024-06-14T10:48:18Z
dc.date.issued
2024-06-14
dc.identifier.uri
https://repository.ihu.edu.gr//xmlui/handle/11544/30417
dc.rights
Default License
dc.subject
Cardiovascular diseases
en
dc.subject
Preventive healthcare
en
dc.title
Enhancing preventive healthcare: Identifying high-risk patients for cardiovascular diseases
en
heal.type
masterThesis
en_US
heal.creatorID.email
tomprakwnstantina@gmail.com
heal.classification
Machine Learning
en
heal.dateAvailable
2024-06-08
heal.language
en
en_US
heal.access
free
en_US
heal.license
http://creativecommons.org/licenses/by-nc/4.0
en_US
heal.recordProvider
School of Science and Technology, MSc in Data Science
en_US
heal.publicationDate
2024-01-07
heal.abstract
This dissertation was conducted as part of the MSc in Data Science at the International Hellenic University. The global fight against cardiovascular diseases (CVD) is experiencing a plateau in progress. One of the major causes is that predicting heart disease is an intricate task, extremely difficult even for health practitioners and demanding a great amount of knowledge and experience. As a result, there is a growing demand to integrate machine learning (ML) and data mining within the healthcare system, since harnessing the wealth of available data can yield insights that benefit society. This research addresses a significant gap in the existing literature by thoroughly examining both machine learning models and neural networks for CVD risk prediction based on personal lifestyle factors in a highly imbalanced real-life dataset. We trained multiple classifiers, namely Logistic Regression (LR), Decision Trees (DT), Random Forest (RF), Gradient Boosting (GB), XGBoost (XGB), CatBoost and Artificial Neural Networks (ANN). We used the Behavioral Risk Factor Surveillance System (BRFSS) 2021 Heart Disease Health Indicators dataset and, to tackle the class imbalance challenge, applied the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic (ADASYN) sampling, SMOTE-Tomek, and SMOTE-ENN. Based on the findings, we conclude that hybrid resampling methods such as SMOTE-ENN and SMOTE-Tomek outperformed the alternative sampling techniques in terms of the sensitivity metric. Our proposed implementation couples SMOTE-ENN with CatBoost optimized through Optuna, achieving 88% recall and 82% on the AUC metric. The proposed ANN also exhibited promising results, offering an additional layer of robustness in detecting positive cases of cardiovascular disease.
en
heal.advisorName
Tjortjis, Christos
en
heal.committeeMemberName
Koukaras, Paraskeuas
en
heal.committeeMemberName
Akritidis, Leonidas
en
heal.academicPublisher
IHU
en
heal.academicPublisherID
ihu
en_US


