Comparative Analysis of Machine Learning Algorithms for Chronic Kidney Disease Prediction
Authors
Mst. Nafia Islam Shishir
Abstract
In recent years, chronic kidney disease (CKD) has
been recognized as one of the most significant health problems
globally. The defining feature of CKD is a progressive
deterioration in renal function. Because kidney damage
develops slowly over a long period, early detection and
appropriate treatment can save many lives.
Machine learning classifiers have emerged as a
reliable tool for identifying the disease at an early stage, allowing
intervention and management to begin sooner than with other methods.
In this paper, the performance of ten models is evaluated on the
CKD dataset from the UCI Machine Learning Repository for the
classification of CKD. The training data in this study was
augmented by applying the SMOTE technique and Gaussian
noise. Missing values were handled with KNN imputation for
numerical variables and mode imputation for
categorical variables. Combining the filter,
wrapper and embedded feature selection strategies led to the
identification of the 13 most important features. Extra Tree
Classifier, XGBoost, Gradient Boosting and Random Forest
outperformed the other algorithms, with an accuracy of
99.17%. When compared to the other nine methods, Extra Tree
Classifier performed extremely well in terms of precision, recall
and F1 score. For the proposed approach, the error rate
(0.0083) and training time (0.0787 seconds) were both
comparatively low. This paper illustrates the performance
comparison of ten different machine learning (ML) algorithms
and the importance of feature selection for predicting CKD.
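
Three of the steps named in the abstract can be illustrated with a minimal sketch: mode imputation for a categorical feature, Gaussian-noise augmentation of numeric rows, and the relation between accuracy and error rate. The function names, the toy values, and the noise scale sigma are assumptions made for illustration and do not come from the paper; KNN imputation and SMOTE are omitted because they would normally rely on external libraries.

```python
import random
from collections import Counter


def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def add_gaussian_noise(rows, sigma, rng):
    """Return a noisy copy of each numeric row (zero-mean noise, std sigma)."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


def error_rate(accuracy):
    """Error rate is the complement of accuracy (both as fractions)."""
    return 1.0 - accuracy


# Toy categorical column with one missing entry (illustrative values only).
column = ["normal", None, "abnormal", "normal"]
filled = impute_mode(column)

# Augment two toy numeric rows; sigma=0.01 is an assumed noise scale.
rng = random.Random(0)
augmented = add_gaussian_noise([[1.2, 15.4], [0.9, 13.1]], 0.01, rng)

# The reported accuracy of 99.17% corresponds to the reported error rate.
print(filled)
print(round(error_rate(0.9917), 4))
```

The last line shows that the reported error rate of 0.0083 is simply one minus the reported accuracy of 0.9917.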