Aberham Tadesse Zemedkun, Rift Valley University, School of Post Graduate Studies, Department of Computer Science, P.O. Box 80734, Addis Ababa, Ethiopia



Diabetes, Data Mining, ML, J48, PART, JRIP, Naïve Bayes


Diabetes is one of the most common non-communicable diseases in the world. It impairs the body's ability to produce or use the hormone insulin, and complications may occur when it remains unidentified and untreated. These complications contribute significantly to increased morbidity, mortality, and hospital admission rates in both developed and developing countries, which makes early detection important. In this study, patient medical records were reviewed retrospectively, and anthropometric and biochemical information was collected. Four machine learning classification algorithms, Decision Tree (J48), Naïve Bayes, PART rule induction, and JRIP, were applied to this data to predict diabetes. Precision, recall, F-Measure, Receiver Operating Characteristic (ROC) scores, and the confusion matrix were calculated to evaluate each algorithm, and performance was also measured by sensitivity and specificity. All four algorithms achieved high classification accuracy and were generally comparable in distinguishing diabetic from non-diabetic patients. Among the algorithms tested, the Decision Tree classifier (J48) was the best predictor, with the highest classification accuracy of 92.74%.
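The evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch of those formulas, not the tooling used in the study. The `ConfusionMatrix` class, the `classification_metrics` function, and the example counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'diabetic' as the positive class."""
    tp: int  # diabetic patients predicted as diabetic
    fp: int  # non-diabetic patients predicted as diabetic
    fn: int  # diabetic patients predicted as non-diabetic
    tn: int  # non-diabetic patients predicted as non-diabetic


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Compute the standard measures from the four confusion-matrix counts."""
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)       # recall is the same as sensitivity
    specificity = _ratio(cm.tn, cm.tn + cm.fp)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionMatrix(tp=40, fp=10, fn=10, tn=40)
scores = classification_metrics(example)

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and recall are the same quantity, so the sketch reports one value under both names. Specificity is the corresponding rate for the non-diabetic class.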








How to Cite

Zemedkun, A. T. (2021). PREDICTION OF DIABETES SCREENING BY USING DATA MINING ALGORITHMS. International Journal of Engineering Science Technologies, 5(6), 87–101.