PERFORMANCE ANALYSIS OF CLASSIFICATION ALGORITHM ON DIABETES HEALTHCARE DATASET

Subhankar Manna; Malathi G.

doi:10.29121/granthaalayah.v5.i8.2017.2229

Authors

Subhankar Manna, MCA, VIT University, Chennai Campus, India
Malathi G., Associate Professor, School of Computing Science & Engineering, VIT University, Chennai Campus, India

Keywords:

Classification, Probabilistic Classification, Naïve Bayes Methodology, ID3 Methodology

Abstract

The healthcare industry collects huge amounts of unclassified data every day. Effective diagnosis and decision making require discovering the hidden patterns in these data. One instance of such data concerns diabetes, a group of metabolic diseases whose attributes vary over a wide range. The objective of this paper is to classify a diabetes dataset using techniques such as Naive Bayes and ID3 classification and k-means clustering. The secondary objective is to study the performance of the classification algorithms used in this work. The algorithms are implemented with R packages. This work uses the Diabetes 130-US hospitals for years 1999-2008 Data Set, imported from the UCI Machine Learning Repository.
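The abstract does not state which performance measures were used to compare the algorithms. As one common possibility, and purely as an assumption, the following Python sketch computes accuracy and a confusion matrix from a classifier's predictions on held-out records. The function names and the six labels are hypothetical; they are not the paper's code or data.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("No", "Yes")):
    """Count (actual, predicted) pairs; rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(actual, predicted):
    """Fraction of records whose predicted label equals the true label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels for six held-out patients (not data from the paper).
actual = ["Yes", "No", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "Yes", "No", "No"]
print(accuracy(actual, predicted))          # 0.6666666666666666
print(confusion_matrix(actual, predicted))  # [[2, 1], [1, 2]]
```

The same two functions can be applied to the predictions of any classifier, which makes them a simple basis for comparing Naive Bayes and ID3 on one test split.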

Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It assumes that the attributes are independent of each other given the class. It also provides a useful perspective for understanding many other algorithms. In this paper, when the Bayesian algorithm is applied to the diabetes dataset, it shows high accuracy.
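For reference, the standard Naïve Bayes rule can be written as follows, where $C$ is the class (here “Yes” or “No” for Diabetes) and $x_1, \dots, x_n$ are a patient's attribute values. The product over attributes is exactly where the independence assumption enters; this is the textbook form, not a formula reproduced from the paper.

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}
         {\sum_{c \in \{\text{Yes},\,\text{No}\}} P(c) \prod_{i=1}^{n} P(x_i \mid c)},
\qquad
\hat{C} = \arg\max_{c \in \{\text{Yes},\,\text{No}\}} P(c) \prod_{i=1}^{n} P(x_i \mid c)
```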

In this paper, we construct a decision tree from the diabetes dataset. A decision tree is a tree-like graph model in which each internal node tests an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. The technique separates the observations into branches to construct the tree, splitting it recursively in a process called recursive partitioning. Decision trees are widely used in many areas because they adapt well to different data distributions. For example, the ID3 decision tree algorithm classifies each patient as either having diabetes or not.
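The recursive partitioning described above can be sketched for categorical attributes as follows. ID3 picks, at each node, the attribute with the highest information gain (the reduction in class entropy), creates one branch per value, and recurses until a node is pure or no attributes remain. This is a minimal hypothetical Python sketch, not the paper's R implementation; the function names and the five toy records are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build an ID3 tree: a class label (leaf) or a dict {attr: {value: subtree}}."""
    if len(set(labels)) == 1:      # pure node -> leaf
        return labels[0]
    if not attrs:                  # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree

def classify(tree, row):
    """Follow the branches matching `row` until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Hypothetical toy records (not the paper's data); the label is Diabetes.
rows = [
    {"Insulin": "Yes", "Take_metformin": "Yes"},
    {"Insulin": "Yes", "Take_metformin": "No"},
    {"Insulin": "No", "Take_metformin": "Yes"},
    {"Insulin": "No", "Take_metformin": "No"},
    {"Insulin": "Yes", "Take_metformin": "No"},
]
labels = ["Yes", "Yes", "Yes", "No", "Yes"]
tree = id3(rows, labels, ["Insulin", "Take_metformin"])
print(tree)
# {'Insulin': {'No': {'Take_metformin': {'No': 'No', 'Yes': 'Yes'}}, 'Yes': 'Yes'}}
```

In this sketch `classify` raises a `KeyError` for an attribute value never seen during training; a fuller implementation would fall back to the majority class at that node.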

Method: We use Naïve Bayes for probabilistic classification and ID3 to build a decision tree.

Results: The dataset is a diabetes dataset with 623 rows and 18 columns, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health. The Naive Bayes classifier is used to estimate the probability that a patient has diabetes.
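How such a probability can be estimated from categorical columns is sketched below. This is a minimal hypothetical Python version (the paper used R), assuming plain relative-frequency estimates of $P(C)$ and $P(x_i \mid C)$ with no smoothing; the function names and the seven toy records are illustrative only.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and P(attr = value | class) by relative frequency."""
    class_counts = Counter(labels)
    total = len(labels)
    priors = {c: n / total for c, n in class_counts.items()}
    # counts[c][attr][value] = number of class-c records with attr == value
    counts = {c: defaultdict(Counter) for c in class_counts}
    for row, c in zip(rows, labels):
        for attr, value in row.items():
            counts[c][attr][value] += 1
    likelihoods = {
        c: {attr: {v: n / class_counts[c] for v, n in values.items()}
            for attr, values in counts[c].items()}
        for c in class_counts
    }
    return priors, likelihoods

def posterior(priors, likelihoods, row):
    """P(class | row) under the naive conditional-independence assumption."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in row.items():
            # An (attr, value) pair never seen with class c gets probability 0 here.
            score *= likelihoods[c].get(attr, {}).get(value, 0.0)
        scores[c] = score
    evidence = sum(scores.values())
    if evidence == 0:
        raise ValueError("every class has zero probability for this record")
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical toy records (not the paper's data); the label is Diabetes.
rows = [
    {"Gender": "Female", "Insulin": "Yes"},
    {"Gender": "Female", "Insulin": "No"},
    {"Gender": "Male", "Insulin": "Yes"},
    {"Gender": "Male", "Insulin": "No"},
    {"Gender": "Female", "Insulin": "Yes"},
    {"Gender": "Female", "Insulin": "Yes"},
    {"Gender": "Male", "Insulin": "No"},
]
labels = ["Yes", "No", "Yes", "No", "Yes", "No", "Yes"]
priors, likelihoods = train_naive_bayes(rows, labels)
post = posterior(priors, likelihoods, {"Gender": "Female", "Insulin": "Yes"})
print(round(post["Yes"], 3))  # 0.692
```

For these toy records the posterior for “Yes” is exactly 9/13. Because an unseen value zeroes out a class, practical implementations often add Laplace smoothing to the counts.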

Here, Diabetes is the class attribute of the data set. It takes two values, “Yes” and “No”, and the remaining attributes hold personal information about the patient, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health. For each attribute value, the classifier estimates its probability given “Yes” and its probability given “No”. For example, for Gender, Female has a probability of 0.4964 given “No” and 0.5581 given “Yes”, while Male has 0.5035 given “No” and 0.4418 given “Yes”.
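As a worked illustration of how these conditional probabilities enter Bayes' theorem, consider a patient for whom only Gender = Female is known. Within each class the two Gender probabilities sum to about 1 (0.4964 + 0.5035 = 0.9999 and 0.5581 + 0.4418 = 0.9999), as expected of probabilities conditioned on the class. The class priors are not given in the abstract, so equal priors $P(\text{Yes}) = P(\text{No}) = 0.5$ are assumed here purely for illustration; the result is not a figure reported by the paper.

```latex
P(\text{Yes} \mid \text{Female})
  = \frac{P(\text{Female} \mid \text{Yes})\,P(\text{Yes})}
         {P(\text{Female} \mid \text{Yes})\,P(\text{Yes}) + P(\text{Female} \mid \text{No})\,P(\text{No})}
  = \frac{0.5581 \times 0.5}{0.5581 \times 0.5 + 0.4964 \times 0.5}
  = \frac{0.5581}{1.0545}
  \approx 0.5293
```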

Conclusions: In this paper, two algorithms have been implemented: the Naive Bayes classifier and ID3. The Naive Bayes classifier was used to predict the probability of having diabetes, and ID3 was used to generate a decision tree.
