COMPARATIVE STUDY OF MACHINE LEARNING KNN, SVM, AND DECISION TREE ALGORITHM TO PREDICT STUDENT’S PERFORMANCE

Non-active students reduce the number of students who graduate on time. Non-activity can be prevented by predicting student performance. This study compares the KNN, SVM, and Decision Tree algorithms to obtain the best predictive model. The models were built in five steps: data collection, pre-processing, model building, model comparison, and evaluation. The results show that the SVM algorithm predicts best, with an accuracy of 95%. The Decision Tree algorithm has a prediction accuracy of 93% and the KNN algorithm has a prediction accuracy of 92%.


Introduction
Every college department strives to improve the quality of its education and its accreditation. The timeliness of student graduation is one of the elements of accreditation assessment [1]. Accreditation improves when more students graduate on time. Non-active students reduce the number of students who graduate on time. Thus, the more students who fail to graduate on time, the lower the department's accreditation.
Non-active students can be prevented by predicting student performance. Several studies on student performance have been conducted. Some use Data Mining algorithms, which have been applied to build a student performance analysis system (SPAS) [2], to analyze student performance using clustering techniques [3], and to predict student performance (poor, average, good, and excellent) using educational data [4]. Other studies apply Decision Tree algorithms, for example to predict college drop-outs based on GPA [5] and to analyze the predictive accuracy for students' 4-year studies [6]. Further work predicts student performance at the start of a course program [7], predicts student performance in distance higher education using active learning [8], predicts student performance in relation to course activities [9], and predicts student performance using advanced learning analytics to compare features [10]. Besides Data Mining algorithms, fuzzy methods have also been used to predict student performance: a Fuzzy Support System was used to evaluate student performance in the laboratory [11], and fuzzy logic was applied to evaluate student academic performance [12].
Http://www.granthaalayah.com ©International Journal of Research -GRANTHAALAYAH
Studies that compare several algorithms to find the best predictor have also been carried out. Examples include comparing the Simple Logistic Classifier and SVM algorithms to predict an athlete's win [13], a comparative analysis of SVM and KNN classifiers for EMG signal classification [14], and a comparison of the KNN, SVM, and Random Forest algorithms for facial expression classification [15]. Comparative studies of algorithms for predicting student performance have also been carried out. These include a search for classification algorithms that can be used to predict student performance [16], a comparison of the Bayesian and Decision Tree algorithms [17], a comparison of the Apriori and K-Means algorithms [18], and a comparison of the Neural Network, SVM, and Decision Tree algorithms [19]. Among these comparative studies of student performance, none has compared the KNN, SVM, and Decision Tree algorithms. This research therefore compares the KNN, SVM, and Decision Tree algorithms to obtain the best model for predicting student performance.

Methods
This research used several Machine Learning algorithms, namely KNN, SVM, and Decision Tree. The tool used is R Studio, with the Caret package as the library. Machine Learning processing goes through several steps: data collection, pre-processing, model building, model comparison, and evaluation [20]. The research process is shown in Figure 1. Pre-processing also includes splitting the data into training and testing sets. The training data is used to build the models. Each model that has been built is then tested on the testing data to determine its prediction accuracy. The next step is to compare the models that have been built, namely those of the KNN, SVM, and Decision Tree algorithms. The final step is an evaluation to determine the best algorithm for predicting student performance based on the models obtained.
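The build, test, and compare steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R/Caret code: the toy rows, the function names `knn_predict`, `accuracy`, and `compare_models`, and the always-active baseline are all invented for illustration, and the baseline merely stands in for a second model.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label equals the true label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def compare_models(models, test_rows, test_labels):
    """Score every model on the same testing data, best first."""
    scores = {
        name: accuracy(predict, test_rows, test_labels)
        for name, predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration: two numeric predictors per student,
# label 1 = active, 0 = not active.
train_rows = [
    (3.5, 3.4), (3.2, 3.3), (3.8, 3.6), (3.0, 3.1), (3.6, 3.5),
    (1.2, 1.5), (0.8, 1.0), (1.5, 1.4), (1.0, 1.2), (2.9, 3.0),
]
train_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
test_rows = [(3.4, 3.3), (1.1, 1.3), (3.7, 3.6), (0.9, 1.1)]
test_labels = [1, 0, 1, 0]

models = {
    "knn_k5": lambda row: knn_predict(train_rows, train_labels, row, k=5),
    "always_active": lambda row: 1,  # stand-in baseline, not from the paper
}

ranking = compare_models(models, test_rows, test_labels)
print(ranking)
```

Scoring every model on the same held-out rows is what makes the comparison fair; the same loop would accept any other classifier that exposes a row-to-label function.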

Results
Student academic data from the Informatics Engineering Department of Politeknik Harapan Bersama are used in this paper. The dataset consists of 1530 rows and 7 attributes. The first 6 variables are used to predict the 7th variable. Table 1 shows the details of the data. The first attribute is the average score of learning outcomes in each semester, where 0 is the lowest score and 4 is the highest. GPA (Grade Point Average) is the cumulative average of all semesters completed so far, where 0 is the lowest score and 4 is the highest. Hometown is the student's hometown: 0 means the student comes from a city near campus and 1 means a city far from campus. Type of school is the type of high school: 0 means a private school and 1 means a public school. Major is the student's high school major: 1 means computer/informatics, 2 means natural science, and 3 means any other major. Parents jobs is the occupation of the student's parents: 1 means civil servant, 2 means private employee, 3 means entrepreneur, 4 means farmer/fisherman, and 5 means any other occupation. Actif is the student's performance: 0 means the student is not active and 1 means the student is active.
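The coding scheme above can be written down as a small record type that rejects values outside the documented codes. This is a sketch only: the field names (`semester_gp`, `school_type`, and so on) are hypothetical, since only the value codes and the name Actif come from the text, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentRecord:
    semester_gp: float  # per-semester average score, 0 (lowest) to 4 (highest)
    gpa: float          # cumulative grade point average, 0 to 4
    hometown: int       # 0 = city near campus, 1 = city far from campus
    school_type: int    # 0 = private school, 1 = public school
    major: int          # 1 = computer/informatics, 2 = natural science, 3 = other
    parents_job: int    # 1 = civil servant, 2 = private employee,
                        # 3 = entrepreneur, 4 = farmer/fisherman, 5 = other
    actif: int          # target: 0 = not active, 1 = active

    def __post_init__(self):
        # Scores must lie on the 0-4 scale described in the text.
        for name in ("semester_gp", "gpa"):
            if not 0.0 <= getattr(self, name) <= 4.0:
                raise ValueError(f"{name} must be between 0 and 4")
        # Categorical attributes must use one of the documented codes.
        allowed = {
            "hometown": {0, 1},
            "school_type": {0, 1},
            "major": {1, 2, 3},
            "parents_job": {1, 2, 3, 4, 5},
            "actif": {0, 1},
        }
        for name, codes in allowed.items():
            if getattr(self, name) not in codes:
                raise ValueError(f"{name} must be one of {sorted(codes)}")

    def predictors(self):
        """The six variables used to predict the seventh (actif)."""
        return (self.semester_gp, self.gpa, self.hometown,
                self.school_type, self.major, self.parents_job)


# Made-up example row, not taken from the dataset.
example = StudentRecord(3.2, 3.1, 0, 1, 2, 3, 1)
print(example.predictors(), example.actif)
```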

Model Result
Before the data is processed, the dataset is split into two parts at a ratio of 75:25, with 75% for training and 25% for testing. The training data is used to build the models. It consists of 1148 samples, 6 predictors, and 2 classes, and is resampled with 10-fold cross-validation repeated 3 times. The output of training is a model used for classification. The models that were built are shown in Table 2. Each model was then tested on the testing data to find out how accurate it is. Table 3 shows the matrix of testing results for the KNN algorithm, Table 4 the testing results for the SVM algorithm, and Table 5 the testing results for the Decision Tree algorithm.
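The 75:25 split and the 10-fold cross-validation repeated 3 times can be illustrated on row indices alone. The sketch below is plain Python rather than the R/Caret calls used in the study, it is not stratified by class, and it assumes the training size is rounded up, which reproduces the 1148 training samples reported for 1530 rows. The function names and the seed are hypothetical.

```python
import math
import random


def split_indices(n_rows, train_fraction, seed):
    """Shuffle row indices and split them into training and testing parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = math.ceil(n_rows * train_fraction)  # assumption: round up
    return indices[:n_train], indices[n_train:]


def repeated_kfold(indices, n_folds, n_repeats, seed):
    """Yield (fit, validation) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh partition for every repeat
        folds = [shuffled[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            validation = folds[held_out]
            fit = [
                row
                for fold_number, fold in enumerate(folds)
                if fold_number != held_out
                for row in fold
            ]
            yield fit, validation


train, test = split_indices(1530, 0.75, seed=42)
resamples = list(repeated_kfold(train, n_folds=10, n_repeats=3, seed=42))
print(len(train), len(test), len(resamples))  # prints: 1148 382 30
```

Each of the 30 resamples fits a candidate model on nine folds and validates it on the tenth, so the tuning accuracy is estimated without touching the 382 testing rows.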

Classification Result
The classification results are obtained from the models that have been tested. Table 6 compares the testing results of the KNN, SVM, and Decision Tree algorithms based on the confusion matrix. Figure 5 compares the accuracy of the algorithms by class. The final result is a comparison of the classification models to see which algorithm has the best accuracy. Table 7 shows the comparison of the classification models obtained, while Figure 6 shows the comparison graph of classification accuracy.

Discussion
The best KNN model for predicting student performance uses k (number of neighbors) = 5, with an accuracy of 93.81%; the best SVM model uses C = 1, with an accuracy of 95.09%; and the best Decision Tree model uses cp = 0.6689113, with an accuracy of 95.65%. Comparing the three algorithms on these figures, the Decision Tree has the best accuracy. However, these models had not yet been tested. After testing, it turns out that the SVM model predicts better than the KNN and Decision Tree models. The SVM algorithm correctly predicts 311 active students and 53 non-active students, while the Decision Tree algorithm correctly predicts only 308 active students and 48 non-active students, and the KNN algorithm only 308 active students and 45 non-active students. Without testing, the Decision Tree is the most accurate predictive model compared to SVM and KNN, whereas after testing, the SVM algorithm is the most accurate model compared to Decision Tree and KNN.
The comparison based on the confusion matrix shows a different picture from the previous comparisons. The SVM algorithm has the best accuracy in predicting active students (96%) compared to KNN (92%) and Decision Tree (92%). However, the Decision Tree algorithm has the best accuracy in predicting non-active students (92%) compared to SVM (91%) and KNN (85%). Although the Decision Tree algorithm has the best accuracy in predicting non-active students, it differs from the SVM algorithm by only 1 percentage point, whereas in predicting active students SVM leads Decision Tree and KNN by 4 percentage points. It can therefore be said that the SVM algorithm still holds the best position compared to Decision Tree and KNN. This is corroborated by the overall accuracy calculation, which finds that SVM has the best classification accuracy at 95%, while the Decision Tree has 93% and KNN has 92%. Thus, the best algorithm for predicting student performance is the SVM algorithm.
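The overall accuracies quoted in this discussion can be checked against the correct-prediction counts reported for each algorithm. The short Python calculation below assumes that all rows left over after the 1148 training samples (1530 - 1148 = 382) were used for testing; the variable and function names are illustrative.

```python
TOTAL_ROWS = 1530
TRAIN_ROWS = 1148
TEST_ROWS = TOTAL_ROWS - TRAIN_ROWS  # assumption: every remaining row is tested

# (correctly predicted active, correctly predicted non-active), from the text
correct = {
    "SVM": (311, 53),
    "Decision Tree": (308, 48),
    "KNN": (308, 45),
}


def overall_accuracy(correct_active, correct_non_active, n_test):
    """Correct predictions of both classes divided by all tested rows."""
    return (correct_active + correct_non_active) / n_test


accuracy = {
    name: overall_accuracy(active, non_active, TEST_ROWS)
    for name, (active, non_active) in correct.items()
}

for name, value in sorted(accuracy.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1%}")
# prints:
# SVM: 95.3%
# Decision Tree: 93.2%
# KNN: 92.4%
```

Rounded to whole percentages, these values agree with the 95%, 93%, and 92% reported above.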

Conclusions
The KNN algorithm predicts student performance well with k = 5. The best SVM model for predicting student performance uses C = 1. For the Decision Tree algorithm, the best predictions are obtained with cp = 0.6689113. The comparison of the three machine learning algorithms (KNN, SVM, and Decision Tree) shows that SVM has the best accuracy (95%) compared to Decision Tree (93%) and KNN (92%) in predicting student performance.