DATA MINING TECHNIQUES FOR EDUCATIONAL DATA: A REVIEW

Abstract: Data mining has recently been gaining popularity among researchers. It provides various techniques and methods for analysing data produced by applications in different domains. Similarly, educational mining provides a way of analysing educational data sets. Educational mining is concerned with developing methods for discovering knowledge from data that come from the educational field; it helps to extract hidden patterns and to discover new knowledge from large educational databases using data mining techniques and tools. The knowledge extracted through educational mining can be used for decision making in higher education institutions. This paper is a literature review of different data mining techniques and their algorithms, such as classification and clustering. It presents the effectiveness of mining techniques on educational data sets for higher education institutions.


Introduction
"We are living in the information age" is a popular saying; however, we are actually living in the data age. This explosively growing, widely available, and gigantic body of data makes our time truly the data age. Powerful and versatile tools are badly needed to automatically uncover valuable information from the tremendous amounts of data and to transform such data into organized knowledge. This necessity has led to the birth of data mining. Data mining has made and will continue to make great strides in our journey from the data age toward the coming information age [16].
Data mining provides methods and techniques for transforming raw data into useful knowledge by evaluating huge data repositories from which new knowledge can be derived. Data mining, which is also called Knowledge Discovery in Databases (KDD), is the field of discovering new and potentially useful information from large databases. It finds application in various domains such as marketing, healthcare and criminal investigation. Educational Mining (EM) is one of the application domains of data mining, in which data mining techniques are applied to educational data. The objective of EM is to analyse such data and to resolve educational research issues.

Educational Mining
Data mining has given rise to a new application area known as educational mining. In educational mining, data mining concepts are applied to data related to the field of education. It is the process of transforming the data assembled by education systems. Educational mining means analysing hidden data that come from educational settings, using new methods to better understand students and the context in which they learn.
Educational mining supports distinct tools and algorithms for analysing data patterns. In EM, data are acquired during the learning process and then analysed with techniques from statistics, machine learning and other data mining concepts. To extract the hidden knowledge from data that come from educational systems, various data mining techniques such as classification, clustering and rule mining are discussed for generating better decisions in the educational system. Academic authorities and educators work on the educational system to strengthen the performance of students. The diagram shows that educators first design the educational system, then plan and build it, and subsequently maintain it. Educational systems include traditional classrooms and innovative learning methods such as e-learning systems and intelligent, adaptive web-based educational systems. The data set can be extracted from students, as students are directly connected with the educational system. These data are then given as input to data mining methods such as clustering, classification and pattern matching, which in turn produce guidance for students and new knowledge for educators.

Goals of Educational Mining
Baker and Yacef [28] describe the following four goals of Educational Mining:
1) Predicting students' future learning behaviour: Using the hidden knowledge in student modelling, this goal can be achieved by creating student models that incorporate the learner's characteristics, including detailed information such as their knowledge, behaviours and motivation to learn.
2) Discovering or improving domain models: Through the various methods and applications of EM, it is possible to discover new models and to explore them in order to make appropriate improvements to existing models.
3) Studying the effects of educational support: This can be achieved through learning systems.
4) Advancing scientific knowledge about learning and learners: This can be achieved by constructing and incorporating student models, and through the field of EM research and the technology and software it uses.

Phases of Educational Data Mining
Educational Mining is concerned with transforming the raw data collected from educational systems into newly extracted knowledge. The data to be mined are collected from different educational system resources, i.e. from course management systems (of different institutes), e-learning environments, and web-based sources (e.g. YouTube, Twitter) that are relevant to students' activities during the learning process (e.g. their academic grades, students' posts on social networking sites, etc.).
EM generally consists of four phases:
1) The first phase is to find the relationships between the data of the educational environment. The purpose of establishing these relationships is to use them in various data mining techniques such as classification, clustering and regression.
2) The second phase is the validation of the discovered relationships between the data, so that overfitting can be avoided.
3) The third phase is to make predictions for the future on the basis of the validated relationships in the learning environment.
4) The fourth phase is to support the decision-making process with the help of these predictions.

Related Work
This section briefly reviews surveys and comparative studies of classification algorithms applied in educational mining. In 2010, Z. J. Kovacic [15] presented a case study on educational data mining to identify the extent to which enrolment data can be used to predict students' success. The accuracy obtained with CHAID and CART was 59.4% and 60.5% respectively.
In April 2011, Bhardwaj and Pal [14] studied student performance and found that factors such as students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income and family status were highly correlated with academic performance.
In 2011, Pandey and Pal [13] studied student performance using a sample of 600 students from different colleges. By applying Bayes classification on category, language and background qualification, they determined whether newcomer students would perform well or not.
In 2012, Surjeet Kumar Yadav et al. [12] applied decision tree algorithms to a students' data set and found that C4.5 can learn effective predictive models from student data and gives better classification accuracy.
In April 2012, Sonali Agarwal et al. [10] applied classification algorithms to educational data and found that the SVM classifier LIBSVM with a radial basis kernel was the best choice for data classification in their study.
In December 2012, Surjeet Kumar and Brijesh Bharadwaj [11] carried out a comparative analysis of decision tree classification algorithms and found that the CART algorithm classifies First, Second, Third class and Fail students with high accuracy.
In April 2013, Trilok Chand Sharma et al. [7] discussed classification algorithms, explained how to run those classifiers on a selected data set, and found that the decision tree gives better performance and high accuracy.
In 2013, V. Ramesh [9] analysed classification algorithms for predicting students' performance and found that the Multi-Layer Perceptron predicted performance better than the others, and that parents' designation plays a vital role in predicting students' grades.
In July 2013, Sumit Garg et al. [8] analysed various classification algorithms on an educational data set, explained the implemented algorithms in detail, and suggested some good algorithms for predicting students' future performance from their data.
In 2014, Abdul Hamid and Amin [6] analysed and compared students' enrolment approval using classification algorithms and found that C4.5 gives high accuracy and the lowest absolute errors.
In 2015, Amirah Mohamed Shahiri et al. [2] reviewed data mining techniques for predicting students' performance and found that classification algorithms predict performance better than other data mining techniques, and that C4.5 is widely used by researchers for this purpose.
In January 2015, Pooja Thakar et al. [3] broadly analysed many papers on educational mining that compared data mining techniques for predicting students' performance, and identified the attributes that are highly correlated with students' performance.
In 2015, C. Anuradha and T. Velmurugan [4] tested selected classification algorithms on a students' data set. They found that prediction rates are not uniform among the classification algorithms and that the algorithms behave differently depending on the selection of attributes.
In 2015, Sagar Nikam [5] carried out a comparative study of classification algorithms. The analysis shows that each algorithm has its own merits and demerits and that the technique has to be selected based on the situation.
In June 2016, R. Sumitha and Vinothkumar [14] analysed and compared classification algorithms on a students' data set and found that J48 gives the best accuracy, at 97%.

Educational Mining Methods
EM uses not only data mining techniques such as classification, clustering and association analysis, but also methods and techniques drawn from a variety of related areas (statistics, machine learning, text mining, web log analysis, etc.). There are various methods of educational mining, but all of them fall into one of the following categories:

1) Prediction:
The objective is to build a model that can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables). Prediction methods include classification, regression (when the predicted variable is a continuous value), and density estimation (when the predicted value is a probability density function).

2) Classification:
It assigns a data item to one of several predefined categorical classes. The algorithms used for classification include:
- Decision tree
- Naive Bayes classification
- Generalized Linear Models (GLM)
- Support vector machine, etc.

3) Clustering:
In the clustering technique, the data set is divided into various groups, known as clusters. The data points of one cluster should be more similar to the other data points of the same cluster and more dissimilar to the data points of other clusters. A clustering algorithm can be started in two ways: with no prior assumption about the clusters, or with a prior hypothesis.

4) Relationship mining:
It is used for discovering relationships between variables in a data set and encoding them as rules for later use. Several types of relationship mining are available: association rule mining (any relationships between variables), sequential pattern mining (temporal associations between variables), correlation mining (linear correlations between variables), and causal data mining (causal relationships between variables). In EM, relationship mining is an important concept for analysing the relationships between students' on-line activities and their final marks, and for modelling learners' problem-solving activity sequences.
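As an illustration of association rule mining, the sketch below computes the two standard measures of a rule, support and confidence, over a handful of student records. The records, the item names and the rule "forum_active implies pass" are hypothetical and are not taken from any of the reviewed studies.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base > 0 else 0.0


# Hypothetical records: on-line activities and final outcome of four students.
records = [
    {"forum_active", "quiz_done", "pass"},
    {"forum_active", "pass"},
    {"quiz_done", "fail"},
    {"forum_active", "quiz_done", "pass"},
]

rule_support = support(records, {"forum_active", "pass"})
rule_confidence = confidence(records, {"forum_active"}, {"pass"})
print(rule_support, rule_confidence)
```

An algorithm such as Apriori searches for all rules whose support and confidence exceed chosen thresholds; the sketch only evaluates a single, hand-picked rule.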

5) Discovery with Models:
Its aim is to use a validated model of a phenomenon (built using prediction, clustering, or knowledge engineering) as a component in further analysis such as prediction or relationship mining. It is used, for example, to identify the relationships between students' behaviour and characteristics.

Classification Algorithms Used in Educational Data
In this section, the various classification algorithms used in educational mining are discussed.

Classification Algorithms
Classification is one of the data mining techniques mainly used to analyse a given data set. It is used to extract models that accurately describe important data classes within the given data set. Classification is a two-step process.
Step 1: The model is created by applying a classification algorithm to a training data set.
Step 2: The extracted model is tested against a predefined test data set to measure the performance and accuracy of the trained model.
Classification is thus the process of assigning a class label to data whose class label is unknown.
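The two steps can be sketched as follows. This is a minimal illustration, assuming a very simple one-attribute rule learner in place of a full classification algorithm; the function names, the 30% hold-out fraction and the student records are all hypothetical.

```python
import random
from collections import Counter


def split_holdout(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_one_rule(train, attribute):
    """Step 1: learn, for each value of one attribute, its most common class."""
    counts = {}
    for features, label in train:
        counts.setdefault(features[attribute], Counter())[label] += 1
    default = Counter(label for _, label in train).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return rules, default


def classify(model, features, attribute):
    """Assign a class label to a record whose label is unknown."""
    rules, default = model
    return rules.get(features[attribute], default)


def accuracy(model, test, attribute):
    """Step 2: fraction of held-out records that the model labels correctly."""
    hits = sum(1 for f, label in test if classify(model, f, attribute) == label)
    return hits / len(test)


# Hypothetical student records: (features, class label).
records = ([({"attendance": "high"}, "pass")] * 5
           + [({"attendance": "low"}, "fail")] * 5)

train, test = split_holdout(records)
model = train_one_rule(train, "attendance")
print(len(train), len(test), accuracy(model, test, "attendance"))
```

The toy data are perfectly separable, so the measured accuracy says nothing about real student data; the point is only the separation between the data used to build the model and the data used to test it.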

Decision Tree
A decision tree is a decision support tool that uses a tree-like graph to model decisions and their possible consequences. Internal nodes are drawn as rectangles and leaf nodes as ovals. Every internal node has two or more child nodes and contains a split, which tests the value of an expression of the attributes. The arcs from an internal node to its children are labelled with the distinct outcomes of the test, and each leaf node has a class label associated with it. Decision trees are mainly used in decision analysis and operations research.
C4.5 builds decision trees from a set of training data in the same way as ID3. To handle continuous attributes, C4.5 partitions the attribute values of a given training set into two smaller partitions based on a selected threshold, so that all values above the threshold form one child and the remaining values form the other. It also handles missing attribute values. C4.5 uses the gain ratio as its attribute selection measure when building a decision tree; this reduces the bias of information gain towards attributes that have many outcome values. First, the gain ratio of each attribute is calculated. The attribute whose gain ratio is maximum becomes the root node. C4.5 can handle training data containing different kinds of attributes, as well as attributes with differing costs.
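The attribute selection step can be made concrete with a short sketch. The code below computes the gain ratio (information gain divided by split information) for nominal attributes and picks the attribute with the largest value as the root. The student records and attribute names are hypothetical, and thresholding of continuous attributes and missing values are not covered.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical training data with two nominal attributes and a class label.
students = [
    {"attendance": "high", "section": "A", "result": "pass"},
    {"attendance": "high", "section": "B", "result": "pass"},
    {"attendance": "low", "section": "A", "result": "fail"},
    {"attendance": "low", "section": "B", "result": "fail"},
]

ratios = {a: gain_ratio(students, a, "result") for a in ("attendance", "section")}
root = max(ratios, key=ratios.get)
print(ratios, root)
```

Here attendance separates the classes perfectly while section carries no information, so attendance is chosen as the root.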

Random Forest
Random forest is a collection of decision trees built with some element of random choice. It works by generating a number of trees to analyse the data, combining the outputs of all the trees, and then obtaining the final result through a vote (choosing the class predicted by the majority of trees). Random forest is highly robust on large data sets, but it is more costly than other techniques.
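The voting mechanism can be sketched as below. To keep the example short, each "tree" is only a one-level stump trained on a bootstrap sample with a randomly chosen attribute; a real random forest grows full trees and chooses among random attribute subsets at each split. All names and records are hypothetical.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted most often."""
    return Counter(predictions).most_common(1)[0][0]


def train_stump(sample, attribute):
    """One-level tree: map each value of `attribute` to its majority class."""
    by_value = {}
    for features, label in sample:
        by_value.setdefault(features[attribute], Counter())[label] += 1
    rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    default = majority_vote([label for _, label in sample])
    return attribute, rules, default


def train_forest(data, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample with a randomly chosen attribute."""
    rng = random.Random(seed)
    attributes = sorted(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # sampling with replacement
        forest.append(train_stump(sample, rng.choice(attributes)))
    return forest


def predict(forest, features):
    """Collect one prediction per tree and return the majority class."""
    votes = [rules.get(features[attr], default) for attr, rules, default in forest]
    return majority_vote(votes)


# Hypothetical student records: (features, class label).
data = [
    ({"attendance": "high", "assignments": "done"}, "pass"),
    ({"attendance": "high", "assignments": "missed"}, "pass"),
    ({"attendance": "low", "assignments": "done"}, "fail"),
    ({"attendance": "low", "assignments": "missed"}, "fail"),
    ({"attendance": "high", "assignments": "done"}, "pass"),
    ({"attendance": "low", "assignments": "missed"}, "fail"),
]

forest = train_forest(data)
print(predict(forest, {"attendance": "high", "assignments": "done"}))
```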

Naive Bayes
The Naive Bayes technique is a statistical classification technique that is particularly suited to cases where the dimensionality of the inputs is high. It is based on Bayes' theorem. It predicts class membership probabilities, such as the probability that a given tuple belongs to a particular class. The Bayesian classifier is capable of computing the most probable output based on the input. It is also possible to add new raw data at runtime and obtain a better probabilistic classifier. Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is made to simplify the computations involved and, in this sense, is considered "naïve" [16].
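A minimal sketch of a Naive Bayes classifier for nominal attributes is given below. Under class conditional independence, the score of a class is its prior multiplied by the per-attribute conditional probabilities. Laplace (add-one) smoothing is an assumption added here to avoid zero probabilities; the records and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(data):
    """Count classes and, per (attribute, class), the attribute values seen."""
    class_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)
    values_seen = defaultdict(set)
    for features, label in data:
        for attribute, value in features.items():
            value_counts[(attribute, label)][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, features):
    """Class membership probabilities for one record."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for attribute, value in features.items():
            count = value_counts[(attribute, label)][value]
            # P(value | class) with add-one smoothing (an assumption of this sketch)
            score *= (count + 1) / (n + len(values_seen[attribute]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical student records: (features, class label).
data = [
    ({"attendance": "high", "homework": "yes"}, "pass"),
    ({"attendance": "high", "homework": "yes"}, "pass"),
    ({"attendance": "high", "homework": "no"}, "pass"),
    ({"attendance": "low", "homework": "no"}, "fail"),
    ({"attendance": "low", "homework": "no"}, "fail"),
    ({"attendance": "low", "homework": "yes"}, "fail"),
]

model = train_naive_bayes(data)
probabilities = posterior(model, {"attendance": "high", "homework": "yes"})
print(probabilities)
```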

Multi-Layer Perceptron
The multi-layer perceptron is a class of feed-forward artificial neural network model that maps sets of input data onto a set of appropriate output data. As its name suggests, it consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Besides the input and output layers, the architecture of this class of networks has one or more additional layers, called hidden layers. A hidden layer, that is, a middle layer, performs intermediate computation before passing its result to the output layer.
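The forward pass through such a network can be sketched as follows, assuming one hidden layer and a sigmoid activation. The weights and inputs below are hand-picked, hypothetical values; in practice the weights are learned by training (for example with backpropagation), which is not shown.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Fully connected layer: every node sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.8, -0.4], [0.3, 0.9]]
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.05]

# Inputs could be, for instance, scaled attendance and scaled internal marks.
output = forward([0.9, 0.6], hidden_w, hidden_b, output_w, output_b)
print(output)
```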

Support Vector Machine
The Support Vector Machine (SVM) is a classification technique based on the concept of decision planes that define decision boundaries. This classifier uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane (i.e., a "decision boundary" separating the tuples of one class from another). The SVM finds the best separating hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors). SVM supports both regression and classification tasks and can handle multiple continuous and categorical variables. The support vector machine operator offers several kernel types, including dot, radial, polynomial and neural.
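The kernel types mentioned above can be written down directly. The sketch below defines the dot, polynomial and radial (RBF) kernels and the usual kernel decision function, a weighted sum of kernel values over the support vectors plus a bias. The parameter defaults, support vectors and coefficients are hypothetical; finding them is the job of SVM training, which is not shown.

```python
import math


def dot_kernel(x, y):
    """Linear (dot product) kernel."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (dot_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, coefficients, bias, kernel):
    """Signed score; its sign gives the predicted class."""
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, coefficients)) + bias


# Hypothetical trained model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefficients = [-1.0, 1.0]  # sign encodes the class of each support vector

score = decision_value((2.0, 2.0), support_vectors, coefficients, 0.0, rbf_kernel)
print("positive class" if score > 0 else "negative class")
```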

Review on Comparative Study of Educational Mining (EM) Techniques
This section reviews different studies carried out in the educational domain that provide a comparative analysis and identify the best algorithm and technique for obtaining more accurate and efficient results:

Efficiency of Decision Trees in Predicting Student's Academic Performance
In this paper, S. Anupama Kumar et al. suggested an approach for predicting students' performance in examinations. They used C4.5 (J48 in WEKA) for the prediction analysis.
During data collection, a slight modification was made in defining the nominal values used for the accuracy analysis. As required by the system, the data are pre-processed: integer values are converted into nominal values and stored in .CSV format, and the file is then converted to the .ARFF format that WEKA can read. The decision tree rules are implemented by dividing the data into two groups. J48 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. In the decision tree, the criterion for choosing the splitting attribute at each node is the normalized information gain; the attribute with the highest normalized information gain is chosen to make the decision. The paper analyses the accuracy of the algorithm in two ways: first by comparing the result of the tree with the original marks obtained by the students, and second by comparing the ID3 and C4.5 algorithms in terms of efficiency.
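The pre-processing step (integer marks converted to nominal values, then written in ARFF form) can be sketched as follows. The mark cut-offs, class names and attribute names are hypothetical and are not the ones used in the reviewed paper; only the general ARFF layout (`@relation`, `@attribute`, `@data`) follows the WEKA file format.

```python
def to_nominal(mark):
    """Map an integer mark to a nominal value (hypothetical cut-offs)."""
    if mark >= 60:
        return "first"
    if mark >= 45:
        return "second"
    if mark >= 36:
        return "third"
    return "fail"


def to_arff(relation, attributes, rows):
    """Render rows of nominal values as ARFF text."""
    lines = ["@relation " + relation, ""]
    for name in attributes:
        values = sorted({row[name] for row in rows})
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[name] for name in attributes))
    return "\n".join(lines)


# Hypothetical student records with an integer internal mark.
raw = [
    {"internal": 72, "attendance": "high", "result": "pass"},
    {"internal": 30, "attendance": "low", "result": "fail"},
]
rows = [dict(r, internal=to_nominal(r["internal"])) for r in raw]

arff_text = to_arff("students", ["internal", "attendance", "result"], rows)
print(arff_text)
```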

Classification Model of Prediction for Placement of Students
In this paper, Ajay Kumar Pal presented a new classification approach to predict the placement of students. The approach relates students' academic records to their placement. Various classification algorithms are applied to students' academic records using data mining tools such as WEKA. The training algorithm uses a set of predefined attributes. The most widely used classification algorithms are the naïve Bayesian classification algorithm, the multilayer perceptron and the C4.5 tree. For high-dimensional inputs, naïve Bayesian classification is the best technique. The multilayer perceptron is most suitable for vector attribute values with more than one class. C4.5 is currently among the most popular algorithms owing to added features such as handling of missing values, categorization of continuous attributes and pruning of decision trees. For testing, 10-fold cross-validation is selected as the evaluation approach. A number of tests are conducted to assess the input variables: the chi-square test, the information gain test and the gain ratio test. Each test assesses the importance of a variable in a different way. According to this analysis, among the three selected algorithms, the best is naïve Bayes classification.
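The 10-fold cross-validation procedure mentioned above can be sketched as follows: the data are divided into ten folds, each fold serves once as the test set while the other nine are used for training, and the accuracies are averaged. The sketch assumes at least as many records as folds, does no shuffling or stratification, and uses a majority-class baseline and hypothetical placement records in place of a real classifier and data set.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(data, fit, predict, k=10):
    """Average accuracy of a classifier over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = fit([data[j] for j in train_idx])
        hits = sum(1 for j in test_idx
                   if predict(model, data[j][0]) == data[j][1])
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(train):
    """Baseline model: remember the most common class in the training data."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_majority(model, features):
    """Baseline prediction: always the remembered class."""
    return model


# Hypothetical placement records: (features, class label).
data = [({"id": i}, "placed" if i % 4 else "not_placed") for i in range(40)]

mean_accuracy = cross_validate(data, fit_majority, predict_majority, k=10)
print(mean_accuracy)
```

Any pair of `fit` and `predict` functions with the same shape can be passed to `cross_validate`, which makes it possible to compare several classifiers on identical folds.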

Study of factors Analysis Affecting Academic Achievement of Undergraduate Students in International Program
In this paper, Pimpa Cheewaprakobkit carried out an analysis to identify weak students so that their academic performance can be improved. WEKA is used to evaluate the visible features for predicting students' academic performance, and the data set is used to build the classifiers (decision tree and neural network). To estimate the accuracy, cross-validation with 10 folds is used. Two classification algorithms are adopted and compared: the neural network and the C4.5 decision tree algorithm. The investigation process consists of three main steps: data preprocessing, attribute filtering and classification rules. According to this analysis, the decision tree model is more accurate than the neural network model. It can be concluded that the decision tree technique classifies this data set more effectively.

Predicting Student's Performance Using Modified ID3 Algorithm
Ramanathan L. addressed a shortcoming of the well-known ID3 algorithm, which is used to generate decision trees. In this analysis, the gain ratio is used instead of information gain.
An additional aspect of this study is the assignment of a weight to each attribute at every decision point. In place of the traditional ID3 algorithm, a modified ID3 algorithm, known as the weighted ID3 (WID3) algorithm, is used. Because it is normalized, the gain ratio is more beneficial than information gain. To obtain a new value, the gain ratio of each attribute is multiplied by its weight, and among these new values the attribute with the maximum weighted gain ratio is selected as the node of the tree. The WEKA tool is used to analyse the J48 and naïve Bayes algorithms. The modified weighted ID3 algorithm is thus based on the gain ratio, with each attribute adjusted to account for its weight. The analysis concludes that the WID3 algorithm is more efficient than the other two algorithms, J48 and naïve Bayes.
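The weighted selection step can be sketched as follows. The gain ratios and weights are hypothetical illustrative numbers, not values from the reviewed paper, and the function covers only the choice of attribute at a single node.

```python
def select_split_attribute(gain_ratios, weights):
    """Multiply each gain ratio by its attribute weight and pick the maximum."""
    weighted = {a: gain_ratios[a] * weights.get(a, 1.0) for a in gain_ratios}
    best = max(weighted, key=weighted.get)
    return best, weighted


# Hypothetical values for three attributes at one decision point.
gain_ratios = {"attendance": 0.40, "internal_marks": 0.35, "section": 0.05}
weights = {"attendance": 1.0, "internal_marks": 1.5, "section": 1.0}

best, weighted = select_split_attribute(gain_ratios, weights)
print(best, weighted)
```

With these numbers the unweighted gain ratio would select attendance, whereas the weight on internal_marks changes the choice, which is the effect the weighting is meant to have.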

Analysis and Predictions on Student's Behaviour Using Decision Trees in WEKA Environment
In this paper, Vasile Paul Bresfelean worked on data accumulated through different surveys, with the aim of identifying the different behaviours of students belonging to different specializations. The author develops a series of decision trees based on the J48 algorithm as implemented in WEKA, in order to differentiate and predict students' choices about continuing their education. The WEKA workbench implements two of the most common decision tree algorithms: ID3 and C4.5 (the latter as J48). The author used J48 because it gives better results than ID3 in all circumstances.

Conclusion
In this paper, we discussed how the use of data mining techniques on educational data can prove to be a useful strategic tool for the administration of higher educational institutions in addressing the crucial challenge of improving the quality of educational processes. The paper presents the data mining techniques used in the educational domain and gives a brief description of educational mining, its goals and phases, and the existing classification techniques.
We carried out a comparative study of different educational mining techniques and their algorithms on educational data sets using the WEKA tool, and compared them on the basis of accuracy percentage. It is difficult to say which technique of educational data mining is best, because each technique has its own advantages and limitations and the choice also depends on the purpose for which the educational data are to be mined.
Various classification techniques can be implemented on the data set, but it is important to decide which classification technique should be applied to the data in order to improve the academic performance of students. According to the analysis carried out on different educational mining techniques, we can say that among classification techniques C5.0 and naïve Bayes classification perform best, among clustering techniques the K-means clustering algorithm is the best, and among association rule techniques the Apriori algorithm is the best and more accurate than the other algorithms.