COMPARISON OF ALGORITHMS BASED ON ROUGH SET THEORY FOR A 3-CLASS CLASSIFICATION

There are various data mining techniques for handling huge data sets. Rough set based classification offers a way to improve the efficiency of algorithms when dealing with larger data sets, and selecting eligible attributes through an efficient rule set lets decision makers save time and cost. This paper compares the performance of three rough set based algorithms: Johnson's algorithm, the genetic algorithm and dynamic reducts. Performance is measured by accuracy, AUC and standard error for a 3-class classification problem on training and test data sets. On the test data, the results showed that the genetic algorithm outperformed the others.


Introduction
The rapid development of online platforms and the availability of data storage have opened an emerging area for researchers who form and process huge stacks of data. The growing volume of large data sets has gained considerable attention among researchers, and data mining and its related approaches have therefore become useful and valid for identifying patterns in data. One important criterion in this regard is attribute reduction: selecting the relevant attributes needed to generate efficient rule sets. This matters because decision makers save time and cost by excluding attributes that do not contribute positively to the solution of the problem. Improving the efficiency of algorithms by removing negligible variables from the data set is thus an emerging area in its own right.
Cluster and regression analysis, neural networks, fuzzy sets, Bayesian methods and machine learning all belong to the field of data mining theories and techniques. Kusiak (2006) broadly divided data mining techniques into two classes, descriptive and predictive [1]. The first class includes models created from training data, as in neural networks and regression analysis. The second is the creation of a number of models in the form of decision models, such as machine learning algorithms. Rough set theory, proposed by Pawlak, is a novel approach that lets researchers in data mining handle vagueness in data patterns. Attribute reduction without losing the necessary information in the data set is one of the most capable approaches offered by rough set theory for this purpose [2].
Reduct generation, and approximations to it, in rough set theory has been studied by many researchers. In this regard, Johnson (1974) provided a possible classification of optimization problems according to the behaviour of their approximation algorithms [3]. Al-Radaideh (2005) provided an approximate approach to reduct computation that uses a weighting mechanism to determine the significance of an attribute considered for the reduct [4]. Wroblewski [5] proposed producing small reducts by a genetic algorithm combined with a greedy algorithm. Swiniarski and Skowron [6] and Zeng [7] provided algorithms for knowledge acquisition based on rough sets and principal component analysis. Srivastava et al. [8] introduced a Rough Support Vector Machine approach based on the hybridization of SVM and the Rough Set Exploration System; it finds reducts that are then passed to an SVM to obtain better classification results. Yamany et al. [9] developed an innovative use of an intelligent optimisation method, the flower search algorithm (FSA), with rough sets for attribute reduction. FSA has robust search capabilities and can effectively find small attribute reducts given a suitable fitness function that combines classification accuracy and attribute set size. Experimental results showed competitive performance for the FSA-based approach, indicating that FSA combined with rough sets is a useful technique for the attribute reduction problem.
In this paper, we evaluate reduction algorithms based on rough set theory for efficient classification of real estate in Istanbul with a minimum set of attributes. The paper is structured as follows. In Section 2, rough set theory preliminaries are defined. The reduction algorithms, Johnson's, the genetic algorithm and dynamic reducts, are explained in Section 3; all are evaluated with the same classifier, the voting method. The comparison of the reduction methods is given in Section 4. The last section concludes the paper.

Rough Set Theory Preliminaries
Rough sets, developed by Pawlak, are a new approach for handling vagueness and uncertainty in data sets [2,10,11]. Following Pawlak, information systems, the indiscernibility relation, and the discernibility matrix and function are introduced in this section.

Definition 1: Information Systems and Decision Systems
A data set is represented as a table in which each row represents an object and each column represents an attribute (an explanatory variable or a property) that can be measured for each object; an attribute may also be supplied by a human expert or the user. Such a table is called an information system. Formally, an information system is a pair 𝒜 = (U, A), where U is a non-empty finite set of objects called the universe and A is a non-empty finite set of attributes such that a : U → Vₐ for every a ∈ A. The set Vₐ is called the value set of a.
A decision system is a table that includes a decision attribute alongside the objects and conditional attributes. The elements of A are called conditional attributes or simply conditions. A decision system is defined as 𝒜 = (U, A ∪ {d}), where d ∉ A is the decision attribute. The decision attribute is a categorical variable and, in rough set theory, is conventionally placed in the last column of the table. A reduct of a decision system is a minimal subset B ⊆ A that preserves the indiscernibility relation of A, i.e. IND(B) = IND(A); when this holds, the attributes in A \ B may be omitted. Subsets that contain no removable attributes are called reduced attribute sets (reducts). The core of the decision system is defined as Core(B) = ∩ Red(B), where Red(B) is the set of all reducts of B.
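As a minimal sketch of these definitions, decision-relative reducts and the core can be found by brute force on a small table: a subset of conditions is a reduct if objects that agree on it always share a decision and no proper subset has that property. The toy table below is hypothetical, not the paper's data.

```python
from itertools import combinations

# Hypothetical toy decision table: each row is an object; 'price' is the
# decision attribute, the remaining keys are conditional attributes.
TABLE = [
    {"rooms": 2, "age": 10, "balcony": 1, "price": 0},
    {"rooms": 3, "age": 10, "balcony": 0, "price": 1},
    {"rooms": 3, "age": 5,  "balcony": 0, "price": 1},
    {"rooms": 2, "age": 5,  "balcony": 1, "price": 2},
]
CONDITIONS = ["rooms", "age", "balcony"]

def ind_classes(attrs):
    """Partition object indices into equivalence classes of IND(attrs)."""
    classes = {}
    for i, row in enumerate(TABLE):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return sorted(classes.values())

def preserves_decision(attrs):
    """True if objects indiscernible on attrs always share a decision."""
    return all(
        len({TABLE[i]["price"] for i in cls}) == 1
        for cls in ind_classes(attrs)
    )

def reducts():
    """All minimal condition subsets that preserve the decision partition."""
    found = []
    for r in range(1, len(CONDITIONS) + 1):
        for subset in combinations(CONDITIONS, r):
            if preserves_decision(subset) and not any(
                set(f) <= set(subset) for f in found
            ):
                found.append(subset)
    return found

RED = reducts()                                            # all reducts
CORE = set(CONDITIONS).intersection(*map(set, RED)) if RED else set()
```

On this table the reducts are {rooms, age} and {age, balcony}, so the core, their intersection, is {age}: the age attribute can never be removed.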

Definition 3: Discernibility Matrix
The discernibility knowledge of a decision system is commonly recorded in a matrix called the discernibility matrix (DM). The DM is a symmetric matrix with entries defined as c(i, j) = {a ∈ A : a(xᵢ) ≠ a(xⱼ)}; that is, entry c(i, j) of the DM contains all the attributes that discriminate between the two objects xᵢ and xⱼ.
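A minimal sketch of this construction, using a hypothetical three-object table: entry (i, j) simply collects the attributes on which objects i and j differ.

```python
# Hypothetical toy information table: three objects, two attributes.
TABLE = [
    {"a1": 0, "a2": 1},
    {"a1": 1, "a2": 1},
    {"a1": 1, "a2": 0},
]
ATTRS = ["a1", "a2"]

def discernibility_matrix(table, attrs):
    """Symmetric matrix whose entry (i, j) holds every attribute on which
    objects i and j take different values; the diagonal stays empty."""
    n = len(table)
    dm = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            dm[i][j] = dm[j][i] = frozenset(
                a for a in attrs if table[i][a] != table[j][a]
            )
    return dm

DM = discernibility_matrix(TABLE, ATTRS)
```

Here objects 1 and 2 differ only on a2, so DM[2][1] = {a2}, while objects 0 and 2 differ on both attributes.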

Definition 4: Discernibility Function
The discernibility function is a Boolean function in which a Boolean variable a* corresponds to each attribute a. It is represented as f(a₁*, …, aₘ*) = ⋀ {⋁ c*(i, j) : 1 ≤ j < i ≤ n, c(i, j) ≠ ∅}, where c*(i, j) = {a* : a ∈ c(i, j)} [11,12].
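The reducts of a system correspond to the prime implicants of this function, i.e. the minimal sets of attributes that hit every non-empty matrix entry. A brute-force sketch on a hypothetical set of clauses:

```python
from itertools import combinations

# Clauses of a hypothetical discernibility function, one clause per
# non-empty matrix entry: f = (a1) AND (a1 OR a2) AND (a2).
CLAUSES = [frozenset({"a1"}), frozenset({"a1", "a2"}), frozenset({"a2"})]
ATTRS = sorted(set().union(*CLAUSES))

def hits_all(subset):
    """A subset satisfies f iff it intersects every clause (a hitting set)."""
    return all(subset & c for c in CLAUSES)

def prime_implicants():
    """Minimal hitting sets of the clauses = reducts of the system."""
    primes = []
    for r in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, r):
            s = frozenset(combo)
            if hits_all(s) and not any(p <= s for p in primes):
                primes.append(s)
    return primes

REDUCTS = prime_implicants()
```

For these clauses neither {a1} nor {a2} alone hits every clause, so the only reduct is {a1, a2}.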

Johnson's Algorithm
Johnson's algorithm [3] is a heuristic algorithm based on a greedy technique. The idea of Johnson's algorithm is to always select the attribute occurring most frequently in the remaining clauses.
The reduct B is generated by executing the algorithm outlined below, where 𝒮 denotes the set of sets corresponding to the discernibility function and w(S) denotes a weight for set S that is computed automatically from the data.
The algorithm is described as follows [14]:
1) Let B = ∅.
2) Let a denote the attribute that maximizes Σ w(S), where the sum is taken over all sets S in 𝒮 that contain a. Ties are resolved arbitrarily.
3) Add a to B.
4) Remove from 𝒮 all sets S that contain a.
5) If 𝒮 is empty, return B; otherwise go to step 2.
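The steps above can be sketched directly; the example below uses hypothetical clauses and assumes unit weights for w(S).

```python
from collections import Counter

def johnson_reduct(clauses, weight=None):
    """Greedy Johnson heuristic: repeatedly pick the attribute whose total
    weight over the still-uncovered clauses is largest, then discard the
    clauses it covers, until every clause is covered."""
    weight = weight or (lambda s: 1)        # unit weights assumed by default
    remaining = [set(c) for c in clauses]   # working copy of the clause sets
    reduct = set()
    while remaining:
        scores = Counter()
        for s in remaining:
            for a in s:
                scores[a] += weight(s)
        best = max(sorted(scores), key=lambda a: scores[a])  # ties: by name
        reduct.add(best)
        remaining = [s for s in remaining if best not in s]
    return reduct

# a1 occurs in three of the four clauses, so it is picked first; the one
# clause left uncovered, {a2, a3}, is then covered by a2.
R = johnson_reduct([{"a1", "a2"}, {"a1", "a3"}, {"a1", "a4"}, {"a2", "a3"}])
```

Sorting the candidates before taking the maximum makes the arbitrary tie-break at least deterministic, which helps when comparing runs.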

Genetic Algorithm
Vinterbo and Øhrn [14] described a genetic algorithm for computing minimal hitting sets. The algorithm supports both cost information and approximate solutions. Its fitness function combines two terms: a cost term that rewards cheap (small) attribute subsets and a hitting-fraction term that rewards subsets intersecting many sets of 𝒮, the set of sets corresponding to the discernibility function. The parameter α defines the weighting between subset cost and hitting fraction, while ε is relevant in the case of approximate solutions.
Subsets B of A are found by an evolutionary search guided by this fitness function; when a subset B has a hitting fraction of at least ε, it is saved in a list whose size is arbitrary. The function cost specifies a penalty for an attribute (some attributes may be harder to collect); by default every attribute has unit cost. If ε = 1, minimal hitting sets are returned. In this algorithm the support count is computed as in Johnson's algorithm [15].
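An illustrative fitness in the spirit of this trade-off (not Vinterbo and Øhrn's exact formula, and assuming unit attribute costs): α weights subset cost against the fraction of discernibility sets hit, and ε caps the hitting fraction an approximate solution must reach.

```python
def hitting_fraction(subset, clauses):
    """Fraction of the discernibility sets that subset intersects."""
    return sum(1 for c in clauses if subset & c) / len(clauses)

def fitness(subset, clauses, all_attrs, alpha=0.5, epsilon=1.0):
    """Hypothetical fitness: (1 - alpha) * cheapness + alpha * hit term."""
    cost_term = 1 - len(subset) / len(all_attrs)        # unit costs assumed
    hit_term = min(hitting_fraction(subset, clauses), epsilon) / epsilon
    return (1 - alpha) * cost_term + alpha * hit_term

# Hypothetical clauses: a2 alone hits both, so the small subset {a2}
# scores higher than the full attribute set, which hits both but is costly.
CLAUSES = [frozenset({"a1", "a2"}), frozenset({"a2", "a3"})]
ATTRS = {"a1", "a2", "a3"}
f_full = fitness(ATTRS, CLAUSES, ATTRS)
f_a2 = fitness({"a2"}, CLAUSES, ATTRS)
```

The search thus prefers candidates that hit enough sets while staying small, which is exactly the pressure that drives the population toward (approximate) minimal hitting sets.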

Dynamic Reducts
The dynamic reduction algorithm combines normal reduct computation with resampling techniques [16,17].
The steps of the algorithm are as follows:
1) Randomly sample a family 𝓕 of subsystems from the decision system 𝒜, where each subsystem is built on a subset of the objects of 𝒜.
2) For each subsystem in 𝓕, including 𝒜 itself, compute a reduced attribute set using reduction rules.
3) Determine the most frequently generated reduced attribute sets among those obtained in the previous step.
The reducts that occur most often across sub-tables are, in some sense, the most "stable" [14].
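The resampling loop above can be sketched as follows, using a hypothetical toy table and a brute-force reduct routine in place of a real reducer; reducts that recur across many random subtables accumulate higher counts.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy decision table; key 'd' is the decision attribute.
TABLE = [
    {"a1": 0, "a2": 1, "a3": 0, "d": 0},
    {"a1": 1, "a2": 1, "a3": 1, "d": 1},
    {"a1": 1, "a2": 0, "a3": 1, "d": 1},
    {"a1": 0, "a2": 0, "a3": 0, "d": 2},
    {"a1": 0, "a2": 0, "a3": 1, "d": 2},
]
CONDS = ["a1", "a2", "a3"]

def reducts(table):
    """Brute-force reducts of a (sub)table: minimal condition subsets
    whose indiscernibility classes are decision-consistent."""
    def consistent(attrs):
        seen = {}
        for row in table:
            key = tuple(row[a] for a in attrs)
            if seen.setdefault(key, row["d"]) != row["d"]:
                return False
        return True
    found = []
    for r in range(1, len(CONDS) + 1):
        for s in combinations(CONDS, r):
            if consistent(s) and not any(set(f) <= set(s) for f in found):
                found.append(s)
    return found

def dynamic_reducts(table, n_samples=20, frac=0.8, seed=0):
    """Count how often each reduct recurs over random subtables; the most
    frequent ones are the most 'stable'."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        sub = rng.sample(table, max(2, int(frac * len(table))))
        counts.update(reducts(sub))
    counts.update(reducts(table))   # include the full table as well
    return counts

COUNTS = dynamic_reducts(TABLE)
```

Subtables may admit smaller reducts than the full table does, so the counter typically mixes several candidates; ranking by frequency is what separates the stable reducts from sampling accidents.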
After the reduction algorithms based on rough set theory have run, the decision rules they produce are used to determine the classification performance of the algorithms. The voting method, an ad hoc technique for rule-based classification, is used for classification. In voting, each matching rule casts votes for its class, and the class value that receives the most votes for an object becomes its predicted decision class.
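A minimal sketch of such standard voting, with a hypothetical rule set where each rule carries the support it earned on the training data:

```python
from collections import Counter

# Hypothetical rule set: (condition dict, predicted class, support count).
RULES = [
    ({"rooms": 2}, 0, 3),
    ({"rooms": 2, "balcony": 1}, 2, 5),
    ({"balcony": 1}, 2, 2),
    ({"age": 10}, 1, 4),
]

def vote(obj, rules):
    """Every rule whose conditions all match the object casts votes equal
    to its support; the class with the most votes is the prediction."""
    votes = Counter()
    for cond, cls, support in rules:
        if all(obj.get(a) == v for a, v in cond.items()):
            votes[cls] += support
    return votes.most_common(1)[0][0] if votes else None

pred = vote({"rooms": 2, "balcony": 1, "age": 5}, RULES)
```

For this object three rules fire, giving class 0 three votes and class 2 seven, so class 2 wins; objects matched by no rule are left unclassified (None) in this sketch.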

Application
In this paper, advertisements of real estate on an online platform, on which people can also buy and sell cars and a variety of goods and services, were collected for Istanbul between 9 October and 13 December 2018. The data set contains the sale prices of 250 residential real estate listings. Properties such as the number of rooms, the age of the building, the number of floors, and the presence of an elevator and a bathroom were recorded for each listing as explanatory variables. One explanatory variable was the district of the real estate; it had 5 classes, each representing a different district of Istanbul. The variables garage and balcony were coded as 1 if present and 0 otherwise, and the convenience (amenities) of a listing was likewise coded as 1 for yes and 0 otherwise. The dependent variable is the price of the real estate, converted to a categorical variable. To determine the class intervals, housing unit prices for Turkey (₺/m²) in 2018 were used (EVDS, Data Central) [18]. Accordingly, a price larger than 2315,17 TL was classified as 2, a price between 2118,52 TL and 2315,17 TL as 1, and 0 otherwise.
The data set is split 70% for training and 30% for testing. All computations are performed in the ROSETTA software, developed on the basis of rough set theory by Øhrn in 2001 [19]. First, the reduction algorithms are applied; then the decision rules they produce are used to determine the classification performance of the algorithms, with the voting method applied for classification. Accuracy, the standard error of accuracy and AUC values are compared to assess classification performance. AUC is a measure of separability: it indicates how well the model distinguishes between the classes.
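As a minimal sketch of the two evaluation ingredients (with illustrative values, not the paper's data): a seeded 70/30 split, and AUC computed directly from its definition as the probability that a random positive outscores a random negative, ties counting half.

```python
import random

def train_test_split(rows, test_frac=0.3, seed=0):
    """Shuffle a copy of the rows and cut off the last test_frac share."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def auc(scores, labels):
    """Pairwise AUC: a positive beats a negative it outscores, and
    earns half credit on ties."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test = train_test_split(list(range(10)))
a = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])   # perfectly separated scores
```

For a 3-class problem as here, this binary AUC is applied one-vs-rest per class, which is how the per-class scores in Table 2 should be read.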

Conclusion
In this paper, reduction algorithms based on rough set theory for efficient classification of real estate in Istanbul with a minimum set of attributes have been examined. The reduction algorithms were evaluated using the same classifier: the voting method. The housing unit prices of real estate for sale in different districts of Istanbul were obtained from an online web source, and 250 real estate listings were analyzed.

Definition 2: Indiscernibility Relation
Every subset of attributes B ⊆ A induces an indiscernibility relation IND(B) = {(x, x′) ∈ U × U : a(x) = a(x′) for every a ∈ B}. If two objects have the same values on all attributes of B, they cannot be discerned from each other on the basis of the set of attributes B. For every x ∈ U, there is an equivalence class [x]_B in the partition of U defined by IND(B).

The reduction results of the attribute reduction algorithms based on rough set theory are demonstrated in Table 1, which shows the number of reducts, the attributes in the reducts, the decision rules and the accuracy of the algorithms. With respect to the number of reducts, dynamic reducts produced the largest number, and the reducts ranged from 1 to 6 attributes. The number of decision rules obtained from the reducts is 1166 for dynamic reducts. On the training data, the success of Johnson's algorithm with at most 4 attributes equals that of the genetic algorithm with at most 5 attributes; on the test data, however, the genetic algorithm performed best among the reduction algorithms with 81.33%.

Table 1: Overall performance of attribute reduction algorithms

Based on the results given in Table 1, the performance of the standard voting classifier for each reduction is summarized in Table 2. Johnson's and the genetic algorithm performed well, at 98.8%, whereas dynamic reducts performed slightly worse, at 84%, for each class with respect to training accuracy. The genetic algorithm also achieved better test accuracy (81.3%) than the others. However, the smallest difference between training and test accuracy was obtained with the dynamic reducts algorithm.

Table 2: Classification performance of reduction algorithms

The AUC score for dynamic reducts (73.4%) means that a randomly chosen expensive instance is ranked into the class expensive rather than into the classes moderate or cheap with probability 73.4%. Hence, this score is better than those of Johnson's and the genetic algorithm for the expensive class. The AUC scores of the moderate class are considerably weak for all algorithms. The AUC score of 0.551 for the moderate class means that a randomly chosen moderate instance is ranked into this class rather than into the expensive or cheap classes with a probability of only 55.1%. The AUC score for the moderate class in the genetic algorithm indicates that this model separates the moderate class more poorly than the others do.