ENHANCED SVM CLASSIFIER FOR BREAST CANCER DIAGNOSIS

Abstract: Breast cancer is a leading cause of death, especially among women. In this paper, a framework-based algorithm for classifying cancerous and non-cancerous data is developed using supervised machine learning. For feature selection, we derive a basis set in the kernel space and then extend the margin-based feature selection algorithm. We explore several feature selection and extraction techniques and combine the optimal feature subsets with various classification methods, namely KNN, PNN and Support Vector Machine (SVM) classifiers. The best classification performance for breast cancer diagnosis, 99.17%, is attained with the radius and compact features using the SVM classifier. We also derive the features of a breast image in the WBCD dataset.


Introduction
Breast cancer (malignant breast neoplasm) is cancer originating from breast tissue. There is therefore a need for a reliable and objective classification tool for detecting breast cancer cases and classifying them as benign or malignant. Machine learning and neural networks help by providing such a tool.
A neural network consists of an interconnected group of artificial neurons [9], and it processes information using a connectionist approach to computation. Modern neural networks are nonlinear statistical data modeling tools.
Machine learning is an emerging approach to classifying and computing results that helps technicians make decisions in an environment of uncertainty and imprecision. Unlike traditional hard computing, a neural network aims to accommodate the pervasive imprecision of the real world. Its guiding principle is thus to exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability. NN models can be classified according to various criteria, such as their learning method, architecture, implementation and operation. The scope of this project is to model problems with desired input-output data sets, so the resulting network must have adjustable parameters that are updated by a supervised learning rule. Within supervised learning, the perceptron is one kind of classifier that performs classification in two-dimensional space. There is a need for an algorithm capable of classifying candidates that have the closest similarities. The support vector machine is such an algorithm, and it has a direct bearing on machine intelligence. The support vector machine [24] can be used as an effective classification tool for many kinds of dataset, even nonlinear ones. In this project, the dataset of cancerous and noncancerous breast cancer cases is taken as input for training and classified in the feature space using a hyperplane, and the classification performance is also calculated.

Materials and Methods
In the past, feature selection was done by the filter approach [7], which does not account for the bias of the induction algorithm. Earlier, classification was done by KNN and PNN. The proposed system uses a novel combination: the wrapper approach for feature selection and SVM for classification. The wrapper approach [6] uses the induction algorithm itself, handles complex datasets, and estimates accuracy by leave-one-out cross-validation (LOOCV). SVM produces minimum error rates and reduces misclassifications.
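The wrapper idea with LOOCV can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a 1-nearest-neighbor rule stands in for the induction algorithm, the search is an exhaustive scan over feature pairs, and the names `nearest_neighbor`, `loocv_accuracy` and `best_feature_pair` are hypothetical.

```python
import math
from itertools import combinations


def nearest_neighbor(train_X, train_y, x):
    """Stand-in induction algorithm (assumption): label of the closest training sample."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def loocv_accuracy(X, y, classify):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if classify(rest_X, rest_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def best_feature_pair(X, y, classify):
    """Wrapper selection: score every pair of feature columns with the classifier itself."""
    def score(pair):
        projected = [[row[j] for j in pair] for row in X]
        return loocv_accuracy(projected, y, classify)

    n_features = len(X[0])
    return max(combinations(range(n_features), 2), key=score)
```

Because the classifier is passed in as a function, the same loop could score an SVM or any other induction algorithm. The cost grows with the number of candidate subsets times the number of samples, which is the usual price of the wrapper approach compared with a filter.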

Feature Selection in Kernel Space
Step 1: Construct a basis set by either kernel GP or kernel PCA.
Step 2: Calculate the weights by kernel RELIEF.
Step 3: Rank the features by weight.
Step 4: Select features based on the rank.
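Steps 2 to 4 can be illustrated with a small sketch. Note the substitution: this is plain Relief computed in the input space, not kernel RELIEF, and Step 1 (the kernel basis) is omitted. The function names are hypothetical and the nearest-hit/nearest-miss update is the classic Relief rule, assumed here for illustration.

```python
import math


def relief_weights(X, y):
    """Plain Relief (assumption, not kernel RELIEF): a feature gains weight when it
    differs on the nearest miss and loses weight when it differs on the nearest hit."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # sample has no same-class or other-class neighbor
        h = min(hits, key=lambda j: math.dist(X[i], X[j]))
        m = min(misses, key=lambda j: math.dist(X[i], X[j]))
        for k in range(d):
            w[k] += abs(X[i][k] - X[m][k]) - abs(X[i][k] - X[h][k])
    return [wk / n for wk in w]


def rank_features(weights):
    """Step 3: feature indices ordered from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)


def select_top(X, ranking, k):
    """Step 4: keep only the k best-ranked feature columns."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]
```

On a toy set where one feature separates the classes and another is noise, the separating feature should receive the larger weight and be ranked first.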

KNN (K-Nearest Neighbor)
The KNN classifier is one of the simplest and oldest methods for general, nonparametric classification [4]. In this model, the distances between the test sample and all the samples in the training set are first measured. Then the class of the test sample is assigned by a simple majority vote over the labels of its K nearest neighbors.
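The two stages described above, distance measurement followed by a majority vote, can be sketched in a few lines. Euclidean distance, the default K = 3, and the name `knn_predict` are assumptions made for illustration; the paper does not state which distance or K it used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote over its k nearest training samples."""
    # Measure the distance from x to every training sample.
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Keep the k closest and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature data; the values are illustrative, not taken from WBCD.
train_X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))
```

An odd K avoids tied votes in a two-class problem such as benign versus malignant.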

PNN (Probabilistic Neural Network)
The Probabilistic Neural Network (PNN) was proposed by Specht in 1988 [18]. It is designed to improve on conventional neural networks, which require long computation times. PNN replaces the sigmoid activation function often used in neural networks with a statistically derived exponential function. The PNN is an extension of what is probably the simplest possible classifier, i.e., find the training sample closest to the test sample and assign the test sample the same class. A single PNN is capable of handling a multiclass problem. This is in contrast to the so-called one-against-the-rest or one-per-class approach taken by some classifiers, such as the SVM, which decompose a multiclass classification problem into dichotomies, where each dichotomizer has to separate a single class from all others [19].
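A minimal sketch of the PNN decision rule follows, assuming a Gaussian (exponential) kernel with a single smoothing parameter `sigma` shared by all classes. The name `pnn_predict` and the default `sigma=1.0` are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def pnn_predict(train_X, train_y, x, sigma=1.0):
    """Return the class whose average Gaussian kernel response to x is largest."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        # Exponential activation in place of a sigmoid unit.
        sums[yi] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[yi] += 1
    # One score per class, so multiclass problems need no decomposition.
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get)
```

As `sigma` shrinks, the closest training sample dominates its class score and the rule approaches the nearest-neighbor classifier mentioned above; larger values smooth the class densities.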

Representation of Hyper Plane
The goals of SVM are to separate the data with a hyperplane and to extend this to nonlinear boundaries using the kernel trick [15]. In computing the SVM, the goal is to classify all the data correctly. For mathematical calculations we have,
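As a hedged sketch, the standard textbook hard-margin formulation is given here. The symbols (weight vector w, bias b, training samples x_i with labels y_i in {-1, +1}, multipliers alpha_i, kernel K) are notation assumed for illustration and are not taken from the paper.

```latex
\begin{align}
  % Separating hyperplane
  \mathbf{w}^{\top}\mathbf{x} + b &= 0 \\
  % Every training sample is classified correctly with margin at least 1
  y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) &\ge 1, \qquad i = 1,\dots,n \\
  % Maximizing the margin 2/\lVert\mathbf{w}\rVert is equivalent to
  \min_{\mathbf{w},\,b}\ &\tfrac{1}{2}\,\lVert\mathbf{w}\rVert^{2} \\
  % Decision function after applying the kernel trick
  f(\mathbf{x}) &= \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
\end{align}
```

Only the samples with nonzero alpha_i, the support vectors, contribute to the decision function, and replacing the inner product with K is what allows nonlinear boundaries.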

Results and Discussions
For the feature selection process, the whole medical dataset is separated into two halves, one for training and one for testing. This split is done automatically by the SVM classifier. After SVM classification, the classification performance is obtained as a percentage, as shown in Figures 4, 5, 6 and 7.
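The half-and-half split and the percentage score can be sketched as below. How the split was actually performed is not described, so the seeded random shuffle is an assumption, and `split_halves` and `accuracy_percent` are hypothetical helper names.

```python
import random


def split_halves(X, y, seed=0):
    """Shuffle sample indices (assumed strategy) and cut them into two halves."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    train, test = indices[:half], indices[half:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy_percent(y_true, y_pred):
    """Classification performance: share of correct predictions, in percent."""
    if not y_true:
        raise ValueError("no test samples to score")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)
```

Fixing the seed keeps the split reproducible, so performance figures for different feature pairs are measured on the same test half.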

Conclusions and Recommendations
We have classified many features using SVM and obtained the classification performance percentage for each. Comparing all the features, compact and radius have the maximum classification performance. These two features are therefore considered efficient features that carry the most relevant information about breast cancer. The feature selection process has been completed.
In this work, feature selection with SVM is done by training and testing on the WBC data, and the classification performance for breast cancer diagnosis is determined. Experimental results, along with the classification percentages, show that among all features, those with the highest classification performance, radius and compact, are identified as the best features for breast cancer diagnosis. Classification through a machine learning algorithm such as the support vector machine is a reliable and efficient methodology, and it can be used for any kind of classification provided reliable data sets are available. Beyond the medical field, this kind of classification can be applied in developing and unpredictable fields such as stock market exchange, weather conditions, prediction of natural calamities, automobile MPG prediction and so on. The best classification performance for breast cancer diagnosis, 99.17%, is attained with the radius and compact features using the SVM classifier.

Input: training data x_i, label y_i
Output: selected features in the kernel space
1: Construct a basis set by either Kernel GP or Kernel PCA
2: Calculate w_i by Kernel Relief
3: Rank implicit features by w_i and select features based on the rank
4: Project the data into the learned subspace

ISSN: 2454-1907 DOI: 10.5281/zenodo.1207413
Http://www.ijetmr.com ©International Journal of Engineering Technologies and Management Research