PERFORMANCE VALIDATION OF PRIOR QUANTIZATION TECHNIQUES IN OUTLIERS CLASSIFICATION USING WDBC DATASET

  • Dr. D. Rajakumari, Assistant Professor, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India
Keywords: Data Mining, Classification, Outlier Detection, Feature Selection, Wisconsin Diagnosis Breast Cancer (WDBC)

Abstract

Data mining is the process of analyzing very large volumes of data and summarizing them into useful knowledge. The use of data mining approaches is growing quickly, particularly classification techniques, which are an efficient way of classifying data and are important in the decision-making process of medical practitioners. This study presents quantization and validation (OQV) techniques for fast outlier detection in large WDBC data sets. The use of distance metrics makes the algorithm linear in the number of objects and ensures sequential scanning. The inclusion of a direct quantization technique and explicit cluster discovery keeps the method simple and economical. A comparative analysis of the proposed OQV techniques against triangular boundary-based classification and Weighing-based Feature Selection and Monotonic Classification (WFSMC), in terms of accuracy, precision, recall, and the number of attributes, indicates the effectiveness of OQV for large datasets.
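
The general idea of quantization-based outlier flagging in a single sequential scan, followed by evaluation with accuracy, precision, and recall, can be illustrated with a minimal sketch. This is not the OQV algorithm itself, whose details are not given in the abstract. The grid quantization scheme, the function names (quantize, flag_outliers, evaluate), and the parameters cell_size and min_count are all assumptions introduced here for illustration only.

```python
from collections import Counter


def quantize(point, cell_size):
    """Map a point to the integer grid cell that contains it (assumed scheme)."""
    return tuple(int(x // cell_size) for x in point)


def flag_outliers(points, cell_size, min_count):
    """Flag points whose grid cell holds fewer than min_count points.

    Each point is quantized once in a sequential scan, so the cost grows
    linearly with the number of points.
    """
    cells = [quantize(p, cell_size) for p in points]
    counts = Counter(cells)
    return [counts[c] < min_count for c in cells]


def evaluate(predicted, actual):
    """Return (accuracy, precision, recall) for boolean outlier labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Toy data (hypothetical, not WDBC): three nearby points and one isolated point.
    points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (5.0, 5.0)]
    flags = flag_outliers(points, cell_size=1.0, min_count=2)
    print(flags)
    print(evaluate(flags, [False, False, False, True]))
```

In this sketch, sparsely populated grid cells stand in for the outlier criterion. A real evaluation on WDBC would need feature scaling and a choice of cell size suited to the data, neither of which is specified in the abstract.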

Published
2018-04-30
How to Cite
Rajakumari, D. (2018). PERFORMANCE VALIDATION OF PRIOR QUANTIZATION TECHNIQUES IN OUTLIERS CLASSIFICATION USING WDBC DATASET. International Journal of Engineering Technologies and Management Research, 5(4), 48-56. https://doi.org/10.29121/ijetmr.v5.i4.2018.207