DETECTING OUTLIER IN THE MULTIVARIATE DISTRIBUTION USING PRINCIPAL COMPONENTS
Keywords: Outliers, Principal Components, Eigenvalues, Proximity, Multivariate Distribution
Drawing sound inferences from the data at hand is crucial, so it makes sense to discard spurious observations before applying statistical analysis. This study advances a procedure for detecting outliers based on the principal components of the original variables. The variables are sorted and weighted according to the magnitude of their inner products with the principal components, which are formed from the centered and scaled variables. The weights are the variances explained by the corresponding principal components. The measure of proximity among observations is proportional to the variances (eigenvalues) associated with the principal components. The methodology defines two distinct subintervals, and suspected outliers settle in one of them according to their proximity measures δo. On simulated data, the procedure detected 100 percent of the outliers when they came from a distinct distribution. It detected 98.7 percent when the outlier distribution had the same variance-covariance matrix as the outlier-free distribution and a slightly different mean vector.
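The abstract does not give the formula for δo or the rule that defines the two subintervals, so the following Python sketch is only an illustration of the general idea, not the procedure of the study. It assumes that proximity can be approximated by the sum of squared principal component scores, each weighted by the share of variance (eigenvalue) of its component, and it swaps in a conventional median-plus-MAD cutoff for the subinterval rule. The names `pc_proximity`, `flag_outliers`, and the parameter `k` are hypothetical.

```python
import numpy as np


def pc_proximity(X):
    """Eigenvalue-weighted squared PC scores, one value per observation.

    Hypothetical stand-in for the proximity measure; the exact
    definition used in the study is not stated in the abstract.
    """
    X = np.asarray(X, dtype=float)
    # Center and scale each variable (sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Principal components of the standardized data come from the
    # eigen-decomposition of the correlation matrix.
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                      # PC scores per observation
    weights = eigvals / eigvals.sum()         # proportion of variance explained
    return (scores ** 2) @ weights


def flag_outliers(delta, k=3.0):
    """Flag values above median + k * scaled MAD (assumed cutoff rule)."""
    med = np.median(delta)
    mad = np.median(np.abs(delta - med))
    return delta > med + k * 1.4826 * mad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(200, 3))    # outlier-free sample
    shifted = rng.normal(10.0, 1.0, size=(10, 3))  # clearly distinct distribution
    delta = pc_proximity(np.vstack([clean, shifted]))
    flags = flag_outliers(delta)
    print("planted outliers flagged:", int(flags[200:].sum()), "of 10")
    print("clean points flagged:", int(flags[:200].sum()), "of 200")
```

Because the outliers themselves inflate the means and standard deviations used for scaling, a cutoff of this kind can also flag some clean observations. A robust centering and scaling step would reduce that effect but would depart further from the centered and scaled variables described above.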
Copyright (c) 2023 Aldwin M. Teves
This work is licensed under a Creative Commons Attribution 4.0 International License.