# DETECTING OUTLIER IN THE MULTIVARIATE DISTRIBUTION USING PRINCIPAL COMPONENTS

## Authors

• Aldwin M. Teves, Institute of Arts and Sciences, Southern Leyte State University, Sogod, Southern Leyte, Philippines

## Keywords

Outliers, Principal Components, Eigenvalues, Proximity, Multivariate Distribution

## Abstract

Drawing sound inferences from the data at hand is crucial, so it makes sense to discard spurious observations before applying statistical analysis. This study advances a procedure for determining outliers based on the principal components of the original variables. These variables are sorted and given weights based on the magnitude of their inner product with the principal components formed from the centered and scaled variables. The weights are the corresponding variances explained by the principal components. The measure of proximity among observations is proportional to the variance (eigenvalue) associated with each principal component. The methodology defines two distinct subintervals, and the suspected outliers settle in one of them according to the proximity measure δo. On simulated data, the procedure detected 100 percent of the outliers when they came from a distinct distribution. It detected 98.7 percent when the distribution of outliers had the same variance-covariance matrix as the outlier-free distribution and only a slight difference in the mean vector.
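The general idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's procedure: the abstract does not give the formula for δo, the variable-sorting step, or the two subintervals, so everything specific here is an assumption. The sketch centers and scales the variables, takes the principal components of the resulting correlation matrix, and scores each observation by a sum of squared component scores weighted by the share of variance each component explains. The function names `pc_proximity` and `flag_outliers`, the weighted-sum score, and the median-plus-MAD cut-off are all hypothetical choices made for illustration.

```python
import numpy as np


def pc_proximity(X):
    """Eigenvalue-weighted squared PC scores (an assumed stand-in for the proximity measure)."""
    X = np.asarray(X, dtype=float)
    # Center and scale each variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of standardized data is the correlation matrix.
    R = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                 # principal component scores
    weights = eigvals / eigvals.sum()    # proportion of variance explained
    return (scores ** 2) @ weights


def flag_outliers(delta, k=3.0):
    """Assumed cut-off: median + k * scaled MAD. Not taken from the paper."""
    med = np.median(delta)
    mad = np.median(np.abs(delta - med))
    cutoff = med + k * 1.4826 * mad
    return delta > cutoff, cutoff


# Simulated data: 200 clean points plus 10 points with a shifted mean
# and the same covariance matrix (all values chosen for illustration).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
clean = rng.multivariate_normal(np.zeros(3), cov, size=200)
shifted = rng.multivariate_normal(np.full(3, 8.0), cov, size=10)
X = np.vstack([clean, shifted])

delta = pc_proximity(X)
is_outlier, cutoff = flag_outliers(delta)
print("cut-off:", round(float(cutoff), 3))
print("shifted points flagged:", int(is_outlier[-10:].sum()), "of 10")
print("clean points flagged:", int(is_outlier[:200].sum()), "of 200")
```

The cut-off splits the scores into two regions, loosely mirroring the two subintervals mentioned in the abstract. Because the score is right-skewed, a median-based threshold will also flag some clean points; the constant `k` trades detection rate against false alarms and would need tuning for real data.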

2023-05-03

## How to Cite

Teves, A. M. (2023). DETECTING OUTLIER IN THE MULTIVARIATE DISTRIBUTION USING PRINCIPAL COMPONENTS. International Journal of Engineering Science Technologies, 7(2), 107–113. https://doi.org/10.29121/ijoest.v7.i2.2023.488
