Robust Principal Component Analysis in Multivariate Applications
Abstract
Principal component analysis (PCA) is a multivariate method that constructs components, each of which captures the maximal amount of variation in the data left unexplained by the other components. Classical PCA (CPCA) is based on the classical covariance matrix, which is easily affected by outliers, so outliers can distort the PCA result. Robust statistics offers a way to overcome this sensitivity. This study therefore presents an alternative technique for robust principal component analysis (RPCA), in which Index Set Equality (ISE) is used as a robust estimator to robustify CPCA. The Hawkins-Bradu-Kass dataset is used to illustrate the robustness of RPCA towards outliers compared with CPCA. Score plots, diagnostic plots, and performance measures are used to evaluate the robustness of RPCA. The plots and performance measures show that RPCA identifies all outliers and is unaffected by masking and swamping effects. CPCA, in contrast, detects only a proportion of 0.2857 of the outliers and has a masking rate of 0.7143, although it shows no swamping. These results suggest that RPCA with ISE as the robust estimator is a promising approach.
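The performance measures quoted in the abstract can be illustrated with a small sketch. The abstract does not define them, so the definitions below are assumptions: detection and masking are proportions of the true outliers, and swamping is the proportion of clean observations wrongly flagged. The sketch also assumes the usual description of the Hawkins-Bradu-Kass data (75 observations, of which the first 14 are outliers), which is consistent with 0.2857 = 4/14. The function name `outlier_performance` and the flagged index sets are hypothetical, not the authors' output.

```python
def outlier_performance(true_outliers, flagged, n_obs):
    """Return detection, masking, and swamping proportions (assumed definitions)."""
    true_set = set(true_outliers)
    flagged_set = set(flagged)
    n_out = len(true_set)          # number of true outliers
    n_in = n_obs - n_out           # number of clean observations
    detected = len(true_set & flagged_set)
    return {
        "detection": detected / n_out,                   # outliers correctly flagged
        "masking": (n_out - detected) / n_out,           # outliers that were missed
        "swamping": len(flagged_set - true_set) / n_in,  # clean points wrongly flagged
    }


# Assumption: observations 1..14 of the 75 are the true outliers.
true_outliers = range(1, 15)

# Hypothetical flagged sets, chosen only to reproduce the reported proportions.
cpca_flagged = [11, 12, 13, 14]
rpca_flagged = list(range(1, 15))

cpca = outlier_performance(true_outliers, cpca_flagged, n_obs=75)
rpca = outlier_performance(true_outliers, rpca_flagged, n_obs=75)

print({name: round(value, 4) for name, value in cpca.items()})
print({name: round(value, 4) for name, value in rpca.items()})
```

Under these assumptions, detection and masking always sum to one, so a method that finds every outlier has zero masking by construction; swamping is measured separately against the clean observations.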
Copyright (c) 2025 Sharifah Sakinah Syed Abd. Mutalib, Wan Nur Syahidah Wan Yusoff, Angelina Prima Kurniati, Nurul Aida Osman, Zulhelmy (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
https://doi.org/10.35877/454RI.asci3948



