Volume 10, Issue 2, pp. 75-85, 2023 | Full Length Article
Omar A. abd Alwahab 1 *
DOI: https://doi.org/10.54216/FPA.100207
This paper presents an approach to outlier detection suited to multivariate data fusion. The main challenge with multivariate data is detecting outliers in more than two dimensions. To address this, the researcher developed a local-density-based method that compares the density at a given observation with the densities at its neighboring observations; such comparisons are usually expressed as an outlier score. Various density estimation functions and distance metrics were used in the study. Nadaraya-Watson (N-W) kernel regression was applied to multivariate data in combination with K-nearest neighbors (KNN). Finally, the Volcano (VOL) kernel estimate is treated as an essential method for outlier detection. In simulation experiments on multivariate data with 4, 6, and 8 variables and 60, 120, and 180 observations, the precision criterion showed that the N-W method outperforms the VOL method for outlier detection in multivariate data.
K-nearest neighbor, density of kernel function, outlier score, N-W regression, Volcano kernel method, data fusion.
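The local-density comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the adaptive bandwidth (distance to the k-th nearest neighbor), the Nadaraya-Watson-style weighted average of neighbor densities, the precision-at-n definition, and all function names are assumptions chosen for illustration. The Volcano kernel is not shown.

```python
import numpy as np


def knn(X, k):
    """Distances and indices of each row's k nearest neighbors (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


def outlier_scores(X, k=10):
    """Kernel-weighted average of neighbor densities divided by own density.

    Scores near 1 mean a point is about as dense as its neighborhood;
    large scores mean it sits in a much sparser region than its neighbors.
    """
    n, p = X.shape
    d, idx = knn(X, k)
    h = np.maximum(d[:, -1:], 1e-12)             # assumed bandwidth: k-th neighbor distance
    w = np.exp(-0.5 * (d / h) ** 2)              # Gaussian kernel weights, shape (n, k)
    dens = w.sum(axis=1) / (k * (2 * np.pi) ** (p / 2) * h[:, 0] ** p)
    # Nadaraya-Watson-style estimate of the density "expected" from the neighbors
    expected = (w * dens[idx]).sum(axis=1) / w.sum(axis=1)
    return expected / dens


def precision_at_n(scores, is_outlier):
    """Fraction of true outliers among the n highest scores, n = number of true outliers."""
    n_out = int(is_outlier.sum())
    top = np.argsort(scores)[::-1][:n_out]
    return float(is_outlier[top].mean())


# Hypothetical data: 60 observations on 4 variables plus 3 planted outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(60, 4))
planted = np.array([[8.0, 8, 8, 8], [-8.0, -8, -8, -8], [8.0, -8, 8, -8]])
X = np.vstack([inliers, planted])
labels = np.r_[np.zeros(60, dtype=bool), np.ones(3, dtype=bool)]

scores = outlier_scores(X, k=10)
print("precision at n:", precision_at_n(scores, labels))
```

Tying the bandwidth to the k-th neighbor distance keeps the kernel weights away from zero even for isolated points, so the score stays well defined; a fixed bandwidth would be an equally reasonable choice but needs tuning to the data scale.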