Fusion: Practice and Applications

Journal DOI

https://doi.org/10.54216/FPA

ISSN (Online): 2692-4048 | ISSN (Print): 2770-0070

Volume 18, Issue 1, pp. 269-287, 2025 | Full Length Article

Fusion Data Framework for Enhanced Outlier Detection Integrating Statistical and Machine Learning Techniques for Retail Analytics

Botirjon Karimov 1,*, Murodjon Sultanov 2, Jasurbek Nematullaev 3

  • 1 University of Tasmania, Hobart city, Australia - (Botirjon.Karimov@utas.edu.au)
  • 2 Tashkent State University of Economics, Uzbekistan, 100066, Tashkent city, Islam Karimov st. 49 - (murodkhan.sultanov.1987@gmail.com)
  • 3 University of Liverpool, Liverpool city, UK - (jasurbecknematullaev@gmail.com)
  • DOI: https://doi.org/10.54216/FPA.180119

    Received: July 17, 2024 Revised: October 19, 2024 Accepted: January 02, 2025
    Abstract

    This paper surveys widely used outlier detection methods that can be applied in the retail sector to address problems such as fraud, inventory issues, and atypical customer behavior. The techniques covered include conventional statistical methods (Z-score, Mahalanobis Distance, and Elliptic Envelope) and machine learning methods (Local Outlier Factor (LOF), Isolation Forest, and DBSCAN). Each method is discussed in detail, and its advantages and disadvantages are evaluated across different retail scenarios. The primary contribution of this study is a new approach that uses Artificial Neural Networks (ANN) to tune the contamination parameter of the Elliptic Envelope model, which makes anomaly detection more accurate and efficient. The study also applies min-max scaling to normalize the features, which helps reduce the effect of outliers and thus improves model performance. The results show that integrating statistical and machine learning methods is useful for real-time anomaly detection, particularly in the fast-changing retail environment. This research offers practical insights and new methodological approaches that may be useful to researchers and practitioners who develop outlier detection systems. The outcomes of this study have the potential to enhance data fusion quality, workflow, and decision-making in retail.
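As a rough illustration of two of the building blocks named in the abstract, the following minimal sketch applies min-max scaling and a Z-score rule to a single feature. It is not the authors' implementation: the function names, the sample transaction amounts, and the threshold of 3 are assumptions chosen for the example, and only the Python standard library is used.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Return each value's Z-score, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def z_score_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical transaction amounts; the last value is an injected anomaly.
amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.7, 49.1, 50.4, 51.0, 48.9, 500.0]

scaled = min_max_scale(amounts)      # every value now lies in [0, 1]
flagged = z_score_outliers(amounts)  # indices of suspected outliers
print(flagged)
```

With very small samples a single extreme value inflates the standard deviation enough that no Z-score can exceed 3, so the threshold usually needs to be chosen with the sample size in mind.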

    Keywords:

    Data fusion, Retail, Outlier detection, Z-score, Elliptic Envelope, Local Outlier Factor, Isolation Forest, DBSCAN, Mahalanobis Distance

    Cite This Article As:
    Karimov, Botirjon, Sultanov, Murodjon, and Nematullaev, Jasurbek. "Fusion Data Framework for Enhanced Outlier Detection Integrating Statistical and Machine Learning Techniques for Retail Analytics." Fusion: Practice and Applications, vol. 18, no. 1, 2025, pp. 269-287. DOI: https://doi.org/10.54216/FPA.180119