Journal of Intelligent Systems and Internet of Things

Journal DOI

https://doi.org/10.54216/JISIoT

ISSN (Online): 2690-6791 | ISSN (Print): 2769-786X

Volume 12, Issue 2, pp. 150-165, 2024 | Full Length Article

Filtering Big Data with Optimized Hybrid Algorithm for IoT-Based Data Selection

Sarvesh Kumar 1, Satyajee Srivastava 2, Surendra Kumar 3*, Arun Kumar Saini 4, Neeraj Verma 5, Dhiraj Kapila 6

  • 1 Department of CSE, IcfaiTech, The ICFAI University, Jaipur, India - (skumarcse4@gmail.com)
  • 2 School of Computer Science and Engineering, Galgotias University, Uttar Pradesh, India - (drsatyajee@gmail.com)
  • 3 Department of Computer Engineering and Applications, GLA University, Mathura, India - (kumar.surendra1989@gmail.com)
  • 4 Department of CSE, IcfaiTech, The ICFAI University, Jaipur, India - (arunsaini1@gmail.com)
  • 5 Department of CSE, IcfaiTech, The ICFAI University, Jaipur, India - (er.neerajkumar@gmail.com)
  • 6 Department of Computer Science & Engineering, Lovely Professional University, Punjab, India - (dhiraj.23509@lpu.co.in)
  • DOI: https://doi.org/10.54216/JISIoT.120211

    Received: August 22, 2023 | Revised: November 07, 2023 | Accepted: April 19, 2024
    Abstract

    Data management across servers has grown problematic because of technological advancements in data processing and storage capacities. Data that is neither organized nor labelled adds a further layer of difficulty to storage and retrieval. Such untagged data requires analytic techniques that are more powerful and more time efficient. Clustering has long been regarded as one of the most effective methods for managing large amounts of data; nonetheless, larger volumes can lead to unexpectedly poor accuracy when conventional clustering methodologies are used. In this study, we propose a novel framework for clustering large amounts of data. Preprocessing is one of the most important parts of the data cleansing process; hence, a global stop-word list is used to filter the contents of the files before they are passed to the cluster distribution stage. A meta-heuristic Genetic Algorithm (GA) is used to eliminate redundant information present in the datasets. In addition to the generalized attributable fitness function, a new attribute-based fitness function (f) is developed. To determine how well the proposed method performs, it is compared with a variety of alternative clustering approaches. To evaluate and compare the resulting cluster distributions, the Standard Error (SE), root mean squared error (RMSE), and corrected R-squared are computed.
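The three evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which quantities are compared, so the function name `evaluation_metrics`, the `observed`/`predicted` inputs, and the `n_predictors` parameter are assumptions. "SE" is interpreted here as the standard error of the estimate and "corrected R squared" as the usual adjusted R-squared.

```python
import math


def evaluation_metrics(observed, predicted, n_predictors=1):
    """Return RMSE, standard error of the estimate, and adjusted R^2.

    Assumption: `observed` and `predicted` are equal-length sequences of
    numbers (for example, reference vs. produced cluster sizes). The
    abstract does not specify the inputs, so this is illustrative only.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")

    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    rmse = math.sqrt(ss_res / n)    # root mean squared error
    se = math.sqrt(ss_res / dof)    # standard error of the estimate
    r2 = 1.0 - ss_res / ss_tot      # plain coefficient of determination
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof  # adjusted R^2
    return {"rmse": rmse, "se": se, "adj_r2": adj_r2}
```

Under these assumptions, a lower RMSE and SE together with an adjusted R-squared closer to 1 would indicate that one clustering method reproduces the reference distribution more closely than another.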

    Keywords:

    Meta-heuristic, Internet of Things, Data selection, K-Means Clustering, K-Medoid, Genetic Algorithm


    Cite This Article As:
    Kumar, Sarvesh; Srivastava, Satyajee; Kumar, Surendra; Saini, Arun Kumar; Verma, Neeraj; Kapila, Dhiraj. Filtering Big Data with Optimized Hybrid Algorithm for IoT-Based Data Selection. Journal of Intelligent Systems and Internet of Things, vol. 12, no. 2, 2024, pp. 150-165. DOI: https://doi.org/10.54216/JISIoT.120211