Efficient Data Processing Techniques for Structured Data Analysis Using Stream Pipeline Parallelism

Sampath Kini K.; D. K. Sreekantha

DOI: https://doi.org/10.54216/FPA.180109

Sampath Kini K.¹*, D. K. Sreekantha²

1 Sampath Kini K, Assistant Professor, Computer Science and Engineering, NITTE Deemed to be University, Karnataka, India - (sampath@nitte.edu.in)

2 Professor, Computer Science and Engineering, NMAM Institute of Technology, NITTE Deemed to be University, Karnataka, India - (sreekantha@nitte.edu.in)

Received: June 28, 2024 Revised: September 24, 2024 Accepted: December 27, 2024

Abstract

This research shows how dynamic task balancing and data sharing can improve distributed data processing. The approach addresses the difficulties that parallel processing systems face with huge datasets by reducing resource utilization and time complexity while improving throughput. After the data is split, the workload is adjusted on the fly so that all processing units receive equal work. A final optimization phase refines the job distribution to maximize system efficiency. We evaluate the solution for latency, throughput, scalability, resource utilization, fault tolerance, and synchronization overhead. The results indicate that the proposed strategy outperforms existing ones on every metric, with the lowest latency, the highest throughput, and the best scalability. The approach tolerates faults well, partitions data effectively, and keeps synchronization overhead low. These properties make it well suited to real-time data processing and fast-growing applications. Future work will concentrate on flexible splitting strategies, fault tolerance mechanisms, and machine learning models for predictive analytics, with the aim of improving real-time data handling.
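
The split-then-rebalance idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that chunk length stands in for processing cost and swaps in a simple greedy rule (largest chunk goes to the least-loaded worker). The function names `split`, `rebalance`, and `loads` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def split(records: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split the input into fixed-size chunks (the last one may be shorter)."""
    return [list(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]


def rebalance(chunks: List[List[int]], n_workers: int) -> Dict[int, List[List[int]]]:
    """Greedy balancing (an assumption, not the paper's method):
    hand the next-largest chunk to the currently least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment: Dict[int, List[List[int]]] = {w: [] for w in range(n_workers)}
    for chunk in sorted(chunks, key=len, reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(chunk)
        heapq.heappush(heap, (load + len(chunk), worker))
    return assignment


def loads(assignment: Dict[int, List[List[int]]]) -> Dict[int, int]:
    """Total number of records assigned to each worker."""
    return {w: sum(len(c) for c in cs) for w, cs in assignment.items()}


if __name__ == "__main__":
    chunks = split(list(range(10)), chunk_size=3)
    print(loads(rebalance(chunks, n_workers=2)))
```

In a real stream pipeline the load estimate would more plausibly come from measured per-worker latency or queue depth than from chunk length, and rebalancing would be repeated as new data arrives.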

Keywords:

Data Redistribution, Fault Tolerance, Latency, Parallel Processing, Predictive Analytics, Resource Utilization, Scalability, Synchronization Overhead, Throughput, Workload Balancing

Cite This Article As:

Sampath Kini K. and D. K. Sreekantha, "Efficient Data Processing Techniques for Structured Data Analysis Using Stream Pipeline Parallelism," Fusion: Practice and Applications, pp. 104-115, 2025. DOI: https://doi.org/10.54216/FPA.180109
