Assessing Quality Attributes of Microservices in Hadoop and Spark Clusters: A Performance Benchmarking Approach in Dockerized and Non-Dockerized Architectures

¹ENETCom SFAX, ReDCAD Laboratory, University of Sfax, B.P. 1173, 3038 Sfax, Tunisia

Abstract

The rapid expansion of big data has accelerated the adoption of distributed computing frameworks such as Apache Hadoop and Apache Spark, which enable efficient large-scale data processing. While Spark’s in-memory computation model significantly improves performance over Hadoop’s traditional MapReduce, the deployment architecture (Dockerized or non-Dockerized) also plays a crucial role in performance, scalability, and resource management. This study evaluates the impact of containerized and non-containerized multi-node cluster architectures on the performance of Hadoop and Spark, using standardized workloads such as WordCount and TeraSort. Key performance metrics, including execution time, throughput, and resource utilization, are analyzed across various configurations with parameter tuning. Beyond pure performance benchmarking, the study also assesses the quality attributes of microservices in big data environments, focusing on scalability, maintainability, fault tolerance, and resource efficiency. A comparative analysis of monolithic and microservice-based architectures highlights the advantages of modularity and independent scaling inherent to microservices. Experimental findings indicate that Spark outperforms Hadoop on small to medium-scale workloads, while Hadoop is more robust when processing extremely large datasets. Dockerized deployments offer better resource isolation and management flexibility, whereas non-Dockerized setups incur less overhead under certain configurations. These insights contribute to optimizing deployment strategies and architectural decisions for microservice-based big data processing frameworks.

Received: March 11, 2025; Revised: June 10, 2025; Accepted: August 01, 2025

Keywords: Apache Hadoop; Apache Spark; Big Data; Microservices; Quality Attribute Assessment; Docker Containerization; Kubernetes; Multi-Node Clusters; Performance Benchmarking