Volume 27, Issue 2, pp. 497-503, 2026 | Full Length Article
Abdulnaser Rashid 1, Zahra I. Mahmoud 2, Mawahib Elamin 3, Amel H. Abdalla 4, Adil O. Y. Mohamed 5*
DOI: https://doi.org/10.54216/IJNS.270240
The increasing adoption of large-scale machine learning (ML) applications has exposed critical performance limitations in current data processing pipelines, particularly the separation between relational query execution and ML inference. This separation introduces redundant computation, excessive data materialization, and inefficient utilization of GPU matrix-processing resources [10]. In this paper, we present a unified execution framework that integrates relational query processing and ML prediction by representing both as linear algebra operations. Leveraging algebraic properties such as associativity and distributivity, we introduce an operator fusion strategy [8] that enables query operators and ML models to be executed jointly on GPU matrix-processing architectures [10]. This approach reduces intermediate data movement and enables end-to-end pipeline execution within a single linear algebra runtime. We analyze the computational complexity of the proposed fusion strategy and discuss its applicability to the star-schema workloads common in analytical systems. Experimental insights from prior studies indicate that linear algebra-based query execution combined with operator fusion [8] can yield substantial performance improvements over conventional GPU-accelerated pipelines [10], while maintaining scalability and portability. The proposed framework provides a viable foundation for future data-intensive systems that aim to unify analytics and machine learning on heterogeneous computing platforms [1-3,14-16]. This work unifies relational query processing and ML inference within a single algebraic runtime on GPUs, rather than coupling independent GPU-accelerated stages, thereby enabling cross-stage optimization and eliminating redundant materialization.
Unlike existing GPU-accelerated databases and tensor-based query processors, the proposed framework provides a system-level unification of relational analytics and machine learning inference, rather than treating them as isolated or sequential stages. The framework is backend-agnostic and applicable to modern tensor runtimes and heterogeneous accelerator platforms, making it suitable for next-generation data-intensive systems.
Linear Algebra-based Query Processing (LAQ), Operator Fusion, GPU Matrix Processing, Acceleration, Sparse Matrix Computation, SpMM, Machine Learning Inference, Physical Matrix Design
[1] W. Sun, A. Katsifodimos, and R. Hai, “Accelerating machine learning queries with linear algebra query processing,” in Proc. Int. Conf. Scientific and Statistical Database Management (SSDBM), 2023.
[2] S. Luo, D. Jankov, B. Yuan, and C. Jermaine, “Automatic optimization of matrix implementations for distributed machine learning and linear algebra,” in Proc. ACM SIGMOD Int. Conf. Management of Data, 2021.
[3] N. Malaya et al., "Accelerating matrix processing with GPUs (invited)," AMD Research/IEEE, 2017.
[4] J. Kepner and J. Gilbert, Graph Algorithms in the Language of Linear Algebra. Philadelphia, PA, USA: SIAM, 2011.
[5] N. Bell and M. Garland, “Efficient sparse matrix-vector multiplication on CUDA,” NVIDIA Technical Report, 2008.
[6] S. Nakandala et al., “Hummingbird: Compiling trained ML models into tensor computations,” in Proc. USENIX Conf. Operating Systems Design and Implementation (OSDI), 2020.
[7] D. He et al., “Query processing on tensor computation runtimes,” Proc. VLDB Endowment, vol. 15, no. 4, pp. 833–845, 2022.
[8] P. Bakkum and K. Skadron, “Accelerating SQL database operations on a GPU with CUDA,” in Proc. General-Purpose Computation on Graphics Processing Units (GPGPU), 2010.
[9] M. Boehm et al., “SystemDS: Declarative machine learning system,” Proc. VLDB Endowment, vol. 13, no. 12, pp. 2929–2942, 2020.
[10] D. Abadi et al., “Column-oriented database systems,” Proc. VLDB Endowment, vol. 1, no. 2, pp. 1664–1665, 2008.
[11] M. Stonebraker and J. Hellerstein, Readings in Database Systems, 5th ed. Cambridge, MA, USA: MIT Press, 2015.
[12] M. Zaharia et al., “Spark SQL: Relational data processing in Spark,” in Proc. ACM SIGMOD Int. Conf. Management of Data, 2016, pp. 1383–1394.
[13] T. Mattson et al., “The GraphBLAS C API specification,” in Proc. IEEE High Performance Extreme Computing Conf. (HPEC), 2013.
[14] S. Williams et al., “Roofline: An insightful visual performance model for multicore architectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, 2009.
[15] NVIDIA, cuSPARSE Library Documentation, 2023.
[16] RAPIDS AI, cuDF Library Documentation, 2023.
[17] E. Anderson et al., LAPACK Users' Guide, 3rd ed. Philadelphia, PA, USA: SIAM, 1999.
[18] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proc. USENIX Conf. Operating Systems Design and Implementation (OSDI), 2004.
[19] J. Tang et al., “CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication,” in Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), 2015.
[20] R. Vuduc et al., “OSKI: A library of automatically tuned sparse matrix kernels,” Univ. California, Berkeley, Tech. Rep., 2005.