Volume 19, Issue 2, PP: 15-27, 2025 | Full Length Article
S. P. Samyuktha 1*, S. Renuka 2*, R. Shakthi Priyaa 3, Angel Meriba D. S. 4*, Maheshwari M. 5, Megavarshini M. 6, S. Malathi 7*
DOI: https://doi.org/10.54216/FPA.190202
In autonomous navigation, the ability to detect 3D objects from a Bird’s-Eye View (BEV) perspective is essential. Nevertheless, many obstacles remain before LiDAR and camera data can be combined effectively. We propose CL-FusionBEV, a novel sensor-fusion framework that enhances 3D object detection in the BEV domain. The method structures LiDAR point clouds for improved spatial feature extraction and converts camera data into the BEV representation via an implicit learning technique. An implicit fusion network and a multi-modal cross-attention mechanism enable seamless interaction between the sensors, ensuring comprehensive feature integration. Additionally, a BEV self-attention mechanism supports broad-scale reasoning and feature extraction, improving the detection of occluded and distant objects. By efficiently synchronizing data from multiple sensors, the proposed method improves feature consistency and resolves spatial misalignments. It further leverages adaptive feature selection to enhance robustness against sensor noise and varying conditions. We evaluate CL-FusionBEV on the nuScenes benchmark, achieving 73.3% mAP and 75.5% NDS, with vehicle and pedestrian detection accuracies of 89% and 90.7%, respectively. The model demonstrates superior robustness in challenging conditions such as low visibility and dense urban environments, and it maintains real-time inference, making it suitable for deployment in autonomous systems. Extensive experiments show that our approach consistently outperforms state-of-the-art methods, particularly in detecting small and distant objects. By addressing key sensor-fusion challenges in the BEV domain, CL-FusionBEV offers a notable advance in 3D object detection, delivering high accuracy, efficiency, and reliability in real-world driving scenarios.
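To make the fusion step concrete, the sketch below (PyTorch) shows one way a multi-modal cross-attention layer followed by BEV self-attention could combine camera-BEV and LiDAR-BEV feature maps. The module name, feature dimensions, and layer layout are illustrative assumptions for this sketch and do not reproduce the CL-FusionBEV implementation.

```python
# Minimal sketch, assuming camera and LiDAR features have already been lifted to a
# shared BEV grid. Names and dimensions are hypothetical; this is not the authors' code.
import torch
import torch.nn as nn


class BEVCrossAttentionFusion(nn.Module):
    """Camera-BEV features attend to LiDAR-BEV features, then the fused map is
    refined with self-attention over the whole BEV grid."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps from each modality.
        b, c, h, w = cam_bev.shape
        cam_seq = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_seq = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Cross-attention: each camera BEV cell gathers complementary LiDAR evidence.
        fused, _ = self.cross_attn(query=cam_seq, key=lidar_seq, value=lidar_seq)
        fused = self.norm1(fused + cam_seq)

        # BEV self-attention: broad-scale reasoning across the fused grid.
        refined, _ = self.self_attn(fused, fused, fused)
        refined = self.norm2(refined + fused)

        return refined.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    cam = torch.randn(1, 256, 32, 32)
    lidar = torch.randn(1, 256, 32, 32)
    out = BEVCrossAttentionFusion()(cam, lidar)
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

In this sketch the camera BEV grid queries the LiDAR BEV grid so each cell can borrow complementary geometric evidence; a symmetric LiDAR-to-camera branch and the implicit fusion network described above would sit alongside it in a full model.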
BEV-based vision, Three-dimensional object recognition, Attention-based model, Self-driving