Volume 19, Issue 2, PP: 15-27, 2025 | Full Length Article
S. P. Samyuktha 1*, S. Renuka 2*, R. Shakthi Priyaa 3, Angel Meriba D. S. 4*, Maheshwari M. 5, Megavarshini M. 6, S. Malathi 7*
DOI: https://doi.org/10.54216/FPA.190202
In autonomous navigation, the ability to detect 3D objects from a Bird’s-Eye View (BEV) perspective is essential. Nevertheless, many obstacles remain before LiDAR and camera data can be combined effectively. We propose CL-FusionBEV, a novel sensor-fusion framework that enhances 3D object detection in the BEV domain. The method structures LiDAR point clouds for improved spatial feature extraction and converts camera data into the BEV representation via an implicit learning technique. An implicit fusion network and a multi-modal cross-attention mechanism enable seamless interaction between the sensors, ensuring comprehensive feature integration. Additionally, a BEV self-attention mechanism supports broad-scale reasoning and feature extraction, improving the detection of occluded and distant objects. By efficiently synchronizing data from multiple sensors, the proposed method improves feature consistency and resolves spatial misalignments. It further leverages adaptive feature selection to enhance robustness against sensor noise and varying conditions. We evaluate CL-FusionBEV on the nuScenes benchmark, achieving 73.3% mAP and 75.5% NDS, with vehicle and pedestrian detection accuracies of 89% and 90.7%, respectively. The model demonstrates superior robustness in challenging conditions such as low visibility and dense urban environments, and it maintains real-time inference, making it suitable for deployment in autonomous systems. Extensive experiments show that our approach consistently outperforms state-of-the-art methods, particularly in detecting small and distant objects. By addressing key sensor-fusion challenges in the BEV domain, CL-FusionBEV offers a notable advance in 3D object detection, delivering high accuracy, efficiency, and reliability in real-world driving scenarios.
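To make the fusion step concrete, the sketch below (PyTorch) shows one way a multi-modal cross-attention layer followed by BEV self-attention could combine camera-BEV and LiDAR-BEV feature maps. The module name, feature dimensions, and layer layout are illustrative assumptions for this sketch and do not reproduce the CL-FusionBEV implementation.

```python
# Minimal sketch, assuming camera and LiDAR features have already been lifted to a
# shared BEV grid. Names and dimensions are hypothetical; this is not the authors' code.
import torch
import torch.nn as nn


class BEVCrossAttentionFusion(nn.Module):
    """Camera-BEV features attend to LiDAR-BEV features, then the fused map is
    refined with self-attention over the whole BEV grid."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps from each modality.
        b, c, h, w = cam_bev.shape
        cam_seq = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_seq = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Cross-attention: each camera BEV cell gathers complementary LiDAR evidence.
        fused, _ = self.cross_attn(query=cam_seq, key=lidar_seq, value=lidar_seq)
        fused = self.norm1(fused + cam_seq)

        # BEV self-attention: broad-scale reasoning across the fused grid.
        refined, _ = self.self_attn(fused, fused, fused)
        refined = self.norm2(refined + fused)

        return refined.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    cam = torch.randn(1, 256, 32, 32)
    lidar = torch.randn(1, 256, 32, 32)
    out = BEVCrossAttentionFusion()(cam, lidar)
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

In this sketch the camera BEV grid queries the LiDAR BEV grid so each cell can borrow complementary geometric evidence; a symmetric LiDAR-to-camera branch and the implicit fusion network described above would sit alongside it in a full model.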
BEV-based vision, Three-dimensional object recognition, Attention-based model, Self-driving