Abstract:
The reliability and accuracy of perception algorithms are particularly important in autonomous driving: the prerequisite for completing downstream tasks is the ability to accurately obtain and describe target information under varied lighting and weather conditions. Given the limitations of single-sensor data acquisition, a feature-level fusion method based on channel-attention and cross-attention mechanisms is proposed to fuse 4D millimeter-wave radar point clouds with camera images. First, the data collected by the millimeter-wave radar and the camera are spatially aligned through coordinate-system conversion. Then, the aligned point clouds are projected into pseudo-images, and the points in these pseudo-images are expanded to improve fusion matching. Finally, the proposed cross-attention feature fusion network merges the extracted camera and point-cloud features, enhancing the key features in the camera images. Experimental results demonstrate that the proposed fusion structure effectively integrates features from camera images and pseudo-images, thereby improving the network's detection performance across target categories.
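The spatial-alignment step summarized above can be sketched as a standard extrinsic-plus-intrinsic projection of radar points onto the image plane. This is a minimal illustration, not the paper's actual calibration pipeline; the rotation `R`, translation `t`, and intrinsic matrix `K` below are hypothetical placeholder values.

```python
import numpy as np

def project_radar_to_image(points_radar, R, t, K):
    """Project Nx3 radar-frame points into Nx2 pixel coordinates."""
    pts_cam = points_radar @ R.T + t      # radar frame -> camera frame (extrinsics)
    pts_cam = pts_cam[pts_cam[:, 2] > 0]  # keep only points in front of the camera
    uv = pts_cam @ K.T                    # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]         # perspective divide

# Toy example: identity extrinsics and a simple pinhole intrinsic matrix.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 10.0]])  # one point 10 m straight ahead
print(project_radar_to_image(pts, R, t, K))  # -> [[320. 240.]] (image center)
```

A point on the optical axis projects to the principal point, as expected; in practice the projected sparse points would then be rasterized and expanded into the pseudo-image used for fusion.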