An Object Recognition Algorithm Based on Multi-modal Feature Fusion and Dynamic Attention Mechanism

丛潇雨; 戴少怀; 陈铖; 单世臣; 韩玉兵

doi:10.16592/j.cnki.1004-7859.2025062903


Abstract

Single-modal object recognition is easily degraded by weather conditions, interference, and similar factors. To address this problem, this paper proposes an object recognition algorithm based on multi-modal feature fusion and a dynamic attention mechanism. The method consists of three main steps. First, the multi-modal images are preprocessed: the visible-light, infrared, and synthetic aperture radar (SAR) images of ground scenes are each augmented according to the characteristics of their own modality, which improves the generalization ability and robustness of the model. Second, classification labels are assigned to the images, a separately designed deep learning network is trained on the data of each modality, and the classification heads of all the pre-trained networks are removed. Finally, each classification network is connected to a feature fusion module, and the combined network is retrained to improve classification accuracy. By fusing multi-modal image features, the method exploits the complementary information of the different modalities to achieve efficient recognition of space objects. Experiments show that the proposed algorithm achieves a recognition accuracy of 96.74% on the WHU-OPT-SAR infrared-optical-SAR dataset.
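The abstract does not specify the backbones or the internals of the fusion module, so the following is only a minimal sketch of how the final step could be wired, under stated assumptions. Each modality passes through a feature extractor whose classification head has been removed, an input-dependent ("dynamic") attention module computes per-sample softmax weights over the modality features, and a new linear head classifies their weighted sum. The names `toy_backbone`, `AttentionFusion`, `LinearHead`, and `MultiModalClassifier` are hypothetical; the toy backbone (simple pixel statistics) stands in for a real pre-trained CNN, all extractors are assumed to output vectors of the same length, and the retraining loop is omitted.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]  # toy single-channel image: rows of pixel values
Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_backbone(img: Image) -> Vector:
    """Hypothetical stand-in for a pre-trained network with its classification
    head removed: returns a 4-dim feature vector (mean, std, max, min)."""
    px = [p for row in img for p in row]
    mean = sum(px) / len(px)
    std = math.sqrt(sum((p - mean) ** 2 for p in px) / len(px))
    return [mean, std, max(px), min(px)]


class AttentionFusion:
    """Assumed dynamic attention over modalities: scores each modality's
    feature vector with a weight vector, normalizes the scores with softmax,
    and returns the weighted sum of the features plus the weights."""

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # learned in practice

    def __call__(self, feats: List[Vector]) -> Tuple[Vector, Vector]:
        scores = [sum(w * f for w, f in zip(self.w, feat)) for feat in feats]
        alphas = softmax(scores)  # one weight per modality, recomputed per sample
        dim = len(feats[0])
        fused = [sum(a * feat[d] for a, feat in zip(alphas, feats)) for d in range(dim)]
        return fused, alphas


class LinearHead:
    """New classification head applied to the fused feature."""

    def __init__(self, dim: int, num_classes: int, seed: int = 1):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def __call__(self, x: Vector) -> Vector:
        logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.W, self.b)]
        return softmax(logits)


class MultiModalClassifier:
    """Headless per-modality backbones, then attention fusion, then a linear head."""

    def __init__(self, backbones: Dict[str, Callable[[Image], Vector]], dim: int, num_classes: int):
        self.backbones = backbones
        self.fusion = AttentionFusion(dim)
        self.head = LinearHead(dim, num_classes)

    def predict(self, images: Dict[str, Image]) -> Tuple[Vector, Dict[str, float]]:
        feats = [net(images[name]) for name, net in self.backbones.items()]
        fused, alphas = self.fusion(feats)
        return self.head(fused), dict(zip(self.backbones, alphas))


# Usage with toy 2x2 inputs; weights are random, so the prediction is untrained.
model = MultiModalClassifier(
    {"optical": toy_backbone, "infrared": toy_backbone, "sar": toy_backbone},
    dim=4,
    num_classes=3,
)
sample = {
    "optical": [[0.2, 0.4], [0.6, 0.8]],
    "infrared": [[0.9, 0.1], [0.5, 0.5]],
    "sar": [[0.0, 1.0], [1.0, 0.0]],
}
probs, weights = model.predict(sample)
print(probs, weights)
```

Because the modality weights are recomputed from each input, a module of this kind can in principle lower the contribution of a modality degraded by weather or interference for that particular sample, which matches the motivation stated in the abstract. A weighted sum requires equal-length features; concatenation followed by a projection is a common alternative when the backbones differ.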
