Jamming Decision Method Based on Maximum Expected Value Weighted Estimation in Time-Varying Environments

Wang Jun (王军); Ye Licheng (叶立诚); Liu Shuai (刘帅); Han Dongmei (韩冬梅)

A Novel Jamming Bandits Based on Maximum Expected Value Weighting Method in Time-varying Environment

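Abstract: Cognitive radar countermeasure technology gives a jamming system the ability to learn autonomously and thereby make intelligent jamming decisions. Existing jamming decision methods based on reinforcement learning theory struggle to achieve a high expected reward in radar countermeasure environments that demand real-time response, allow only limited engagement time, and feature rapidly changing radar strategies. Building on multi-armed bandit decision theory, this paper proposes an online jamming decision method for time-varying environments based on maximum expected value weighted estimation. The maximum expected value weighting method raises the accuracy of identifying the arm with the highest reward, and a learning-time drift method makes the jamming decisions adaptive to the time-varying radar environment. Numerical simulations in typical time-varying settings show that the method achieves higher decision rewards and stronger adaptability to environmental change.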
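
The abstract does not state the MEVW formula. For intuition only, the sketch below implements a generic weighted estimator of the maximum expected value: taking the largest sample mean directly tends to overestimate the true maximum, so each arm's sample mean is instead weighted by the estimated probability that this arm is the best one, assuming the sample means are independent and approximately Gaussian. The function names, the Gaussian assumption, the trapezoidal integration grid and the `min_stderr` floor are illustrative assumptions and may differ from the authors' MEVW.

```python
import math
from statistics import mean, variance


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_probability_weights(means, stderrs, grid_points=2000):
    """w_i ~= P(arm i has the largest true mean), treating each sample mean as an
    independent Gaussian N(means[i], stderrs[i]**2) (assumption)."""
    lo = min(m - 8.0 * s for m, s in zip(means, stderrs))
    hi = max(m + 8.0 * s for m, s in zip(means, stderrs))
    step = (hi - lo) / grid_points
    weights = []
    for i, (mu_i, s_i) in enumerate(zip(means, stderrs)):
        total = 0.0
        for k in range(grid_points + 1):
            x = lo + k * step
            # density of arm i's mean at x times probability that every other mean is below x
            f = normal_pdf(x, mu_i, s_i)
            for j, (mu_j, s_j) in enumerate(zip(means, stderrs)):
                if j != i:
                    f *= normal_cdf(x, mu_j, s_j)
            # trapezoidal rule: the two end points get half weight
            total += (0.5 if k in (0, grid_points) else 1.0) * f * step
        weights.append(total)
    norm = sum(weights)  # analytically 1; renormalise to absorb integration error
    return [w / norm for w in weights]


def weighted_max_estimate(samples_per_arm, min_stderr=1e-3):
    """Weighted estimate of max_i E[reward_i] from per-arm reward samples.
    Each arm needs at least two samples (required by statistics.variance)."""
    means = [mean(s) for s in samples_per_arm]
    # min_stderr is a hypothetical floor that keeps sigma > 0 for constant samples
    stderrs = [max(math.sqrt(variance(s) / len(s)), min_stderr) for s in samples_per_arm]
    weights = max_probability_weights(means, stderrs)
    estimate = sum(w * m for w, m in zip(weights, means))
    return estimate, weights


if __name__ == "__main__":
    # hypothetical reward samples for three jamming styles (arms)
    samples = [[0.1, 0.2, 0.15, 0.12], [0.9, 0.95, 0.85, 0.92], [0.5, 0.55, 0.45, 0.52]]
    est, w = weighted_max_estimate(samples)
    print(est, w)
```

The returned `weights` also serve as a ranking: the arm with the largest weight is the one most likely to yield the highest benefit, which is the quantity the abstract says the weighting is meant to estimate more accurately.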

Abstract: Cognitive radar countermeasure technology enables a jamming system to make intelligent decisions without prior knowledge. With existing jamming strategies based on reinforcement learning theory, the desired benefit cannot be obtained in radar countermeasure environments where real-time response is required, jamming time is limited, and the radar strategy changes rapidly. Based on multi-armed bandit (MAB) theory, this paper proposes an online intelligent jamming strategy that combines the maximum expected value weighted (MEVW) estimation method with a learning-window shifting (LWS) approach: MEVW improves the accuracy of estimating the arm with the maximal benefit, and LWS allows the jammer to adapt to a time-varying environment. Numerical experiments in typical time-varying environments show that the proposed method achieves higher decision benefits and better adaptability than traditional methods.
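
The abstract does not specify how LWS works. A common way to let a bandit policy forget outdated evidence is a sliding window, where arm statistics are computed only over the most recent W decisions. The sketch below swaps in a simplified sliding-window UCB policy, a standard technique for switching bandits, purely for illustration; the class and function names, the UCB index, the exploration constant `c`, the window size and the toy deterministic rewards are all assumptions, not the authors' LWS.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB1-style policy whose statistics use only the last `window` pulls,
    so evidence gathered before a change in the environment is eventually forgotten."""

    def __init__(self, n_arms: int, window: int, c: float = 2.0):
        self.n_arms = n_arms
        self.c = c  # exploration strength (hypothetical default)
        self.history = deque(maxlen=window)  # (arm, reward); oldest entry dropped automatically

    def select(self) -> int:
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # an arm with no pull inside the window is tried before any other
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        log_term = math.log(len(self.history))
        # windowed mean plus exploration bonus; max() breaks ties by lowest index
        return max(
            range(self.n_arms),
            key=lambda a: sums[a] / counts[a] + math.sqrt(self.c * log_term / counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.history.append((arm, reward))


def simulate(steps=600, switch_at=300, window=50):
    """Hypothetical scenario: 3 jamming styles (arms) with deterministic rewards;
    the radar changes strategy at `switch_at`, so the best arm moves from 0 to 2."""
    before = [0.9, 0.1, 0.2]
    after = [0.1, 0.2, 0.9]
    policy = SlidingWindowUCB(n_arms=3, window=window)
    choices = []
    for t in range(steps):
        arm = policy.select()
        reward = (before if t < switch_at else after)[arm]
        policy.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    choices = simulate()
    print([choices[-100:].count(a) for a in range(3)])
```

The window length trades reaction speed against noise: a short window forgets a superseded radar strategy quickly but bases each estimate on fewer samples, while a long window does the reverse.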
