Design of SAR Image Target Detection Accelerator Based on FPGA

    • Abstract: Mainstream synthetic aperture radar (SAR) image target detection algorithms that run on central processing units (CPUs) and graphics processing units (GPUs) suffer from large model size, high computational complexity, low parallelism, and high power consumption, which makes them unsuitable for deployment on resource-constrained platforms such as satellites and unmanned aerial vehicles. Taking board resources, power consumption, inference speed, and accuracy into account together, this paper designs a SAR image target detection accelerator based on a field programmable gate array (FPGA). The accelerator uses an optimized YOLOv4-tiny network model: the data bit width is reduced by quantizing to 16-bit fixed-point numbers, and standard convolutions are replaced with dilated convolutions, which shrinks the network model and its parameter count so that it can be deployed on a resource-constrained FPGA. In the FPGA implementation of the convolutional layers, multi-level loop unrolling and loop tiling are used to parallelize and accelerate the convolution operations. Experimental results show that the optimized algorithm achieves a throughput of 15.24 GOPS on the FPGA, with a recognition time of 256 ms per image, which lies between that of the CPU and the GPU. However, because the FPGA hardware consumes only 3.06 W, the energy efficiency of the proposed design reaches 18.4 times that of the CPU and 7.3 times that of the GPU.
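The abstract names three techniques: 16-bit fixed-point quantization, dilated convolution, and loop tiling with unrolling of the convolution loops. The sketch below is a minimal pure-Python illustration of how they fit together, not the paper's implementation. The function names, the tile sizes `t_out` and `t_in`, and the 8-bit fractional split in `to_fixed` are all assumptions made for illustration; the abstract does not state the fixed-point format or the tiling factors actually used.

```python
def to_fixed(value, frac_bits=8):
    """Quantize a float to a signed 16-bit fixed-point integer.
    frac_bits=8 is an assumed split, not taken from the paper."""
    q = round(value * (1 << frac_bits))
    return max(-32768, min(32767, q))  # saturate to the int16 range


def _out_shape(x, w, dilation):
    """Output size for stride 1, no padding, kernel K x K with given dilation."""
    k = len(w[0][0])
    span = dilation * (k - 1) + 1  # receptive field of the dilated kernel
    return len(w), len(x[0]) - span + 1, len(x[0][0]) - span + 1


def conv2d_direct(x, w, dilation=1):
    """Reference convolution. x is [C_in][H][W], w is [C_out][C_in][K][K]."""
    c_out, oh, ow = _out_shape(x, w, dilation)
    c_in, k = len(x), len(w[0][0])
    out = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for r in range(oh):
            for c in range(ow):
                for ci in range(c_in):
                    for i in range(k):
                        for j in range(k):
                            out[co][r][c] += (
                                w[co][ci][i][j]
                                * x[ci][r + i * dilation][c + j * dilation]
                            )
    return out


def conv2d_tiled(x, w, dilation=1, t_out=2, t_in=2):
    """Same result as conv2d_direct, with the channel loops split into tiles.
    On an FPGA, the two innermost loops would be fully unrolled into
    t_out * t_in parallel multiply-accumulate units; here they run serially."""
    c_out, oh, ow = _out_shape(x, w, dilation)
    c_in, k = len(x), len(w[0][0])
    out = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co0 in range(0, c_out, t_out):          # tile over output channels
        for ci0 in range(0, c_in, t_in):        # tile over input channels
            for r in range(oh):
                for c in range(ow):
                    for i in range(k):
                        for j in range(k):
                            # Candidate loops for hardware unrolling.
                            for co in range(co0, min(co0 + t_out, c_out)):
                                for ci in range(ci0, min(ci0 + t_in, c_in)):
                                    out[co][r][c] += (
                                        w[co][ci][i][j]
                                        * x[ci][r + i * dilation][c + j * dilation]
                                    )
    return out
```

Because the operands are integers, the tiled and direct versions agree exactly; tiling changes only the order of accumulation and which loops are exposed for parallel hardware. A dilation of 2 with a 3x3 kernel covers a 5x5 receptive field using the same nine weights.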

       
