Abstract:
To address the insufficient spatiotemporal correlation and the incomplete feature representation caused by local action-feature discretization in the sparse point-cloud data generated by millimeter-wave radar, this paper proposes a Spatio-Temporal Fusion Graph Neural Network (STFGNN). First, a multi-level bidirectional long short-term memory network (BiLSTM) is employed to extract temporal features from consecutive point-cloud frames, capturing both short-term subtle variations and long-term motion trajectories. Second, a graph neural network with a multi-scale neighborhood aggregation strategy is used to derive the spatial geometric structure features of the irregular point clouds. Finally, a bidirectional spatiotemporal cross-attention fusion module is introduced to enable interactive compensation between spatial and temporal features, further strengthening the representation of human actions. The proposed method was evaluated on the MMAction and MMGesture datasets, achieving accuracies of 98.86% and 95.91%, respectively.
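The bidirectional cross-attention fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the residual fusion, and the mean pooling are all assumptions chosen for clarity, and the learned projection matrices of a real attention module are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # q: (N, d) queries from one modality; kv: (M, d) keys/values from the other.
    # Scaled dot-product attention; learned Q/K/V projections are omitted here.
    d = q.shape[-1]
    attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (N, M) attention weights
    return attn @ kv                                # (N, d) attended features

# Hypothetical feature maps: 8 temporal-branch vectors (e.g. BiLSTM outputs)
# and 16 spatial-branch vectors (e.g. GNN point features), each of width 32.
rng = np.random.default_rng(0)
temporal = rng.standard_normal((8, 32))
spatial = rng.standard_normal((16, 32))

# Bidirectional: each branch attends to the other, with a residual connection,
# so spatial and temporal features compensate one another interactively.
t_enh = temporal + cross_attention(temporal, spatial)  # (8, 32)
s_enh = spatial + cross_attention(spatial, temporal)   # (16, 32)

# Pool each enhanced branch and concatenate into one fused action descriptor.
fused = np.concatenate([t_enh.mean(axis=0), s_enh.mean(axis=0)])
print(fused.shape)  # (64,)
```

The residual-plus-attention form is one common way to realize "interactive compensation" between two feature streams; the actual STFGNN module may differ in projections, heads, and normalization.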