Abstract: To address the problems of limited receptive field and edge blurring in optical flow estimation, an optical flow estimation model based on multi-scale self-attention and local feature matching was proposed. The model is an improvement upon the recurrent all-pairs field transforms (RAFT) model. Firstly, a multi-scale self-attention mechanism was integrated into the feature extraction module, which learned dependencies between long-range pixels at multiple scales to obtain richer image feature information. Secondly, a local matching module was added to the upsampling of the low-resolution optical flow to generate high-resolution optical flow. Then, the model was trained on optical flow estimation datasets. Finally, ablation and comparative experiments were conducted on the trained model. The results show that the proposed model achieves an average end point error (AEPE) of 1.18 and 1.67 on the MPI Sintel Clean and MPI Sintel Final datasets, respectively, and an AEPE of 1.01 with a flow outlier percentage (Fl-all) of 3.40% on the KITTI-2015 dataset, all outperforming RAFT. The proposed model thus attains high accuracy in optical flow estimation and can effectively support computer vision tasks that rely on high-precision motion information.
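The multi-scale self-attention idea described above can be illustrated with a minimal NumPy sketch: self-attention is applied to a feature map pooled to several resolutions, and the attended maps are fused back at full resolution. All function names, the average-pooling scheme, and fusion by simple averaging are assumptions for illustration, not the paper's actual implementation (which builds on RAFT's learned feature encoder).

```python
import numpy as np

def self_attention(x):
    # x: (N, C) tokens; single-head dot-product self-attention.
    # Learned query/key/value projections are omitted for brevity.
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over keys
    return weights @ x

def avg_pool(feat, s):
    # Average-pool an (H, W, C) feature map by integer factor s.
    H, W, C = feat.shape
    return feat[:H - H % s, :W - W % s] \
        .reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def multi_scale_self_attention(feat, scales=(1, 2, 4)):
    # Attend at each scale, upsample by nearest-neighbour repeat,
    # and fuse the per-scale outputs by averaging (an assumption).
    H, W, C = feat.shape
    outputs = []
    for s in scales:
        pooled = avg_pool(feat, s)
        h, w, _ = pooled.shape
        attended = self_attention(pooled.reshape(h * w, C)).reshape(h, w, C)
        up = attended.repeat(s, axis=0).repeat(s, axis=1)[:H, :W]
        outputs.append(up)
    return np.mean(outputs, axis=0)

feat = np.random.default_rng(0).standard_normal((8, 8, 16))
out = multi_scale_self_attention(feat)
print(out.shape)  # (8, 8, 16)
```

Attending over the pooled maps is what enlarges the effective receptive field: at scale 4, each token aggregates information from the entire 8×8 input, at a fraction of the cost of full-resolution attention.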