Details
Document type: Journal article
Chinese title: 基于非局部关注和多重特征融合的视频行人重识别
English title: Video person re-identification based on non-local attention and multi-feature fusion
Authors: 刘紫燕, 朱明成, 袁磊, 马珊珊, 陈霖周廷
First author: 刘紫燕
Affiliations: [1] College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China; [2] School of Aerospace Engineering, Guizhou Institute of Technology, Guiyang 550003, China
First affiliation: College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
Year: 2021
Volume: 41
Issue: 2
Pages: 530-536
Chinese journal title: 计算机应用
English journal title: Journal of Computer Applications
Indexed in: CSTPCD; Peking University Core Journals (2020 edition); CSCD Extended Edition (2021-2022)
Funding: Science and Technology Foundation of Guizhou Province (黔科合基础[2016]1054); Guizhou Province Joint Fund Project (黔科合LH字[2017]7226号); Guizhou University 2017 Academic Seedling Cultivation and Innovation Exploration Special Project (黔科合平台人才[2017]5788); Guizhou Province Science and Technology Plan Project (黔科合基础[2017]1069); Major Research Project of Innovation Groups of the Guizhou Provincial Education Department (黔教合KY字[2018]026); Engineering Research Center Project of Guizhou Provincial Universities (黔教合KY字[2018]007); Key Project of the Guizhou Province Science and Technology Plan ([2019]1416)
Language: Chinese
Chinese keywords: 视频行人重识别;时空信息;全局特征;非局部关注;特征融合
English keywords: video person re-identification; spatiotemporal information; global feature; non-local attention; feature fusion
Abstract: Existing video person re-identification methods cannot effectively extract the spatiotemporal information between consecutive video frames. To address this, a person re-identification network based on non-local attention and multi-feature fusion is proposed to extract global and local representation features as well as temporal information. First, non-local attention modules are embedded to extract global features. Then, multi-feature fusion is realized by extracting the network's low- and mid-level features together with local features, so as to obtain salient person features. Finally, similarity between person features is measured and ranked to compute video person re-identification accuracy. Experiments on the large-scale datasets MARS and DukeMTMC-VideoReID show that the proposed model clearly outperforms the existing Multi-scale 3D Convolution (M3D) and Learned Clip Similarity Aggregation (LCSA) models, with mean Average Precision (mAP) reaching 81.4% and 93.4% and Rank-1 reaching 88.7% and 95.3% respectively; on the small dataset PRID2011, the proposed model also reaches a Rank-1 of 94.8%.
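For illustration, below is a minimal PyTorch sketch of a non-local attention block in the embedded-Gaussian form of Wang et al.'s non-local neural networks, the building block the title refers to. Layer names, the channel-reduction factor, and the zero-initialized output BatchNorm are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Non-local attention over all spatial positions of a feature map."""
    def __init__(self, in_channels: int, reduction: int = 2):
        super().__init__()
        inter = in_channels // reduction
        # 1x1 convolutions produce the query/key/value embeddings.
        self.theta = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.g = nn.Conv2d(in_channels, inter, kernel_size=1)
        # W_z maps the aggregated response back to in_channels; zeroing the
        # BN scale makes the block start as an identity mapping, so it can be
        # inserted into a pretrained backbone without disturbing it.
        self.w_z = nn.Sequential(
            nn.Conv2d(inter, in_channels, kernel_size=1),
            nn.BatchNorm2d(in_channels),
        )
        nn.init.zeros_(self.w_z[1].weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (N, HW, C')
        k = self.phi(x).flatten(2)                     # (N, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (N, HW, C')
        # Affinity between every pair of positions: global context, unlike
        # the local receptive field of a plain convolution.
        attn = torch.softmax(q @ k, dim=-1)            # (N, HW, HW)
        y = (attn @ v).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.w_z(y)                         # residual connection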
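The final retrieval step (similarity measurement and ranking) can be sketched as follows. This is a simplified stand-in for the standard MARS/DukeMTMC-VideoReID protocol, which additionally excludes same-camera matches and reports mAP; the function and variable names are hypothetical.

import torch
import torch.nn.functional as F

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Rank-1: fraction of queries whose nearest gallery clip shares the ID."""
    q = F.normalize(query_feats, dim=1)    # L2-normalize so that the dot
    g = F.normalize(gallery_feats, dim=1)  # product is cosine similarity
    sim = q @ g.t()                        # (num_query, num_gallery)
    nearest = sim.argmax(dim=1)            # best-matching gallery index
    return (gallery_ids[nearest] == query_ids).float().mean().item()

# Example with random stand-in features: 4 queries, 10 gallery clips, 2048-D.
q, g = torch.randn(4, 2048), torch.randn(10, 2048)
qi, gi = torch.tensor([0, 1, 2, 3]), torch.randint(0, 5, (10,))
print(rank1_accuracy(q, qi, g, gi))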