
Record Details

Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting (indexed in SCI-EXPANDED)

Document type: Journal article

English title: Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting

Authors: Zeng, Chengbin; Liu, Yi; Song, Chunli

First author: Zeng, Chengbin (曾成斌)

Corresponding author: Zeng, CB [1]; Zeng, CB [2]

Affiliations: [1] Guizhou Inst Technol, Sch Big Data, 1 Caiguan Rd, Guiyang 550003, Peoples R China; [2] Key Lab Elect Power Big Data Guizhou Prov, Guiyang 550003, Peoples R China

First affiliation: Guizhou Institute of Technology (贵州理工学院)

Corresponding address: Guizhou Inst Technol, Sch Big Data, 1 Caiguan Rd, Guiyang 550003, Peoples R China; Key Lab Elect Power Big Data Guizhou Prov, Guiyang 550003, Peoples R China (贵州理工学院)

Year: 2022

Volume: 12

Issue: 17

Journal: APPLIED SCIENCES-BASEL

Indexed in: SCI-EXPANDED (accession no. WOS:000850961100001)

Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 61966006), the Guizhou Provincial Science and Technology Projects (Grant No. [2020]1Y281), and the Science Research Foundation for High-level Talents of Guizhou Institute of Technology (Grant No. XJGC20150108).

Language: English

Keywords: dense scene text spotting; segmentation; transformer; Rwin-FPN++

Abstract: Featured Application: Typical applications include industrial automatic inspection, smart vehicles, text retrieval, and advanced human-computer interfaces. Scene text spotting has made tremendous progress with in-depth research on deep convolutional neural networks (DCNNs). Previous approaches focus mainly on spotting arbitrary-shaped scene text, and it is difficult for them to achieve satisfactory results on dense scene text containing various instances of bending, occlusion, and lighting. To address this problem, we propose an approach called Rwin-FPN++, which incorporates the long-range-dependency merit of the Rwin Transformer into the feature pyramid network (FPN) to effectively enhance the functionality and generalization of the FPN. Specifically, we first propose the rotated-windows-based Transformer (Rwin) to enhance the rotation invariance of self-attention. Second, we attach the Rwin Transformer to each level of our feature pyramid to extract global self-attention context for each feature map produced by the FPN. Third, we fuse these feature pyramids by upsampling to predict the score matrix and keypoints matrix of the text regions. Fourth, a simple post-processing step precisely merges the pixels in the score matrix and keypoints matrix to obtain the final segmentation results. Finally, we use a recurrent neural network to recognize each segmented region and thus obtain the final spotting results. To evaluate the performance of our Rwin-FPN++ network, we construct a dense scene text dataset, with various shapes and occlusions, from the wiring of the terminal blocks of substation panel cabinets. We train Rwin-FPN++ on public datasets and then evaluate it on our dense scene text dataset. Experiments demonstrate that Rwin-FPN++ achieves an F-measure of 79% and outperforms all other methods in F-measure by at least 2.8%. This is because our proposed method has better rotation invariance and long-range dependency.
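The abstract describes attaching window-based self-attention with a rotation mechanism to each FPN level. The paper's exact Rwin formulation is not reproduced in this record, but the general idea can be illustrated with a minimal NumPy sketch: single-head self-attention inside non-overlapping windows, averaged with a second attention pass over a 90°-rotated copy of the feature map. The averaging scheme and the identity Q/K/V projections here are simplifying assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(fmap, win=4):
    """Single-head self-attention inside non-overlapping win x win windows.

    fmap: (H, W, C) feature map; H and W must be divisible by win.
    Q, K, V are taken as the tokens themselves (identity projections),
    a simplification of a real Transformer block.
    """
    H, W, C = fmap.shape
    out = np.empty_like(fmap)
    scale = 1.0 / np.sqrt(C)
    for i in range(0, H, win):
        for j in range(0, W, win):
            tokens = fmap[i:i + win, j:j + win].reshape(-1, C)  # (win*win, C)
            attn = softmax(tokens @ tokens.T * scale)           # attention weights
            out[i:i + win, j:j + win] = (attn @ tokens).reshape(win, win, C)
    return out

def rwin_attention(fmap, win=4):
    """Hypothetical rotated-window pass (illustrative stand-in for Rwin):
    attend once on the original map and once on a 90-degree-rotated copy,
    rotate the second result back, and average the two."""
    plain = window_self_attention(fmap, win)
    rot = window_self_attention(np.rot90(fmap, 1, axes=(0, 1)), win)
    return 0.5 * (plain + np.rot90(rot, -1, axes=(0, 1)))
```

In the full pipeline, a block like `rwin_attention` would be applied to each pyramid level before the upsampling fusion that predicts the score and keypoints matrices; the rotated pass is what is meant to make the window attention less sensitive to text orientation.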

