Details
Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting (indexed in SCI-EXPANDED) Times cited: 2
Document type: Journal article
English title: Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting
Authors: Zeng, Chengbin; Liu, Yi; Song, Chunli
First author: Zeng, Chengbin (曾成斌)
Corresponding author: Zeng, CB[1]; Zeng, CB[2]
Affiliations: [1] Guizhou Inst Technol, Sch Big Data, 1 Caiguan Rd, Guiyang 550003, Peoples R China; [2] Key Lab Elect Power Big Data Guizhou Prov, Guiyang 550003, Peoples R China
First institution: Guizhou Institute of Technology
Corresponding institution: Guizhou Inst Technol, Sch Big Data, 1 Caiguan Rd, Guiyang 550003, Peoples R China; Key Lab Elect Power Big Data Guizhou Prov, Guiyang 550003, Peoples R China (Guizhou Institute of Technology)
Year: 2022
Volume: 12
Issue: 17
Journal: APPLIED SCIENCES-BASEL
Indexed in: WOS: SCI-EXPANDED (accession number: WOS:000850961100001)
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 61966006), the Guizhou Provincial Science and Technology Projects (Grant No. [2020]1Y281), and the Science Research Foundation for High-level Talents of Guizhou Institute of Technology (Grant No. XJGC20150108).
Language: English
Keywords: dense scene text spotting; segmentation; transformer; Rwin-FPN++
Abstract: Featured Application: Typical applications include industrial automatic inspection, smart vehicles, text retrieval, and advanced human-computer interfaces. Scene text spotting has made tremendous progress with in-depth research on deep convolutional neural networks (DCNN). Previous approaches mainly focus on spotting arbitrary-shaped scene text and struggle to achieve satisfactory results on dense scene text containing various instances of bending, occlusion, and lighting. To address this problem, we propose an approach called Rwin-FPN++, which incorporates the long-range dependency merit of the Rwin Transformer into the feature pyramid network (FPN) to effectively enhance the functionality and generalization of FPN. Specifically, we first propose the rotated windows-based Transformer (Rwin) to enhance the rotation-invariant performance of self-attention. Second, we attach the Rwin Transformer to each level of our feature pyramids to extract global self-attention contexts for each feature map produced by the FPN. Third, we fuse these feature pyramids by upsampling to predict the score matrix and keypoints matrix of the text regions. Fourth, a simple post-processing step is adopted to precisely merge the pixels in the score matrix and keypoints matrix and obtain the final segmentation results. Finally, we use a recurrent neural network to recognize each segmentation region and thus achieve the final spotting results. To evaluate the performance of our Rwin-FPN++ network, we construct a dense scene text dataset with various shapes and occlusion from the wiring of the terminal block of the substation panel cabinet. We train our Rwin-FPN++ network on public datasets and then evaluate the performance on our dense scene text dataset. Experiments demonstrate that our Rwin-FPN++ network can achieve an F-measure of 79% and outperform all other methods in F-measure by at least 2.8%. This is because our proposed method has better rotation invariance and long-range dependency merit.
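Note on the reported metric: the abstract gives only the final F-measure and not the underlying precision and recall. The sketch below assumes the metric is the standard harmonic mean of precision and recall, as is common in text spotting evaluation; the function name `f_measure` and the input values are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score).

    Assumption: the record's "F-measure" is this standard definition.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0.0:
        # No correct detections at all: define the score as 0.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    print(f"{f_measure(0.80, 0.78):.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method cannot reach a high F-measure by trading away recall for precision or the reverse.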