Multilingual text recognition is crucial for cross-language information acquisition and related applications in the mobile computing era. The core problem is to find efficient representation and decoding methods for multilingual text recognition, including scene text recognition and handwriting recognition tasks. This book introduces primitive representation learning, a new deep learning framework for sequence modeling that contrasts with CNN-RNN-CTC (convolutional neural network, recurrent neural network, connectionist temporal classification) and attention-based encoder-decoder approaches. Primitive representations are learned via global feature aggregation and then transformed into high-level visual text representations via a graph convolutional network, which enables parallel decoding for text transcription. A multielement attention mechanism and a temporal residual mechanism are further introduced to enhance the utilization of spatial and temporal feature information.
The methods presented in this book have been evaluated on public datasets and applied to scene text recognition and handwriting recognition systems. Readers will gain a better understanding of state-of-the-art methods and research findings in multilingual scene text recognition, handwriting recognition, and related fields. Understanding this book requires basic knowledge of machine learning and deep learning.
About the Authors
Liangrui Peng is currently an associate professor at the Department of Electronic Engineering, Tsinghua University, Beijing, China. She received her Ph.D. degree in Information and Communication Engineering from Tsinghua University in 2010. Her research interests include multilingual text recognition and understanding, computer vision, and machine learning. She has received the National Award for Science and Technology Progress (Second Class) in China three times. Her recent research with graduate students has advanced multilingual text recognition and received multiple awards, including the DAS 2016 Best Paper Award, the ICDAR 2019 Best Student Paper Runner-Up Award, and the DRR 2015 Best Student Paper Award.
Ruijie Yan received his B.Sc. degree in 2017 and his Ph.D. degree in 2022 from the Department of Electronic Engineering at Tsinghua University, Beijing, China. He is currently a senior applied scientist at Microsoft (China) Co. Ltd. His research interests include computer vision and machine learning. He has published several papers in venues such as CVPR and ECCV, and won the ICPR 2020 and ICDAR 2017 Arabic Video Text Detection and Recognition competitions.
Table of Contents
Chapter 1 Introduction
Chapter 2 Primitive Representation Learning
Chapter 3 Multielement Attention Mechanism
Chapter 4 Dynamic Temporal Residual Learning and Attention Rectification
Chapter 5 TH-DL Multilingual Text Recognition System Framework