OCR (Optical Character Recognition) is a technology that converts text in an image into editable text. It automatically extracts textual information by processing, analyzing, and recognizing images, and it has wide applications in multiple fields. This article proposes a CRNN-based (Convolutional Recurrent Neural Network) deep learning model that combines the advantages of CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network). The model also incorporates BiLSTM (Bidirectional Long Short-Term Memory network), SE Attention (Squeeze-and-Excitation attention mechanism), and EMA (Exponential Moving Average). The system extracts image features through CRNN, processes sequence information through BiLSTM, and achieves end-to-end text recognition by training with CTC (Connectionist Temporal Classification) loss. The SE attention block (SEBlock) is used to enhance feature selection, the bidirectional LSTM is combined with layer normalization to improve long-sequence modeling, and EMA is used to stabilize the training process.
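The abstract names three mechanisms without giving their details, so the following is only a minimal pure-Python sketch of what each one typically computes, not the authors' implementation. The function names (`se_reweight`, `ema_update`, `ctc_greedy_decode`), the weight shapes, the decay value, and the use of greedy decoding at inference time are all assumptions made for illustration. A real system would use a deep learning framework and learned weights.

```python
import math


def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-Excitation on C channels, each an H x W nested list.

    w1 (hidden x C) and w2 (C x hidden) are hypothetical dense weights
    without biases; in practice they are learned.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: dense -> ReLU -> dense -> sigmoid gives a gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]
    return scaled, gates


def ema_update(shadow, params, decay=0.999):
    """Return the exponential moving average of the parameters.

    shadow and params are dicts mapping names to floats; the decay value
    is an assumed default, not one taken from the paper.
    """
    return {k: decay * shadow[k] + (1.0 - decay) * params[k] for k in params}


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    alphabet[blank] is a placeholder for the blank symbol and is never output.
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars = []
    prev = blank
    for idx in best:
        # Emit a character only when it is not blank and not a repeat.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

In this sketch the SE gates rescale whole channels, the EMA keeps a smoothed copy of the weights alongside the ones being trained, and the decoder turns per-frame scores into a string by merging repeated labels and removing blanks, which is why a blank frame is needed between two identical consecutive characters.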
@article{z1472025ijcatr14071003,
Title = "Integrating BiLSTM, SE Attention and EMA for OCR",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "14",
Issue = "7",
Pages = "28-31",
Year = "2025",
Authors = "Zehua Lv, Yan Chen, Chengyu Hou*"}