IJCATR Volume 14 Issue 8

End-to-End Chinese Speech Recognition Based on CNN and CTC

Chao Tang, Zehua Lv, Ximing Yuan
10.7753/IJCATR1408.1003
Keywords: Speech Recognition; End-to-End; Convolutional Neural Networks; Gated Linear Units; Connectionist Temporal Classification

Traditional acoustic models have limitations such as complex components, difficulty in joint training, and the need for pre-aligned data. To address these limitations, this paper proposes an end-to-end Chinese speech recognition model that combines a 1D gated convolutional neural network with Connectionist Temporal Classification (CTC). The core of the model consists of stacked multi-layer 1D convolutional networks that extract high-level contextual features, gated linear units (GLU) that alleviate vanishing gradients, and CTC, which enables end-to-end training and decoding at the Chinese character level. Experiments on a public dataset show that the model significantly outperforms the baseline model, reducing the character error rate (CER) by more than 2.5%.
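The two building blocks named in the abstract, GLU gating and CTC decoding, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not state the decoding strategy, so greedy (best-path) decoding is an assumption here, and the names `glu`, `ctc_greedy_decode`, `vocab`, and `frames`, as well as the toy scores, are hypothetical.

```python
import math


def glu(a, b):
    """Gated linear unit: elementwise a * sigmoid(b).

    In a gated convolutional layer, a and b would be the outputs of two
    parallel convolutions; here they are plain lists of floats.
    """
    return [x * (1.0 / (1.0 + math.exp(-g))) for x, g in zip(a, b)]


def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Best-path CTC decoding (assumed strategy, not taken from the paper).

    Takes the highest-scoring label per frame, merges consecutive repeats,
    and drops the blank label.
    """
    out = []
    prev = None
    for scores in frame_scores:
        idx = max(range(len(scores)), key=scores.__getitem__)
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


# Toy per-frame scores over a hypothetical 3-symbol vocabulary.
vocab = ["<blank>", "你", "好"]
frames = [
    [0.10, 0.80, 0.10],  # 你
    [0.10, 0.70, 0.20],  # 你 again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # 好
    [0.60, 0.10, 0.30],  # blank
]
decoded = ctc_greedy_decode(frames, vocab)
gated = glu([1.0, -2.0], [0.0, 0.0])
```

Because the blank label separates repeated characters, no frame-level alignment between audio and text is needed, which is the pre-alignment requirement the abstract says CTC removes.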
@article{c1482025ijcatr14081003,
Title = "End-to-End Chinese Speech Recognition Based on CNN and CTC",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "14",
Number = "8",
Pages = "33--37",
Year = "2025",
Author = "Chao Tang and Zehua Lv and Ximing Yuan"}