IJCATR Volume 12 Issue 5

Next Word Prediction in Bodhi Language Using LSTM-based Approach

Ankush Kumar, Pushpendra Kumar Mishra, Tsering Namgail, Sandeep Kumar
10.7753/IJCATR1205.1005
Keywords: NLP, next word prediction, RNN, LSTM, machine learning, deep learning

Bodhi is a rare language still spoken in and around Leh, in Ladakh, and in many Tibetan regions. Little linguistic research has been done on it, and even Google Translate does not support it. Linguistic research and models are widely available for English and for regional languages such as Hindi, Bangla, and Ukrainian, but almost none exist for Bodhi. In this paper, we propose a language modelling technique based on a Long Short-Term Memory (LSTM) network, a type of Recurrent Neural Network (RNN). Using this technique, we built a model that predicts the next word in Bodhi: when the user enters text, the model predicts the next word from the preceding word(s). Such models already exist for English; we adapt the approach to Ladakhi, also called Bodhi, a language more complex than English. We have tried to make the model's next-word predictions as accurate as possible. To prepare the model, we collected a dataset consisting of a large collection of Bodhi words. The model was trained for 500 iterations (epochs). We used the TensorFlow, Keras, dictionaries, pandas, and NumPy packages. The code was written on Google Colab, a platform provided by Google for machine learning enthusiasts.
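To make "predicting the next word from the preceding word(s)" concrete, here is a minimal sketch of how a tokenized corpus might be turned into (context, next word) training pairs before they are fed to an LSTM. It is an illustration under stated assumptions, not the authors' pipeline: the function names `build_vocab` and `make_training_pairs`, the `max_len` parameter, the left-padding with id 0, and the placeholder tokens `w1` to `w4` (standing in for Bodhi words) are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def make_training_pairs(
    sentences: List[List[str]], vocab: Dict[str, int], max_len: int
) -> List[Tuple[List[int], int]]:
    """Build (left-padded context ids, next-word id) pairs from every prefix."""
    pairs: List[Tuple[List[int], int]] = []
    for sentence in sentences:
        ids = [vocab[word] for word in sentence]
        for i in range(1, len(ids)):
            # Keep at most max_len preceding words as the context.
            context = ids[:i][-max_len:]
            # Left-pad with 0 so every context has the same length.
            padded = [0] * (max_len - len(context)) + context
            pairs.append((padded, ids[i]))
    return pairs


# Placeholder tokens only (assumption); a real corpus would hold Bodhi words.
corpus = [["w1", "w2", "w3"], ["w2", "w3", "w4"]]
vocab = build_vocab(corpus)
pairs = make_training_pairs(corpus, vocab, max_len=3)
```

In a setup like the one the abstract describes, the padded contexts would typically pass through an embedding layer and an LSTM layer, with a softmax over the vocabulary giving a probability for each candidate next word.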
@article{a1252023ijcatr12051005,
Title = "Next Word Prediction in Bodhi Language Using LSTM-based Approach",
Journal ="International Journal of Computer Applications Technology and Research(IJCATR)",
Volume = "12",
Issue ="5",
Pages ="21 - 27",
Year = "2023",
Authors ="Ankush Kumar, Pushpendra Kumar Mishra, Tsering Namgail, Sandeep Kumar"}
  • The paper is the first to work on a rare language like Bodhi, which is spoken in Ladakh.
  • The paper uses a well-known approach in the area of next word prediction.
  • Thorough dataset analysis reveals human behavioural patterns and spoken tendencies.
  • Our project can predict the next word in the Bodhi language with fairly high accuracy.
  • Humans tend to type as little as possible; our work helps reduce that effort.