IJCATR Volume 7 Issue 10

Text Mining in Digital Libraries using OKAPI BM25 Model

Gesare Asnath Tinega, Prof. Waweru Mwangi, Dr. Richard Rimiru
Keywords: Online Public Access Catalogs, Relevance Ranking, Digital Libraries, Okapi BM25 Model, Text Mining, Information Retrieval Models

The emergence of the internet has made vast amounts of information available and easily accessible online. As a result, most libraries have digitized their content in order to remain relevant to their users and to keep pace with the advancement of the internet. However, these digital libraries have been criticized for using inefficient information retrieval models that do not rank the retrieved results by relevance. This paper proposes the use of the Okapi BM25 model in text mining as a means of improving relevance ranking in digital libraries. The Okapi BM25 model was selected because it is a probability-based relevance ranking algorithm. A case study was conducted, and the model design was based on information retrieval processes. The performance of the Boolean, vector space, and Okapi BM25 models was compared for data retrieval. Relevant, ranked documents were retrieved and displayed on the search page of the OPAC framework. The results revealed that Okapi BM25 outperformed both the Boolean model and the vector space model. Therefore, this paper proposes using the Okapi BM25 model, which rewards terms according to their relative frequencies in a document, to improve the performance of text mining in digital libraries.
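To illustrate how Okapi BM25 rewards terms according to their relative frequencies, the following is a minimal Python sketch of the standard scoring function. It is not the implementation used in the paper. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75` (common defaults), the smoothed IDF variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query` with Okapi BM25.

    `k1` and `b` are common default values, assumed here; the paper's
    abstract does not state which values were used.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF (assumed variant): the "+ 1" keeps it non-negative.
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, already lowercased and tokenized.
docs = [
    "digital libraries rank search results".split(),
    "okapi bm25 ranks documents in digital libraries by relevance".split(),
    "boolean retrieval returns unranked matches".split(),
]
query = "bm25 relevance ranking".split()

scores = bm25_scores(query, docs)
# Document indices ordered from most to least relevant.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy collection only the second document contains any query term, so it is ranked first. A Boolean model would report only whether a document matches; BM25 instead produces a graded score that grows with term frequency, saturates as the frequency increases, and is normalized by document length.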
Title = "Text Mining in Digital Libraries using OKAPI BM25 Model",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "7",
Issue = "10",
Pages = "386-406",
Year = "2018",
Authors = "Gesare Asnath Tinega, Prof. Waweru Mwangi, Dr. Richard Rimiru"}