IJCATR Volume 8 Issue 3

A K-Means Based Multi-level Text Clustering Algorithm for Retrieval of Research Information

Damaris Ndinda Waema, Petronilla Muriithi, George Okeyo
10.7753/IJCATR0803.1003
Keywords: Text Clustering, Multi-level, Research Metadata, Information Retrieval, SQL Data Clustering

Academic researchers in institutions of higher learning and research institutes rely on research outputs and metadata throughout their work, both to identify potential collaborators and to become familiar with existing research. Research outputs include academic theses, journal and conference articles, books and book chapters, and datasets, while research metadata includes authors, affiliations, research areas, and projects, among others. However, accessing and retrieving relevant research outputs and metadata remains a major challenge. As a result, research is duplicated, opportunities for networking are fewer, and scientific fraud is harder to detect. Academic research outputs and metadata therefore need to be made readily available and easy to retrieve. The main purpose of this work is to develop a tailor-made approach to the retrieval of research information and related metadata. To that end, the paper presents a multi-level text clustering algorithm for retrieving scholarly research outputs and metadata from a central repository through a web-based interface. The algorithm first clusters SQL data records that represent metadata at the first level, then retrieves and clusters text documents representing research outputs at the second level. The algorithm was tested on retrieving information in the areas of text clustering, cloud computing, banking, HIV/AIDS, food security, and cancer. The results show that it enables researchers to retrieve relevant information according to their information needs. To enable further enhancements and improvements, the algorithm will be released to the public domain for use in similar application domains or extension by other researchers.
@article{d832019ijcatr08031003,
Title = "A K-Means Based Multi-level Text Clustering Algorithm for Retrieval of Research Information",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "8",
Issue = "3",
Pages = "66--81",
Year = "2019",
Author = "Damaris Ndinda Waema and Petronilla Muriithi and George Okeyo"}
  • The algorithm clusters both SQL data from a relational database and text documents.
  • It also performs matching and ranking, which are important operations in information retrieval.
  • An information retrieval model is constructed for retrieving relevant research information.
  • Evaluation of the algorithm’s effectiveness in retrieving research data yields promising results.
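
The two-level flow described in the abstract (cluster metadata records first, then cluster the documents attached to the selected records, then match and rank) can be illustrated with a small sketch. This is not the authors' implementation. It assumes TF-IDF term weighting, cosine similarity, and a spherical variant of k-means with deterministic seeding, none of which are specified on this page. The in-memory `RECORDS` list stands in for rows fetched from a SQL table, and all names (`best_cluster`, `two_level_search`, `RECORDS`, `DOCUMENTS`) and sample texts are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def tfidf(texts):
    """Return unit-length TF-IDF vectors plus the (smoothed) IDF table used."""
    counts = [Counter(tokenize(t)) for t in texts]
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + len(texts)) / (1 + n)) + 1.0 for t, n in df.items()}
    return [normalise({t: f * idf[t] for t, f in c.items()}) for c in counts], idf

def kmeans(vectors, k, max_iter=20):
    """Spherical k-means (assumed variant) with farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)  # adds weights term by term
            if total:
                centroids[j] = normalise(total)
    return labels, centroids

def best_cluster(texts, query, k):
    """Cluster texts, pick the cluster nearest the query, rank its members."""
    if not texts:
        return []
    vectors, idf = tfidf(texts)
    q = normalise({t: f * idf.get(t, 0.0) for t, f in Counter(tokenize(query)).items()})
    labels, centroids = kmeans(vectors, min(k, len(vectors)))
    best = max(set(labels), key=lambda j: cosine(q, centroids[j]))
    members = [i for i, label in enumerate(labels) if label == best]
    return sorted(members, key=lambda i: cosine(q, vectors[i]), reverse=True)

def two_level_search(records, documents, query, k_meta=2, k_docs=2):
    """Level 1 clusters metadata; level 2 clusters the matching full texts."""
    meta_texts = [f"{r['title']} {r['area']}" for r in records]
    ids = [records[i]["id"] for i in best_cluster(meta_texts, query, k_meta)]
    ranked = best_cluster([documents[i] for i in ids], query, k_docs)
    return [ids[i] for i in ranked]

# Hypothetical sample data: metadata rows and the full texts they point to.
RECORDS = [
    {"id": 1, "title": "K-means text clustering for document retrieval", "area": "information retrieval"},
    {"id": 2, "title": "Hierarchical text clustering of research abstracts", "area": "information retrieval"},
    {"id": 3, "title": "Cancer screening outcomes in rural clinics", "area": "public health"},
    {"id": 4, "title": "Cancer treatment adherence study", "area": "public health"},
]
DOCUMENTS = {
    1: "We apply k-means text clustering to group documents and rank them for retrieval.",
    2: "Agglomerative methods build a hierarchy of abstracts; text is compared with cosine similarity.",
    3: "Screening uptake for cervical cancer was measured across rural clinics.",
    4: "Adherence to cancer treatment was tracked over twelve months.",
}

if __name__ == "__main__":
    print(two_level_search(RECORDS, DOCUMENTS, "text clustering"))
```

In this sketch the first level narrows the search to the metadata cluster nearest the query, so the more expensive full-text clustering at the second level only runs on documents whose records survived the first pass. A real deployment would presumably read the records from the repository's database rather than from a Python list.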