IJCATR Volume 7 Issue 8

Measure the Similarity of Complaint Document Using Cosine Similarity Based on Class-Based Indexing

Syahroni Wahyu Iriananda, Muhammad Aziz Muslim, Harry Soekotjo Dachlan
10.7753/IJCATR0708.1001
Keywords: Complaints Document, Text Similarity, Class-Based Indexing, Cosine Similarity, K-Nearest Neighbour, LAPOR!

Report handling in the "LAPOR!" (Laporan, Aspirasi dan Pengaduan Online Rakyat) system depends on a system administrator who manually reads every incoming report [3]. Manual reading can lead to errors in handling complaints [4]. When the volume of incoming reports is large and grows rapidly, preparing a confirmation takes at least three days, and the process is prone to inconsistencies [3]. In this study, the authors propose a model that measures the similarity between a Query (incoming report) and a Document (archived report). The authors employ the Class-Based Indexing term weighting scheme and cosine similarity to analyse document similarity. The resulting CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values are used as features for a K-Nearest Neighbour (K-NN) classifier. The best evaluation result was obtained with pre-processing, a 75% training / 25% test data split and the CoSimTFIDF feature, which delivered an accuracy of 84%. With k = 5, the classifier reached an accuracy of 84.12%.
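The three class-based weighting schemes named in the abstract can be illustrated with a minimal sketch. It assumes one common formulation that the abstract does not spell out: IDF = log(N / df) over archived documents, ICF = log(|C| / cf) where cf is the number of classes whose documents contain the term, and TF-IDF-ICF as the product of TF, IDF and ICF. The toy archive, its labels and the helper names (`idf`, `icf`, `vectorize`, `cosim`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy archive of tokenized complaints with class labels
# (illustrative only; not data from the LAPOR! study).
ARCHIVE = [
    (["road", "damaged", "pothole"], "road"),
    (["road", "traffic", "jam"], "road"),
    (["water", "outage", "pipe"], "water"),
    (["water", "dirty", "pipe"], "water"),
]


def idf(term, docs):
    """Inverse document frequency log(N / df); 0 for terms absent from the archive."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def icf(term, classes):
    """Inverse class frequency log(|C| / cf); cf = number of classes containing the term."""
    cf = sum(1 for class_terms in classes.values() if term in class_terms)
    return math.log(len(classes) / cf) if cf else 0.0


def vectorize(tokens, vocab, docs, classes, scheme):
    """Weight raw term frequencies with IDF, ICF or both, depending on `scheme`."""
    tf = Counter(tokens)
    vec = []
    for t in vocab:
        w = float(tf[t])
        if scheme in ("tfidf", "tfidficf"):
            w *= idf(t, docs)
        if scheme in ("tficf", "tfidficf"):
            w *= icf(t, classes)
        vec.append(w)
    return vec


def cosine(u, v):
    """Cosine of the angle between two dense vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cosim(query, archive, scheme):
    """Cosine similarity of the query against every archived document."""
    docs = [set(toks) for toks, _ in archive]
    classes = {}
    for toks, label in archive:
        classes.setdefault(label, set()).update(toks)
    vocab = sorted(set().union(*docs, query))
    q = vectorize(query, vocab, docs, classes, scheme)
    return [cosine(q, vectorize(toks, vocab, docs, classes, scheme)) for toks, _ in archive]


query = ["road", "pothole", "repair"]
for scheme in ("tfidf", "tficf", "tfidficf"):
    print(scheme, [round(s, 3) for s in cosim(query, ARCHIVE, scheme)])
```

Under this formulation a term that occurs in every class gets an ICF of zero, so class-based weighting suppresses words that do not help separate complaint categories; the paper's exact smoothing or logarithm base may differ.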
@article{s782018ijcatr07081001,
Title = "Measure the Similarity of Complaint Document Using Cosine Similarity Based on Class-Based Indexing",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "7",
Number = "8",
Pages = "292--369",
Year = "2018",
Author = "Syahroni Wahyu Iriananda and Muhammad Aziz Muslim and Harry Soekotjo Dachlan"}
  • The paper proposes a model that measures document similarity using class-based cosine similarity.
  • Different cosine similarity values are generated from class-based term weighting schemes.
  • The measured CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values are defined as features.
  • A confusion matrix is used to evaluate the performance of the K-Nearest Neighbour classification.
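The evaluation pipeline in the highlights (cosine-similarity features, K-NN classification, a 75% / 25% split and a confusion matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the TF-IDF cosine value serves directly as the K-NN similarity measure (the highlights do not say how the three CoSim values are combined), uses a small hypothetical labelled dataset, and sets k = 3 because the toy training set holds only six documents; the paper reports k = 5 on its own data. Helper names such as `fit_idf` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled complaints (tokenized); not the LAPOR! dataset.
DATA = [
    (["road", "damaged", "pothole"], "road"),
    (["road", "traffic", "jam"], "road"),
    (["pothole", "road", "accident"], "road"),
    (["bridge", "road", "damaged"], "road"),
    (["water", "outage", "pipe"], "water"),
    (["water", "dirty", "pipe"], "water"),
    (["pipe", "leak", "water"], "water"),
    (["water", "smell", "dirty"], "water"),
]


def fit_idf(docs):
    """IDF = log(N / df), estimated on the training documents only."""
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(len(docs) / c) for t, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query_vec, train_vecs, train_labels, k):
    """Majority vote over the k most cosine-similar training documents."""
    scored = [(cosine(query_vec, v), lab) for v, lab in zip(train_vecs, train_labels)]
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    counts = Counter(lab for _, lab in top)
    sim_sum = defaultdict(float)
    for s, lab in top:
        sim_sum[lab] += s
    # Ties in vote count are broken by the summed similarity of each class.
    return max(counts, key=lambda lab: (counts[lab], sim_sum[lab]))


def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    m = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(y_true, y_pred):
        m[a][p] += 1
    return m


# Deterministic 75% / 25% split: every 4th document goes to the test set.
train = [d for i, d in enumerate(DATA) if i % 4 != 3]
test = [d for i, d in enumerate(DATA) if i % 4 == 3]

idf = fit_idf([toks for toks, _ in train])
train_vecs = [tfidf(toks, idf) for toks, _ in train]
train_labels = [lab for _, lab in train]

K = 3  # assumption for this 6-document toy set; the paper reports k = 5
y_true = [lab for _, lab in test]
y_pred = [knn_predict(tfidf(toks, idf), train_vecs, train_labels, K) for toks, _ in test]

labels = sorted(set(train_labels))
cm = confusion_matrix(y_true, y_pred, labels)
accuracy = sum(cm[c][c] for c in labels) / len(y_true)
print(cm, accuracy)
```

Accuracy here is the trace of the confusion matrix divided by the number of test documents; on a real archive the split would normally be randomised, and swapping `tfidf` for a TF-ICF or TF-IDF-ICF weighting would yield the other two CoSim variants.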