IJCATR Volume 3 Issue 9

An Evaluation of Feature Selection Methods for Positive-Unlabeled Learning in Text Classification

Azam Kaboutari, Jamshid Bagherzadeh, Fatemeh Kheradmand
10.7753/IJCATR0309.1013
Keywords: feature selection; unsupervised feature selection; positive-unlabeled learning; PU learning; document classification

Feature selection is important when processing data in domains such as text, because such data can be very high-dimensional. In positive-unlabeled (PU) learning problems there are no labeled negative data for training, so we need unsupervised feature selection methods that do not use the class information in the training documents when selecting features for the classifier. Few feature selection methods are available for document classification with PU learning. In this paper we evaluate four unsupervised methods: collection frequency (CF), document frequency (DF), collection frequency-inverse document frequency (CF-IDF), and term frequency-document frequency (TF-DF). We found DF to be the most effective in our experiments.
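For illustration only, the two simplest criteria named in the abstract can be sketched in Python. This is not the authors' implementation: it assumes the common definitions (DF counts the documents that contain a term, CF counts all occurrences of a term in the collection), and the helper names `term_statistics` and `select_top_k`, the toy documents, and the top-k cutoff are hypothetical. CF-IDF and TF-DF are left out because the abstract does not give their formulas.

```python
from collections import Counter


def term_statistics(docs):
    """Return (cf, df) counters for a list of tokenized documents.

    No class labels are used, so the statistics are unsupervised.
    """
    cf = Counter()  # collection frequency: total occurrences of each term
    df = Counter()  # document frequency: number of documents containing each term
    for tokens in docs:
        cf.update(tokens)
        df.update(set(tokens))  # count each term at most once per document
    return cf, df


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy collection (hypothetical data, whitespace tokenization).
docs = [
    "cheap flights cheap hotels".split(),
    "cheap flights to rome".split(),
    "hotels in rome rome rome".split(),
]

cf, df = term_statistics(docs)
top_by_df = select_top_k(df, 2)
top_by_cf = select_top_k(cf, 2)
```

In this toy collection "rome" ranks first by CF because it is repeated within one document, while DF gives it the same score as the other terms that appear in two documents, which shows how the two criteria can rank the same vocabulary differently.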
@article{a392014ijcatr03091013,
Title = "An Evaluation of Feature Selection Methods for Positive-Unlabeled Learning in Text Classification",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "3",
Number = "9",
Pages = "595--599",
Year = "2014",
Author = "Azam Kaboutari and Jamshid Bagherzadeh and Fatemeh Kheradmand"}