Using Feature Selection Methods in Text Classification | Datumbox
Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check out the package com.datumbox.framework.machinelearning.featureselection to see the implementation of the Chi-square and Mutual Information feature selection methods in Java.

The main advantages of using feature selection algorithms are that they reduce the dimensionality of our data, make training faster, and can improve accuracy by removing noisy features. As a consequence, feature selection can help us avoid overfitting.

The basic algorithm for selecting the k best features is presented below (Manning et al., 2008): score every term of the vocabulary with a utility function A(t, c) and keep the k terms with the highest scores (a sketch of this generic procedure is given after the Mutual Information introduction).

In the next sections we present two different feature selection algorithms: Mutual Information and Chi-square.

Mutual Information

One of the most common feature selection methods is the Mutual Information of term t in class c (Manning et al., 2008).
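As a hedged illustration of the generic selection step referenced above (a sketch under stated assumptions, not the Datumbox framework's actual code), the following self-contained Java snippet scores every term of the vocabulary with an arbitrary utility function A(t, c) and keeps the k highest-scoring terms. The class and interface names (FeatureSelector, UtilityFunction) are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical sketch of the generic "select the k best features" procedure
// (Manning et al., 2008): score every candidate term with a utility function
// A(t, c) and keep the k terms with the highest scores.
public class FeatureSelector {

    // Utility function A(t, c); Mutual Information and Chi-square are two common choices.
    public interface UtilityFunction {
        double score(String term, String category);
    }

    public static List<String> selectFeatures(Set<String> vocabulary,
                                               String category,
                                               UtilityFunction utility,
                                               int k) {
        // Score every candidate term for the target category.
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (String term : vocabulary) {
            scored.add(Map.entry(term, utility.score(term, category)));
        }

        // Sort by utility in descending order and keep the top k terms.
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            selected.add(scored.get(i).getKey());
        }
        return selected;
    }
}
```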
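Mutual Information itself can serve as such a utility function. The sketch below is a minimal, hypothetical Java implementation of the maximum-likelihood estimate of I(U;C) computed from the 2x2 term/class contingency table described by Manning et al. (2008); the class name MutualInformation and the count arguments n11, n10, n01, n00 are illustrative assumptions, not the Datumbox API.

```java
// Hypothetical sketch of Mutual Information as a term-selection utility.
// Counts over the training documents:
//   n11 = documents of the class that contain the term
//   n10 = documents of other classes that contain the term
//   n01 = documents of the class that do not contain the term
//   n00 = documents of other classes that do not contain the term
public class MutualInformation {

    public static double score(double n11, double n10, double n01, double n00) {
        double n     = n11 + n10 + n01 + n00;  // total number of documents
        double n1dot = n11 + n10;              // documents containing the term
        double n0dot = n01 + n00;              // documents not containing the term
        double ndot1 = n11 + n01;              // documents belonging to the class
        double ndot0 = n10 + n00;              // documents not belonging to the class

        return summand(n11, n, n1dot, ndot1)
             + summand(n01, n, n0dot, ndot1)
             + summand(n10, n, n1dot, ndot0)
             + summand(n00, n, n0dot, ndot0);
    }

    // One summand (N_ij / N) * log2(N * N_ij / (N_i. * N_.j)); zero counts contribute 0.
    private static double summand(double nij, double n, double niDot, double nDotJ) {
        if (nij == 0.0) {
            return 0.0;
        }
        return (nij / n) * (Math.log((n * nij) / (niDot * nDotJ)) / Math.log(2.0));
    }
}
```

In practice the four counts would be collected from the training corpus, and MutualInformation.score could back the UtilityFunction interface of the selector sketched above; high scores indicate terms whose presence is strongly informative about membership in the class.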