Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. It is a complex problem: to solve it, you need to provide a variable for each important word in your text. You can leave out stopwords and other very common words, but you need to include at least every word that can help your classifier identify the topic of a text.
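To make the "one variable per important word" idea concrete, here is a minimal sketch in plain Python. It does not use Lucene and is not the code from the full article: the stopword list, the training sentences, and the function names `vectorize`, `cosine`, and `knn_classify` are all assumptions made up for illustration. Each document becomes a bag of word counts, and a new document gets the majority label of its k most similar training documents.

```python
import math
from collections import Counter

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "on", "for"}

def vectorize(text):
    """Map a document to term counts: one variable per non-stopword term.

    This simple tokenizer splits on whitespace and drops any token that
    contains punctuation or digits.
    """
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in STOPWORDS]
    return Counter(tokens)

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(text, training, k=3):
    """Return the majority label among the k most similar training documents."""
    query = vectorize(text)
    ranked = sorted(training,
                    key=lambda pair: cosine(query, vectorize(pair[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up training data: (document, category) pairs.
TRAINING = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the football team", "sports"),
    ("the parliament passed a new tax law", "politics"),
    ("the minister announced an election", "politics"),
    ("voters chose a new parliament", "politics"),
]

if __name__ == "__main__":
    print(knn_classify("football team scored a goal", TRAINING))
    print(knn_classify("parliament election tax", TRAINING))
```

Even on this toy data the vocabulary grows with every new document, which is why the number of variables gets large quickly on real text and why dropping uninformative words helps.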
Read full article from Text Categorization with K-Nearest Neighbors using Lucene | Raimon Bosch . blog