In the Lucene/Solr Revolution session, "Text Classification with Lucene/Solr, Apache Hadoop and LibSVM," Majirus Fansi, SOA and Search Engine Developer at Valtech, will show you how to build a text classifier using Apache Lucene/Solr with the LibSVM library. They classify their corpus of job offers into a number of predefined categories. Each indexed document (a job offer) then belongs to zero, one, or more categories. Well-known machine learning techniques for text classification include naïve Bayes, logistic regression, neural networks, and support vector machines (SVMs).
They use Lucene/Solr to construct the feature vector for each document. Then they use the LibSVM library, widely regarded as a reference implementation of the SVM model, to classify the document. They train as many one-vs-all SVM classifiers as there are classes in their setting. Then, using the Hadoop MapReduce framework, they reconcile the results of the individual classifiers. The end result is a scalable multi-class classifier. Finally, they outline how the classifier is used to enrich basic Solr keyword search.
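To make the pipeline above a little more concrete, here is a minimal sketch in plain Python of three of its steps: turning text into sparse feature vectors, deriving one-vs-all training labels, and reconciling the per-class results. It is not the presenters' implementation. The summary does not say which term weighting they use, so TF-IDF is an assumption, and all function names are hypothetical. Whitespace tokenization stands in for Lucene analysis, and the in-memory `reconcile` function stands in for the Hadoop reduce phase. The only detail taken from LibSVM itself is its sparse text format, `<label> <index>:<value> ...`, with 1-based indices in ascending order.

```python
import math
from collections import Counter, defaultdict


def build_vocabulary(docs):
    """Map each distinct term to a 1-based feature index."""
    terms = sorted({t for doc in docs for t in doc.lower().split()})
    return {term: i + 1 for i, term in enumerate(terms)}


def document_frequencies(docs):
    """Count in how many documents each term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def tfidf_vector(doc, vocab, doc_freq, n_docs):
    """Sparse TF-IDF vector as {feature_index: weight} (assumed weighting)."""
    vec = {}
    for term, tf in Counter(doc.lower().split()).items():
        if term in vocab and doc_freq[term] > 0:
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            vec[vocab[term]] = tf * idf
    return vec


def to_libsvm_line(label, vec):
    """Format one example as '<label> idx:val ...' with ascending indices."""
    feats = " ".join(f"{i}:{v:.4f}" for i, v in sorted(vec.items()))
    return f"{label} {feats}"


def one_vs_all_labels(doc_categories, category):
    """+1 if the document belongs to `category`, -1 otherwise."""
    return [1 if category in cats else -1 for cats in doc_categories]


def reconcile(predictions, threshold=0.0):
    """Group (doc_id, category, decision_value) triples by document and keep
    every category whose classifier scored above the threshold. A document
    may end up with zero, one, or several categories."""
    result = defaultdict(list)
    for doc_id, category, score in predictions:
        cats = result[doc_id]  # creates the entry even if nothing matches
        if score > threshold:
            cats.append(category)
    return {doc_id: sorted(cats) for doc_id, cats in result.items()}


if __name__ == "__main__":
    # Made-up job offers and categories, for illustration only.
    docs = ["java developer solr", "sales manager", "java architect"]
    categories = [{"it"}, {"sales"}, {"it"}]
    vocab = build_vocabulary(docs)
    df = document_frequencies(docs)
    labels = one_vs_all_labels(categories, "it")
    for label, doc in zip(labels, docs):
        print(to_libsvm_line(label, tfidf_vector(doc, vocab, df, len(docs))))
```

In this sketch, one training file would be written per category with `one_vs_all_labels`, each file would be used to train a separate binary SVM, and the decision values of all classifiers for a document would then be passed to `reconcile`. Keeping every category above the threshold, rather than only the top-scoring one, matches the statement that a job offer can belong to zero, one, or more categories.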
Read full article from Road to Revolution: Text Classification with Lucene/Solr, Apache Hadoop and LibSVM - Lucidworks