All About Programming: What are the best practices for combining analyzers in Lucene?

What are the best practices for combining analyzers in Lucene? - Stack Overflow
Lucene provides the org.apache.lucene.analysis.Analyzer base class, which you can extend to write your own Analyzer.
For a reference implementation, see org.apache.lucene.analysis.standard.StandardAnalyzer, which extends Analyzer.
Then, in YourAnalyzer, chain the filters that StandardAnalyzer and SnowballAnalyzer use, like this:

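TokenStream result = new StandardFilter(tokenStream);
// SnowballFilter takes the stemmer name, not a stop-word set; stop words belong in a StopFilter.
result = new SnowballFilter(result, "English");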

Then, in your existing code, you'll be able to construct IndexWriter with your own Analyzer implementation that chains Standard and Snowball filters.
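
To make the chaining idea concrete, here is a minimal plain-Java sketch of the same wrap-one-stage-in-the-next pattern. It deliberately does not use Lucene: TokenSource, WhitespaceSource, LowerCaseStage, StopStage and ToyStemStage are hypothetical stand-ins invented for illustration, and the "stemmer" only strips a trailing "s" rather than running a Snowball algorithm. Real Lucene constructors differ between versions, so treat this as a model of the structure, not as Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of filter chaining. None of these types are Lucene classes.
class AnalyzerChainSketch {

    /** Hypothetical stand-in for a token stream: returns the next token, or null when exhausted. */
    interface TokenSource {
        String next();
    }

    /** Tokenizer stage: splits the input on whitespace. */
    static final class WhitespaceSource implements TokenSource {
        private final Iterator<String> tokens;

        WhitespaceSource(String text) {
            List<String> parts = new ArrayList<>();
            for (String part : text.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    parts.add(part);
                }
            }
            this.tokens = parts.iterator();
        }

        @Override
        public String next() {
            return tokens.hasNext() ? tokens.next() : null;
        }
    }

    /** Filter stage: lower-cases every token from the wrapped source. */
    static final class LowerCaseStage implements TokenSource {
        private final TokenSource in;

        LowerCaseStage(TokenSource in) {
            this.in = in;
        }

        @Override
        public String next() {
            String token = in.next();
            return token == null ? null : token.toLowerCase(Locale.ROOT);
        }
    }

    /** Filter stage: skips tokens that appear in the stop-word set. */
    static final class StopStage implements TokenSource {
        private final TokenSource in;
        private final Set<String> stopWords;

        StopStage(TokenSource in, Set<String> stopWords) {
            this.in = in;
            this.stopWords = stopWords;
        }

        @Override
        public String next() {
            String token = in.next();
            while (token != null && stopWords.contains(token)) {
                token = in.next();
            }
            return token;
        }
    }

    /** Filter stage: toy stemmer that drops a trailing "s" (NOT a Snowball stemmer). */
    static final class ToyStemStage implements TokenSource {
        private final TokenSource in;

        ToyStemStage(TokenSource in) {
            this.in = in;
        }

        @Override
        public String next() {
            String token = in.next();
            if (token != null && token.length() > 3 && token.endsWith("s")) {
                return token.substring(0, token.length() - 1);
            }
            return token;
        }
    }

    /** Builds the chain the same way an analyzer would: each stage wraps the previous one. */
    static List<String> analyze(String text) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "a", "an"));
        TokenSource result = new WhitespaceSource(text);
        result = new LowerCaseStage(result);
        result = new StopStage(result, stopSet);
        result = new ToyStemStage(result);

        List<String> out = new ArrayList<>();
        for (String token = result.next(); token != null; token = result.next()) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Dogs")); // prints [quick, dog]
    }
}
```

The order of the stages matters: lower-casing runs before the stop stage so that "The" matches the lower-case stop word "the", and stemming runs last so that stop words are compared in their unstemmed form.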
Totally off-topic:
I suppose you'll eventually need to set up your own way of handling requests. That is already implemented in Solr.
First, write your own search component by extending SearchComponent, then register it in solrconfig.xml.

The SnowballAnalyzer provided by Lucene already uses the StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter, and SnowballFilter. So it sounds like it does exactly what you want (everything StandardAnalyzer does, plus the snowball stemming).

If it didn't, you could build your own analyzer pretty easily by combining whatever tokenizer and token filters you wish.
