Implementing an English Word NGram Analyzer with Lucene Shingles
This example implements an English word-level BiGram Analyzer built on Lucene's ShingleFilter.
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.util.Version;

public final class BiGramAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Split the input on whitespace, then lowercase each token.
        TokenStream result = new WhitespaceTokenizer(Version.LUCENE_36, reader);
        result = new LowerCaseFilter(Version.LUCENE_36, result);

        // Combine adjacent tokens into shingles of at most 2 words.
        ShingleFilter shingleFilter = new ShingleFilter(result, 2);
        // Emit only the two-word shingles, not the single words.
        shingleFilter.setOutputUnigrams(false);
        return shingleFilter;
    }
}
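To make the expected token output concrete, here is a minimal plain-Java sketch of what the analyzer above should emit for a simple input. It does not use Lucene at all: the class name `BiGramSketch` and the method `bigrams` are hypothetical, and the code only approximates the chain WhitespaceTokenizer, LowerCaseFilter, ShingleFilter (maximum shingle size 2, unigrams off, tokens joined by a single space, which is ShingleFilter's default separator). Offsets and position increments are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BiGramSketch {

    // Hypothetical helper: approximates the analyzer's output without Lucene.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        // Whitespace tokenization followed by lowercasing.
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        // Join each pair of adjacent words with a single space.
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [please divide, divide this, this sentence]
        System.out.println(bigrams("Please divide this sentence"));
    }
}
```

An input containing a single word produces no bigrams in this sketch, which mirrors the analyzer's behavior when unigram output is disabled.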