All About Programming: AnalyzersTokenizersTokenFilters

solr.EdgeNGramFilterFactory

Creates org.apache.solr.analysis.EdgeNGramTokenFilter.

By default, it creates n-grams from the beginning (front) edge of an input token.

With the configuration below, the string value Nigerian is broken down into the following terms:

Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria", "nigerian"
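To make the expansion above concrete, here is a small Python sketch of front-edge n-gram generation. It is only an illustration, not Lucene's implementation; the function name edge_ngrams_front is hypothetical, and it assumes the token has already been lowercased by the tokenizer, as the LowerCaseTokenizerFactory in the configuration does.

```python
def edge_ngrams_front(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Hypothetical sketch: return the prefixes of `token` whose lengths run
    from min_gram up to max_gram, capped at the token's own length."""
    upper = min(max_gram, len(token))
    # A token shorter than min_gram yields an empty range, so no grams.
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams_front("nigerian", 2, 15))
# ['ni', 'nig', 'nige', 'niger', 'nigeri', 'nigeria', 'nigerian']
```

Because maxGramSize="15" is longer than the 8-letter token, the longest gram is simply the whole word.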

By default, minGramSize is 1, maxGramSize is 1, and side is "front". You can also set side to "back" to generate the n-grams from the end of the token instead, producing suffixes rather than prefixes.
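The following Python sketch contrasts side="back" with the defaults. It is a hypothetical illustration, not the filter's source: the name edge_ngrams_back is invented here, and emitting the grams from shortest to longest is an assumption chosen to mirror the front-side listing above.

```python
def edge_ngrams_back(token: str, min_gram: int = 1, max_gram: int = 1) -> list[str]:
    """Hypothetical sketch of side="back": return suffixes of `token`
    with lengths min_gram..max_gram (capped at len(token)).
    The defaults mirror minGramSize=1 and maxGramSize=1."""
    upper = min(max_gram, len(token))
    return [token[len(token) - n:] for n in range(min_gram, upper + 1)]


print(edge_ngrams_back("nigerian", 2, 15))
# ['an', 'ian', 'rian', 'erian', 'gerian', 'igerian', 'nigerian']
print(edge_ngrams_back("nigerian"))
# ['n']
```

The second call shows why the defaults are rarely useful on their own. With minGramSize and maxGramSize both 1, only a single one-character gram is produced per token.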

minGramSize - the minimum number of characters to start with. For example, with minGramSize=4 (and a maxGramSize of at least 6), a word like Apache => "Apac", "Apach", "Apache" would be the 3 tokens output.
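A short Python sketch of how minGramSize and maxGramSize bound the output, including a sanity check on the two values. The helper edge_ngrams is hypothetical. Raising ValueError when minGramSize is below 1 or larger than maxGramSize is this sketch's own choice, loosely modeled on the fact that the real filter refuses such settings. Note that minGramSize=4 combined with the default maxGramSize=1 is therefore an invalid combination.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 1) -> list[str]:
    """Hypothetical sketch: front edge n-grams with basic parameter checks."""
    if min_gram < 1:
        raise ValueError("minGramSize must be at least 1")
    if min_gram > max_gram:
        raise ValueError("minGramSize must not be greater than maxGramSize")
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("Apache", min_gram=4, max_gram=6))
# ['Apac', 'Apach', 'Apache']

try:
    edge_ngrams("Apache", min_gram=4)  # maxGramSize left at its default of 1
except ValueError as err:
    print("rejected:", err)
```

The filter itself does not change case, which is why "Apache" keeps its capital letter here. Lowercasing is the tokenizer's job in the configuration below.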

This FilterFactory is very useful for matching prefix substrings (or suffix substrings if side="back") of particular terms in the index at query time. Edge n-gram analysis can be performed at index time, query time, or both, but it is typically more useful, as in this example, to generate the n-grams at index time with all of the n-grams indexed at the same position. At query time, the query term can then be matched directly without any n-gram analysis. Unlike wildcard queries, n-gram query terms can be used within quoted phrases.
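The index-time/query-time split can be sketched with a toy inverted index in Python. This is a simplified model, not how Solr stores postings. The names build_index and search are hypothetical, tokenization is reduced to lowercasing and splitting on whitespace, and positions are ignored here. The point is that a partial word typed by the user becomes a plain term lookup, with no wildcard expansion.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram, max_gram):
    # Front edge n-grams, as in the configuration below (side="front").
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(docs, min_gram=2, max_gram=15):
    """Index time: every edge n-gram of every token points to its document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def search(index, query):
    """Query time: no n-gram analysis, just lowercase and look the term up."""
    return sorted(index.get(query.lower(), set()))


docs = {1: "Nigerian cuisine", 2: "Niger river", 3: "Apache Solr"}
idx = build_index(docs)
print(search(idx, "Nige"))     # [1, 2]
print(search(idx, "Nigeria"))  # [1]
print(search(idx, "solr"))     # [3]
```

The trade-off is a larger index (one entry per gram) in exchange for cheap, exact-term lookups at query time.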

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
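To see why keeping every gram at its source token's position lets n-gram terms work inside quoted phrases, here is a rough Python model of the two analyzers in this fieldType. It is an approximation under stated assumptions. lowercase_tokenize stands in for LowerCaseTokenizerFactory by splitting wherever str.isalpha() is False. index_analyzer, query_analyzer and phrase_match are hypothetical names, and positionIncrementGap is not modeled.

```python
def lowercase_tokenize(text):
    """Approximates LowerCaseTokenizer: split on non-letters and lowercase."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def index_analyzer(text, min_gram=2, max_gram=15):
    """Index side: (term, position) pairs; each gram keeps its token's position."""
    out = []
    for pos, token in enumerate(lowercase_tokenize(text)):
        upper = min(max_gram, len(token))
        for n in range(min_gram, upper + 1):
            out.append((token[:n], pos))
    return out


def query_analyzer(text):
    """Query side: tokenizer only, no n-gram filter."""
    return lowercase_tokenize(text)


def phrase_match(doc_text, phrase):
    # Build a tiny positional index for one document: term -> set of positions.
    postings = {}
    for term, pos in index_analyzer(doc_text):
        postings.setdefault(term, set()).add(pos)
    terms = query_analyzer(phrase)
    if not terms:
        return False
    # The phrase matches if the query terms occur at consecutive positions.
    for start in postings.get(terms[0], set()):
        if all(start + i in postings.get(t, set()) for i, t in enumerate(terms)):
            return True
    return False


print(phrase_match("Niger river", "niger riv"))  # True
print(phrase_match("Niger river", "riv niger"))  # False
```

Because "riv" is indexed at the same position as "river", the partial phrase lines up exactly as the full phrase would.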

Read full article from AnalyzersTokenizersTokenFilters - Solr Wiki