solr.EdgeNGramFilterFactory
Creates org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter.
By default, it creates n-grams from the beginning (front) edge of an input token.
With the configuration below, the string value Nigerian is broken down into the following terms:
Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria", "nigerian"
By default, minGramSize is 1, maxGramSize is 1, and side is "front". You can also set side to "back" to generate the n-grams from the end of the token instead, producing suffixes rather than prefixes.
minGramSize - the minimum number of characters in each generated n-gram. For example, minGramSize=4 (with a maxGramSize of at least 6) means that the word Apache => "Apac", "Apach", "Apache" are the 3 tokens output.
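The gram generation described above can be sketched in plain Python. `edge_ngrams` is a hypothetical helper, not a Solr or Lucene API. It assumes the behaviour of filter versions that still accept `side`, emits grams in order of increasing length, and ignores the offsets and positions that a real TokenFilter also tracks.

```python
# Hypothetical helper (not a Solr/Lucene API) mimicking edge n-gram generation
# for a single token, including the older "side" option.
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 1, side: str = "front") -> list[str]:
    if min_gram < 1:
        raise ValueError("min_gram must be >= 1")
    if min_gram > max_gram:
        raise ValueError("min_gram must not exceed max_gram")
    if side not in ("front", "back"):
        raise ValueError('side must be "front" or "back"')
    grams = []
    # Gram length grows one character at a time, capped by max_gram and the token length.
    # A token shorter than min_gram therefore yields no grams at all.
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        if side == "front":
            grams.append(token[:size])               # prefix of length `size`
        else:
            grams.append(token[len(token) - size:])  # suffix of length `size`
    return grams

print(edge_ngrams("nigerian", 2, 15))               # ['ni', 'nig', 'nige', 'niger', 'nigeri', 'nigeria', 'nigerian']
print(edge_ngrams("Apache", 4, 15))                 # ['Apac', 'Apach', 'Apache']
print(edge_ngrams("nigerian", 2, 15, side="back"))  # ['an', 'ian', 'rian', 'erian', 'gerian', 'igerian', 'nigerian']
print(edge_ngrams("nigerian"))                      # ['n'] with the defaults (1, 1, "front")
```

With the defaults of 1 and 1, only the first character survives. This is why the example configuration sets both minGramSize and maxGramSize explicitly.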
This filter factory is very useful for matching prefixes (or suffixes, if side="back") of indexed terms at query time. Edge n-gram analysis can be performed at index time, query time, or both. It is usually more useful, as in the example below, to generate the n-grams at index time, with every n-gram indexed at the same position as the token it came from. The query term can then be matched directly at query time without any n-gram analysis. Unlike wildcard terms, n-gram query terms can be used within quoted phrases.
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
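The following toy positional index, written in Python, shows how the two analyzers in this fieldType work together. Everything in it is a hypothetical simplification, not Solr or Lucene code:

- `lowercase_tokenize` only approximates solr.LowerCaseTokenizerFactory by splitting on runs of letters and lowercasing them.
- The index side emits front edge grams of length 2 to 15, each at the position of its source token.
- The query side only tokenizes.

A phrase matches when its terms sit at consecutive positions, which is roughly what a phrase query checks.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the fieldType's analyzers; not Solr/Lucene APIs.

def lowercase_tokenize(text):
    """Approximates LowerCaseTokenizer: split on non-letters, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def index_analyze(text, min_gram=2, max_gram=15):
    """Index analyzer: front edge n-grams, each keeping its source token's position."""
    out = []
    for pos, tok in enumerate(lowercase_tokenize(text)):
        for size in range(min_gram, min(max_gram, len(tok)) + 1):
            out.append((tok[:size], pos))
    return out

def build_index(docs):
    """Positional inverted index: term -> doc id -> set of positions."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for term, pos in index_analyze(text):
            index[term][doc_id].add(pos)
    return index

def phrase_search(index, query):
    """Query analyzer only tokenizes (no n-grams); terms must sit at consecutive positions."""
    terms = lowercase_tokenize(query)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, set())
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {1: "Nigerian Airways", 2: "Niger River", 3: "Algerian airline"}
index = build_index(docs)
print(phrase_search(index, "niger"))    # {1, 2}
print(phrase_search(index, "nig air"))  # {1}
print(phrase_search(index, "n"))        # set()
```

With these sample documents, "niger" matches both the prefix gram of Nigerian and the whole word Niger. The phrase "nig air" matches only document 1, because both grams share positions with adjacent tokens. A single letter such as "n" matches nothing, because minGramSize is 2.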
Read full article from AnalyzersTokenizersTokenFilters - Solr Wiki