Overview
To search text efficiently and effectively, Solr (mostly Lucene, actually) splits the text into tokens both during indexing and during query (search). Those tokens can also be pre- and post-filtered for additional flexibility. This enables things like case-insensitive search, matching misspelt product names, synonyms, and so on.
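As a quick illustration of how a tokenizer and filters combine, here is a minimal (hypothetical) field type definition of the kind that would appear in schema.xml; the factory class names are standard Lucene/Solr ones, but the field name and word-list files are examples only:

```xml
<!-- Example field type: tokenize on word boundaries, then filter.
     Separate index-time and query-time analyzer chains are shown;
     stopwords.txt and synonyms.txt are placeholder file names. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Drop common words at index time -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Lowercase everything for case-insensitive search -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Expand query terms with synonyms at search time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With a chain like this, a document containing "The iPhone Case" and a query for "iphone case" produce the same tokens, so they match despite the difference in capitalization.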
To achieve all this flexibility, Solr comes with quite a variety of methods to manipulate the text. Understanding which filters and tokenizers are available and what they actually do is a major stumbling block for new Solr users. This page provides a comprehensive overview of all the classes that can be used in Solr, together with links to their Javadoc pages.
Most of the analyzers, tokenizers, and filters are located in lucene-analyzers-common-4.9.0.jar (example/solr-webapp/webapp/WEB-INF/lib/), so any entry without a location indicated can be found in that jar.
Read full article from Solr 4.9 Analyzers, Tokenizers and Filters | Solr Start