基于Lucene shingle英文单词NGram Analyzer的实现 - 天下任我行 - 博客频道 - CSDN.NET



基于Lucene shingle英文单词NGram Analyzer的实现 - 天下任我行 - 博客频道 - CSDN.NET

此例为基于Lucene shingle英文单词BiGram Analyzer的实现

import java.io.Reader;  import org.apache.lucene.analysis.Analyzer;  import org.apache.lucene.analysis.LowerCaseFilter;  import org.apache.lucene.analysis.TokenStream;  import org.apache.lucene.analysis.WhitespaceTokenizer;  import org.apache.lucene.analysis.shingle.ShingleFilter;  import org.apache.lucene.util.Version;    public final class BiGramAnalyzer extends Analyzer {        @Override      public TokenStream tokenStream(String fieldName, Reader reader) {          TokenStream result = new WhitespaceTokenizer(Version.LUCENE_36, reader);          result = new LowerCaseFilter(Version.LUCENE_36, result);          ShingleFilter shingleFilter = new ShingleFilter(result, 2);          shingleFilter.setOutputUnigrams(false);          result = shingleFilter;            return result;      }    }

Read full article from 基于Lucene shingle英文单词NGram Analyzer的实现 - 天下任我行 - 博客频道 - CSDN.NET


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts