Lucene4.3开发之插曲之落寞繁华



Lucene4.3开发之插曲之落寞繁华
正则查询(RegexpQuery)跟通配符查询(WildcardQuery)的功能很相似,因为他们都可以完成一样的工作,但是不同的是正则查询支持更灵活定制细化查询,这一点与通配符的泛化是不一样的,而且正则查询天生支持使用强大的正则表达式的来准确匹配一个或几个term,需要注意的是,使用正则查询的字段最好是不分词的,因为分词的字段可能会导致边界问题,从而使查询失败,得不到任何结果,这一点和WildcardQuery效果是一样的。 

  1. RegexpQuery query=new RegexpQuery(new Term(field, ".*"+searchStr+".*"));  
1,如果是在不分词的字段里做模糊检索,优先使用正则查询的方式会比其他的模糊方式性能要快。2,在查询的时候,应该注意分词字段的边界问题。3,在使用OR或AND拼接条件查询时或一些特别复杂的匹配时,也应优先使用正则查询。4,大数据检索时,性能尤为重要,注意应避免使用前置模糊的方式,无论是正则查询还是通配符查询。
Please read full article from Lucene4.3开发之插曲之落寞繁华

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts