关于Lucene Collector - jollyjumper的专栏 - 博客频道 - CSDN.NET



Lucene Collector用于收集搜索之后的文档,可以方便做filter等,主要接口有setScorer,collect,setNextReader,acceptsDocsOutOfOrder。

一般和scorer配合使用,acceptDocsOutOfOrder这个选项很重要,表明是否接受doc id乱序,返回true的话对于or操作,不需要从堆中选最小的将快很多,但对于分页时如果指定顺序和scorer界限,可以让分页不重复,这是个好处。

setNextReader一般设置docBase,collect是实际的收集操作。collector有几个sub class:

TopDocsCollector,TopScoreDocCollector,TopFieldCollector,TimeLimitingCollector,PositiveScoresOnlyCollector,CachingCollector,MultiCollector,其中重点关注TopScoreDocCollector,根据是否acceptOutOfOrder一起是否paging after,有InorderTopScoreDocCollector,InOrderPagingScoreDocCollector,OutOfOrderTopScoreDocCollector,OutOfOrderPagingScoreDocCollector。

吐槽一下:今天发现项目中根本没有scorer,而search时又又scorer的重复记分,search engine完全退化成布尔检索器,好像很奇葩。


Read full article from 关于Lucene Collector - jollyjumper的专栏 - 博客频道 - CSDN.NET


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts