Lucene中的FST算法描述



Lucene中的FST算法描述
FSTFinite State Transducer,一种有限自动机,或者称为Mealy machine)是lucene中的一个核心算法,用于检索term信息存储的位置。在lucene中,term按照其字典顺序排序(term在存储时称为input),term相关的信息按照term排序的次序存储在磁盘上(其存储的位置为outPut),(input/output)二元组将以FST的形式存储在内存中(inputoutput都是有序的)。检索时,根据input,通过计算FST中的路径上的权值信息,获取到output数据,最终在磁盘上定位term的其它附加信息。同时FST还能够快速的判断一个term是否在lucene中。
实际上FST相当于term在内存中的一个索引,lucene使用FST能够快速确定系统中是否存在查询的term,如果存在,能够快速定位其信息存放的具体位置。
FSTtrie tree结构提供相似的功能,但是,在内存中存储更高效。

Please read full article from Lucene中的FST算法描述

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts