经典算法系(3)-后缀数组&后缀树(Suffix Array & Suffix Tree)



经典算法系(3)-后缀数组&后缀树(Suffix Array & Suffix Tree)

3. 后缀数组(Suffix Array)

 后缀树和后缀数组都是处理字符串的有效工具,前者较为常见,但后者更容易编程实现,空间耗用更少;后缀数组可用于解决最长公共子串问题,多模式匹配问题,最长回文串问题,全文搜索等问题;

 后缀数组的基本元素:
 给定一个string,其长度为L,后缀指的是从string的某一个位置i(0<=i<L)开始到串末尾(string[L-1])的一个子串,表示为suffix(i);
 L个suffix(i)按照字典顺序排列并顺序存储在一个数组SA[L]中,则SA[L]称为后缀数组,其元素值存储的是suffix(i)的起始字符在string中的位置;
 每一个suffix[i]对应在SA[k]数组中的一个位置,将这个对应的位置存储为Rank[i],时间复杂度为O(N);对于任意两个suffix[i]和suffix[j],由于知晓其在Rank[L]中的前后位置,所以在O(1)的时间内就可以得出他们的字典序大小关系;
 构建SA[i]数组中相邻元素的最长公共前缀(LCP,Longest Common Prefix),Height[i]表示SA[i]和SA[i-1]的LCP(i, j);H[i]=Height[Rank[i]表示Suffix[i]和字典排序在它前一名的后缀子串的LCP大小;
对于正整数i和j而言,最长公共前缀的定义如下: LCP(i, j) =lcp(Suffix(SA[i]), Suffix(SA[j]))
                   =min(Height[k]|i+1<=k<=j);
也就是计算LCP(i, j)等同于查找Height数组中下表在i+1到j之间的元素最小值
下述例子中如果LCP(0, 3),则最小值为2,则"aadab"和"aabaaaab"的LCP为2

Read full article from 经典算法系(3)-后缀数组&后缀树(Suffix Array & Suffix Tree)


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts