[转载]最长公共子序列―O(ND)比较算法和它的变化(翻译)_Bluewind_新浪博客



[转载]最长公共子序列―O(ND)比较算法和它的变化(翻译)_Bluewind_新浪博客

               在之前的最长公共子序列的文章中,我列出了动态规划算法。这是一个伟大的算法,它几乎能适用于很多求解最优解问题。然而,动态规划算法也有它弊端,我们需要足够的空间用来存储决策信息。比如,在求最长公共子序列的问题,虽然我们可以利用滚动数组来求得子序列的长度,但我们却无法逆推得到子序列(参见之前的文章)。所以,为了求得这个子序列(通过每个点保存上一次抉择的由来,最终通过回溯找到路径),我们依然需要开辟N*M的二维数组和使用O(N*M)的时间复杂度。

                于是,问题就产生了……它无法满足大文件之间的匹配,会超出内存……

O((M+N)D)的贪心算法

                这次介绍的O(ND)比较算法,我们可以参考:

http://code.google.com/p/google-diff-match-patch/

                它是从另一角度找出最长公共子序列-贪心,另一个求最优解的伟大算法。首先,关于O(ND)算法,我引用几个概念:

1.       Edit Graph如下图。主要保存两字符串的编辑信息,路径信息,公共子序列信息

2.       Edit Script通过适当的删除和插入字符,使得字符串完全匹配,下图的编辑边为1D,2D,3iB,6D,7IC,D=Delete, I=Insert

3.       Snake: 由连续匹配点形成的对角线,称为Snake

4.       The Shortest Trace我们需要找到最短的从 (0, 0) 到 (7, 6) 的路径,即Snake的个数和长度最长,同理Edit Scripts的数量最少。这样的一条路径一定满足花费最小的编辑,就能让两字符串匹配。而这条路径中的Snake便是最长公共子序列了。


Read full article from [转载]最长公共子序列―O(ND)比较算法和它的变化(翻译)_Bluewind_新浪博客


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts