Edit Distance in Java



In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Let dp[i][j] stands for the edit distance between two strings with length i and j, i.e., word1[0,...,i-1] and word2[0,...,j-1].
There is a relation between dp[i][j] and dp[i-1][j-1]. Let's say we transform from one string to another. 
  1. if x == y, then dp[i][j] == dp[i-1][j-1]
  2. if x != y, and we insert y for word1, then dp[i][j] = dp[i][j-1] + 1
  3. if x != y, and we delete x for word1, then dp[i][j] = dp[i-1][j] + 1
  4. if x != y, and we replace x with y for word1, then dp[i][j] = dp[i-1][j-1] + 1
  5. When x!=y, dp[i][j] is the min of the three situations.
Initial condition:
dp[i][0] = i, dp[0][j] = j
public static int minDistance(String word1, String word2) {
 int len1 = word1.length();
 int len2 = word2.length();
 
 // len1+1, len2+1, because finally return dp[len1][len2]
 int[][] dp = new int[len1 + 1][len2 + 1];
 
 for (int i = 0; i <= len1; i++) {
  dp[i][0] = i;
 }
 
 for (int j = 0; j <= len2; j++) {
  dp[0][j] = j;
 }
 
 //iterate though, and check last char
 for (int i = 0; i < len1; i++) {
  char c1 = word1.charAt(i);
  for (int j = 0; j < len2; j++) {
   char c2 = word2.charAt(j);
 
   //if last two chars equal
   if (c1 == c2) {
    //update dp value for +1 length
    dp[i + 1][j + 1] = dp[i][j];
   } else {
    int replace = dp[i][j] + 1;
    int insert = dp[i][j + 1] + 1;
    int delete = dp[i + 1][j] + 1;
 
    int min = replace > insert ? insert : replace;
    min = delete > min ? min : delete;
    dp[i + 1][j + 1] = min;
   }
  }
 }
 
 return dp[len1][len2];
}
The time complexity of dynamic programming method is O(mn) as we need to construct the table fully. The space complexity is also O(mn). If we need only the cost of edit, we just need O(min(m, n)) space as it is required only to keep track of the current row and previous row.
Usually the costs D, I and R are not same. In such case the problem can be represented as an acyclic directed graph (DAG) with weights on each edge, and finding shortest path gives edit distance.
Read full article from Edit Distance in Java

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts