Levenshtein distance (Java)

From LiteratePrograms
Jump to: navigation, search

[edit] Overview

Simple (no weights, no recorded edit transcript) DP-based computation of the edit distance (also known as Levenshtein distance) of two given strings S1 and S2 of lengths n and m with a time complexity of O(nm), the time to fill the DP table. See Gusfield, p. 215 for details and extensions.

[edit] Implementation

We compute the edit distance of two strings using the standard DP approach: we find D(n,m) by computing all combinations of D(i,j) for i and j smaller than n,m. Initialize the table of height s1.length+1 and width s2.length+1:

<<init>>=
int[][] dp = new int[s1.length() + 1][s2.length() + 1];

We fill each row, from left to right:

<<nested_loop>>=
for (int i = 0; i < dp.length; i++) {
	for (int j = 0; j < dp[i].length; j++) {

Fill the table with the initial values, based on the base conditions: D(i,0)=i and D(0,j)=j, resulting in row 0 filled with values from 0 to n-1 and column 0 with values 0 to m-1:

<<base_conditions>>=
dp[i][j] = i == 0 ? j : j == 0 ? i : 0;

For the other values, perform the bottom-up simulation of the recurrence relation, the actual DP computation:

<<recurrence>>=
if (i > 0 && j > 0) {

The best case: match, so we can take the diagonal value without further cost (we compare the strings at index-1 since the table is one position larger than the strings in height and width).

<<match>>=
if (s1.charAt(i - 1) == s2.charAt(j - 1))
	dp[i][j] = dp[i - 1][j - 1];

Else, we pick set the new optimal sub-solution to MIN(DP[i][j-1]+1,DP[i-1][j-1]+1,DP[i-1][j]+1), i.e. to the minimum result of the replace (diagonal), insert-into-s1/delete-from-s2 (horizontal), and delete-from-s1/insert-into-s2 (vertical) operations.

<<mismatch>>=
else
	dp[i][j] = Math.min(dp[i][j - 1] + 1, Math.min(
			dp[i - 1][j - 1] + 1, dp[i - 1][j] + 1));

The result is stored in the bottom right cell of the DP table: the edit distance for pair n,m: D(n,m).

<<result>>=
return dp[s1.length()][s2.length()];

Compute the edit distance for some samples, including empty strings:

<<usage>>=
EditDistance distance = new EditDistance();
System.out.println(distance.compute("vintner", "writers"));
System.out.println(distance.compute("vintners", "writers"));
System.out.println(distance.compute("vintners", ""));
System.out.println(distance.compute("", ""));

The complete program:

<<EditDistance.java>>=
public class EditDistance {
	public int compute(String s1, String s2) {
		init
		nested_loop
				base_conditions
				recurrence
					match
					mismatch
				}
			}
		}
		result
	}
	public static void main(String[] args) {
		usage
	}
}

[edit] References

  • Gusfield, Dan (1999), Algorithms on Strings, Sequences and Trees. Cambridge: University Press.[[Category:Dynamic_programming]|Java]
Download code
hijacker
hijacker
hijacker
hijacker