Many computer applications rely on finding close matches to partial or misspelled words. For example, most spellcheckers not only tell you when a word is misspelled, but also recommend possible words. Text messaging applications will recommend words based on partial matches with the characters typed in so far. One way to identify close matches is to utilize edit distance.
A word or phrase can be transformed using one of three operations:
If we associate a cost with each of these operations, it becomes possible to calculate the minimal cost by which one phrase can be transformed into another. For example, consider transforming "abcd" into "acxy". If all three operations have the same cost of 1, then the minimal cost is 3 (delete 'b', substitute 'x' for 'd', and insert 'y').
The following pseudocode describes a divide & conquer solution to calculating the Minimal Edit Distance (MED) between text1 and text2:Define an EditDistance
class that contains a static method named MED
. This method should implement the above divide & conquer solution using top-down recursion. Your class should have constants for SUBST_COST, INSERT_COST, and DELETE_COST, which are all assigned 1 (but can be easily changed). You should also have a driver class which allows you to repeatedly enter lines of text and which displays the Minimal Edit Distance between those lines. For example:
Test your code on a variety of phrases and answer the following questions:
The potential drawback of a top-down implementation of divide & conquer is redundancy. If there is significant overlap between the recursive cases, the redundancy can grow exponentially, leading to drastic decreases in performance as the problem size grows.
MEDdynamic
which utilizes the same recursive relationships but in a bottom-up fashion. Note that each recursive call deals with the phrases (text1 and text2) or successive suffixes of those phrases. Thus, you should be able to create a table where the rows correspond to suffixes of text1 and the columns correspond to suffixes of text2. That is, table[r][c] would store the Minimal Edit Distance between the last r characters of text1 and the last c characters of text2. When the table is filled, the Minimal Edit Distance between text1 and text2 should be at table[text1.length()][text2.length()].