Deductions has a problem: it punishes users who enter the wrong rule names. It doesn’t make nice. In the current release, if you enter a valid rule name, such as “Conj”, everything is happy. But if you enter an invalid rule name, such as “Conj” or “Conh”, the rule simply disappears, and you have to type it all over again if you want to fix it.
This goes against what might be called the principle of pleasant surprises. In a nutshell, this principle means that a program should try to adapt to the user. In this case, Deductions should try to figure out what rule the user was trying to enter when she types “conh”. What we need is fuzzy string matching.
There are a plethora of fuzzy string matching algorithms. The basic idea is that you take an input string, a list of possible target strings, and rank the possible target strings according to the “distance” to the input string. The target with the least distance is (hopefully) the intended string.
One of the simplest fuzzy string matching algorithms is one in which you simply calculate the number of letters that are different between the input string and a given target string. For example, suppose the input string is “hello” and the target string is “hullo”. Since the “e” in the input string is a “u” in the target string, the distance between these strings is 1. Given target string “wullo”, the distance between the strings is 2.
This algorithm just illustrates the idea; it is way too simple for most real-world applications. For example, an extraneous letter at the beginning, for example, could throw it off completely, making the distance between “hello” and “whello” 6. A more realistic algorithm is one that I settled on for Deductions, called the Damerau-Levenshtein algorithm.
The Damerau-Levenshtein algorithm calculates the distance between two strings as a function of: (i) insertion of a single character, (ii) deletion of a single character, (iii) substitution of a single character, or (iv) transposition of a pair of characters. With the example of “hello” and “whello” above, the distance would be 1, because all that is required is a deletion of the “w” in the second string. Given “hello” and “hlel”, the distance is 2, because you need to transpose “le” and then add “o”.
I have been very pleased with my initial tests of the Damerau-Levenshtein algorithm in Deductions. Now, instead of punishing users for entering an invalid rule name, Deductions guesses what rule they meant. Deductions 1.2 will make nice with users who enter the wrong rule names.
The following is some code you can use to implement the Damerau-Levenshtein algorithm in your own application. It is based on Rick Bourner’s implementation of the Levenshtein algorithm (Levenshtein is largely similar to Damerau-Levenshtein, except that it lacks support for transposition), and some pseudo-code for the Damerau extension:
-(float)compareString:(NSString *)originalString withString:(NSString *)comparisonString { // Normalize strings [originalString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; [comparisonString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];originalString = [originalString lowercaseString]; comparisonString = [comparisonString lowercaseString];// Step 1 (Steps follow description at http://www.merriampark.com/ld.htm) NSInteger k, i, j, cost, * d, distance;NSInteger n = [originalString length]; NSInteger m = [comparisonString length];if( n++ != 0 && m++ != 0 ) {d = malloc( sizeof(NSInteger) * m * n );// Step 2 for( k = 0; k < n; k++) d[k] = k;for( k = 0; k < m; k++) d[ k * n ] = k;// Step 3 and 4 for( i = 1; i < n; i++ ) for( j = 1; j < m; j++ ) {// Step 5 if( [originalString characterAtIndex: i-1] == [comparisonString characterAtIndex: j-1] ) cost = 0; else cost = 1;// Step 6 d[ j * n + i ] = [self smallestOf: d [ (j - 1) * n + i ] + 1 andOf: d[ j * n + i - 1 ] + 1 andOf: d[ (j - 1) * n + i - 1 ] + cost ];// This conditional adds Damerau transposition to Levenshtein distance if( i>1 && j>1 && [originalString characterAtIndex: i-1] == [comparisonString characterAtIndex: j-2] && [originalString characterAtIndex: i-2] == [comparisonString characterAtIndex: j-1] ) { d[ j * n + i] = [self smallestOf: d[ j * n + i ] andOf: d[ (j - 2) * n + i - 2 ] + cost ]; } }distance = d[ n * m - 1 ];free( d );return distance; } return 0.0; }// Return the minimum of a, b and c - used by compareString:withString: -(NSInteger)smallestOf:(NSInteger)a andOf:(NSInteger)b andOf:(NSInteger)c { NSInteger min = a; if ( b < min ) min = b;if( c < min ) min = c;return min; }-(NSInteger)smallestOf:(NSInteger)a andOf:(NSInteger)b { NSInteger min=a; if (b < min) min=b;return min; }
With this code, you’ll be able to calculate the distance between any two strings. Given an input string and an array of target strings, you can run through the array and decide which target the input is most like.
Shouldn’t there be a library for this?
— Eimantas · Dec 17, 08:52 PM · #
Why not just use the MIN function instead of recreating it yourself?
— Dave · Dec 17, 09:09 PM · #