TFC Day 2: Levenshtein distance

Sun 29 Jul 07 00:56 | Tags: Programming

Today shall be Day 2 of my Totally Free Code series. It's an implementation of the Levenshtein distance algorithm in ActionScript 3, the programming language of Adobe Flash.

A little backstory. I work for Precision Intermedia, a "Multimedia Marketing Angency." (Of course, I write the content of this blog as an individual and it is in no way representative of the opinions of my company of the people that work there and so on and so forth.) One of our bigger clients is Patrick Wellman, the creator of a character called Mr Duz and the author of a new series of childrens' books featuring that character. (Mr Duz always does the right thing.) As part of creating Mr Duz's site, we created a series of games for the site, and I was charged with developing the games. I actually knew squat all about programming in Flash when I started working there a couple months ago, but I've picked it up pretty frequently. (I really love ActionScript's syntax; in programming terms, it's based on ECMAScript, is object-oriented almost to a fault, and is wonderfully strongly-typed.)

The most recent game involves spelling. I threw it together in a couple days after receiving some loose guidelines and a word list from Mr Wellman. You can play it here; click on "The Spelling Game" in the left frame of the page. (I'll leave it up to you whether you want to play a round or two of "Find Mr Duz" first; personally, I'd recommend it.) As you can see, the letters are randomly scrambled at the beginning of the game; the player then drags the letters into the correct order to unscramble the word.

There's a problem, though. If I just randomly shuffle a word's letters (quick aside: one way in which ActionScript falls short is that there's no simple way to shuffle an array), there's a possibility that the shuffled letters will come out exactly the same, or too similar to, the original sequence of letters. This is common on the shorter words; "Cat" would scramble as "Cat." "Truck" could scramble as "Trukc." When considering how to deal with this, I recalled reading about PHP's Levenshtein function while reading through the documentation of some of the roughly seven million functions available for that language. Named after its developer, a computer scientist in Soviet Russia, this algorithm calculates the "distance" (different-ness) between two strings (sequences of characters) by calculating how many removal, insertion, or substitution of characters would have to be performed on one word in order to turn it into another. Wikipedia gives an example:

…The Levenshtein distance between "kitten" and "sitting" is 3, since these three edits change one into the other, and there is no way to do it with fewer than three edits:

  1. kitten → sitten (substitution of 's' for 'k')
  2. sitten → sittin (substitution of 'i' for 'e')
  3. sittin → sitting (insert 'g' at the end)

This would be perfect to make sure that the scrambled version of a word was not too similar to its original. (Though, in retrospect, since the scrambled word will have the same number of characters as the unscrambled version, I could have used the less computationally-expensive Hamming distance algorithm instead… Oh well.) I could just run the algorithm on the original word and its scrambled version, and if the distance value is too low, then the strings are too similar and need to be shuffled again. However, ActionScript does not have a simple levenshtein() function already ready for me to use, like PHP does. And after a bit of searching, I couldn't find an implementation that anyone else had made. So, with a little help from the examples on this Wikibooks page, I made my own. Here it is in action.

It seems to work pretty well; its output matches the PHP implementation, at the least. The code of the algorithm itself could perhaps use a little more optimization; I'll leave that as a project for you. Here's the download for the entire Flash example above; the code for the algorithm can be found inside the "LevSample.as" file, at the bottom. (The Flash project itself can be considered public domain as well, though the Levenshtein algorithm part is really the only interesting part of it.)

Get the code (ZIP archive)

Keep coming back for more totally free code in the future!

Get more great Ray Gun Robot content sent directly to your feed reader or email inbox! Subscribe today!
Feed icon Articles & LinksVia Email
Feed icon Articles OnlyVia Email

0 Comments | 0 Trackbacks | Digg this article | Bookmark with del.icio.us

 

Trackbacks

No Trackbacks

Comments

No comments

Add Comment

RGR is a safe-for-work site. Please avoid posting explicit content, and please clearly label any links which link to explicit content. Comments which do not follow these guidelines will be deleted. Thank you.

Name:

Web site:

Comment:

Markdown format allowed

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA