Totally Free Code Day 1: Captcha
Mon 23 Jul 07 01:16 | Tags: Programming
Here's the first chunk of code I am releasing into the public domain as part of my Totally Free Code series. It's a captcha generation script in PHP.
What's a captcha? You've probably used them on the internet before, even if you didn't know it. A captcha is an image composed of letters and/or numbers that are obfuscated so that they remain legible to sighted humans, while a computer will have a difficult time "reading" them (see optical character recognition). Many forms on the internet display a captcha image and then ask the user to type the characters in the image to prove they are really human. The origins of this script are in an image board script I wrote a few years ago; such image boards are often targeted by spam-posting "bots," but implementing a captcha stops the bots in their tracks most of the time.
Captcha scripts can be difficult to develop. The main difficulty comes from their very function: how do you scramble the appearance of characters enough that they're illegible to computers, but not so much that they're illegible to humans?
To make matters more complex, I wanted my script to be as fast and widely compatible as possible, which means it doesn't use any libraries (that is, shared code fragments) except for GD, which is bundled with PHP. GD's built-in text drawing offers only five fonts, and three of those were too small for this use; that leaves only two simple fonts' worth of characters to work with. Anyone writing a script to defeat this captcha would therefore already have a known, finite set of glyphs for it to try to read; that's not good. PHP can draw text in other fonts, but that idea had to be scrapped because of the same speed and compatibility concerns.
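You can ask GD for the size of each built-in font to see why only two of them are usable. The sketch below does that. The function name `gd_builtin_font_sizes()` is hypothetical and is not part of GD or the original script. `imagefontwidth()` and `imagefontheight()` are standard GD functions that accept the built-in font IDs 1 through 5. The other-font options mentioned above, such as `imagettftext()` and `imageloadfont()`, need font files on disk.

```php
<?php
// Hypothetical helper: report the character cell size (in pixels) of each of
// GD's five built-in fonts. Requires the GD extension; no font files needed.
function gd_builtin_font_sizes(): array
{
    $sizes = [];
    for ($font = 1; $font <= 5; $font++) {
        // Built-in fonts are fixed-width, so one width/height pair per font.
        $sizes[$font] = [imagefontwidth($font), imagefontheight($font)];
    }
    return $sizes;
}
```

Calling `print_r(gd_builtin_font_sizes());` lists the width and height of every font ID, so you can compare them against the size you need.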

So the script makes do with the two largest GD fonts, one of which is merely a bold version of the other. It takes quite a few steps beyond that to obfuscate the text, however. Before placing each character, the script shifts it vertically by a random amount, so the string as a whole has no consistent baseline. It also shifts each character horizontally, so the characters sit an unpredictable distance from each other. After the entire string has been placed, the image is rotated by a small random angle. Finally, the image is "blurred" by pasting copies of itself over itself at various levels of transparency, which makes it difficult for a character-reading script to find the exact edges of the characters.
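The four obfuscation steps above can be sketched with plain GD calls. This is a minimal illustration, not the original script. The function name `render_captcha()` and every number in it (padding, jitter, gap, rotation range, blur offsets and opacities) are assumptions chosen for readability.

```php
<?php
// Hypothetical sketch of the obfuscation steps, using only GD built-in fonts.
// All offsets, angles, and opacities below are illustrative guesses.
function render_captcha(string $code)
{
    $fonts  = [4, 5];   // the two largest built-in GD fonts
    $pad    = 10;       // margin so shifted characters stay on the canvas
    $jitter = 5;        // maximum vertical shift per character, in pixels
    $gapMax = 6;        // maximum extra horizontal gap per character, in pixels

    $cellW = max(imagefontwidth(4), imagefontwidth(5));
    $cellH = max(imagefontheight(4), imagefontheight(5));
    $img = imagecreatetruecolor(strlen($code) * ($cellW + $gapMax) + 2 * $pad, $cellH + 2 * $pad);
    $white = imagecolorallocate($img, 255, 255, 255);
    $black = imagecolorallocate($img, 0, 0, 0);
    imagefill($img, 0, 0, $white);

    // Steps 1 and 2: draw each character with a random vertical offset (no
    // shared baseline) and a random extra gap after it (uneven spacing).
    $x = $pad;
    for ($i = 0, $n = strlen($code); $i < $n; $i++) {
        $font = $fonts[mt_rand(0, 1)];
        imagestring($img, $font, $x, $pad + mt_rand(-$jitter, $jitter), $code[$i], $black);
        $x += imagefontwidth($font) + mt_rand(0, $gapMax);
    }

    // Step 3: rotate the whole image by a small random angle (degrees). The
    // canvas grows to fit, so the final size is not known in advance.
    $img = imagerotate($img, mt_rand(-8, 8), $white);

    // Step 4: "blur" by pasting offset copies of the image over itself at
    // partial opacity (imagecopymerge's last argument is a 0-100 percentage).
    $w = imagesx($img);
    $h = imagesy($img);
    $copy = imagecreatetruecolor($w, $h);
    imagecopy($copy, $img, 0, 0, 0, 0, $w, $h);
    foreach ([[1, 0, 40], [0, 1, 30], [1, 1, 20]] as [$dx, $dy, $pct]) {
        imagecopymerge($img, $copy, $dx, $dy, 0, 0, $w, $h, $pct);
    }
    return $img;
}
```

To serve the result to an `<img>` tag, the script would send `header('Content-Type: image/png');` and then call `imagepng(render_captcha($code));`.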
The code, as written, is intended to be called from an <img> tag in HTML. (Unfortunately, because the image generation is randomized, the width and height of the generated image can't be predicted, so don't put width or height attributes on the tag.) The script looks for a session variable named captchaCode ($_SESSION['captchaCode']). If it doesn't exist, the script randomly creates one from alphanumeric characters. The result is usually five to six characters long, but could be as short as one. To validate a submission, compare the form input to the value of this session variable.
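The session handshake described above can be sketched as two helpers. Both function names, `captcha_code()` and `captcha_check()`, are hypothetical. Only the session key `captchaCode` comes from the post. The generator uses `base_convert(mt_rand(), 10, 36)`. That is one plausible way to get the behavior described: lowercase letters and digits, almost always five or six characters, occasionally fewer. It is not necessarily what the original script does.

```php
<?php
// Hypothetical helpers; only the 'captchaCode' session key is from the post.

// Return the session's captcha code, creating one if it does not exist yet.
function captcha_code(array &$session): string
{
    if (!isset($session['captchaCode'])) {
        // mt_rand() returns 0..mt_getrandmax(); in base 36 (0-9, a-z) that is
        // at most six characters, and small values give shorter strings.
        $session['captchaCode'] = base_convert((string) mt_rand(), 10, 36);
    }
    return $session['captchaCode'];
}

// Compare a submitted answer with the stored code, then discard the code so
// the same image cannot be guessed at repeatedly.
function captcha_check(array &$session, string $input): bool
{
    $expected = $session['captchaCode'] ?? null;
    unset($session['captchaCode']);
    return $expected !== null && hash_equals($expected, strtolower(trim($input)));
}
```

The image script would call `session_start();` and then `$code = captcha_code($_SESSION);`. The form handler would call `captcha_check($_SESSION, $_POST['captcha'] ?? '')`, where the field name `captcha` is an assumption. Because `captcha_check()` discards the code after every attempt, a failed guess also forces a new image. That is a stricter choice than the post requires. Note that `mt_rand()` is not cryptographically secure; `random_int()` is the safer choice where available.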
I think the resulting image is a pretty high-quality captcha given the constraints I was working with, and it's certainly better than some others I've seen on the internet. It's not flawless, however. It sometimes seems a bit on the illegible side even for humans. If you implement this, I highly suggest giving your users the option of reloading the captcha if they get a "bad one." The script, as written, only offers up lowercase letters and numbers, but users may mistake the digit zero for an uppercase letter O. It wouldn't be a bad idea to keep zero and O from both appearing in generated codes, or possibly to use only numbers (though this greatly reduces security). Finally, the script, as written, creates a new image every time it is requested, so a smart-aleck could stress your web server simply by reloading it over and over. It wouldn't be a bad idea to cache generated captcha images on a per-user basis for a limited time (though a new one should be generated whenever the form is successfully submitted).
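Two of the suggestions above can be sketched together: avoiding look-alike characters and caching the image per user. Everything here is hypothetical:

- the function names;
- the alphabet, which drops zero and lowercase o as the post suggests, and also 1 and l as an extra assumption;
- the `captchaPng` and `captchaPngTime` session keys;
- the 60-second lifetime.

`random_int()` (PHP 7+) stands in for `mt_rand()` because it is cryptographically secure.

```php
<?php
// Hypothetical helpers illustrating two suggestions from the post.

// Build a fixed-length code from lowercase letters and digits, minus
// look-alikes: zero and "o" (from the post) plus "1" and "l" (an assumption).
function make_captcha_code(int $length = 6): string
{
    $alphabet = 'abcdefghijkmnpqrstuvwxyz23456789'; // 32 characters
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

// Return PNG bytes for the session's code. Re-render only when nothing is
// cached, the code was cleared (e.g. after a successful submission), or the
// cached image is older than $ttl seconds. $render turns a code into PNG
// bytes; in a real script it might wrap imagepng() in output buffering.
function cached_captcha_png(array &$session, callable $render, int $ttl = 60): string
{
    $now = time();
    $stale = !isset($session['captchaCode'], $session['captchaPng'], $session['captchaPngTime'])
        || $now - $session['captchaPngTime'] > $ttl;
    if ($stale) {
        $session['captchaCode'] = $session['captchaCode'] ?? make_captcha_code();
        $session['captchaPng'] = $render($session['captchaCode']);
        $session['captchaPngTime'] = $now;
    }
    return $session['captchaPng'];
}
```

Keeping the PNG bytes in the session makes repeat requests cheap, but it adds the image's size to every session. Because the cache is tied to the session, a client that discards its cookie on each request still gets a fresh render every time. A rate limit at the web-server level is therefore a useful complement.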
Get more great Ray Gun Robot content sent directly to your feed reader or email inbox! Subscribe today!