You know those annoying, practically indecipherable boxes of jumbled, multicolored, slanted, crossed-out or reversed text that you have to type in to prove to a Web site that you are human? They’re called captchas.
Captcha stands for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’ and consists of challenges that only humans are supposed to be capable of solving. Web sites use such tests in order to block spam bots that automate tasks like account registration and comment posting.
Researchers from Stanford University have developed an automated tool that can decipher, with a significant degree of accuracy, the text-based anti-spam tests used by many popular Web sites.
Researchers Elie Bursztein, Matthieu Martin and John C. Mitchell presented the results of their year-and-a-half-long captcha study at the recent ACM Conference on Computer and Communications Security in Chicago.
There are various types of captchas, some using audio, others using math problems, but the most common implementations rely on users typing back distorted text. The Stanford team devised various methods of removing the background noise deliberately added to these images and of breaking text strings into individual characters for easier recognition, a technique called segmentation.
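To make the cleanup and segmentation steps concrete, here is a toy sketch in Python. It is not the Stanford team’s code: it assumes a tiny black-and-white image stored as text rows, removes isolated “speckle” pixels as a stand-in for noise removal, and splits characters wherever a column contains no ink (a simple technique known as vertical projection). The function names and image format are hypothetical.

```python
# Toy captcha bitmap: '#' is ink, '.' is background (assumed format).
Bitmap = list[list[int]]

def parse(rows: list[str]) -> Bitmap:
    """Turn text rows into a 0/1 grid."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]

def remove_isolated_pixels(img: Bitmap) -> Bitmap:
    """Noise removal: clear any ink pixel that has no inked neighbour (8-connectivity)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out

def segment_columns(img: Bitmap) -> list[tuple[int, int]]:
    """Segmentation: return (start, end) column ranges separated by ink-free columns."""
    h, w = len(img), len(img[0])
    ink_per_column = [sum(img[y][x] for y in range(h)) for x in range(w)]
    segments, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x  # a character begins
        elif not ink and start is not None:
            segments.append((start, x))  # a character ends (end is exclusive)
            start = None
    if start is not None:
        segments.append((start, w))
    return segments

rows = [
    "##.....#.",
    "##..#..#.",  # the lone '#' at column 4 is noise
    "##.....#.",
]
image = parse(rows)
print(segment_columns(image))                          # [(0, 2), (4, 5), (7, 8)]
print(segment_columns(remove_isolated_pixels(image)))  # [(0, 2), (7, 8)]
```

Without the cleanup pass, the stray pixel is mistaken for a third character. Splitting on empty columns also breaks down as soon as letters touch, which is one reason overlapping characters resist this kind of attack.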
Some of their captcha-breaking algorithms are inspired by those used by robots to orient themselves in various environments and were built into an automated tool dubbed Decaptcha. This tool was then run against captchas used by 15 high-profile Web sites.
Decaptcha uses a five-stage process to strip noise from images and recognize letters in the shapes that remain.
First, the program pre-processes the image to remove noise and obscuring lines; then the segmentation stage separates the individual shapes. Post-segmentation analyzes each shape, recognition assigns the most likely letter to each one, and finally, in post-processing, the program runs a spell check.
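The last two stages can be illustrated with an equally simplified sketch. Here recognition is plain template matching (pick the letter whose stored pattern differs in the fewest pixels), and the “spell check” is approximated with Python’s standard difflib module, which snaps a raw guess to the closest word in a list. The 3x3 templates, the word list and the function names are hypothetical; the article does not say how Decaptcha implements these stages, and a dictionary step only helps when a site draws its captchas from real words.

```python
import difflib

# Hypothetical 3x3 letter templates ('#' = ink); a real recognizer would be
# trained on many samples per character.
TEMPLATES = {
    "I": (".#.", ".#.", ".#."),
    "L": ("#..", "#..", "###"),
    "O": ("###", "#.#", "###"),
    "T": ("###", ".#.", ".#."),
}

def mismatches(a, b):
    """Count cells that differ between two glyphs of the same shape."""
    return sum(ca != cb for row_a, row_b in zip(a, b) for ca, cb in zip(row_a, row_b))

def recognize(glyph):
    """Recognition: pick the template letter with the fewest mismatching cells."""
    return min(TEMPLATES, key=lambda letter: mismatches(TEMPLATES[letter], glyph))

def post_process(guess, words):
    """Post-processing: snap the raw guess to the closest listed word, if one is close enough."""
    close = difflib.get_close_matches(guess, words, n=1, cutoff=0.6)
    return close[0] if close else guess

glyphs = [
    ("#..", "#..", "###"),  # clean L
    ("##.", "#..", "###"),  # smudged O: right edge missing, so it looks closer to L
    ("###", ".#.", ".#."),  # clean T
]
raw = "".join(recognize(g) for g in glyphs)
print(raw)                                       # LLT
print(post_process(raw, ["LOT", "TILT", "TOLL"]))  # LOT
```

The smudged middle glyph is misread as an L, giving the raw guess LLT, and the post-processing step corrects it to LOT because that is the only listed word similar enough to pass the cutoff.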
Against all typographic adversity, the Decaptcha program was able to defeat 66 per cent of captchas on Visa’s Authorize.net payment site and 70 per cent on Blizzard Entertainment’s World of Warcraft portal.
Other interesting results were registered on eBay, whose captcha implementation failed 43 per cent of the time, and on Wikipedia, where one in four attempts was successful. Lower, but still significant, success rates were found on Digg, CNN and Baidu: 20, 16 and 5 per cent, respectively.
The only tested sites where captchas couldn’t be broken were Google and reCAPTCHA. The latter is a system originally developed at Carnegie Mellon University and acquired by the Internet search giant in September 2009.
Authorize.net and Digg have switched to reCAPTCHA since these tests were performed, but it’s not clear whether the other Web sites have made changes as well. Nevertheless, the Stanford researchers came up with several recommendations to improve captcha security.
These include randomizing the length of the text string, randomizing the character size, applying a wave-like distortion to the text, and either collapsing the characters so they touch or overlap or running lines across the text. Another noteworthy conclusion was that using complex character sets brings no security benefit and is bad for usability.
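On the defensive side, the randomization recommendations are easy to express as parameters of a captcha generator. The sketch below only computes a layout (which character goes where, at what size and height) rather than drawing an image. Every numeric range, the character set, the baseline height and the names are assumptions for illustration, not values from the study.

```python
import math
import random
from dataclasses import dataclass

# All values below are assumed for illustration; the study gives no concrete ranges.
CHARSET = "abcdefghkmnpqrstuvwxyz"  # plain lowercase, easily confused i/j/l/o left out
LENGTH_RANGE = (5, 8)               # randomized text length
SIZE_RANGE = (28, 44)               # randomized font size, in pixels
BASELINE = 50                       # vertical centre of the text, in pixels

@dataclass
class GlyphLayout:
    char: str
    size: int   # font size for this character
    x: int      # horizontal position
    y: float    # vertical position after the wave distortion

def layout_captcha(rng: random.Random) -> list[GlyphLayout]:
    length = rng.randint(*LENGTH_RANGE)
    text = "".join(rng.choice(CHARSET) for _ in range(length))
    # Random wave parameters, so the distortion differs on every captcha.
    amplitude = rng.uniform(4.0, 10.0)
    period = rng.uniform(60.0, 120.0)
    phase = rng.uniform(0.0, 2 * math.pi)
    glyphs, x = [], 0
    for ch in text:
        size = rng.randint(*SIZE_RANGE)
        y = BASELINE + amplitude * math.sin(2 * math.pi * x / period + phase)
        glyphs.append(GlyphLayout(ch, size, x, y))
        # Advance by only part of the font size so neighbours sit close together
        # (the 0.6 factor is an assumption).
        x += int(size * 0.6)
    return glyphs

for g in layout_captcha(random.Random(42)):
    print(g)
```

A renderer would then draw each character at its computed position. Because the length, the sizes and the wave’s phase change on every call, an attacker cannot rely on a fixed string length or on characters appearing at predictable positions.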
Bursztein and his team have had other breakthroughs in this field. Back in May, they developed techniques to break the audio captchas used by Microsoft, eBay, Yahoo and Digg, and they plan to keep improving their Decaptcha tool.
Of course, the researchers have no plans to release the software, and their paper explains at length how captchas can be improved. But if these researchers can crack this anti-bot system, it is entirely possible that someone less reputable will figure it out too, and before long we’ll be inundated with spam and scams once again.