The end of the CAPTCHA era
Every webmaster should know what a CAPTCHA (short for Completely Automated Public Turing test to tell Computers and Humans Apart) is, but for those who don’t: the little images with obfuscated text under registration or comment forms are CAPTCHAs. They are meant to stop robots, for example spam-bots, from registering on a website or posting unwanted spam comments. They are very effective. The problem is, they are easier to break than many think.
Breaking, hacking CAPTCHAs
I am not a good desktop-software developer, but even so it took me less than 2 days to write a .NET program that recognizes, with quite high reliability (over 80%), the characters in random CAPTCHAs I found on the internet. I was one step away from having a spam-bot with CAPTCHA recognition: all I would have needed was a network connector, a script that handles the data transfer over TCP between my PC and the remote website.
So, why are these CAPTCHAs so easy to break?
Basically they are just text: random letters and numbers on a random background, saved as an image file. To break such an image, the software has to follow these steps:
- separate the background and the foreground
- segment the characters from the image into separate blocks
- finally, match the blocks against templates, that is, all the letters of the alphabet plus the digits
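The three steps above can be sketched in a few lines of Python. This is not the .NET program mentioned earlier, just a toy under loud assumptions: the "image" is a small grid of grayscale values, the three glyphs in `TEMPLATES` are made up, ink is always darker than the background, and characters never touch each other. Real CAPTCHAs warp and overlap their characters, so the segmentation step in particular would need far more work.

```python
# Toy version of the three steps on a tiny grayscale grid (0 = black, 255 = white).
# The glyphs, the threshold and the rendered image are assumptions for illustration.

TEMPLATES = {
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
}


def render(text):
    """Draw text as dark ink on a light, uneven background (stand-in for a CAPTCHA)."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for y, line in enumerate(TEMPLATES[ch]):
            for c in line + "0":  # the extra "0" leaves one blank column between glyphs
                x = len(rows[y])
                rows[y].append(40 if c == "1" else 180 + (x * 7 + y * 13) % 70)
    return rows


def binarize(image, threshold=128):
    """Step 1: separate foreground from background (1 = ink, 0 = background)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment(binary):
    """Step 2: cut the image into blocks wherever a column contains no ink."""
    width = len(binary[0])
    blocks, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] for row in binary)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            blocks.append([row[start:x] for row in binary])
            start = None
    return blocks


def match(block):
    """Step 3: return the template character with the fewest differing pixels."""
    def distance(glyph):
        if len(glyph[0]) != len(block[0]):
            return float("inf")  # different width, cannot be this glyph
        return sum(int(g) != b
                   for glyph_row, block_row in zip(glyph, block)
                   for g, b in zip(glyph_row, block_row))
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def solve(image):
    return "".join(match(block) for block in segment(binarize(image)))


print(solve(render("L71")))  # prints L71
```

Because the match picks the closest template rather than demanding an exact one, a single damaged pixel inside a character does not change the result in this toy setup.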
I think it’s too simple.
If a spammer team hires humans to decipher CAPTCHAs, the situation is even easier. In a third-world country anyone would do it for a price of $0.0001 per CAPTCHA. Someone who is good enough can solve about 5000 CAPTCHAs per day, go figure. How much the spammers earn with their spam campaigns I have no idea, but since they keep doing this, I figure it is way more than I would think.
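To "go figure" with the two numbers quoted above, here is the arithmetic as a small Python sketch. The price and the daily volume are the figures from the paragraph; the `workers` parameter is my own addition for illustration.

```python
from fractions import Fraction

# Figures quoted in the text above; exact fractions avoid float rounding noise.
price_per_captcha = Fraction(1, 10_000)  # $0.0001
solved_per_day = 5000


def daily_cost(workers=1):
    """What the spammer pays per day for the given number of human solvers."""
    return price_per_captcha * solved_per_day * workers


print(f"one solver costs ${float(daily_cost()):.2f} per day")        # $0.50
print(f"$1 buys {int(1 / price_per_captcha)} solved CAPTCHAs")       # 10000
```

At these rates a full day of human solving costs fifty cents, which is why the price of a solved CAPTCHA is hardly an obstacle.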
So,
How to stop spammers effectively?
It’s very hard. A very good initiative is the Akismet project. Before a comment appears on a website, it is submitted to Akismet, which tries to determine whether the text of the comment is spam or legitimate. The effectiveness of this service is incredibly high. Honestly, I haven’t seen a spam comment on any website protected by Akismet. But this service has a great vulnerability: it is free, and what the comments are matched against is practically a database. The spammers use random text in their comments, so the owners and developers of Akismet have to update their database day by day; the database can only grow, and updating it is time consuming… For a free service that is not a good thing. They will either have to continue as a paid service, get sponsors, or give up. None of these is an easy decision.
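The weakness described above can be illustrated with a toy filter. To be clear, this is not Akismet's API and not a description of how Akismet works internally; `SpamDatabase` and its methods are hypothetical names, and the exact-match lookup is an assumption chosen to show why random filler text forces such a database to keep growing.

```python
import hashlib


def fingerprint(text):
    """Hash of the comment after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamDatabase:
    """Hypothetical filter that only knows comments it has already seen."""

    def __init__(self):
        self.known = set()

    def report(self, text):
        self.known.add(fingerprint(text))

    def is_spam(self, text):
        return fingerprint(text) in self.known


db = SpamDatabase()
db.report("Buy cheap pills now")
print(db.is_spam("buy  cheap pills NOW"))       # True: same text after normalizing
print(db.is_spam("Buy cheap pills now xq7f2"))  # False: random suffix slips through
```

Every new random variation has to be reported before it is caught, so the set of known entries grows with each one.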
Another good alternative would be to make a website registration-dependent, with the registration data obtained from a trusted third party that extensively verifies the identity of its users. One such initiative is OpenID. The third parties that verify the identity of the users are called OpenID Servers; one example is Verisign, one of the most trusted entities on the internet. If the OpenID Server does a good job, the spammers can’t slip through the net and thus cannot post unwanted commercial comments. If the OpenID Server doesn’t do a good job, then the whole thing is meaningless.
And with that the list of possible options is exhausted… I think. If you know more, let me know.
Now let’s see what upgrade options would be possible for the CAPTCHA system. The official CAPTCHA website lists some great ideas, but at the time of my visits none of them worked. The first takes words (hand-written or not) from old books, and as users type the letters (and numbers) they also digitize the books. Well, maybe it’s just me, but this is the old CAPTCHA refurbished.
The second interesting initiative is a… I have no idea, as it’s not working. They don’t list any details either, except that it’s their newest CAPTCHA. Good to know.
And the last: 4 random images that relate to each other in one way or another. The user then has to choose the thing that is related to every image.
I think the last initiative is the most reliable and at the same time unbreakable, as I can’t think of a reliable(!) solution that could recognize random objects in an image. Of course, I might be totally wrong.
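For readers who want to picture the image-based idea, here is a minimal Python sketch of how such a challenge could be generated and checked. Everything in it is an assumption made for illustration: the labelled `IMAGE_POOL`, the file names, and the functions `make_challenge` and `check` are hypothetical and not taken from the official CAPTCHA website.

```python
import random

# Hypothetical pool of labelled images: concept -> image file names (all made up).
IMAGE_POOL = {
    "cat": ["cat1.jpg", "cat2.jpg", "cat3.jpg", "cat4.jpg", "cat5.jpg"],
    "tree": ["tree1.jpg", "tree2.jpg", "tree3.jpg", "tree4.jpg", "tree5.jpg"],
    "car": ["car1.jpg", "car2.jpg", "car3.jpg", "car4.jpg", "car5.jpg"],
    "boat": ["boat1.jpg", "boat2.jpg", "boat3.jpg", "boat4.jpg", "boat5.jpg"],
}


def make_challenge(rng, n_images=4, n_choices=3):
    """Pick one concept, show n_images of it, and offer n_choices possible answers."""
    concepts = sorted(IMAGE_POOL)
    answer = rng.choice(concepts)
    images = rng.sample(IMAGE_POOL[answer], n_images)
    distractors = rng.sample([c for c in concepts if c != answer], n_choices - 1)
    choices = distractors + [answer]
    rng.shuffle(choices)
    # In a real deployment the answer would stay on the server, not go to the browser.
    return {"images": images, "choices": choices, "answer": answer}


def check(challenge, reply):
    """The user passes only by naming the concept shared by all shown images."""
    return reply == challenge["answer"]


challenge = make_challenge(random.Random(0))
print(challenge["images"], challenge["choices"])
```

One design point worth keeping in mind: with a short list of choices, a bot that guesses blindly still passes one time in `n_choices`, so the strength of the scheme depends on the size of the pool and the number of choices as well as on how hard the images are to recognize.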
Do you know of another solution which might work on a grand scale? Share your thoughts, the comments are open.