Creating a basic captcha with Imagick

November 9, 2008 by Gary Illyes  
Filed under IMagick

I hate captcha systems, truly, madly, deeply. But spammers force us to create and use them.
Writing captcha systems can be very easy. You need a randomly generated image, a hard-to-read font, and basic server-side coding knowledge. I use PHP because it’s well documented and extremely easy to use, but you can use any other server-side language you want, as long as ImageMagick can be used from it.
So, let’s list what we want to do:

  • Create a random string
  • Display it as an image
  • Check whether the user entered the string correctly

That’s all.

Creating a random string with PHP
This is the easiest part. PHP already has a function that returns a new value every second: time(). But its result contains only digits. To fix that, we run it through one of the built-in hash functions (they hash, they don’t encrypt): md5, sha1, anything. I use md5 here because it’s the first one that popped into my mind. Keep in mind that a hash of the current time only looks random. Anyone who knows roughly when the image was generated can reproduce it, so it’s fine for a demo but weak against a determined spammer.
The second thing to solve is that we don’t want to make the user enter 32 characters, only 5. Again, PHP’s string functions come in handy; I use substr(). So let’s see our code:

$rand = md5(time());
$str = substr($rand, 0 , 5);

That simple. We now have a 5-character string that looks random.
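Since a string derived from time() can be guessed, here is a minimal sketch of a less predictable alternative. It assumes PHP 7 or later (random_int() doesn’t exist in older versions), and the function name captcha_string() and its alphabet are my own picks for the example, not anything built in:

```php
<?php
// Sketch only: captcha_string() and its alphabet are illustrative, not part of PHP or IMagick.
function captcha_string(int $length = 5): string
{
    // Lowercase letters and digits, minus look-alikes such as 0/o and 1/l/i.
    $alphabet = 'abcdefghjkmnpqrstuvwxyz23456789';
    $max = strlen($alphabet) - 1;
    $str = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from the operating system's secure random source (PHP 7+).
        $str .= $alphabet[random_int(0, $max)];
    }
    return $str;
}

$str = captcha_string();
```

The result can be dropped in wherever $str is used; the only thing that changes is where the 5 characters come from.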
 

Pushing the random string into an image, using IMagick
 
If you followed my IMagick-related posts, you know that annotating an image with IMagick is simpler than confusing a snail with a mirror. The only difference is that now the image will contain our random string, not a fixed variable’s value. Yeah, that’s a dynamic image.
I’ll also use a hard-to-read font, because if we use a common font, even a 5-year-old can create captcha-reading software using DirectX.
You may ask how the server-side code that checks the captcha will know whether the user entered the correct code. That’s solved with sessions. We start a session in the image generator script and store the shortened random string in a session variable.
So, when the image is loaded in your HTML, the session variable gets the 5-character random string as its value. The page that checks the user input compares that session variable with what the user entered; if they match, we know it’s a human. No, we hope it’s a human.
So, the code to generate the dynamic captcha image:

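session_start();
// Build the 5-character code and remember it for the verification script
$rand = md5(time());
$str = substr($rand, 0, 5);
$_SESSION['captcha'] = $str;

$font = '/path/to/your/hard2read/font.ttf';

$image = new Imagick();
$draw = new ImagickDraw();
$pixel = new ImagickPixel('white');
$image->newImage(100, 27, $pixel);
$draw->setFont($font);
$draw->setFontSize(25);
$image->annotateImage($draw, 10, 20, 0, $str);

// Strike a line through the text at random heights to make it harder to read
$x = 0;
$y = mt_rand(5, 22);
$x1 = 100;
$y1 = mt_rand(5, 22);
$draw->setStrokeColor(new ImagickPixel('black'));
$draw->line($x, $y, $x1, $y1);
$image->drawImage($draw); // render the queued line onto the image

$image->setImageFormat('png');
header('Content-Type: image/png');
echo $image->getImageBlob();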
[Image: a captcha image created with IMagick]

We save the above code as captcha.php and try to load it. If everything went well, we should see an image like the one on the left.
 
The HTML form for our captcha
 
If you can’t do this on your own, you shouldn’t mess with captcha verification. But just for the sake of the example:

[Image: captcha image]
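A minimal sketch of such a form, written as a small PHP helper. The field names uname, upass and captcha and the image script captcha.php come from the code in this post; the function name captcha_form() and the target script check.php are placeholders I made up, so rename them to whatever you use:

```php
<?php
// Sketch only: captcha_form() and check.php are placeholder names for this example.
function captcha_form(string $action = 'check.php'): string
{
    $action = htmlspecialchars($action, ENT_QUOTES);
    return <<<HTML
<form method="post" action="{$action}">
  <input type="text" name="uname">
  <input type="password" name="upass">
  <img src="captcha.php" alt="captcha image">
  <input type="text" name="captcha" maxlength="5" autocomplete="off">
  <input type="submit" value="Send">
</form>
HTML;
}

echo captcha_form();
```

The img tag is what triggers captcha.php, so the session variable is set at the moment the browser loads the picture.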

Now the server-side script which checks the user input. I wrote something very basic that barely validates or sanitizes anything. When you write your own input-checking code, you should validate every single bit of your users’ input. Don’t trust ’em!

session_start();
/* Check if every field has been filled in */
if (!empty($_POST['uname']) && !empty($_POST['upass']) && !empty($_POST['captcha'])) {

    /* Check if the CAPTCHA value stored in the session matches the user input */
    if (empty($_SESSION['captcha']) || $_SESSION['captcha'] !== $_POST['captcha']) {
        echo 'You failed the CAPTCHA verification. Try again.';
    } else {
        echo 'Congrats! Everything is cool!';
    }

    /* Each code is good for one attempt only, so it can't be replayed */
    unset($_SESSION['captcha']);

} else {
    echo 'You forgot to fill in some fields. Try again.';
}
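To give an idea of what validating the input could look like, here is a minimal sketch. It assumes PHP 7.1 or later (for the nullable return type), and the function name clean_captcha_post() and the user name rule are assumptions of mine, so adapt them to your own form:

```php
<?php
// Sketch only: the function name and the rules below are assumptions for this example.
function clean_captcha_post(array $post): ?array
{
    foreach (['uname', 'upass', 'captcha'] as $field) {
        // Reject missing fields and array payloads such as captcha[]=x.
        if (!isset($post[$field]) || !is_string($post[$field])) {
            return null;
        }
    }
    // User name: 3 to 20 letters, digits or underscores.
    if (preg_match('/^[A-Za-z0-9_]{3,20}\z/', $post['uname']) !== 1) {
        return null;
    }
    // Captcha: exactly 5 hex characters, which is what substr(md5(...), 0, 5) produces.
    $captcha = strtolower(trim($post['captcha']));
    if (preg_match('/^[0-9a-f]{5}\z/', $captcha) !== 1) {
        return null;
    }
    return ['uname' => $post['uname'], 'upass' => $post['upass'], 'captcha' => $captcha];
}
```

You would call it with $_POST at the top of the checking script and stop right away if it returns null.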

And we’re done.

You can play with the example here.
To download the files created in this post, click here.

The end of the CAPTCHA era

August 29, 2008 by Gary Illyes  
Filed under Development, SEO

Every webmaster should know what a CAPTCHA (short for Completely Automated Public Turing test to tell Computers and Humans Apart) is, but for those who don’t: the little images with obfuscated text under registration or comment forms are CAPTCHAs. They are meant to stop robots, for example spam-bots, from registering on a website or posting unwanted spam comments. They are very effective. The problem is, they are easier to break than many think.

Breaking, hacking CAPTCHAs

I am not a good desktop-software developer, but even so, it took me less than 2 days to write a .NET program which can recognize, with quite high reliability (over 80%), the characters in random CAPTCHAs I found on the internet. I was one step away from releasing a spam-bot with CAPTCHA recognition, as all I would have needed was a network connector: a script which handles the data transfer over TCP between my PC and the remote website.

So, why are these CAPTCHAs so easy to break?

Basically these are just text: random letters and numbers on a random background, saved as an image file. To break these images, the software has to follow these steps:

  1. separate the background and the foreground
  2. segment the characters from the image into separate blocks
  3. finally, match the blocks against templates, that is, all the letters of the alphabet plus the digits
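
To show how little is involved, here is a toy sketch of those three steps in PHP. It is not my .NET program. It assumes a tiny grayscale grid instead of a real image, characters that don’t touch each other, and a made-up template set; every function name is invented for the example:

```php
<?php
// Toy sketch: all names are illustrative; the "image" is a grid of grayscale values (0-255).

// Step 1: separate foreground from background with a fixed threshold (1 = ink, 0 = background).
function binarize(array $gray, int $threshold = 128): array
{
    return array_map(function (array $row) use ($threshold) {
        return array_map(function ($v) use ($threshold) {
            return $v < $threshold ? 1 : 0;
        }, $row);
    }, $gray);
}

// Step 2: cut the image into blocks wherever a column contains no ink.
function segment(array $bits): array
{
    $blocks = [];
    $current = [];
    $width = count($bits[0]);
    for ($x = 0; $x <= $width; $x++) {
        // One extra, empty column at the end flushes the last block.
        $column = $x < $width ? array_column($bits, $x) : [];
        if (array_sum($column) > 0) {
            $current[] = $column;
        } elseif ($current) {
            $blocks[] = $current;
            $current = [];
        }
    }
    return $blocks;
}

// Step 3: compare each block with the templates and keep the closest one.
function recognize(array $blocks, array $templates): string
{
    $text = '';
    foreach ($blocks as $block) {
        // Serialize the block column by column, e.g. "111|001".
        $key = implode('|', array_map(function (array $col) {
            return implode('', $col);
        }, $block));
        $best = '?';
        $bestDistance = PHP_INT_MAX;
        foreach ($templates as $char => $template) {
            $distance = levenshtein($key, $template);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = (string) $char;
            }
        }
        $text .= $best;
    }
    return $text;
}

// A 3x4 "image" showing L and I in dark ink (40) on a light background (200).
$image = [
    [40, 200, 200, 40],
    [40, 200, 200, 40],
    [40,  40, 200, 40],
];
$templates = ['L' => '111|001', 'I' => '111', 'T' => '100|111|100'];

echo recognize(segment(binarize($image)), $templates); // prints "LI"
```

Real CAPTCHAs add noise, rotation and overlapping letters, so each step needs more work than this, but the overall shape stays the same.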

I think it’s too simple.

If a spammer team hires humans to decipher CAPTCHAs, the situation is even easier. In a third-world country anyone would do it for $0.0001 per CAPTCHA. Someone who is good enough can solve about 5000 CAPTCHAs per day, which at that rate costs the spammer about $0.50 per solver per day; go figure. How much the spammers earn with their spam campaigns I have no idea, but since they keep doing this, I figure way more than I would think.

So,

How to stop spammers effectively?

It’s very hard. A very good initiative is the Akismet project. Before a comment appears on a website, it is submitted to Akismet, which tries to identify whether the text of the comment is spam or a legitimate comment. The effectiveness of this service is incredibly high. Honestly, I haven’t seen a spam comment on any website protected by Akismet. But this service has a great vulnerability: it is free, and what the comments are matched against is practically a database. The spammers use random text for their comments, so the owners and developers of Akismet have to update their database day by day. The database can only grow, and updating it is time consuming, which is not a good thing for a free service. They will either have to continue as a paid service, get sponsors, or give up. None of these is an easy decision.
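
For the curious, here is a rough sketch of what submitting a comment to Akismet can look like from PHP. The endpoint and parameter names are how I understand Akismet’s comment-check API, so double-check them against the current documentation. The helper names akismet_request() and akismet_is_spam(), the key and the sample values are all made up for the example:

```php
<?php
// Sketch only: akismet_request() and akismet_is_spam() are illustrative helpers, not an official client.
function akismet_request(string $apiKey, string $blogUrl, array $comment): array
{
    $fields = array_merge(['blog' => $blogUrl, 'comment_type' => 'comment'], $comment);
    return [
        // Assumed endpoint shape: the API key is used as a subdomain.
        'url'  => 'https://' . $apiKey . '.rest.akismet.com/1.1/comment-check',
        'body' => http_build_query($fields),
    ];
}

function akismet_is_spam(array $request): bool
{
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $request['body'],
        'timeout' => 5,
    ]]);
    // As far as I know, the service answers with the plain text "true" for spam and "false" otherwise.
    return trim((string) file_get_contents($request['url'], false, $context)) === 'true';
}

$request = akismet_request('your-api-key', 'https://example.com/', [
    'user_ip'         => '203.0.113.7',
    'user_agent'      => 'Mozilla/5.0',
    'comment_author'  => 'Some Visitor',
    'comment_content' => 'Nice post!',
]);
// $isSpam = akismet_is_spam($request); // needs a real API key and network access
```

Building the request is split from sending it so the first half can be checked without touching the network.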

Another good alternative would be to make the website registration-dependent, with the registration data obtained from a trusted third party which extensively verifies the identity of the users. One such initiative is OpenID. The third parties which verify the identity of the users are called OpenID Servers; one example is Verisign, one of the most trusted entities on the internet. If they do a good job, the spammers can’t slip through the net and thus cannot post unwanted commercial comments. If the OpenID Server doesn’t do a good job, then the whole thing is meaningless.

And the list of the possible options has been exhausted… I think. If you know more, let me know.

Now let’s see what upgrade options would be possible for the CAPTCHA system. The official CAPTCHA website lists some great ideas, but at the time of my visits none of them worked. The first takes words (hand-written or not) from old books, and while users type the letters and numbers, they also digitize the books. Well, maybe it’s just me, but this is the old CAPTCHA refurbished.
The second interesting initiative is a… I have no idea, as it’s not working. They don’t list any details either, except that it’s their newest CAPTCHA. Good to know.
And the last: 4 random images which relate to each other in one way or another. The user then has to choose the thing which is related to each image.

I think the last initiative is the most reliable and at the same time unbreakable, as I can’t think of a reliable(!) solution which could recognize random objects in an image. Of course, I might be totally wrong.

Do you know of another solution which might work on a grand scale? Share your thoughts, the comments are open.