Python and CAPTCHA 
Python August 4th, 2008
It’s been a long time since my last update, because I’ve gotten much lazier. There are only three things in my life now: eating, sleeping and DOTA…
I use a third-party tool to get onto the VS platform to play DOTA, and my accounts often get banned. So I’ve written an automated registration tool; the following is a step-by-step guide to building it.
First of all, I used Live HTTP Headers to capture the data posted to the server during registration. The roadblock is that we need to parse the CAPTCHA image and send the correct numbers back.
The VS CAPTCHA is very simple: the digits sit at fixed positions, with a lot of noise. Refs:

Split the image and find the regions that contain the numbers. Refs:

Use the PIL library to convert the source image. Refs:

But the converted image has too much noise, so we need to apply a CONTOUR filter first. Refs:
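A minimal sketch of the filter-and-convert step, assuming the modern Pillow fork of PIL (imported as `from PIL import ...` rather than the older `import Image`). The helper name `to_binary_contour` and the synthetic test image are made up for illustration; a real run would open the downloaded CAPTCHA instead.

```python
from PIL import Image, ImageDraw, ImageFilter


def to_binary_contour(image):
    """Apply the CONTOUR filter, then convert to a 1-bit black/white image."""
    return image.filter(ImageFilter.CONTOUR).convert('1')


# Synthetic stand-in for a downloaded CAPTCHA (assumption, not the real image):
# a white 60x20 canvas with one dark rectangle on it.
sample = Image.new('L', (60, 20), 255)
ImageDraw.Draw(sample).rectangle((8, 5, 14, 13), fill=0)

binary = to_binary_contour(sample)

# In mode '1' every pixel reads back as either 0 (black) or 255 (white).
pixels = list(binary.getdata())
print(binary.mode, binary.size, sorted(set(pixels)))
```

Filtering before the 1-bit conversion matters because CONTOUR works on the grey levels; once the image is reduced to pure black and white there is much less for the filter to work with.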

The next task is extracting the data from the regions where the numbers are located. We only need some standard sample data, for example the “2” in the previous image:

And the data is:
[255, 0, 255, 0, 0, 0, 255, 0, 255, 255, 255, 0, 0, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 255, 255, 255, 255, 255, 0, 0, 255, 0, 255, 255, 255, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 0, 255, 0, 255, 0, 0, 255, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 255, 255, 0, 0, 255, 255, 255, 0, 255, 0, 255, 255, 255, 0, 0, 255, 255, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 0, 255, 0, 255, 255, 0, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 255, 0, 255, 255, 0, 255, 0, 0, 0, 0, 0, 0, 255, 255, 0, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 0]
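To see what such a list represents, it can be folded back into rows and drawn as text. This is only a sketch: the width of 12 is an assumption taken from the 12-pixel-wide crop boxes in the example code, such as (5, 3, 17, 16), and the helper names `to_rows` and `render` are hypothetical.

```python
# Pixel data for the "2" sample as listed in the post (0 = black, 255 = white).
SAMPLE = [
    255, 0, 255, 0, 0, 0, 255, 0, 255, 255, 255, 0,
    0, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255,
    255, 255, 0, 255, 255, 255, 255, 255, 0, 0, 255, 0,
    255, 255, 255, 0, 255, 255, 255, 255, 0, 0, 255, 255,
    255, 255, 255, 0, 255, 255, 255, 255, 0, 0, 0, 255,
    255, 255, 255, 0, 255, 0, 255, 0, 0, 255, 0, 255,
    0, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0, 255,
    255, 255, 0, 255, 255, 0, 0, 255, 255, 255, 0, 255,
    0, 255, 255, 255, 0, 0, 255, 255, 0, 255, 255, 0,
    0, 255, 255, 0, 0, 255, 0, 255, 0, 255, 255, 0,
    255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 255, 0,
    255, 255, 0, 255, 0, 0, 0, 0, 0, 0, 255, 255,
    0, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 0,
]
WIDTH = 12  # assumed: width of one crop box, 17 - 5


def to_rows(flat, width):
    """Split a flat, row-major pixel list into rows of `width` pixels."""
    if len(flat) % width:
        raise ValueError('pixel count is not a multiple of the width')
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def render(flat, width):
    """Draw black pixels as '#' and white pixels as '.'."""
    return '\n'.join(
        ''.join('#' if p == 0 else '.' for p in row)
        for row in to_rows(flat, width)
    )


print(render(SAMPLE, WIDTH))
```

The 156 values split evenly into 13 rows of 12, which matches the height of the same crop box (16 - 3).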
We need to do this many times to get standard samples for 0-9 and build a feature library.
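One way to store the collected samples is to write them out as a small Python module. The sketch below assumes features.py holds a list named FEATURES whose index is the digit, which is what the lookup in the example code implies; the real file's layout is not shown in this post, and `build_features_source` is a hypothetical helper.

```python
import pprint


def build_features_source(samples):
    """Return the text of a features.py module.

    `samples` maps each digit 0-9 to one reference pixel list. The layout
    (a list named FEATURES, indexed by digit) is an assumption.
    """
    missing = [d for d in range(10) if d not in samples]
    if missing:
        raise ValueError('no sample for digits: %s' % missing)
    features = [list(samples[d]) for d in range(10)]
    return 'FEATURES = ' + pprint.pformat(features, width=78) + '\n'


# Toy 4-pixel "images" standing in for real 156-value crops.
toy = {d: [255 if (d >> bit) & 1 else 0 for bit in range(4)]
       for d in range(10)}

source = build_features_source(toy)
print(source)
```

Saving `source` to features.py would let the main script keep its `from features import FEATURES` line unchanged.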
So the complete CAPTCHA parsing process is: get the image, apply a CONTOUR filter and convert it, extract the data from the regions where the numbers are located, and look up the matching number in the feature library. I use the Levenshtein distance algorithm for the matching.
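The matching step can be tried on toy data without any image at all. This sketch uses a two-row variant of the Levenshtein distance, not the exact function from this post, and `TOY_FEATURES` is a made-up three-entry library of tiny pixel lists.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, keeping only the previous row."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def closest_digit(block, features):
    """Index of the feature whose edit distance to `block` is smallest."""
    distances = [levenshtein(block, f) for f in features]
    return distances.index(min(distances))


# Hypothetical library: real entries would hold 156 values each.
TOY_FEATURES = [
    [0, 0, 0, 255, 255, 255],
    [255, 0, 255, 0, 255, 0],
    [255, 255, 255, 0, 0, 0],
]

noisy = [255, 0, 255, 0, 255, 255]  # entry 1 with its last pixel flipped
print(closest_digit(noisy, TOY_FEATURES))
```

Because the distance tolerates a few flipped pixels, a noisy crop still lands on the nearest stored sample instead of requiring an exact match.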
The following is some example code:
    #!/usr/bin/env python
    # -*- coding: utf8 -*-
    import cookielib, Image, ImageFilter, StringIO, urllib, urllib2
    from features import FEATURES

    CAPTHA = 'http://www.vsa.com.cn/user/center/code/image2.jsp'

    def _levenshtein_distance(m, n):
        len_plus = lambda x: len(x) + 1
        c = [[i] for i in range(0, len_plus(m))]
        c[0] = [j for j in range(0, len_plus(n))]
        for i in range(0, len(m)):
            for j in range(0, len(n)):
                c[i+1].append(
                    min(
                        c[i][j+1] + 1,
                        c[i+1][j] + 1,
                        c[i][j] + (0 if m[i] == n[j] else 1)
                    )
                )
        return c[-1][-1]

    def _get_number(source):
        distance = [_levenshtein_distance(source, i) for i in FEATURES]
        minimal = min(distance)
        return distance.index(minimal)

    # Set the cookie
    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))

    # Get the captha image
    img_file = opener.open(CAPTHA)
    tmp = StringIO.StringIO(img_file.read())
    image = Image.open(tmp)

    # Show the image
    image.show()

    # Convert the image
    i = image.filter(ImageFilter.CONTOUR).convert('1')

    # Get four numbers regions' data
    blocks = [list(i.crop(b).getdata()) for b in [(5, 3, 17, 16), (18, 3, 30, 16), (31, 3, 43, 16), (44, 3, 56, 16)]]

    # Parse numbers
    numbers = [_get_number(b) for b in blocks]

    # Output numbers
    print '%s%s%s%s' % tuple(numbers)
features.py is the feature library file, so I won’t post it here. To download the complete code example, Click here
After parsing the CAPTCHA, the only work left is posting the data to the server, which should be easy for you.
I’ve just upgraded my WordPress to version 2.5, and there are many changes in the administration interface. The admin UI looks clean and simple, much better than before. The multi-file upload with a progress bar is so cool, and tag management is powerful. Plugins can now be upgraded by filling in FTP information, so fancy.
Continuing to try new features~
Python CAPTCHA 
Python March 17th, 2008
Why do this? Because VS banned my account too many times. I only want to play DOTA with my friends on this platform, so I need to write an auto-register tool to make changing accounts easier.
But when I started working on it, my accounts got frozen…
Here is a sample for my university’s site, BitUnion.org: Click here (poor network, may take 10s+)
PS: You can also visit http://blog.lazytech.info/python/py-captha.php