Faking a browser with Mechanize
Some time ago I tried to get the current balance of my prepaid mobile phone plan in order to implement some kind of notification whenever it drops below 10 Euro. Unfortunately my ISP doesn’t offer any API to get this information automatically, so my first attempt was to send a POST request using cURL to log in to the website of my ISP and then parse the HTML to extract the current balance.
While this is easy on many sites, on that particular site it was a bit tricky because the site is built with JSF (JavaServer Faces), and it seems the form field names, e.g. for the login form, are auto-generated. So before sending the POST request to log in, you first need to figure out the form field names (though this works pretty easily using regular expressions on the raw HTML). But for some reason I couldn’t log in to the site successfully using cURL (and its cookie storage features), even though the requests cURL sent looked identical to what I saw in Firebug. Anyway, I decided to look for some other solution before wasting even more time fiddling with cURL.
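For illustration, here is a minimal sketch of that regex step in Python. The file name and the loginForm: prefix are assumptions for the example, not the actual markup of my ISP’s site:

```python
import re

# Hypothetical sketch: pull the auto-generated JSF field names out of the
# raw login page. Assumes the inputs look like
# <input name="loginForm:j_idt12" ...>; prefix and file name are made up.
html = open("login_page.html").read()
field_names = re.findall(r'<input[^>]*\bname="(loginForm:[^"]+)"', html)
print(field_names)  # e.g. ['loginForm:j_idt12', 'loginForm:j_idt14']
```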
I remembered that some time ago I read an article about website grabbing using a Perl module called Mechanize. And luckily, Mechanize is also available as a Python package. So after installing it, I just played with the examples on the website of Mechanize and quickly got a result: mechanize.Browser is an awesomely simple interface for sending requests and accessing the returned response. It is especially easy to iterate over all forms of a page using mechanize.Browser’s forms() method, which returns a simple generator. You then just pick the form you want to use, in my case the login form, fill in the form fields with your data, and call the submit() method of the Browser object. Mechanize then sends the POST request, receives the response, and in case it is a redirect to another page, it also follows the redirect and presents you with the HTML of the new page. Not to mention it also handles cookies automatically, without any further configuration.
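A minimal sketch of that flow; the URL and the username/password control names are placeholders for whatever the real login form uses:

```python
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # don't let robots.txt block the scripted login
br.open("https://isp.example/login")  # placeholder URL

# Iterate over all forms on the page to spot the login form.
for form in br.forms():
    print(form.name, [control.name for control in form.controls])

# Select the login form (here by index) and fill in the fields;
# the control names are assumptions for this sketch.
br.select_form(nr=0)
br["username"] = "my-user"
br["password"] = "secret"

# submit() sends the POST, follows a possible redirect, and cookies
# are carried along automatically.
response = br.submit()
html = response.read()
```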
The rest was rather simple: passing the HTML retrieved with Mechanize to BeautifulSoup and using find() to locate the HTML element containing the data I was looking for.
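Roughly like this, assuming (hypothetically) that the balance sits in a span with a balance class; shown with the bs4 package:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the balance is assumed to live in
# <span class="balance">12,34 €</span> on the account page.
soup = BeautifulSoup(html, "html.parser")
balance = soup.find("span", class_="balance").get_text(strip=True)
print(balance)
```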
To sum it up, if you want to do a little more than trivial GET requests in Python on arbitrary websites, have a look at Mechanize. It makes it so easy to perform browser tasks from within a script :). Yay.