HTML Form Submission via Python

Some tasks are very tedious, and writing a short Python script is often a nice way to automate them and save some time. But most people who have actually tried this can probably tell you that such a script often gets more complicated than anticipated at the beginning. Here is the relevant xkcd comic:

xkcd comic 1319

Anyway, this story begins with a web form (on a site that shall remain unnamed), which I need to submit about 80 times with similar content. Since I have used this technique before, I thought it should not be too hard to automate the task with Python's urllib module. Here is the main class I wrote for it:

import urllib.request
import urllib.parse
import http.cookiejar

class form_poster:
    def __init__(self):
        self.cj = http.cookiejar.CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cj) )
        # extra request headers; empty for now, but see below
        self.headers = {}

    def clear_cookies(self, cookie_url):
        # forget all cookies stored for the given URL's domain
        domain = urllib.parse.urlparse(cookie_url).netloc
        self.cj.clear(domain)

    def submit_form( self, form_url, form_field_dict ):
        data = urllib.parse.urlencode( form_field_dict ).encode( 'ascii' )
        form_submit_request = urllib.request.Request( form_url, data=data, headers=self.headers )
        # go through our opener so the stored cookies are sent along
        response = self.opener.open( form_submit_request )
        if response.getcode() == 200:
            print("Upload successful: {}".format(form_field_dict))
        else:
            print("Upload failed: {}".format(form_field_dict))
        return response

    def open_page(self, url):
        request = urllib.request.Request( url, data=None, headers=self.headers )
        # again, use our opener instead of plain urlopen
        return self.opener.open(request)

The idea is as follows:

  1. We initialize this class, which sets up a cookie jar for us (since we need to log in to said website before submitting the form).
  2. We can now submit forms using the submit_form function, which takes a URL and a dictionary as arguments. Both can be obtained by looking at the HTML source code of the page that contains the web form.
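To see what submit_form will actually send, here is what urllib.parse.urlencode does to such a dictionary (a quick illustration with made-up values):

```python
import urllib.parse

# the form fields as a dictionary, exactly as submit_form expects them
form_fields = {"username": "myUserName", "password": "secret"}

# urlencode turns them into the body of a POST request
body = urllib.parse.urlencode(form_fields)
print(body)  # username=myUserName&password=secret
```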

The first thing we want to do is log in to the website. To figure out how, we open the log-in page in our web browser and find the log-in form in the source code. In Firefox this can be done (for example) by right-clicking in the user field and selecting the “Inspect Element” option. There is probably a lot going on in the source code, but something similar to this should be present:

<form id="login" action="/login.php" method="post">
    <input name="username" id="login_name">
    <input type="password" name="password" id="login_pass">
    <input type="submit" value="Sign In">
</form>
Given the site's base URL, this tells us that the form will be submitted to the path /login.php via the HTTP method POST and needs to contain the fields username and password. So it should be fairly simple for us to use our form_poster class defined above together with our login credentials:

fp = form_poster()
fp.submit_form( "",
    { "username": "myUserName",
      "password": "EL9L*FWdij" } )

As a side note: please be careful about where you store and type your credentials when using them in scripts! There are people out there who automatically scan every open repository on GitHub for exactly this kind of thing.

This seemed to work fine for me and should now have set the login cookies that I need to submit the actual form I’m interested in. For this, I basically repeat the previous step of analysing the HTML of the web form and extracting all <input> fields (at least all that have a name attribute). Don’t forget that other HTML elements can also contribute to a <form>, for example <textarea> or <select>.
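This extraction step can itself be scripted with the standard library's html.parser. A minimal sketch (the class and variable names here are my own, not part of the original script):

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect the name attributes of form field elements."""
    def __init__(self):
        super().__init__()
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        # only these tags contribute fields to a form submission
        if tag in ("input", "textarea", "select"):
            attrs = dict(attrs)
            if "name" in attrs:
                self.field_names.append(attrs["name"])

collector = FormFieldCollector()
collector.feed('<form><input name="username">'
               '<input type="password" name="password">'
               '<input type="submit" value="Sign In"></form>')
print(collector.field_names)  # ['username', 'password']
```

Note that the submit button is skipped automatically, since it has no name attribute.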

After this analysis and deciding how to fill the fields, I tried to submit this form in the same way:

fp = form_poster()
# get log-in cookies
fp.submit_form( "",
    { "username": "myUserName",
      "password": "EL9L*FWdij" } )
# submit actual form for all wanted values
for value_dict in all_value_dicts_I_want_to_submit:
    fp.submit_form( "",
        value_dict )

But instead of a success message, urllib throws an exception:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Apparently I am not allowed to access the form at all? Why would that be? The login seemed to work… It turned out that the website was checking which kind of “browser” is trying to access the form and blocks urllib in order to prevent spam. Since I don’t want to spam or have any other malicious intent, I’m sure they won’t mind me circumventing this test… Luckily, urllib allows us to set custom header fields, so we just need to change our requests like this:

request = urllib.request.Request( ..., # other arguments
    headers = {'User-Agent':
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0'} )
And suddenly the 403 error is gone! But this is only where the real problem starts, since now we get:

urllib.error.HTTPError: HTTP Error 500: Internal Server Error

But that’s a story for another post…

Addendum 2021-05-16:

The introduction to this post turned out to be more accurate than I anticipated at the time of writing. I have since decided that this endeavour took too much effort, and just filled in the forms by hand. In case anybody is interested, here is a short version of how my struggles played out:

  • The automatically retrieved page didn’t open properly. Looking at the source code revealed one giant and very confusing JavaScript function call – the code was apparently obfuscated and I couldn’t find an (easy) way to de-obfuscate it.
  • I saw in the source code (that my browser showed) that a session ID needs to be sent along with the form fields, but I couldn’t figure out a way to pass it properly.
  • A while later, I heard about Selenium, which seemed like the perfect tool for a job like this. I could get it to work for a similar task on another website, but not for this problem. I assume the website has some way to detect the automation attempt and blocks it actively.
  • This result is OK for me. It is actually a bit reassuring to know that this website has proper protection, and filling out the form a few dozen times by hand is also not that terrible. (Actually, I created templates that I just need to copy and paste into the form…)

So in conclusion: my automation attempt was unsuccessful, but I still picked up some background knowledge and got to try out some new tools.


A blog by Timo Dreyer