PitchLake – a tar pit for scanners

by Simon Biles
Founder of Thinking Security Ltd., an Information Security and Risk Management consultancy firm based near Oxford in the UK.

We’ve had two bank holidays in a row here in the UK – first off for Easter, then for the Royal Wedding – time off work coupled with very pleasant weather and plenty of “refreshments” has caused my brain to atrophy! So, rather than pulling one of my usual type of topics from the hat for this article, I thought that I’d do a mini-project for the month.

[ I’ll apologise up front though – I can only just program in both Perl and C, and C isn’t exactly column friendly, so it’ll be Perl. I know that many readers here can program in Perl, and most of you probably better than me – I’d be interested to hear corrections, tips and tricks in the comments so as to improve, as no doubt there are better ways of doing this ! ]

One of my first tasks in the office this morning, after a cup of coffee of course, was to review my server logs [1]. As yet I haven’t got enough staff to have a minion do this for me, but to be honest I’d miss the connection to the real world of computing if I did [2]. I run a Linux server in a datacentre in Birmingham as my company’s main web server and my high-bandwidth, static-IP pen-test machine. For the last few months I’ve been meaning to do something about the 404 errors that are being reported by Apache. Some are my fault, for taking away pages that people clearly still link to; the others, though, are clearly the work of automated web vulnerability scanning tools.

Vulnerability scanners are the bottom end of the pen-test toolkit – they are to penetration testing what the Windows “find” command is to digital forensics, i.e. superficial and basic. There are various types, but of interest today are those that operate at the application layer over HTTP. In the open source market both Nikto and Nessus [ Nessus isn’t open source per se, but is free for home use … ] are examples of products that test web servers for potentially insecure CGIs and files. The trouble is that, in order to determine whether an insecure CGI script or file is present, the scanner asks Apache for it and receives a 404 if it isn’t there. Each 404 is written to the log, and when Nikto, for example, tests for over 6400 possible vulnerabilities, you can imagine what the logs look like! Sadly, because tools like these are available to everyone, they are not only used by the kind of people who get written authorisation before testing your web server …

Apache, however, is a good thing – it allows you to reconfigure your 404 error messages in a number of ways. Most simply, it lets you return a custom 404 page – one that matches the look and feel of your website, and perhaps allows you to indulge your love of haiku. This, however, is for the weak and lazy – you can also pass the request to a script, one which can perhaps guess what your user meant, or at least redirect them to where you want them to be and, perhaps, perform some other manipulations along the way …

Over hundreds and hundreds of years, at La Brea in Los Angeles, tar has seeped up from the earth, creating huge tar pits. These tar pits were often covered in water ( water and tar not being a natural combination to mix ), which tempted animals to come and drink. Once they set foot in the tar, a slow, sticky and ultimately preservative end was inevitable. The idea of trapping ( and preserving – but I think that’s illegal ) script kiddies appeals to me, and it seems also to others. LaBrea is a “tarpit” or “sticky honeypot” program that takes over unused IP addresses on a network and creates “virtual machines” [3]. It then allows vulnerability scanners and port scanners to connect to these “virtual machines” but, by slowing the response times, gradually grinds them to a halt waiting for a response – effectively trapping them in the tarpit. LaBrea is a great tool, but it has some significant restrictions on how you run it: you need a reasonable range of IP addresses for it to be effective, and in this modern day and age not many of us can claim that! (Roll on IPv6!) It’s also no good for a single server being scanned for web vulnerabilities. So what we need to do is create the equivalent for web servers …

First off, we need to configure our Apache installation to redirect all unknown requests to our script, in the httpd.conf:

ErrorDocument 404 /

This, admittedly rather self-evidently, sets the 404 error page to our script (being a Brit, I’m obliged to work with Commonwealth tar pits, hence the name). The script needs to sit in the document root of your web server (in my case /var/www/html). Also make sure that your Apache is happy to process .pl files as CGI, rather than just printing the contents:

AddHandler cgi-script .pl

(Check that the ExecCGI option is enabled for that directory too.)
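As a rough sketch of what the script on the receiving end of that ErrorDocument might look like (this is not the PitchLake source, and the function name build_response is made up for illustration): when Apache hands an error to a local handler, it passes the originally requested path in the REDIRECT_URL environment variable. A CGI error handler should also send its own Status header, otherwise the client may see a 200 rather than a 404.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the full CGI response (headers plus body) for a missing page.
# The environment is passed in as a hash reference so that the function
# can be exercised outside Apache. build_response is a hypothetical name.
sub build_response {
    my ($env) = @_;

    # Apache sets REDIRECT_URL to the path that triggered the error.
    my $url = $env->{REDIRECT_URL} // '(unknown)';

    # Escape the requested path before echoing it back into HTML.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $url =~ s/([&<>"])/$entity{$1}/g;

    # The Status header keeps the response a 404 for the client.
    return "Status: 404 Not Found\r\n"
         . "Content-Type: text/html; charset=utf-8\r\n\r\n"
         . "<html><body><h1>Not Found</h1>"
         . "<p>Nothing lives at $url.</p></body></html>\n";
}

print build_response(\%ENV);
```

The client’s address arrives in the standard CGI variable REMOTE_ADDR, which is what a tarpit would key its per-visitor counts on.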

At this point, any request that produces a 404 will be handed to the script. Now we need a script which does something pointful … We don’t really want to penalise the user who gets it wrong by accident by tarring them up waiting for a response so, like password or login protection, we’re going to double the delay each time a wrong request comes in. The first wrong request won’t have any added delay, the second will have a 1 second delay built in, the third a 2 second delay, the fourth a 4 second delay, the fifth an 8 second delay – you get the picture … ( It’s not actually that large: it starts at 0.1 of a second, then 0.2, 0.4, 0.8, 1.6 etc., so for a real person making an honest mistake there’s a bit of flexibility. ) Also, anything that is older than two hours gets cleared from the database to stop it from clogging up over time. I can see that there are areas where this could be improved – pre-emptive increases for people scanning for known URLs, for example – and there should also be some tools to extract relevant data. Keep an eye on the web page if you are interested – I may get around to doing some of them!
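The delay schedule and the two-hour clean-up can be sketched in a few lines of Perl. This is only an illustration of the idea, not the PitchLake code: it keeps its counts in an in-memory hash rather than a database, and the names delay_for, prune_old and record_miss are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed constants, taken from the schedule described in the text.
my $BASE_DELAY = 0.1;          # seconds, applied on the second miss
my $MAX_AGE    = 2 * 60 * 60;  # forget a client after two hours

# Delay in seconds for the Nth 404 from one client:
# nothing for the first miss, then 0.1, 0.2, 0.4, 0.8, 1.6 ...
sub delay_for {
    my ($misses) = @_;
    return 0 if $misses < 2;
    return $BASE_DELAY * 2 ** ($misses - 2);
}

# Drop records last seen more than $MAX_AGE seconds before $now.
sub prune_old {
    my ($hits, $now) = @_;
    for my $ip (keys %$hits) {
        delete $hits->{$ip} if $now - $hits->{$ip}{last_seen} > $MAX_AGE;
    }
    return;
}

# Record a miss for $ip at time $now and return the delay to apply.
sub record_miss {
    my ($hits, $ip, $now) = @_;
    prune_old($hits, $now);
    my $rec = $hits->{$ip} //= { count => 0, last_seen => $now };
    $rec->{count}++;
    $rec->{last_seen} = $now;
    return delay_for($rec->{count});
}

# Demonstration: show the schedule for six misses from one address.
my %hits;
my $start = 1_000_000;
printf "miss %d: %.1fs\n", $_, record_miss(\%hits, '192.0.2.1', $start + $_)
    for 1 .. 6;
```

In a real handler the returned value would be passed to a sub-second sleep (the core Time::HiRes module provides one) before the 404 page is sent, and the hash would be replaced by whatever persistent store the script uses, since each CGI request runs in a fresh process.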

In the meantime – Version 0.3.3 of PitchLake is available here, along with some further information on the database. And you can test it by requesting pretty much anything that doesn’t exist ( try bob.html for example ).

[1] Nobody reads all of their server logs if they have any sense – you need to apply some tools to filter out the mundane and ordinary. I use “logwatch” on my Linux servers to extract the pertinent.

[2] Cue flashback to my original career as a SysAdmin with hundreds of UNIX servers as my responsibility – occasionally I long for the “good old days”. Usually when I’m asked about PCI/DSS compliance …

[3] But not “Virtual Machines” in the sense of VMware, more in the sense of the Turing Test – they answer questions like a computer, but there’s no real processing going on behind them … These operate more at the Transport Layer.

Simon Biles is one of the founders of Thinking Security Ltd., an Information Security and Risk Management consultancy firm based near Oxford in the UK. He has worked on security projects for commercial, charity and government organizations for over 10 years. Simon is studying Forensic Computing at Cranfield University, although very slowly because of work commitments! He posts on the forum as Azrael and you can read an interview with him here.

