I was getting all set to add more features to my WPBlacklist plugin yesterday when I realized that I was going about things all wrong :p I was adding more features to a plugin which I had to keep on maintaining and updating each time a new spam URL came in or the spammer found a new IP to flood me with spam from. Instead, I should have thought about why I was getting spammed so often and come up with a solution which prevented this. So I thought about it a bit …
I have two blogs: this one (DC) and the one over at SM. Now DC is maintained via Blog and uses a commenting system that I wrote, while SM uses WordPress as the backend blogging system and relies on WordPress’s built-in comments. I’ve probably received like 3-5 spam comments here on DC, whereas I’ve been spam bombed on a regular basis over on SM. So why do I get more spam on SM than DC, when DC has been in existence longer than SM and probably has a higher presence on Google? I realized that the answer might lie in the fact that I use WordPress over on SM.
My logic was along these lines: if you run a standard WordPress installation, you have a specific comments file (the file name itself is well known), and anybody who takes a look at the WP source can figure out what the comment submission variables are. Then all a spammer has to do is write a script which calls the comment form’s action script (the one which processes the submitted comment) with the correct variables populated with spam values. So, what if I changed the WordPress code so that the comments script was named differently? That would take care of the most basic robot scripts, which simply hit every site with a WordPress blog looking for the standard comments processing script. But what if the robot actually went through your index page to figure out what the actual comments submission script was, and then submitted its comments to the script name it had discovered? That was easy enough to tackle too: I simply changed the comment variable names on my installation of WordPress.
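The kind of basic robot described above can be sketched as follows (in Python rather than WordPress’s PHP, purely for illustration; the script and field names are the well-known WordPress defaults, while the spam values are made up):

```python
from urllib.parse import urlencode

def build_spam_request(base_url):
    """Return the (url, form_data) a naive spam robot would POST.

    It hardcodes the stock WordPress script name and field names,
    so renaming either one on your own install breaks it.
    """
    fields = {
        'author': 'Cheap Pills',          # stock WP comment fields
        'email': 'spam@example.com',
        'url': 'http://spam.example.com/',
        'comment': 'Buy cheap pills!',
    }
    # The well-known comment handler; a renamed install never sees this hit.
    return base_url + '/wp-comments-post.php', urlencode(fields)
```

Rename the handler and this request lands on nothing; rename the fields as well, and even a robot that scraped the new script name off your index page ends up submitting variables your code simply ignores.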
Of course, the spam robot might be written to parse the main page not only for the comments script name but also for the comment variable names. So I added a third level of security by populating my main page with a couple of dummy forms which pointed to (among other things) the original WP comment submission script, but I had replaced the code in the original script with innocuous stuff which did nothing at all :p
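A minimal sketch of that decoy (again Python for illustration; in reality it would be the PHP file sitting at the original script name): it accepts whatever is POSTed, returns a normal-looking success page, and stores nothing, so the robot believes the spam went through.

```python
def decoy_comment_handler(form_data):
    """Handler behind the original, well-known script name.

    Accepts and silently discards the submission; nothing is ever
    written to the database. The 200 status and familiar message
    keep the robot from suspecting anything.
    """
    del form_data  # deliberately unused: the decoy saves nothing
    return 200, 'Your comment is awaiting moderation.'
```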
That probably would have been as far as I went if I hadn’t started reading up about comment spam prevention online :p The above methods would not discourage a determined spammer, but they should be enough to deter most of the script kiddies out there, who are probably using a script written by somebody else and pointing their spam robots at standard WP installations because those are easy prey. But what if I was dealing with a more determined attacker whose robot script was really thorough? I found an answer to this in a blog entry about how to set spider traps :p The article explained how to create a script which identifies bad robots that try to access disallowed content on your site and then bans the IPs they come from. I modified that method to mark the dummy spam submission scripts I’d planted on my main page as disallowed content, so that any robot which disregarded my warnings and tried to hit the dummy scripts would automatically get banned.

Now I’m waiting to see what happens – whether I still continue to get spam or the floods stop … or I might become the next challenge for some idiot out there who becomes determined that he should get me, since I went to such lengths to stop the spamming and explained it here so others could do the same :p
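The spider-trap half can be sketched like this (Python for illustration; the file name and helper names are my own inventions). The decoy scripts get listed under `Disallow:` in robots.txt, so a well-behaved crawler never requests them; anything that fetches one anyway gets its IP written to a ban list that every real page checks before serving content.

```python
import os

BAN_FILE = 'banned-ips.txt'  # hypothetical ban list, one IP per line

def trap_hit(ip, ban_file=BAN_FILE):
    """Called from a decoy script that robots.txt disallows.

    Polite crawlers honour robots.txt and never land here, so any
    client that does is assumed hostile and gets banned.
    """
    with open(ban_file, 'a') as f:
        f.write(ip + '\n')

def is_banned(ip, ban_file=BAN_FILE):
    """Checked at the top of every real page before serving content."""
    if not os.path.exists(ban_file):
        return False
    with open(ban_file) as f:
        return ip in (line.strip() for line in f)
```

In production you would probably translate the ban list into server-level deny rules rather than checking it in application code, but the flow is the same: the disallowed decoy is the tripwire, and hitting it is the proof of bad faith.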