June 19, 2003

Trapping Email Harvesters

Boing Boing points to Merlin Mann's directions for spammer-hunting. He cites as inspiration this comment from Mark Pilgrim's weblog on telling spambots where to go:

In each page I serve, I include a bogus email address, encoded with the date of access as well as the host IP address and embedded in a comment. [Apache's server-side includes are great!] This has allowed me to trace spam back to specific hosts and/or robots.

Sounds fun. I'm going to try it, and here's how you can too.

Requirements:

  • a server running PHP
  • a throwaway domain (ours will be "example.com") [see note 1]
  • a basic understanding of HTML and passing familiarity with PHP
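Pilgrim's comment describes the idea but not his code, so here is a rough sketch of it in Python rather than the PHP listed above. It assumes IPv4 visitors and packs the access date and the requesting host's address into the local part of a trap address at example.com. The `t` prefix, the `YYYYMMDD-hexIP` layout, and the function names are hypothetical choices, not Pilgrim's actual encoding. In a PHP page the visitor's address would come from `$_SERVER['REMOTE_ADDR']`.

```python
import ipaddress
from datetime import date
from typing import Tuple

TRAP_DOMAIN = "example.com"  # the throwaway domain from the requirements list


def make_trap_address(visitor_ip: str, day: date) -> str:
    """Pack the access date and the visitor's IPv4 address into a bogus address."""
    ip_hex = format(int(ipaddress.IPv4Address(visitor_ip)), "08x")  # 32 bits -> 8 hex digits
    return f"t{day:%Y%m%d}-{ip_hex}@{TRAP_DOMAIN}"


def decode_trap_address(address: str) -> Tuple[str, date]:
    """Recover (visitor IP, access date) from an address made by make_trap_address."""
    local, _, domain = address.partition("@")
    if domain != TRAP_DOMAIN or not local.startswith("t"):
        raise ValueError(f"not a trap address: {address!r}")
    stamp, _, ip_hex = local[1:].partition("-")
    day = date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))
    visitor_ip = str(ipaddress.IPv4Address(int(ip_hex, 16)))
    return visitor_ip, day


def trap_comment(visitor_ip: str, day: date) -> str:
    """HTML comment carrying the trap address, meant to be emitted in every served page."""
    return f"<!-- {make_trap_address(visitor_ip, day)} -->"


if __name__ == "__main__":
    # prints <!-- t20030619-c0000211@example.com -->
    print(trap_comment("192.0.2.17", date(2003, 6, 19)))
```

When spam later arrives at `t20030619-c0000211@example.com`, `decode_trap_address` returns `("192.0.2.17", date(2003, 6, 19))`: the host that fetched the page and the day it did so. The hex encoding only obscures the address; it is not a secret, so anyone who guesses the scheme could forge trap addresses.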

Evolt.org also describes some techniques Apache users can use to keep unwanted spiders and robots away:

Setting the trap

The first step in our war against the Spiders is to identify them. There are many techniques for finding out who the bad bots are, from manually searching your access_logs to using a maintained list and picking which ones you want to exclude. At the end of the day, it's getting the robot's name (its User-Agent) that's important, not how you get it.
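The Evolt advice comes down to pulling User-Agent strings out of the access log. A minimal Python sketch, assuming Apache's "combined" log format (the "common" format does not record the User-Agent at all), might tally requests per agent, optionally restricted to hosts you already suspect, such as ones recovered from trap addresses that later received spam. The regular expression, the function name, and the sample log lines are illustrative assumptions, and the pattern ignores escaped quotes inside quoted fields.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Set

# Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Simplification: quoted fields are assumed not to contain escaped quotes.
COMBINED_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_user_agents(lines: Iterable[str],
                      suspect_hosts: Optional[Set[str]] = None) -> Counter:
    """Count requests per User-Agent, optionally only from the given hosts."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED_LINE.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        if suspect_hosts is not None and match.group("host") not in suspect_hosts:
            continue  # request came from a host we are not investigating
        counts[match.group("agent")] += 1
    return counts


# Made-up sample lines; "ExampleHarvester/1.0" is a hypothetical bot name.
SAMPLE_LOG = [
    '192.0.2.17 - - [19/Jun/2003:13:12:00 -0500] "GET /index.php HTTP/1.0" 200 5120 "-" "ExampleHarvester/1.0"',
    '198.51.100.4 - - [19/Jun/2003:13:13:00 -0500] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.17 - - [19/Jun/2003:13:14:00 -0500] "GET /about.php HTTP/1.0" 200 2048 "-" "ExampleHarvester/1.0"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

On a real server you would pass the lines of your own access_log (for example inside a `with open(...)` block); the agents at the top of the tally, especially those tied to trapped hosts, are the names worth excluding.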

Posted at June 19, 2003 01:12 PM