Spambait

This spambait script has been retired. The page remains here to showcase abusive scrapers and clueless programs, with the aim of feeding them useless content.

What is a scraper?
A scraper is nothing more than a script whose sole purpose is to steal and abuse website content. The main target of this theft is email addresses. There are also scrapers whose primary function is to “data mine”, which is another way of saying: we are going to steal your pages, extract what we can and sell it.

What is a clueless program?
In a nutshell, it is a program run by a clueless person who, in their superior wisdom, has chosen to do a task poorly and with no real comprehension of it. The most common of these are loser submission programs that submit aimlessly to sites in the hope of getting a link. The losers never even check why their program fails so often; they don’t, or can’t, comprehend.

Examples of losers submitting to non-existent pages / forms

For the clueless losers, we just need an .htaccess file to send them somewhere else. You would think they would first check whether such a page or form exists. They will probably not even realize they are being redirected. I rest my case. An .htaccess rule is a quick and easy method; see the rule below:

Redirect 301 /cgi-shl/classifieds/classifieds.pl http://access-us.com/support/spambait/
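To make the matching concrete, here is a minimal Python sketch of what a Redirect rule like this one decides for an incoming request path. It is a simplified model, not Apache’s own code: it assumes mod_alias’s documented behaviour of matching whole path segments at the start of the URL path and appending any leftover path to the target, and it ignores query strings. The function name redirect_location is hypothetical.

```python
from typing import Optional

# Values taken from the .htaccess rule above.
RULE_PATH = "/cgi-shl/classifieds/classifieds.pl"
RULE_TARGET = "http://access-us.com/support/spambait/"


def redirect_location(request_path: str, url_path: str, target: str) -> Optional[str]:
    """Hypothetical model of a mod_alias Redirect: return the Location
    the client would be sent to, or None if the rule does not apply."""
    if request_path == url_path:
        return target
    # Whole path segments only: "/service" covers "/service/foo.txt",
    # but not "/servicefoo.txt".
    if request_path.startswith(url_path.rstrip("/") + "/"):
        # Any path left over after the matched prefix is appended to the target.
        return target + request_path[len(url_path):]
    return None


if __name__ == "__main__":
    for path in (RULE_PATH, RULE_PATH + "x", "/index.html"):
        location = redirect_location(path, RULE_PATH, RULE_TARGET)
        print(path, "->", location if location else "served normally")
```

Only the exact script path (or paths beneath it) gets the 301; every other page on the site is served normally, so a rule like this can sit in .htaccess without side effects for real visitors.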

It is simple and effective, and you can send them anywhere you want, even to a bad part of the Internet so they can annoy someone else. Alternatively, we can add the offender to a deny rule in .htaccess or drop their IP into the firewall.

Bad Bots, what are they?
They are automated scripts that grab and index content, supposedly to search for data within the index. They visit often and frequently do not honor robots.txt. These bad bots can be banned by matching their user agent in an .htaccess rule (see below), or by IP, by adding their IP addresses to a firewall.

#block bad bots
SetEnvIfNoCase user-agent "^Accoona" bad_bot=1
SetEnvIfNoCase user-agent "^BlackWidow" bad_bot=1
SetEnvIfNoCase user-agent "^Bot\ mailto:craftbot@yahoo.com" bad_bot=1
SetEnvIfNoCase user-agent "^ChinaClaw" bad_bot=1
SetEnvIfNoCase user-agent "^ConveraCrawler" bad_bot=1
SetEnvIfNoCase user-agent "^Curl" bad_bot=1
SetEnvIfNoCase user-agent "^Custo" bad_bot=1
SetEnvIfNoCase user-agent "^DISCo" bad_bot=1
SetEnvIfNoCase user-agent "^Download\ Demon" bad_bot=1
SetEnvIfNoCase user-agent "^eCatch" bad_bot=1
SetEnvIfNoCase user-agent "^EchO" bad_bot=1
SetEnvIfNoCase user-agent "^EirGrabber" bad_bot=1
SetEnvIfNoCase user-agent "^EmailSiphon" bad_bot=1
SetEnvIfNoCase user-agent "^EmailWolf" bad_bot=1
SetEnvIfNoCase user-agent "^Exabot" bad_bot=1
SetEnvIfNoCase user-agent "^Express\ WebPictures" bad_bot=1
SetEnvIfNoCase user-agent "^ExtractorPro" bad_bot=1
SetEnvIfNoCase user-agent "^EyeNetIE" bad_bot=1
SetEnvIfNoCase user-agent "^FlashGet" bad_bot=1
SetEnvIfNoCase user-agent "^FrontPage" bad_bot=1
SetEnvIfNoCase user-agent "^GetRight" bad_bot=1
SetEnvIfNoCase user-agent "^GetWeb!" bad_bot=1
SetEnvIfNoCase user-agent "^Go-Ahead-Got-It" bad_bot=1
SetEnvIfNoCase user-agent "^Go!Zilla" bad_bot=1
SetEnvIfNoCase user-agent "^GrabNet" bad_bot=1
SetEnvIfNoCase user-agent "^Grafula" bad_bot=1
SetEnvIfNoCase user-agent "^Harvest" bad_bot=1
SetEnvIfNoCase user-agent "^HMView" bad_bot=1
SetEnvIfNoCase user-agent "^http://www.relevantnoise.com" bad_bot=1
SetEnvIfNoCase user-agent "^HTTrack" bad_bot=1
SetEnvIfNoCase user-agent "^ia_archiver" bad_bot=1
SetEnvIfNoCase user-agent "^Image\ Stripper" bad_bot=1
SetEnvIfNoCase user-agent "^Image\ Sucker" bad_bot=1
SetEnvIfNoCase user-agent "^Indy\ Library" bad_bot=1
SetEnvIfNoCase user-agent "^InterGET" bad_bot=1
SetEnvIfNoCase user-agent "^Internet\ Ninja" bad_bot=1
SetEnvIfNoCase user-agent "^JetCar" bad_bot=1
SetEnvIfNoCase user-agent "^JOC\ Web\ Spider" bad_bot=1
SetEnvIfNoCase user-agent "^larbin" bad_bot=1
SetEnvIfNoCase user-agent "^libwww-perl" bad_bot=1
SetEnvIfNoCase user-agent "^LinkWalker" bad_bot=1
SetEnvIfNoCase user-agent "^Mass\ Downloader" bad_bot=1
SetEnvIfNoCase user-agent "^Microsoft\ URL\ Control" bad_bot=1
SetEnvIfNoCase user-agent "^MIDown\ tool" bad_bot=1
SetEnvIfNoCase user-agent "^Mister\ PiX" bad_bot=1
SetEnvIfNoCase user-agent "^Morfeus Fucking Scanner" bad_bot=1
SetEnvIfNoCase user-agent "^Navroad" bad_bot=1
SetEnvIfNoCase user-agent "^NearSite" bad_bot=1
SetEnvIfNoCase user-agent "^Net\ Vampire" bad_bot=1
SetEnvIfNoCase user-agent "^NetAnts" bad_bot=1
SetEnvIfNoCase user-agent "^NetSpider" bad_bot=1
SetEnvIfNoCase user-agent "^NetZIP" bad_bot=1
SetEnvIfNoCase user-agent "^Nokia" bad_bot=1
SetEnvIfNoCase user-agent "^Nokia6230i" bad_bot=1
SetEnvIfNoCase user-agent "^NPBot" bad_bot=1
SetEnvIfNoCase user-agent "^Octopus" bad_bot=1
SetEnvIfNoCase user-agent "^Offline\ Explorer" bad_bot=1
SetEnvIfNoCase user-agent "^Offline\ Navigator" bad_bot=1
SetEnvIfNoCase user-agent "^page_verifier" bad_bot=1
SetEnvIfNoCase user-agent "^PageGrabber" bad_bot=1
SetEnvIfNoCase user-agent "^Papa\ Foto" bad_bot=1
SetEnvIfNoCase user-agent "^pavuk" bad_bot=1
SetEnvIfNoCase user-agent "^pcBrowser" bad_bot=1
SetEnvIfNoCase user-agent "^PHP\ version\ tracker" bad_bot=1
SetEnvIfNoCase user-agent "^Pingdom\ GIGRIB" bad_bot=1
SetEnvIfNoCase user-agent "^psbot" bad_bot=1
SetEnvIfNoCase user-agent "^RealDownload" bad_bot=1
SetEnvIfNoCase user-agent "^ReGet" bad_bot=1
SetEnvIfNoCase user-agent "^SBIder" bad_bot=1
SetEnvIfNoCase user-agent "^schibstedsokbot" bad_bot=1
SetEnvIfNoCase user-agent "^SiteSnagger" bad_bot=1
SetEnvIfNoCase user-agent "^SmartDownload" bad_bot=1
SetEnvIfNoCase user-agent "^SMBot" bad_bot=1
SetEnvIfNoCase user-agent "^sogou" bad_bot=1
SetEnvIfNoCase user-agent "^Sphere" bad_bot=1
SetEnvIfNoCase user-agent "^Strategic\ Board\ Bot" bad_bot=1
SetEnvIfNoCase user-agent "^studybot" bad_bot=1
SetEnvIfNoCase user-agent "^SuperBot" bad_bot=1
SetEnvIfNoCase user-agent "^SuperHTTP" bad_bot=1
SetEnvIfNoCase user-agent "^Surfbot" bad_bot=1
SetEnvIfNoCase user-agent "^SurveyBot" bad_bot=1
SetEnvIfNoCase user-agent "^tAkeOut" bad_bot=1
SetEnvIfNoCase user-agent "^Teleport\ Pro" bad_bot=1
SetEnvIfNoCase user-agent "^TheRarestParser" bad_bot=1
SetEnvIfNoCase user-agent "^TurnitinBot" bad_bot=1
SetEnvIfNoCase user-agent "^Twiceler" bad_bot=1
SetEnvIfNoCase user-agent "^User-Agent" bad_bot=1
SetEnvIfNoCase user-agent "^VCSoapClient" bad_bot=1
SetEnvIfNoCase user-agent "^VoidEYE" bad_bot=1
SetEnvIfNoCase user-agent "^Voila" bad_bot=1
SetEnvIfNoCase user-agent "^VoilaBot" bad_bot=1
SetEnvIfNoCase user-agent "^Voyager" bad_bot=1
SetEnvIfNoCase user-agent "^WasaBot" bad_bot=1
SetEnvIfNoCase user-agent "^Web\ Image\ Collector" bad_bot=1
SetEnvIfNoCase user-agent "^Web\ Sucker" bad_bot=1
SetEnvIfNoCase user-agent "^WebAlta\ Crawler/2.0" bad_bot=1
SetEnvIfNoCase user-agent "^WebAuto" bad_bot=1
SetEnvIfNoCase user-agent "^WebCopier" bad_bot=1
SetEnvIfNoCase user-agent "^WebFetch" bad_bot=1
SetEnvIfNoCase user-agent "^WebGo\ IS" bad_bot=1
SetEnvIfNoCase user-agent "^WebLeacher" bad_bot=1
SetEnvIfNoCase user-agent "^WebReaper" bad_bot=1
SetEnvIfNoCase user-agent "^WebSauger" bad_bot=1
SetEnvIfNoCase user-agent "^Website\ eXtractor" bad_bot=1
SetEnvIfNoCase user-agent "^Website\ Quester" bad_bot=1
SetEnvIfNoCase user-agent "^WebStripper" bad_bot=1
SetEnvIfNoCase user-agent "^WebWhacker" bad_bot=1
SetEnvIfNoCase user-agent "^WebZIP" bad_bot=1
SetEnvIfNoCase user-agent "^Wget" bad_bot=1
SetEnvIfNoCase user-agent "^Widow" bad_bot=1
SetEnvIfNoCase user-agent "^WWWeasel" bad_bot=1
SetEnvIfNoCase user-agent "^WWWOFFLE" bad_bot=1
SetEnvIfNoCase user-agent "^Xaldon\ WebSpider" bad_bot=1
SetEnvIfNoCase user-agent "^YodaoBot" bad_bot=1
SetEnvIfNoCase user-agent "^Zeus" bad_bot=1

<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</FilesMatch>
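The rules above boil down to a case-insensitive, start-anchored regular expression test on the User-Agent header, followed by an allow-unless-flagged decision. The Python sketch below models that logic with a few patterns copied from the list; the helper names is_bad_bot and access_allowed are hypothetical, and this is only an illustration of how the matching behaves, not a replacement for the Apache rules.

```python
import re

# A few patterns copied verbatim from the SetEnvIfNoCase rules above.
BAD_BOT_PATTERNS = [
    r"^Curl",
    r"^HTTrack",
    r"^libwww-perl",
    r"^Nokia",
    r"^Web\ Sucker",
    r"^Wget",
]

# SetEnvIfNoCase matches case-insensitively, hence re.IGNORECASE.
COMPILED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in BAD_BOT_PATTERNS]


def is_bad_bot(user_agent: str) -> bool:
    """True when any pattern matches, i.e. the rules would set bad_bot=1."""
    return any(p.search(user_agent) for p in COMPILED_PATTERNS)


def access_allowed(user_agent: str) -> bool:
    """Order Allow,Deny with "Allow from all" and "Deny from env=bad_bot":
    every request is allowed unless bad_bot was set."""
    return not is_bad_bot(user_agent)


if __name__ == "__main__":
    for ua in ("curl/7.68.0", "Nokia6230i/2.0", "Mozilla/5.0 (X11; Linux x86_64)"):
        print(f"{ua!r}: {'allowed' if access_allowed(ua) else 'denied'}")
```

Two consequences fall out of the ^ anchor and the case-insensitive match: "^Curl" also catches a lowercase "curl/..." agent, and "^Nokia" already covers "Nokia6230i", so the longer entry adds nothing. A bot that puts its name later in the string, after a browser-like prefix, is not caught at all, which is one reason to pair user-agent rules with IP blocking.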

All the IPs contained in our lookup database have been taken from our logs, firewall, web server, and FTP / SSH servers. These are IP addresses that have attempted some sort of intrusion or abuse.
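The layout of the lookup database is not described here, so the sketch below simply assumes it can be exported as a list of single addresses and CIDR ranges. It uses Python’s standard ipaddress module to check whether a visitor’s address falls in any of them, which is the same question a firewall rule or an .htaccess deny-by-IP rule answers. The entries are documentation-only example ranges, not real offenders, and is_blocked is a hypothetical name.

```python
import ipaddress

# Hypothetical entries in the style of a lookup database exported as text.
# All of them are documentation-only ranges (RFC 5737 and RFC 3849).
BLOCKLIST = [
    "192.0.2.15",        # a single address
    "198.51.100.0/24",   # a whole IPv4 range
    "2001:db8::/32",     # an IPv6 range
]

# A bare address becomes a /32 (or /128) network; strict=False also
# tolerates entries such as 192.0.2.15/24 that have host bits set.
BLOCKED_NETWORKS = [ipaddress.ip_network(entry, strict=False) for entry in BLOCKLIST]


def is_blocked(ip: str) -> bool:
    """Return True if ip lies inside any blocklisted address or range.
    Raises ValueError if ip is not a valid IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(ip)
    # Mixing IPv4 and IPv6 is safe here: `in` simply returns False.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for visitor in ("192.0.2.15", "198.51.100.42", "203.0.113.7", "2001:db8::1"):
        print(visitor, "blocked" if is_blocked(visitor) else "ok")
```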