Last Updated on 26th February 2024
Scraper Bots & Spam Bots Create Havoc With a Website
Have you ever had a moment when you find a spike in your analytics data and think, “yes, now we’re rocking!”? Then on closer inspection you find the reason for the spike is spam. It happens. It sucks!
Spambots can play havoc with your analytics data. In some cases, spammers visit so regularly that they can influence decisions about your overall content and digital marketing strategy.
Not to mention they are a pain in the rear-end – of your website as well! They can overload your server and slow down your load times, increase your bounce rate and lower your ranking. Some, known as scraper bots, may even be trying to mine vital data using data scraping tools and programmes.
They can even clog up the crawling superhighway on your website so much that important bots such as Googlebot cannot crawl it properly, which could affect the crawl rate of your website.
Spambots and scraper bots need to be stopped. Fortunately, you can take some steps to prevent the pests from visiting your website pretty easily.
Spambot prevention tactics
There are several ways of blocking referrers. The most effective is with the .htaccess file in your root directory, although this is highly technical and requires some coding knowledge. You also need access to the .htaccess file in the first place, which is not always the case if you are working on a client site or in enterprise search, where there are many layers of governance in place.
Warning – changing rules on .htaccess can go horribly wrong.
Editing the .htaccess file can cause your server to behave erratically if you get the regex wrong (it can even take your site down). One wrong character can collapse your entire site. If you are not tech-minded, or do not feel confident about getting the code right, there are alternative methods that are much simpler and safer. The rules in an .htaccess file also run in the order they are listed, so the order matters and is very easy to get wrong.
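To give a sense of what a referrer-blocking rule looks like, here is a minimal sketch. It assumes an Apache server with mod_rewrite available and a host that allows .htaccess overrides. The semalt domain is used purely as an illustration because it is a referral spam name mentioned later in this post; swap in whatever referrers you actually see in your reports.

```apache
# Minimal sketch, not a drop-in rule set. Assumes Apache with mod_rewrite
# available and .htaccess overrides permitted by the host.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Match any request whose Referer header contains "semalt.com".
    # The dot is escaped so it matches a literal "."; [NC] ignores case.
    RewriteCond %{HTTP_REFERER} semalt\.com [NC]

    # Answer with 403 Forbidden; "-" means the URL itself is not rewritten.
    RewriteRule .* - [F]
</IfModule>
```

Because rules run in order, a block like this generally belongs above any catch-all rewrite rules your content management system has added, otherwise those rules may handle the request first. Keep a backup copy of the working file before you edit it so you can restore it quickly if something breaks.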
If you do have access to the .htaccess file, or to a developer who is confident with the technical aspects of .htaccess files, then you can use the following code to block some of the best-known spambots and scraper bots right at the front door to your site:
This list of bad bots and code was produced by Tab Studio, who update their list from time to time. It’s important to note, though, that some tools you may want to use for your SEO analysis and ongoing campaigns are listed here (for example, Semrush, a popular SEO analysis tool, is on the list). If you decide you want to exclude a bot from the list temporarily, add a hash (#) at the beginning of the line that contains the bot you wish to allow. You could also simply delete the bot from the list. Adding the hash is known as ‘commenting out the line’, i.e. you are telling the server to ignore that line of code.
Here is the list, kindly developed by Tab Studio.
#bad bots start #programmed by tab-studio.com public version 2017.12 #1 new rule every 500 entries RewriteCond %{HTTP_USER_AGENT} \ 12soso|\ 192\.comagent|\ 1noonbot|\ 1on1searchbot|\ 3de\_search2|\ 3d\_search|\ 3g\ bot|\ 3gse|\ 50\.nu|\ a1\ sitemap\ generator|\ a1\ website\ download|\ a6\-indexer|\ aasp|\ abachobot|\ abonti|\ abotemailsearch|\ aboundex|\ aboutusbot|\ accmonitor\ compliance\ server|\ accoon|\ achulkov\.net\ page\ walker|\ acme\.spider|\ acoonbot|\ acquia\-crawler|\ activetouristbot|\ ad\ muncher|\ adamm\ bot|\ adbeat\_bot|\ adminshop\.com|\ advanced\ email\ extractor|\ aesop\_com\_spiderman|\ aespider|\ af\ knowledge\ now\ verity\ spider|\ aggregator:vocus|\ ah\-ha\.com\ crawler|\ ahrefs|\ aibot|\ aidu|\ aihitbot|\ aipbot|\ aisiid|\ aitcsrobot/1\.1|\ ajsitemap|\ akamai\-sitesnapshot|\ alexawebsearchplatform|\ alexfdownload|\ alexibot|\ alkalinebot|\ all\ acronyms\ bot|\ alpha\ search\ agent|\ amerla\ search\ bot|\ amfibibot|\ ampmppc\.com|\ amznkassocbot|\ anemone|\ anonymous|\ anotherbot|\ answerbot|\ answerbus|\ answerchase\ prove|\ antbot|\ antibot|\ antisantyworm|\ antro\.net|\ aonde\-spider|\ aport|\ appengine\-google|\ appid\:\ s\~stremor\-crawler\-|\ aqua\_products|\ arabot|\ arachmo|\ arachnophilia|\ archive\.org\_bot|\ aria\ equalizer|\ arianna\.libero\.it|\ arikus\_spider|\ art\-online\.com|\ artavisbot|\ artera|\ asaha\ search\ engine\ turkey|\ ask|\ aspider|\ aspseek|\ asterias|\ astrofind|\ athenusbot|\ atlocalbot|\ atomic\_email\_hunter|\ attach|\ attrakt|\ attributor|\ augurfind|\ auresys|\ autobaron\ crawler|\ autoemailspider|\ autowebdir|\ avsearch\-|\ axfeedsbot|\ axonize\-bot|\ ayna|\ b2w|\ backdoorbot|\ backrub|\ backstreet\ browser|\ backweb|\ baidu|\ bandit|\ batchftp|\ baypup|\ bdfetch|\ becomebot|\ becomejpbot|\ beetlebot|\ bender|\ besserscheitern\-crawl|\ betabot|\ big\ brother|\ big\ data|\ bigado\.com|\ bigcliquebot|\ bigfoot|\ biglotron|\ bilbo|\ bilgibetabot|\ bilgibot|\ bintellibot|\ bitlybot|\ bitvouseragent|\ 
bizbot003|\ bizbot04|\ bizworks\ retriever|\ black\ hole|\ black\.hole|\ blackbird|\ blackmask\.net\ search\ engine|\ blackwidow|\ bladder\ fusion|\ blaiz\-bee|\ blexbot|\ blinkx|\ blitzbot|\ blog\ conversation\ project|\ blogmyway|\ blogpulselive|\ blogrefsbot|\ blogscope|\ blogslive|\ bloobybot|\ blowfish|\ blt|\ bnf\.fr\_bot|\ boaconstrictor|\ boardreader|\ boia\-scan\-agent|\ boia\.org|\ boitho|\ boi\_crawl\_00|\ bookmark\ buddy\ bookmark\ checker|\ bookmark\ search\ tool|\ bosug|\ bot\ apoena|\ botalot|\ botrighthere|\ botswana|\ bottybot|\ bpbot|\ braintime\_search|\ brokenlinkcheck\.com|\ browseremulator|\ browsermob|\ bruinbot|\ bsearchr&d|\ bspider|\ btbot|\ btsearch|\ bubing|\ buddy|\ buibui|\ buildcms\ crawler|\ builtbottough|\ bullseye|\ bumblebee|\ bunnyslippers|\ buscadorclarin|\ buscaplus\ robi|\ butterfly|\ buyhawaiibot|\ buzzbot|\ byindia|\ byspider|\ byteserver|\ bzbot|\ c\ r\ a\ w\ l\ 3\ r|\ cacheblaster|\ caddbot|\ cafi|\ camcrawler|\ camelstampede|\ canon\-webrecord|\ careerbot|\ cataguru|\ catchbot|\ cazoodle|\ ccbot|\ ccgcrawl|\ ccubee|\ cd\-preload|\ ce\-preload|\ cegbfeieh|\ cerberian\ drtrs|\ cert\ figleafbot|\ cfetch|\ cfnetwork|\ chameleon|\ charlotte|\ check&get|\ checkbot|\ checklinks|\ cheesebot|\ chemiede\-nodebot|\ cherrypicker|\ chilkat|\ chinaclaw|\ cipinetbot|\ cis455crawler|\ citeseerxbot|\ cizilla|\ clariabot|\ climate\ ark|\ climateark\ spider|\ clshttp|\ clushbot|\ coast\ scan\ engine|\ coast\ webmaster\ pro|\ coccoc|\ collapsarweb|\ collector|\ colocrossing|\ combine|\ connectsearch|\ conpilot|\ contentsmartz|\ contextad\ bot|\ contype|\ cookienet|\ coolbot|\ coolcheck|\ copernic|\ copier|\ copyrightcheck|\ core\-project|\ cosmos|\ covario\-ids|\ cowbot\-|\ cowdog\ bot|\ crabbybot|\ craftbot\@yahoo\.com|\ crawler\.kpricorn\.org|\ crawler43\.ejupiter\.com|\ crawler4j|\ crawler@|\ crawler\_for\_infomine|\ crawly|\ crawl\_application|\ creativecommons|\ crescent|\ cs\-crawler|\ cse\ html\ validator|\ cshttpclient|\ cuasarbot|\ 
culsearch|\ curl|\ custo|\ cvaulev|\ cyberdog|\ cybernavi\_webget|\ cyberpatrol\ sitecat\ webbot|\ cyberspyder|\ cydralspider|\ d1garabicengine|\ datacha0s|\ datafountains|\ dataparksearch|\ dataprovider\.com|\ datascape\ robot|\ dataspearspiderbot|\ dataspider|\ dattatec\.com|\ daumoa|\ dblbot|\ dcpbot|\ declumbot|\ deepindex|\ deepnet\ crawler|\ deeptrawl|\ dejan|\ del\.icio\.us\-thumbnails|\ deltascan|\ delvubot|\ der\ gro§e\ bildersauger|\ der\ große\ bildersauger|\ deusu|\ dfs\-fetch|\ diagem|\ diamond|\ dibot|\ didaxusbot|\ digext|\ digger|\ digi\-rssbot|\ digitalarchivesbot|\ digout4u|\ diibot|\ dillo|\ dir\_snatch\.exe|\ disco|\ distilled\-reputation\-monitor|\ djangotraineebot|\ dkimrepbot|\ dmoz\ downloader|\ docomo|\ dof\-verify|\ domaincrawler|\ domainscan|\ domainwatcher\ bot|\ dotbot|\ dotspotsbot|\ dow\ jones\ searchbot|\ download|\ doy|\ dragonfly|\ drip|\ drone|\ dtaagent|\ dtsearchspider|\ dumbot|\ dwaar|\ dxseeker|\ e\-societyrobot|\ eah|\ earth\ platform\ indexer|\ earth\ science\ educator\ \ robot|\ easydl|\ ebingbong|\ ec2linkfinder|\ ecairn\-grabber|\ ecatch|\ echoosebot|\ edisterbot|\ edugovsearch|\ egothor|\ eidetica\.com|\ eirgrabber|\ elblindo\ the\ blind\ bot|\ elisabot|\ ellerdalebot|\ email\ exractor|\ emailcollector|\ emailleach|\ emailsiphon|\ emailwolf|\ emeraldshield|\ empas\_robot|\ enabot|\ endeca|\ enigmabot|\ enswer\ neuro\ bot|\ enter\ user\-agent|\ entitycubebot|\ erocrawler|\ estylesearch|\ esyndicat\ bot|\ eurosoft\-bot|\ evaal|\ eventware|\ everest\-vulcan\ inc\.|\ exabot|\ exactsearch|\ exactseek|\ exooba|\ exploder|\ express\ webpictures|\ extractor|\ eyenetie|\ ez\-robot|\ ezooms|\ f\-bot\ test\ pilot|\ factbot|\ fairad\ client|\ falcon|\ fast\ data\ search\ document\ retriever|\ fast\ esp|\ fast\-search\-engine|\ fastbot\ crawler|\ fastbot\.de\ crawler|\ fatbot|\ favcollector|\ faviconizer|\ favorites\ sweeper|\ fdm|\ fdse\ robot|\ fedcontractorbot|\ fembot|\ fetch\ api\ request|\ fetch\_ici|\ fgcrawler|\ filangy|\ 
filehound|\ findanisp\.com\_isp\_finder|\ findlinks|\ findweb|\ firebat|\ firstgov\.gov\ search|\ flaming\ attackbot|\ flamingo\_searchengine|\ flashcapture|\ flashget|\ flickysearchbot|\ fluffy\ the\ spider|\ flunky|\ focused\_crawler|\ followsite|\ foobot|\ fooooo\_web\_video\_crawl|\ fopper|\ formulafinderbot|\ forschungsportal|\ francis|\ freewebmonitoring\ sitechecker|\ freshcrawler|\ freshdownload|\ freshlinks\.exe|\ friendfeedbot|\ frodo\.at|\ froggle|\ frontpage|\ froola\ bot|\ fr\_crawler|\ fu\-nbi|\ full\_breadth\_crawler|\ funnelback|\ furlbot|\ g10\-bot|\ gaisbot|\ galaxybot|\ gazz|\ gbplugin|\ generate\_infomine\_category\_classifiers|\ genevabot|\ geniebot|\ genieo|\ geomaxenginebot|\ geometabot|\ geonabot|\ geovisu|\ germcrawler\ |\ gethtmlcontents|\ getleft|\ getright|\ getsmart|\ geturl\.rexx|\ getweb!|\ giant|\ gigablastopensource|\ gigabot|\ girafabot|\ gleamebot|\ gnome\-vfs|\ go!zilla|\ go\-ahead\-got\-it|\ go\-http\-client|\ goforit\.com|\ goforitbot|\ gold\ crawler|\ goldfire\ server|\ golem|\ goodjelly|\ gordon\-college\-google\-mini|\ goroam|\ goseebot|\ gotit|\ govbot|\ gpu\ p2p\ crawler|\ grabber|\ grabnet|\ grafula|\ grapefx|\ grapeshot|\ grbot|\ greenyogi|\ gromit|\ grub|\ gsa|\ gslfbot|\ gulliver|\ gulperbot|\ gurujibot|\ gvc\ business\ crawler|\ gvc\ crawler|\ gvc\ search\ bot|\ gvc\ web\ crawler|\ gvc\ weblink\ crawler|\ gvc\ world\ links|\ gvcbot\.com|\ happyfunbot|\ harvest|\ hatena\ antenna|\ hawler|\ hcat|\ hclsreport\-crawler|\ hd\ nutch\ agent|\ header\_test\_client|\ healia\ [NC,OR] #500 new rule RewriteCond %{HTTP_USER_AGENT} \ helix|\ here\ will\ be\ link\ to\ crawler\ site|\ heritrix|\ hiscan|\ hisoftware\ accmonitor\ server|\ hisoftware\ accverify|\ hitcrawler|\ hivabot|\ hloader|\ hmsebot|\ hmview|\ hoge|\ holmes|\ homepagesearch|\ hooblybot\-image|\ hoowwwer|\ hostcrawler|\ hsft\ \\-\ link\ scanner|\ hsft\ \\-\ lvu\ scanner|\ hslide|\ ht://check|\ htdig|\ html\ link\ validator|\ htmlparser|\ httplib|\ httrack|\ 
huaweisymantecspider|\ hul\-wax|\ humanlinks|\ hyperestraier|\ hyperix|\ iaarchiver\-|\ ia\_archiver|\ ibuena|\ icab|\ icds\-ingestion|\ ichiro|\ icopyright\ conductor|\ ieautodiscovery|\ iecheck|\ ihwebchecker|\ iiitbot|\ iim\_405|\ ilsebot|\ iltrovatore|\ image\ stripper|\ image\ sucker|\ image\-fetcher|\ imagebot|\ imagefortress|\ imageshereimagesthereimageseverywhere|\ imagevisu|\ imds\_monitor|\ imo\-google\-robot\-intelink|\ inagist\.com\ url\ crawler|\ indexer|\ industry\ cortex\ webcrawler|\ indy\ library|\ indylabs\_marius|\ inelabot|\ inet32\ ctrl|\ inetbot|\ info\ seeker|\ infolink|\ infomine|\ infonavirobot|\ informant|\ infoseek\ sidewinder|\ infotekies|\ infousabot|\ ingrid|\ inktomi|\ insightscollector|\ insightsworksbot|\ inspirebot|\ insumascout|\ intelix|\ intelliseek|\ interget|\ internet\ ninja|\ internet\ radio\ crawler|\ internetlinkagent|\ interseek|\ ioi|\ ip\-web\-crawler\.com|\ ipadd\ bot|\ ipselonbot|\ ips\-agent|\ iria|\ irlbot|\ iron33|\ isara|\ isearch|\ isilox|\ istellabot|\ its\-learning\ crawler|\ iu\_csci\_b659\_class\_crawler|\ ivia|\ jadynave|\ java|\ jbot|\ jemmathetourist|\ jennybot|\ jetbot|\ jetbrains\ omea\ pro|\ jetcar|\ jim|\ jobo|\ jobspider\_ba|\ joc|\ joedog|\ joyscapebot|\ jspyda|\ junut\ bot|\ justview|\ jyxobot|\ k\.s\.bot|\ kakclebot|\ kalooga|\ katatudo\-spider|\ kbeta1|\ keepni\ web\ site\ monitor|\ kenjin\.spider|\ keybot\ translation\-search\-machine|\ keywenbot|\ keyword\ density|\ keyword\.density|\ kinjabot|\ kitenga\-crawler\-bot|\ kiwistatus|\ kmbot\-|\ kmccrew\ bot\ search|\ knight|\ knowitall|\ knowledge\ engine|\ knowledge\.com|\ koepabot|\ koninklijke|\ korniki|\ krowler|\ ksbot|\ kuloko\-bot|\ kulturarw3|\ kummhttp|\ kurzor|\ kyluka\ crawl|\ l\.webis|\ labhoo|\ labourunions411|\ lachesis|\ lament|\ lamerexterminator|\ lapozzbot|\ larbin|\ lbot|\ leaptag|\ leechftp|\ leechget|\ letscrawl\.com|\ lexibot|\ lexxebot|\ lftp|\ libcrawl|\ libiviacore|\ libw|\ likse|\ linguee\ bot|\ link\ checker|\ link\ 
validator|\ linkalarm|\ linkbot|\ linkcheck\ by\ siteimprove\.com|\ linkcheck\ scanner|\ linkchecker|\ linkdex\.com|\ linkextractorpro|\ linklint|\ linklooker|\ linkman|\ links\ sql|\ linkscan|\ linksmanager\.com\_bot|\ linksweeper|\ linkwalker|\ link\_checker|\ litefinder|\ litlrbot|\ little\ grabber\ at\ skanktale\.com|\ livelapbot|\ lm\ harvester|\ lmqueuebot|\ lnspiderguy|\ loadtimebot|\ localcombot|\ locust|\ lolongbot|\ lookbot|\ lsearch|\ lssbot|\ lt\ scotland\ checklink|\ ltx71.com|\ lwp|\ lycos\_spider|\ lydia\ entity\ spider|\ lynnbot|\ lytranslate|\ mag\-net|\ magnet|\ magpie\-crawler|\ magus\ bot|\ mail\.ru|\ mainseek\_bot|\ mammoth|\ map\ robot|\ markwatch|\ masagool|\ masidani\_bot\_|\ mass\ downloader|\ mata\ hari|\ mata\.hari|\ matentzn\ at\ cs\ dot\ man\ dot\ ac\ dot\ uk|\ maxamine\.com\-\-robot|\ maxamine\.com\-robot|\ maxomobot|\ mcbot|\ medrabbit|\ megite|\ memacbot|\ memo|\ mendeleybot|\ mercator\-|\ mercuryboard\_user\_agent\_sql\_injection\.nasl|\ metacarta|\ metaeuro\ web\ search|\ metager2|\ metagloss|\ metal\ crawler|\ metaquerier|\ metaspider|\ metaspinner|\ metauri|\ mfcrawler|\ mfhttpscan|\ midown\ tool|\ miixpc|\ mini\-robot|\ minibot|\ minirank|\ mirror|\ missigua\ locator|\ mister\ pix|\ mister\.pix|\ miva|\ mj12bot|\ mnogosearch|\ moduna\.com|\ mod\_accessibility|\ moget|\ mojeekbot|\ monkeycrawl|\ moses|\ mowserbot|\ mqbot|\ mse360|\ msindianwebcrawl|\ msmobot|\ msnptc|\ msrbot|\ mt\-soft|\ multitext|\ my\-heritrix\-crawler|\ myapp|\ mycompanybot|\ mycrawler|\ myengines\-us\-bot|\ myfamilybot|\ myra|\ my\_little\_searchengine\_project|\ nabot|\ najdi\.si|\ nambu|\ nameprotect|\ nasa\ search|\ natchcvs|\ natweb\-bad\-link\-mailer|\ naver|\ navroad|\ nearsite|\ nec\-meshexplorer|\ neosciocrawler|\ nerdbynature\.bot|\ nerdybot|\ nerima\-crawl-|\ nessus|\ nestreader|\ net\ vampire|\ net::trackback|\ netants|\ netcarta\ cyberpilot\ pro|\ netcraft|\ netexperts|\ netid\.com\ bot|\ netmechanic|\ netprospector|\ netresearchserver|\ 
netseer|\ netshift=|\ netsongbot|\ netsparker|\ netspider|\ netsrcherp|\ netzip|\ newmedhunt|\ news\ bot|\ newsgatherer|\ newsgroupreporter|\ newstrovebot|\ news\_search\_app|\ nextgensearchbot|\ nextthing\.org|\ nicebot|\ nicerspro|\ niki\-bot|\ nimblecrawler|\ nimbus\-1|\ ninetowns|\ ninja|\ njuicebot|\ nlese|\ nogate|\ norbert\ the\ spider|\ noteworthybot|\ npbot|\ nrcan\ intranet\ crawler|\ nsdl\_search\_bot|\ nuggetize\.com\ bot|\ nusearch\ spider|\ nutch|\ nu\_tch|\ nwspider|\ nymesis|\ nys\-crawler|\ objectssearch|\ obot|\ obvius\ external\ linkcheck|\ ocelli|\ octopus|\ odp\ entries\ t\_st|\ oegp|\ offline\ navigator|\ offline\.explorer|\ ogspider|\ omiexplorer\_bot|\ omniexplorer|\ omnifind|\ omniweb|\ onetszukaj|\ online\ link\ validator|\ oozbot|\ openbot|\ openfind|\ openintelligencedata|\ openisearch|\ openlink\ virtuoso\ rdf\ crawler|\ opensearchserver\_bot|\ opidig|\ optidiscover|\ oracle\ secure\ enterprise\ search|\ oracle\ ultra\ search|\ orangebot|\ orisbot|\ ornl\_crawler|\ ornl\_mercury|\ osis\-project\.jp|\ oso|\ outfoxbot|\ outfoxmelonbot|\ owler\-bot|\ owsbot|\ ozelot|\ p3p\ client|\ pagebiteshyperbot|\ pagebull|\ pagedown|\ pagefetcher|\ pagegrabber|\ pagepeeker|\ pagerank\ monitor|\ page\_verifier|\ pamsnbot\.htm|\ panopy\ bot|\ panscient\.com|\ pansophica|\ papa\ foto|\ paperlibot|\ parasite|\ parsijoo|\ pathtraq|\ pattern|\ patwebbot|\ pavuk|\ paxleframework|\ pbbot|\ pcbrowser|\ pcore\-http|\ pd\-crawler|\ penthesila|\ perform\_crawl|\ perman|\ personal\ ultimate\ crawler|\ php\ version\ tracker|\ phpcrawl|\ phpdig|\ picosearch|\ pieno\ robot|\ pipbot|\ pipeliner|\ pita|\ pixfinder|\ piyushbot|\ planetwork\ bot\ search|\ plucker|\ plukkie|\ plumtree|\ pockey|\ pocohttp|\ pogodak\.ba|\ pogodak\.co\.yu|\ poirot|\ polybot|\ pompos|\ poodle\ predictor|\ popscreenbot|\ postpost|\ privacyfinder|\ projectwf\-java\-test\-crawler|\ propowerbot|\ prowebwalker|\ proxem\ websearch|\ proximic|\ proxy\ crawler|\ psbot|\ pss\-bot|\ psycheclone|\ 
pub\-crawler|\ pucl|\ pulsebot|\ pump|\ pwebot|\ python|\ qeavis\ agent|\ qfkbot|\ qualidade|\ qualidator\.com\ bot|\ quepasacreep|\ queryn\ metasearch|\ queryn\.metasearch|\ quest\.durato|\ quintura\-crw|\ qunarbot|\ qwantify|\ qweerybot|\ qweery\_robot\.txt\_checkbot|\ r2ibot|\ r6\_commentreader|\ r6\_feedfetcher|\ r6\_votereader|\ rabot|\ radian6|\ radiation\ retriever|\ rampybot|\ rankivabot|\ rankur|\ rational\ sitecheck|\ rcstartbot|\ realdownload|\ reaper|\ rebi\-shoveler|\ recorder|\ redbot|\ redcarpet|\ reget|\ repomonkey|\ research\ robot|\ riddler|\ riight|\ risenetbot|\ riverglassscanner\ [NC,OR] #1000 new rule RewriteCond %{HTTP_USER_AGENT} \ robopal|\ robosourcer|\ robotek|\ robozilla|\ roger|\ rome\ client|\ rondello|\ rotondo|\ roverbot|\ rpt\-httpclient|\ rtgibot|\ rufusbot|\ runnk\ online\ rss\ reader|\ runnk\ rss\ aggregator|\ s2bot|\ safaribookmarkchecker|\ safednsbot|\ safetynet\ robot|\ saladspoon|\ sapienti|\ sapphireweb|\ sbider|\ sbl\-bot|\ scfcrawler|\ scich|\ scientificcommons\.org|\ scollspider|\ scooperbot|\ scooter|\ scoutjet|\ scrapebox|\ scrapy|\ scrawltest|\ screaming\ frog|\ scrubby|\ scspider|\ scumbot|\ search\ publisher|\ search\ x\-bot|\ search\-channel|\ search\-engine\-studio|\ search\.kumkie\.com|\ search\.updated\.com|\ search\.usgs\.gov|\ searcharoo\.net|\ searchblox|\ searchbot|\ searchengine|\ searchhippo\.com|\ searchit\-bot|\ searchmarking|\ searchmarks|\ searchmee!|\ searchmee\_v|\ searchmining|\ searchnowbot|\ searchpreview|\ searchspider\.com|\ searqubot|\ seb\ spider|\ seekbot|\ seeker\.lookseek\.com|\ seeqbot|\ seeqpod\-vertical\-crawler|\ selflinkchecker|\ semager|\ semanticdiscovery|\ semantifire|\ semisearch|\ semrushbot|\ seoengworldbot|\ seokicks|\ seznambot|\ shablastbot|\ shadowwebanalyzer|\ shareaza|\ shelob|\ sherlock|\ shim\-crawler|\ shopsalad|\ shopwiki|\ showlinks|\ showyoubot|\ siclab|\ silk|\ simplepie|\ siphon|\ sitebot|\ sitecheck|\ sitefinder|\ siteguardbot|\ siteorbiter|\ sitesnagger|\ 
sitesucker|\ sitesweeper|\ sitexpert|\ skimbot|\ skimwordsbot|\ skreemrbot|\ skywalker|\ sleipnir|\ slow\-crawler|\ slysearch|\ smart\-crawler|\ smartdownload|\ smarte\ bot|\ smartwit\.com|\ snake|\ snap\.com\ beta\ crawler|\ snapbot|\ snappreviewbot|\ snappy|\ snookit|\ snooper|\ snoopy|\ societyrobot|\ socscibot|\ soft411\ directory|\ sogou|\ sohu\ agent|\ sohu\-search|\ sokitomi\ crawl|\ solbot|\ sondeur|\ sootle|\ sosospider|\ space\ bison|\ space\ fung|\ spacebison|\ spankbot|\ spanner|\ spatineo\ monitor\ controller|\ spatineo\ serval\ controller|\ spatineo\ serval\ getmapbot|\ special\_archiver|\ speedy|\ sphere\ scout|\ sphider|\ spider\.terranautic\.net|\ spiderengine|\ spiderku|\ spiderman|\ spinn3r|\ spinne|\ sportcrew\-bot|\ sproose|\ spyder3\.microsys\.com|\ sq\ webscanner|\ sqlmap|\ squid\-prefetch|\ squidclamav\_redirector|\ sqworm|\ srevbot|\ sslbot|\ ssm\ agent|\ stackrambler|\ stardownloader|\ statbot|\ statcrawler|\ statedept\-crawler|\ steeler|\ stegmann\-bot|\ stero|\ stripper|\ stumbler|\ suchclip|\ sucker|\ sumeetbot|\ sumitbot|\ summizebot|\ summizefeedreader|\ sunrise\ xp|\ superbot|\ superhttp|\ superlumin\ downloader|\ superpagesbot|\ supremesearch\.net|\ supybot|\ surdotlybot|\ surf|\ surveybot|\ suzuran|\ swebot|\ swish\-e|\ sygolbot|\ synapticwalker|\ syntryx\ ant\ scout\ chassis\ pheromone|\ systemsearch\-robot|\ szukacz|\ s\~stremor\-crawler|\ t\-h\-u\-n\-d\-e\-r\-s\-t\-o\-n\-e|\ tailrank|\ takeout|\ talkro\ web\-shot|\ tamu\_crawler|\ tapuzbot|\ tarantula|\ targetblaster\.com|\ targetyournews\.com\ bot|\ tausdatabot|\ taxinomiabot|\ teamsoft\ wininet\ component|\ tecomi\ bot|\ teezirbot|\ teleport|\ telesoft|\ teradex\ mapper|\ teragram\_crawler|\ terrawizbot|\ testbot|\ testing\ of\ bot|\ textbot|\ thatrobotsite\.com|\ the\ dyslexalizer|\ the\ intraformant|\ the\.intraformant|\ thenomad|\ theophrastus|\ theusefulbot|\ thumbbot|\ thumbnail\.cz\ robot|\ thumbshots\-de\-bot|\ tigerbot|\ tighttwatbot|\ tineye|\ titan|\ 
to\-dress\_ru\_bot\_|\ to\-night\-bot|\ tocrawl|\ topicalizer|\ topicblogs|\ toplistbot|\ topserver\ php|\ topyx\-crawler|\ touche|\ tourlentascanner|\ tpsystem|\ traazi|\ transgenikbot|\ travel\-search|\ travelbot|\ travellazerbot|\ treezy|\ trendiction|\ trex|\ tridentspider|\ trovator|\ true\_robot|\ tscholarsbot|\ tsm\ translation\-search\-machine|\ tswebbot|\ tulipchain|\ turingos|\ turnitinbot|\ tutorgigbot|\ tweetedtimes\ bot|\ tweetmemebot|\ twengabot|\ twice|\ twikle|\ twinuffbot|\ twisted\ pagegetter|\ twitturls|\ twitturly|\ tygobot|\ tygoprowler|\ typhoeus|\ u\.s\.\ government\ printing\ office|\ uberbot|\ ucb\-nutch|\ udmsearch|\ ufam\-crawler\-|\ ultraseek|\ unchaos|\ unisterbot|\ unidentified|\ unitek\ uniengine|\ universalsearch|\ unwindfetchor|\ uoftdb\_experiment|\ updated|\ url\ control|\ url\-checker|\ urlappendbot|\ urlblaze|\ urlchecker|\ urlck|\ urldispatcher|\ urlspiderpro|\ urly\ warning|\ urly\.warning|\ url\_gather|\ usaf\ afkn\ k2spider|\ usasearch|\ uss\-cosmix|\ usyd\-nlp\-spider|\ vacobot|\ vacuum|\ vadixbot|\ vagabondo|\ validator|\ valkyrie|\ vbseo|\ vci\ webviewer\ vci\ webviewer\ win32|\ verbstarbot|\ vericitecrawler|\ verifactrola|\ verity\-url\-gateway|\ vermut|\ versus\ crawler|\ versus\.integis\.ch|\ viasarchivinginformation\.html|\ vipr|\ virus\-detector|\ virus\_detector|\ visbot|\ vishal\ for\ clia|\ visweb|\ vital\ search'n\ urchin|\ vlad|\ vlsearch|\ voilabot|\ vmbot|\ vocusbot|\ voideye|\ voil|\ vortex|\ voyager|\ vspider|\ w3c\-webcon|\ w3c\_unicorn|\ w3search|\ wacbot|\ wanadoo|\ wastrix|\ water\ conserve\ portal|\ water\ conserve\ spider|\ watzbot|\ wauuu|\ wavefire|\ waypath|\ wazzup|\ wbdbot|\ web\ ceo\ online\ robot|\ web\ crawler|\ web\ downloader|\ web\ image\ collector|\ web\ link\ validator|\ web\ magnet|\ web\ site\ downloader|\ web\ sucker|\ web\-agent|\ web\-sniffer|\ web\.image\.collector|\ webaltbot|\ webauto|\ webbot|\ webbul\-bot|\ webcapture|\ webcheck|\ webclipping\.com|\ webcollage|\ webcopier|\ 
webcopy|\ webcorp|\ webcrawl\.net|\ webcrawler|\ webdatacentrebot|\ webdownloader\ for\ x|\ webdup|\ webemailextrac|\ webenhancer|\ webfetch|\ webgather|\ webgo\ is|\ webgobbler|\ webimages|\ webinator\-search2|\ webinator\-wbi|\ webindex|\ weblayers|\ webleacher|\ weblexbot|\ weblinker|\ weblyzard|\ webmastercoffee|\ webmasterworld\ extractor|\ webmasterworldforumbot|\ webminer|\ webmoose|\ webot|\ webpix|\ webreaper|\ webripper|\ websauger|\ webscan|\ websearchbench|\ website|\ webspear|\ websphinx|\ webspider|\ webster|\ webstripper|\ webtrafficexpress|\ webtrends\ link\ analyzer|\ webvac|\ webwalk|\ webwasher|\ webwatch|\ webwhacker|\ webxm|\ webzip|\ weddings\.info|\ wenbin|\ wep\ search|\ wepa|\ werelatebot|\ wget|\ whacker|\ whirlpool\ web\ engine|\ whowhere\ robot|\ widow|\ wikiabot|\ wikio|\ wikiwix\-bot\-|\ winhttp|\ wire|\ wisebot|\ wisenutbot|\ wish\-la|\ wish\-project|\ wisponbot|\ wmcai\-robot|\ wminer|\ wmsbot|\ woriobot|\ worldshop|\ worqmada|\ wotbox|\ wume\_crawler|\ www\ collector|\ www\-collector\-e|\ www\-mechanize|\ wwwoffle|\ wwwrobot|\ wwwster|\ wwwwanderer|\ wwwxref|\ wysigot|\ x\-clawler|\ x\-crawler|\ xaldon|\ xenu|\ xerka\ metabot|\ xerka\ webbot|\ xget|\ xirq|\ xmarksfetch|\ xqrobot|\ y!j|\ yacy\.net|\ yacybot|\ yanga\ worldsearch\ bot|\ yarienavoir\.net|\ yasaklibot|\ yats\ crawler|\ ybot|\ yebolbot|\ yellowjacket|\ yeti|\ yolinkbot|\ yooglifetchagent|\ yoono|\ yottacars\_bot|\ yourls|\ z\-add\ link\ checker|\ zagrebin|\ zao|\ zedzo\.validate|\ zermelo|\ zeus|\ zibber\-v|\ zimeno|\ zing-bottabot|\ zipppbot|\ zongbot|\ zoomspider|\ zotag\ search|\ zsebot|\ zuibot|\ zyborg|\ zyte\ [NC] RewriteRule .* - [F] #bad bots end
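The full list is long, so the overall shape of the rule set is easy to lose. The sketch below shows the same structure cut down to four user agents taken from the list above. The one-condition-per-line layout is an assumption made for readability, not Tab Studio's original formatting, and it assumes mod_rewrite is enabled on your server.

```apache
# Minimal sketch of the same structure as the full list above, cut down to
# four user agents taken from that list. The one-condition-per-line layout
# is an assumption made for readability.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Each condition tests the User-Agent header. [NC] ignores case and
    # [OR] chains the condition to the next one.
    RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
    # A space inside a pattern is escaped with a backslash.
    RewriteCond %{HTTP_USER_AGENT} web\ sucker [NC,OR]
    # Commented out with a leading hash, so this crawler is allowed through:
    # RewriteCond %{HTTP_USER_AGENT} semrushbot [NC,OR]
    # The final condition carries no [OR].
    RewriteCond %{HTTP_USER_AGENT} mj12bot [NC]

    # If any condition matched, answer with 403 Forbidden.
    RewriteRule .* - [F]
</IfModule>
```

With one bot per line, allowing a tool back in is a matter of adding or removing a single hash. If you comment out the final condition, make sure the last remaining active condition does not end in [OR].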
WordPress plugins
The easiest way to prevent spambots accessing your site is by installing the WP-Ban plugin. This won’t fix historical data, but it will protect you against attempts to crawl your site in the future.
The only problem with plugins, however, is that they have a shelf life. The crafty merchants who program spambots keep finding ways around the rules that plugins rely on, so a plugin that is not kept up to date will gradually let more bots through.
Google Analytics filters
In addition to the WP plugins, or whatever the alternative is for the content management system you are using, you also have the option to apply filters to your analytics data.
The downside is that your data will no longer register legitimate visitors from the countries you filter out.
For example, many spambots originate from countries like Russia, India and Malaysia, and show referrals under names such as semalt. If you do not attract, or do not want to attract, traffic from these countries, activate your spam filters. This won’t prevent pesky spambots from visiting, and it won’t reduce the load on your server, but at least the bots won’t mess up your data, which goes part way to resolving the overall issue.
Data analytics is an important part of your digital marketing strategy and helps you identify which tactics are working and which are not. When spambots mess with the data, it is difficult to determine which tactics are effective.
Essentially, spambots can waste your time and money and have a detrimental effect on your search rankings. Unless you take measures to keep them out, they could cause your SEO and marketing teams a lot of headaches.
Exclude Hits from Bots in Google Analytics View Settings
Another way to stop spam bots from messing with your Google Analytics measurement is to set a further exclusion in Google Analytics itself. This one is available at ‘view’ level, a view being a subset of your property. Go to your view settings and you will see an item called ‘Bot Filtering’.
The option is labelled “exclude all hits from known bots and spiders”. Check this box in your view settings.
Hopefully you’ve now found a few more ways to block bad bots and scraper bots from messing up your Google Analytics and other analytics measurements. We’re always looking for ways to improve on our posts so if you have any suggestions please let us know in the comments below.