How can I prevent spam bots from accessing a site?

Answers written by Antoine
Last updated: 2018-08-27 17:19:55
Question

How do I prevent spam bots from accessing my site?

Answer

It is useful to block certain bots from accessing your site. Many bots exist whose purpose is to use your site's resources without your knowledge. Between rudebots, crawlbots, mailbots, exploitbots, spoofbots, formbots, spambots and the like, the list of bots to block is long.

To block these bots, you can use an .htaccess file placed at the root of your site. The rules below rely on Apache's mod_rewrite module.

These bots are identified by their user agent: a character string that the visitor's web browser, or the bot, sends to the server hosting your site. Because the client chooses this string itself, the filtering only stops bots that announce themselves honestly.
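
As a minimal sketch of the mechanism, the rule below refuses any request whose user agent starts with a given name. ExampleBot is a hypothetical placeholder rather than a real crawler, and the sketch assumes an Apache server with mod_rewrite enabled.

```apache
# Hypothetical example: "ExampleBot" is a placeholder name, not a real crawler.
RewriteEngine On
# %{HTTP_USER_AGENT} holds the User-Agent header sent with the request.
# ^ anchors the pattern at the start of the string; [NC] ignores case.
RewriteCond %{HTTP_USER_AGENT} ^ExampleBot [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

The two lists in this answer follow the same pattern, with more conditions placed in front of the final RewriteRule.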

Two options are available. The first is to block bad bots by listing them, keeping in mind that the list is long and constantly changing.

Here is a non-exhaustive list:

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^alexibot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^anarchie [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AsiaNetBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^aesop_com_spiderman [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^autohttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^backweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^bandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^batchftp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^bew [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^bigfoot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^black.?hole [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^botalot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^buddy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^builtbottough [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^cheesebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^cherry.?picker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^copyrightcheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^cosmos [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^da [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Deweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^devsoft's\ http\ component [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^diibot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^disco [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^dittospyder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^dragonfly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Digimarc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Digger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^digout4uagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^dloader\(NaverRobot\) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^drip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/\d\.\d+ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ebingbong [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Educate\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EO\ Browse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^erocrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FEZhead [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Filehound [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlickBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Flunky [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Franklin\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Full\ Web\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Gozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTML\ Works [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^IBM_Planetwide [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ilsebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^imagefetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^IncyWincy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Industry\ Program [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Explore\ 5\.x [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^leech [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LINKS\ ARoMATIZED [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^linkscan [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^linkwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^magnet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^mag-net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^markwatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^mata.?hari [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^memo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^miixpc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Missauga\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^mister\ pix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Monster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^netattache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Netcraft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Netmechanic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^nextgensearchbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcbrowser|php.?version.?tracker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pockey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PersonaPilot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Plucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Production\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Program\ Shareware [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^propowerbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^prowebwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^rma [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Rover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Rsync [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ScoutAbout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^semanticdiscovery [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^snagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SpiderBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^suzuran [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^szukacz [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tarspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Templeton [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^test [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tighttwatbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^thenomad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^urly.?warning [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^vayala [OR]
RewriteCond %{HTTP_USER_AGENT} ^vci [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^w3mir [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^web.by.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wwwoffle [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^www\.pl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [NC]
RewriteRule ^.* - [F,L] 
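
When the list grows, the same logic can probably be written more compactly by grouping several names in one pattern. The sketch below is an assumption about how you might shorten the file, not a version tested against the full list; it reuses four names taken from that list.

```apache
RewriteEngine On
# One condition with alternation replaces four conditions chained with [OR].
# A space inside a name would have to be escaped as "\ ", as in the list above.
RewriteCond %{HTTP_USER_AGENT} ^(HTTrack|WebCopier|WebZIP|EmailSiphon) [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Note that ^ only matches at the start of the string, so a bot that announces itself later in its user agent, for example after "Mozilla/5.0 (compatible; ", is not caught by these patterns.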

The second option is the reverse: allow only known, recognized bots and user agents. Here is a list of user agents you can allow:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} !^.*AOL.*       [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Mozilla.*   [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Opera.*     [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Msie.*      [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Firefox.*   [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Netscape.*  [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Safari.*    [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Google.*    [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Slurp.*     [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Yahoo.*     [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*MMCrawler.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*msnbot.*    [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*SandCrawl.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*MSRBOT.*    [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Teoma.*     [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*Jeeves.*    [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*inktomi.*   [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*libwww.*    [NC]
RewriteRule .* - [F]
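
The conditions above carry no [OR] flag, so Apache combines them with a logical AND: the request is refused only when the user agent matches none of the listed names. Under that reading, a shorter equivalent could look like the sketch below, which keeps just a few names from the list for brevity.

```apache
RewriteEngine On
# "!" negates the match: the condition is true when none of the names
# appears anywhere in the user agent (no ^ anchor, so position is free).
RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera|Google|Slurp|msnbot) [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Since nearly every browser, and many bots, include "Mozilla" in their user agent, this allow list mostly stops clients that make no attempt to look like a browser.
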
Answer

If you are looking for information about the bots you find in your HTTP logs:

  • http://www.botreports.com/
  • http://www.user-agents.org/
  • https://useragentapi.com/
  • https://github.com/monperrus/crawler-user-agents/blob/master/crawler-user-agents.json