
Blocking bad robots


allu62


Hi. Last month, I made a post concerning the drop in visitor numbers on my site to 1-2 per day during May and the first half of June, thinking that there were problems with the server or AWStats. I now guess that the statistics were correct and that there was a reason that has nothing to do with Heliohost. Anyway, visit numbers are becoming normal again.

But what could the reason have been? Searching the Internet, I found several articles saying that excessive site access by bad crawlers can drastically affect normal site traffic. And indeed, when taking a closer look at the Apache log files, there are some robots that download tons of megabytes from my site.

My questions:

  - Does Heliohost have any recommendations on which robots to block?

  - Does any experienced user have such recommendations? Perhaps a list of bad robots? Or maybe block all and just allow some known to be OK?

  - What should be blocked in robots.txt and what in .htaccess? (See the sketch after these questions.)

  - Should some of them be blocked using IP Blocking in cPanel?
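
From what I've read so far, the split seems to be roughly this. robots.txt is only a request that well-behaved crawlers choose to honor (the name "BadBot" here is a made-up placeholder):

    # robots.txt -- a polite request; compliant crawlers will stop on their own
    User-agent: BadBot
    Disallow: /

.htaccess, on the other hand, is enforced by Apache itself, so it also works against crawlers that ignore robots.txt. A minimal sketch, assuming Apache 2.4 syntax (again with "BadBot" and the IP address as placeholders):

    # Flag any request whose User-Agent contains "BadBot" (placeholder name)
    BrowserMatchNoCase "BadBot" bad_bot
    <RequireAll>
        Require all granted
        # Refuse requests flagged above
        Require not env bad_bot
        # An IP block (placeholder address) would also go here
        Require not ip 203.0.113.7
    </RequireAll>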

Hope to find some help, because I really have no knowledge/experience with these things.

Thanks.

 


Sounds like snake oil to me. I've never heard of bots causing an impact like that (unless the site is down due to being overloaded by them...).

 

The only time I really see bot blocking scripts used is on phishing and other illegal sites...and that's generally done to hide from automated anti-abuse services (ironically, implementing a block like this actually makes abuse easier to identify).


Trying to block certain IPs or certain bots just makes you look guilty of something. I wouldn't be surprised if legitimate crawlers like Google down-ranked you for doing suspicious stuff like that. Maybe link to the article you found and we can read it ourselves?


There are dozens of sites (SEO and others) recommending blocking what they call "bad robots", or even suggesting allowing only the "good ones"...

 

My AWStats from yesterday:

 - SemrushBot:  18,536+356 hits, 84.77 MB

 - AhrefsBot:  4,591+261 hits, 46.92 MB

 - Unknown robot identified by bot*:  2,908+147 hits, 37.03 MB

and similar for other days.

In comparison, Googlebot:  1,248+853 hits, …70 MB, and this only a few times a month...

Totals for this month: 29,808 pages (428.45 MB) of "not viewed" (robot) traffic vs. 630 pages (199.83 MB) of normal viewed traffic.

 

Should I really just let these crawlers carry on? And in doing so, isn't that a senseless "overload" of Tommy?
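
If I did decide to shut out the two worst offenders, a minimal robots.txt along these lines looks like the gentle first step, since both SemrushBot and AhrefsBot document that they obey robots.txt (the user-agent tokens are the ones from the AWStats report above):

    # robots.txt -- both of these crawlers state that they honor it
    User-agent: SemrushBot
    Disallow: /

    User-agent: AhrefsBot
    Disallow: /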


We have an account on Tommy that did 47 GB of traffic last month. That's more than 100 times as much traffic as you got. Here is your load graph for the last week:

[attachment: load graph for the last week]

You definitely don't need to worry about overloading Tommy.


