-
bots
As mentioned in another thread, I'm trying to get my head around bot activity on a site. I'm trying to determine whether there is a minimum level of bot activity a site needs before it sees any benefit from it.
For instance, if you notice that a particular bot crawls your site 20,000 times a month, is that something to be happy about or disappointed about, or does it simply not matter how often bots crawl?
The following are the monthly crawl counts for the three main bots that have crawled my site since January of this year. It's interesting, to me at least, that MSNBot and Inktomi Slurp have increased their presence on my site greatly since January. Again, I have no idea if that's good, bad or simply doesn't matter. Why they suddenly increased their activity 6-fold is a mystery to me too.
Jan
Googlebot 88,984
MSNBot 10,741
Inktomi Slurp 10,704
Feb
Googlebot 79,199
MSNBot 27,665
Unknown robot (identified by 'crawl') 10,713
Inktomi Slurp 7,015
March
Googlebot 71,469
MSNBot 37,310
Inktomi Slurp 30,735
April
Inktomi Slurp 62,704
Googlebot 61,932
MSNBot 26,061
May
MSNBot 116,865
Googlebot 101,932
Inktomi Slurp 43,694
June
MSNBot 127,779
Googlebot 83,272
Inktomi Slurp 35,743
July
MSNBot 104,158
Googlebot 81,865
Inktomi Slurp 42,747
As the tables show, Googlebot's numbers have remained decent (decent relative to the other bots crawling my site, but who knows relative to other sites), while MSNBot has increased its activity almost 10-fold and Inktomi Slurp roughly 4-fold (about 6-fold at its April peak).
Anyone's comments on bot activity would be appreciated.
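To put numbers on the increase, here is a minimal Python sketch that works out the growth factors from the monthly counts above. The counts are copied straight from the tables; the function name `growth` and the choice of January as the baseline are just assumptions for illustration.

```python
# Monthly crawl counts copied from the tables above (January to July).
crawls = {
    "Googlebot": [88984, 79199, 71469, 61932, 101932, 83272, 81865],
    "MSNBot": [10741, 27665, 37310, 26061, 116865, 127779, 104158],
    "Inktomi Slurp": [10704, 7015, 30735, 62704, 43694, 35743, 42747],
}


def growth(counts):
    """Return (January-to-July ratio, January-to-peak ratio) for one bot."""
    first = counts[0]
    return counts[-1] / first, max(counts) / first


for bot, counts in crawls.items():
    jan_to_jul, jan_to_peak = growth(counts)
    print(f"{bot}: x{jan_to_jul:.1f} Jan to Jul, x{jan_to_peak:.1f} Jan to peak")
```

On these figures MSNBot comes out at a bit under 10 times its January count by July, Inktomi Slurp at about 4 times (close to 6 times at its April peak), and Googlebot slightly below where it started.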
-
A fairly large site we look after has the following ratio
Y 12 : M 4 : G 3
There are ways to limit the bots for the different engines (on their webmaster tools/login sites and in the web page code), but I would leave well alone, especially for the big 3 you have indicated, as these will provide the most traffic to your site.
If you are looking for traffic then the search engines are usually a positive benefit. There are plenty of leeches out there which may take less but offer very little benefit in return: Jay's DomainTools, for example, and other scrapers who just copy your work and then stick up AdSense etc.
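For anyone who does want to limit a bot, one common place to do it is a robots.txt file at the site root. Below is a rough sketch that uses Python's standard `urllib.robotparser` to check how a made-up robots.txt would be read. The agent name `ExampleScraper`, the paths and the delay values are all hypothetical, and whether a given crawler honours `Crawl-delay` is up to that crawler (as far as I know Googlebot ignores it and expects the crawl rate to be set in its webmaster tools instead).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow two crawlers down and shut out one scraper.
# "ExampleScraper", the paths and the delays are made up for illustration.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 5

User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser what each agent is told to do.
print(parser.crawl_delay("msnbot"))
print(parser.can_fetch("ExampleScraper", "/index.html"))
print(parser.can_fetch("Googlebot", "/index.html"))
print(parser.can_fetch("Googlebot", "/private/stats.html"))
```

Keep in mind that robots.txt is only a request: the big engines respect it, but a scraper that wants your content can simply ignore it.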
-
Originally posted by gpmgroup
A fairly large site we look after has the following ratio
Y 12 : M 4 : G 3
There are ways to limit the bots for the different engines (on their webmaster tools/login sites and in the web page code), but I would leave well alone, especially for the big 3 you have indicated, as these will provide the most traffic to your site.
Interesting and thanks for the figures on the website you look after.
Regarding the second part, I just took a look at Google's webmaster tools (sitemaps etc.). I didn't understand much of it, but it was interesting nonetheless. I submitted a sitemap and did whatever else I thought I could do without breaking something.
I'm not bothered by the rate at which the bots visit, and the bandwidth they use isn't a concern for me. I'm more trying to find out whether bots visit some sites more often than others and, if they do, why. If we didn't exclude bots on domainstate, I could ask safe for the stats and compare the two sites.
Originally posted by gpmgroup
If you are looking for traffic then the search engines are usually a positive benefit. There are plenty of leeches out there which may take less but offer very little benefit in return: Jay's DomainTools, for example, and other scrapers who just copy your work and then stick up AdSense etc.
I always knew search engine rank was beneficial. Until I started checking my awstats I just didn't realise how beneficial.
When you say scrapers like DomainTools copy your work and stick up AdSense, is that part of your site's content, like forum posts, names for sale etc.? Or are you saying some will copy your entire site and publish it?
Is there a way of finding content that has been scraped from your site?
-
The more the big engines like a site, the more often they visit it. Their criteria may vary: freshness, news, new content, forum posts, or even just checking that nothing has changed.
The leeches grab differing quantities depending on how they can sell things from your site. DomainTools just wants the whois and the index page so they can repackage that info and sell it on. Some scrapers grab the whole site and post a copy (though this seems to be on the wane, as the engines are pretty hot on it and ban their ad accounts pretty quickly). The only way to stop them is to pattern match or rate limit their activities, as they use different IPs each time.
Others grab bits of the site and repackage them in an MFA (made-for-AdSense) style site. The quickest way to find out if it has happened on a significant scale is to take a group of words from some of your pages and search for them in Google.
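The "take a group of words and search for them" check can be sketched roughly as below. This is only an illustration: the function names, the phrase length of 8 words and the number of phrases are arbitrary choices, and the script only builds exact-phrase search URLs for you to open by hand rather than querying Google itself.

```python
import re
from urllib.parse import quote_plus


def phrase_queries(page_text, words_per_phrase=8, max_phrases=3):
    """Pick evenly spaced runs of consecutive words and wrap each in quotes."""
    words = re.findall(r"[\w']+", page_text)
    if len(words) < words_per_phrase:
        return []
    last_start = len(words) - words_per_phrase
    # Spread the starting points across the page instead of bunching them up.
    step = max(1, last_start // max(1, max_phrases - 1))
    starts = list(range(0, last_start + 1, step))[:max_phrases]
    return ['"' + " ".join(words[s:s + words_per_phrase]) + '"' for s in starts]


def search_url(query):
    """Build a Google search URL for an exact-phrase query."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    sample = "Paste a paragraph of your own page text here to try it out on real content"
    for query in phrase_queries(sample, words_per_phrase=5):
        print(search_url(query))
```

Longer phrases give fewer false matches; if a distinctive phrase from your page turns up on a site you don't recognise, that is a good sign it has been scraped.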
-
-
Here is one of my sites
Robots/Spiders visitors (Top 25) - Full list - Last visit
13 different robots* Hits Bandwidth Last visit
Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
Ask 21 290.63 KB 06 Aug 2008 - 20:17
Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07
-
Here is another one that we launched on the 4th of Aug 08 
Robots/Spiders visitors (Top 25) - Full list - Last visit
5 different robots* Hits Bandwidth Last visit
Googlebot 30 212.73 KB 06 Aug 2008 - 15:59
Yahoo Slurp 8 72.71 KB 06 Aug 2008 - 18:38
Alexa (IA Archiver) 5 113.00 KB 06 Aug 2008 - 19:14
Unknown robot (identified by 'crawl') 4 144.88 KB 06 Aug 2008 - 00:09
Unknown robot (identified by 'robot') 1 56.41 KB 06 Aug 2008 - 13:18
-
Originally posted by apples4u
Here is one of my sites
Robots/Spiders visitors (Top 25) - Full list - Last visit
13 different robots* Hits Bandwidth Last visit
Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
Ask 21 290.63 KB 06 Aug 2008 - 20:17
Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07
Is that from the 1st of Aug?
-
Yes, from the first of Aug,
and the last dates you see are the last times they hit the website.
-
-
Is there a way of finding content that has been scraped from your site?
http://copyscape.com/
The free version is pretty good; I've never tried the premium.
-
-
Originally posted by apples4u
Here is one of my sites
Robots/Spiders visitors (Top 25) - Full list - Last visit
13 different robots* Hits Bandwidth Last visit
Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
Ask 21 290.63 KB 06 Aug 2008 - 20:17
Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07
Robots/Spiders visitors (Top 25) - Full list - Last visit
11 different robots* Hits Bandwidth Last visit
MSNBot 26343 524.33 MB 06 Aug 2008 - 22:30
Googlebot 12456 457.82 MB 06 Aug 2008 - 22:28
Inktomi Slurp 11229 1.04 GB 06 Aug 2008 - 22:31
Unknown robot (identified by 'spider') 1131 25.41 MB 06 Aug 2008 - 22:28
Unknown robot (identified by 'crawl') 142 2.96 MB 06 Aug 2008 - 18:33
EchO! 76 2.05 MB 06 Aug 2008 - 17:43
AskJeeves 66 17.77 MB 06 Aug 2008 - 11:59
Unknown robot (identified by 'robot') 57 6.26 MB 06 Aug 2008 - 18:25
Alexa (IA Archiver) 17 544.06 KB 06 Aug 2008 - 05:00
Voila 2 34.27 KB 06 Aug 2008 - 15:31
Netcraft 1 0 01 Aug 2008 - 13:59
It's interesting to compare the two sets. You have two more visiting bots, and you have twice as many Googlebot visits as I do.
It's also interesting to note that my MSNBot visits and your Googlebot visits are about the same in number, but your Googlebot visits used twice the bandwidth of my MSNBot visits.
Looking at my own Googlebot and MSNBot visits, they have used roughly the same bandwidth as each other, yet MSNBot made roughly twice as many visits. Clearly Googlebot eats up twice the bandwidth of MSNBot per visit. I only use 90 GB of bandwidth a month out of my allotted 1,000 GB, so bots using bandwidth isn't a concern; it's just interesting to me which bots use what.
How long has your site been up, Frank?
Have you noticed a large decline/increase in a particular bot's activity over the last few months?
-
Originally posted by Matt
Clearly googlebot eats up twice the bandwidth of msnbot.
Looking at it further, even more useless statistics.
MSNBot 28830 572.32 MB
Googlebot 13327 496.83 MB
Inktomi Slurp 12451 1.13 GB
It's clear that while Googlebot eats about twice the bandwidth of MSNBot per hit, Inktomi Slurp uses more than twice that of Googlebot.
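The comparison above is really bandwidth divided by hits, which can be worked out with a short Python sketch like this one. The hit counts and bandwidth figures are the three rows above; the helper name `kb_per_hit` is made up, and it assumes the stats report binary units (1 MB = 1024 KB), which may not be exactly how the stats package rounds.

```python
# Assumed unit sizes in KB (binary units; an assumption about the stats tool).
UNITS = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}


def kb_per_hit(hits, bandwidth):
    """Convert a string such as '572.32 MB' to KB and divide by the hit count."""
    value, unit = bandwidth.split()
    return float(value) * UNITS[unit] / hits


# (hits, bandwidth) copied from the three rows above.
stats = {
    "MSNBot": (28830, "572.32 MB"),
    "Googlebot": (13327, "496.83 MB"),
    "Inktomi Slurp": (12451, "1.13 GB"),
}

for bot, (hits, bandwidth) in stats.items():
    print(f"{bot}: {kb_per_hit(hits, bandwidth):.1f} KB per hit")
```

Under that assumption the figures work out to roughly 20 KB per hit for MSNBot, 38 KB for Googlebot and 95 KB for Inktomi Slurp, so Googlebot is a little under twice MSNBot and Inktomi Slurp is about two and a half times Googlebot.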
As you can see, I'm bored today
-
Heya Boss,
The site has been up and running for over a year. The content has not changed since then; it is a multi-user blog. It gets hit daily with spam, and I will start another topic on that LOL
But I have seen that Google traffic has come down a lot for certain keywords.
Here are my search engine traffic stats.
Connect to site from
Origin Pages Percent Hits Percent
Direct address / Bookmarks 98569 99 % 98741 98.8 %
Links from a NewsGroup
Links from an Internet Search Engine - Full list
- Windows Live 345 345
- Google 197 197
- Google (Images) 79 79
- Yahoo! 42 42
- MSN Search 27 27
- Unknown search engines 4 4
- Sphere (Blog) 1 1
- AOL 1 1