  1. #1
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986

    bots

    As mentioned in another thread, I'm trying to get my head around bot activity on a site. I'm trying to determine whether there is a minimum level of bot activity a site needs before it benefits from being crawled.

    For instance, if you notice that a particular bot crawls your site 20,000 times a month, is that something to be happy about or disappointed about, or in the end does it simply not matter how often bots crawl?

    The following are the monthly crawl counts for the three main bots that have crawled my site since January of this year. It's interesting, to me at least, that MSNBot and Inktomi Slurp have greatly increased their presence on my site since January. Again, I have no idea whether that's good, bad or simply doesn't matter. Why they suddenly increased their activity several-fold is a mystery to me too.

    Jan

    Googlebot 88,984
    MSNBot 10,741
    Inktomi Slurp 10,704

    Feb

    Googlebot 79,199
    MSNBot 27,665
    Unknown robot (identified by 'crawl') 10,713
    Inktomi Slurp 7,015

    March

    Googlebot 71,469
    MSNBot 37,310
    Inktomi Slurp 30,735

    April

    Inktomi Slurp 62,704
    Googlebot 61,932
    MSNBot 26,061


    May

    MSNBot 116,865
    Googlebot 101,932
    Inktomi Slurp 43,694

    June

    MSNBot 127,779
    Googlebot 83,272
    Inktomi Slurp 35,743

    July

    MSNBot 104,158
    Googlebot 81,865
    Inktomi Slurp 42,747

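    As the tables show, Googlebot has held at decent numbers (decent relative to the other bots crawling my site, but who knows relative to other sites), while MSNBot has increased its activity almost 10-fold (10,741 hits in January to 104,158 in July) and Inktomi Slurp about 4-fold (10,704 to 42,747, after peaking at nearly 6 times its January level in April).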
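    The growth factors above can be checked with a few lines of Python. This is a minimal sketch: the monthly counts are copied from the figures in this post, and `growth` is a hypothetical helper name, not part of AWStats or any other tool.

```python
# Monthly hit counts copied from the post above, January to July.
MONTHLY_HITS = {
    "Googlebot": [88984, 79199, 71469, 61932, 101932, 83272, 81865],
    "MSNBot": [10741, 27665, 37310, 26061, 116865, 127779, 104158],
    "Inktomi Slurp": [10704, 7015, 30735, 62704, 43694, 35743, 42747],
}


def growth(series):
    """Return (last month / first month, busiest month / first month)."""
    first = series[0]
    return series[-1] / first, max(series) / first


if __name__ == "__main__":
    for bot, series in MONTHLY_HITS.items():
        latest, peak = growth(series)
        print(f"{bot:<14} July vs Jan: x{latest:.1f}   peak vs Jan: x{peak:.1f}")
```

    Comparing against the busiest month as well as the latest one matters here, because Inktomi Slurp's April spike makes its growth look larger than its July figure does.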

    Anyone's comments on bot activity would be appreciated.

  2. #2
    Join Date
    Oct 2002
    Location
    England
    Posts
    2,151
    A fairly large site we look after has the following ratio:

    Y 12 : M 4 : G 3 (Yahoo : MSN : Google)

    There are ways to limit the bots for the different engines (on their webmaster tools/login sites and in the web page code), but I would leave well alone, especially for the big 3 you have indicated, as these will provide the most traffic to your site.
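    One common place to limit bots is the site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show how a hypothetical robots.txt would be interpreted. `Crawl-delay` is a non-standard extension that Yahoo's Slurp and MSNBot honoured; Googlebot ignores it, and Google's crawl rate was set through its webmaster tools instead. The file contents and the `/cgi-bin/` path are assumptions for illustration, and the short agent names `Slurp` and `msnbot` are passed because the parser matches on the product token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Slurp and msnbot down, keep all bots out of /cgi-bin/.
# Each named group repeats the Disallow line, because a crawler that matches its
# own User-agent group does not also apply the rules in the "*" group.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: msnbot
Crawl-delay: 5
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for agent in ("Slurp", "msnbot", "Googlebot"):
        # crawl_delay() returns None when no Crawl-delay applies to the agent.
        print(agent, rules.crawl_delay(agent), rules.can_fetch(agent, "/cgi-bin/stats"))
```

    The repeated `Disallow` lines are the main design point: a bot-specific group replaces the `*` group for that bot rather than adding to it.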

    If you are looking for traffic, then the search engines are usually a positive benefit. There are plenty of leeches out there which may take less but offer very little benefit in return: Jay's DomainTools, for example, and other scrapers who just copy your work and then stick up AdSense etc.

  3. #3
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Originally posted by gpmgroup
    A fairly large site we look after has the following ratio

    Y 12 : M 4 : G 3

    There are ways to limit the bots for the different engines (on their webmaster tools/login sites and in the web page code), but I would leave well alone, especially for the big 3 you have indicated, as these will provide the most traffic to your site.
    Interesting and thanks for the figures on the website you look after.

    In regards to the second part, I just took a look at Google's Webmaster Tools (sitemaps etc.). I didn't understand much of it, but it was interesting nonetheless. I submitted a sitemap and whatever else I thought I could do without breaking something.

    I'm not bothered by the rate at which bots visit, and the bandwidth they use isn't a concern for me. I'm more trying to find out whether bots visit some sites more often than others and, if they do, why. If we didn't exclude bots on DomainState, I could ask safe for the stats and compare the two sites.


    Originally posted by gpmgroup
    If you are looking for traffic, then the search engines are usually a positive benefit. There are plenty of leeches out there which may take less but offer very little benefit in return: Jay's DomainTools, for example, and other scrapers who just copy your work and then stick up AdSense etc.
    I always knew search engine rank was beneficial; until I started checking my AWStats, I just didn't realise how beneficial.

    When you say scrapers like DomainTools copy your work and stick up AdSense, is that part of the content of your site, like forum posts, names for sale etc.? Or are you saying some will copy your entire site and publish it?

    Is there a way of finding content that has been scraped from your site?

  4. #4
    Join Date
    Oct 2002
    Location
    England
    Posts
    2,151
    The more the big engines like a site, the more often they visit it. Their criteria vary: freshness, news, new content, forum posts, or even just checking that nothing has changed.

    The leeches grab differing quantities depending on how they can sell things from your site. DomainTools just wants the whois and the index page so it can repackage that info and sell it on. Some scrapers grab the whole site and post a copy (though this seems to be on the wane, as the engines are pretty hot on it and ban their ad accounts pretty quickly). The only way to stop them is to pattern-match or rate-limit their activities, as they use different IPs each time.
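    Pattern matching and rate limiting can be combined in a small request filter. This is an illustrative sketch under stated assumptions, not a production defence: `looks_like_bot` matches the same generic substrings that appear in the AWStats "Unknown robot (identified by ...)" rows in this thread, and because scrapers rotate IPs, the hypothetical `client_key` groups requests by IPv4 /24 network plus user-agent rather than by single IP. The limits and names are assumptions.

```python
import ipaddress
from collections import defaultdict, deque

# Generic substrings mirroring the AWStats "Unknown robot" rows in this thread.
# Assumption: this is not a complete list of bot signatures.
BOT_PATTERNS = ("bot/", "bot-", "crawl", "spider", "robot")


def looks_like_bot(user_agent):
    """Case-insensitive substring match against BOT_PATTERNS."""
    ua = user_agent.lower()
    return any(p in ua for p in BOT_PATTERNS)


def client_key(ip, user_agent):
    """Hypothetical grouping key: the IPv4 /24 network plus the user-agent."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return f"{net}|{user_agent}"


class SlidingWindowLimiter:
    """Allow at most max_hits requests per key within any `window` seconds."""

    def __init__(self, max_hits, window):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        q = self.hits[key]
        # Forget requests that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_hits:
            return False
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_hits=3, window=10.0)
    ua = "ExampleScraper/1.0 (crawler)"  # hypothetical user-agent
    key = client_key("203.0.113.7", ua)
    for t in (0, 1, 2, 3, 11):
        print(t, looks_like_bot(ua), limiter.allow(key, t))
```

    In practice you would exempt the major engines first (verified, for example, by a reverse DNS lookup on the requesting IP), since the advice in this thread is to leave them alone.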

    Others grab bits of the site and repackage them in an MFA-style (made-for-AdSense) site. The quickest way to find out whether this has happened on a significant scale is to take a group of words from some of your pages and search for them in Google as an exact phrase (in quotes).
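    That manual check can be semi-automated. This sketch picks a few random runs of consecutive words from a page's text and builds exact-phrase Google search URLs for them. The function names, the eight-word run length and the `https://www.google.com/search?q=` URL format are assumptions for illustration, and the search results still have to be reviewed by hand.

```python
import random
from urllib.parse import urlencode


def sample_phrases(text, n_words=8, n_samples=3, seed=None):
    """Pick up to n_samples runs of n_words consecutive words from text."""
    words = text.split()
    if len(words) <= n_words:
        return [" ".join(words)] if words else []
    starts = range(len(words) - n_words + 1)
    rng = random.Random(seed)
    chosen = rng.sample(starts, k=min(n_samples, len(starts)))
    return [" ".join(words[s:s + n_words]) for s in sorted(chosen)]


def exact_phrase_search_url(phrase):
    """Build a Google search URL for the phrase wrapped in double quotes."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


if __name__ == "__main__":
    # Hypothetical page text: replace with the visible text of one of your pages.
    page_text = "replace this with the visible text of one of your own pages so it can be sampled"
    for phrase in sample_phrases(page_text, seed=42):
        print(exact_phrase_search_url(phrase))
```

    Longer runs give fewer false matches against common wording; shorter runs catch scrapers that lightly edit the text.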

  5. #5
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Terrific, thanks.

  6. #6
    Join Date
    Mar 2004
    Posts
    2,122
    Here is one of my sites

    Robots/Spiders visitors (Top 25): 13 different robots
    Robot, Hits, Bandwidth, Last visit
    Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
    Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
    Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
    Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
    MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
    MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
    Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
    Ask 21 290.63 KB 06 Aug 2008 - 20:17
    Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
    The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
    Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07

  7. #7
    Join Date
    Mar 2004
    Posts
    2,122
    Here is another one that we launched on the 4th of Aug 08

    Robots/Spiders visitors (Top 25): 5 different robots
    Robot, Hits, Bandwidth, Last visit
    Googlebot 30 212.73 KB 06 Aug 2008 - 15:59
    Yahoo Slurp 8 72.71 KB 06 Aug 2008 - 18:38
    Alexa (IA Archiver) 5 113.00 KB 06 Aug 2008 - 19:14
    Unknown robot (identified by 'crawl') 4 144.88 KB 06 Aug 2008 - 00:09
    Unknown robot (identified by 'robot') 1 56.41 KB 06 Aug 2008 - 13:18

  8. #8
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Originally posted by apples4u
    Here is one of my sites

    Robots/Spiders visitors (Top 25): 13 different robots
    Robot, Hits, Bandwidth, Last visit
    Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
    Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
    Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
    Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
    MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
    MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
    Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
    Ask 21 290.63 KB 06 Aug 2008 - 20:17
    Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
    The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
    Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07
    Is that from the 1st of Aug?

  9. #9
    Join Date
    Mar 2004
    Posts
    2,122
    Yes, from the first of Aug.

    And the last dates you see are the last time each bot hit the website.

  10. #10
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Ta, that was my guess.

  11. #11
    Join Date
    Nov 2002
    Location
    Philadelphia
    Posts
    6,801
    Is there a way of finding content that has been scraped from your site?
    http://copyscape.com/

    The free version is pretty good; I've never tried the premium one.

  12. #12
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Good onya Ben.

  13. #13
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Originally posted by apples4u
    Here is one of my sites

    Robots/Spiders visitors (Top 25): 13 different robots
    Robot, Hits, Bandwidth, Last visit
    Googlebot 25680 1.11 GB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'bot/' or 'bot-') 6182 333.87 MB 06 Aug 2008 - 22:19
    Unknown robot (identified by 'crawl') 5978 125.96 MB 06 Aug 2008 - 22:14
    Yahoo Slurp 1139 13.88 MB 06 Aug 2008 - 22:18
    Unknown robot (identified by 'spider') 937 48.55 MB 06 Aug 2008 - 22:16
    Unknown robot (identified by 'robot') 868 9.68 MB 06 Aug 2008 - 21:45
    MSNBot 820 12.71 MB 06 Aug 2008 - 21:46
    MSNBot-media 235 3.47 MB 06 Aug 2008 - 21:51
    Relevant Noise 22 1008.76 KB 06 Aug 2008 - 21:39
    Ask 21 290.63 KB 06 Aug 2008 - 20:17
    Yahoo! Slurp China 10 91.49 KB 05 Aug 2008 - 18:03
    The web archive (IA Archiver) 5 31.86 KB 06 Aug 2008 - 17:17
    Alexa (IA Archiver) 1 17.28 KB 05 Aug 2008 - 02:07
    Robots/Spiders visitors (Top 25): 11 different robots
    Robot, Hits, Bandwidth, Last visit
    MSNBot 26343 524.33 MB 06 Aug 2008 - 22:30
    Googlebot 12456 457.82 MB 06 Aug 2008 - 22:28
    Inktomi Slurp 11229 1.04 GB 06 Aug 2008 - 22:31
    Unknown robot (identified by 'spider') 1131 25.41 MB 06 Aug 2008 - 22:28
    Unknown robot (identified by 'crawl') 142 2.96 MB 06 Aug 2008 - 18:33
    EchO! 76 2.05 MB 06 Aug 2008 - 17:43
    AskJeeves 66 17.77 MB 06 Aug 2008 - 11:59
    Unknown robot (identified by 'robot') 57 6.26 MB 06 Aug 2008 - 18:25
    Alexa (IA Archiver) 17 544.06 KB 06 Aug 2008 - 05:00
    Voila 2 34.27 KB 06 Aug 2008 - 15:31
    Netcraft 1 0 01 Aug 2008 - 13:59

    It's interesting to compare the two sets. You have two more visiting bots than I do (13 versus 11), and about twice as many Googlebot visits.

    Interesting to note that my MSNBot visits and your Googlebot visits are about the same in number, but yours used about twice the bandwidth of mine.

    Looking at my Googlebot and MSNBot visits, they used roughly the same bandwidth as each other, yet MSNBot made roughly twice as many hits. So, per hit, Googlebot clearly eats up about twice the bandwidth of MSNBot. I only use 90 GB of bandwidth a month out of my allotted 1,000 GB, so bots using bandwidth isn't a concern; it's just interesting to me which bots use what.
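    The per-hit comparison can be made explicit by dividing each bot's bandwidth by its hit count. A minimal sketch using the three main rows of the table above; it assumes AWStats' KB/MB/GB figures are binary (1024-based) units, which only matters when rows are reported in different units (here, Slurp's figure is in GB). The helper names are hypothetical.

```python
# Assumption: AWStats reports binary units (1 KB = 1024 bytes).
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# (hits, bandwidth) copied from the table above.
STATS = {
    "MSNBot": (26343, "524.33 MB"),
    "Googlebot": (12456, "457.82 MB"),
    "Inktomi Slurp": (11229, "1.04 GB"),
}


def parse_size(text):
    """Convert a string such as '524.33 MB' to bytes."""
    value, unit = text.split()
    return float(value) * UNIT_BYTES[unit]


def kb_per_hit(hits, size):
    """Average kilobytes transferred per request."""
    return parse_size(size) / hits / 1024


if __name__ == "__main__":
    for bot, (hits, size) in STATS.items():
        print(f"{bot:<14} {kb_per_hit(hits, size):6.1f} KB per hit")
```

    With these figures Googlebot comes out at a bit under twice MSNBot's bandwidth per hit, and Inktomi Slurp at roughly two and a half times Googlebot's.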

    How long has your site been up, Frank?

    Have you noticed a large decline or increase in a particular bot's activity over the last few months?

  14. #14
    Join Date
    Sep 2002
    Location
    Gold Coast, Qld, Australia
    Posts
    6,986
    Originally posted by Matt
    Clearly googlebot eats up twice the bandwidth of msnbot.
    Looking at it further, even more useless statistics.

    MSNBot 28830 572.32 MB
    Googlebot 13327 496.83 MB
    Inktomi Slurp 12451 1.13 GB

    It's clear that while Googlebot eats about twice the bandwidth of MSNBot per hit, Inktomi Slurp uses more than twice as much as Googlebot.

    As you can see, I'm bored today.

  15. #15
    Join Date
    Mar 2004
    Posts
    2,122
    Heya Boss,

    The site has been up and running for over a year. The content has not changed since then; it is a multi-user blog. It gets hit daily with spam, and I will start another topic on that, LOL.

    But I have seen that Google traffic has come down a lot for certain keywords.

    Here are my search engine traffic stats.

    Connect to site from
    Origin, Pages, Percent, Hits, Percent
    Direct address / Bookmarks 98569 99 % 98741 98.8 %
    Links from a NewsGroup
    Links from an Internet Search Engine (pages, hits):
    - Windows Live 345 345
    - Google 197 197
    - Google (Images) 79 79
    - Yahoo! 42 42
    - MSN Search 27 27
    - Unknown search engines 4 4
    - Sphere (Blog) 1 1
    - AOL 1 1

DomainState.com
Copyright © 2002    DomainState.com a Trellian Company