Search engines and RSS aggregators patterns

Published on:

Looking at my server logs (using Summary), I found the following search engine patterns interesting:

[Figure: search engine patterns]

I'm amazed at how differently they crawl this site. Yahoo! is by far the most resource-intensive (the least efficient?), with the top score in visits, hits and bandwidth consumed. Recomputed per visit and per day over the past 12 months (from 10/01/05 to 09/30/06), this gives:

  • Yahoo!: 1.28 hits/visit, 8.2 KB/visit, 1446 visits/day, 1851 hits/day, 11.9 MB/day
  • Google: 10.2 hits/visit, 100 KB/visit, 99.3 visits/day, 1009 hits/day, 9.97 MB/day
  • MSN Search: 7.93 hits/visit, 155 KB/visit, 56.4 visits/day, 447.4 hits/day, 8.76 MB/day
  • Ask Jeeves: 26.9 hits/visit, 209 KB/visit, 10.5 visits/day, 280.9 hits/day, 2.19 MB/day
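The per-visit figures are simply the daily totals divided by the daily visit count, which also makes them a handy consistency check on the list. A minimal Python sketch, assuming 1 MB = 1000 KB (the convention under which the listed KB/visit and MB/day values line up); the dictionary layout and the `per_visit` helper are illustrative, not part of Summary:

```python
# Daily averages taken from the list above (10/01/05 to 09/30/06).
crawlers = {
    "Yahoo!":     {"visits": 1446, "hits": 1851,  "mb": 11.9},
    "Google":     {"visits": 99.3, "hits": 1009,  "mb": 9.97},
    "MSN Search": {"visits": 56.4, "hits": 447.4, "mb": 8.76},
    "Ask Jeeves": {"visits": 10.5, "hits": 280.9, "mb": 2.19},
}

def per_visit(day):
    """Return (hits per visit, KB per visit), assuming 1 MB = 1000 KB."""
    hits_per_visit = day["hits"] / day["visits"]
    kb_per_visit = day["mb"] * 1000 / day["visits"]
    return hits_per_visit, kb_per_visit

for name, day in crawlers.items():
    hits_per_visit, kb_per_visit = per_visit(day)
    print(f"{name}: {hits_per_visit:.2f} hits/visit, {kb_per_visit:.1f} KB/visit")
```

Small differences from the list (e.g. Ask Jeeves) come from the daily figures themselves being rounded.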

Quite different behaviors! The way Summary separates one visit from the next may be skewed against Yahoo!, so hits and bandwidth are, I think, better metrics for comparison.
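To see why the visit count is so sensitive to how a log analyzer splits visits, here is a small Python sketch. It assumes a common inactivity-timeout rule (a new visit starts when the gap since the client's previous hit exceeds a timeout, 30 minutes by default here); I don't know the exact rule Summary uses, so both the rule and the `count_visits` helper are hypothetical, and grouping hits per client (e.g. per IP address) is left out:

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout=timedelta(minutes=30)):
    """Count visits for one client: a new visit starts whenever the gap
    since that client's previous hit exceeds `timeout` (hypothetical rule)."""
    visits = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            visits += 1
        previous = ts
    return visits

# Hypothetical crawler fetching one page every 40 minutes over one day.
start = datetime(2006, 9, 30)
hits = [start + timedelta(minutes=40 * i) for i in range(36)]

print(count_visits(hits))                              # 30-minute timeout: every hit is its own visit
print(count_visits(hits, timeout=timedelta(hours=1)))  # 1-hour timeout: a single visit
```

The same 36 hits count as 36 visits or as 1 depending only on the timeout, which is why a slow, steady crawler can look like a huge number of tiny visits.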

During the same period, I've seen the following patterns from RSS aggregators:

  • Bloglines: 44,098 visits / 84,889 hits / 14.5 MB
  • NewsGator: 45,402 visits / 84,785 hits / 51 MB
  • Yahoo! RSS Syndication System: 7,003 visits / 7,837 hits / 95.4 MB

So Yahoo! RSS consumes almost twice as much bandwidth as NewsGator with about 11 times fewer hits! Weird, and here again Yahoo! gets the biggest payload.
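Dividing bandwidth by hits makes the gap explicit: roughly 12 KB per request for Yahoo!, versus well under 1 KB for the other two. A minimal Python sketch over the yearly totals above, again assuming 1 MB = 1000 KB; the `kb_per_hit` helper is illustrative:

```python
# Yearly totals for the RSS aggregators listed above.
aggregators = {
    "Bloglines": {"visits": 44_098, "hits": 84_889, "mb": 14.5},
    "NewsGator": {"visits": 45_402, "hits": 84_785, "mb": 51},
    "Yahoo! RSS Syndication System": {"visits": 7_003, "hits": 7_837, "mb": 95.4},
}

def kb_per_hit(totals):
    """Average payload per request, assuming 1 MB = 1000 KB."""
    return totals["mb"] * 1000 / totals["hits"]

for name, totals in aggregators.items():
    print(f"{name}: {kb_per_hit(totals):.2f} KB/hit")

yahoo = aggregators["Yahoo! RSS Syndication System"]
newsgator = aggregators["NewsGator"]
# How much more bandwidth Yahoo! used, and how many fewer hits it made.
print(f"bandwidth ratio (Yahoo!/NewsGator): {yahoo['mb'] / newsgator['mb']:.2f}")
print(f"hit ratio (NewsGator/Yahoo!): {newsgator['hits'] / yahoo['hits']:.1f}")
```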