The Technorati prism vs reality

Technorati is a great source of information, trends and numbers about the blogosphere. But one has to take the figures it publishes with a grain of salt, and make up one's own mind about the Technorati prism, which is not reality.

Take, for example, the October 2006 State of the Blogosphere from David Sifry. Here in France, some are "worried" that French is underrepresented. Without entering into any (useless) debate about the importance (or even merits) of French vs other languages, I strongly suspect that Technorati is actually incapable of accurately tracking languages, at least not automatically. The proof? The latest "top 100 French blogs" co-branded by Technorati and Edelman misses a lot of prominent French blogs that should have been listed. Why? Because each blogger has to 1) get a profile on Technorati, 2) claim their blog, 3) manually set their blog's "primary language" in their Technorati profile. You can bet that most of them didn't go that far. I didn't even notice I had to do that until I investigated the inaccuracies in the aforementioned listing. And if they don't, well, my bet is that they're "assimilated" in some way, most probably into the monocultural prism that predominates in a certain part of North America (1) ;-). Provided that the process works in the first place, which is far from certain (2)!

Come on guys, a $1,995 Google Mini is able to autodetect the language of any document. So can all the prominent search engines. How does Technorati deal with languages today? Manually? My bad: they're using languid to automate that, but David says it needs to be improved.
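
To give a sense of how little it takes to get a rough first guess at a text's language, here is a minimal sketch in Python. It is not how languid, the Google Mini or Technorati actually work. The tiny stopword lists and the `guess_language` name are purely illustrative assumptions, and a serious detector would rely on much larger vocabularies or character n-gram profiles.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch only);
# a real detector would use far larger vocabularies or n-gram profiles.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "for", "with"},
    "fr": {"le", "la", "les", "et", "est", "de", "des", "un", "une", "que", "pour", "dans"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often in text.

    Returns None when no language clearly wins (no hits at all, or a tie).
    """
    # Extract lowercase runs of letters, accents included.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    # Rank languages by score, highest first, and require a strict winner.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    return best if top > second else None


if __name__ == "__main__":
    print(guess_language("Le chat est dans la maison et il dort"))
    print(guess_language("The cat is in the house and it sleeps"))
```

Even something this crude separates an obviously French post from an obviously English one. The hard cases are short posts, mixed-language blogs and closely related languages, which is presumably where the real work lies.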

A second hint about the level of accuracy of the Technorati figures is this phrase from David (emphasis mine):

My gut feeling is that since we're better at dealing with Spam now, even some of the blue areas in last quarter's graph were probably accountable to spam, which would mean that rather than the bumpy ride shown above, we're actually seeing a steady increased (but slower) growth of the blogosphere.

Also, there are lots of sites in their index that aren't blogs. For example, how come this corporate site has a rank (and totally false "updated" information), and how is it counted or separated from blogs? Add to that the fact that they also exclude a very large chunk of French blogs by not indexing Skyblog (5.9M blogs as of today, not insignificant compared to the size of the French blogosphere).

Another dirty little secret I've suspected for a long time is that Technorati doesn't go further than a blog's home page when counting links, at least for the ranking that serves as the "authority" level. I'd like to be wrong on that one.

Don't get me wrong on this: I positively applaud the work David is doing with his regular states of the blogosphere, and I have a lot of respect for the folks at Technorati. But I would really welcome a little more clarity about the methods (and assumptions) they're using to get those numbers out.

So far, you really have to read between the lines and make up your own mind about the Technorati prism vs reality.

(1) Technorati isn't localized, so only those who read English can go through the registration process. I find it weird that they can claim any accuracy in following foreign blogs when they start by excluding those bloggers who don't speak English.
(2) I set the primary language to French for my French blog a few weeks ago. Today, verifying the process while writing this post, I discovered that my "primary language" preference had been reset to "all languages". So I set it to French again, but this preference doesn't stick; it keeps falling back to "all languages". Funnily enough, the same information for my blog in English is correctly labeled as English. Something's really wrong here!