Blog spam and search engines, continued

Yahoo! Search's most prominent (and unofficial) blogger, Jeremy Zawodny, writes about comment spam. And he's not shy about the key role that search engines play in fighting this plague:

Then a partial solution is fairly clear. I've heard and seen others discuss it over the past few months. The search engines need to be smarter about reading and indexing content.

When folks like Tim build software that classifies pages, the software needs to be able to recognize the difference between links produced by the blog owner(s) and those contributed by readers and spambots.

Once you can identify the difference between those two types of links, you simply stop using the second type of link when calculating rank. Sure, you can still count them for the purpose of providing link counts--just don't factor them into the ranking.

How's that for removing the incentive?

There are already several proposals for how to do that. My favorite is a simple pair of comments that wraps any content you don't want the search engines to index. It's the simplest I've seen so far, and it gives me control over what gets indexed within a page, not just over links. Others involve a fairly comprehensive qualification of link relationships. That paves the way to lots of very interesting applications, but it has about zero chance of being effective until it is built into web editing tools behind a very simple GUI, so that Joe Average starts using it. There might already be more solutions around than blog spammers, and I'm sure even Google and Yahoo! have their own ideas. The most important thing is that they agree on a single one, call it the industry standard, and tell world + dog to happily use it on their web sites.
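To make the comment-wrapper idea concrete, here is a rough sketch of how an indexer might treat it. The marker names `begin-unindexed` and `end-unindexed` are made up for illustration (the proposal's real syntax is not reproduced here), and `classify_links` is a hypothetical helper, not anything a search engine is known to run.

```python
from html.parser import HTMLParser

# Hypothetical marker comments; the proposal's actual syntax is not given here.
START_MARK = "begin-unindexed"
END_MARK = "end-unindexed"


class LinkClassifier(HTMLParser):
    """Collect links, separating those found between the marker comments."""

    def __init__(self):
        super().__init__()
        self.inside_wrapper = False
        self.trusted = []    # links written by the page owner
        self.untrusted = []  # links inside the wrapped region (e.g. comments)

    def handle_comment(self, data):
        text = data.strip()
        if text == START_MARK:
            self.inside_wrapper = True
        elif text == END_MARK:
            self.inside_wrapper = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return
        (self.untrusted if self.inside_wrapper else self.trusted).append(href)


def classify_links(html):
    """Return (trusted, untrusted) link lists for one page."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.trusted, parser.untrusted


page = """
<p>Post body with <a href="http://example.org/article">a link</a>.</p>
<!-- begin-unindexed -->
<p>Nice blog! <a href="http://spam.example/pills">cheap pills</a></p>
<!-- end-unindexed -->
"""

trusted, untrusted = classify_links(page)
link_count = len(trusted) + len(untrusted)  # every link is still counted
ranking_links = trusted                     # only owner links feed the ranking
```

This matches what Jeremy describes: the spam link still shows up in the link count, but it never reaches the ranking step. For it to hold up, the blog software would also have to strip the closing marker from anything a reader submits, otherwise a spammer could simply close the wrapper early.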

Brad Choate objects that the fix is already out there: just use a redirection service for links. However, this has three main drawbacks: 1) it kills the referrer information, 2) it wastes resources on handling a simple link, and 3) it prevents link-mining services such as Technorati and the like from mapping connections between sites. Ironically, it should be pointed out that Movable Type does provide a redirection of sorts, except that it doesn't work in comment bodies, which makes it fairly useless. But the killer reason why this is not the horse I'd bet on is that nothing prevents search engine bots from eventually being smart enough to follow the redirections until they reach the destination, and to handle the link exactly as if those redirections did not exist. The redirection that Brad mentions is actually a hack that relies on the current inability of those engines to follow some redirection mechanisms.
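To show how little it would take for a bot to see through a redirector, here is a minimal sketch. It assumes the crawler has already recorded which URLs redirect where; the `redirects` table, the `resolve` function and all the URLs are invented for illustration, and no network access is involved.

```python
def resolve(url, redirects, max_hops=10):
    """Follow recorded redirects until reaching a URL that does not redirect.

    `redirects` maps a URL to the URL it redirects to (hypothetical crawl data).
    """
    seen = {url}
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        hops += 1
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
    return url


# Hypothetical crawl data: a blog's redirector pointing at a URL shortener,
# which in turn points at the spammer's page.
redirects = {
    "http://blog.example/r?to=1": "http://short.example/abc",
    "http://short.example/abc": "http://spam.example/pills",
}

destination = resolve("http://blog.example/r?to=1", redirects)
```

Once the bot credits the link to `destination` instead of to the redirector, the spammer gets the same benefit as from a direct link, which is why the redirection trick only works as long as the engines don't bother to follow it.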