The database of intentions

I'm reading John Battelle's book, The Search, and the recent release of search data by AOL could hardly be a better illustration of his metaphor of the Database of Intentions. What comes out of this giant screw-up is as fascinating as AOL's mistake is big.

It's pretty obvious that, with such an amount of data, some of it can easily lead to personally identifiable information. As several commenters on the Google Blogoscoped post note, the timestamp information will allow some site owners to link 3 months of a user's search history to an IP address, or directly to a name if they have a registration record or profile for that user. The whole episode says a lot about how Americans regard privacy issues. See also posts by Ars Technica, TechCrunch, Valleywag (citing the NYT, which took no time to identify one person in this so-called "anonymous" data), and John Battelle. There are probably a lot more stories already published or in the works, since the released data will now be floating around the internet for a long time, even though AOL removed it promptly after the uproar.
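To make the commenters' point concrete, here is a minimal sketch of the kind of join a site owner could run between a released search log and their own access log. Everything in it is an assumption for illustration: the record layouts, the sample rows, the site name `example-shop.test`, the 5-second matching window, and the helper names `link_ids_to_ips` and `history_for` are all hypothetical and not taken from the actual AOL files.

```python
from datetime import datetime, timedelta

# Hypothetical rows from a released search log:
# (anonymous id, query, time of the click, site that was clicked)
released = [
    (1001, "garden sheds", datetime(2006, 3, 1, 10, 15, 2), "example-shop.test"),
    (1001, "knee pain", datetime(2006, 3, 2, 21, 40, 0), "health.test"),
    (2002, "cheap flights", datetime(2006, 3, 1, 10, 15, 3), "travel.test"),
]

# Hypothetical rows from one site's own access log:
# (client IP, time of the hit, search terms found in the referrer)
site_log = [
    ("203.0.113.7", datetime(2006, 3, 1, 10, 15, 4), "garden sheds"),
]


def link_ids_to_ips(released, site_log, site, window=timedelta(seconds=5)):
    """Map anonymous ids to IPs when a released click on `site` matches
    one of the site's own hits on both query and (approximate) time."""
    links = {}
    for anon_id, query, when, clicked in released:
        if clicked != site:
            continue
        for ip, hit_time, ref_query in site_log:
            if ref_query == query and abs(hit_time - when) <= window:
                links.setdefault(anon_id, set()).add(ip)
    return links


def history_for(released, anon_id):
    """Return every query in the released log made under one anonymous id."""
    return [query for rid, query, _, _ in released if rid == anon_id]


links = link_ids_to_ips(released, site_log, "example-shop.test")
for anon_id, ips in links.items():
    # One matching click is enough to attach the whole search history to an IP.
    print(anon_id, sorted(ips), history_for(released, anon_id))
```

In this toy data a single click on the shop is enough to tie the unrelated "knee pain" search to the same IP address, which is exactly why the anonymous ids offer so little protection.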

It also hints at how this gigantic collection of individual intentions can be exploited, and will be exploited no matter what, for it is a goldmine for both commercial and political interests.

About six years ago, a startup pitched me on tracking visitors using web beacons. During the discussion, the vendor highlighted how, by using cookies and datamining past logs, we could let visitors browse our entire site anonymously, then reconstruct their whole behavior since their first visit and associate it with their personal profile as soon as they registered with us. I could see the mental hard-on on the marketer's face, while I was already wondering about the implications for data privacy and yet another hair-splitting nightmare with the lawyers about the legal fine print in the site's terms of use. The truth, as always, lies somewhere in between, in the proper (and transparent) use of technology. But I still think about it today when someone comes to me to complain that people clear their cookies too often and that it makes our statistics "unreliable".
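For the curious, the mechanism the vendor described can be sketched in a few lines. This is only an illustration under my own assumptions, not the vendor's product: the `VisitorTracker` class, its method names, and the cookie and user ids are all hypothetical.

```python
from collections import defaultdict


class VisitorTracker:
    """Toy model: log page views per cookie, attach them to a user on sign-up."""

    def __init__(self):
        self.events_by_cookie = defaultdict(list)  # cookie id -> pages viewed
        self.cookie_to_user = {}                   # cookie id -> registered user

    def record_hit(self, cookie_id, page):
        # Called for every page view, whether or not the visitor is known.
        self.events_by_cookie[cookie_id].append(page)

    def register(self, cookie_id, user_id):
        # From now on, everything logged under this cookie belongs to the user,
        # including the page views recorded before registration.
        self.cookie_to_user[cookie_id] = user_id

    def history_for_user(self, user_id):
        pages = []
        for cookie_id, owner in self.cookie_to_user.items():
            if owner == user_id:
                pages.extend(self.events_by_cookie[cookie_id])
        return pages


tracker = VisitorTracker()
tracker.record_hit("cookie-a", "/pricing")  # anonymous browsing
tracker.record_hit("cookie-a", "/jobs")
tracker.register("cookie-a", "alice")       # past visits become Alice's
tracker.record_hit("cookie-b", "/support")  # same person after clearing cookies
print(tracker.history_for_user("alice"))
```

Note what happens to the last hit: once the cookie is cleared, the visitor comes back under a fresh id that nothing connects to the profile, which is precisely the "unreliability" people complain about.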