GWA is the new net SUV

Published on:

If the Google Web Accelerator breaks your web application, here are a few ways to protect it from this little sucker:

From the GWA Webmaster FAQ:

Can I specify which links Google Web Accelerator will prefetch on my pages?

Yes, you can. For each link you'd like us to prefetch, simply add the following snippet of code somewhere in your page's HTML source code:

<link rel="prefetch" href="http://url/to/get/">

The href value should be the actual URL you want prefetched. Google will prefetch this page, and when your users click on this link, that page will load more quickly.

You can learn more about the <link> tag on the Mozilla website.

Also worth knowing: the GWA will not prefetch secure pages, so any URL under https is safe.

If you want to block the GWA at the Apache level, see this tip, which boils down to putting one of the following rule sets in a .htaccess file or your Apache configuration.

If you want to redirect GWA users to an explanation page (here gwa-forbidden.html) use:
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^(72\.14\.192\.|72\.14\.194\.)
RewriteCond %{REQUEST_URI} !^/gwa-forbidden\.html$
RewriteRule ^.*$ /gwa-forbidden.html

If you want to send a 403 FORBIDDEN error use:
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^(72\.14\.192\.|72\.14\.194\.)
RewriteRule ^.*$ - [F]

It would be better, though, to send a 412 PRECONDITION FAILED rather than a 403, and mod_security is a good tool for this, using either of the following sets of rules (blocking by IP or blocking by HTTP header):

SecFilterSelective "REMOTE_ADDR" "^72\.14\.192\." "deny,log,status:412"
SecFilterSelective "REMOTE_ADDR" "^72\.14\.194\." "deny,log,status:412"

or
SecFilterSelective "HTTP_X_MOZ" "prefetch" "deny,log,status:412"
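
When you cannot touch the Apache configuration, the same IP test can be done in application code. Here is a minimal sketch in Python, assuming the two ranges quoted above (72.14.192.x and 72.14.194.x) are still the ones the GWA uses; the names GWA_NETWORKS and is_gwa_address are made up for this example.

```python
import ipaddress

# Assumption: these are the two ranges from the rules above, written as /24
# networks. They are not checked against Google's current addresses.
GWA_NETWORKS = [
    ipaddress.ip_network("72.14.192.0/24"),
    ipaddress.ip_network("72.14.194.0/24"),
]


def is_gwa_address(remote_addr):
    """Return True if remote_addr falls inside one of the assumed GWA ranges."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address: treat it as "not GWA".
        return False
    return any(addr in net for net in GWA_NETWORKS)
```

An application would call is_gwa_address() with the client address it gets from the server and answer with a 403 or 412 when it returns True. As with the Apache rules, this is only as good as the list of ranges.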

Another way to filter proxy requests at the Apache level, without relying on IP ranges (which Google can modify pretty easily) is to detect the "X-moz: prefetch" header (tip from jpack's comment, which also provides a way to log proxied requests to a separate file):

RewriteEngine On
SetEnvIfNoCase X-Forwarded-For .+ proxy=yes
SetEnvIfNoCase X-moz prefetch no_access=yes

# block pre-fetch requests with X-moz headers
RewriteCond %{ENV:no_access} yes
RewriteRule .* - [F,L]

# write out all proxy requests to another log
CustomLog logs/ursite.com-access_log combined env=!proxy
CustomLog logs/ursite.com-proxy_log combined env=proxy

In PHP, one could do a test like this: if (isset($_SERVER['HTTP_X_MOZ']) && strtoupper($_SERVER['HTTP_X_MOZ']) == 'PREFETCH') ...
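
The same header test carries over to other stacks. As an illustration only, here is a rough sketch of a Python WSGI wrapper that answers 403 to any request carrying "X-moz: prefetch". The name block_prefetch and the response text are made up for this example; the only thing taken from the rules above is the header name and value.

```python
def block_prefetch(app):
    """Wrap a WSGI app and refuse requests that carry 'X-moz: prefetch'."""

    def middleware(environ, start_response):
        # WSGI exposes the X-moz request header as HTTP_X_MOZ,
        # the same key PHP uses in $_SERVER.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Prefetch requests are not allowed.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else goes through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware
```

Wrapping the application once (application = block_prefetch(application)) covers every URL, which avoids sprinkling the test over each destructive action.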

For Ruby on Rails applications, see How to show Google's Web Accelerator the door in Rails.

For ColdFusion, see: Use CF to block problems with Google Accelorator.

For some context and perspective about the issues brought by the GWA, and mainly the purists' take that the issue comes from broken web applications that rely on GET when they should be using POST, see:

My own take on this is that although it is indeed recommended not to implement any destructive or otherwise data-modifying action over an HTTP GET request, the reality is that there are tons of web applications out there that implement such actions using regular links (e.g. Google's Blogger or even its own API!). And the very first reason that comes to mind for doing it is that it's not possible to design a POST request that looks like a regular link without resorting to JavaScript. I particularly subscribe to Jarkko's comment here:

The spec says that developers shouldn't use GET, it doesn't say they are violating the specs if they do. Actually it's specifically said that there can be valid reasons to disobey these recommendations.

I sincerely admit that we as web app developers have a lot to learn from this episode but I still think you're distorting the discussion by bashing 37signals for this. It would be understandable if web application development would start from ground zero today. But it isn't. There's a whole sea of existing applications in the web that will be bitten by this and it's just plain nonsense going around screaming that it's your own fault.

As soon as people start using GWA and wreaking havoc in this imperfect world, they'll just be mad at Google and stop using the Accelerator. That's hardly what Google wants and as it's impossible for them to fix all the broken web apps in the world, there's realistically only one option left for them.

For another (bad) metaphor, this is about the same as leaving all the safety equipment away from a car because "if everyone obeys the traffic rules and laws, there will be no accidents".

But besides the reality check, my other problem with GWA is that it's not a good net citizen -- in fetching objects that most probably will not be displayed by visitors, it's wasting bandwidth and server resources. To me, GWA is the equivalent of an SUV on the net: it gives some sense of comfort to its users at the expense of others' resources.