Debugging a charset encoding mismatch with Apache
While setting up a new weblog with UTF-8 as its default character encoding, I spent literally hours trying to figure out why my first name kept showing up as FranÃ§ois instead of François. Not that I'm not used to it by now, but I have this foolish hope that computers will eventually make our lives easier.
It turned out that even though every page correctly declared its encoding (<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />), some pages (output from CGI scripts) were recognized as carrying the proper encoding, while others (HTML, PHP) were always reported as ISO-8859-1.
Thanks to the excellent Web Developer toolbar for Firefox, I found out that certain pages had a charset imposed on them by a Content-Type HTTP header (you can see the headers under Tools > Web Developer > Information > View Response Headers; very handy). After more digging, I found that the pages behaving properly were the ones that already provided their own Content-Type header, which pointed my suspicion at the brand new Apache 2 installation on my server.
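The scripts that behaved properly were the ones sending their own Content-Type header. As a rough illustration, here is a minimal CGI script sketch in Python (the post does not say what language its scripts used, and the function name build_response is hypothetical) that announces UTF-8 in the header and encodes its body to match. Because the header already carries a charset, Apache has nothing to fill in.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: the script supplies its own Content-Type header."""
import sys


def build_response(body):
    """Return CGI output bytes: header lines, a blank line, then the body."""
    # Declaring the charset here means Apache will not add a default one.
    header = "Content-Type: text/html; charset=utf-8\n\n"
    # Encode the body with the same charset the header announces.
    return header.encode("ascii") + body.encode("utf-8")


if __name__ == "__main__":
    page = (
        "<html><head><title>Test</title></head>"
        "<body><p>François</p></body></html>"
    )
    # Write raw bytes so the output encoding does not depend on the locale.
    sys.stdout.buffer.write(build_response(page))
```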
Bingo! Apache 2 now ships with an AddDefaultCharset directive in its default configuration that forces the charset to ISO-8859-1 whenever none is provided in the headers by an external module (such as a script). Since HTTP headers take precedence over the META tag in the HTML code, this effectively voids any effort to declare the encoding within the HTML pages themselves.
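To make the precedence concrete, here is a simplified sketch of how a browser settles on an encoding: a charset in the HTTP header wins, and the META tag is consulted only when the header names none. The helper names (header_charset, effective_charset) and the regex are assumptions for illustration; real browsers follow a more involved sniffing algorithm (a byte-order mark, for instance, comes first). It also shows where the garbled name comes from: UTF-8 encodes "ç" as the two bytes C3 A7, which ISO-8859-1 reads as "Ã" and "§".

```python
import re
from email.message import Message

# Loose pattern for both <meta http-equiv=... content="...; charset=X">
# and <meta charset="X">; an illustration, not a full HTML parser.
META_CHARSET = re.compile(r"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)


def header_charset(content_type):
    """Charset parameter of a Content-Type header value, lowercased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type, html):
    """Simplified precedence: the HTTP header wins over any META tag."""
    from_header = header_charset(content_type)
    if from_header:
        return from_header
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
body = "François".encode("utf-8")  # b"Fran\xc3\xa7ois"

# Header forced by AddDefaultCharset: the META tag is never consulted.
forced = effective_charset("text/html; charset=ISO-8859-1", page)  # "iso-8859-1"
garbled = body.decode(forced)  # "FranÃ§ois"

# No charset in the header: the META tag decides.
declared = effective_charset("text/html", page)  # "utf-8"
correct = body.decode(declared)  # "François"
```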
If you run into this odd behavior, find the AddDefaultCharset directive in your httpd.conf and change it to this:
This prevents Apache 2 from overriding the charset encodings that you provide through META tags. Apache 1.3.x ships without this directive, which means the behavior is off by default. You should have Apache force the charset only in very specific cases; it should never be the default behavior IMHO.
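After changing the directive and restarting Apache, you can check the result without a browser toolbar. This sketch (the function name served_charset is hypothetical) sends a HEAD request with Python's standard urllib and returns the charset parameter of the Content-Type response header, or None if the header names no charset. For a static HTML page, None suggests Apache is no longer adding one, though other directives in your configuration could still set a charset.

```python
from urllib.request import Request, urlopen


def served_charset(url, timeout=10):
    """Return the charset named in the Content-Type response header, or None."""
    # A HEAD request is enough: only the response headers matter here.
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        # resp.headers is an http.client.HTTPMessage, a subclass of
        # email.message.Message, so it can parse the charset parameter.
        # The value comes back lowercased, e.g. "iso-8859-1".
        return resp.headers.get_content_charset()
```

For example, served_charset("http://localhost/index.html") (a hypothetical URL) would return "iso-8859-1" while the forced default is active.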