Debugging charset encoding mismatch with Apache
While setting up a new weblog using UTF-8 as the default encoding, I spent literally hours trying to figure out why my first name kept showing up as FranÃ§ois instead of François. Not that I'm not used to it already, but I have this foolish hope that computers should eventually make our lives easier.
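For readers who want to reproduce the symptom, here is a minimal Python sketch (Python is my choice for illustration, it is not part of the original setup). It assumes the page bytes are UTF-8 and that the browser was told to read them as ISO-8859-1:

```python
# The name as it is stored in a UTF-8 encoded page.
name = "François"
utf8_bytes = name.encode("utf-8")

# What a browser shows when it is told the bytes are ISO-8859-1:
# the two bytes of "ç" are rendered as two separate characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Reversing the wrong decoding step recovers the original text.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

The bytes on the wire are the same in both cases; only the charset label applied to them changes.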
It turned out that despite a correct charset declaration in all pages (<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />), some pages (output from CGI scripts) were recognized as carrying the proper encoding while others (HTML, PHP) were always reported as having an ISO-8859-1 charset.
Thanks to the excellent Web Developer toolbar for Firefox, I found out that certain pages had a charset imposed on them via a Content-Type HTTP header (see the headers in Tools > Web Developer > Information > View Response Headers, very handy). After more digging, I found that the pages that behaved properly already provided their own Content-Type header, which turned my suspicion to the brand new Apache 2 installation on my server.
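If you do not have the toolbar at hand, the same check can be sketched with Python's standard library. The function name `response_content_type` and the example URL are hypothetical, chosen only for illustration:

```python
from urllib.request import urlopen


def response_content_type(url):
    """Fetch url and return (raw Content-Type header, charset parameter or None)."""
    with urlopen(url) as response:
        headers = response.headers
        # get_content_charset() returns the charset parameter in lower case,
        # or None when the header carries no charset at all.
        return headers.get("Content-Type"), headers.get_content_charset()


# Example call (placeholder URL, replace it with a page on your own server):
# print(response_content_type("http://localhost/index.html"))
```

A page that reports a charset here that you never set yourself is a sign that something between your file and the browser is adding it.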
Bingo! Apache 2 now ships with a default AddDefaultCharset directive that forces the charset to ISO-8859-1 when one is not provided in the headers by an external module (such as a script). Since HTTP headers take precedence over the META tag in the HTML code, this basically voids your efforts to provide this information within HTML pages.
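The precedence rule can be sketched in a few lines of Python. This is a simplified model, not how any particular browser is implemented: the helper names are hypothetical, and the regular expression only handles the simple META form used in this post.

```python
import re
from email.message import Message

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(r"<meta[^>]*charset=[\"']?([\w-]+)", re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if content_type is None:
        return None
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def meta_charset(html):
    """Return the charset declared in a META tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def effective_charset(content_type, html):
    """The HTTP header wins; the META tag is only consulted as a fallback."""
    return header_charset(content_type) or meta_charset(html)


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'

# Header with a forced charset: the META declaration is ignored.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# Header without a charset: the META declaration is used.
print(effective_charset("text/html", page))
```

With the forced header the page is read as ISO-8859-1 whatever its META tag says, which is exactly the situation described above.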
This has been flagged, with merit, as Apache bug 23421 (see also Apache bug 14513).
If you experience this odd behavior, find the AddDefaultCharset directive in your httpd.conf and change it to this:
AddDefaultCharset Off
This will prevent Apache 2 from overriding the charset encodings that you provide through META tags. Apache 1.3.x ships without this directive, which means it's off by default. You should have Apache force the charset only in very specific cases, but that should never be the default behavior IMHO.
Comments
michael lee
Hi,
I encountered a similar problem when using Apache 2.0. Using a tool from Microsoft (wfetch.exe), I can see that the response header includes charset=ISO-8859-1. However, after modifying httpd.conf to 'AddDefaultCharset Off', the problem still exists. The response header no longer has the charset, just "Context-type: html/text", but IE still uses the ISO encoding. The web page has a meta tag specifying charset=big5 (for Traditional Chinese).
Any hints for my problem?
Thanks & Regards,
Michael
François
I guess you meant Content-type: text/html
Have you cleared IE's cache? (sorry for the trivial proposal but I don't have a PC so I don't know about Win IE cache behavior)
michael lee
Thanks for your reply. Yes, it was a cache problem. After I touched the page on the server and revisited it, everything was okay.
Regards,
Michael
test
add this:
AddDefaultCharset utf-8
Steve
François THANK YOU - this has been plaguing me for days and eventually I too noticed the UTF-8 mismatch in my headers on some scripts thanks to the excellent web developer tools with Firefox.
A quick google to your page and I am solved - thank you SO much for taking the time to post this information.
Steve
François
You're welcome Steve :-)
Rob
Cheers - I've spent ages trying to work out why some UTF content wasn't being displayed. It seems that if an HTTP charset is defined, it will override any META tags. Most annoying. But, finally, finding your page has put an end to all my misery. Thanks
David Chin
Hm. Is there a way to do this if I don't have access to the server configs? My on-campus server has the same problem.
François
David, you can try placing the Apache rule into a file named .htaccess at the root of your HTML document folder. If the server authorizes it, it will work. If not, then I'm afraid your only hope is to convince the server admin to make that change in the main Apache configuration file.
Wh1t3w0lf
This was exactly the problem I had on my server and now it's gone. Thanks a LOT for posting this!
Elias
Uday
I too have the same kind of issue with WebSphere Application Server 5.0.2.6. It is using Apache 1.3.26. The Thai characters are displayed as junk characters, but when the encoding in the browser is changed to Thai (Windows) they are displayed fine. We need to do this every time we access this page. I tried AddDefaultCharset Off, restarted my IBM HTTPServer service, and cleared my browser's cache. This does not fix it. Am I still missing anything?
François
Uday, do you send the right charset along with the page (it can come from the HTML in the page, or Apache, or WebSphere)? Check with Firefox and the web developer toolbar (Live HTTP headers) to see the headers sent to the browser.
freddiee
Hi Francois, a very fast and useful link from google and my problem with charset is solved ;) Earlier, I used to reconfigure httpd.conf like this:
AddCharset iso8859-2
AddDefaultCharset iso8859-2
(or something like that), but when I needed to change the charset for one site, I could f*** myself... Thx a lot :)
snipe
Thanks so much for posting this - it was driving me crazy!
Sébastien
Wow.. thanks so much! It did solve the problem that I had been experiencing for some time now. :D
Tino
THANKS!!! It took me hours to find your posting :-) ...
Germán W.
Thank you very much! I was able to solve this by adding a .htaccess file in the root directory with the line "AddDefaultCharset utf-8" (it didn't work when set to Off).
Cheers to you!
François
I'm amazed that my little post still continues to help people with that stupid Apache configuration. That was the purpose after all ;-).
@Germán W.: that probably means that you're not setting the charset at all anywhere in your pages and scripts. If all your pages are in UTF-8 then that's fine, what you're doing is what Apache 2 does by default, only with a more suitable charset than Latin1!
davlee
AAAH!! THANK YOU.
I thought I was losing my mind.
silence
Thank you, this page is a life saver... actually also a nerve saver :P I am glad there are people out there like you who put this type of information together and let us google it.. precious..
Andrej
I found this to work at my provider:
CharsetDisable On
AddDefaultCharset utf-8
dan jones
hey - thanks for the info. I have been having a rough time messing around with character encodings on my blog (accented characters showing up as jumbled codes, etc) and finding this post basically solved the problem. apparently, according to my admin, more recent patches of Apache 1.3 have this setting on by default too. thanks for the tip!
-
thanx for this info, helped me out
zhou
very useful!!
Bas
We run an application that allows the user to upload their own files. It is up to them, or their tools, what encoding they choose. We therefore set
AddDefaultCharset off
And add a setting to the header of pages that require a specific encoding (utf-8)
sam
François, thanks a lot for this info - I've spent hours now on trying to resolve this painful behavior. awesome!!!
sam
Borja
Thank you so much for your posting. Amazingly clarifying!!!
Naim
If you are using PHP5 with Apache, then the problem can also reside in php.ini.
I could not understand why Apache 1.3.34 kept using the iso-8859-1 charset since I had changed AddDefaultCharset to Off.
After about two hours of struggle I stumbled upon this in php.ini:
default_mimetype = "text/html"
default_charset = "iso-8859-1"
To disable those set them to empty like this:
default_mimetype = ""
default_charset = ""
And finally my problem went away :-)
Kim
If you are using PHP5 with Apache, then the problem can also reside in php.ini.
I could not understand why Apache 1.3.34 kept using the iso-8859-1 charset since I had changed AddDefaultCharset to Off.
After about two hours of struggle I stumbled upon this in php.ini:
default_mimetype = "text/html"
default_charset = "iso-8859-1"
To disable those set them to empty like this:
default_mimetype = ""
default_charset = ""
And finally my problem went away :-)
JoTo
Hey Francois,
thank you so much for this great hint. I had the same problem with German umlauts when I transferred hundreds of pages in our intranet from Apache 1.3 to 2.0.
Your solution above worked like a charm.
You saved me hours and hours of work.
THANK YOU AGAIN
JoTo
Thomas Herbstreuth
Thanks a lot. Made my day :-)
Thomas
Stephane Leclercq
Thanks a lot François, I stumbled upon this problem while internationalizing a gwt/j2ee application. You prevented me from having more grey hair ;)
João Ascenso
Solved my problem. Many thanks for the info.
Simon Guillem-Lessard
Thanks for the post and also to Kim (http://padawan.info/2004/07/debugging-chars.html#comment-11858). After playing around with Apache, I resolved my charset problem by adding this in the vhost:
AddDefaultCharset ISO-8859-1
php_value default_charset "ISO-8859-1"
CN
I was reluctant at first to post my thanks on an article from 2004, but it seems like I'm not the only one who is still finding this page useful!
Thank you so much -- not only for creating this page but for *explaining* why it works: the server settings take priority over the encoding specified by the page, unless you tell it otherwise.
I couldn't figure that out from the official Apache docs or my first few searches.
Your advice helped me display pages created in iWeb (a Macintosh HTML editor) correctly on our Apache server.
Thank you kindly.
Celia
Hey Francois,
I have a big problem that I have been fighting with for weeks, even on weekends.
We use WebSphere plus Apache. I want to open a PDF in the browser. I set the content type to "Application/pdf", along with the disposition and content length (but not the charset). When the PDF is opened, it shows unreadable characters. Do you know why?
I hope so much that you can help me ....
Thanks in advance
Celia
François Nonnenmacher in reply to Celia's comment
Before looking at the web server (I don't think it's where your problem is), I would first verify that the encoding issue isn't within the PDF itself. I've seen that before, and it's due to the way (or the software with which) the PDF is generated in the first place.
Celia
It was due to the RequestDispatcher. I use it to forward/include my request to another servlet, and after that I cannot set the content type. Now it works: the browser shows my PDF file. But when I use the Web Developer toolbar to view the headers, the content type is still text/html, even after I deleted the line "DefaultType text/html" in httpd.conf (we don't use .htaccess files). Any idea why? Here are the headers:
Date: Mon, 15 Sep 2008 10:12:07 GMT
Server: IBM_HTTP_Server/6.1.0.17 Apache/2.0.47 (Unix)
Content-Length: 8859
Keep-Alive: timeout=10, max=96
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en-US
200 OK
François Nonnenmacher in reply to Celia's comment
No idea besides maybe the fact that serving a PDF file with a content-type of text/html might not be a good idea ;-).
Michiel
You just saved my life :D