Beware of the ç (bis repetita)
More than five years ago, I wrote this in "Beware of the ç":
Don't you think that, in 2003, it's an absolute shame that a character as common as ç can still cause such trouble for most information technologies out there? [...] I can't tell you how many times a day I struggle with registration systems, cookies, etc. that cannot properly swallow my first name. François is my second name :(.
I think I've found the worst example of the US ASCII monoculture with Offbeat Guides. Here's what it does with my first name, François:
- in an internal URL: /name/Fran%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%A7ois/
- in the resulting page: FranÃÂçois
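For the curious, the URL above is consistent with the name being encoded as UTF-8 and then wrongly read back as Latin-1 (ISO-8859-1) three times in a row before being percent-encoded. That is only my guess at the mechanism, not something I know about Offbeat Guides' code; the function names `misdecode` and `repair` below are mine, and the count of three rounds is inferred from the URL alone. A minimal Python sketch:

```python
from urllib.parse import quote, unquote


def misdecode(text: str, rounds: int) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def repair(text: str, rounds: int) -> str:
    """Reverse `misdecode`: re-encode as Latin-1 and decode as UTF-8, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("latin-1").decode("utf-8")
    return text


name = "Fran\u00e7ois"  # "François"; escaped so the source file stays pure ASCII

# Assumption: three rounds of mis-decoding, inferred from the URL in the post.
mangled = misdecode(name, 3)

# quote() percent-encodes the UTF-8 bytes of its argument.
url_part = quote(mangled)
print(url_part)  # Fran%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%A7ois

# Going the other way recovers the original name.
print(repair(unquote(url_part), 3) == name)  # True
```

Each round turns every non-ASCII character into two, which is why a single ç balloons into sixteen percent-encoded bytes.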
Spooky. Is it really that hard for American coders to learn about something other than US ASCII? Is Unicode an extraterrestrial character encoding that's unknown to you guys, or out of your reach?
Apparently, exporting IT jobs to Mumbai didn't help a bit ;-p.
For the record, Joel Spolsky wrote The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) in 2003 with this pearl:
So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
P.S. Wow, I got a response from David Sifry exactly 3 minutes after I emailed them to point this out. That was fast! It's a bug, not a design flaw: he tells me everything is stored as UTF-8 internally.