Beware of the ç (bis repetita)

More than five years ago I wrote this, in "Beware of the ç":

Don't you think that, in 2003, it's an absolute shame that a character as common as ç can still cause such trouble for most information technologies out there?
[...] I can't tell you how many times a day I struggle with registration systems, cookies, etc. that cannot properly swallow my first name. François is my second name :(.
I think I've found the worst example of the US ASCII monoculture with Offbeat Guides. Here's what it does with my first name, François:
  • in an internal URL: /name/Fran%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%A7ois/
  • in the resulting page: François
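
The escaped URL above can be unwound mechanically. Here is a minimal Python sketch, assuming (as the byte pattern suggests) that each layer of damage came from UTF-8 bytes being misread as ISO-8859-1 and then re-encoded as UTF-8. The helper name `unwind_mojibake` and the `max_rounds` cap are mine, purely for illustration; nothing here is Offbeat Guides' actual code.

```python
from urllib.parse import unquote


def unwind_mojibake(text, max_rounds=10):
    """Undo repeated 'UTF-8 bytes misread as Latin-1' damage.

    Returns the repaired text and the number of layers removed.
    Hypothetical helper: it assumes every layer used ISO-8859-1,
    not Windows-1252 or another single-byte encoding.
    """
    rounds = 0
    for _ in range(max_rounds):
        try:
            # Reverse one layer: get back the bytes that were misread,
            # then decode them as the UTF-8 they really were.
            candidate = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text is no longer valid mojibake, so stop
        if candidate == text:
            break  # pure ASCII: nothing left to repair
        text = candidate
        rounds += 1
    return text, rounds


# The path segment from the internal URL quoted above.
segment = "Fran%C3%83%C2%83%C3%82%C2%83%C3%83%C2%82%C3%82%C2%A7ois"

# unquote() decodes the percent-escapes as UTF-8 by default.
name, rounds = unwind_mojibake(unquote(segment))
print(name, rounds)
```

Under that assumption the ç went through three rounds of misreading before the final URL escaping, which is how a two-byte character ends up as sixteen escaped bytes.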

Spooky. Is it really that hard for American coders to learn about something other than US-ASCII? Is Unicode an extra-terrestrial character encoding that's unknown to you guys, or out of your reach?

Apparently, exporting IT jobs to Mumbai didn't help a bit ;-p.

For the record, Joel Spolsky wrote "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" in 2003, with this pearl:

So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will.

P.S. Wow, I got a response from David Sifry exactly 3 minutes after I emailed them to point this out. That was fast! It's a bug, not a design flaw: he tells me everything is stored as UTF-8 internally.
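
For the curious, here is one way a bug like this can arise even when the data is stored as UTF-8. This is only a sketch under my own assumption that some layer reads the UTF-8 bytes as ISO-8859-1 and writes them back as UTF-8; I don't know what the actual cause was. The function name `misread_as_latin1` is made up for the example.

```python
from urllib.parse import quote


def misread_as_latin1(text, rounds):
    """Simulate a layer that stores UTF-8 but reads it back as Latin-1.

    Hypothetical illustration: each round encodes the text as UTF-8,
    then wrongly decodes those bytes as ISO-8859-1.
    """
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


# Three faulty round trips turn the single ç into eight characters.
garbled = misread_as_latin1("François", 3)

# quote() percent-encodes the result as UTF-8, like a URL builder would.
url_segment = quote(garbled)
print(url_segment)
```

Three faulty round trips followed by ordinary URL escaping reproduce the path segment I was served, byte for byte. Had the misreading gone through Windows-1252 instead, the escaped bytes would have been different.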