UTF-8 Encoding Tips

Version 9.1 by Ramsey Gurley on 2009/11/17 19:39

Encoding questions come up frequently on the mailing list. This is a collection of tips for using UTF-8, a checklist of sorts. Make sure you've done everything listed here before pitching your computer into the ocean :)

Check your database

The database needs to be storing values in UTF-8. If it isn't, then all your effort is wasted. For example, on MySQL that means a database URL that tells the JDBC driver to use UTF-8.
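As a rough sketch of what such a URL can look like, the snippet below builds one using the useUnicode and characterEncoding connection properties of MySQL Connector/J. The host, port, database name and the Utf8JdbcUrl helper are made up for illustration. In a WebObjects app the resulting string would normally live in the model's connection dictionary rather than be built in code.

```java
// Sketch only: host, port and database name are hypothetical placeholders.
class Utf8JdbcUrl {

    // Builds a MySQL JDBC URL that asks Connector/J to talk UTF-8.
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Prints the URL so it can be pasted into a connection dictionary.
        System.out.println(utf8Url("localhost", 3306, "mydb"));
    }
}
```

Note the hyphen in UTF-8 here: the driver property takes the Java-style charset name.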

Also set the default character set and collation in your my.cnf file:

options.png

Fonts & CSS

A font may not contain glyphs for every character you need to display. If you're using a default stylesheet, browsers may render the page differently simply because they pick different fonts. Speaking of stylesheets, you probably want to encode those in UTF-8 too. Start your stylesheet with a charset declaration, something like @charset "UTF-8"; on the first line.

Set Eclipse encoding

In the Eclipse preferences, set the text file encoding to UTF-8 (under General > Workspace).

preferences.png

Use Project Wonder

I think this goes without saying, but: use Wonder. Set the encoding in the Properties file. Note that the value is UTF-8 with a hyphen. It is always UTF-8 with a hyphen... well, except in the MySQL image above, because they excel in doing things differently :)

Set encoding in your page wrapper

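Declare the encoding in the head of your page wrapper, for example with a meta tag along the lines of <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.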
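To see why the declared encoding matters, here is a small sketch of what happens when UTF-8 bytes are read as ISO-8859-1, which is roughly what a browser does if the page doesn't say otherwise. The EncodingMismatchDemo class and its method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the same bytes decoded with the wrong and the right charset.
class EncodingMismatchDemo {

    // Encodes text as UTF-8, then decodes the bytes as ISO-8859-1 (the mismatch).
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with UTF-8 on both sides; the text survives intact.
    static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        // The accented letter is two bytes in UTF-8, so it is misread as two characters.
        System.out.println(misread(original).length());          // 5
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

If your pages show pairs of odd characters where a single accented letter should be, this kind of mismatch is the usual suspect.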

Localizable strings should be in UTF-16!

Localizable.strings should be encoded in UTF-16. The localizer can detect UTF-16 without error, whereas it can confuse UTF-8 with other encodings. Pascal says to use UTF-16LE if you want to be explicit about things, especially if you are editing your strings files in an external editor like BBEdit or whatnot. I use the Eclipse editor and UTF-16 myself, and all seems to work fine for me. So to each his own.
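I haven't checked how the localizer does its detection internally, but one common reason UTF-16 files are easy to recognize is the byte order mark (BOM) at the start of the file. The sketch below is only an illustration of that idea, not the localizer's code; BomSniffer and encodingFromBom are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Sketch: guess a file's encoding from its leading byte order mark, if any.
class BomSniffer {

    // Returns the encoding named by a leading BOM, or null when there is none.
    static String encodingFromBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        String entry = "\"greeting\" = \"hello\";";
        // Java's "UTF-16" charset writes a big-endian BOM when encoding.
        System.out.println(encodingFromBom(entry.getBytes(StandardCharsets.UTF_16)));   // UTF-16BE
        // Java's UTF_8 and UTF_16LE charsets write no BOM, so nothing marks the encoding.
        System.out.println(encodingFromBom(entry.getBytes(StandardCharsets.UTF_8)));    // null
        System.out.println(encodingFromBom(entry.getBytes(StandardCharsets.UTF_16LE))); // null
    }
}
```

Without a BOM, a reader has to guess from the bytes themselves, and plain UTF-8 text can look a lot like other 8-bit encodings.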

Your tips go here.

It's a wiki, ya know :)