Wiki source code of UTF-8 Encoding Tips

Version 9.1 by Ramsey Gurley on 2009/11/17 19:39

= UTF-8 Encoding Tips =

Encoding questions come up frequently on the mailing list. This page collects tips for using UTF-8 as a checklist of sorts. Make sure you've done everything listed here before pitching your computer into the ocean :)

== Check your database ==

The database needs to store values in UTF-8. If it doesn't, all your other effort is wasted. For example, on MySQL that means a database URL like

{{noformat}}

jdbc:mysql://localhost/Example?capitalizeTypenames=true&zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=UTF-8

{{/noformat}}

Also set the default charset and collation in your my.cnf file:

[[image:options.png]]

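To see why a mismatched database charset wastes everything else, here is a minimal Java sketch of what happens when UTF-8 bytes get read back as ISO-8859-1. The class and method names are made up for illustration, and it talks to no database at all; it only mimics the byte-level mix-up.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes with the wrong charset. */
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with UTF-8 on both sides. */
    static String roundTripUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent, written as an escape so the
        // encoding of this source file does not matter.
        String original = "caf\u00e9";

        // Same charset on both sides: the text survives.
        System.out.println(roundTripUtf8(original).equals(original));

        // Wrong charset on the way out: the single accented character
        // comes back as two characters, so the length grows from 4 to 5.
        System.out.println(decodeUtf8BytesAsLatin1(original).length());
    }
}
```

If you see pairs of odd characters where a single accented one should be, a charset mismatch somewhere between the app and the database is a likely suspect.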
== Fonts & CSS ==

Not every font has glyphs for every character. If you're using a default stylesheet, browsers may render the same page differently simply because they fall back to different fonts. Speaking of stylesheets, you probably want to encode those in UTF-8 as well. Start your stylesheet with something like

{{noformat}}

@charset "UTF-8";
@import url("reset.css");

/* Begin site CSS */

{{/noformat}}

== Set Eclipse encoding ==

[[image:preferences.png]]

== Use Project Wonder ==

I think this goes without saying, but: **Use Wonder**. Set the encoding in the properties file. Notice it is UTF-8 with a hyphen. It is always UTF-8 with a hyphen... well, except in the MySQL image above, because they excel at doing things differently :)

{{noformat}}

# Project Encoding
er.extensions.ERXApplication.DefaultEncoding=UTF-8

{{/noformat}}

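If you want to double-check the hyphen, here is a small Java sketch that parses properties text and confirms the encoding value is spelled exactly UTF-8. The class and method names are hypothetical, and it uses only plain java.util.Properties; it does not call into Wonder or claim to read properties the way Wonder does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EncodingPropertyCheck {
    // Key copied from the properties snippet above.
    static final String KEY = "er.extensions.ERXApplication.DefaultEncoding";

    /** Returns true only if the property is set to exactly "UTF-8". */
    static boolean isUtf8WithHyphen(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        // StandardCharsets.UTF_8.name() is the canonical name "UTF-8".
        return value != null && value.trim().equals(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) {
        // Spelled with the hyphen: passes the check.
        System.out.println(isUtf8WithHyphen("# Project Encoding\n" + KEY + "=UTF-8\n"));
        // MySQL-style spelling without the hyphen: fails the check.
        System.out.println(isUtf8WithHyphen(KEY + "=utf8\n"));
    }
}
```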
== Set encoding in your page wrapper ==

{{noformat}}

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg-flat.dtd">

{{/noformat}}

== Localizable strings should be in UTF-16! ==

Localizable.strings files should be encoded in UTF-16. The localizer can detect UTF-16 reliably, whereas it can confuse UTF-8 with other encodings. Pascal says to use UTF-16LE if you want to be explicit about things, especially if you edit your strings files in an external editor like BBEdit. I use the Eclipse editor with UTF-16 myself, and everything seems to work fine. To each his own.

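One plausible reason UTF-16 is easier to detect is the byte order mark (BOM) that often starts such files. The Java sketch below sniffs for a BOM in a byte array. The class and method names are hypothetical, and this is only an illustration of the idea, not a description of how the localizer actually decides.

```java
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** Returns a charset name based on the byte order mark, or null if there is none. */
    static String detectBom(byte[] bytes) {
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFF && (bytes[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        String entry = "\"ok\" = \"OK\";";

        // Java's plain UTF-16 encoder writes a big-endian BOM first.
        System.out.println(detectBom(entry.getBytes(StandardCharsets.UTF_16)));

        // Java's UTF-8 encoder writes no BOM, so there is nothing to sniff.
        System.out.println(detectBom(entry.getBytes(StandardCharsets.UTF_8)));

        // Java's UTF-16LE encoder also writes no BOM.
        System.out.println(detectBom(entry.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Note that a file written from Java with the explicit UTF-16LE charset carries no BOM, so whether your editor adds one is worth checking if you go the explicit route.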
== Your tips go here. ==

It's a wiki, ya know :)