6.12. Localisation and Unicode
Karrigell includes a program to facilitate the localisation of scripts.
6.12.1 Translation
In a script, whenever you want a message to be translated into a given language, write it as the argument of a function called _ instead of as a plain quoted string, like this:
print _("Hello everybody")
In Python Inside HTML (PIH) you can use the shortcut <%_ %>:
<%_ "Hello everybody" %>
The administration menu provides a simple web interface for creating and modifying the translations of strings.
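The following sketch illustrates the idea behind a function such as _. It is not Karrigell's implementation: the TRANSLATIONS table and the make_translator helper are hypothetical names, and a plain dictionary stands in for whatever storage the translation program really uses. It is written for Python 3.

```python
# Hypothetical translation table: {language: {original message: translation}}
TRANSLATIONS = {
    "fr": {"Hello everybody": "Bonjour tout le monde"},
}

def make_translator(language):
    """Return a function that translates messages into the given language."""
    table = TRANSLATIONS.get(language, {})

    def translate(message):
        # Fall back to the original message when no translation is known
        return table.get(message, message)

    return translate

_ = make_translator("fr")
print(_("Hello everybody"))   # translated message
print(_("Unknown message"))   # no translation: returned unchanged
```

Falling back to the original string means that a script stays usable even when some messages have not been translated yet.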
6.12.2 Unicode support
Unicode is a standard used to represent all the writing systems in the world. For each character (a letter in any alphabet, an ideogram in an Asian language) Unicode defines a unique number, called a "code point". Since computers and networks can only handle bytes, a mapping between code points and sequences of one or more bytes must be defined; these mappings are called "encodings".
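The difference between a code point and its encodings can be seen with a few lines of plain Python. This is a minimal sketch written for Python 3, where str is already a Unicode string (in Python 2, used by the examples in this section, the equivalent type is unicode). The helper name describe is only illustrative.

```python
def describe(char):
    """Return the code point of char and its bytes in two encodings."""
    return {
        "code_point": "U+%04X" % ord(char),
        "utf-8": char.encode("utf-8"),
        "utf-16-be": char.encode("utf-16-be"),
    }

# "A", "e" with acute accent, Greek pi, a Chinese ideogram
for char in ["A", "\u00e9", "\u03c0", "\u4e2d"]:
    print(describe(char))
```

Each character has exactly one code point, but the number of bytes and their values change from one encoding to another.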
Because there are many different encodings, a program that has to display a character (a Greek letter, a math symbol, a Chinese character) must receive two pieces of information: the bytestring representing the character and the encoding used. If it receives only the bytestring, the program can try to guess the encoding (this is what a web browser usually does), but with no guarantee of success.
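To see what a wrong guess looks like, the sketch below decodes the same bytes with two different encodings. It is plain Python 3 and does not use any Karrigell function; the variable names are only illustrative.

```python
# Bytes produced when "e with acute accent" (U+00E9) is encoded in utf-8
received = "\u00e9".encode("utf-8")

# Decoding with the encoding actually used gives the character back
correct = received.decode("utf-8")

# Decoding with a wrong guess succeeds silently but yields two wrong characters
guessed = received.decode("iso-8859-1")

print(ascii(correct))   # one character
print(ascii(guessed))   # two characters: the typical garbled rendering
```

No error is raised in the second case, which is why a wrong guess shows up as garbled text on the page rather than as a failure.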
The best thing to do when you write a script is to define the encoding explicitly: for this, you can use the built-in function SET_UNICODE_OUT(encoding), where encoding is a string like 'iso-8859-1' or 'utf-8'.
If it is not set, the encoding for the document will be the one defined in the host configuration file by output_encoding. The default value is None, meaning that no encoding is defined. It is much safer to define one, usually 'iso-8859-1' for languages using the Latin alphabet and 'utf-8' for other writing systems. If none is defined, you rely on the browser to guess the encoding used, which can lead to unexpected rendering.
6.12.3 Example
from HTMLTags import *

def index():
    SET_UNICODE_OUT("utf-8")
    print FORM(INPUT(name="foo")+INPUT(Type="submit",value="Ok"), action="bar")

def bar(foo):
    foo = unicode(foo,"utf-8").encode("iso-8859-1")
    SET_UNICODE_OUT("iso-8859-1")
    print foo
In index(), we set the encoding to utf-8; the browser will send the value entered by the user encoded with this encoding.
The function bar receives the value foo as a bytestring: the utf-8 encoding of the Unicode string entered by the user. We want to print it using another encoding, set by the line SET_UNICODE_OUT("iso-8859-1"), so we must first decode the bytestring into a Unicode string and then encode that string in iso-8859-1. Both steps are done in the first line of bar().
We can then print foo; it will be rendered as expected.
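The conversion done in the first line of bar() can be tried outside Karrigell. The sketch below is written for Python 3, where raw.decode(source) plays the role of unicode(raw, source) in Python 2. The function name transcode and its errors parameter are illustrative, not part of Karrigell.

```python
def transcode(raw, source="utf-8", target="iso-8859-1", errors="strict"):
    """Decode raw bytes from the source encoding, then encode them in target."""
    text = raw.decode(source)           # bytes -> Unicode string
    return text.encode(target, errors)  # Unicode string -> bytes

# A word with an accented letter, as a browser would send it in utf-8
print(transcode(b"caf\xc3\xa9"))

# The Greek letter pi has no equivalent in iso-8859-1
pi_bytes = "\u03c0".encode("utf-8")
try:
    transcode(pi_bytes)
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# With errors="replace", the character is replaced by a question mark
print(transcode(pi_bytes, errors="replace"))
```

This shows a limit of the example above: if the user enters a character that does not exist in iso-8859-1, the encoding step fails unless an error policy such as "replace" is chosen.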