unicode
Fun with unicode, aka the limbo of coding
How do I use unicode with zope, page templates and the zmi?
(You can also jump to the final conclusion at the bottom.)
Ok, there are two issues - displaying unicode, and getting unicode in
from forms. First, I need to remember that unicode strings are abstract
objects that can be encoded with a certain mapping, e.g. utf-8, which
represents them as bytes.
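A minimal sketch of that distinction. It is written in Python 3 syntax, which is an assumption: the Zope discussed here ran on Python 2, where the text type is spelled u'...'. The sample word is made up.

```python
# Text is an abstract sequence of characters; an encoding maps it to bytes.
text = "Grüße"                          # five characters

utf8_bytes = text.encode("utf-8")       # one byte representation of the text
latin1_bytes = text.encode("latin-1")   # a different one, for the same text

# The character count and the two byte counts are not the same thing.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching encoding gives back the same abstract text.
print(utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1") == text)
```

The bytes differ between the two encodings, but both decode back to the same text as long as the right encoding is named.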
To display unicode, all we need to do is tell the browser (and zope in
the same go) that we want and have utf-8 encoding:
http://wiki.zope.org/zope2/HowToInternationaliseWithPTS#encoding
e.g. setHeader('Content-Type','text/html;; charset=utf-8'), as listed in
the results further down.
Now, the page will be encoded in utf-8, and zope knows that this is its
job, so it tries its best.
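Why the declared charset matters can be sketched outside of zope. This is only an illustration in plain Python 3 (an assumption, not zope code): the same utf-8 bytes are read once with the declared charset and once with a wrong latin-1 guess. The page content is made up.

```python
from email.message import Message

# The header value as the browser receives it (a single ';' on the wire).
header = Message()
header["Content-Type"] = "text/html; charset=utf-8"
charset = header.get_content_charset()

body = "Zürich".encode("utf-8")     # the bytes sent as the page body

declared = body.decode(charset)     # browser follows the declared charset
guessed = body.decode("latin-1")    # browser falls back to a wrong guess

print(declared)
print(guessed)
```

The second print shows two garbage characters where the ü should be, which is the classic symptom of a missing or wrong charset declaration.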
Ok, the way back needs two participants: the browser encoding the form
input in the proper way, and zope knowing what's coming:
The browser knows what encoding to use because we have set the encoding
to utf-8 in the first place - logic being, that if the whole page is
utf-8, so is the form, so is the content of the form when submitted.
Now, the browser keeps that encoding as its little secret, and does not
tell the server it's posting the form to.
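e.g. these request headers, where Accept-Charset only lists what the
browser will accept back, not how the form data was encoded: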
Host: localhost:8094
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20070321 Firefox/2.0.0.3 (Swiftfox)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.7,de-de;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
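What the "little secret" looks like on the wire can be sketched with the standard library. This is Python 3 and not zope code; the field name text and the sample value are assumptions for illustration.

```python
from urllib.parse import urlencode, unquote_to_bytes

form = {"text": "Grüße"}

# The same form value, percent-encoded by a browser using two different charsets.
as_utf8 = urlencode(form, encoding="utf-8")
as_latin1 = urlencode(form, encoding="latin-1")
print(as_utf8)
print(as_latin1)

# The server only sees bytes. Nothing in the body says which charset was used,
# and decoding utf-8 bytes as latin-1 raises no error, it just yields garbage.
raw = unquote_to_bytes(as_utf8.split("=", 1)[1])
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")
print(wrong, right)
```

Both request bodies are valid, which is why the server has to be told the encoding by some other route.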
So, how does the server (aka zope) know about it? Well, here it's getting
clumsy. We need to tell it, for each and every itsy tiny form field, that
unicode is coming, and how it is encoded.
e.g. a field named name="text:ustring:utf8", as in the results further down
(utf8 and utf-8 both seem to work)
Now, zope knows that it's unicode, and properly utf-8 encoded, and we get
unicode(!) objects, like u'foo', in our scripts. Great!
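The idea behind such a marked field name can be sketched as below. convert_field is a hypothetical helper written for this note in Python 3; it is not zope's actual implementation and only mimics the behaviour described above.

```python
def convert_field(field_name, raw_value):
    """Split e.g. 'text:ustring:utf8' into name and hints, then decode.

    Hypothetical illustration only, not how zope does it internally.
    """
    name, *hints = field_name.split(":")
    wants_text = False
    encoding = None
    for hint in hints:
        if hint == "ustring":
            wants_text = True
        elif hint in ("utf8", "utf-8"):
            encoding = "utf-8"
    if wants_text and encoding:
        # Marked field: the script receives decoded text.
        return name, raw_value.decode(encoding)
    # Unmarked field: the raw bytes are passed through untouched.
    return name, raw_value


raw = b"Gr\xc3\xbc\xc3\x9fe"
print(convert_field("text:ustring:utf8", raw))
print(convert_field("text", raw))
```

The marked field arrives as text, the unmarked one as the utf-8 bytes the browser sent.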
But...
How to store them?
If storing it in a normal string property, it seems to work as well....
...kind of. Try to change the content in the ZMI, and you get an error.
Point being - the ZMI needs to know that the property contains unicode
as well.
In the zmi we have the choice of string/ustring for the properties. Not
hard to guess that ustring is the right one. If we store our beloved
unicode string object in a nice and cosy ustring property, it sleeps
really nice in its little place in the ZODB, and the ZMI is cool and
happy, because it knows about it.
So, the results:
* Set the content-type for output to utf-8:
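setHeader('Content-Type','text/html;; charset=utf-8')
(if this sits inside a page template's tal:define, the doubled ;; is the
TAL escape for a literal semicolon; in a plain python script use a single ;)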
* Mark all the form fields as unicode and utf8: name="text:ustring:utf8"
* Store them in proper unicode fields: ustring, ulines.
The alternative would be, it seems, to not mark the fields, get utf-8
instead of unicode into the scripts, and then either decode or store the
utf-8 strings... but this surely leads to one hell of a mess, I would
say - and how would you change data in the zmi?
(...lot of testing in the meantime....)
Ok, there is one problem - you can't turn the title of objects into a
ustring. So what now? Surprisingly simple, it seems:
* Set the content-type for output to utf-8:
setHeader('Content-Type','text/html;; charset=utf-8')
* Don't mark the fields specially
* Set a property called management_page_charset to utf-8 on the app's
top folder or the root folder
The result: we send out utf-8, get utf-8 back, and store it as utf-8. No
encoding done at all. The zmi knows about it, and displays all strings
as utf-8. We now only need to make sure that our indexes in the
catalogs know about it (do we actually?)
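Why the index question is worth asking can be sketched in plain Python 3. This says nothing about what the zope catalog actually does; it only shows how utf-8 byte strings behave differently from text in the kind of operations an index might perform. The sample word is made up.

```python
word = "über"
stored = word.encode("utf-8")     # what the "store it as utf-8" approach keeps

# Length is counted in bytes, not characters.
print(len(word), len(stored))

# Case conversion on bytes only touches ASCII letters.
upper_bytes = stored.upper().decode("utf-8")
upper_text = word.upper()
print(upper_bytes, upper_text)
```

Anything that normalises case or counts characters on the raw bytes will treat non-ASCII letters differently from the ASCII ones around them.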
Sources:
* http://wiki.zope.org/zope2/Internationalization
* http://www.zope.org/Members/htrd/howto/unicode
* http://article.gmane.org/gmane.comp.web.zope.plone.internationalization/1076
* tav, of course
