Finally, UTF-8 locale (and about Compose)

Fri 23 September 2005

I have finally bitten the bullet and switched my locale to cs_CZ.UTF-8. While I was still writing this blog in gvim (see the end of my relationship with vim, and here), I began to write it in UTF-8, and it was such a relief. Suddenly, I didn't have to use ugly kludges like `` or --. Of course, the problem is that so many supplementary characters suddenly become available that no keyboard layout can handle all of them (I think), so some other solution has to be found. Vim has digraphs, which are really quite useful, but as with everything else in vim, there is no connection to the outside world. Switching to Kate/KWrite was a very pleasant experience, but obviously they have no native digraphs. My first reaction was to use HTML entities and translate them to pure UTF-8 with a special Python script of mine. However, I felt very strongly that this was not the way.
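The entity-translation idea mentioned above is easy to illustrate. The original script is not shown in this post, so the following is only a minimal sketch in present-day Python 3, built on the standard library's `html.unescape`; the function name `entities_to_utf8` is made up for the example. It leaves the markup-significant characters escaped so the result is still valid HTML.

```python
import html
import re

# Matches named (&ndash;), decimal (&#8211;) and hexadecimal (&#x2013;) references.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that must stay escaped for the text to remain valid HTML.
MARKUP_CHARS = {"&", "<", ">", '"'}


def entities_to_utf8(text: str) -> str:
    """Replace HTML character references with the characters they stand for."""

    def replace(match: re.Match) -> str:
        entity = match.group(0)
        char = html.unescape(entity)
        # Unknown names come back unchanged; markup characters stay escaped.
        return entity if char in MARKUP_CHARS else char

    return ENTITY.sub(replace, text)
```

Calling `entities_to_utf8("&bdquo;quoted&ldquo; &ndash; and &amp; more")` turns the quotation marks and the dash into real characters and leaves `&amp;` alone.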

I asked on cz.comp.linux how people insert these non-keyboardish characters, and the answer was “Use the Compose key”. I began to search Google for how to make it work, and finally I found that the best source of information about Compose key combinations (aside from the article on Wikipedia) is right on my computer. The only problem was that with an ISO 8859-2 based locale only a very small part of the combinations actually worked. This was the last straw that broke my resistance to switching the whole computer to UTF-8. The problem is (as always) Midnight Commander, whose Debian version doesn’t work with UTF-8 at all (panel frames are especially affected). So, again, Googling and Googling until I found this thread on some discussion board, which contains a link to a patched version of MC (it also requires a non-standard version of slang), which somehow works in my console. However, MC is not critical for me anymore, now that Krusader is finally stable enough and featureful enough to compete with MC.
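Assuming the on-disk source of Compose combinations is an X11 Compose definition file (the exact path is not given here and varies between systems), its rules are plain text and easy to search. The sketch below parses lines in that format into a lookup table; `parse_compose` is a hypothetical helper, and it deliberately ignores comments, includes and rules that do not start with `Multi_key`.

```python
import re

# One rule looks like:
#   <Multi_key> <minus> <minus> <period> : "–" U2013 # EN DASH
RULE = re.compile(r'^\s*((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)+)"')


def parse_compose(lines):
    """Map key-name tuples (without the leading Multi_key) to the text they produce."""
    table = {}
    for line in lines:
        match = RULE.match(line)
        if not match:
            continue  # comment, include directive, or blank line
        keys = re.findall(r"<(\w+)>", match.group(1))
        if keys and keys[0] == "Multi_key":
            # Backslash escapes inside the quoted result are kept as written.
            table[tuple(keys[1:])] = match.group(2)
    return table
```

Given an open Compose file, `parse_compose(handle)` returns a dictionary, so a lookup such as `table[("minus", "minus", "period")]` shows what a particular sequence produces.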

One more problem: when I switched to UTF-8, many filenames with accented characters were suddenly broken. I thought that Linux filesystems already stored all metadata in UTF-8. Oh well, they probably don’t. So I had to run the output of locate through cstocs and then use diff to find out what had changed.
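The same check can be sketched in Python instead of the locate, cstocs and diff pipeline described above. This is an illustration under assumptions, not the procedure actually used: it treats any name that is not valid UTF-8 as ISO 8859-2, and the names `fix_name` and `plan_renames` are invented for the example. It only reports what would be renamed and changes nothing on disk.

```python
import os


def fix_name(raw: bytes, legacy: str = "iso8859_2") -> bytes:
    """Return the name re-encoded as UTF-8 if it is not valid UTF-8 already."""
    try:
        raw.decode("utf-8")
        return raw  # plain ASCII or already UTF-8: leave untouched
    except UnicodeDecodeError:
        # Heuristic: assume the bytes are in the legacy encoding.
        return raw.decode(legacy).encode("utf-8")


def plan_renames(root: bytes):
    """Yield (old_path, new_path) pairs for names that need converting."""
    # Passing bytes makes os.walk return raw byte names. Walking bottom-up
    # lists children before their parent directory, so paths stay valid if
    # the renames are later applied in this order.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = fix_name(name)
            if fixed != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, fixed)
```

Looping over `plan_renames(b".")` and printing each pair gives a dry-run listing. A Latin-2 name can occasionally happen to be valid UTF-8, so the output deserves a look before anything is passed to `os.rename`.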

Looking at this whole issue from at least some distance, it seems to me that the Compose key combines the best of all the options: it works as well as vim’s digraphs, but it is X11-wide, which is cool (and yes, of course, it is much better than M$-Windows’s Alt+<number>).

Category: computer Tagged: utf8 linux