It is obvious that a text editor needs to accept text typed on the keyboard; otherwise the text editor is entirely useless. Similarly, an internationalized text editor needs to accept the characters used in various languages. The same holds for other software: shells, libraries such as readline, environments such as consoles and X terminal emulators, scripting languages such as perl, tcl/tk, python, and ruby, and applications such as word processors, drawing and painting programs, file managers such as Midnight Commander, web browsers, mailers, and so on. All of these also need the ability to input internationalized text. Otherwise such software is entirely useless.
There are many languages in the world, and the proper input method varies from language to language.
Different technologies are used for different languages. The aim of this chapter is to introduce these technologies.
Ideally, supplying an input method is the responsibility of the console and X terminal emulators. This is already the case for simple languages which don't need complicated input methods. Thus, for those languages, non-X software doesn't need to care about input methods.
There are a few Debian packages for consoles and X terminal emulators which supply input methods for particular languages.
In addition, there is some software which supplies input methods for an existing console environment.
However, since input methods for complex languages have historically not been available, some non-X software has been developed with its own input methods.
You have to take care of the differences between the number of characters, columns, and bytes. For example, you can see immediately that bash cannot handle UTF-8 input properly when you invoke bash in a UTF-8 xterm and press the BackSpace key. This is because readline always erases one column on the screen and one byte in the internal buffer for each BackSpace keystroke. To solve this problem, wide characters should be used for internal processing. One BackSpace keystroke should erase wcwidth() columns on the screen and one wchar_t unit in the internal buffer.
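The BackSpace handling described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from readline or bash: the helper name erase_last_char is hypothetical, the edit buffer is assumed to be a NUL-terminated array of wchar_t, and the caller is assumed to have selected a suitable locale with setlocale() beforehand, since wcwidth() depends on LC_CTYPE.

```c
#define _XOPEN_SOURCE 700   /* makes wcwidth() visible in <wchar.h> */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of readline).
 * Removes the last wide character from buf, which holds *len characters
 * followed by a terminating L'\0', and returns the number of screen
 * columns the caller should erase for it.
 */
int erase_last_char(wchar_t *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);

    if (*len == 0)
        return 0;                   /* empty buffer: nothing to erase */

    wchar_t last = buf[*len - 1];
    int cols = wcwidth(last);       /* result depends on the current locale */
    if (cols < 0)
        cols = 0;                   /* non-printable: occupies no column */

    *len -= 1;                      /* exactly one wchar_t unit is removed */
    buf[*len] = L'\0';
    return cols;
}
```

A caller would then erase the returned number of columns on the screen, for example by writing the sequence backspace, space, backspace once per column. In a locale that knows about East Asian characters, wcwidth() typically reports 2 for a CJK ideograph, so a single keystroke removes one wchar_t but two columns.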
X11R5 was the first internationalized version of the X Window System. However, X11R5 supplied two sample implementations of international text input, Xsi and Ximp. The existence of two different protocols was an annoying situation. X11R6 settled this by adopting XIM, a new protocol for internationalized text input, as the standard. Internationalized X software should support text input using XIM.
These protocols are designed on a server-client model. The client calls the server when necessary. The server converts key strokes into internationalized text.
Kinput and kinput2 are protocols for Japanese text input which existed before X11R5. Some software, such as kterm, supports the kinput2 protocol. The server software is kinput2. Since the current version of kinput2 supports the XIM protocol, you don't need to support the kinput protocol.
***** Not written yet *****
Developing an XIM client is a bit complicated. You can read the source code of rxvt and xedit to study it.
"Programming for Japanese characters input" is a good introduction to XIM programming.
The following are examples of software which can work as XIM clients.
X terminal emulators: krxvt, kterm, and so on.
Text editors: xedit, gedit, and so on.
Web browsers: mozilla.
The following are examples of software which can work as XIM servers.
kinput and skkinput for Japanese.
Here I will explain how to use XIM input on a Debian system. This will help developers and package maintainers who want to test the XIM support of their software. Debian Woody or later is assumed.
First, the locale database has to be prepared. Uncomment the ja_JP.EUC-JP EUC-JP, ko_KR.EUC-KR EUC-KR, zh_CN.GB2312, and zh_TW BIG5 lines in /etc/locale.gen and invoke /usr/sbin/locale-gen. This will prepare the locale database under /usr/lib/locale/. On systems other than Debian Woody or later, follow the appropriate procedure for that system to prepare the locale database.
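One way to confirm from a program that a locale has actually been prepared is to ask the C library to switch to it. The following is a minimal sketch; locale_is_available is a hypothetical helper name, not an existing API. It assumes glibc behaviour, where setlocale() returns NULL for a locale that has not been generated. Other C libraries may accept names they have no data for, so a positive result is a necessary check rather than a guarantee.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the C library accepts the named
 * locale for LC_CTYPE, 0 otherwise. The previous LC_CTYPE setting is
 * restored before returning.
 */
int locale_is_available(const char *name)
{
    assert(name != NULL);

    /* Query the current setting and copy it, because the returned
       string may be overwritten by the next setlocale() call. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = cur ? malloc(strlen(cur) + 1) : NULL;
    if (saved)
        strcpy(saved, cur);

    int ok = setlocale(LC_CTYPE, name) != NULL;

    if (saved) {
        setlocale(LC_CTYPE, saved);   /* put the old locale back */
        free(saved);
    }
    return ok;
}
```

For example, locale_is_available("ja_JP.EUC-JP") could be called after running locale-gen; if it returns 0 on a glibc system, the corresponding line in /etc/locale.gen is probably still commented out.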
Basic Chinese, Japanese, and Korean X fonts are included in the xfonts-base package for Debian Woody and later.
An XIM server must be installed. For Japanese, the kinput2 and skkinput packages are available. kinput2 supports the Japanese input engines Canna and FreeWnn, and skkinput supports SKK. For Korean, ami is available. For traditional Chinese and simplified Chinese, xcin is available.
Of course you also need XIM client software. xedit in the xbase-clients package is an example of an XIM client.
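Then log in as a non-root user. The environment variables LC_ALL (or LANG) and XMODIFIERS must be set to match the language and the XIM server in use. XMODIFIERS names the server in the form @im=name, for example @im=kinput2 when kinput2 is used.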
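When testing, it can be handy to check which XIM server a given XMODIFIERS value selects. The sketch below is an illustration only: im_name_from_modifiers is a hypothetical helper, and it assumes the modifier string uses the @category=value form accepted by Xlib's XSetLocaleModifiers(), with the input method given in the "im" category.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the input method name that follows "@im="
 * in an XMODIFIERS-style string into out (at most outsz - 1 bytes).
 * Returns 1 on success; returns 0 if mods is NULL, has no "@im=" entry,
 * has an empty name, or the name does not fit into out.
 */
int im_name_from_modifiers(const char *mods, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);

    out[0] = '\0';
    if (mods == NULL)
        return 0;

    const char *p = strstr(mods, "@im=");
    if (p == NULL)
        return 0;
    p += 4;                          /* skip the "@im=" prefix */

    /* The name runs up to the next '@' (start of another modifier)
       or to the end of the string. */
    size_t n = strcspn(p, "@");
    if (n == 0 || n >= outsz)
        return 0;

    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

A test program could pass getenv("XMODIFIERS") from <stdlib.h> as the first argument and print the resulting name before starting the XIM client.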
Then invoke the XIM server. Just run it in the background (with &). kinput2 and ami don't open a new window, while xcin opens a new window and outputs some messages.
Then invoke the XIM client. Focus on an input area of the software. Hit Shift-Space or Control-Space and type something. Did some strange characters appear? This document is too brief to explain how to input valid CJK characters and sentences with these XIM servers. Please consult the documentation of the XIM servers.
GNU Emacs and XEmacs take an entirely different approach to international input.
They supply input methods for various languages themselves. Instead of relying on the console or XIM, they use their own input methods. An input method can be selected with the M-x set-input-method command. The selected input method can be switched on and off with the M-x toggle-input-method command.
GNU Emacs supplies input methods for British, Catalan, Chinese (array30, 4corner, b5-quick, cns-quick, cns-tsangchi, ctlau, ctlaub, ecdict, etzy, punct, punct-b5, py, py-b5, py-punct, py-punct-b5, qj, qj-b5, sw, tonepy, ziranma, zozy), Czech, Danish, Devanagari, Esperanto, Ethiopic, Finnish, French, German, Greek, Hebrew, Icelandic, IPA, Irish, Italian, Japanese (egg-wnn, skk), Korean (hangul, hangul3, hanja, hanja3), Lao, Norwegian, Portuguese, Romanian, Scandinavian, Slovak, Spanish, Swedish, Thai, Tibetan, Turkish, Vietnamese, Latin-{1,2,3,4,5}, Cyrillic (beylorussian, jcuken, jis-russian, macedonian, serbian, translit, translit-bulgarian, ukrainian, yawerty), and so on.
Introduction to i18n
14 February 2003
kubota@debian.org