Introduction to i18n
Chapter 11 - Libraries and Components


We sometimes use libraries and components that are not widely used. Such libraries and components may need special attention with respect to internationalization.

On the other hand, some libraries and components can be used to improve internationalization. This chapter introduces such libraries and components.


11.1 Gettext and Translation

GNU gettext is a tool for internationalizing the messages a program outputs; the messages are selected according to the LC_MESSAGES locale category. A gettextized program comes with messages in various languages (depending on the available translators), and the user chooses among them with environment variables. GNU gettext is part of the Debian system.

Install the gettext package and read its info pages for details.

Don't use non-ASCII characters in 'msgid'. Be careful, because it is tempting to use ISO-8859-1 characters. For example, '©' (the copyright sign; you may not even be able to read it correctly in this document) is a non-ASCII character (0xa9 in ISO-8859-1). If 'msgid' contains non-ASCII characters, translators may have difficulty editing catalog files because the encoding of 'msgid' conflicts with the encoding of 'msgstr'.
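
Such a rule can be checked mechanically. The following is only a minimal sketch and is not part of gettext; the function name is_ascii_msgid is hypothetical.

```c
/* Hypothetical helper (not part of gettext): returns 1 if every byte of
   the string is 7-bit ASCII, 0 otherwise. */
int is_ascii_msgid(const char *msgid)
{
        const unsigned char *p = (const unsigned char *)msgid;

        for (; *p != '\0'; p++) {
                if (*p > 0x7f)
                        return 0;   /* e.g. 0xa9, the copyright sign in ISO-8859-1 */
        }
        return 1;
}
```

With this sketch, is_ascii_msgid("(C) 2003") returns 1, while is_ascii_msgid("\xa9 2003") returns 0.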

Be sure the message can be displayed in the assumed environment. In other words, you have to read the chapter 'Output to Display' in this document and internationalize the output mechanism of your software before gettextizing it. ENGLISH MESSAGES ARE PREFERABLE, EVEN FOR NON-ENGLISH-SPEAKING PEOPLE, TO MEANINGLESS BROKEN MESSAGES.

The second (third, ...) byte of a multibyte character, or any byte of a non-ASCII character in a stateful encoding, can be 0x5c (the same as backslash in ASCII) or 0x22 (the same as double quote in ASCII). Such bytes have to be properly escaped, because the present version of GNU gettext ignores the 'charset' subitem of the 'Content-Type' item when reading 'msgstr'.
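
To illustrate the problem, the sketch below scans a string for such bytes. It assumes Shift_JIS as the example encoding (this document does not name one here), and the function names are hypothetical. Note that in Shift_JIS only 0x5c can actually occur as a second byte; 0x22 is checked only to mirror the text above.

```c
#include <stddef.h>

/* Returns 1 if byte c is a Shift_JIS lead byte (first byte of a
   double-byte character), 0 otherwise. */
static int sjis_is_lead(unsigned char c)
{
        return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Hypothetical helper: counts second bytes in a Shift_JIS string that
   collide with ASCII backslash (0x5c) or double quote (0x22). */
size_t sjis_count_colliding_trail_bytes(const char *s)
{
        const unsigned char *p = (const unsigned char *)s;
        size_t count = 0;

        while (*p != '\0') {
                if (sjis_is_lead(*p) && p[1] != '\0') {
                        if (p[1] == 0x5c || p[1] == 0x22)
                                count++;
                        p += 2;         /* skip the whole double-byte character */
                } else {
                        p++;            /* single-byte character */
                }
        }
        return count;
}
```

For example, the Shift_JIS byte sequence 0x95 0x5c is a single kanji whose second byte looks like a backslash, so the function returns 1 for "\x95\x5c". A real ASCII backslash, as in "a\\b", is not counted.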

A gettexted message must not be used in multiple contexts, because a word may have different meanings in different contexts. For example, in English a verb at the beginning of a sentence expresses an order or a command. Other languages, however, have different grammar. If a single gettexted verb is used both in an ordinary sentence and in an imperative sentence, it cannot be translated correctly.

If a sentence is gettexted, never divide it. If a sentence is divided in the original source code, join the parts so that a single string contains the full sentence. This is because word order differs among languages. For example, the routine

     printf("There ");
     switch(num_of_files) {
     case 0:
             printf("are no files ");
             break;
     case 1:
             printf("is 1 file ");
             break;
     default:
             printf("are %d files ", num_of_files);
             break;
     }
     printf("in %s directory.\n", dir_name);

has to be rewritten as follows:

     switch(num_of_files) {
     case 0:
             printf("There are no files in %s directory.\n", dir_name);
             break;
     case 1:
             printf("There is 1 file in %s directory.\n", dir_name);
             break;
     default:
             printf("There are %d files in %s directory.\n", num_of_files, dir_name);
             break;
     }

before it is gettextized.
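
After gettextization, the same routine might look like the following sketch. It uses the _() macro described in section 11.1.1; the function name format_file_count and the use of snprintf() into a caller-supplied buffer are assumptions made for illustration.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext((String))

/* Hypothetical helper: formats the file-count message into buf.
   Each case passes one complete sentence to gettext. */
int format_file_count(char *buf, size_t size, int num_of_files,
                      const char *dir_name)
{
        switch (num_of_files) {
        case 0:
                return snprintf(buf, size,
                                _("There are no files in %s directory.\n"),
                                dir_name);
        case 1:
                return snprintf(buf, size,
                                _("There is 1 file in %s directory.\n"),
                                dir_name);
        default:
                return snprintf(buf, size,
                                _("There are %d files in %s directory.\n"),
                                num_of_files, dir_name);
        }
}
```

When no catalog is available, gettext() returns the original English string, so the routine behaves exactly like the ungettextized version.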

Software with gettexted messages should not depend on the length of the messages. A message may become longer (or shorter) when translated into another language.
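
One way to avoid depending on message length is to measure the formatted message before allocating a buffer for it, instead of using a fixed-size array. The following is a minimal sketch assuming a C99 snprintf(); the function name format_message is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats fmt with one string argument into a
   freshly allocated buffer of exactly the needed size. The caller
   frees the result. Returns NULL on error. */
char *format_message(const char *fmt, const char *arg)
{
        int len = snprintf(NULL, 0, fmt, arg);  /* C99: measure only */
        char *buf;

        if (len < 0)
                return NULL;
        buf = malloc((size_t)len + 1);
        if (buf == NULL)
                return NULL;
        snprintf(buf, (size_t)len + 1, fmt, arg);
        return buf;
}
```

Here fmt stands in for the string returned by gettext(); whatever its length, the buffer is large enough.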

When two or more '%' directives for formatted output functions such as printf() appear in a message, the translation may need to change their order. In such a case, the translator can specify the order. See the section 'Special Comments preceding Keywords' in the gettext info pages for details.
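
On systems whose printf() family supports POSIX numbered arguments (for example the GNU C library), the order can be specified with directives such as "%1$d" and "%2$s". The sketch below assumes such a system; the function name is hypothetical, and the reordered English string only stands in for a translated message.

```c
#include <stdio.h>

/* Hypothetical helper: formats a count and a directory name using fmt,
   which stands in for the (possibly translated) string that gettext
   would return. The arguments are always passed in the same order. */
int format_count_and_dir(char *buf, size_t size, const char *fmt,
                         int num_of_files, const char *dir_name)
{
        return snprintf(buf, size, fmt, num_of_files, dir_name);
}
```

With fmt set to "There are %d files in %s directory." the arguments appear in source order. With "Directory %2$s contains %1$d files." the same call prints the directory name first, without any change to the program.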

There are now projects that translate the messages of various programs, for example the Translation Project.


11.1.1 Gettext-ization of A Software

First, the program must contain the following lines.

     int main(int argc, char **argv)
     {
             ...
             setlocale (LC_ALL, "");   /* This is not for gettext but 
                                          all i18n software should have
                                          this line. */
             bindtextdomain (PACKAGE, LOCALEDIR);
             textdomain (PACKAGE);
             ...
     }

where PACKAGE is the name of the catalog file (the message domain) and LOCALEDIR is "/usr/share/locale" on Debian. PACKAGE and LOCALEDIR should be defined in a header file or in the Makefile.

It is convenient to prepare the following header file.

     #include <libintl.h>
     #define _(String) gettext((String))

Messages in source files should then be written as _("message") instead of "message".
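
Putting these pieces together gives something like the following sketch. The domain name "i18n-sketch" and the function names init_i18n and greeting are hypothetical; in a real package, PACKAGE and LOCALEDIR would come from a header file or the Makefile.

```c
#include <libintl.h>
#include <locale.h>

/* Assumed values for illustration; normally defined by the build system. */
#ifndef PACKAGE
#define PACKAGE "i18n-sketch"            /* hypothetical message domain */
#endif
#ifndef LOCALEDIR
#define LOCALEDIR "/usr/share/locale"
#endif

#define _(String) gettext((String))

/* Hypothetical helper: performs the initialization described above. */
void init_i18n(void)
{
        setlocale(LC_ALL, "");
        bindtextdomain(PACKAGE, LOCALEDIR);
        textdomain(PACKAGE);
}

/* Returns the translation of the message, or the original English text
   when no catalog for PACKAGE is installed for the current locale. */
const char *greeting(void)
{
        return _("Hello, world!");
}
```

A program would call init_i18n() once at the start of main() and then use _() wherever a message is output, for example printf("%s\n", greeting()).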

Next, catalog files have to be prepared.

First, a template for the catalog file is generated with xgettext. By default, the template file is named messages.po. [27]


11.1.2 Translation

Though gettextization of a program is a one-time task, translation is ongoing work, because new (or modified) messages have to be translated when (or before) each new version of the program is released.


11.2 Readline Library

***** Not written yet *****

The Readline library needs to be internationalized.


11.3 Ncurses Library

***** Not written yet *****

Ncurses is a free implementation of the curses library. Though this library is now maintained by the Free Software Foundation, it is not covered by the GNU General Public License.

The Ncurses library needs to be internationalized.


14 February 2003
Tomohiro KUBOTA kubota@debian.org