GNU gettext utilities

11. The Programmer's View

One aim of the current message catalog implementation provided by GNU gettext was to use the system's message catalog handling, if the installer wishes to do so. So we perhaps should first take a look at the solutions we know about. The people in the POSIX committee did not manage to agree on one of the semi-official standards which we'll describe below. In fact they couldn't agree on anything, so they decided only to include an example of an interface. The major Unix vendors are split in the usage of the two most important specifications: X/Open's catgets vs. Uniforum's gettext interface. We'll describe them both and later explain our solution to this dilemma.

11.1 About `catgets`

The catgets implementation is defined in the X/Open Portability Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the process of creating this standard seemed to be too slow for some of the Unix vendors, so they based their implementations on preliminary versions of the standard. Of course this again leads to problems when writing platform independent programs: even the usage of catgets does not guarantee a uniform interface.

Another, personal comment on this: only a bunch of committee members could have made this interface. They never really tried to program using this interface. It is a fast, memory-saving implementation, and a user can happily live with it. But programmers hate it (at least I and some others do…)

But we must not forget one point: after all the trouble with transferring the rights to Unix, they at last came to X/Open, the very same who published this specification. This leads me to predict that this interface will be in future Unix standards (e.g. Spec1170) and therefore part of all Unix implementations (that is, implementations which are allowed to bear this name).

11.1.1 The Interface

The interface to the catgets implementation consists of three functions which correspond to those used in file access: catopen to open the catalog for use, catgets for accessing the message tables, and catclose for closing after work is done. Prototypes for the functions and the needed definitions are in the <nl_types.h> header file.

catopen is used like this:

nl_catd catd = catopen ("catalog_name", 0);

The function takes the name of the catalog as its first argument. This usually refers to the name of the program or the package. The second parameter is not further specified in the standard. I don't even know whether it is implemented consistently among various systems. So the common advice is to use 0 as the value. The return value is a handle to the message catalog, equivalent to the file handles returned by open.

This handle is of course used in the catgets function, which can be used like this:

char *translation = catgets (catd, set_no, msg_id, "original string");

The first parameter is this catalog descriptor. The second parameter specifies the set of messages in this catalog from which the message identified by msg_id is obtained. catgets therefore uses a three-stage addressing:

catalog name ⇒ set number ⇒ message ID ⇒ translation

The fourth argument is not used to address the translation. It is given as a default value in case one of the addressing stages fails. One important thing to remember is that although the return type of catgets is char * the resulting string must not be changed. It would better be const char *, but the standard was published in 1988, one year before ANSI C.

The last of these functions is used and behaves as expected:

catclose (catd);

After this no catgets call using the descriptor is legal anymore.
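
The three calls described above can be combined as in the following minimal sketch. The helper name lookup_message and its buffer-based signature are assumptions made for illustration only; they are not part of the X/Open interface. The sketch copies the result before closing the catalog, since the string returned by catgets belongs to the catalog.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message msg_id of set set_no from the named
   catalog into buf, using fallback when the lookup cannot be done.  */
void
lookup_message (const char *catalog_name, int set_no, int msg_id,
                const char *fallback, char *buf, size_t buflen)
{
  nl_catd catd = catopen (catalog_name, 0);

  if (catd == (nl_catd) -1)
    {
      /* No catalog could be opened: use the default string.  */
      snprintf (buf, buflen, "%s", fallback);
      return;
    }

  /* The returned string must not be modified and is only usable while
     the catalog is open, so copy it before calling catclose.  */
  snprintf (buf, buflen, "%s", catgets (catd, set_no, msg_id, fallback));
  catclose (catd);
}
```

A caller would pass its own buffer, for example lookup_message ("catalog_name", 1, 1, "original string", buf, sizeof buf), and print buf afterwards.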

11.1.2 Problems with the `catgets` Interface?!

This description seems really easy, so where are the problems we speak of? In fact the interface could be used in a reasonable way, but constructing the message catalogs is a pain. The reason for this lies in the third argument of catgets: the unique message ID. This has to be a numeric value for all messages in a single set. Perhaps you can imagine the problems of keeping such a list while changing the source code. Add a new message here, remove one there. Of course a lot of tools have been developed to help organize this chaos, but each of them fails in one aspect or another. We don't want to say that the other approach has no problems, but they are far easier to manage.

11.2 About `gettext`

The definition of the gettext interface comes from a Uniforum proposal. It was submitted there by Sun, who had implemented the gettext function in SunOS 4, around 1990. Nowadays, the gettext interface is specified by the OpenI18N standard.

The main point about this solution is that it does not follow the method of normal file handling (open-use-close) and that it does not burden the programmer with so many tasks, especially the unique key handling. Of course a unique key is needed here as well, but this key is the message itself (however long or short it is). See Comparing the Two Interfaces for a more detailed comparison of the two methods.

The following section contains a rather detailed description of the interface. We make it that detailed because this is the interface we chose for the GNU gettext Library. Programmers interested in using this library will be interested in this description.

11.2.1 The Interface

The minimal functionality an interface must have is a) to select a domain the strings are coming from (a single domain for all programs is not reasonable because its construction and maintenance are difficult, perhaps impossible) and b) to access a string in a selected domain.

This is essentially the description of the gettext interface. It has a global domain which unqualified usages reference. Of course this domain is selectable by the user.

char *textdomain (const char *domain_name);

This provides the possibility to change or query the current status of the current global domain of the LC_MESSAGES category. The argument is a null-terminated string, whose characters must be legal for use in filenames. If the domain_name argument is NULL, the function returns the current value. If no value has been set before, the name of the default domain is returned: messages. Please note that although the return value of textdomain is of type char * no changing is allowed. It is also important to know that no checks of the availability are made. If the name is not available you will see this by the fact that no translations are provided.

To use a domain set by textdomain the function

char *gettext (const char *msgid);

is to be used. This is the simplest reasonable form one can imagine. The translation of the string msgid is returned if it is available in the current domain. If it is not available, the argument itself is returned. If the argument is NULL the result is undefined.

One thing to keep in mind is that no explicit dependency on the domain in use is given. The current value of the domain is used. If this changes between two executions of the same gettext call in the program, the two calls reference different message catalogs.

For the easiest case, which is normally used in internationalized packages, once at the beginning of execution a call to textdomain is issued, setting the domain to a unique name, normally the package name. In the rest of the program, all strings which have to be translated are filtered through the gettext function. That's all, the package speaks your language.
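
The easiest case can be sketched as follows. The domain name hello-demo and the function names init_messages and greet are assumptions chosen for illustration; a real package would use its own package name. The call to setlocale is included because the locale has to be set from the environment before any translation can be selected.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* "hello-demo" is a hypothetical package name used as the domain.  */
#define DEMO_DOMAIN "hello-demo"

/* Call once at the beginning of execution.  Returns the name of the
   domain now in effect (the string must not be modified).  */
const char *
init_messages (void)
{
  /* Adopt the user's locale so that LC_MESSAGES is set.  */
  setlocale (LC_ALL, "");
  return textdomain (DEMO_DOMAIN);
}

/* Every user-visible string is filtered through gettext.  If no
   translation is available, the argument itself is printed.  */
void
greet (void)
{
  puts (gettext ("Hello, world!"));
}
```

A program would call init_messages () first and greet () afterwards; textdomain (NULL) can be used at any time to query the domain in effect.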

11.2.2 Solving Ambiguities

While this single name domain works well for most applications there might be the need to get translations from more than one domain. Of course one could switch between different domains with calls to textdomain, but this is neither convenient nor fast. One possible situation, under discussion at the time of this writing: all error messages of functions in the set of commonly used functions should go into a separate domain error. By this means we would only need to translate them once. Another case is messages from a library, as these have to be independent of the current domain set by the application.

For these reasons there are two more functions to retrieve strings:

char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);

Both take an additional argument in the first position, which corresponds to the argument of textdomain. The third argument of dcgettext allows using a locale category other than LC_MESSAGES. But I really don't know where this can be useful. If the domain_name is NULL or category has a value other than the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family, the one found in Solaris.
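
The library case mentioned above can be sketched like this. The domain name libdemo and the function libdemo_strerror are hypothetical; the point is only that the library names its own domain on every lookup and never calls textdomain, so the application's global domain is left alone.

```c
#include <libintl.h>

/* "libdemo" is a hypothetical domain name for a library's messages.  */
#define LIBDEMO_DOMAIN "libdemo"

/* Return a message for an error code, looked up in the library's own
   domain regardless of the domain selected by the application.  */
const char *
libdemo_strerror (int code)
{
  switch (code)
    {
    case 0:
      return dgettext (LIBDEMO_DOMAIN, "no error");
    case 1:
      return dgettext (LIBDEMO_DOMAIN, "file not found");
    default:
      return dgettext (LIBDEMO_DOMAIN, "unknown error");
    }
}
```

Each msgid is written as a literal inside the dgettext call so that string extraction tools can find it.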

A second ambiguity can arise from the fact that perhaps more than one domain has the same name. This can be solved by specifying where the needed message catalog files can be found.

char *bindtextdomain (const char *domain_name,
                      const char *dir_name);

Calling this function binds the given domain to a file in the specified directory (how this file is determined follows below). In particular, a file in the system's default place is no longer favored over the specified file (as it would be by solely using textdomain). A NULL pointer for the dir_name parameter returns the binding associated with domain_name. If domain_name itself is NULL nothing happens and a NULL pointer is returned. Here again, as for all the other functions, none of the return values may be changed!

It is important to remember that relative path names for the dir_name parameter can be trouble. Since the path is always computed relative to the current directory, different results will be achieved when the program executes a chdir command. Relative paths should always be avoided, since they make the result depend on the current directory and are therefore unreliable.
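
One way to follow this advice is a small wrapper that refuses relative directory names. This is only a sketch: the wrapper name bind_domain_absolute is an assumption, and the check for a leading slash covers POSIX-style paths only.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical wrapper: bind domain_name to dir_name only if dir_name
   is an absolute path.  Returns the directory now bound to the domain,
   or NULL if the path was rejected or the binding failed.  */
const char *
bind_domain_absolute (const char *domain_name, const char *dir_name)
{
  if (dir_name == NULL || dir_name[0] != '/')
    return NULL;                /* refuse relative paths */
  return bindtextdomain (domain_name, dir_name);
}
```

The current binding can be queried later with bindtextdomain (domain_name, NULL).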

wchar_t *wbindtextdomain (const char *domain_name,
                          const wchar_t *dir_name);

This function is provided only on native Windows platforms. It is like bindtextdomain, except that the dir_name parameter is a wide string (in UTF-16 encoding, as usual on Windows).

11.2.3 Locating Message Catalog Files

Because many different languages for many different packages have to be stored, we need some way to add this information to the message catalog files. The way usually used in Unix environments is to have this encoding in the file name. This is also done here. The directory name given in bindtextdomain's second argument (or the default directory), followed by the name of the locale, the locale category, and the domain name are concatenated:

dir_name/locale/LC_category/domain_name.mo

The default value for dir_name is system specific. For the GNU library, and for packages adhering to its conventions, it's:

/usr/local/share/locale

locale is the name of the locale category which is designated by LC_category. For gettext and dgettext this LC_category is always LC_MESSAGES.(3) The name of the locale category is determined through setlocale (LC_category, NULL).(4) When using the function dcgettext, you can specify the locale category through the third argument.
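
The naming scheme above amounts to a simple string concatenation, sketched below. The function catalog_path is hypothetical and only illustrates the file name layout; it is not how the library performs the lookup internally, and the library may also try further variants of the locale name.

```c
#include <stdio.h>

/* Hypothetical helper: format dir_name/locale/LC_category/domain_name.mo
   into buf.  Returns 0 on success, -1 if buf is too small.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category_name,
              const char *domain_name)
{
  int n = snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                    dir_name, locale, category_name, domain_name);
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

For example, the arguments "/usr/local/share/locale", "de", "LC_MESSAGES", and "hello" produce /usr/local/share/locale/de/LC_MESSAGES/hello.mo.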

11.2.4 How to specify the output character set `gettext` uses

gettext not only looks up a translation in a message catalog; it also converts the translation on the fly to the desired output character set. This is useful if the user is working in a different character set than the translator who created the message catalog, because it avoids distributing variants of message catalogs which differ only in the character set.

The output character set is, by default, the value of nl_langinfo (CODESET), which depends on the LC_CTYPE part of the current locale. But programs which store strings in a locale independent way (e.g. UTF-8) can request that gettext and related functions return the translations in that encoding, by use of the bind_textdomain_codeset function.

Note that the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged, independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings.

Function: char * bind_textdomain_codeset (const char *domainname, const char *codeset)

The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer.

If the codeset parameter is the null pointer, bind_textdomain_codeset returns the currently selected codeset for the domain with the name domainname. It returns NULL if no codeset has yet been selected.

The bind_textdomain_codeset function can be used several times. If used multiple times with the same domainname argument, the later call overrides the settings made by the earlier one.

The bind_textdomain_codeset function returns a pointer to a string containing the name of the selected codeset. The string is allocated internally in the function and must not be changed by the user. If the system ran out of memory during the execution of bind_textdomain_codeset, the return value is NULL and the global variable errno is set accordingly.
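
A program that keeps its strings in UTF-8 internally might request that encoding as in this sketch. The wrapper name request_utf8 is an assumption; only bind_textdomain_codeset itself is part of the interface.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical wrapper: ask that translations for domain_name be
   returned in UTF-8.  Returns 0 on success, -1 if the call failed
   (for instance because memory was exhausted).  */
int
request_utf8 (const char *domain_name)
{
  return bind_textdomain_codeset (domain_name, "UTF-8") != NULL ? 0 : -1;
}
```

Afterwards, bind_textdomain_codeset (domain_name, NULL) returns the codeset selected for that domain.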

11.2.5 Using contexts for solving ambiguities

One place where the gettext functions, if used normally, have big problems is within programs with graphical user interfaces (GUIs). The problem is that many of the strings which have to be translated are very short. They have to appear in pull-down menus, which restricts the length. But strings which do not contain entire sentences or at least large fragments of a sentence may appear in more than one situation in the program but might have different translations. This is especially true for the one-word strings which are frequently used in GUI programs.

As a consequence many people say that the gettext approach is wrong and instead catgets should be used, which indeed does not have this problem. But there is a very simple and powerful method to handle this kind of problem with the gettext functions.

Contexts can be added to strings to be translated. A context dependent translation lookup is a lookup of the translation for a given string that is limited to a given context. The translation for the same string in a different context can be different. The different translations of the same string in different contexts can be stored in the same MO file, and can be edited by the translator in the same PO file.

The ‘gettext.h’ include file contains the lookup macros for strings with contexts. They are implemented as thin macros and inline functions over the functions from <libintl.h>.

const char *pgettext (const char *msgctxt, const char *msgid);

In a call of this macro, msgctxt and msgid must be string literals. The macro returns the translation of msgid, restricted to the context given by msgctxt.

The msgctxt string is visible in the PO file to the translator. You should try to make it somehow canonical and never changing, because every time you change a msgctxt, the translator will have to review the translation of msgid.

Finding a canonical msgctxt string that doesn't change over time can be hard. But you shouldn't use the file name or class name containing the pgettext call, because it is a common development task to rename a file or a class, and it shouldn't cause translator work. Also you shouldn't use a comment in the form of a complete English sentence as msgctxt, because orthography or grammar changes are often applied to such sentences, and again, it shouldn't force the translator to do a review.

The ‘p’ in ‘pgettext’ stands for “particular”: pgettext fetches a particular translation of the msgid.

const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);

These are generalizations of pgettext. They behave similarly to dgettext and dcgettext, respectively. The domain_name argument defines the translation domain. The category argument allows using a locale category other than LC_MESSAGES.

As an example consider the following fictional situation. A GUI program has a menu bar with the following entries:

+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
  | Open     | | Select   |
  | New      | | Open     |
  +----------+ | Connect  |
               +----------+

To have the strings File, Printer, Open, New, Select, and Connect translated there has to be a call to a function of the gettext family at some point in the code. But in two places the string passed into the function would be Open. The translations might not be the same and therefore we are in the dilemma described above.

What distinguishes the two places is the menu path from the menu root to the particular menu entries:

Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect

The context is thus the menu path without its last part. So, the calls look like this:

pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")

Whether or not to use the ‘|’ character at the end of the context is a matter of style.
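
To give an idea of how such a lookup can be layered over gettext, here is a simplified sketch. It is not the code from ‘gettext.h’; the names DEMO_PGETTEXT, DEMO_CONTEXT_GLUE, and demo_pgettext_aux are hypothetical. The sketch assumes that the context and the msgid are joined into one lookup key with the EOT character (\004) as separator, and that gettext returns its argument pointer unchanged when no translation is found.

```c
#include <libintl.h>

/* Assumed separator between context and msgid in the lookup key.  */
#define DEMO_CONTEXT_GLUE "\004"

/* Look up key, which is context + separator + msgid.  If there is no
   translation, return the bare msgid instead of the combined key.  */
const char *
demo_pgettext_aux (const char *key, const char *msgid)
{
  const char *translation = gettext (key);
  return translation == key ? msgid : translation;
}

/* Both arguments must be string literals, because the key is built by
   compile-time string literal concatenation.  */
#define DEMO_PGETTEXT(msgctxt, msgid) \
  demo_pgettext_aux (msgctxt DEMO_CONTEXT_GLUE msgid, msgid)
```

With this sketch, DEMO_PGETTEXT ("Menu|File|", "Open") and DEMO_PGETTEXT ("Menu|Printer|", "Open") use two different keys, so a catalog can hold two different translations of Open, while an untranslated program still displays Open in both menus.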

For more complex cases, where the msgctxt or msgid are not string literals, more general macros are available:

const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);

Here msgctxt and msgid can be arbitrary string-valued expressions. These macros are more general. But in the case that both argument expressions are string literals, the macros without the ‘_expr’ suffix are more efficient.

11.2.6 Additional functions for plural forms

The functions of the gettext family described so far (and all the catgets functions as well) have one problem in the real world which has been neglected completely in all existing approaches. What is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about internationalization (and, sadly, even afterwards) one can often find code similar to the following:

   printf ("%d file%s deleted", n, n == 1 ? "" : "s");

After the first complaints from people internationalizing the code, people either completely avoided formulations like this or used strings like "file(s)". Both look unnatural and should be avoided. The first attempts to solve the problem correctly looked like this:

   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);

But this does not solve the problem. It helps languages where the plural form of a noun is not simply constructed by adding an ‘s’, but that is all. Once again people fell into the trap of believing the rules their language is using are universal. But the handling of plural forms differs widely between the language families. For example, Rafal Maszkowski <rzm@mat.uni.torun.pl> reports:

In Polish we use e.g. plik (file) this way:

1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w

and so on (o' means 8859-2 oacute which should be rather okreska, similar to aogonek).

There are two things which can differ between languages (and even inside language families):

1. The way plural forms are built differs. This is a problem with languages which have many irregularities. German, for instance, is a drastic case. Though English and German are part of the same language family (Germanic), the almost regular forming of plural noun forms (appending an ‘s’) is hardly found in German.

2. The number of plural forms differs. This is somewhat surprising for those who only have experience with Romanic and Germanic languages, since here the number is the same (there are two). But other language families have only one form or many forms. More information on this in an extra section.

The consequence of this is that application writers should not try to solve the problem in their code. This would be localization since it is only usable for certain, hardcoded language environments. Instead the extended gettext interface should be used.

These extra functions take two strings and a numerical argument instead of the one key string. The idea behind this is that, using the numerical argument and the first string as a key, the implementation can select the right plural form using rules specified by the translator. The two string arguments are then used to provide a return value in case no message catalog is found (similar to the normal gettext behavior). In this case the rules for Germanic languages are used and it is assumed that the first string argument is the singular form, the second the plural form.

This has the consequence that programs without language catalogs can display the correct strings only if the program itself is written using a Germanic language. This is a limitation, but since the GNU C library (as well as the GNU gettext package) is written as part of the GNU package and the coding standards for the GNU project require programs to be written in English, this solution nevertheless fulfills its purpose.

Function: char * ngettext (const char *msgid1, const char *msgid2, unsigned long int n)

The ngettext function is similar to the gettext function as it finds the message catalogs in the same way. But it takes two extra arguments. The msgid1 parameter must contain the singular form of the string to be converted. It is also used as the key for the search in the catalog. The msgid2 parameter is the plural form. The parameter n is used to determine the plural form. If no message catalog is found msgid1 is returned if n == 1, otherwise msgid2.

An example for the use of this function is:

printf (ngettext ("%d file removed", "%d files removed", n), n);

Please note that the numeric value n has to be passed to the printf function as well. It is not sufficient to pass it only to ngettext.
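
The same call can be wrapped in a function that formats into a buffer, as in this sketch. The function name format_removed and the use of unsigned long with %lu are choices made for this illustration. Without a message catalog the fallback rule described above applies: the first string is used for n == 1, the second otherwise.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write the "files removed" message for n into
   buf.  n is passed twice: once to ngettext to select the form, once
   to snprintf to fill in the number.  */
void
format_removed (char *buf, size_t buflen, unsigned long int n)
{
  snprintf (buf, buflen,
            ngettext ("%lu file removed", "%lu files removed", n), n);
}
```

Note that in the untranslated case zero also selects the second string, which happens to be right for English.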

In the English singular case, the number (always 1) can be replaced with "one":

printf (ngettext ("One file removed", "%d files removed", n), n);

This works because the ‘printf’ function discards excess arguments that are not consumed by the format string.

If this function is meant to yield a format string that takes two or more arguments, you cannot use it like this:

printf (ngettext ("%d file removed from directory %s",
                  "%d files removed from directory %s",
                  n),
        n, dir);

because in many languages the translators want to replace the ‘%d’ with an explicit word in the singular case, just like “one” in English, and C format strings cannot consume the second argument while skipping the first argument. Instead, you have to reorder the arguments so that ‘n’ comes last:

printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
                  n),
        dir, n);

See C Format Strings for details about this argument reordering syntax.
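
The reordered form can be tried out with the following sketch, which formats into a buffer. The function name format_removed_from is hypothetical, n is assumed to be non-negative, and the numbered argument syntax is assumed to be supported by the C library in use (it is a POSIX extension rather than part of ISO C).

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format the message with the directory as the
   first argument and the count as the second, so that a translation
   may leave out the number without disturbing the directory name.  */
void
format_removed_from (char *buf, size_t buflen, const char *dir, int n)
{
  snprintf (buf, buflen,
            ngettext ("%2$d file removed from directory %1$s",
                      "%2$d files removed from directory %1$s",
                      (unsigned long int) n),
            dir, n);
}
```

A translator can then write the singular form with only %1$s in it; the count, being the last argument, is simply ignored.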

When you know that the value of n is within a given range, you can specify it as a comment directed to the xgettext tool. This information may help translators to use more appropriate translations, like this:

if (days > 7 && days < 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);

It is also possible to use this function when the strings don't contain a cardinal number:

puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));

In this case the number n is only used to choose the plural form.

Function: char * dngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n)

The dngettext function is similar to the dgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.

Function: char * dcngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n, int category)

The dcngettext function is similar to the dcgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.

Now, how do these functions solve the problem of the plural forms? Without the input of linguists (which was not available) it was not possible to determine whether there are only a few different ways in which plural forms are formed or whether the number can increase with every new supported language.

Therefore the solution implemented is to allow the translator to specify the rules of how to select the plural form. Since the formula varies with every language this is the only viable solution except for hardcoding the information in the code (which would still have to be extensible so as not to prevent the use of new languages).

The information about the plural form selection has to be stored in the header entry of the PO file (the one with the empty msgid string). The plural form information looks like this:

Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

The nplurals value must be a decimal number which specifies how many different plural forms exist for this language. The string following plural is an expression which uses the C language syntax. Exceptions are that no negative numbers are allowed, numbers must be decimal, and the only variable allowed is n. Spaces are allowed in the expression, but backslash-newlines are not; in the examples below the backslash-newlines are present for formatting purposes only. This expression will be evaluated whenever one of the functions ngettext, dngettext, or dcngettext is called. The numeric value passed to these functions is then substituted for all uses of the variable n in the expression. The resulting value then must be greater than or equal to zero and smaller than the value given as the value of nplurals.
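As a rough illustration of the evaluation just described, the following sketch hard-codes the formula from the example header (nplurals=2; plural=n == 1 ? 0 : 1;) as a C function and uses the result to index an array of translated forms. The names NPLURALS, plural_index, and select_form are made up for this sketch; the library reads the expression from the catalog header rather than having it compiled in.

```c
#include <assert.h>

/* Assumed header entry: "Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;"  */
#define NPLURALS 2UL

/* The plural expression, written out as ordinary C.  */
static unsigned long
plural_index (unsigned long n)
{
  return n == 1 ? 0 : 1;
}

/* Return the entry of FORMS (an array of NPLURALS strings) that matches N.  */
static const char *
select_form (const char *const forms[], unsigned long n)
{
  unsigned long idx = plural_index (n);

  /* The result of the expression must be smaller than nplurals.  */
  assert (idx < NPLURALS);
  return forms[idx];
}
```

With forms set to { "%d Datei", "%d Dateien" }, select_form (forms, 1) yields the first string, and any other n, including 0, yields the second.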

The following rules are known at this point. The languages are listed with their families, but this does not necessarily mean the information can be generalized for the whole family (as can be easily seen in the table below).(5)

Only one form:

Some languages only require one single form. There is no distinction between the singular and plural form. An appropriate header entry would look like this:

Plural-Forms: nplurals=1; plural=0;

Languages with this property include:

Asian family: Japanese, Vietnamese, Korean
Tai-Kadai family: Thai

Two forms, singular used for one only

This is the form used in most existing programs since it is what English is using. A header entry would look like this:

Plural-Forms: nplurals=2; plural=n != 1;

(Note: this uses the feature of C expressions that boolean expressions have the value zero or one.)

Languages with this property include:

Germanic family: English, German, Dutch, Swedish, Danish, Norwegian, Faroese
Romanic family: Spanish, Portuguese, Italian
Latin/Greek family: Greek
Slavic family: Bulgarian
Finno-Ugric family: Finnish, Estonian
Semitic family: Hebrew
Austronesian family: Bahasa Indonesian
Artificial: Esperanto

Other languages using the same header entry are:

Finno-Ugric family: Hungarian
Turkic/Altaic family: Turkish

Hungarian does not appear to have a plural if you look at sentences involving cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is “123 alma”. But when the number is not explicit, the distinction between singular and plural exists: “the apple” is “az alma”, and “the apples” is “az almák”. Since ngettext has to support both types of sentences, it is classified here, under “two forms”.

The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples” is “123 elma”. But when the number is omitted, the distinction between singular and plural exists: “the apple” is “elma”, and “the apples” is “elmalar”.

Two forms, singular used for zero and one

Exceptional case in the language family. The header entry would be:

Plural-Forms: nplurals=2; plural=n>1;

Languages with this property include:

Romanic family: Brazilian Portuguese, French

Three forms, special case for zero

The header entry would be:

Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;

Languages with this property include:

Baltic family: Latvian

Three forms, special cases for one and two

The header entry would be:

Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;

Languages with this property include:

Celtic: Gaeilge (Irish)

Three forms, special case for numbers ending in 00 or [2-9][0-9]

The header entry would be:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;

Languages with this property include:

Romanic family: Romanian

Three forms, special case for numbers ending in 1[2-9]

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Baltic family: Lithuanian

Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Slavic family: Russian, Ukrainian, Belarusian, Serbian, Croatian

Three forms, special cases for 1 and 2, 3, 4

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;

Languages with this property include:

Slavic family: Czech, Slovak

Three forms, special case for one and some numbers ending in 2, 3, or 4

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Slavic family: Polish

Four forms, special case for one and all numbers ending in 02, 03, or 04

The header entry would look like this:

Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;

Languages with this property include:

Slavic family: Slovenian

Six forms, special cases for one, two, all numbers ending in 02, 03, … 10, all numbers ending in 11 … 99, and others

The header entry would look like this:

Plural-Forms: nplurals=6; \
    plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
    : n%100>=11 ? 4 : 5;

Languages with this property include:

Afroasiatic family: Arabic
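To make the table more concrete, here are a few of its formulas transcribed into plain C, with extra parentheses added for readability. The function names are invented for this sketch; GNU gettext evaluates the expression found in the PO file header rather than using compiled-in functions like these.

```c
#include <assert.h>

/* Latvian, nplurals=3.  */
static unsigned long
plural_lv (unsigned long n)
{
  return (n % 10 == 1 && n % 100 != 11) ? 0 : (n != 0 ? 1 : 2);
}

/* Russian, Ukrainian, Belarusian, Serbian, Croatian, nplurals=3.  */
static unsigned long
plural_ru (unsigned long n)
{
  return (n % 10 == 1 && n % 100 != 11) ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}

/* Polish, nplurals=3.  */
static unsigned long
plural_pl (unsigned long n)
{
  return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}

/* Arabic, nplurals=6.  */
static unsigned long
plural_ar (unsigned long n)
{
  return n == 0 ? 0
         : n == 1 ? 1
         : n == 2 ? 2
         : (n % 100 >= 3 && n % 100 <= 10) ? 3
         : n % 100 >= 11 ? 4
         : 5;
}

/* A few spot checks against the rules stated in the table.  */
static void
check_plural_formulas (void)
{
  assert (plural_ru (1) == 0 && plural_ru (21) == 0);
  assert (plural_ru (3) == 1 && plural_ru (22) == 1);
  assert (plural_ru (5) == 2 && plural_ru (11) == 2 && plural_ru (12) == 2);
  assert (plural_pl (21) == 2);
  assert (plural_ar (103) == 3 && plural_ar (100) == 5);
  assert (plural_lv (0) == 2);
}
```

Note that 21 selects form 0 with the Russian formula but form 2 with the Polish one, which is exactly the difference between those two table entries.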

You might now ask, ngettext handles only numbers n of type ‘unsigned long’. What about larger integer types? What about negative numbers? What about floating-point numbers?

About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: they can be handled by reducing the value to a range that fits in an ‘unsigned long’. Simply casting the value to ‘unsigned long’ would not do the right thing, since it would treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. Here you can exploit the fact that all mentioned plural form formulas eventually become periodic, with a period that is a divisor of 100 (or 1000 or 1000000). So, when you reduce a large value to another one in the range [1000000, 1999999] that ends in the same 6 decimal digits, you can assume that it will lead to the same plural form selection. This code does this:

#include <inttypes.h>   /* uintmax_t, PRIuMAX */
#include <limits.h>     /* ULONG_MAX */
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
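The same reduction can be wrapped in a small helper. This is only a sketch: reduce_for_plural and its limit parameter are hypothetical names, and the limit is passed explicitly (instead of using ULONG_MAX directly) so that the effect can be observed even on platforms where ‘unsigned long’ is as wide as ‘uintmax_t’.

```c
#include <assert.h>
#include <stdint.h>

/* Map N to a value not exceeding LIMIT that ends in the same six decimal
   digits, so that the periodic plural formulas select the same form.
   LIMIT is assumed to be at least 1999999.  */
static uintmax_t
reduce_for_plural (uintmax_t n, uintmax_t limit)
{
  assert (limit >= 1999999);
  if (n > limit)
    return n % 1000000 + 1000000;   /* lands in [1000000, 1999999] */
  return n;
}
```

A caller would pass (unsigned long) reduce_for_plural (nbytes, ULONG_MAX) as the last argument of ngettext. With a 32-bit limit of 4294967295, the value 4294967297 is reduced to 1967297 instead of wrapping around to 1.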

Negative and floating-point values usually represent physical entities for which singular and plural don't clearly apply. In such cases, there is no need to use ngettext; a simple gettext call with a form suitable for all values will do. For example:

printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);

Even if num_milliseconds happens to be a multiple of 1000, the output

Time elapsed: 1.000 seconds

is acceptable in English, and similarly for other languages.

The translators' perspective regarding plural forms is explained in Translating plural forms.

11.2.7 Optimization of the *gettext functions

At this point of the discussion we should talk about an advantage of the GNU gettext implementation. Some readers might have pointed out that an internationalized program might perform poorly if some string has to be translated in an inner loop. While this is unavoidable when the string varies from one run of the loop to the other, it is simply a waste of time when the string is always the same. Take the following example:

{
  while (…)
    {
      puts (gettext ("Hello world"));
    }
}

When the locale selection does not change between two runs the resulting string is always the same. One way to exploit this is:

{
  str = gettext ("Hello world");
  while (…)
    {
      puts (str);
    }
}

But this solution is not usable in all situations (e.g. when the locale selection changes) nor does it lead to legible code.

For this reason, GNU gettext caches previous translation results. When the same translation is requested twice, with no new message catalogs being loaded in between, gettext will, the second time, find the result through a single cache lookup.
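The effect of such a cache can be pictured with the following deliberately tiny sketch. It is not the data structure GNU gettext actually uses: all names (catalog_generation, slow_lookup, cached_lookup) are hypothetical, only a single entry is remembered, and strings are compared by address rather than by content.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter, bumped whenever a message catalog is loaded.  */
static unsigned long catalog_generation;

/* Counts how often the slow path ran; for demonstration only.  */
static unsigned long slow_lookups;

/* Stand-in for the expensive catalog search; returns MSGID unchanged.  */
static const char *
slow_lookup (const char *msgid)
{
  ++slow_lookups;
  return msgid;
}

/* One-entry cache: remembers the last MSGID together with the catalog
   generation in which it was resolved.  */
static const char *
cached_lookup (const char *msgid)
{
  static const char *last_msgid;
  static const char *last_result;
  static unsigned long last_generation;

  assert (msgid != NULL);
  if (msgid == last_msgid && last_generation == catalog_generation)
    return last_result;             /* cache hit */

  last_result = slow_lookup (msgid);
  last_msgid = msgid;
  last_generation = catalog_generation;
  return last_result;
}
```

Calling cached_lookup twice with the same string runs slow_lookup once; incrementing catalog_generation in between forces a fresh search, mirroring the condition "with no new message catalogs being loaded in between".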


11.3 Comparing the Two Interfaces

The following discussion is perhaps a little bit colored. As said above we implemented GNU gettext following the Uniforum proposal and this surely has its reasons. But it should show how we came to this decision.

First we take a look at the developing process. When we write an application using NLS provided by gettext we proceed as always. Only when we come to a string which might be seen by the users and thus has to be translated we use gettext("…") instead of "…". At the beginning of each source file (or in a central header file) we define

#define gettext(String) (String)

Even this definition can be avoided when the system supports the gettext function in its C library. When we compile this code the result is the same as if no NLS code is used. When you take a look at the GNU gettext code you will see that we use _("…") instead of gettext("…"). This reduces the number of additional characters per translatable string to 3 (in words: three).

When a production version of the program is needed, we simply replace the definition

#define _(String) (String)

by

#include <libintl.h>
#define _(String) gettext (String)

Additionally we run the program ‘xgettext’ on all source code files which contain translatable strings, and that's it: we have a running program which does not depend on translations being available, but which can use any that become available.

The same procedure can be done for the gettext_noop invocations (see section Special Cases of Translatable Strings). One usually defines gettext_noop as a no-op macro. So you should consider the following code for your project:

#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

N_ is a short form similar to _. The ‘Makefile’ in the ‘po/’ directory of GNU gettext knows both of the mentioned short forms by default, so you are invited to follow this proposal for your own ease.
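Putting the pieces together, a project might keep these definitions in one place, roughly as sketched here. ENABLE_NLS is the macro conventionally defined by the gettext Autoconf macros; using it as the on/off switch, as well as the weekday_names table and the weekday_name function, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined ENABLE_NLS && ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* N_ only marks the strings for xgettext; a static initializer could not
   call gettext anyway.  */
static const char *const weekday_names[] =
{
  N_("Monday"),
  N_("Tuesday"),
  N_("Wednesday")
};

/* The actual translation happens at the point of use.  */
static const char *
weekday_name (size_t i)
{
  assert (i < sizeof weekday_names / sizeof weekday_names[0]);
  return _(weekday_names[i]);
}
```

Without ENABLE_NLS the macros expand to the plain strings, so weekday_name (0) simply returns "Monday"; with it, the same call goes through gettext.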

Now to catgets. The main problem is the work for the programmer. Every time he comes to a translatable string he has to define a number (or a symbolic constant) which also has to be defined in the message catalog file. He also has to take care of duplicate entries, duplicate message IDs, etc. If he wants to have the same quality in the message catalog as the GNU gettext program provides, he also has to put the descriptive comments for the strings and the location in all source code files in the message catalog. This is nearly a Mission: Impossible.

But there are also some points people might call advantages speaking for catgets. If you have a single word in a string and this string is used in different contexts it is likely that in one or the other language the word has different translations. Example:

printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));

Here the string "number" has to be translated twice. Even if you do not speak a language besides English it might be possible to recognize that the two words have a different meaning. In German the first appearance has to be translated to "Anzahl" and the second to "Zahl".

Now you can say that this example is really esoteric. And you are right! This is exactly how we felt about this problem, and we decided that it does not weigh that much. The solution for the above problem could be very easy:

printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);

We believe that we can solve all conflicts with this method. If it is difficult one can also consider changing one of the conflicting strings a little bit. But it is not impossible to overcome.

catgets allows the same original entry to have different translations, but gettext has another, scalable approach for solving ambiguities of this kind: see section Solving Ambiguities.

11.4 Using libintl.a in own programs

Starting with version 0.9.4 the library libintl.a should be self-contained, i.e., you can use it in your own programs without providing additional functions. The ‘Makefile’ will put the header and the library in directories selected using the $(prefix).

11.5 Being a `gettext` grok

NOTE: This documentation section is outdated and needs to be revised.

To fully exploit the functionality of the GNU gettext library it is surely helpful to read the source code. But for those who don't want to spend that much time reading the (sometimes complicated) code, here is a list of comments:

Changing the language at runtime

For interactive programs it might be useful to offer a selection of the used language at runtime. To understand how to do this one needs to know how the used language is determined while executing the gettext function. The method which is presented here only works correctly with the GNU implementation of the gettext functions.

In the function dcgettext at every call the current setting of the highest priority environment variable is determined and used. Highest priority means here the following list with decreasing priority:

LANGUAGE
LC_ALL
LC_xxx, according to selected locale category
LANG

Afterwards the path is constructed using the found value and the translation file is loaded if available.
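The lookup order can be sketched as follows. This is a simplified illustration of the priority list only, not the code GNU gettext uses: pick_setting and current_setting are hypothetical names, empty values are treated like unset ones as an assumption of the sketch, and details such as how the value of LANGUAGE is interpreted are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the first of the four candidates that is set and non-empty,
   in decreasing priority, or NULL if none is.  */
static const char *
pick_setting (const char *language, const char *lc_all,
              const char *lc_xxx, const char *lang)
{
  const char *candidates[4];
  size_t i;

  candidates[0] = language;
  candidates[1] = lc_all;
  candidates[2] = lc_xxx;
  candidates[3] = lang;
  for (i = 0; i < 4; i++)
    if (candidates[i] != NULL && candidates[i][0] != '\0')
      return candidates[i];
  return NULL;
}

/* Consult the environment; CATEGORY_NAME is e.g. "LC_MESSAGES".  */
static const char *
current_setting (const char *category_name)
{
  assert (category_name != NULL);
  return pick_setting (getenv ("LANGUAGE"), getenv ("LC_ALL"),
                       getenv (category_name), getenv ("LANG"));
}
```

Because the environment is consulted on every call, a later change to one of these variables is picked up by the next lookup, which is the property the following paragraphs rely on.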

What happens now when the value for, say, LANGUAGE changes? According to the process explained above the new value of this variable is found as soon as the dcgettext function is called. But this also means the (perhaps) different message catalog file is loaded. In other words: the used language is changed.

But there is one little catch. The code for gcc-2.7.0 and up provides some optimization. This optimization normally prevents the calling of the dcgettext function as long as no new catalog is loaded. But if dcgettext is not called the program also cannot notice that the LANGUAGE variable has changed (see section Optimization of the *gettext functions). A solution for this is very easy. Include the following code in the language switching function.

  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }

The variable _nl_msg_cat_cntr is defined in ‘loadmsgcat.c’. You don't need to know what this is for. But it can be used to detect whether a gettext implementation is GNU gettext and not a non-GNU system's native gettext implementation.

11.6 Temporary Notes for the Programmers Chapter

NOTE: This documentation section is outdated and needs to be revised.

11.6.1 Temporary - Two Possible Implementations

There are two competing methods for language independent messages: the X/Open catgets method, and the Uniforum gettext method. The catgets method indexes messages by integers; the gettext method indexes them by their original English text. The catgets method has been around longer and is supported by more vendors. The gettext method is supported by Sun, and it has been heard that the COSE multi-vendor initiative is supporting it. Neither method is a POSIX standard; the POSIX.1 committee had a lot of disagreement in this area.

Neither one is in the POSIX standard. There was much disagreement in the POSIX.1 committee about using the gettext routines vs. catgets (XPG). In the end the committee couldn't agree on anything, so no messaging system was included as part of the standard. I believe the informative annex of the standard includes the XPG3 messaging interfaces, “…as an example of a messaging system that has been implemented…”

They were very careful not to say anywhere that you should use one set of interfaces over the other. For more on this topic please see the Programming for Internationalization FAQ.

11.6.2 Temporary - About `catgets`

There have been a few discussions of late on the use of catgets as a base. I think it important to present both sides of the argument and hence am opting to play devil's advocate for a little bit.

I'll not deny the fact that catgets could have been designed a lot better. It currently has quite a number of limitations and these have already been pointed out.

However there is a great deal to be said for consistency and standardization. A common recurring problem when writing Unix software is the myriad portability problems across Unix platforms. It seems as if every Unix vendor had a look at the operating system and found parts they could improve upon. No doubt these modifications are innovative and solve real problems. However, software developers have a hard time keeping up with all these changes across so many platforms.

And this has prompted the Unix vendors to begin to standardize their systems. Hence the impetus for Spec1170. Every major Unix vendor has committed to supporting this standard and every Unix software developer awaits with glee the day they can write software to this standard and simply recompile (without having to use autoconf) across different platforms.

As I understand it, Spec1170 is roughly based upon version 4 of the X/Open Portability Guidelines (XPG4). Because catgets and friends are defined in XPG4, I'm led to believe that catgets is a part of Spec1170 and hence will become a standardized component of all Unix systems.

11.6.3 Temporary - Why a single implementation

Now it seems kind of wasteful to me to have two different systems installed for accessing message catalogs. If we do want to remedy the deficiencies of catgets, why don't we try to expand catgets (in a compatible manner) rather than implement an entirely new system? Otherwise, we'll end up with two message catalog access systems installed with an operating system - one set of routines for packages using GNU gettext for their internationalization, and another set of routines (catgets) for all other software. Bloated?

Supposing another catalog access system is implemented. Which do we recommend? At least for Linux, we need to attract as many software developers as possible. Hence we need to make it as easy for them to port their software as possible. Which means supporting catgets. We will be implementing the libintl code within our libc, but does this mean we also have to incorporate another message catalog access scheme within our libc as well? And what about people who are going to be using the libintl + non-catgets routines? When they port their software to other platforms, they're now going to have to include the front-end (libintl) code plus the back-end code (the non-catgets access routines) with their software instead of just including the libintl code with their software.

Message catalog support is however only the tip of the iceberg. What about the data for the other locale categories? They also have a number of deficiencies. Are we going to abandon them as well and develop another duplicate set of routines (should libintl expand beyond message catalog support)?

Like many parts of Unix that can be improved upon, we're stuck with balancing compatibility with the past with useful improvements and innovations for the future.

11.6.4 Temporary - Notes

X/Open agreed very late on the standard form so that many implementations differ from the final form. Both of my systems (old Linux catgets and Ultrix-4) have a strange variation.

OK. After incorporating the last changes I have to spend some time on making the GNU/Linux libc gettext functions. So in the future Solaris will not be the only system having gettext.


This document was generated by Bruno Haible on February, 21 2024 using texi2html 1.78a.