diff CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/gettext/gettext_11.html @ 68:5028fdace37b
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 16:23:26 -0400 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/gettext/gettext_11.html Tue Mar 18 16:23:26 2025 -0400 @@ -0,0 +1,1412 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd"> +<html> +<!-- Created on February, 21 2024 by texi2html 1.78a --> +<!-- +Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author) + Karl Berry <karl@freefriends.org> + Olaf Bachmann <obachman@mathematik.uni-kl.de> + and many others. +Maintained by: Many creative people. +Send bugs and suggestions to <texi2html-bug@nongnu.org> + +--> +<head> +<title>GNU gettext utilities: 11. The Programmer's View</title> + +<meta name="description" content="GNU gettext utilities: 11. The Programmer's View"> +<meta name="keywords" content="GNU gettext utilities: 11. The Programmer's View"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="texi2html 1.78a"> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +pre.display {font-family: serif} +pre.format {font-family: serif} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: serif; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: serif; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.roman {font-family:serif; font-weight:normal;} +span.sansserif {font-family:sans-serif; font-weight:normal;} +ul.toc {list-style: none} +--> +</style> + + +</head> + +<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> + +<table cellpadding="1" cellspacing="1" border="0"> +<tr><td valign="middle" align="left">[<a href="gettext_10.html#SEC173" title="Beginning of this chapter or previous chapter"> << </a>]</td> +<td 
valign="middle" align="left">[<a href="gettext_12.html#SEC217" title="Next chapter"> >> </a>]</td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td> +<td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td> +<td valign="middle" align="left">[<a href="gettext_21.html#SEC389" title="Index">Index</a>]</td> +<td valign="middle" align="left">[<a href="gettext_abt.html#SEC_About" title="About (help)"> ? </a>]</td> +</tr></table> + +<hr size="2"> +<a name="Programmers"></a> +<a name="SEC197"></a> +<h1 class="chapter"> <a href="gettext_toc.html#TOC190">11. The Programmer's View</a> </h1> + + +<p>One aim of the current message catalog implementation provided by +GNU <code>gettext</code> was to use the system's message catalog handling, if the +installer wishes to do so. So we perhaps should first take a look at +the solutions we know about. The people in the POSIX committee did not +manage to agree on one of the semi-official standards which we'll +describe below. In fact they couldn't agree on anything, so they decided +only to include an example of an interface. The major Unix vendors +are split in the usage of the two most important specifications: X/Open's +catgets vs. Uniforum's gettext interface. We'll describe them both and +later explain our solution of this dilemma. +</p> + + +<a name="catgets"></a> +<a name="SEC198"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC191">11.1 About <code>catgets</code></a> </h2> + +<p>The <code>catgets</code> implementation is defined in the X/Open Portability +Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. 
But the +process of creating this standard seemed to be too slow for some of +the Unix vendors, so they based their implementations on preliminary +versions of the standard. Of course this again leads to problems when +writing platform-independent programs: even the use of <code>catgets</code> +does not guarantee a uniform interface. +</p> +<p>Another, personal comment on this: only a bunch of committee members +could have made this interface. They never really tried to program +using this interface. It is a fast, memory-saving implementation; a +user can happily live with it. But programmers hate it (at least I and +some others do…) +</p> +<p>But we must not forget one point: after all the trouble with transferring +the rights to Unix, they at last came to X/Open, the very same who +published this specification. This leads me to predict +that this interface will be in future Unix standards (e.g. Spec1170) and +therefore part of all Unix implementations (implementations which are +<em>allowed</em> to wear this name). +</p> + + +<a name="Interface-to-catgets"></a> +<a name="SEC199"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC192">11.1.1 The Interface</a> </h3> + +<p>The interface to the <code>catgets</code> implementation consists of three +functions which correspond to those used in file access: <code>catopen</code> +to open the catalog for use, <code>catgets</code> for accessing the message +tables, and <code>catclose</code> for closing it after work is done. Prototypes +for the functions and the needed definitions are in the +<code><nl_types.h></code> header file. +</p> +<a name="IDX1059"></a> +<p><code>catopen</code> is used like this: +</p> +<table><tr><td> </td><td><pre class="example">nl_catd catd = catopen ("catalog_name", 0); +</pre></td></tr></table> + +<p>The function takes the name of the catalog as its first argument. This usually +refers to the name of the program or the package. 
The second parameter +is not further specified in the standard. I don't even know whether it +is implemented consistently among various systems. So the common advice +is to use <code>0</code> as the value. The return value is a handle to the +message catalog, equivalent to the file handles returned by <code>open</code>. +</p> +<a name="IDX1060"></a> +<p>This handle is of course used in the <code>catgets</code> function which can +be used like this: +</p> +<table><tr><td> </td><td><pre class="example">char *translation = catgets (catd, set_no, msg_id, "original string"); +</pre></td></tr></table> + +<p>The first parameter is this catalog descriptor. The second parameter +specifies the set of messages in this catalog in which the message +identified by <code>msg_id</code> is looked up. <code>catgets</code> therefore uses a +three-stage addressing scheme: +</p> +<table><tr><td> </td><td><pre class="display">catalog name ⇒ set number ⇒ message ID ⇒ translation +</pre></td></tr></table> + + +<p>The fourth argument is not used to address the translation. It is given +as a default value in case one of the addressing stages fails. One +important thing to remember is that although the return type of catgets +is <code>char *</code> the resulting string <em>must not</em> be changed. It +would better be <code>const char *</code>, but the standard was published in +1988, one year before ANSI C. +</p> +<a name="IDX1061"></a> +<p>The last of these functions is used and behaves as expected: +</p> +<table><tr><td> </td><td><pre class="example">catclose (catd); +</pre></td></tr></table> + +<p>After this no <code>catgets</code> call using the descriptor is legal anymore. +</p> + +<a name="Problems-with-catgets"></a> +<a name="SEC200"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC193">11.1.2 Problems with the <code>catgets</code> Interface?!</a> </h3> + +<p>Now that this description seemed to be really easy, where are the +problems we speak of? 
In fact the interface could be used in a +reasonable way, but constructing the message catalogs is a pain. The +reason for this lies in the third argument of <code>catgets</code>: the unique +message ID. This has to be a numeric value, unique among all messages in a single +set. Perhaps you can imagine the problems of keeping such a list while +changing the source code. Add a new message here, remove one there. Of +course a lot of tools have been developed to help organize this +chaos, but each of them fails in one aspect or another. We don't +want to say that the other approach has no problems but they are far +easier to manage. +</p> + +<a name="gettext"></a> +<a name="SEC201"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC194">11.2 About <code>gettext</code></a> </h2> + +<p>The definition of the <code>gettext</code> interface comes from a Uniforum +proposal. It was submitted there by Sun, who had implemented the +<code>gettext</code> function in SunOS 4, around 1990. Nowadays, the +<code>gettext</code> interface is specified by the OpenI18N standard. +</p> +<p>The main point about this solution is that it does not follow the +method of normal file handling (open-use-close) and that it does not +burden the programmer with so many tasks, especially the unique key handling. +Of course a unique key is needed here too, but this key is the message +itself (however long or short it is). See <a href="#SEC209">Comparing the Two Interfaces</a> for a more +detailed comparison of the two methods. +</p> +<p>The following section contains a rather detailed description of the +interface. We make it that detailed because this is the interface +we chose for the GNU <code>gettext</code> Library. Programmers interested +in using this library will be interested in this description. 
+</p> + + +<a name="Interface-to-gettext"></a> +<a name="SEC202"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC195">11.2.1 The Interface</a> </h3> + +<p>The minimal functionality an interface must have is a) to select a +domain the strings are coming from (a single domain for all programs is +not reasonable because its construction and maintenance is difficult, +perhaps impossible) and b) to access a string in a selected domain. +</p> +<p>This is essentially the description of the <code>gettext</code> interface. It +has a global domain which unqualified usages reference. Of course this +domain is selectable by the user. +</p> +<table><tr><td> </td><td><pre class="example">char *textdomain (const char *domain_name); +</pre></td></tr></table> + +<p>This provides the possibility to change or query the current status of +the current global domain of the <code>LC_MESSAGES</code> category. The +argument is a null-terminated string, whose characters must be legal for +use in filenames. If the <var>domain_name</var> argument is <code>NULL</code>, +the function returns the current value. If no value has been set +before, the name of the default domain is returned: <em>messages</em>. +Please note that although the return value of <code>textdomain</code> is of +type <code>char *</code> the string must not be modified. It is also important to know +that no check of availability is made. If the domain is not +available you will notice this by the fact that no translations are provided. +</p> +<p>To use a domain set by <code>textdomain</code> the function +</p> +<table><tr><td> </td><td><pre class="example">char *gettext (const char *msgid); +</pre></td></tr></table> + +<p>is to be used. This is the simplest reasonable form one can imagine. +The translation of the string <var>msgid</var> is returned if it is available +in the current domain. If it is not available, the argument itself is +returned. If the argument is <code>NULL</code> the result is undefined. 
+</p> +<p>One thing which should come to mind is that no explicit dependency on +the domain in use is given. The current value of the domain is used. +If this changes between two +executions of the same <code>gettext</code> call in the program, the two calls +reference different message catalogs. +</p> +<p>For the easiest case, which is normally used in internationalized +packages, once at the beginning of execution a call to <code>textdomain</code> +is issued, setting the domain to a unique name, normally the package +name. In the code that follows, all strings which have to be translated are +filtered through the gettext function. That's all, the package speaks +your language. +</p> + +<a name="Ambiguities"></a> +<a name="SEC203"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC196">11.2.2 Solving Ambiguities</a> </h3> + +<p>While this single-domain approach works well for most applications there +might be the need to get translations from more than one domain. Of +course one could switch between different domains with calls to +<code>textdomain</code>, but this is neither convenient nor fast. A +possible situation could be one case subject to discussion during this +writing: all +error messages of functions in the set of commonly used functions should +go into a separate domain <code>error</code>. By this means we would only need +to translate them once. +Another case is messages from a library, as these <em>have</em> to be +independent of the current domain set by the application. +</p> +<p>For these reasons there are two more functions to retrieve strings: +</p> +<table><tr><td> </td><td><pre class="example">char *dgettext (const char *domain_name, const char *msgid); +char *dcgettext (const char *domain_name, const char *msgid, + int category); +</pre></td></tr></table> + +<p>Both take an additional first argument, which corresponds +to the argument of <code>textdomain</code>. 
The third argument of +<code>dcgettext</code> allows the use of a locale category other than <code>LC_MESSAGES</code>. +But I really don't know where this can be useful. If the +<var>domain_name</var> is <code>NULL</code> or <var>category</var> has a value other than +the known ones, the result is undefined. It should also be noted that +<code>dcgettext</code> is not part of the second known implementation of this +function family, the one found in Solaris. +</p> +<p>A second ambiguity can arise from the fact that more than one +domain may have the same name. This can be solved by specifying where the +needed message catalog files can be found. +</p> +<table><tr><td> </td><td><pre class="example">char *bindtextdomain (const char *domain_name, + const char *dir_name); +</pre></td></tr></table> + +<p>Calling this function binds the given domain to a file in the specified +directory (how this file is determined follows below). In particular, a +file in the system's default location is no longer preferred over the specified +file (as it would be by solely using <code>textdomain</code>). A +<code>NULL</code> pointer for the <var>dir_name</var> parameter returns the binding +associated with <var>domain_name</var>. If <var>domain_name</var> itself is +<code>NULL</code> nothing happens and a <code>NULL</code> pointer is returned. Here +again, as for all the other functions, none of the return +values may be changed! +</p> +<p>It is important to remember that relative path names for the +<var>dir_name</var> parameter can be trouble. Since the path is always +computed relative to the current directory, different results will be +achieved when the program executes a <code>chdir</code> command. Relative +paths should always be avoided, since they make the lookup depend on the +current directory and therefore unreliable. 
+</p> +<table><tr><td> </td><td><pre class="example">wchar_t *wbindtextdomain (const char *domain_name, + const wchar_t *dir_name); +</pre></td></tr></table> + +<p>This function is provided only on native Windows platforms. It is like +<code>bindtextdomain</code>, except that the <var>dir_name</var> parameter is a +wide string (in UTF-16 encoding, as usual on Windows). +</p> + +<a name="Locating-Catalogs"></a> +<a name="SEC204"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC197">11.2.3 Locating Message Catalog Files</a> </h3> + +<p>Because many different languages for many different packages have to be +stored we need some way to attach this information to the message catalog +files. The usual way in Unix environments is to encode it +in the file name. This is also done here. The directory name given in +<code>bindtextdomain</code>'s second argument (or the default directory), +followed by the name of the locale, the locale category, and the domain name +are concatenated: +</p> +<table><tr><td> </td><td><pre class="example"><var>dir_name</var>/<var>locale</var>/LC_<var>category</var>/<var>domain_name</var>.mo +</pre></td></tr></table> + +<p>The default value for <var>dir_name</var> is system specific. For the GNU +library, and for packages adhering to its conventions, it's: +</p><table><tr><td> </td><td><pre class="example">/usr/local/share/locale +</pre></td></tr></table> + +<p><var>locale</var> is the name of the locale category which is designated by +<code>LC_<var>category</var></code>. For <code>gettext</code> and <code>dgettext</code> this +<code>LC_<var>category</var></code> is always <code>LC_MESSAGES</code>.<a name="DOCF3" href="gettext_fot.html#FOOT3">(3)</a> +The name of the locale category is determined through +<code>setlocale (LC_<var>category</var>, NULL)</code>. 
+<a name="DOCF4" href="gettext_fot.html#FOOT4">(4)</a> +When using the function <code>dcgettext</code>, you can specify the locale category +through the third argument. +</p> + +<a name="Charset-conversion"></a> +<a name="SEC205"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC198">11.2.4 How to specify the output character set <code>gettext</code> uses</a> </h3> + +<p><code>gettext</code> not only looks up a translation in a message catalog. It +also converts the translation on the fly to the desired output character +set. This is useful if the user is working in a different character set +than the translator who created the message catalog, because it avoids +distributing variants of message catalogs which differ only in the +character set. +</p> +<p>The output character set is, by default, the value of <code>nl_langinfo +(CODESET)</code>, which depends on the <code>LC_CTYPE</code> part of the current +locale. But programs which store strings in a locale independent way +(e.g. UTF-8) can request that <code>gettext</code> and related functions +return the translations in that encoding, by use of the +<code>bind_textdomain_codeset</code> function. +</p> +<p>Note that the <var>msgid</var> argument to <code>gettext</code> is not subject to +character set conversion. Also, when <code>gettext</code> does not find a +translation for <var>msgid</var>, it returns <var>msgid</var> unchanged – +independently of the current output character set. It is therefore +recommended that all <var>msgid</var>s be US-ASCII strings. +</p> +<dl> +<dt><u>Function:</u> char * <b>bind_textdomain_codeset</b><i> (const char *<var>domainname</var>, const char *<var>codeset</var>)</i> +<a name="IDX1062"></a> +</dt> +<dd><p>The <code>bind_textdomain_codeset</code> function can be used to specify the +output character set for message catalogs for domain <var>domainname</var>. 
+The <var>codeset</var> argument must be a valid codeset name which can be used +for the <code>iconv_open</code> function, or a null pointer. +</p> +<p>If the <var>codeset</var> parameter is the null pointer, +<code>bind_textdomain_codeset</code> returns the currently selected codeset +for the domain with the name <var>domainname</var>. It returns <code>NULL</code> if +no codeset has yet been selected. +</p> +<p>The <code>bind_textdomain_codeset</code> function can be used several times. +If used multiple times with the same <var>domainname</var> argument, the +later call overrides the settings made by the earlier one. +</p> +<p>The <code>bind_textdomain_codeset</code> function returns a pointer to a +string containing the name of the selected codeset. The string is +allocated internally in the function and must not be changed by the +user. If the system ran out of memory during the execution of +<code>bind_textdomain_codeset</code>, the return value is <code>NULL</code> and the +global variable <var>errno</var> is set accordingly. +</p></dd></dl> + + +<a name="Contexts"></a> +<a name="SEC206"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC199">11.2.5 Using contexts for solving ambiguities</a> </h3> + +<p>One place where the <code>gettext</code> functions, if used normally, have big +problems is within programs with graphical user interfaces (GUIs). The +problem is that many of the strings which have to be translated are very +short. They have to appear in pull-down menus which restricts the +length. But strings which do not contain entire sentences or at +least large fragments of a sentence may appear in more than one +situation in the program and yet need different translations. This is +especially true for the one-word strings which are frequently used in +GUI programs. 
+</p> +<p>As a consequence many people say that the <code>gettext</code> approach is +wrong and instead <code>catgets</code> should be used which indeed does not +have this problem. But there is a very simple and powerful method to +handle this kind of problem with the <code>gettext</code> functions. +</p> +<p>Contexts can be added to strings to be translated. A context-dependent +translation lookup is a search for the translation of a given string +that is limited to a given context. The translation for the same string +in a different context can be different. The different translations of +the same string in different contexts can be stored in the same +MO file, and can be edited by the translator in the same PO file. +</p> +<p>The ‘<tt>gettext.h</tt>’ include file contains the lookup macros for strings +with contexts. They are implemented as thin macros and inline functions +over the functions from <code><libintl.h></code>. +</p> +<a name="IDX1063"></a> +<table><tr><td> </td><td><pre class="example">const char *pgettext (const char *msgctxt, const char *msgid); +</pre></td></tr></table> + +<p>In a call of this macro, <var>msgctxt</var> and <var>msgid</var> must be string +literals. The macro returns the translation of <var>msgid</var>, restricted +to the context given by <var>msgctxt</var>. +</p> +<p>The <var>msgctxt</var> string is visible in the PO file to the translator. +You should try to make it canonical and never changing, because +every time you change a <var>msgctxt</var>, the translator will have to review +the translation of <var>msgid</var>. +</p> +<p>Finding a canonical <var>msgctxt</var> string that doesn't change over time can +be hard. But you shouldn't use the file name or class name containing the +<code>pgettext</code> call – because it is a common development task to rename +a file or a class, and it shouldn't cause translator work. 
Also you shouldn't +use a comment in the form of a complete English sentence as <var>msgctxt</var> – +because orthography or grammar changes are often applied to such sentences, +and again, it shouldn't force the translator to do a review. +</p> +<p>The ‘<samp>p</samp>’ in ‘<samp>pgettext</samp>’ stands for “particular”: <code>pgettext</code> +fetches a particular translation of the <var>msgid</var>. +</p> +<a name="IDX1064"></a> +<a name="IDX1065"></a> +<table><tr><td> </td><td><pre class="example">const char *dpgettext (const char *domain_name, + const char *msgctxt, const char *msgid); +const char *dcpgettext (const char *domain_name, + const char *msgctxt, const char *msgid, + int category); +</pre></td></tr></table> + +<p>These are generalizations of <code>pgettext</code>. They behave similarly to +<code>dgettext</code> and <code>dcgettext</code>, respectively. The <var>domain_name</var> +argument defines the translation domain. The <var>category</var> argument +allows the use of a locale category other than <code>LC_MESSAGES</code>. +</p> +<p>As an example consider the following fictional situation. A GUI program +has a menu bar with the following entries: +</p> +<table><tr><td> </td><td><pre class="smallexample">+------------+------------+--------------------------------------+ +| File | Printer | | ++------------+------------+--------------------------------------+ +| Open | | Select | +| New | | Open | ++----------+ | Connect | + +----------+ +</pre></td></tr></table> + +<p>To have the strings <code>File</code>, <code>Printer</code>, <code>Open</code>, +<code>New</code>, <code>Select</code>, and <code>Connect</code> translated there has to be +at some point in the code a call to a function of the <code>gettext</code> +family. But in two places the string passed into the function would be +<code>Open</code>. The translations might not be the same and therefore we +are in the dilemma described above. 
+</p> +<p>What distinguishes the two places is the menu path from the menu root to +the particular menu entries: +</p> +<table><tr><td> </td><td><pre class="smallexample">Menu|File +Menu|Printer +Menu|File|Open +Menu|File|New +Menu|Printer|Select +Menu|Printer|Open +Menu|Printer|Connect +</pre></td></tr></table> + +<p>The context is thus the menu path without its last part. So, the calls +look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">pgettext ("Menu|", "File") +pgettext ("Menu|", "Printer") +pgettext ("Menu|File|", "Open") +pgettext ("Menu|File|", "New") +pgettext ("Menu|Printer|", "Select") +pgettext ("Menu|Printer|", "Open") +pgettext ("Menu|Printer|", "Connect") +</pre></td></tr></table> + +<p>Whether or not to use the ‘<samp>|</samp>’ character at the end of the context is a +matter of style. +</p> +<p>For more complex cases, where the <var>msgctxt</var> or <var>msgid</var> are not +string literals, more general macros are available: +</p> +<a name="IDX1066"></a> +<a name="IDX1067"></a> +<a name="IDX1068"></a> +<table><tr><td> </td><td><pre class="example">const char *pgettext_expr (const char *msgctxt, const char *msgid); +const char *dpgettext_expr (const char *domain_name, + const char *msgctxt, const char *msgid); +const char *dcpgettext_expr (const char *domain_name, + const char *msgctxt, const char *msgid, + int category); +</pre></td></tr></table> + +<p>Here <var>msgctxt</var> and <var>msgid</var> can be arbitrary string-valued expressions. +These macros are more general. But in the case that both argument expressions +are string literals, the macros without the ‘<samp>_expr</samp>’ suffix are more +efficient. 
+</p> + +<a name="Plural-forms"></a> +<a name="SEC207"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC200">11.2.6 Additional functions for plural forms</a> </h3> + +<p>The functions of the <code>gettext</code> family described so far (and all the +<code>catgets</code> functions as well) have one problem in the real world +which has been neglected completely in all existing approaches. What +is meant here is the handling of plural forms. +</p> +<p>Looking through Unix source code written before anybody thought about +internationalization (and, sadly, even afterwards) one can often find +code similar to the following: +</p> +<table><tr><td> </td><td><pre class="smallexample"> printf ("%d file%s deleted", n, n == 1 ? "" : "s"); +</pre></td></tr></table> + +<p>After the first complaints from people internationalizing the code, programmers +either completely avoided formulations like this or used strings like +<code>"file(s)"</code>. Both look unnatural and should be avoided. The first +attempts to solve the problem correctly looked like this: +</p> +<table><tr><td> </td><td><pre class="smallexample"> if (n == 1) + printf ("%d file deleted", n); + else + printf ("%d files deleted", n); +</pre></td></tr></table> + +<p>But this does not solve the problem. It helps languages where the +plural form of a noun is not simply constructed by adding an +‘s’ +but that is all. Once again people fell into the trap of believing the +rules their language is using are universal. But the handling of plural +forms differs widely between the language families. For example, +Rafal Maszkowski <code><rzm@mat.uni.torun.pl></code> reports: +</p> +<blockquote><p>In Polish we use e.g. plik (file) this way: +</p><table><tr><td> </td><td><pre class="example">1 plik +2,3,4 pliki +5-21 pliko'w +22-24 pliki +25-31 pliko'w +</pre></td></tr></table> +<p>and so on (o' means 8859-2 oacute which should be rather okreska, +similar to aogonek). 
+</p></blockquote> + +<p>There are two things which can differ between languages (and even inside +language families): +</p> +<ul> +<li> +The way plural forms are built differs. This is a problem with +languages which have many irregularities. German, for instance, is a +drastic case. Though English and German are part of the same language +family (Germanic), the almost regular forming of plural noun forms +(appending an +‘s’) +is hardly found in German. + +</li><li> +The number of plural forms differs. This is somewhat surprising for +those who only have experience with Romance and Germanic languages +since here the number is the same (there are two). + +<p>But other language families have only one form or many forms. More +information on this is given in a separate section. +</p></li></ul> + +<p>The consequence of this is that application writers should not try to +solve the problem in their code. This would be localization since it is +only usable for certain, hardcoded language environments. Instead the +extended <code>gettext</code> interface should be used. +</p> +<p>These extra functions take, instead of the one key string, two +strings and a numerical argument. The idea behind this is that using +the numerical argument and the first string as a key, the implementation +can select the right plural form using rules specified by the +translator. The two string arguments then will be used to provide a return +value in case no message catalog is found (similar to the normal +<code>gettext</code> behavior). In this case the rules for Germanic languages +are used and it is assumed that the first string argument is the singular +form, the second the plural form. +</p> +<p>This has the consequence that programs without language catalogs can +display the correct strings only if the program itself is written using +a Germanic language. 
This is a limitation but since the GNU C library +(as well as the GNU <code>gettext</code> package) is written as part of the +GNU package and the coding standards for the GNU project require programs +to be written in English, this solution nevertheless fulfills its +purpose. +</p> +<dl> +<dt><u>Function:</u> char * <b>ngettext</b><i> (const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>)</i> +<a name="IDX1069"></a> +</dt> +<dd><p>The <code>ngettext</code> function is similar to the <code>gettext</code> function +as it finds the message catalogs in the same way. But it takes two +extra arguments. The <var>msgid1</var> parameter must contain the singular +form of the string to be converted. It is also used as the key for the +search in the catalog. The <var>msgid2</var> parameter is the plural form. +The parameter <var>n</var> is used to determine the plural form. If no +message catalog is found <var>msgid1</var> is returned if <code>n == 1</code>, +otherwise <code>msgid2</code>. +</p> +<p>An example of the use of this function is: +</p> +<table><tr><td> </td><td><pre class="smallexample">printf (ngettext ("%d file removed", "%d files removed", n), n); +</pre></td></tr></table> + +<p>Please note that the numeric value <var>n</var> has to be passed to the +<code>printf</code> function as well. It is not sufficient to pass it only to +<code>ngettext</code>. +</p> +<p>In the English singular case, the number – always 1 – can be replaced with +"one": +</p> +<table><tr><td> </td><td><pre class="smallexample">printf (ngettext ("One file removed", "%d files removed", n), n); +</pre></td></tr></table> + +<p>This works because the ‘<samp>printf</samp>’ function discards excess arguments that +are not consumed by the format string. 
+</p> +<p>If this function is meant to yield a format string that takes two or more +arguments, you can not use it like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">printf (ngettext ("%d file removed from directory %s", + "%d files removed from directory %s", + n), + n, dir); +</pre></td></tr></table> + +<p>because in many languages the translators want to replace the ‘<samp>%d</samp>’ +with an explicit word in the singular case, just like “one” in English, +and C format strings cannot consume the second argument but skip the first +argument. Instead, you have to reorder the arguments so that ‘<samp>n</samp>’ +comes last: +</p> +<table><tr><td> </td><td><pre class="smallexample">printf (ngettext ("%2$d file removed from directory %1$s", + "%2$d files removed from directory %1$s", + n), + dir, n); +</pre></td></tr></table> + +<p>See <a href="gettext_15.html#SEC267">C Format Strings</a> for details about this argument reordering syntax. +</p> +<p>When you know that the value of <code>n</code> is within a given range, you can +specify it as a comment directed to the <code>xgettext</code> tool. This +information may help translators to use more adequate translations. Like +this: +</p> +<table><tr><td> </td><td><pre class="smallexample">if (days > 7 && days < 14) + /* xgettext: range: 1..6 */ + printf (ngettext ("one week and one day", "one week and %d days", + days - 7), + days - 7); +</pre></td></tr></table> + +<p>It is also possible to use this function when the strings don't contain a +cardinal number: +</p> +<table><tr><td> </td><td><pre class="smallexample">puts (ngettext ("Delete the selected file?", + "Delete the selected files?", + n)); +</pre></td></tr></table> + +<p>In this case the number <var>n</var> is only used to choose the plural form. 
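</p>
<p>The argument reordering described earlier for two-argument messages can be sketched as a helper as well. The name <code>format_removed_from</code> is hypothetical; the numbered conversions (<code>%1$s</code>, <code>%2$lu</code>) are a POSIX extension to the C format syntax, so this assumes a C library that supports them.
</p>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper (not part of gettext): formats a message that names
   both a directory and a count.  The count comes last in the argument
   list, so a translation may leave out its conversion for the singular. */
int format_removed_from(char *buf, size_t size,
                        const char *dir, unsigned long n)
{
    return snprintf(buf, size,
                    ngettext("%2$lu file removed from directory %1$s",
                             "%2$lu files removed from directory %1$s",
                             n),
                    dir, n);
}
```

<p>Because <code>dir</code> is argument 1 and <code>n</code> is argument 2, a translator can write a singular form that uses only <code>%1$s</code> and spells out the number in words.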
+</p></dd></dl> + +<dl> +<dt><u>Function:</u> char * <b>dngettext</b><i> (const char *<var>domain</var>, const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>)</i> +<a name="IDX1070"></a> +</dt> +<dd><p>The <code>dngettext</code> function is similar to the <code>dgettext</code> function in the +way the message catalog is selected. The difference is that it takes +two extra parameters to provide the correct plural form. These two +parameters are handled in the same way <code>ngettext</code> handles them. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> char * <b>dcngettext</b><i> (const char *<var>domain</var>, const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>, int <var>category</var>)</i> +<a name="IDX1071"></a> +</dt> +<dd><p>The <code>dcngettext</code> function is similar to the <code>dcgettext</code> function in the +way the message catalog is selected. The difference is that it takes +two extra parameters to provide the correct plural form. These two +parameters are handled in the same way <code>ngettext</code> handles them. +</p></dd></dl> + +<p>Now, how do these functions solve the problem of the plural forms? +Without the input of linguists (which was not available) it was not +possible to determine whether there are only a few different ways in +which plural forms are formed or whether the number can increase with +every new supported language. +</p> +<p>Therefore the solution implemented is to allow the translator to specify +the rules of how to select the plural form. Since the formula varies +with every language this is the only viable solution except for +hardcoding the information in the code (which still would require the +possibility of extensions to not prevent the use of new languages).
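</p>
<p>To give a feel for what such a translator-specified rule computes, the following sketch hard-codes two rules as plain C functions: the two-form rule used for English and the three-form rule listed for Russian later in this section. The function names are hypothetical. This is only an illustration; GNU <code>gettext</code> does not compile rules into the program but reads the expression from the catalog's header entry.
</p>

```c
/* Hypothetical illustration (not part of gettext): each function maps a
   count n to the index of the plural form that should be used. */

/* Two forms, singular used for one only (e.g. English): plural=n != 1 */
unsigned long plural_index_english(unsigned long n)
{
    return n != 1;
}

/* Three forms (e.g. Russian):
   plural=n%10==1 && n%100!=11 ? 0 :
          n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 */
unsigned long plural_index_russian(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}
```

<p>For the count 21 the English rule selects form 1 (the plural), whereas the Russian rule selects form 0, which is why the selection cannot be left to the program's own <code>n == 1</code> test.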
+</p> +<a name="IDX1072"></a> +<a name="IDX1073"></a> +<a name="IDX1074"></a> +<p>The information about the plural form selection has to be stored in the +header entry of the PO file (the one with the empty <code>msgid</code> string). +The plural form information looks like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; +</pre></td></tr></table> + +<p>The <code>nplurals</code> value must be a decimal number which specifies how +many different plural forms exist for this language. The string +following <code>plural</code> is an expression which uses the C language +syntax. Exceptions are that no negative numbers are allowed, numbers +must be decimal, and the only variable allowed is <code>n</code>. Spaces are +allowed in the expression, but backslash-newlines are not; in the +examples below the backslash-newlines are present for formatting purposes +only. This expression will be evaluated whenever one of the functions +<code>ngettext</code>, <code>dngettext</code>, or <code>dcngettext</code> is called. The +numeric value passed to these functions is then substituted for all uses +of the variable <code>n</code> in the expression. The resulting value then +must be greater than or equal to zero and smaller than the value given as the +value of <code>nplurals</code>. +</p> +<a name="IDX1075"></a> +<p>The following rules are known at this point. The languages are listed +with their families. But this does not necessarily mean the information can be +generalized for the whole family (as can be easily seen in the table +below).<a name="DOCF5" href="gettext_fot.html#FOOT5">(5)</a> +</p> +<dl compact="compact"> +<dt> Only one form:</dt> +<dd><p>Some languages only require one single form. There is no distinction +between the singular and plural form.
An appropriate header entry +would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=1; plural=0; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Asian family</dt> +<dd><p>Japanese, Vietnamese, Korean </p></dd> +<dt> Tai-Kadai family</dt> +<dd><p>Thai </p></dd> +</dl> + +</dd> +<dt> Two forms, singular used for one only</dt> +<dd><p>This is the form used in most existing programs since it is what English +uses. A header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n != 1; +</pre></td></tr></table> + +<p>(Note: this uses the feature of C expressions that boolean expressions +have the value zero or one.) +</p> +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Germanic family</dt> +<dd><p>English, German, Dutch, Swedish, Danish, Norwegian, Faroese </p></dd> +<dt> Romanic family</dt> +<dd><p>Spanish, Portuguese, Italian </p></dd> +<dt> Latin/Greek family</dt> +<dd><p>Greek </p></dd> +<dt> Slavic family</dt> +<dd><p>Bulgarian </p></dd> +<dt> Finno-Ugric family</dt> +<dd><p>Finnish, Estonian </p></dd> +<dt> Semitic family</dt> +<dd><p>Hebrew </p></dd> +<dt> Austronesian family</dt> +<dd><p>Bahasa Indonesian </p></dd> +<dt> Artificial</dt> +<dd><p>Esperanto </p></dd> +</dl> + +<p>Other languages using the same header entry are: +</p> +<dl compact="compact"> +<dt> Finno-Ugric family</dt> +<dd><p>Hungarian </p></dd> +<dt> Turkic/Altaic family</dt> +<dd><p>Turkish </p></dd> +</dl> + +<p>Hungarian does not appear to have a plural if you look at sentences involving +cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is +“123 alma”. But when the number is not explicit, the distinction between +singular and plural exists: “the apple” is “az alma”, and “the apples” is +“az almák”.
Since <code>ngettext</code> has to support both types of sentences, +it is classified here, under “two forms”. +</p> +<p>The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples” is +“123 elma”. But when the number is omitted, the distinction between singular +and plural exists: “the apple” is “elma”, and “the apples” is +“elmalar”. +</p> +</dd> +<dt> Two forms, singular used for zero and one</dt> +<dd><p>Exceptional case in the language family. The header entry would be: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n>1; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Romanic family</dt> +<dd><p>Brazilian Portuguese, French </p></dd> +</dl> + +</dd> +<dt> Three forms, special case for zero</dt> +<dd><p>The header entry would be: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Baltic family</dt> +<dd><p>Latvian </p></dd> +</dl> + +</dd> +<dt> Three forms, special cases for one and two</dt> +<dd><p>The header entry would be: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Celtic</dt> +<dd><p>Gaeilge (Irish) </p></dd> +</dl> + +</dd> +<dt> Three forms, special case for numbers ending in 00 or [2-9][0-9]</dt> +<dd><p>The header entry would be: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \ + plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 
1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Romanic family</dt> +<dd><p>Romanian </p></dd> +</dl> + +</dd> +<dt> Three forms, special case for numbers ending in 1[2-9]</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \ + plural=n%10==1 && n%100!=11 ? 0 : \ + n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Baltic family</dt> +<dd><p>Lithuanian </p></dd> +</dl> + +</dd> +<dt> Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \ + plural=n%10==1 && n%100!=11 ? 0 : \ + n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Slavic family</dt> +<dd><p>Russian, Ukrainian, Belarusian, Serbian, Croatian </p></dd> +</dl> + +</dd> +<dt> Three forms, special cases for 1 and 2, 3, 4</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \ + plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Slavic family</dt> +<dd><p>Czech, Slovak </p></dd> +</dl> + +</dd> +<dt> Three forms, special case for one and some numbers ending in 2, 3, or 4</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \ + plural=n==1 ? 0 : \ + n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 
1 : 2; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Slavic family</dt> +<dd><p>Polish </p></dd> +</dl> + +</dd> +<dt> Four forms, special case for one and all numbers ending in 02, 03, or 04</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=4; \ + plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Slavic family</dt> +<dd><p>Slovenian </p></dd> +</dl> + +</dd> +<dt> Six forms, special cases for one, two, all numbers ending in 02, 03, … 10, all numbers ending in 11 … 99, and others</dt> +<dd><p>The header entry would look like this: +</p> +<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=6; \ + plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \ + : n%100>=11 ? 4 : 5; +</pre></td></tr></table> + +<p>Languages with this property include: +</p> +<dl compact="compact"> +<dt> Afroasiatic family</dt> +<dd><p>Arabic </p></dd> +</dl> +</dd> +</dl> + +<p>You might now ask, <code>ngettext</code> handles only numbers <var>n</var> of type +‘<samp>unsigned long</samp>’. What about larger integer types? What about negative +numbers? What about floating-point numbers? +</p> +<p>About larger integer types, such as ‘<samp>uintmax_t</samp>’ or +‘<samp>unsigned long long</samp>’: they can be handled by reducing the value to a +range that fits in an ‘<samp>unsigned long</samp>’. Simply casting the value to +‘<samp>unsigned long</samp>’ would not do the right thing, since it would treat +<code>ULONG_MAX + 1</code> like zero, <code>ULONG_MAX + 2</code> like singular, and +the like. Here you can exploit the fact that all mentioned plural form +formulas eventually become periodic, with a period that is a divisor of 100 +(or 1000 or 1000000). 
So, when you reduce a large value to another one in +the range [1000000, 1999999] that ends in the same 6 decimal digits, you +can assume that it will lead to the same plural form selection. This code +does this: +</p> +<table><tr><td> </td><td><pre class="smallexample">#include <inttypes.h> +#include <limits.h> /* for ULONG_MAX */ +uintmax_t nbytes = ...; +printf (ngettext ("The file has %"PRIuMAX" byte.", + "The file has %"PRIuMAX" bytes.", + (nbytes > ULONG_MAX + ? (nbytes % 1000000) + 1000000 + : nbytes)), + nbytes); +</pre></td></tr></table> + +<p>Negative and floating-point values usually represent physical entities for +which singular and plural don't clearly apply. In such cases, there is no +need to use <code>ngettext</code>; a simple <code>gettext</code> call with a form suitable +for all values will do. For example: +</p> +<table><tr><td> </td><td><pre class="smallexample">printf (gettext ("Time elapsed: %.3f seconds"), + num_milliseconds * 0.001); +</pre></td></tr></table> + +<p>Even if <var>num_milliseconds</var> happens to be a multiple of 1000, the output +</p><table><tr><td> </td><td><pre class="smallexample">Time elapsed: 1.000 seconds +</pre></td></tr></table> +<p>is acceptable in English, and similarly for other languages. +</p> +<p>The translators' perspective regarding plural forms is explained in +<a href="gettext_12.html#SEC228">Translating plural forms</a>. +</p> + +<a name="Optimized-gettext"></a> +<a name="SEC208"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC201">11.2.7 Optimization of the *gettext functions</a> </h3> + +<p>At this point of the discussion we should talk about an advantage of the +GNU <code>gettext</code> implementation. Some readers might have pointed out +that an internationalized program might perform poorly if some +string has to be translated in an inner loop. While this is unavoidable +when the string varies from one iteration of the loop to the next, it is +simply a waste of time when the string is always the same.
Take the +following example: +</p> +<table><tr><td> </td><td><pre class="example">{ + while (…) + { + puts (gettext ("Hello world")); + } +} +</pre></td></tr></table> + +<p>When the locale selection does not change between two runs the resulting +string is always the same. One way to exploit this is: +</p> +<table><tr><td> </td><td><pre class="example">{ + const char *str = gettext ("Hello world"); + while (…) + { + puts (str); + } +} +</pre></td></tr></table> + +<p>But this solution is not usable in all situations (e.g. when the locale +selection changes) nor does it lead to legible code. +</p> +<p>For this reason, GNU <code>gettext</code> caches previous translation results. +When the same translation is requested twice, with no new message +catalogs being loaded in between, <code>gettext</code> will, the second time, +find the result through a single cache lookup. +</p> + +<a name="Comparison"></a> +<a name="SEC209"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC202">11.3 Comparing the Two Interfaces</a> </h2> + + +<p>The following discussion is perhaps a little bit biased. As said +above we implemented GNU <code>gettext</code> following the Uniforum +proposal and this surely has its reasons. But it should show how we +came to this decision. +</p> +<p>First we take a look at the development process. When we write an +application using NLS provided by <code>gettext</code> we proceed as always. +Only when we come to a string which might be seen by the users and thus +has to be translated we use <code>gettext("…")</code> instead of +<code>"…"</code>. At the beginning of each source file (or in a central +header file) we define +</p> +<table><tr><td> </td><td><pre class="example">#define gettext(String) (String) +</pre></td></tr></table> + +<p>Even this definition can be avoided when the system supports the +<code>gettext</code> function in its C library. When we compile this code the +result is the same as if no NLS code is used.
When you take a look at +the GNU <code>gettext</code> code you will see that we use <code>_("…")</code> +instead of <code>gettext("…")</code>. This reduces the number of +additional characters per translatable string to <em>3</em> (in words: +three). +</p> +<p>When a production version of the program is now needed we simply replace +the definition +</p> +<table><tr><td> </td><td><pre class="example">#define _(String) (String) +</pre></td></tr></table> + +<p>by +</p> +<a name="IDX1076"></a> +<table><tr><td> </td><td><pre class="example">#include <libintl.h> +#define _(String) gettext (String) +</pre></td></tr></table> + +<p>Additionally we run the program ‘<tt>xgettext</tt>’ on all source code files +which contain translatable strings and that's it: we have a running +program which does not depend on translations being available, but which +can use any that become available. +</p> +<a name="IDX1077"></a> +<p>The same procedure can be done for the <code>gettext_noop</code> invocations +(see section <a href="gettext_4.html#SEC31">Special Cases of Translatable Strings</a>). One usually defines <code>gettext_noop</code> as a +no-op macro. So you should consider the following code for your project: +</p> +<table><tr><td> </td><td><pre class="example">#define gettext_noop(String) String +#define N_(String) gettext_noop (String) +</pre></td></tr></table> + +<p><code>N_</code> is a short form similar to <code>_</code>. The ‘<tt>Makefile</tt>’ in +the ‘<tt>po/</tt>’ directory of GNU <code>gettext</code> knows by default both of the +mentioned short forms so you are invited to follow this proposal for +your own ease. +</p> +<p>Now to <code>catgets</code>. The main problem is the work for the +programmer. Every time he comes to a translatable string he has to +define a number (or a symbolic constant) which also has to be defined in +the message catalog file. He also has to take care of duplicate +entries, duplicate message IDs etc.
If he wants to have the same +quality in the message catalog as the GNU <code>gettext</code> program +provides he also has to put the descriptive comments for the strings and +the location in all source code files in the message catalog. This is +nearly a Mission: Impossible. +</p> +<p>But there are also some points people might call advantages speaking for +<code>catgets</code>. If you have a single word in a string and this string +is used in different contexts it is likely that in one or the other +language the word has different translations. Example: +</p> +<table><tr><td> </td><td><pre class="example">printf ("%s: %d", gettext ("number"), number_of_errors); + +printf ("you should see %d %s", number_count, + number_count == 1 ? gettext ("number") : gettext ("numbers")); +</pre></td></tr></table> + +<p>Here we have to translate the string <code>"number"</code> twice. Even +if you do not speak a language besides English it might be possible to +recognize that the two words have a different meaning. In German the +first appearance has to be translated to <code>"Anzahl"</code> and the second +to <code>"Zahl"</code>. +</p> +<p>Now you can say that this example is really esoteric. And you are +right! This is exactly how we felt about this problem and decided that +it does not weigh that much. The solution for the above problem could +be very easy: +</p> +<table><tr><td> </td><td><pre class="example">printf ("%s %d", gettext ("number:"), number_of_errors); + +printf (number_count == 1 ? gettext ("you should see %d number") + : gettext ("you should see %d numbers"), + number_count); +</pre></td></tr></table> + +<p>We believe that we can solve all conflicts with this method. If it is +difficult one can also consider changing one of the conflicting strings a +little bit. But it is not impossible to overcome.
+</p> +<p><code>catgets</code> allows the same original entry to have different translations, +but <code>gettext</code> has another, scalable approach for solving ambiguities +of this kind: See section <a href="#SEC203">Solving Ambiguities</a>. +</p> + +<a name="Using-libintl_002ea"></a> +<a name="SEC210"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC203">11.4 Using libintl.a in own programs</a> </h2> + +<p>Starting with version 0.9.4 the library <code>libintl.a</code> should be +self-contained. I.e., you can use it in your own programs without +providing additional functions. The ‘<tt>Makefile</tt>’ will put the header +and the library in directories selected using the <code>$(prefix)</code>. +</p> + +<a name="gettext-grok"></a> +<a name="SEC211"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC204">11.5 Being a <code>gettext</code> grok</a> </h2> + +<p><strong> NOTE: </strong> This documentation section is outdated and needs to be +revised. +</p> +<p>To fully exploit the functionality of the GNU <code>gettext</code> library it +is surely helpful to read the source code. But for those who don't want +to spend that much time in reading the (sometimes complicated) code here +is a list of comments: +</p> +<ul> +<li> Changing the language at runtime +<a name="IDX1078"></a> + +<p>For interactive programs it might be useful to offer a selection of the +used language at runtime. To understand how to do this one needs to know +how the used language is determined while executing the <code>gettext</code> +function. The method which is presented here only works correctly +with the GNU implementation of the <code>gettext</code> functions. +</p> +<p>In the function <code>dcgettext</code> at every call the current setting of +the highest priority environment variable is determined and used.
+Highest priority here refers to the following list, in decreasing +priority: +</p> +<ol> +<li><a name="IDX1079"></a> + <code>LANGUAGE</code> +<a name="IDX1080"></a> +</li><li> <code>LC_ALL</code> +<a name="IDX1081"></a> +<a name="IDX1082"></a> +<a name="IDX1083"></a> +<a name="IDX1084"></a> +<a name="IDX1085"></a> +<a name="IDX1086"></a> +</li><li> <code>LC_xxx</code>, according to selected locale category +<a name="IDX1087"></a> +</li><li> <code>LANG</code> +</li></ol> + +<p>Afterwards the path is constructed using the found value and the +translation file is loaded if available. +</p> +<p>What happens now when the value for, say, <code>LANGUAGE</code> changes? According +to the process explained above the new value of this variable is found +as soon as the <code>dcgettext</code> function is called. But this also means +the (perhaps) different message catalog file is loaded. In other +words: the used language is changed. +</p> +<p>But there is one little catch. The code for gcc-2.7.0 and up provides +some optimization. This optimization normally prevents the calling of +the <code>dcgettext</code> function as long as no new catalog is loaded. But +if <code>dcgettext</code> is not called the program also cannot notice that the +<code>LANGUAGE</code> variable has changed (see section <a href="#SEC208">Optimization of the *gettext functions</a>). A +solution for this is very easy. Include the following code in the +language switching function. +</p> +<table><tr><td> </td><td><pre class="example"> /* Change language. */ + setenv ("LANGUAGE", "fr", 1); + + /* Make change known. */ + { + extern int _nl_msg_cat_cntr; + ++_nl_msg_cat_cntr; + } +</pre></td></tr></table> + +<a name="IDX1088"></a> +<p>The variable <code>_nl_msg_cat_cntr</code> is defined in ‘<tt>loadmsgcat.c</tt>’. +You don't need to know what this is for. But it can be used to detect +whether a <code>gettext</code> implementation is GNU gettext and not a non-GNU +system's native gettext implementation.
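</p>
<p>The priority list given earlier in this item can be sketched as a simple lookup. The function name <code>language_setting</code> is hypothetical, the sketch assumes that <code>LC_MESSAGES</code> is the selected locale category, and it treats an empty value as unset. It is a simplification for illustration only: the real lookup inside <code>dcgettext</code> has additional rules that are not reproduced here.
</p>

```c
#include <stdlib.h>

/* Hypothetical helper (not part of gettext): returns the value of the
   highest-priority variable that is set and non-empty, or NULL if none
   of them is set. */
const char *language_setting(void)
{
    /* Listed in decreasing priority; LC_MESSAGES stands in for LC_xxx. */
    static const char *const names[] = {
        "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;
}
```

<p>With only <code>LANG</code> set, the helper returns its value; setting <code>LANGUAGE</code> afterwards makes that value win, which mirrors the effect of the <code>setenv</code> call in the language switching code.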
+</p> +</li></ul> + + +<a name="Temp-Programmers"></a> +<a name="SEC212"></a> +<h2 class="section"> <a href="gettext_toc.html#TOC205">11.6 Temporary Notes for the Programmers Chapter</a> </h2> + +<p><strong> NOTE: </strong> This documentation section is outdated and needs to be +revised. +</p> + + +<a name="Temp-Implementations"></a> +<a name="SEC213"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC206">11.6.1 Temporary - Two Possible Implementations</a> </h3> + +<p>There are two competing methods for language independent messages: +the X/Open <code>catgets</code> method, and the Uniforum <code>gettext</code> +method. The <code>catgets</code> method indexes messages by integers; the +<code>gettext</code> method indexes them by their English translations. +The <code>catgets</code> method has been around longer and is supported +by more vendors. The <code>gettext</code> method is supported by Sun, +and it has been heard that the COSE multi-vendor initiative is +supporting it. Neither method is a POSIX standard; the POSIX.1 +committee had a lot of disagreement in this area. +</p> +<p>Neither one is in the POSIX standard. There was much disagreement +in the POSIX.1 committee about using the <code>gettext</code> routines +vs. <code>catgets</code> (XPG). In the end the committee couldn't +agree on anything, so no messaging system was included as part +of the standard. I believe the informative annex of the standard +includes the XPG3 messaging interfaces, “…as an example of +a messaging system that has been implemented…” +</p> +<p>They were very careful not to say anywhere that you should use one +set of interfaces over the other. For more on this topic please +see the Programming for Internationalization FAQ. 
+</p> + +<a name="Temp-catgets"></a> +<a name="SEC214"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC207">11.6.2 Temporary - About <code>catgets</code></a> </h3> + +<p>There have been a few discussions of late on the use of +<code>catgets</code> as a base. I think it important to present both +sides of the argument and hence am opting to play devil's advocate +for a little bit. +</p> +<p>I'll not deny the fact that <code>catgets</code> could have been designed +a lot better. It currently has quite a number of limitations and +these have already been pointed out. +</p> +<p>However there is a great deal to be said for consistency and +standardization. A common recurring problem when writing Unix +software is the myriad portability problems across Unix platforms. +It seems as if every Unix vendor had a look at the operating system +and found parts they could improve upon. Undoubtedly, these +modifications are probably innovative and solve real problems. +However, software developers have a hard time keeping up with all +these changes across so many platforms. +</p> +<p>And this has prompted the Unix vendors to begin to standardize their +systems. Hence the impetus for Spec1170. Every major Unix vendor +has committed to supporting this standard and every Unix software +developer waits with glee the day they can write software to this +standard and simply recompile (without having to use autoconf) +across different platforms. +</p> +<p>As I understand it, Spec1170 is roughly based upon version 4 of the +X/Open Portability Guidelines (XPG4). Because <code>catgets</code> and +friends are defined in XPG4, I'm led to believe that <code>catgets</code> +is a part of Spec1170 and hence will become a standardized component +of all Unix systems. 
+</p> + +<a name="Temp-WSI"></a> +<a name="SEC215"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC208">11.6.3 Temporary - Why a single implementation</a> </h3> + +<p>Now it seems kind of wasteful to me to have two different systems +installed for accessing message catalogs. If we do want to remedy +<code>catgets</code> deficiencies why don't we try to expand <code>catgets</code> +(in a compatible manner) rather than implement an entirely new system. +Otherwise, we'll end up with two message catalog access systems installed +with an operating system - one set of routines for packages using GNU +<code>gettext</code> for their internationalization, and another set of routines +(catgets) for all other software. Bloated? +</p> +<p>Supposing another catalog access system is implemented. Which do +we recommend? At least for Linux, we need to attract as many +software developers as possible. Hence we need to make it as easy +for them to port their software as possible. Which means supporting +<code>catgets</code>. We will be implementing the <code>libintl</code> code +within our <code>libc</code>, but does this mean we also have to incorporate +another message catalog access scheme within our <code>libc</code> as well? +And what about people who are going to be using the <code>libintl</code> ++ non-<code>catgets</code> routines. When they port their software to +other platforms, they're now going to have to include the front-end +(<code>libintl</code>) code plus the back-end code (the non-<code>catgets</code> +access routines) with their software instead of just including the +<code>libintl</code> code with their software. +</p> +<p>Message catalog support is however only the tip of the iceberg. +What about the data for the other locale categories? They also have +a number of deficiencies. Are we going to abandon them as well and +develop another duplicate set of routines (should <code>libintl</code> +expand beyond message catalog support)? 
+</p> +<p>Like many parts of Unix that can be improved upon, we're stuck with balancing +compatibility with the past against useful improvements and innovations for +the future. +</p> + +<a name="Temp-Notes"></a> +<a name="SEC216"></a> +<h3 class="subsection"> <a href="gettext_toc.html#TOC209">11.6.4 Temporary - Notes</a> </h3> + +<p>X/Open agreed very late on the standard form so that many +implementations differ from the final form. Both of my systems (old +Linux catgets and Ultrix-4) have a strange variation. +</p> +<p>OK. After incorporating the last changes I have to spend some time on +making the GNU/Linux <code>libc</code> <code>gettext</code> functions. So in the future +Solaris will not be the only system having <code>gettext</code>. +</p> + +<table cellpadding="1" cellspacing="1" border="0"> +<tr><td valign="middle" align="left">[<a href="#SEC197" title="Beginning of this chapter or previous chapter"> << </a>]</td> +<td valign="middle" align="left">[<a href="gettext_12.html#SEC217" title="Next chapter"> >> </a>]</td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td> +<td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td> +<td valign="middle" align="left">[<a href="gettext_21.html#SEC389" title="Index">Index</a>]</td> +<td valign="middle" align="left">[<a href="gettext_abt.html#SEC_About" title="About (help)"> ? </a>]</td> +</tr></table> +<p> + <font size="-1"> + This document was generated by <em>Bruno Haible</em> on <em>February, 21 2024</em> using <a href="https://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>. + </font> + <br> + +</p> +</body> +</html>