annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/doc/gettext/gettext_11.html @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
parents
children
rev   line source
jpayne@68 1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
jpayne@68 2 <html>
jpayne@68 3 <!-- Created on February, 21 2024 by texi2html 1.78a -->
jpayne@68 4 <!--
jpayne@68 5 Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
jpayne@68 6 Karl Berry <karl@freefriends.org>
jpayne@68 7 Olaf Bachmann <obachman@mathematik.uni-kl.de>
jpayne@68 8 and many others.
jpayne@68 9 Maintained by: Many creative people.
jpayne@68 10 Send bugs and suggestions to <texi2html-bug@nongnu.org>
jpayne@68 11
jpayne@68 12 -->
jpayne@68 13 <head>
jpayne@68 14 <title>GNU gettext utilities: 11. The Programmer's View</title>
jpayne@68 15
jpayne@68 16 <meta name="description" content="GNU gettext utilities: 11. The Programmer's View">
jpayne@68 17 <meta name="keywords" content="GNU gettext utilities: 11. The Programmer's View">
jpayne@68 18 <meta name="resource-type" content="document">
jpayne@68 19 <meta name="distribution" content="global">
jpayne@68 20 <meta name="Generator" content="texi2html 1.78a">
jpayne@68 21 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
jpayne@68 22 <style type="text/css">
jpayne@68 23 <!--
jpayne@68 24 a.summary-letter {text-decoration: none}
jpayne@68 25 pre.display {font-family: serif}
jpayne@68 26 pre.format {font-family: serif}
jpayne@68 27 pre.menu-comment {font-family: serif}
jpayne@68 28 pre.menu-preformatted {font-family: serif}
jpayne@68 29 pre.smalldisplay {font-family: serif; font-size: smaller}
jpayne@68 30 pre.smallexample {font-size: smaller}
jpayne@68 31 pre.smallformat {font-family: serif; font-size: smaller}
jpayne@68 32 pre.smalllisp {font-size: smaller}
jpayne@68 33 span.roman {font-family:serif; font-weight:normal;}
jpayne@68 34 span.sansserif {font-family:sans-serif; font-weight:normal;}
jpayne@68 35 ul.toc {list-style: none}
jpayne@68 36 -->
jpayne@68 37 </style>
jpayne@68 38
jpayne@68 39
jpayne@68 40 </head>
jpayne@68 41
jpayne@68 42 <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
jpayne@68 43
jpayne@68 44 <table cellpadding="1" cellspacing="1" border="0">
jpayne@68 45 <tr><td valign="middle" align="left">[<a href="gettext_10.html#SEC173" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
jpayne@68 46 <td valign="middle" align="left">[<a href="gettext_12.html#SEC217" title="Next chapter"> &gt;&gt; </a>]</td>
jpayne@68 47 <td valign="middle" align="left"> &nbsp; </td>
jpayne@68 48 <td valign="middle" align="left"> &nbsp; </td>
jpayne@68 49 <td valign="middle" align="left"> &nbsp; </td>
jpayne@68 50 <td valign="middle" align="left"> &nbsp; </td>
jpayne@68 51 <td valign="middle" align="left"> &nbsp; </td>
jpayne@68 52 <td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
jpayne@68 53 <td valign="middle" align="left">[<a href="gettext_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
jpayne@68 54 <td valign="middle" align="left">[<a href="gettext_21.html#SEC389" title="Index">Index</a>]</td>
jpayne@68 55 <td valign="middle" align="left">[<a href="gettext_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
jpayne@68 56 </tr></table>
jpayne@68 57
jpayne@68 58 <hr size="2">
jpayne@68 59 <a name="Programmers"></a>
jpayne@68 60 <a name="SEC197"></a>
jpayne@68 61 <h1 class="chapter"> <a href="gettext_toc.html#TOC190">11. The Programmer's View</a> </h1>
jpayne@68 62
jpayne@68 63
jpayne@68 64 <p>One aim of the current message catalog implementation provided by
jpayne@68 65 GNU <code>gettext</code> was to use the system's message catalog handling, if the
jpayne@68 66 installer wishes to do so. So we perhaps should first take a look at
jpayne@68 67 the solutions we know about. The people in the POSIX committee did not
jpayne@68 68 manage to agree on one of the semi-official standards which we'll
jpayne@68 69 describe below. In fact they couldn't agree on anything, so they decided
jpayne@68 70 only to include an example of an interface. The major Unix vendors
jpayne@68 71 are split between the two most important specifications: X/Open's
jpayne@68 72 <code>catgets</code> and Uniforum's <code>gettext</code> interface. We'll describe them both and
jpayne@68 73 later explain our solution to this dilemma.
jpayne@68 74 </p>
jpayne@68 75
jpayne@68 76
jpayne@68 77 <a name="catgets"></a>
jpayne@68 78 <a name="SEC198"></a>
jpayne@68 79 <h2 class="section"> <a href="gettext_toc.html#TOC191">11.1 About <code>catgets</code></a> </h2>
jpayne@68 80
jpayne@68 81 <p>The <code>catgets</code> implementation is defined in the X/Open Portability
jpayne@68 82 Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
jpayne@68 83 process of creating this standard seemed to be too slow for some of
jpayne@68 84 the Unix vendors so they created their implementations on preliminary
jpayne@68 85 versions of the standard. Of course this leads again to problems while
jpayne@68 86 writing platform independent programs: even the usage of <code>catgets</code>
jpayne@68 87 does not guarantee a unique interface.
jpayne@68 88 </p>
jpayne@68 89 <p>Another, personal comment on this: only a bunch of committee members
jpayne@68 90 could have designed this interface. They never really tried to program
jpayne@68 91 using this interface. It is a fast, memory-saving implementation; a
jpayne@68 92 user can happily live with it. But programmers hate it (at least I and
jpayne@68 93 some others do&hellip;)
jpayne@68 94 </p>
jpayne@68 95 <p>But we must not forget one point: after all the trouble with transferring
jpayne@68 96 the rights on Unix, they at last came to X/Open, the very same organization that
jpayne@68 97 published this specification. This leads me to predict
jpayne@68 98 that this interface will be in future Unix standards (e.g. Spec1170) and
jpayne@68 99 therefore part of all Unix implementations (that is, implementations which are
jpayne@68 100 <em>allowed</em> to bear this name).
jpayne@68 101 </p>
jpayne@68 102
jpayne@68 103
jpayne@68 104 <a name="Interface-to-catgets"></a>
jpayne@68 105 <a name="SEC199"></a>
jpayne@68 106 <h3 class="subsection"> <a href="gettext_toc.html#TOC192">11.1.1 The Interface</a> </h3>
jpayne@68 107
jpayne@68 108 <p>The interface to the <code>catgets</code> implementation consists of three
jpayne@68 109 functions which correspond to those used in file access: <code>catopen</code>
jpayne@68 110 to open the catalog for using, <code>catgets</code> for accessing the message
jpayne@68 111 tables, and <code>catclose</code> for closing after work is done. Prototypes
jpayne@68 112 for the functions and the needed definitions are in the
jpayne@68 113 <code>&lt;nl_types.h&gt;</code> header file.
jpayne@68 114 </p>
jpayne@68 115 <a name="IDX1059"></a>
jpayne@68 116 <p><code>catopen</code> is used like this:
jpayne@68 117 </p>
jpayne@68 118 <table><tr><td>&nbsp;</td><td><pre class="example">nl_catd catd = catopen (&quot;catalog_name&quot;, 0);
jpayne@68 119 </pre></td></tr></table>
jpayne@68 120
jpayne@68 121 <p>The function takes as its first argument the name of the catalog. This usually
jpayne@68 122 refers to the name of the program or the package. The second parameter
jpayne@68 123 is not further specified in the standard. I don't even know whether it
jpayne@68 124 is implemented consistently among various systems. So the common advice
jpayne@68 125 is to use <code>0</code> as the value. The return value is a handle to the
jpayne@68 126 message catalog, analogous to the file handles returned by <code>open</code>.
jpayne@68 127 </p>
jpayne@68 128 <a name="IDX1060"></a>
jpayne@68 129 <p>This handle is of course used in the <code>catgets</code> function which can
jpayne@68 130 be used like this:
jpayne@68 131 </p>
jpayne@68 132 <table><tr><td>&nbsp;</td><td><pre class="example">char *translation = catgets (catd, set_no, msg_id, &quot;original string&quot;);
jpayne@68 133 </pre></td></tr></table>
jpayne@68 134
jpayne@68 135 <p>The first parameter is this catalog descriptor. The second parameter
jpayne@68 136 specifies the set of messages in this catalog from which the message
jpayne@68 137 identified by <code>msg_id</code> is obtained. <code>catgets</code> therefore uses
jpayne@68 138 three-stage addressing:
jpayne@68 139 </p>
jpayne@68 140 <table><tr><td>&nbsp;</td><td><pre class="display">catalog name &rArr; set number &rArr; message ID &rArr; translation
jpayne@68 141 </pre></td></tr></table>
jpayne@68 142
jpayne@68 143
jpayne@68 144 <p>The fourth argument is not used to address the translation. It is given
jpayne@68 145 as a default value in case one of the addressing stages fails. One
jpayne@68 146 important thing to remember is that although the return type of <code>catgets</code>
jpayne@68 147 is <code>char *</code>, the resulting string <em>must not</em> be changed. It
jpayne@68 148 would better be <code>const char *</code>, but the standard was published in
jpayne@68 149 1988, one year before ANSI C.
jpayne@68 150 </p>
jpayne@68 151 <a name="IDX1061"></a>
jpayne@68 152 <p>The last of these functions is used and behaves as expected:
jpayne@68 153 </p>
jpayne@68 154 <table><tr><td>&nbsp;</td><td><pre class="example">catclose (catd);
jpayne@68 155 </pre></td></tr></table>
jpayne@68 156
jpayne@68 157 <p>After this no <code>catgets</code> call using the descriptor is legal anymore.
jpayne@68 158 </p>
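<p>Put together, the three calls might be used as in the following sketch.
The helper name <code>fetch_message</code> and its buffer-based signature are
illustrative assumptions, not part of the standard. The translation is copied
before the catalog is closed, since the string returned by <code>catgets</code>
may no longer be usable after <code>catclose</code>.
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;nl_types.h&gt;
#include &lt;stdio.h&gt;

/* Hypothetical helper: copy message (set_no, msg_no) of the named
   catalog into out, or default_text if it cannot be retrieved.  */
static void
fetch_message (const char *catalog_name, int set_no, int msg_no,
               const char *default_text, char *out, size_t out_size)
{
  nl_catd catd = catopen (catalog_name, 0);

  if (catd == (nl_catd) -1)
    {
      /* Catalog could not be opened: use the default text.  */
      snprintf (out, out_size, &quot;%s&quot;, default_text);
      return;
    }

  /* Copy the result; it must not be modified in place.  */
  snprintf (out, out_size, &quot;%s&quot;,
            catgets (catd, set_no, msg_no, default_text));
  catclose (catd);
}
</pre></td></tr></table>
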
jpayne@68 159
jpayne@68 160 <a name="Problems-with-catgets"></a>
jpayne@68 161 <a name="SEC200"></a>
jpayne@68 162 <h3 class="subsection"> <a href="gettext_toc.html#TOC193">11.1.2 Problems with the <code>catgets</code> Interface?!</a> </h3>
jpayne@68 163
jpayne@68 164 <p>This description makes the interface seem really easy. So where are the
jpayne@68 165 problems we speak of? In fact the interface could be used in a
jpayne@68 166 reasonable way, but constructing the message catalogs is a pain. The
jpayne@68 167 reason for this lies in the third argument of <code>catgets</code>: the unique
jpayne@68 168 message ID. This has to be a numeric value, unique among all messages in a single
jpayne@68 169 set. Perhaps you can imagine the problems of keeping such a list while
jpayne@68 170 changing the source code. Add a new message here, remove one there. Of
jpayne@68 171 course a lot of tools have been developed to help organize this
jpayne@68 172 chaos, but each of them fails in one aspect or another. We don't
jpayne@68 173 want to say that the other approach has no problems, but they are far
jpayne@68 174 easier to manage.
jpayne@68 175 </p>
jpayne@68 176
jpayne@68 177 <a name="gettext"></a>
jpayne@68 178 <a name="SEC201"></a>
jpayne@68 179 <h2 class="section"> <a href="gettext_toc.html#TOC194">11.2 About <code>gettext</code></a> </h2>
jpayne@68 180
jpayne@68 181 <p>The definition of the <code>gettext</code> interface comes from a Uniforum
jpayne@68 182 proposal. It was submitted there by Sun, who had implemented the
jpayne@68 183 <code>gettext</code> function in SunOS 4, around 1990. Nowadays, the
jpayne@68 184 <code>gettext</code> interface is specified by the OpenI18N standard.
jpayne@68 185 </p>
jpayne@68 186 <p>The main point about this solution is that it does not follow the
jpayne@68 187 method of normal file handling (open-use-close) and that it does not
jpayne@68 188 burden the programmer with so many tasks, especially the unique key handling.
jpayne@68 189 Of course here also a unique key is needed, but this key is the message
jpayne@68 190 itself (however long or short it is). See <a href="#SEC209">Comparing the Two Interfaces</a> for a more
jpayne@68 191 detailed comparison of the two methods.
jpayne@68 192 </p>
jpayne@68 193 <p>The following section contains a rather detailed description of the
jpayne@68 194 interface. We make it that detailed because this is the interface
jpayne@68 195 we chose for the GNU <code>gettext</code> Library. Programmers interested
jpayne@68 196 in using this library will be interested in this description.
jpayne@68 197 </p>
jpayne@68 198
jpayne@68 199
jpayne@68 200 <a name="Interface-to-gettext"></a>
jpayne@68 201 <a name="SEC202"></a>
jpayne@68 202 <h3 class="subsection"> <a href="gettext_toc.html#TOC195">11.2.1 The Interface</a> </h3>
jpayne@68 203
jpayne@68 204 <p>The minimal functionality an interface must have is a) to select a
jpayne@68 205 domain the strings are coming from (a single domain for all programs is
jpayne@68 206 not reasonable because its construction and maintenance is difficult,
jpayne@68 207 perhaps impossible) and b) to access a string in a selected domain.
jpayne@68 208 </p>
jpayne@68 209 <p>This is essentially the description of the <code>gettext</code> interface. It
jpayne@68 210 has a global domain which unqualified usages reference. Of course this
jpayne@68 211 domain is selectable by the user.
jpayne@68 212 </p>
jpayne@68 213 <table><tr><td>&nbsp;</td><td><pre class="example">char *textdomain (const char *domain_name);
jpayne@68 214 </pre></td></tr></table>
jpayne@68 215
jpayne@68 216 <p>This provides the possibility to change or query the current status of
jpayne@68 217 the current global domain of the <code>LC_MESSAGES</code> category. The
jpayne@68 218 argument is a null-terminated string, whose characters must be legal for
jpayne@68 219 use in filenames. If the <var>domain_name</var> argument is <code>NULL</code>,
jpayne@68 220 the function returns the current value. If no value has been set
jpayne@68 221 before, the name of the default domain is returned: <em>messages</em>.
jpayne@68 222 Please note that although the return value of <code>textdomain</code> is of
jpayne@68 223 type <code>char *</code>, it must not be modified. It is also important to know
jpayne@68 224 that no check is made that the domain is available. If it is not
jpayne@68 225 available, you will notice this by the fact that no translations are provided.
jpayne@68 226 </p>
jpayne@68 227 <p>To use a domain set by <code>textdomain</code> the function
jpayne@68 228 </p>
jpayne@68 229 <table><tr><td>&nbsp;</td><td><pre class="example">char *gettext (const char *msgid);
jpayne@68 230 </pre></td></tr></table>
jpayne@68 231
jpayne@68 232 <p>is to be used. This is the simplest reasonable form one can imagine.
jpayne@68 233 The translation of the string <var>msgid</var> is returned if it is available
jpayne@68 234 in the current domain. If it is not available, the argument itself is
jpayne@68 235 returned. If the argument is <code>NULL</code> the result is undefined.
jpayne@68 236 </p>
jpayne@68 237 <p>One thing which should come to mind is that no explicit dependency on
jpayne@68 238 the domain in use is given. The current value of the domain is used.
jpayne@68 239 If this changes between two
jpayne@68 240 executions of the same <code>gettext</code> call in the program, the two calls
jpayne@68 241 reference different message catalogs.
jpayne@68 242 </p>
jpayne@68 243 <p>For the easiest case, which is normally used in internationalized
jpayne@68 244 packages, once at the beginning of execution a call to <code>textdomain</code>
jpayne@68 245 is issued, setting the domain to a unique name, normally the package
jpayne@68 246 name. In the following code all strings which have to be translated are
jpayne@68 247 filtered through the gettext function. That's all, the package speaks
jpayne@68 248 your language.
jpayne@68 249 </p>
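<p>A minimal sketch of this easiest case follows. The helper name
<code>init_translations</code> and the package name used in the usage note are
assumptions made for illustration; the <code>_</code> shorthand is a widespread
convention rather than part of the interface. The sketch also assumes that
<code>&lt;libintl.h&gt;</code> is available (on systems where <code>gettext</code>
is not part of the C library, linking with <code>-lintl</code> may be needed).
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;libintl.h&gt;
#include &lt;locale.h&gt;

/* Hypothetical helper: pick up the user's locale from the environment
   and make package_name the global domain.  Returns the name of the
   new current domain, or NULL on failure.  */
static const char *
init_translations (const char *package_name)
{
  setlocale (LC_ALL, &quot;&quot;);
  return textdomain (package_name);
}

/* Conventional shorthand for filtering a string through gettext.  */
#define _(msgid) gettext (msgid)
</pre></td></tr></table>

<p>A program would call <code>init_translations (&quot;demo-package&quot;)</code> once at
startup and then write, for example, <code>puts (_(&quot;Hello, world!&quot;))</code>. If no
catalog is found for the domain, the original string is printed unchanged.
</p>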
jpayne@68 250
jpayne@68 251 <a name="Ambiguities"></a>
jpayne@68 252 <a name="SEC203"></a>
jpayne@68 253 <h3 class="subsection"> <a href="gettext_toc.html#TOC196">11.2.2 Solving Ambiguities</a> </h3>
jpayne@68 254
jpayne@68 255 <p>While this single name domain works well for most applications there
jpayne@68 256 might be the need to get translations from more than one domain. Of
jpayne@68 257 course one could switch between different domains with calls to
jpayne@68 258 <code>textdomain</code>, but this is neither convenient nor fast. A
jpayne@68 259 possible situation could be one case subject to discussion during this
jpayne@68 260 writing: all
jpayne@68 261 error messages of functions in the set of commonly used functions should
jpayne@68 262 go into a separate domain <code>error</code>. By this means we would only need
jpayne@68 263 to translate them once.
jpayne@68 264 Another case is messages from a library, as these <em>have</em> to be
jpayne@68 265 independent of the current domain set by the application.
jpayne@68 266 </p>
jpayne@68 267 <p>For these reasons there are two more functions to retrieve strings:
jpayne@68 268 </p>
jpayne@68 269 <table><tr><td>&nbsp;</td><td><pre class="example">char *dgettext (const char *domain_name, const char *msgid);
jpayne@68 270 char *dcgettext (const char *domain_name, const char *msgid,
jpayne@68 271                  int category);
jpayne@68 272 </pre></td></tr></table>
jpayne@68 273
jpayne@68 274 <p>Both take an additional argument in the first position, which corresponds
jpayne@68 275 to the argument of <code>textdomain</code>. The third argument of
jpayne@68 276 <code>dcgettext</code> allows the use of a locale category other than <code>LC_MESSAGES</code>.
jpayne@68 277 But I really don't know where this can be useful. If the
jpayne@68 278 <var>domain_name</var> is <code>NULL</code> or <var>category</var> has a value other than
jpayne@68 279 the known ones, the result is undefined. It should also be noted that
jpayne@68 280 this function is not part of the second known implementation of this
jpayne@68 281 function family, the one found in Solaris.
jpayne@68 282 </p>
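<p>For the library case, a wrapper along the following lines keeps the
library's messages independent of whatever domain the application has selected.
The domain name <code>demo-lib</code> and the wrapper name <code>lib_gettext</code>
are hypothetical.
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;libintl.h&gt;

/* Hypothetical domain holding the library's own messages.  */
#define LIB_DOMAIN &quot;demo-lib&quot;

/* Look up msgid in the library's domain.  The global domain set by
   the application through textdomain is neither used nor changed.  */
static const char *
lib_gettext (const char *msgid)
{
  return dgettext (LIB_DOMAIN, msgid);
}
</pre></td></tr></table>
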
jpayne@68 283 <p>A second ambiguity can arise from the fact that more than one
jpayne@68 284 domain may have the same name. This can be solved by specifying where the
jpayne@68 285 needed message catalog files can be found.
jpayne@68 286 </p>
jpayne@68 287 <table><tr><td>&nbsp;</td><td><pre class="example">char *bindtextdomain (const char *domain_name,
jpayne@68 288                       const char *dir_name);
jpayne@68 289 </pre></td></tr></table>
jpayne@68 290
jpayne@68 291 <p>Calling this function binds the given domain to a file in the specified
jpayne@68 292 directory (how this file is determined follows below). In particular, a
jpayne@68 293 file in the system's default place is no longer favored over the specified
jpayne@68 294 file (as it would be by solely using <code>textdomain</code>). A
jpayne@68 295 <code>NULL</code> pointer for the <var>dir_name</var> parameter returns the binding
jpayne@68 296 associated with <var>domain_name</var>. If <var>domain_name</var> itself is
jpayne@68 297 <code>NULL</code> nothing happens and a <code>NULL</code> pointer is returned. Here
jpayne@68 298 again, as for all the other functions, none of the return
jpayne@68 299 values may be modified!
jpayne@68 300 </p>
jpayne@68 301 <p>It is important to remember that relative path names for the
jpayne@68 302 <var>dir_name</var> parameter can cause trouble. Since the path is always
jpayne@68 303 computed relative to the current directory, different results will be
jpayne@68 304 obtained after the program executes a <code>chdir</code> call. Relative
jpayne@68 305 paths should always be avoided to prevent such dependencies and
jpayne@68 306 unreliability.
jpayne@68 307 </p>
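<p>One way to follow this advice is to refuse relative directories before
calling <code>bindtextdomain</code>, as in the sketch below. The helper name
<code>bind_absolute</code> is an assumption, and the test for a leading
<code>/</code> only covers POSIX-style paths.
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;libintl.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: bind domain_name to dir_name, but only if
   dir_name is an absolute POSIX path.  Returns the bound directory,
   or NULL if the path is relative or the binding failed.  */
static const char *
bind_absolute (const char *domain_name, const char *dir_name)
{
  if (dir_name == NULL || dir_name[0] != '/')
    return NULL;
  return bindtextdomain (domain_name, dir_name);
}
</pre></td></tr></table>

<p>The current binding can afterwards be queried with
<code>bindtextdomain (domain_name, NULL)</code>.
</p>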
jpayne@68 308 <table><tr><td>&nbsp;</td><td><pre class="example">wchar_t *wbindtextdomain (const char *domain_name,
jpayne@68 309                           const wchar_t *dir_name);
jpayne@68 310 </pre></td></tr></table>
jpayne@68 311
jpayne@68 312 <p>This function is provided only on native Windows platforms. It is like
jpayne@68 313 <code>bindtextdomain</code>, except that the <var>dir_name</var> parameter is a
jpayne@68 314 wide string (in UTF-16 encoding, as usual on Windows).
jpayne@68 315 </p>
jpayne@68 316
jpayne@68 317 <a name="Locating-Catalogs"></a>
jpayne@68 318 <a name="SEC204"></a>
jpayne@68 319 <h3 class="subsection"> <a href="gettext_toc.html#TOC197">11.2.3 Locating Message Catalog Files</a> </h3>
jpayne@68 320
jpayne@68 321 <p>Because many different languages for many different packages have to be
jpayne@68 322 stored, we need some way to attach this information to the message catalog
jpayne@68 323 files. The usual way in Unix environments is to encode it
jpayne@68 324 in the file name. This is also done here. The directory name given in
jpayne@68 325 <code>bindtextdomain</code>'s second argument (or the default directory),
jpayne@68 326 followed by the name of the locale, the locale category, and the domain name
jpayne@68 327 are concatenated:
jpayne@68 328 </p>
jpayne@68 329 <table><tr><td>&nbsp;</td><td><pre class="example"><var>dir_name</var>/<var>locale</var>/LC_<var>category</var>/<var>domain_name</var>.mo
jpayne@68 330 </pre></td></tr></table>
jpayne@68 331
jpayne@68 332 <p>The default value for <var>dir_name</var> is system specific. For the GNU
jpayne@68 333 library, and for packages adhering to its conventions, it's:
jpayne@68 334 </p><table><tr><td>&nbsp;</td><td><pre class="example">/usr/local/share/locale
jpayne@68 335 </pre></td></tr></table>
jpayne@68 336
jpayne@68 337 <p><var>locale</var> is the name of the locale category which is designated by
jpayne@68 338 <code>LC_<var>category</var></code>. For <code>gettext</code> and <code>dgettext</code> this
jpayne@68 339 <code>LC_<var>category</var></code> is always <code>LC_MESSAGES</code>.<a name="DOCF3" href="gettext_fot.html#FOOT3">(3)</a>
jpayne@68 340 The name of the locale category is determined through
jpayne@68 341 <code>setlocale (LC_<var>category</var>, NULL)</code>.
jpayne@68 342 <a name="DOCF4" href="gettext_fot.html#FOOT4">(4)</a>
jpayne@68 343 When using the function <code>dcgettext</code>, you can specify the locale category
jpayne@68 344 through the third argument.
jpayne@68 345 </p>
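<p>The naming pattern above can be illustrated with a small formatting helper
for the <code>LC_MESSAGES</code> case. The name <code>catalog_file_name</code> is
hypothetical, and the sketch only formats the basic pattern; the library itself
may try further variants of the locale name before giving up.
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;stdio.h&gt;

/* Hypothetical helper: write dir_name/locale/LC_MESSAGES/domain_name.mo
   into buf.  Returns 0 on success, -1 if buf is too small.  */
static int
catalog_file_name (char *buf, size_t buf_size, const char *dir_name,
                   const char *locale, const char *domain_name)
{
  int n = snprintf (buf, buf_size, &quot;%s/%s/LC_MESSAGES/%s.mo&quot;,
                    dir_name, locale, domain_name);
  return (n &gt;= 0 &amp;&amp; (size_t) n &lt; buf_size) ? 0 : -1;
}
</pre></td></tr></table>

<p>For the directory <code>/usr/local/share/locale</code>, the locale <code>de</code>
and the domain <code>hello</code>, this yields
<code>/usr/local/share/locale/de/LC_MESSAGES/hello.mo</code>.
</p>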
jpayne@68 346
jpayne@68 347 <a name="Charset-conversion"></a>
jpayne@68 348 <a name="SEC205"></a>
jpayne@68 349 <h3 class="subsection"> <a href="gettext_toc.html#TOC198">11.2.4 How to specify the output character set <code>gettext</code> uses</a> </h3>
jpayne@68 350
jpayne@68 351 <p><code>gettext</code> not only looks up a translation in a message catalog. It
jpayne@68 352 also converts the translation on the fly to the desired output character
jpayne@68 353 set. This is useful if the user is working in a different character set
jpayne@68 354 than the translator who created the message catalog, because it avoids
jpayne@68 355 distributing variants of message catalogs which differ only in the
jpayne@68 356 character set.
jpayne@68 357 </p>
jpayne@68 358 <p>The output character set is, by default, the value of <code>nl_langinfo
jpayne@68 359 (CODESET)</code>, which depends on the <code>LC_CTYPE</code> part of the current
jpayne@68 360 locale. But programs which store strings in a locale independent way
jpayne@68 361 (e.g. UTF-8) can request that <code>gettext</code> and related functions
jpayne@68 362 return the translations in that encoding, by use of the
jpayne@68 363 <code>bind_textdomain_codeset</code> function.
jpayne@68 364 </p>
jpayne@68 365 <p>Note that the <var>msgid</var> argument to <code>gettext</code> is not subject to
jpayne@68 366 character set conversion. Also, when <code>gettext</code> does not find a
jpayne@68 367 translation for <var>msgid</var>, it returns <var>msgid</var> unchanged &ndash;
jpayne@68 368 independently of the current output character set. It is therefore
jpayne@68 369 recommended that all <var>msgid</var>s be US-ASCII strings.
jpayne@68 370 </p>
jpayne@68 371 <dl>
jpayne@68 372 <dt><u>Function:</u> char * <b>bind_textdomain_codeset</b><i> (const&nbsp;char&nbsp;*<var>domainname</var>, const&nbsp;char&nbsp;*<var>codeset</var>)</i>
jpayne@68 373 <a name="IDX1062"></a>
jpayne@68 374 </dt>
jpayne@68 375 <dd><p>The <code>bind_textdomain_codeset</code> function can be used to specify the
jpayne@68 376 output character set for message catalogs for domain <var>domainname</var>.
jpayne@68 377 The <var>codeset</var> argument must be a valid codeset name which can be used
jpayne@68 378 for the <code>iconv_open</code> function, or a null pointer.
jpayne@68 379 </p>
jpayne@68 380 <p>If the <var>codeset</var> parameter is the null pointer,
jpayne@68 381 <code>bind_textdomain_codeset</code> returns the currently selected codeset
jpayne@68 382 for the domain with the name <var>domainname</var>. It returns <code>NULL</code> if
jpayne@68 383 no codeset has yet been selected.
jpayne@68 384 </p>
jpayne@68 385 <p>The <code>bind_textdomain_codeset</code> function can be used several times.
jpayne@68 386 If used multiple times with the same <var>domainname</var> argument, the
jpayne@68 387 later call overrides the settings made by the earlier one.
jpayne@68 388 </p>
jpayne@68 389 <p>The <code>bind_textdomain_codeset</code> function returns a pointer to a
jpayne@68 390 string containing the name of the selected codeset. The string is
jpayne@68 391 allocated internally in the function and must not be changed by the
jpayne@68 392 user. If the system ran out of memory during the execution of
jpayne@68 393 <code>bind_textdomain_codeset</code>, the return value is <code>NULL</code> and the
jpayne@68 394 global variable <var>errno</var> is set accordingly.
jpayne@68 395 </p></dd></dl>
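<p>A program that keeps its strings in UTF-8 regardless of the locale could
request that encoding as sketched below. The helper name
<code>request_utf8</code> is an assumption made for this example.
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;libintl.h&gt;

/* Hypothetical helper: ask that translations from domain_name be
   returned in UTF-8.  Returns the selected codeset name, or NULL if
   the request failed.  The result must not be modified.  */
static const char *
request_utf8 (const char *domain_name)
{
  return bind_textdomain_codeset (domain_name, &quot;UTF-8&quot;);
}
</pre></td></tr></table>
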
jpayne@68 396
jpayne@68 397
jpayne@68 398 <a name="Contexts"></a>
jpayne@68 399 <a name="SEC206"></a>
jpayne@68 400 <h3 class="subsection"> <a href="gettext_toc.html#TOC199">11.2.5 Using contexts for solving ambiguities</a> </h3>
jpayne@68 401
jpayne@68 402 <p>One place where the <code>gettext</code> functions, if used normally, have big
jpayne@68 403 problems is within programs with graphical user interfaces (GUIs). The
jpayne@68 404 problem is that many of the strings which have to be translated are very
jpayne@68 405 short. They have to appear in pull-down menus which restricts the
jpayne@68 406 length. But strings which are not containing entire sentences or at
jpayne@68 407 least large fragments of a sentence may appear in more than one
jpayne@68 408 situation in the program but might have different translations. This is
jpayne@68 409 especially true for the one-word strings which are frequently used in
jpayne@68 410 GUI programs.
jpayne@68 411 </p>
jpayne@68 412 <p>As a consequence many people say that the <code>gettext</code> approach is
jpayne@68 413 wrong and instead <code>catgets</code> should be used which indeed does not
jpayne@68 414 have this problem. But there is a very simple and powerful method to
jpayne@68 415 handle this kind of problems with the <code>gettext</code> functions.
jpayne@68 416 </p>
jpayne@68 417 <p>Contexts can be added to strings to be translated. A context dependent
jpayne@68 418 translation lookup is when a translation for a given string is searched,
jpayne@68 419 that is limited to a given context. The translation for the same string
jpayne@68 420 in a different context can be different. The different translations of
jpayne@68 421 the same string in different contexts can be stored in the in the same
jpayne@68 422 MO file, and can be edited by the translator in the same PO file.
jpayne@68 423 </p>
jpayne@68 424 <p>The &lsquo;<tt>gettext.h</tt>&rsquo; include file contains the lookup macros for strings
jpayne@68 425 with contexts. They are implemented as thin macros and inline functions
jpayne@68 426 over the functions from <code>&lt;libintl.h&gt;</code>.
jpayne@68 427 </p>
jpayne@68 428 <a name="IDX1063"></a>
jpayne@68 429 <table><tr><td>&nbsp;</td><td><pre class="example">const char *pgettext (const char *msgctxt, const char *msgid);
jpayne@68 430 </pre></td></tr></table>
jpayne@68 431
jpayne@68 432 <p>In a call of this macro, <var>msgctxt</var> and <var>msgid</var> must be string
jpayne@68 433 literals. The macro returns the translation of <var>msgid</var>, restricted
jpayne@68 434 to the context given by <var>msgctxt</var>.
jpayne@68 435 </p>
jpayne@68 436 <p>The <var>msgctxt</var> string is visible in the PO file to the translator.
jpayne@68 437 You should try to make it canonical and never changing, because
jpayne@68 438 every time you change an <var>msgctxt</var>, the translator will have to review
jpayne@68 439 the translation of <var>msgid</var>.
jpayne@68 440 </p>
jpayne@68 441 <p>Finding a canonical <var>msgctxt</var> string that doesn't change over time can
jpayne@68 442 be hard. But you shouldn't use the file name or class name containing the
jpayne@68 443 <code>pgettext</code> call &ndash; because it is a common development task to rename
jpayne@68 444 a file or a class, and it shouldn't cause translator work. Also you shouldn't
jpayne@68 445 use a comment in the form of a complete English sentence as <var>msgctxt</var> &ndash;
jpayne@68 446 because orthography or grammar changes are often applied to such sentences,
jpayne@68 447 and again, it shouldn't force the translator to do a review.
jpayne@68 448 </p>
jpayne@68 449 <p>The &lsquo;<samp>p</samp>&rsquo; in &lsquo;<samp>pgettext</samp>&rsquo; stands for &ldquo;particular&rdquo;: <code>pgettext</code>
jpayne@68 450 fetches a particular translation of the <var>msgid</var>.
jpayne@68 451 </p>
jpayne@68 452 <a name="IDX1064"></a>
jpayne@68 453 <a name="IDX1065"></a>
jpayne@68 454 <table><tr><td>&nbsp;</td><td><pre class="example">const char *dpgettext (const char *domain_name,
jpayne@68 455                        const char *msgctxt, const char *msgid);
jpayne@68 456 const char *dcpgettext (const char *domain_name,
jpayne@68 457                         const char *msgctxt, const char *msgid,
jpayne@68 458                         int category);
jpayne@68 459 </pre></td></tr></table>
jpayne@68 460
jpayne@68 461 <p>These are generalizations of <code>pgettext</code>. They behave similarly to
jpayne@68 462 <code>dgettext</code> and <code>dcgettext</code>, respectively. The <var>domain_name</var>
jpayne@68 463 argument defines the translation domain. The <var>category</var> argument
jpayne@68 464 allows the use of a locale category other than <code>LC_MESSAGES</code>.
jpayne@68 465 </p>
jpayne@68 466 <p>As an example consider the following fictional situation. A GUI program
jpayne@68 467 has a menu bar with the following entries:
jpayne@68 468 </p>
jpayne@68 469 <table><tr><td>&nbsp;</td><td><pre class="smallexample">+------------+------------+--------------------------------------+
jpayne@68 470 | File       | Printer    |                                      |
jpayne@68 471 +------------+------------+--------------------------------------+
jpayne@68 472 | Open     | | Select   |
jpayne@68 473 | New      | | Open     |
jpayne@68 474 +----------+ | Connect  |
jpayne@68 475              +----------+
jpayne@68 476 </pre></td></tr></table>
jpayne@68 477
jpayne@68 478 <p>To have the strings <code>File</code>, <code>Printer</code>, <code>Open</code>,
jpayne@68 479 <code>New</code>, <code>Select</code>, and <code>Connect</code> translated there has to be
jpayne@68 480 at some point in the code a call to a function of the <code>gettext</code>
jpayne@68 481 family. But in two places the string passed into the function would be
jpayne@68 482 <code>Open</code>. The translations might not be the same and therefore we
jpayne@68 483 are in the dilemma described above.
jpayne@68 484 </p>
jpayne@68 485 <p>What distinguishes the two places is the menu path from the menu root to
jpayne@68 486 the particular menu entries:
jpayne@68 487 </p>
jpayne@68 488 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Menu|File
jpayne@68 489 Menu|Printer
jpayne@68 490 Menu|File|Open
jpayne@68 491 Menu|File|New
jpayne@68 492 Menu|Printer|Select
jpayne@68 493 Menu|Printer|Open
jpayne@68 494 Menu|Printer|Connect
jpayne@68 495 </pre></td></tr></table>
jpayne@68 496
jpayne@68 497 <p>The context is thus the menu path without its last part. So, the calls
jpayne@68 498 look like this:
jpayne@68 499 </p>
jpayne@68 500 <table><tr><td>&nbsp;</td><td><pre class="smallexample">pgettext (&quot;Menu|&quot;, &quot;File&quot;)
jpayne@68 501 pgettext (&quot;Menu|&quot;, &quot;Printer&quot;)
jpayne@68 502 pgettext (&quot;Menu|File|&quot;, &quot;Open&quot;)
jpayne@68 503 pgettext (&quot;Menu|File|&quot;, &quot;New&quot;)
jpayne@68 504 pgettext (&quot;Menu|Printer|&quot;, &quot;Select&quot;)
jpayne@68 505 pgettext (&quot;Menu|Printer|&quot;, &quot;Open&quot;)
jpayne@68 506 pgettext (&quot;Menu|Printer|&quot;, &quot;Connect&quot;)
jpayne@68 507 </pre></td></tr></table>
jpayne@68 508
jpayne@68 509 <p>Whether or not to use the &lsquo;<samp>|</samp>&rsquo; character at the end of the context is a
jpayne@68 510 matter of style.
jpayne@68 511 </p>
jpayne@68 512 <p>For more complex cases, where the <var>msgctxt</var> or <var>msgid</var> are not
jpayne@68 513 string literals, more general macros are available:
jpayne@68 514 </p>
jpayne@68 515 <a name="IDX1066"></a>
jpayne@68 516 <a name="IDX1067"></a>
jpayne@68 517 <a name="IDX1068"></a>
<table><tr><td>&nbsp;</td><td><pre class="example">const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
jpayne@68 524 </pre></td></tr></table>
jpayne@68 525
jpayne@68 526 <p>Here <var>msgctxt</var> and <var>msgid</var> can be arbitrary string-valued expressions.
jpayne@68 527 These macros are more general. But in the case that both argument expressions
jpayne@68 528 are string literals, the macros without the &lsquo;<samp>_expr</samp>&rsquo; suffix are more
jpayne@68 529 efficient.
jpayne@68 530 </p>
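<p>The way a context and a message combine into a single lookup can be
sketched as follows. This is a minimal illustration, not the code of the real
macros: it assumes that the catalog key is the context, the separator byte
<code>\004</code>, and the <var>msgid</var>, and that a failed lookup returns
the key unchanged. The name <code>lookup_with_context</code> is hypothetical.
</p>

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration; it is not part of the gettext API.
   It accepts arbitrary string-valued expressions for both arguments.  */
const char *
lookup_with_context (const char *msgctxt, const char *msgid)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_len = strlen (msgid);
  /* Room for the context, the separator, the msgid and the final NUL.  */
  char *key = malloc (ctxt_len + 1 + id_len + 1);
  const char *translation;
  int found;

  if (key == NULL)
    return msgid;

  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';               /* assumed context separator */
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);

  translation = gettext (key);
  /* gettext returns its argument unchanged when nothing was found.  */
  found = (translation != key);
  free (key);

  /* Fall back to the plain msgid, never to the combined key.  */
  return found ? translation : msgid;
}
```

<p>Because the key is built at run time, a helper of this kind costs an
allocation per call, which is one reason the variants without the
&lsquo;<samp>_expr</samp>&rsquo; suffix are preferable for string literals.
</p>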
jpayne@68 531
jpayne@68 532 <a name="Plural-forms"></a>
jpayne@68 533 <a name="SEC207"></a>
jpayne@68 534 <h3 class="subsection"> <a href="gettext_toc.html#TOC200">11.2.6 Additional functions for plural forms</a> </h3>
jpayne@68 535
<p>The functions of the <code>gettext</code> family described so far (and all the
<code>catgets</code> functions as well) share one real-world problem
that earlier approaches neglected completely: the handling of
plural forms.
jpayne@68 540 </p>
jpayne@68 541 <p>Looking through Unix source code before the time anybody thought about
jpayne@68 542 internationalization (and, sadly, even afterwards) one can often find
jpayne@68 543 code similar to the following:
jpayne@68 544 </p>
jpayne@68 545 <table><tr><td>&nbsp;</td><td><pre class="smallexample"> printf (&quot;%d file%s deleted&quot;, n, n == 1 ? &quot;&quot; : &quot;s&quot;);
jpayne@68 546 </pre></td></tr></table>
jpayne@68 547
<p>After the first complaints from people internationalizing the code, people
either avoided formulations like this completely or used strings like
<code>&quot;file(s)&quot;</code>. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:
jpayne@68 552 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">   if (n == 1)
     printf (&quot;%d file deleted&quot;, n);
   else
     printf (&quot;%d files deleted&quot;, n);
jpayne@68 557 </pre></td></tr></table>
jpayne@68 558
jpayne@68 559 <p>But this does not solve the problem. It helps languages where the
jpayne@68 560 plural form of a noun is not simply constructed by adding an
jpayne@68 561 ‘s’
jpayne@68 562 but that is all. Once again people fell into the trap of believing the
jpayne@68 563 rules their language is using are universal. But the handling of plural
jpayne@68 564 forms differs widely between the language families. For example,
jpayne@68 565 Rafal Maszkowski <code>&lt;rzm@mat.uni.torun.pl&gt;</code> reports:
jpayne@68 566 </p>
jpayne@68 567 <blockquote><p>In Polish we use e.g. plik (file) this way:
jpayne@68 568 </p><table><tr><td>&nbsp;</td><td><pre class="example">1 plik
jpayne@68 569 2,3,4 pliki
jpayne@68 570 5-21 pliko'w
jpayne@68 571 22-24 pliki
jpayne@68 572 25-31 pliko'w
jpayne@68 573 </pre></td></tr></table>
jpayne@68 574 <p>and so on (o' means 8859-2 oacute which should be rather okreska,
jpayne@68 575 similar to aogonek).
jpayne@68 576 </p></blockquote>
jpayne@68 577
<p>Two things can differ between languages (and even within
language families):
jpayne@68 580 </p>
jpayne@68 581 <ul>
jpayne@68 582 <li>
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular formation of plural nouns in English
(appending an
‘s’)
is hardly found in German.
jpayne@68 590
jpayne@68 591 </li><li>
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages,
since here the number is the same (there are two).
jpayne@68 595
jpayne@68 596 <p>But other language families have only one form or many forms. More
jpayne@68 597 information on this in an extra section.
jpayne@68 598 </p></li></ul>
jpayne@68 599
jpayne@68 600 <p>The consequence of this is that application writers should not try to
jpayne@68 601 solve the problem in their code. This would be localization since it is
jpayne@68 602 only usable for certain, hardcoded language environments. Instead the
jpayne@68 603 extended <code>gettext</code> interface should be used.
jpayne@68 604 </p>
<p>These extra functions take two strings and a numerical argument instead
of the single key string. The idea is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are also used to provide a return
value in case no message catalog is found (similar to the normal
<code>gettext</code> behavior). In this case the rules for Germanic languages
are used, and it is assumed that the first string argument is the singular
form and the second the plural form.
jpayne@68 614 </p>
<p>This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written in
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU <code>gettext</code> package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
jpayne@68 622 </p>
jpayne@68 623 <dl>
jpayne@68 624 <dt><u>Function:</u> char * <b>ngettext</b><i> (const&nbsp;char&nbsp;*<var>msgid1</var>, const&nbsp;char&nbsp;*<var>msgid2</var>, unsigned&nbsp;long&nbsp;int&nbsp;<var>n</var>)</i>
jpayne@68 625 <a name="IDX1069"></a>
jpayne@68 626 </dt>
jpayne@68 627 <dd><p>The <code>ngettext</code> function is similar to the <code>gettext</code> function
jpayne@68 628 as it finds the message catalogs in the same way. But it takes two
jpayne@68 629 extra arguments. The <var>msgid1</var> parameter must contain the singular
jpayne@68 630 form of the string to be converted. It is also used as the key for the
jpayne@68 631 search in the catalog. The <var>msgid2</var> parameter is the plural form.
jpayne@68 632 The parameter <var>n</var> is used to determine the plural form. If no
jpayne@68 633 message catalog is found <var>msgid1</var> is returned if <code>n == 1</code>,
jpayne@68 634 otherwise <code>msgid2</code>.
jpayne@68 635 </p>
jpayne@68 636 <p>An example for the use of this function is:
jpayne@68 637 </p>
jpayne@68 638 <table><tr><td>&nbsp;</td><td><pre class="smallexample">printf (ngettext (&quot;%d file removed&quot;, &quot;%d files removed&quot;, n), n);
jpayne@68 639 </pre></td></tr></table>
jpayne@68 640
jpayne@68 641 <p>Please note that the numeric value <var>n</var> has to be passed to the
jpayne@68 642 <code>printf</code> function as well. It is not sufficient to pass it only to
jpayne@68 643 <code>ngettext</code>.
jpayne@68 644 </p>
jpayne@68 645 <p>In the English singular case, the number &ndash; always 1 &ndash; can be replaced with
jpayne@68 646 &quot;one&quot;:
jpayne@68 647 </p>
jpayne@68 648 <table><tr><td>&nbsp;</td><td><pre class="smallexample">printf (ngettext (&quot;One file removed&quot;, &quot;%d files removed&quot;, n), n);
jpayne@68 649 </pre></td></tr></table>
jpayne@68 650
jpayne@68 651 <p>This works because the &lsquo;<samp>printf</samp>&rsquo; function discards excess arguments that
jpayne@68 652 are not consumed by the format string.
jpayne@68 653 </p>
<p>If this function is meant to yield a format string that takes two or more
arguments, you cannot use it like this:
jpayne@68 656 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">printf (ngettext (&quot;%d file removed from directory %s&quot;,
                  &quot;%d files removed from directory %s&quot;,
                  n),
        n, dir);
jpayne@68 661 </pre></td></tr></table>
jpayne@68 662
jpayne@68 663 <p>because in many languages the translators want to replace the &lsquo;<samp>%d</samp>&rsquo;
jpayne@68 664 with an explicit word in the singular case, just like &ldquo;one&rdquo; in English,
jpayne@68 665 and C format strings cannot consume the second argument but skip the first
jpayne@68 666 argument. Instead, you have to reorder the arguments so that &lsquo;<samp>n</samp>&rsquo;
jpayne@68 667 comes last:
jpayne@68 668 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">printf (ngettext (&quot;%2$d file removed from directory %1$s&quot;,
                  &quot;%2$d files removed from directory %1$s&quot;,
                  n),
        dir, n);
jpayne@68 673 </pre></td></tr></table>
jpayne@68 674
jpayne@68 675 <p>See <a href="gettext_15.html#SEC267">C Format Strings</a> for details about this argument reordering syntax.
jpayne@68 676 </p>
jpayne@68 677 <p>When you know that the value of <code>n</code> is within a given range, you can
jpayne@68 678 specify it as a comment directed to the <code>xgettext</code> tool. This
jpayne@68 679 information may help translators to use more adequate translations. Like
jpayne@68 680 this:
jpayne@68 681 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">if (days &gt; 7 &amp;&amp; days &lt; 14)
  /* xgettext: range: 1..6 */
  printf (ngettext (&quot;one week and one day&quot;, &quot;one week and %d days&quot;,
                    days - 7),
          days - 7);
jpayne@68 687 </pre></td></tr></table>
jpayne@68 688
jpayne@68 689 <p>It is also possible to use this function when the strings don't contain a
jpayne@68 690 cardinal number:
jpayne@68 691 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">puts (ngettext (&quot;Delete the selected file?&quot;,
                &quot;Delete the selected files?&quot;,
                n));
jpayne@68 695 </pre></td></tr></table>
jpayne@68 696
jpayne@68 697 <p>In this case the number <var>n</var> is only used to choose the plural form.
jpayne@68 698 </p></dd></dl>
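<p>The argument reordering described above can be tried without any message
catalog, since <code>ngettext</code> then falls back to the two strings passed
in. The following is a minimal sketch, assuming a C library that supports the
POSIX numbered-argument syntax (<code>%1$s</code>, <code>%2$d</code>) and a
non-negative <var>n</var>; the helper name <code>format_removed</code> is
hypothetical.
</p>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: formats the message into buf and returns the
   result of snprintf.  n is assumed to be non-negative.  */
int
format_removed (char *buf, size_t size, const char *dir, int n)
{
  /* dir is argument 1 and n is argument 2, so a singular form that
     spells out the number can use %1$s alone and leave n unused.  */
  return snprintf (buf, size,
                   ngettext ("%2$d file removed from directory %1$s",
                             "%2$d files removed from directory %1$s",
                             (unsigned long int) n),
                   dir, n);
}
```

<p>Without a catalog, the singular string is chosen only for
<code>n == 1</code>; every other value, including zero, selects the plural
string.
</p>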
jpayne@68 699
jpayne@68 700 <dl>
jpayne@68 701 <dt><u>Function:</u> char * <b>dngettext</b><i> (const&nbsp;char&nbsp;*<var>domain</var>, const&nbsp;char&nbsp;*<var>msgid1</var>, const&nbsp;char&nbsp;*<var>msgid2</var>, unsigned&nbsp;long&nbsp;int&nbsp;<var>n</var>)</i>
jpayne@68 702 <a name="IDX1070"></a>
jpayne@68 703 </dt>
<dd><p>The <code>dngettext</code> function is similar to the <code>dgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
jpayne@68 708 </p></dd></dl>
jpayne@68 709
jpayne@68 710 <dl>
jpayne@68 711 <dt><u>Function:</u> char * <b>dcngettext</b><i> (const&nbsp;char&nbsp;*<var>domain</var>, const&nbsp;char&nbsp;*<var>msgid1</var>, const&nbsp;char&nbsp;*<var>msgid2</var>, unsigned&nbsp;long&nbsp;int&nbsp;<var>n</var>, int&nbsp;<var>category</var>)</i>
jpayne@68 712 <a name="IDX1071"></a>
jpayne@68 713 </dt>
<dd><p>The <code>dcngettext</code> function is similar to the <code>dcgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
jpayne@68 718 </p></dd></dl>
jpayne@68 719
jpayne@68 720 <p>Now, how do these functions solve the problem of the plural forms?
jpayne@68 721 Without the input of linguists (which was not available) it was not
jpayne@68 722 possible to determine whether there are only a few different forms in
jpayne@68 723 which plural forms are formed or whether the number can increase with
jpayne@68 724 every new supported language.
jpayne@68 725 </p>
jpayne@68 726 <p>Therefore the solution implemented is to allow the translator to specify
jpayne@68 727 the rules of how to select the plural form. Since the formula varies
jpayne@68 728 with every language this is the only viable solution except for
jpayne@68 729 hardcoding the information in the code (which still would require the
jpayne@68 730 possibility of extensions to not prevent the use of new languages).
jpayne@68 731 </p>
jpayne@68 732 <a name="IDX1072"></a>
jpayne@68 733 <a name="IDX1073"></a>
jpayne@68 734 <a name="IDX1074"></a>
jpayne@68 735 <p>The information about the plural form selection has to be stored in the
jpayne@68 736 header entry of the PO file (the one with the empty <code>msgid</code> string).
jpayne@68 737 The plural form information looks like this:
jpayne@68 738 </p>
jpayne@68 739 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
jpayne@68 740 </pre></td></tr></table>
jpayne@68 741
jpayne@68 742 <p>The <code>nplurals</code> value must be a decimal number which specifies how
jpayne@68 743 many different plural forms exist for this language. The string
jpayne@68 744 following <code>plural</code> is an expression which is using the C language
jpayne@68 745 syntax. Exceptions are that no negative numbers are allowed, numbers
jpayne@68 746 must be decimal, and the only variable allowed is <code>n</code>. Spaces are
jpayne@68 747 allowed in the expression, but backslash-newlines are not; in the
jpayne@68 748 examples below the backslash-newlines are present for formatting purposes
jpayne@68 749 only. This expression will be evaluated whenever one of the functions
jpayne@68 750 <code>ngettext</code>, <code>dngettext</code>, or <code>dcngettext</code> is called. The
jpayne@68 751 numeric value passed to these functions is then substituted for all uses
of the variable <code>n</code> in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of <code>nplurals</code>.
jpayne@68 755 </p>
jpayne@68 756 <a name="IDX1075"></a>
<p>The following rules are known at this point. The languages are listed
with their families, but this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).<a name="DOCF5" href="gettext_fot.html#FOOT5">(5)</a>
jpayne@68 761 </p>
jpayne@68 762 <dl compact="compact">
jpayne@68 763 <dt> Only one form:</dt>
jpayne@68 764 <dd><p>Some languages only require one single form. There is no distinction
jpayne@68 765 between the singular and plural form. An appropriate header entry
jpayne@68 766 would look like this:
jpayne@68 767 </p>
jpayne@68 768 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=1; plural=0;
jpayne@68 769 </pre></td></tr></table>
jpayne@68 770
jpayne@68 771 <p>Languages with this property include:
jpayne@68 772 </p>
jpayne@68 773 <dl compact="compact">
jpayne@68 774 <dt> Asian family</dt>
jpayne@68 775 <dd><p>Japanese, Vietnamese, Korean </p></dd>
jpayne@68 776 <dt> Tai-Kadai family</dt>
jpayne@68 777 <dd><p>Thai </p></dd>
jpayne@68 778 </dl>
jpayne@68 779
jpayne@68 780 </dd>
jpayne@68 781 <dt> Two forms, singular used for one only</dt>
jpayne@68 782 <dd><p>This is the form used in most existing programs since it is what English
jpayne@68 783 is using. A header entry would look like this:
jpayne@68 784 </p>
jpayne@68 785 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n != 1;
jpayne@68 786 </pre></td></tr></table>
jpayne@68 787
<p>(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
jpayne@68 790 </p>
jpayne@68 791 <p>Languages with this property include:
jpayne@68 792 </p>
jpayne@68 793 <dl compact="compact">
jpayne@68 794 <dt> Germanic family</dt>
jpayne@68 795 <dd><p>English, German, Dutch, Swedish, Danish, Norwegian, Faroese </p></dd>
jpayne@68 796 <dt> Romanic family</dt>
jpayne@68 797 <dd><p>Spanish, Portuguese, Italian </p></dd>
jpayne@68 798 <dt> Latin/Greek family</dt>
jpayne@68 799 <dd><p>Greek </p></dd>
jpayne@68 800 <dt> Slavic family</dt>
jpayne@68 801 <dd><p>Bulgarian </p></dd>
jpayne@68 802 <dt> Finno-Ugric family</dt>
jpayne@68 803 <dd><p>Finnish, Estonian </p></dd>
jpayne@68 804 <dt> Semitic family</dt>
jpayne@68 805 <dd><p>Hebrew </p></dd>
jpayne@68 806 <dt> Austronesian family</dt>
jpayne@68 807 <dd><p>Bahasa Indonesian </p></dd>
jpayne@68 808 <dt> Artificial</dt>
jpayne@68 809 <dd><p>Esperanto </p></dd>
jpayne@68 810 </dl>
jpayne@68 811
jpayne@68 812 <p>Other languages using the same header entry are:
jpayne@68 813 </p>
jpayne@68 814 <dl compact="compact">
jpayne@68 815 <dt> Finno-Ugric family</dt>
jpayne@68 816 <dd><p>Hungarian </p></dd>
jpayne@68 817 <dt> Turkic/Altaic family</dt>
jpayne@68 818 <dd><p>Turkish </p></dd>
jpayne@68 819 </dl>
jpayne@68 820
jpayne@68 821 <p>Hungarian does not appear to have a plural if you look at sentences involving
jpayne@68 822 cardinal numbers. For example, &ldquo;1 apple&rdquo; is &ldquo;1 alma&rdquo;, and &ldquo;123 apples&rdquo; is
jpayne@68 823 &ldquo;123 alma&rdquo;. But when the number is not explicit, the distinction between
jpayne@68 824 singular and plural exists: &ldquo;the apple&rdquo; is &ldquo;az alma&rdquo;, and &ldquo;the apples&rdquo; is
jpayne@68 825 &ldquo;az alm&aacute;k&rdquo;. Since <code>ngettext</code> has to support both types of sentences,
jpayne@68 826 it is classified here, under &ldquo;two forms&rdquo;.
jpayne@68 827 </p>
jpayne@68 828 <p>The same holds for Turkish: &ldquo;1 apple&rdquo; is &ldquo;1 elma&rdquo;, and &ldquo;123 apples&rdquo; is
jpayne@68 829 &ldquo;123 elma&rdquo;. But when the number is omitted, the distinction between singular
jpayne@68 830 and plural exists: &ldquo;the apple&rdquo; is &ldquo;elma&rdquo;, and &ldquo;the apples&rdquo; is
jpayne@68 831 &ldquo;elmalar&rdquo;.
jpayne@68 832 </p>
jpayne@68 833 </dd>
jpayne@68 834 <dt> Two forms, singular used for zero and one</dt>
jpayne@68 835 <dd><p>Exceptional case in the language family. The header entry would be:
jpayne@68 836 </p>
jpayne@68 837 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n&gt;1;
jpayne@68 838 </pre></td></tr></table>
jpayne@68 839
jpayne@68 840 <p>Languages with this property include:
jpayne@68 841 </p>
jpayne@68 842 <dl compact="compact">
jpayne@68 843 <dt> Romanic family</dt>
jpayne@68 844 <dd><p>Brazilian Portuguese, French </p></dd>
jpayne@68 845 </dl>
jpayne@68 846
jpayne@68 847 </dd>
jpayne@68 848 <dt> Three forms, special case for zero</dt>
jpayne@68 849 <dd><p>The header entry would be:
jpayne@68 850 </p>
jpayne@68 851 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n%10==1 &amp;&amp; n%100!=11 ? 0 : n != 0 ? 1 : 2;
jpayne@68 852 </pre></td></tr></table>
jpayne@68 853
jpayne@68 854 <p>Languages with this property include:
jpayne@68 855 </p>
jpayne@68 856 <dl compact="compact">
jpayne@68 857 <dt> Baltic family</dt>
jpayne@68 858 <dd><p>Latvian </p></dd>
jpayne@68 859 </dl>
jpayne@68 860
jpayne@68 861 </dd>
jpayne@68 862 <dt> Three forms, special cases for one and two</dt>
jpayne@68 863 <dd><p>The header entry would be:
jpayne@68 864 </p>
jpayne@68 865 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
jpayne@68 866 </pre></td></tr></table>
jpayne@68 867
jpayne@68 868 <p>Languages with this property include:
jpayne@68 869 </p>
jpayne@68 870 <dl compact="compact">
jpayne@68 871 <dt> Celtic</dt>
jpayne@68 872 <dd><p>Gaeilge (Irish) </p></dd>
jpayne@68 873 </dl>
jpayne@68 874
jpayne@68 875 </dd>
jpayne@68 876 <dt> Three forms, special case for numbers ending in 00 or [2-9][0-9]</dt>
jpayne@68 877 <dd><p>The header entry would be:
jpayne@68 878 </p>
jpayne@68 879 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
jpayne@68 880 plural=n==1 ? 0 : (n==0 || (n%100 &gt; 0 &amp;&amp; n%100 &lt; 20)) ? 1 : 2;
jpayne@68 881 </pre></td></tr></table>
jpayne@68 882
jpayne@68 883 <p>Languages with this property include:
jpayne@68 884 </p>
jpayne@68 885 <dl compact="compact">
jpayne@68 886 <dt> Romanic family</dt>
jpayne@68 887 <dd><p>Romanian </p></dd>
jpayne@68 888 </dl>
jpayne@68 889
jpayne@68 890 </dd>
jpayne@68 891 <dt> Three forms, special case for numbers ending in 1[2-9]</dt>
jpayne@68 892 <dd><p>The header entry would look like this:
jpayne@68 893 </p>
jpayne@68 894 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
jpayne@68 895 plural=n%10==1 &amp;&amp; n%100!=11 ? 0 : \
jpayne@68 896 n%10&gt;=2 &amp;&amp; (n%100&lt;10 || n%100&gt;=20) ? 1 : 2;
jpayne@68 897 </pre></td></tr></table>
jpayne@68 898
jpayne@68 899 <p>Languages with this property include:
jpayne@68 900 </p>
jpayne@68 901 <dl compact="compact">
jpayne@68 902 <dt> Baltic family</dt>
jpayne@68 903 <dd><p>Lithuanian </p></dd>
jpayne@68 904 </dl>
jpayne@68 905
jpayne@68 906 </dd>
jpayne@68 907 <dt> Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]</dt>
jpayne@68 908 <dd><p>The header entry would look like this:
jpayne@68 909 </p>
jpayne@68 910 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
jpayne@68 911 plural=n%10==1 &amp;&amp; n%100!=11 ? 0 : \
jpayne@68 912 n%10&gt;=2 &amp;&amp; n%10&lt;=4 &amp;&amp; (n%100&lt;10 || n%100&gt;=20) ? 1 : 2;
jpayne@68 913 </pre></td></tr></table>
jpayne@68 914
jpayne@68 915 <p>Languages with this property include:
jpayne@68 916 </p>
jpayne@68 917 <dl compact="compact">
jpayne@68 918 <dt> Slavic family</dt>
jpayne@68 919 <dd><p>Russian, Ukrainian, Belarusian, Serbian, Croatian </p></dd>
jpayne@68 920 </dl>
jpayne@68 921
jpayne@68 922 </dd>
jpayne@68 923 <dt> Three forms, special cases for 1 and 2, 3, 4</dt>
jpayne@68 924 <dd><p>The header entry would look like this:
jpayne@68 925 </p>
jpayne@68 926 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
jpayne@68 927 plural=(n==1) ? 0 : (n&gt;=2 &amp;&amp; n&lt;=4) ? 1 : 2;
jpayne@68 928 </pre></td></tr></table>
jpayne@68 929
jpayne@68 930 <p>Languages with this property include:
jpayne@68 931 </p>
jpayne@68 932 <dl compact="compact">
jpayne@68 933 <dt> Slavic family</dt>
jpayne@68 934 <dd><p>Czech, Slovak </p></dd>
jpayne@68 935 </dl>
jpayne@68 936
jpayne@68 937 </dd>
jpayne@68 938 <dt> Three forms, special case for one and some numbers ending in 2, 3, or 4</dt>
jpayne@68 939 <dd><p>The header entry would look like this:
jpayne@68 940 </p>
jpayne@68 941 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
jpayne@68 942 plural=n==1 ? 0 : \
jpayne@68 943 n%10&gt;=2 &amp;&amp; n%10&lt;=4 &amp;&amp; (n%100&lt;10 || n%100&gt;=20) ? 1 : 2;
jpayne@68 944 </pre></td></tr></table>
jpayne@68 945
jpayne@68 946 <p>Languages with this property include:
jpayne@68 947 </p>
jpayne@68 948 <dl compact="compact">
jpayne@68 949 <dt> Slavic family</dt>
jpayne@68 950 <dd><p>Polish </p></dd>
jpayne@68 951 </dl>
jpayne@68 952
jpayne@68 953 </dd>
jpayne@68 954 <dt> Four forms, special case for one and all numbers ending in 02, 03, or 04</dt>
jpayne@68 955 <dd><p>The header entry would look like this:
jpayne@68 956 </p>
jpayne@68 957 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=4; \
jpayne@68 958 plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
jpayne@68 959 </pre></td></tr></table>
jpayne@68 960
jpayne@68 961 <p>Languages with this property include:
jpayne@68 962 </p>
jpayne@68 963 <dl compact="compact">
jpayne@68 964 <dt> Slavic family</dt>
jpayne@68 965 <dd><p>Slovenian </p></dd>
jpayne@68 966 </dl>
jpayne@68 967
jpayne@68 968 </dd>
jpayne@68 969 <dt> Six forms, special cases for one, two, all numbers ending in 02, 03, &hellip; 10, all numbers ending in 11 &hellip; 99, and others</dt>
jpayne@68 970 <dd><p>The header entry would look like this:
jpayne@68 971 </p>
jpayne@68 972 <table><tr><td>&nbsp;</td><td><pre class="smallexample">Plural-Forms: nplurals=6; \
jpayne@68 973 plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100&gt;=3 &amp;&amp; n%100&lt;=10 ? 3 \
jpayne@68 974 : n%100&gt;=11 ? 4 : 5;
jpayne@68 975 </pre></td></tr></table>
jpayne@68 976
jpayne@68 977 <p>Languages with this property include:
jpayne@68 978 </p>
jpayne@68 979 <dl compact="compact">
jpayne@68 980 <dt> Afroasiatic family</dt>
jpayne@68 981 <dd><p>Arabic </p></dd>
jpayne@68 982 </dl>
jpayne@68 983 </dd>
jpayne@68 984 </dl>
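<p>Since the <code>plural</code> expressions use C syntax, they can be checked
by copying them into ordinary C functions. The sketch below does this for three
of the header entries listed above (English, Polish, and Arabic). The function
names are hypothetical and serve only as an illustration; <code>gettext</code>
itself evaluates the expression from the PO file header at run time.
</p>

```c
/* Hypothetical helpers: each one returns the index of the plural form
   selected for n, using an expression copied from the table above.  */

/* nplurals=2; plural=n != 1;  */
unsigned long int
plural_index_english (unsigned long int n)
{
  return n != 1;
}

/* nplurals=3; Polish rule.  */
unsigned long int
plural_index_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* nplurals=6; Arabic rule.  */
unsigned long int
plural_index_arabic (unsigned long int n)
{
  return n == 0 ? 0 : n == 1 ? 1 : n == 2 ? 2
         : n % 100 >= 3 && n % 100 <= 10 ? 3
         : n % 100 >= 11 ? 4 : 5;
}
```

<p>For instance, the Polish rule maps 22 to index 1 and 25 to index 2, which
matches the forms reported for Polish earlier in this section (&ldquo;22-24
pliki&rdquo;, &ldquo;25-31 pliko'w&rdquo;). Every result stays below the
corresponding <code>nplurals</code> value, as required.
</p>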
jpayne@68 985
jpayne@68 986 <p>You might now ask, <code>ngettext</code> handles only numbers <var>n</var> of type
jpayne@68 987 &lsquo;<samp>unsigned long</samp>&rsquo;. What about larger integer types? What about negative
jpayne@68 988 numbers? What about floating-point numbers?
jpayne@68 989 </p>
jpayne@68 990 <p>About larger integer types, such as &lsquo;<samp>uintmax_t</samp>&rsquo; or
jpayne@68 991 &lsquo;<samp>unsigned long long</samp>&rsquo;: they can be handled by reducing the value to a
jpayne@68 992 range that fits in an &lsquo;<samp>unsigned long</samp>&rsquo;. Simply casting the value to
jpayne@68 993 &lsquo;<samp>unsigned long</samp>&rsquo; would not do the right thing, since it would treat
jpayne@68 994 <code>ULONG_MAX + 1</code> like zero, <code>ULONG_MAX + 2</code> like singular, and
jpayne@68 995 the like. Here you can exploit the fact that all mentioned plural form
jpayne@68 996 formulas eventually become periodic, with a period that is a divisor of 100
jpayne@68 997 (or 1000 or 1000000). So, when you reduce a large value to another one in
jpayne@68 998 the range [1000000, 1999999] that ends in the same 6 decimal digits, you
jpayne@68 999 can assume that it will lead to the same plural form selection. This code
jpayne@68 1000 does this:
jpayne@68 1001 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">#include &lt;inttypes.h&gt;
#include &lt;limits.h&gt;
uintmax_t nbytes = ...;
printf (ngettext (&quot;The file has %&quot;PRIuMAX&quot; byte.&quot;,
                  &quot;The file has %&quot;PRIuMAX&quot; bytes.&quot;,
                  (nbytes &gt; ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
jpayne@68 1010 </pre></td></tr></table>
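<p>The reduction step can also be isolated in a small function. The sketch
below is an illustration under one assumption: the threshold is passed in as a
parameter (here called <code>limit</code>, a hypothetical name) instead of
being fixed to <code>ULONG_MAX</code>, so that the behavior can be observed
even on platforms where <code>unsigned long</code> is as wide as
<code>uintmax_t</code> and the reduction would otherwise never trigger.
</p>

```c
#include <stdint.h>

/* Hypothetical helper: values above limit are replaced by a value in
   the range [1000000, 1999999] that ends in the same six decimal
   digits; smaller values are returned unchanged.  */
uintmax_t
reduce_for_plural (uintmax_t n, uintmax_t limit)
{
  return n > limit ? (n % 1000000) + 1000000 : n;
}
```

<p>In real use, <code>limit</code> would be <code>ULONG_MAX</code> and the
result would be passed to <code>ngettext</code> as its third argument, while the
unreduced value is still the one handed to <code>printf</code>.
</p>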
jpayne@68 1011
jpayne@68 1012 <p>Negative and floating-point values usually represent physical entities for
jpayne@68 1013 which singular and plural don't clearly apply. In such cases, there is no
jpayne@68 1014 need to use <code>ngettext</code>; a simple <code>gettext</code> call with a form suitable
jpayne@68 1015 for all values will do. For example:
jpayne@68 1016 </p>
<table><tr><td>&nbsp;</td><td><pre class="smallexample">printf (gettext (&quot;Time elapsed: %.3f seconds&quot;),
        num_milliseconds * 0.001);
jpayne@68 1019 </pre></td></tr></table>
jpayne@68 1020
jpayne@68 1021 <p>Even if <var>num_milliseconds</var> happens to be a multiple of 1000, the output
jpayne@68 1022 </p><table><tr><td>&nbsp;</td><td><pre class="smallexample">Time elapsed: 1.000 seconds
jpayne@68 1023 </pre></td></tr></table>
jpayne@68 1024 <p>is acceptable in English, and similarly for other languages.
jpayne@68 1025 </p>
jpayne@68 1026 <p>The translators' perspective regarding plural forms is explained in
jpayne@68 1027 <a href="gettext_12.html#SEC228">Translating plural forms</a>.
jpayne@68 1028 </p>
jpayne@68 1029
jpayne@68 1030 <a name="Optimized-gettext"></a>
jpayne@68 1031 <a name="SEC208"></a>
jpayne@68 1032 <h3 class="subsection"> <a href="gettext_toc.html#TOC201">11.2.7 Optimization of the *gettext functions</a> </h3>
jpayne@68 1033
<p>At this point in the discussion we should talk about an advantage of the
GNU <code>gettext</code> implementation. Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same. Take the
following example:
jpayne@68 1041 </p>
<table><tr><td>&nbsp;</td><td><pre class="example">{
  while (&hellip;)
    {
      puts (gettext (&quot;Hello world&quot;));
    }
}
jpayne@68 1048 </pre></td></tr></table>
jpayne@68 1049
jpayne@68 1050 <p>When the locale selection does not change between two runs the resulting
jpayne@68 1051 string is always the same. One way to use this is:
jpayne@68 1052 </p>
<table><tr><td>&nbsp;</td><td><pre class="example">{
  const char *str = gettext (&quot;Hello world&quot;);
  while (&hellip;)
    {
      puts (str);
    }
}
jpayne@68 1060 </pre></td></tr></table>
jpayne@68 1061
<p>But this solution is not usable in all situations (e.g. when the locale
selection changes), nor does it lead to legible code.
jpayne@68 1064 </p>
jpayne@68 1065 <p>For this reason, GNU <code>gettext</code> caches previous translation results.
jpayne@68 1066 When the same translation is requested twice, with no new message
jpayne@68 1067 catalogs being loaded in between, <code>gettext</code> will, the second time,
jpayne@68 1068 find the result through a single cache lookup.
jpayne@68 1069 </p>
jpayne@68 1070
jpayne@68 1071 <a name="Comparison"></a>
jpayne@68 1072 <a name="SEC209"></a>
jpayne@68 1073 <h2 class="section"> <a href="gettext_toc.html#TOC202">11.3 Comparing the Two Interfaces</a> </h2>
jpayne@68 1074
jpayne@68 1075
<p>The following discussion is perhaps a little biased. As said
above, we implemented GNU <code>gettext</code> following the Uniforum
proposal, and this surely has its reasons. But it should show how we
came to this decision.
jpayne@68 1080 </p>
<p>First we take a look at the development process. When we write an
application using NLS provided by <code>gettext</code> we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use <code>gettext(&quot;&hellip;&quot;)</code> instead of
<code>&quot;&hellip;&quot;</code>. At the beginning of each source file (or in a central
header file) we define
jpayne@68 1087 </p>
jpayne@68 1088 <table><tr><td>&nbsp;</td><td><pre class="example">#define gettext(String) (String)
jpayne@68 1089 </pre></td></tr></table>
jpayne@68 1090
jpayne@68 1091 <p>Even this definition can be avoided when the system supports the
jpayne@68 1092 <code>gettext</code> function in its C library. When we compile this code the
jpayne@68 1093 result is the same as if no NLS code were used. When you take a look at
jpayne@68 1094 the GNU <code>gettext</code> code you will see that we use <code>_(&quot;&hellip;&quot;)</code>
jpayne@68 1095 instead of <code>gettext(&quot;&hellip;&quot;)</code>. This reduces the number of
jpayne@68 1096 additional characters per translatable string to <em>3</em> (in words:
jpayne@68 1097 three).
jpayne@68 1098 </p>
jpayne@68 1099 <p>When a production version of the program is needed, we simply replace
jpayne@68 1100 the definition
jpayne@68 1101 </p>
jpayne@68 1102 <table><tr><td>&nbsp;</td><td><pre class="example">#define _(String) (String)
jpayne@68 1103 </pre></td></tr></table>
jpayne@68 1104
jpayne@68 1105 <p>by
jpayne@68 1106 </p>
jpayne@68 1107 <a name="IDX1076"></a>
jpayne@68 1108 <table><tr><td>&nbsp;</td><td><pre class="example">#include &lt;libintl.h&gt;
jpayne@68 1109 #define _(String) gettext (String)
jpayne@68 1110 </pre></td></tr></table>
jpayne@68 1111
jpayne@68 1112 <p>Additionally we run the program &lsquo;<tt>xgettext</tt>&rsquo; on all source code files
jpayne@68 1113 which contain translatable strings and that's it: we have a running
jpayne@68 1114 program which does not depend on translations being available, but which
jpayne@68 1115 can use any that become available.
jpayne@68 1116 </p>
jpayne@68 1117 <a name="IDX1077"></a>
jpayne@68 1118 <p>The same procedure can be done for the <code>gettext_noop</code> invocations
jpayne@68 1119 (see section <a href="gettext_4.html#SEC31">Special Cases of Translatable Strings</a>). One usually defines <code>gettext_noop</code> as a
jpayne@68 1120 no-op macro. So you should consider the following code for your project:
jpayne@68 1121 </p>
jpayne@68 1122 <table><tr><td>&nbsp;</td><td><pre class="example">#define gettext_noop(String) String
jpayne@68 1123 #define N_(String) gettext_noop (String)
jpayne@68 1124 </pre></td></tr></table>
jpayne@68 1125
jpayne@68 1126 <p><code>N_</code> is a short form similar to <code>_</code>. The &lsquo;<tt>Makefile</tt>&rsquo; in
jpayne@68 1127 the &lsquo;<tt>po/</tt>&rsquo; directory of GNU <code>gettext</code> knows by default both of the
jpayne@68 1128 mentioned short forms so you are invited to follow this proposal for
jpayne@68 1129 your own ease.
jpayne@68 1130 </p>
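<p>As a rough illustration of how the two short forms work together, the
sketch below marks strings in a static table with <code>N_</code> and
translates them at the point of use with <code>_</code>. The names
<code>status_names</code> and <code>status_label</code> are hypothetical and
not part of GNU <code>gettext</code>; only the macro definitions are taken
from the text above.
</p>

```c
#include <libintl.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Static initializers cannot call functions, so these strings are only
   marked for extraction here; N_ expands to the plain string literal.  */
static const char *const status_names[] = {
  N_("ready"),
  N_("running"),
  N_("finished")
};

/* Hypothetical helper: translate a status name at the point of use.
   When no translation is available, gettext returns its argument.  */
const char *
status_label (unsigned int status)
{
  if (status >= sizeof status_names / sizeof status_names[0])
    return _("unknown");
  return _(status_names[status]);
}
```

<p>A caller would typically print the result, for example with
<code>puts (status_label (1))</code>. Without a loaded catalog this yields
the untranslated string.
</p>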
jpayne@68 1131 <p>Now to <code>catgets</code>. The main problem is the work for the
jpayne@68 1132 programmer. Every time he comes to a translatable string he has to
jpayne@68 1133 define a number (or a symbolic constant) which also has to be defined in
jpayne@68 1134 the message catalog file. He also has to take care of duplicate
jpayne@68 1135 entries, duplicate message IDs etc. If he wants to have the same
jpayne@68 1136 quality in the message catalog as the GNU <code>gettext</code> program
jpayne@68 1137 provides he also has to put the descriptive comments for the strings and
jpayne@68 1138 the location in all source code files in the message catalog. This is
jpayne@68 1139 nearly a Mission: Impossible.
jpayne@68 1140 </p>
jpayne@68 1141 <p>But there are also some points people might call advantages speaking for
jpayne@68 1142 <code>catgets</code>. If you have a single word in a string and this string
jpayne@68 1143 is used in different contexts it is likely that in one or the other
jpayne@68 1144 language the word has different translations. Example:
jpayne@68 1145 </p>
jpayne@68 1146 <table><tr><td>&nbsp;</td><td><pre class="example">printf (&quot;%s: %d&quot;, gettext (&quot;number&quot;), number_of_errors);
jpayne@68 1147
jpayne@68 1148 printf (&quot;you should see %d %s&quot;, number_count,
jpayne@68 1149         number_count == 1 ? gettext (&quot;number&quot;) : gettext (&quot;numbers&quot;));
jpayne@68 1150 </pre></td></tr></table>
jpayne@68 1151
jpayne@68 1152 <p>Here the string <code>&quot;number&quot;</code> has to be translated twice. Even
jpayne@68 1153 if you do not speak a language besides English it might be possible to
jpayne@68 1154 recognize that the two words have a different meaning. In German the
jpayne@68 1155 first appearance has to be translated to <code>&quot;Anzahl&quot;</code> and the second
jpayne@68 1156 to <code>&quot;Zahl&quot;</code>.
jpayne@68 1157 </p>
jpayne@68 1158 <p>Now you can say that this example is really esoteric. And you are
jpayne@68 1159 right! This is exactly how we felt about this problem and decided that
jpayne@68 1160 it does not weigh that much. The solution for the above problem could
jpayne@68 1161 be very easy:
jpayne@68 1162 </p>
jpayne@68 1163 <table><tr><td>&nbsp;</td><td><pre class="example">printf (&quot;%s %d&quot;, gettext (&quot;number:&quot;), number_of_errors);
jpayne@68 1164
jpayne@68 1165 printf (number_count == 1 ? gettext (&quot;you should see %d number&quot;)
jpayne@68 1166                           : gettext (&quot;you should see %d numbers&quot;),
jpayne@68 1167         number_count);
jpayne@68 1168 </pre></td></tr></table>
jpayne@68 1169
jpayne@68 1170 <p>We believe that we can solve all conflicts with this method. If it is
jpayne@68 1171 difficult one can also consider changing one of the conflicting strings a
jpayne@68 1172 little bit. But it is not impossible to overcome.
jpayne@68 1173 </p>
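<p>For the singular/plural choice in particular, <code>libintl.h</code> also
declares <code>ngettext</code>, which lets the message catalog select the
right form for the target language instead of hard-coding the English
<code>== 1</code> test. The following is only a sketch: the helper name
<code>format_number_count</code> and its buffer-based interface are
assumptions made for illustration.
</p>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format the message into BUF, letting ngettext
   pick the singular or plural format string for NUMBER_COUNT.
   Returns what snprintf returns.  */
int
format_number_count (char *buf, size_t size, unsigned long number_count)
{
  const char *format = ngettext ("you should see %lu number",
                                 "you should see %lu numbers",
                                 number_count);
  return snprintf (buf, size, format, number_count);
}
```

<p>When no catalog is loaded, <code>ngettext</code> falls back to the first
string for a count of 1 and to the second string otherwise.
</p>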
jpayne@68 1174 <p><code>catgets</code> allows the same original entry to have different translations,
jpayne@68 1175 but <code>gettext</code> has another, scalable approach for solving ambiguities
jpayne@68 1176 of this kind: See section <a href="#SEC203">Solving Ambiguities</a>.
jpayne@68 1177 </p>
jpayne@68 1178
jpayne@68 1179 <a name="Using-libintl_002ea"></a>
jpayne@68 1180 <a name="SEC210"></a>
jpayne@68 1181 <h2 class="section"> <a href="gettext_toc.html#TOC203">11.4 Using libintl.a in own programs</a> </h2>
jpayne@68 1182
jpayne@68 1183 <p>Starting with version 0.9.4 the library <code>libintl.a</code> should be
jpayne@68 1184 self-contained. I.e., you can use it in your own programs without
jpayne@68 1185 providing additional functions. The &lsquo;<tt>Makefile</tt>&rsquo; will put the header
jpayne@68 1186 and the library in directories selected using the <code>$(prefix)</code>.
jpayne@68 1187 </p>
jpayne@68 1188
jpayne@68 1189 <a name="gettext-grok"></a>
jpayne@68 1190 <a name="SEC211"></a>
jpayne@68 1191 <h2 class="section"> <a href="gettext_toc.html#TOC204">11.5 Being a <code>gettext</code> grok</a> </h2>
jpayne@68 1192
jpayne@68 1193 <p><strong> NOTE: </strong> This documentation section is outdated and needs to be
jpayne@68 1194 revised.
jpayne@68 1195 </p>
jpayne@68 1196 <p>To fully exploit the functionality of the GNU <code>gettext</code> library it
jpayne@68 1197 is surely helpful to read the source code. But for those who don't want
jpayne@68 1198 to spend that much time in reading the (sometimes complicated) code here
jpayne@68 1199 is a list of comments:
jpayne@68 1200 </p>
jpayne@68 1201 <ul>
jpayne@68 1202 <li> Changing the language at runtime
jpayne@68 1203 <a name="IDX1078"></a>
jpayne@68 1204
jpayne@68 1205 <p>For interactive programs it might be useful to offer a selection of the
jpayne@68 1206 used language at runtime. To understand how to do this one needs to know
jpayne@68 1207 how the used language is determined while executing the <code>gettext</code>
jpayne@68 1208 function. The method which is presented here only works correctly
jpayne@68 1209 with the GNU implementation of the <code>gettext</code> functions.
jpayne@68 1210 </p>
jpayne@68 1211 <p>In the function <code>dcgettext</code> at every call the current setting of
jpayne@68 1212 the highest priority environment variable is determined and used.
jpayne@68 1213 Highest priority means here the following list with decreasing
jpayne@68 1214 priority:
jpayne@68 1215 </p>
jpayne@68 1216 <ol>
jpayne@68 1217 <li><a name="IDX1079"></a>
jpayne@68 1218 <code>LANGUAGE</code>
jpayne@68 1219 <a name="IDX1080"></a>
jpayne@68 1220 </li><li> <code>LC_ALL</code>
jpayne@68 1221 <a name="IDX1081"></a>
jpayne@68 1222 <a name="IDX1082"></a>
jpayne@68 1223 <a name="IDX1083"></a>
jpayne@68 1224 <a name="IDX1084"></a>
jpayne@68 1225 <a name="IDX1085"></a>
jpayne@68 1226 <a name="IDX1086"></a>
jpayne@68 1227 </li><li> <code>LC_xxx</code>, according to selected locale category
jpayne@68 1228 <a name="IDX1087"></a>
jpayne@68 1229 </li><li> <code>LANG</code>
jpayne@68 1230 </li></ol>
jpayne@68 1231
jpayne@68 1232 <p>Afterwards the path is constructed using the found value and the
jpayne@68 1233 translation file is loaded if available.
jpayne@68 1234 </p>
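<p>The priority list above can be sketched as a small lookup over the
environment. This is a simplified illustration only, not the code used inside
<code>dcgettext</code>: the function name <code>first_locale_setting</code>
is hypothetical, and the sketch ignores details such as
<code>LANGUAGE</code> holding a colon-separated list of languages.
</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the value of the highest-priority
   variable that is set and non-empty, checking LANGUAGE, LC_ALL,
   the category variable (for example "LC_MESSAGES"), and LANG in
   that order.  Returns NULL when none of them is set.  */
const char *
first_locale_setting (const char *category_var)
{
  const char *names[4];
  size_t i;

  names[0] = "LANGUAGE";
  names[1] = "LC_ALL";
  names[2] = category_var;
  names[3] = "LANG";

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```

<p>Calling <code>first_locale_setting (&quot;LC_MESSAGES&quot;)</code> then
reports which setting would win for message translation under this
simplified model.
</p>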
jpayne@68 1235 <p>What happens now when the value for, say, <code>LANGUAGE</code> changes? According
jpayne@68 1236 to the process explained above the new value of this variable is found
jpayne@68 1237 as soon as the <code>dcgettext</code> function is called. But this also means
jpayne@68 1238 the (perhaps) different message catalog file is loaded. In other
jpayne@68 1239 words: the used language is changed.
jpayne@68 1240 </p>
jpayne@68 1241 <p>But there is one little catch. The code for gcc-2.7.0 and up provides
jpayne@68 1242 some optimization. This optimization normally prevents the calling of
jpayne@68 1243 the <code>dcgettext</code> function as long as no new catalog is loaded. But
jpayne@68 1244 if <code>dcgettext</code> is not called the program also cannot notice that the
jpayne@68 1245 <code>LANGUAGE</code> variable has changed (see section <a href="#SEC208">Optimization of the *gettext functions</a>). The
jpayne@68 1246 solution for this is very easy. Include the following code in the
jpayne@68 1247 language switching function.
jpayne@68 1248 </p>
jpayne@68 1249 <table><tr><td>&nbsp;</td><td><pre class="example"> /* Change language. */
jpayne@68 1250 setenv (&quot;LANGUAGE&quot;, &quot;fr&quot;, 1);
jpayne@68 1251
jpayne@68 1252 /* Make change known. */
jpayne@68 1253 {
jpayne@68 1254 extern int _nl_msg_cat_cntr;
jpayne@68 1255 ++_nl_msg_cat_cntr;
jpayne@68 1256 }
jpayne@68 1257 </pre></td></tr></table>
jpayne@68 1258
jpayne@68 1259 <a name="IDX1088"></a>
jpayne@68 1260 <p>The variable <code>_nl_msg_cat_cntr</code> is defined in &lsquo;<tt>loadmsgcat.c</tt>&rsquo;.
jpayne@68 1261 You don't need to know what this is for. But it can be used to detect
jpayne@68 1262 whether a <code>gettext</code> implementation is GNU gettext and not a non-GNU
jpayne@68 1263 system's native gettext implementation.
jpayne@68 1264 </p>
jpayne@68 1265 </li></ul>
jpayne@68 1266
jpayne@68 1267
jpayne@68 1268 <a name="Temp-Programmers"></a>
jpayne@68 1269 <a name="SEC212"></a>
jpayne@68 1270 <h2 class="section"> <a href="gettext_toc.html#TOC205">11.6 Temporary Notes for the Programmers Chapter</a> </h2>
jpayne@68 1271
jpayne@68 1272 <p><strong> NOTE: </strong> This documentation section is outdated and needs to be
jpayne@68 1273 revised.
jpayne@68 1274 </p>
jpayne@68 1275
jpayne@68 1276
jpayne@68 1277 <a name="Temp-Implementations"></a>
jpayne@68 1278 <a name="SEC213"></a>
jpayne@68 1279 <h3 class="subsection"> <a href="gettext_toc.html#TOC206">11.6.1 Temporary - Two Possible Implementations</a> </h3>
jpayne@68 1280
jpayne@68 1281 <p>There are two competing methods for language independent messages:
jpayne@68 1282 the X/Open <code>catgets</code> method, and the Uniforum <code>gettext</code>
jpayne@68 1283 method. The <code>catgets</code> method indexes messages by integers; the
jpayne@68 1284 <code>gettext</code> method indexes them by their original English strings.
jpayne@68 1285 The <code>catgets</code> method has been around longer and is supported
jpayne@68 1286 by more vendors. The <code>gettext</code> method is supported by Sun,
jpayne@68 1287 and it has been heard that the COSE multi-vendor initiative is
jpayne@68 1288 supporting it. Neither method is a POSIX standard; the POSIX.1
jpayne@68 1289 committee had a lot of disagreement in this area.
jpayne@68 1290 </p>
jpayne@68 1291 <p>Neither one is in the POSIX standard. There was much disagreement
jpayne@68 1292 in the POSIX.1 committee about using the <code>gettext</code> routines
jpayne@68 1293 vs. <code>catgets</code> (XPG). In the end the committee couldn't
jpayne@68 1294 agree on anything, so no messaging system was included as part
jpayne@68 1295 of the standard. I believe the informative annex of the standard
jpayne@68 1296 includes the XPG3 messaging interfaces, &ldquo;&hellip;as an example of
jpayne@68 1297 a messaging system that has been implemented&hellip;&rdquo;
jpayne@68 1298 </p>
jpayne@68 1299 <p>They were very careful not to say anywhere that you should use one
jpayne@68 1300 set of interfaces over the other. For more on this topic please
jpayne@68 1301 see the Programming for Internationalization FAQ.
jpayne@68 1302 </p>
jpayne@68 1303
jpayne@68 1304 <a name="Temp-catgets"></a>
jpayne@68 1305 <a name="SEC214"></a>
jpayne@68 1306 <h3 class="subsection"> <a href="gettext_toc.html#TOC207">11.6.2 Temporary - About <code>catgets</code></a> </h3>
jpayne@68 1307
jpayne@68 1308 <p>There have been a few discussions of late on the use of
jpayne@68 1309 <code>catgets</code> as a base. I think it important to present both
jpayne@68 1310 sides of the argument and hence am opting to play devil's advocate
jpayne@68 1311 for a little bit.
jpayne@68 1312 </p>
jpayne@68 1313 <p>I'll not deny the fact that <code>catgets</code> could have been designed
jpayne@68 1314 a lot better. It currently has quite a number of limitations and
jpayne@68 1315 these have already been pointed out.
jpayne@68 1316 </p>
jpayne@68 1317 <p>However there is a great deal to be said for consistency and
jpayne@68 1318 standardization. A common recurring problem when writing Unix
jpayne@68 1319 software is the myriad portability problems across Unix platforms.
jpayne@68 1320 It seems as if every Unix vendor had a look at the operating system
jpayne@68 1321 and found parts they could improve upon. Undoubtedly, these
jpayne@68 1322 modifications are probably innovative and solve real problems.
jpayne@68 1323 However, software developers have a hard time keeping up with all
jpayne@68 1324 these changes across so many platforms.
jpayne@68 1325 </p>
jpayne@68 1326 <p>And this has prompted the Unix vendors to begin to standardize their
jpayne@68 1327 systems. Hence the impetus for Spec1170. Every major Unix vendor
jpayne@68 1328 has committed to supporting this standard and every Unix software
jpayne@68 1329 developer awaits with glee the day they can write software to this
jpayne@68 1330 standard and simply recompile (without having to use autoconf)
jpayne@68 1331 across different platforms.
jpayne@68 1332 </p>
jpayne@68 1333 <p>As I understand it, Spec1170 is roughly based upon version 4 of the
jpayne@68 1334 X/Open Portability Guidelines (XPG4). Because <code>catgets</code> and
jpayne@68 1335 friends are defined in XPG4, I'm led to believe that <code>catgets</code>
jpayne@68 1336 is a part of Spec1170 and hence will become a standardized component
jpayne@68 1337 of all Unix systems.
jpayne@68 1338 </p>
jpayne@68 1339
jpayne@68 1340 <a name="Temp-WSI"></a>
jpayne@68 1341 <a name="SEC215"></a>
jpayne@68 1342 <h3 class="subsection"> <a href="gettext_toc.html#TOC208">11.6.3 Temporary - Why a single implementation</a> </h3>
jpayne@68 1343
jpayne@68 1344 <p>Now it seems kind of wasteful to me to have two different systems
jpayne@68 1345 installed for accessing message catalogs. If we do want to remedy
jpayne@68 1346 <code>catgets</code>' deficiencies why don't we try to expand <code>catgets</code>
jpayne@68 1347 (in a compatible manner) rather than implement an entirely new system?
jpayne@68 1348 Otherwise, we'll end up with two message catalog access systems installed
jpayne@68 1349 with an operating system - one set of routines for packages using GNU
jpayne@68 1350 <code>gettext</code> for their internationalization, and another set of routines
jpayne@68 1351 (catgets) for all other software. Bloated?
jpayne@68 1352 </p>
jpayne@68 1353 <p>Supposing another catalog access system is implemented. Which do
jpayne@68 1354 we recommend? At least for Linux, we need to attract as many
jpayne@68 1355 software developers as possible. Hence we need to make it as easy
jpayne@68 1356 for them to port their software as possible. Which means supporting
jpayne@68 1357 <code>catgets</code>. We will be implementing the <code>libintl</code> code
jpayne@68 1358 within our <code>libc</code>, but does this mean we also have to incorporate
jpayne@68 1359 another message catalog access scheme within our <code>libc</code> as well?
jpayne@68 1360 And what about people who are going to be using the <code>libintl</code>
jpayne@68 1361 + non-<code>catgets</code> routines? When they port their software to
jpayne@68 1362 other platforms, they're now going to have to include the front-end
jpayne@68 1363 (<code>libintl</code>) code plus the back-end code (the non-<code>catgets</code>
jpayne@68 1364 access routines) with their software instead of just including the
jpayne@68 1365 <code>libintl</code> code with their software.
jpayne@68 1366 </p>
jpayne@68 1367 <p>Message catalog support is however only the tip of the iceberg.
jpayne@68 1368 What about the data for the other locale categories? They also have
jpayne@68 1369 a number of deficiencies. Are we going to abandon them as well and
jpayne@68 1370 develop another duplicate set of routines (should <code>libintl</code>
jpayne@68 1371 expand beyond message catalog support)?
jpayne@68 1372 </p>
jpayne@68 1373 <p>Like many parts of Unix that can be improved upon, we're stuck with balancing
jpayne@68 1374 compatibility with the past with useful improvements and innovations for
jpayne@68 1375 the future.
jpayne@68 1376 </p>
jpayne@68 1377
jpayne@68 1378 <a name="Temp-Notes"></a>
jpayne@68 1379 <a name="SEC216"></a>
jpayne@68 1380 <h3 class="subsection"> <a href="gettext_toc.html#TOC209">11.6.4 Temporary - Notes</a> </h3>
jpayne@68 1381
jpayne@68 1382 <p>X/Open agreed very late on the standard form so that many
jpayne@68 1383 implementations differ from the final form. Both of my systems (old
jpayne@68 1384 Linux catgets and Ultrix-4) have a strange variation.
jpayne@68 1385 </p>
jpayne@68 1386 <p>OK. After incorporating the last changes I have to spend some time on
jpayne@68 1387 making the GNU/Linux <code>libc</code> <code>gettext</code> functions. So in the future
jpayne@68 1388 Solaris will not be the only system having <code>gettext</code>.
jpayne@68 1389 </p>
jpayne@68 1390
jpayne@68 1404 <p>
jpayne@68 1405 <font size="-1">
jpayne@68 1406 This document was generated by <em>Bruno Haible</em> on <em>February, 21 2024</em> using <a href="https://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>.
jpayne@68 1407 </font>
jpayne@68 1408 <br>
jpayne@68 1409
jpayne@68 1410 </p>
jpayne@68 1411 </body>
jpayne@68 1412 </html>