For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify, adjust and mark all constant strings in your program needing translation.

Importing the gettext declaration

Presuming that your set of programs, or package, has been adjusted so all needed GNU gettext files are available, and your ‘Makefile’ files are adjusted (see section The Maintainer's View), each C module having translated C strings should contain the line:

```c
#include <libintl.h>
```

Similarly, each C module containing printf()/fprintf()/... calls with a format string that could be a translated C string (even if the C string comes from a different C module) should contain the line:

```c
#include <libintl.h>
```

Triggering gettext Operations

The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:

```c
int
main (int argc, char *argv[])
{
  …
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  …
}
```

PACKAGE and LOCALEDIR should be provided either by ‘config.h’ or by the Makefile. For now consult the gettext or hello sources for more information.

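A minimal sketch of this initialization as a reusable helper follows. It assumes a GNU/Linux system, where gettext, bindtextdomain and textdomain are part of the C library (on other systems one typically links with -lintl). The name init_i18n and the fallback values of PACKAGE and LOCALEDIR are hypothetical placeholders; in a real package they come from ‘config.h’ or the Makefile. The sketch also checks the NULL return values that bindtextdomain and textdomain use to report failure.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks; a real package gets these from config.h or
   from the Makefile (e.g. via -DLOCALEDIR=...).  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Performs the usual gettext initialization.  Returns 0 on success and
   -1 if the message domain could not be set up.  A failing setlocale is
   deliberately ignored: the program then keeps running in the "C"
   locale and shows untranslated messages.  */
static int
init_i18n (void)
{
  (void) setlocale (LC_ALL, "");
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return -1;
  if (textdomain (PACKAGE) == NULL)
    return -1;
  return 0;
}
```

A program would call init_i18n () once at the start of main, before any message is output. Ignoring a failing setlocale is a choice of this sketch, not a requirement.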
The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories and especially LC_CTYPE. This latter category is responsible for determining character classes with the isalnum etc. functions from ‘ctype.h’, which could be wrong, especially for programs that process some kind of input language. For example, this would mean that source code using the ç (c-cedilla) character is runnable in France but not in the U.S.

Some systems also have problems parsing numbers with the scanf functions if LC_ALL, and thereby LC_NUMERIC, is set to a locale other than "C". The standards say that formats in addition to the one known in the "C" locale might be recognized. But some systems seem to reject numbers in the "C" locale format. In some situations, the notation itself can make it impossible to recognize whether a number is in the "C" locale format or in the local format. This can happen if thousands separator characters are used: some locales, following their national conventions, define this character as '.', which is the same character the "C" locale uses to denote the decimal point.

So it is sometimes necessary to replace the LC_ALL line in the code above by a sequence of setlocale lines:

```c
{
  …
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  …
}
```

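The effect of this narrower initialization can be sketched as follows, assuming a POSIX system; the function names are hypothetical. Every C program starts in the "C" locale, and only the two named categories are switched to the user's environment, so LC_NUMERIC stays "C" and strtod keeps treating '.' as the decimal point regardless of the user's national conventions.

```c
#include <locale.h>
#include <libintl.h>  /* GNU gettext's version supplies LC_MESSAGES where
                         the system lacks it */
#include <stdlib.h>

/* Honors the user's environment for character classes and message
   translation only; all other categories keep the "C" locale.  */
static void
init_locale_selectively (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Since LC_NUMERIC was not touched, this always expects "C" notation.  */
static double
parse_number (const char *text)
{
  return strtod (text, NULL);
}
```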
On all POSIX conformant systems the locale categories LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some systems which are only ISO C compliant, LC_MESSAGES is missing, but a substitute for it is defined in GNU gettext's <libintl.h> and in GNU gnulib's <locale.h>.

Note that changing LC_CTYPE also affects the functions declared in the <ctype.h> standard header and some functions declared in the <string.h> and <stdlib.h> standard headers. If this is not desirable in your application (for example in a compiler's parser), you can use a set of substitute functions which hardwire the C locale, such as those found in the modules ‘c-ctype’, ‘c-strcase’, ‘c-strcasestr’, ‘c-strtod’, ‘c-strtold’ in the GNU gnulib source distribution.

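To show what hardwiring the C locale amounts to, here is a hypothetical classification helper in the spirit of the ‘c-ctype’ module; it is not gnulib's code or API. It assumes an ASCII-compatible execution character set, where the digits and the upper- and lower-case letters each form contiguous ranges, and it never consults LC_CTYPE.

```c
#include <stdbool.h>

/* Locale-independent replacement for isalnum: only the ASCII digits and
   letters count, whatever the current LC_CTYPE says.  */
static bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```

In an ISO-8859-1 French locale, isalnum (0xE7) may return true for ‘ç’, whereas ascii_isalnum (0xE7) returns false in every locale, which is usually what a parser for a fixed input language wants.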
It is also possible to switch the locale back and forth between the environment-dependent locale and the C locale, but this approach is normally avoided because a setlocale call is expensive, because it is tedious to determine the places where a locale switch is needed in a large program's source, and because switching a locale is not multithread-safe.

Before strings can be marked for translation, they sometimes need to be adjusted. Usually preparing a string for translation is done right before marking it, during the marking phase which is described in the next sections. What you have to keep in mind while doing that is the following.

Let's look at some examples of these guidelines.

Translatable strings should be in good English style. If slang language with abbreviations and shortcuts is used, often translators will not understand the message and will produce very inappropriate translations.

```c
"%s: is parameter\n"
```

This is nearly untranslatable: Is the displayed item a parameter or the parameter?

```c
"No match"
```

The ambiguity in this message makes it unintelligible: Is the program attempting to set something on fire? Does it mean "The given object does not match the template"? Does it mean "The template does not fit for any of the objects"?

In both cases, adding more words to the message will help both the translator and the English speaking user.

Translatable strings should be entire sentences. It is often not possible to translate single verbs or adjectives in a substitutable way.

```c
printf ("File %s is %s protected", filename, rw ? "write" : "read");
```

Most translators will not look at the source and will thus only see the string "File %s is %s protected", which is unintelligible. Change this to

```c
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
```

This way the translator will not only understand the message, she will also be able to find the appropriate grammatical construction. A French translator for example translates "write protected" like "protected against writing".

Entire sentences are also important because in many languages, the declension of some words in a sentence depends on the gender or the number (singular/plural) of another part of the sentence. There are usually more interdependencies between words than in English. The consequence is that asking a translator to translate two half-sentences and then combining these two half-sentences through dumb string concatenation will not work, for many languages, even though it would work for English. That's why translators need to handle entire sentences.

Often sentences don't fit into a single line. If a sentence is output using two subsequent printf statements, like this

```c
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
```

the translator would have to translate two half sentences, but nothing in the POT file would tell her that the two half sentences belong together. It is necessary to merge the two printf statements so that the translator can handle the entire sentence at once and decide at which place to insert a line break in the translation (if at all):

```c
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
```

You may now ask: how about two or more adjacent sentences? Like in this case:

```c
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
```

Should these two statements be merged into a single one? I would recommend merging them if the two sentences are related to each other, because that makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do the translator a favour by not merging the two. (Identical messages occurring in several places are combined by xgettext, so the translator has to handle them once only.)

Translatable strings should be limited to one paragraph; don't let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn't see that (with the current translation tools), therefore she has to proofread the entire message.

Many GNU programs have a ‘--help’ output that extends over several screen pages. It is a courtesy towards the translators to split such a message into several ones of five to ten lines each. While doing that, you can also attempt to split the documented options into groups, such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for.

Hardcoded string concatenation is sometimes used to construct English strings:

```c
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
```

In order to present to the translator only entire sentences, and also because in some languages the translator might want to swap the order of object1 and object2, it is necessary to change this to use a format string:

```c
sprintf (s, "Replace %s with %s?", object1, object2);
```

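Here is a sketch of the format-string version wrapped in a function, assuming glibc's gettext, which returns its argument unchanged when no translation is available. The name replace_prompt is hypothetical, and snprintf is used instead of sprintf so that the buffer size is honored. A translator who needs the objects in the opposite order can write a translation with positional directives such as %2$s and %1$s; the calling code does not change.

```c
#include <libintl.h>
#include <stdio.h>

/* Builds the question from one complete, translatable sentence.
   Returns the number of characters snprintf wanted to write.  */
static int
replace_prompt (char *buf, size_t size,
                const char *object1, const char *object2)
{
  return snprintf (buf, size, gettext ("Replace %s with %s?"),
                   object1, object2);
}
```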
A similar case is compile-time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an ‘int64_t’ integer through printf. It expands to a constant string, usually "d" or "ld" or "lld" or something like this, depending on the platform. Assume you have code like

```c
printf ("The amount is %0" PRId64 "\n", number);
```

The gettext tools and library have special support for these <inttypes.h> macros. You can therefore simply write

```c
printf (gettext ("The amount is %0" PRId64 "\n"), number);
```

The PO file will contain the string "The amount is %0<PRId64>\n". The translators will provide a translation containing "%0<PRId64>" as well, and at runtime the gettext function's result will contain the appropriate constant string, "d" or "ld" or "lld".

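A compilable sketch of the PRId64 variant, assuming a C99 compiler and glibc's gettext; format_amount is a hypothetical name. With no catalog installed, gettext returns the msgid, so the result is the English text with the platform's length modifier already spliced in.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* PRId64 is concatenated into the msgid at compile time; xgettext
   records it as "%0<PRId64>" in the PO file.  */
static int
format_amount (char *buf, size_t size, int64_t number)
{
  return snprintf (buf, size, gettext ("The amount is %0" PRId64 "\n"),
                   number);
}
```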
This works only for the predefined <inttypes.h> macros. If you have defined your own similar macros, let's say ‘MYPRId64’, that are not known to xgettext, the solution for this problem is to change the code like this:

```c
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
```

This means you put the platform-dependent code in one statement, and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128 bit integer one needs at most 54 characters, regardless whether in decimal, octal or hexadecimal.

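The two-statement approach can be sketched as below. MYPRId64 is the hypothetical macro from the text; here it is simply defined as PRId64 so that the example compiles, and format_amount_portably is likewise a made-up name. The translatable message only ever sees a plain %s.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project macro that xgettext does not know about.  */
#define MYPRId64 PRId64

/* Platform-dependent formatting uses an untranslated format string;
   the translatable message receives the result as a string.  */
static int
format_amount_portably (char *out, size_t size, int64_t number)
{
  char buf1[100];   /* ample: an int64_t needs at most 20 characters */

  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```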
All this applies to other programming languages as well. For example, in Java and C#, string concatenation is very frequently used, because it is a compiler built-in operator. As in C, in Java you would change

```java
System.out.println("Replace "+object1+" with "+object2+"?");
```

into a statement involving a format string:

```java
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
```

Similarly, in C#, you would change

```csharp
Console.WriteLine("Replace "+object1+" with "+object2+"?");
```

into a statement involving a format string:

```csharp
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
```

It is good to not embed URLs in translatable strings, for several reasons:

The same holds for email addresses.

So, you would change

```c
fputs (_("GNU GPL version 3 <https://gnu.org/licenses/gpl.html>\n"),
       stream);
```

to

```c
fprintf (stream, _("GNU GPL version 3 <%s>\n"),
         "https://gnu.org/licenses/gpl.html");
```

The GNU C Library's <printf.h> facility and the C++ standard library's <format> header file make it possible for the programmer to define their own format string directives. However, such format directives cannot be used in translatable strings, for two reasons:

To avoid this situation, you need to move the formatting with the custom directive into a format string that does not get translated.

For example, assuming code that makes use of a %r directive:

```c
fprintf (stream, _("The contents is: %r"), data);
```

you would rewrite it to:

```c
char *tmp;
if (asprintf (&tmp, "%r", data) < 0)
  error (...);
fprintf (stream, _("The contents is: %s"), tmp);
free (tmp);
```

Similarly, in C++, assuming you have defined a custom formatter for the type of data, the code

```cpp
cout << format (_("The contents is: {:#$#}"), data);
```

should be rewritten to the following (vformat is needed because the translated format string is not a constant expression):

```cpp
string tmp = format ("{:#$#}", data);
cout << vformat (_("The contents is: {}"), make_format_args (tmp));
```

Unusual markup or control characters should not be used in translatable strings. Translators will likely not understand the particular meaning of the markup or control characters.

For example, if you have a convention that ‘|’ delimits the left-hand and right-hand part of some GUI elements, translators will often not understand it without specific comments. It might be better to have the translator translate the left-hand and right-hand part separately.

Another example is the ‘argp’ convention to use a single ‘\v’ (vertical tab) control character to delimit two sections inside a string. This is flawed. Some translators may convert it to a simple newline, some to blank lines. With some PO file editors it may not be easy to even enter a vertical tab control character. So, you cannot be sure that the translation will contain a ‘\v’ character at the corresponding position. The solution is, again, to let the translator translate two separate strings and combine at run-time the two translated strings with the ‘\v’ required by the convention.

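Combining two separately translated strings at run time might look like the following sketch. The helper names and the two message texts are hypothetical; only the single '\v' delimiter comes from the argp convention described above. The translators see two ordinary sentences and never have to type a vertical tab.

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Joins BEFORE and AFTER with the '\v' delimiter.  Returns a malloc'ed
   string, or NULL if memory is exhausted.  */
static char *
join_with_vtab (const char *before, const char *after)
{
  size_t len1 = strlen (before);
  size_t len2 = strlen (after);
  char *result = malloc (len1 + 1 + len2 + 1);

  if (result == NULL)
    return NULL;
  memcpy (result, before, len1);
  result[len1] = '\v';
  memcpy (result + len1 + 1, after, len2 + 1);  /* copies the NUL too */
  return result;
}

/* Hypothetical argp-style documentation string built from two
   independently translatable parts.  */
static char *
make_doc (void)
{
  return join_with_vtab (gettext ("Frobnicate the given files."),
                         gettext ("Exit status is 0 on success."));
}
```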
HTML markup, however, is common enough that it's probably ok to use in translatable strings. But please bear in mind that the GNU gettext tools don't verify that the translations are well-formed HTML.

All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has more uses. An obvious example is an error message produced by formatting. The format string needs translation, as well as some strings inserted through some ‘%s’ specification in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some ‘error_string_out()’ routine, say.

This marking operation has two goals. The first goal of marking is to trigger the retrieval of the translation at run time. The keyword is possibly resolved into a routine able to dynamically return the proper translation, as far as possible or wanted, for the argument string. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See section Special Cases of Translatable Strings.

The second goal of the marking operation is to help xgettext properly extract all translatable strings when it scans a set of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is ‘gettext’; it gave its name to the whole GNU gettext package. For packages making only light use of the ‘gettext’ keyword, macro or function, it is easily used as is. However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually neither want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns.

Many packages use ‘_’ (a simple underline) as a keyword, and write ‘_("Translatable string")’ instead of ‘gettext ("Translatable string")’. Further, the GNU coding standards rule requiring a space between the keyword and the opening parenthesis is, in practice, relaxed for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly ‘gettext’ indeed. It is fairly easy for those wanting to use ‘_’ instead of ‘gettext’ to declare:

```c
#include <libintl.h>
#define _(String) gettext (String)
```

instead of merely using ‘#include <libintl.h>’.

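With this definition, marked strings read almost like plain literals. A minimal sketch, assuming glibc's gettext; greet and its message are hypothetical. Keep in mind that xgettext generally also has to be told to treat ‘_’ as a keyword, for example with --keyword=_.

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Formats a greeting through the short '_' keyword.  Without a catalog
   for the current locale, gettext returns the English msgid.  */
static int
greet (char *buf, size_t size, const char *name)
{
  return snprintf (buf, size, _("Hello, %s!"), name);
}
```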
The marking keywords ‘gettext’ and ‘_’ take the translatable string as sole argument. It is also possible to define marking functions that take it at another argument position. It is even possible to make the marked argument position depend on the total number of arguments of the function call; this is useful in C++. All this is achieved using xgettext's ‘--keyword’ option. How to pass such an option to xgettext, assuming that gettextize is used, is described in ‘Makevars’ in ‘po/’ and AM_XGETTEXT_OPTION in ‘po.m4’.

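As an illustration of a marking function that receives the translatable string at another argument position, here is a hypothetical log_message whose second argument is the msgid. For xgettext to extract that argument, it would be invoked with --keyword=log_message:2, using the keyword:argnum form of the ‘--keyword’ option.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical marking function: the translatable string is the SECOND
   argument.  Extraction requires:  xgettext --keyword=log_message:2  */
static int
log_message (char *buf, size_t size, int level, const char *msgid)
{
  return snprintf (buf, size, "[%d] %s", level, gettext (msgid));
}
```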
Note also that long strings can be split across lines, into multiple adjacent string tokens. Automatic string concatenation is performed at compile time according to ISO C and ISO C++; xgettext also supports this syntax.

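For example, in the following sketch (hypothetical function name, glibc's gettext assumed) the two adjacent literals form a single msgid at compile time, and xgettext extracts them as one string as well.

```c
#include <libintl.h>

/* One msgid, written as two adjacent string tokens.  */
static const char *
long_message (void)
{
  return gettext ("This message is long enough that it is split "
                  "across two source lines.");
}
```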
In C++, marking a C++ format string requires a small code change, because the first argument to std::format must be a constant expression. For example,

```cpp
std::format ("{} {}!", "Hello", "world")
```

needs to be changed to

```cpp
std::vformat (gettext ("{} {}!"), std::make_format_args("Hello", "world"))
```

Later on, the maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself if the new or altered string requires translation, and include it within ‘_()’ if you think it should be translated. For example, ‘"%s"’ is a string not requiring translation. But ‘"%s: %d"’ does require translation, because in French, unlike in English, it's customary to put a space before a colon.

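The distinction can be seen in a small sketch with hypothetical function names, assuming glibc's gettext. The bare "%s" is output as is, while "%s: %d" goes through the ‘_’ keyword so that a French catalog could supply "%s : %d".

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* "%s" carries no language-specific content: left unmarked.  */
static int
echo_text (char *buf, size_t size, const char *text)
{
  return snprintf (buf, size, "%s", text);
}

/* "%s: %d" depends on typographic conventions: marked.  */
static int
report (char *buf, size_t size, const char *label, int count)
{
  return snprintf (buf, size, _("%s: %d"), label, count);
}
```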
In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them work at marking translatable strings in the program sources, while simultaneously producing a set of translations in some language, for the package being internationalized.

The set of program sources targeted by the PO mode commands described here should have an Emacs tags table constructed for your project, prior to using these PO file commands. This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:

```sh
etags src/*.[hc] lib/*.[hc]
```

presuming here you want to process all ‘.h’ and ‘.c’ files from the ‘src/’ and ‘lib/’ directories. This command will explore all said files and create a ‘TAGS’ file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.

For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.

Once your ‘TAGS’ file is ready, the following commands assist the programmer at marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.

- ‘,’: Search through program sources for a string which looks like a candidate for translation (po-tags-search).
- ‘M-,’: Mark the last string found with ‘_()’ (po-mark-translatable).
- ‘M-.’: Mark the last string found with a keyword taken from a set of possible keywords. This command with a prefix allows some management of these keywords (po-select-mark-and-mark).

The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-.. Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.

A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).

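The candidate test can be restated as a small C function. This is an independent re-implementation of the rule as worded above, not PO mode's Emacs Lisp code, and it assumes that "letters" means the ASCII letters A-Z and a-z.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII letter test that does not depend on the current locale.  */
static bool
is_ascii_letter (char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Returns true if S looks like a candidate for translation according
   to the rule stated for the ',' command.  */
static bool
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest_run = 0;

  for (; *s != '\0'; s++)
    {
      if (is_ascii_letter (*s))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;               /* three or more letters in a row */
  if (longest_run <= 1)
    return false;              /* no letters, or isolated letters only */
  return letters > others;     /* at most two letters in a row */
}
```

Under this reading, "ok" qualifies (two letters, no non-letters), while "%s: %d" does not, since its letters are isolated.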
If you have never told Emacs about some ‘TAGS’ file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your ‘TAGS’ file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise ‘TAGS’ file you want to use. See (emacs)Tags section `Tag Tables' in The Emacs Editor.

Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the ‘TAGS’ file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.

Using this , command does not prevent using other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command when used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.

The M-, (po-mark-translatable) command will mark the recently found string with the ‘_’ keyword. The M-. (po-select-mark-and-mark) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new PO file untranslated entry for the string being marked, and make it the current entry (making it easy for you to immediately proceed to its translation, if you feel like doing it right away). It is possible that the modifications made to the program source by M-, or M-. render some source line longer than 80 columns, forcing you to break and re-indent this line differently. You may use the O command from PO mode, or any other window changing command from Emacs, to break out into the program source window, and do any needed adjustments. You will have to use some regular Emacs command to return the cursor to the PO file window, if you want command , for the next string, say.

The M-. command has a few built-in speedups, so you do not have to explicitly type all keywords all the time. The first such speedup is that you are presented with a preferred keyword, which you may accept by merely typing <RET> at the prompt. The second speedup is that you may type any non-ambiguous prefix of the keyword you really mean, and the command will complete it automatically for you. This also means that PO mode has to know all your possible keywords, and that it will not accept mistyped keywords.

If you reply ? to the keyword request, the command gives a list of all known keywords, from which you may choose. When the command is prefixed by an argument (C-u M-.), it inhibits updating any program source or PO file buffer, and does some simple keyword management instead. In this case, the command asks for a keyword, written in full, which becomes a new allowed keyword for later M-. commands. Moreover, this new keyword automatically becomes the preferred keyword for later commands. By typing an already known keyword in response to C-u M-., one merely changes the preferred keyword and does nothing more.

All keywords known for M-. are recognized by the , command when scanning for strings, and strings already marked by any of those known keywords are automatically skipped. If many PO files are opened simultaneously, each one has its own independent set of known keywords. There is no provision in PO mode, currently, for deleting a known keyword; you have to quit the file (maybe using q) and reopen it afresh. When a PO file is newly brought up in an Emacs window, only ‘gettext’ and ‘_’ are known as keywords, and ‘gettext’ is preferred for the M-. command. In fact, it is not useful to prefer ‘_’, as it is already built into the M-, command.

In C programs strings are often used within calls of functions from the printf family. The special thing about these format strings is that they can contain format specifiers introduced with %. Assume we have the code

```c
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
```

A possible German translation for the above string might be:

```c
"%d Zeichen lang ist die Zeichenkette `%s'\n"
```

A C programmer, even if he cannot speak German, will recognize that there is something wrong here. The order of the two format specifiers is changed but of course the arguments in the printf don't change. This will most probably lead to problems because now the length of the string is regarded as the address.

To prevent errors at runtime caused by translations, the msgfmt tool can check statically whether the arguments in the original and the translation string match in type and number. If this is not the case and the ‘-c’ option has been passed to msgfmt, msgfmt will give an error and refuse to produce a MO file. Thus consistent use of ‘msgfmt -c’ will catch the error, so that it cannot cause problems at runtime.

If the word order in the German translation above is the desired one, the translator would have to write

```
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
```
The routines in msgfmt know about this special notation.
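The notation is not specific to msgfmt: it is the positional (%n$) form of conversion specifications that POSIX defines for the printf family, and glibc's printf implements it. The sketch below, built around a hypothetical helper format_length_message, passes the same argument list (string first, length second) once with the original format and once with the reordered German format. A format that uses positional specifiers should number all of its conversions; mixing numbered and unnumbered ones in a single format string is not allowed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the message about string S using FORMAT,
   which may be the original message or a translation of it.  The
   argument list is always the same (string first, length second); only
   the format string decides in which order the values are printed.  */
int
format_length_message (char *buf, size_t size, const char *format,
                       const char *s)
{
  /* The length is cast to int because the formats use %d.  */
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

With the input "abc", the original format "String `%s' has %d characters" yields String `abc' has 3 characters, while the German format "%2$d Zeichen lang ist die Zeichenkette `%1$s'" yields 3 Zeichen lang ist die Zeichenkette `abc'.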
Because not all strings in a program are format strings, it is not useful for msgfmt to test all the strings in the ‘.po’ file. Doing so might cause problems, because a string might contain what looks like a format specifier even though it is never used in printf.
Therefore xgettext adds a special tag to those messages it thinks might be format strings. There is no absolute rule for this, only a heuristic. In the ‘.po’ file the entry is marked using the c-format flag in the #, comment line (see section The Format of PO Files).
The careful reader might now say that this again can cause problems: the heuristic might guess wrong. This is true, and therefore xgettext knows about a special kind of comment which lets the programmer take over the decision. If, in the same line as the gettext keyword or in the line immediately preceding it, the xgettext program finds a comment containing the words xgettext:c-format, it will mark the string with the c-format flag in any case. This kind of comment should be used when xgettext does not recognize the string as a format string, but it really is one and should be tested. Please note that when the comment is in the same line as the gettext keyword, it must come before the string to be translated. Also note that a comment such as xgettext:c-format applies only to the first string in the same or the next line, not to multiple strings.
This situation happens quite often. The printf function is often called with strings which do not contain a format specifier. Of course one would normally use fputs, but it does happen. In this case xgettext does not recognize the string as a format string, but what happens if the translation introduces a valid format specifier? The printf function will try to access one of the parameters, but none exists, because the original code does not pass any.
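Both placements of the xgettext:c-format comment described above can be sketched as follows. The two functions are hypothetical; assume their results are used as printf formats elsewhere in the program, so that xgettext cannot tell from the call site that they are format strings and the flag has to be requested explicitly. In progress_format the comment sits on the line immediately preceding the gettext keyword; in speed_format it shares the line with the keyword and therefore comes before the string.

```c
#include <libintl.h>

/* Hypothetical accessor; its result is assumed to be used as a printf
   format by another part of the program.  */
const char *
progress_format (void)
{
  /* Comment on the line immediately preceding the gettext keyword.  */
  /* xgettext:c-format */
  return gettext ("%d of %d files copied\n");
}

/* Hypothetical accessor with the comment on the same line as the
   keyword; it must then appear before the string.  */
const char *
speed_format (void)
{
  return /* xgettext:c-format */ gettext ("%.1f MB/s\n");
}
```

Each comment affects only the first string that follows it, so a second message on the same line would need its own comment.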
Of course, xgettext could also make a wrong decision the other way round, i.e. a string marked as a format string is actually not a format string. In this case msgfmt might give too many warnings, which would get in the way of translating the ‘.po’ file. The method to prevent this wrong decision is similar to the one used above; only the comment must contain the string xgettext:no-c-format.
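As a sketch of this opposite case, the hypothetical function below writes a help text in which %d is meant literally: the text describes a placeholder syntax and is output with fputs rather than used as a format. The xgettext:no-c-format comment on the line preceding the gettext keyword tells xgettext not to treat the message as a C format string.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical help output.  The "%d" below is literal text, never a
   printf directive, because the message is written with fputs.  */
void
print_template_help (FILE *stream)
{
  /* xgettext:no-c-format */
  fputs (gettext ("In a template, %d stands for the day of the month.\n"),
         stream);
}
```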
If a string is marked with c-format and this is not correct, the user can find out who is responsible for the decision. See Invoking the xgettext Program to see how the --debug option can be used for solving this problem.
The attentive reader might now point out that it is not always possible to mark translatable strings with gettext or something similar. Consider the following case:
```c
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  …
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  …
}
```
While it is no problem to mark the string "a default message", it is not possible to mark the string initializers for messages: a function call such as gettext (…) is not a constant expression, so it cannot appear in the initializer of a static array. What is to be done? We have to fulfill two tasks. First, we have to mark the strings so that the xgettext program (see section Invoking the xgettext Program) can find them, and second, we have to translate the strings at runtime before printing them.
The first task can be fulfilled by creating a new keyword which names a no-op. For the second, we have to mark all access points to a string from the array. So one solution can look like this:

```c
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  …
}
```
Please convince yourself that the string which is written by fputs is translated in any case. How to make xgettext aware of the additional keyword gettext_noop is explained in Invoking the xgettext Program.
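Wrapped into a function, the first solution might look like the sketch below. The function name message_for is hypothetical, and unlike the fragment above it also treats negative indexes as out of range. When the program runs without a message catalog (for instance in the C locale), gettext simply returns its argument, so the English strings come back unchanged.

```c
#include <libintl.h>

/* Marks a string for xgettext without translating it at this point.
   xgettext has to be told about this extra keyword, for example with
   'xgettext --keyword=gettext_noop'.  */
#define gettext_noop(String) String

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical accessor: every path through it goes through gettext,
   so the returned string is translated in any case.  Indexes other
   than 0 and 1 select the default message.  */
const char *
message_for (int idx)
{
  return (idx < 0 || idx > 1)
         ? gettext ("a default message")
         : gettext (messages[idx]);
}
```

A caller would then print the result directly, e.g. fputs (message_for (idx), stdout).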
The above is of course not the only solution. You could also come up with the following one:

```c
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  …
}
```
But this has a drawback. The programmer has to take care to use gettext_noop for the string "a default message". Using gettext there could, in rare cases, have unpredictable results, because the already translated string would then be passed through gettext a second time.
One advantage of this second solution is that you need not perform control flow analysis to make sure the output is really translated in any case. But this analysis is generally not very difficult. If it should be difficult in some situation, you can use the second method there.
Code sometimes has bugs, but translations sometimes have bugs too. The users need to be able to report them. Reporting translation bugs to the programmer or maintainer of a package is not very useful, since the maintainer must never change a translation, except on behalf of the translator. Hence the translation bugs must be reported to the translators.
Here is a way to organize this so that the maintainer does not need to forward translation bug reports, nor even keep a list of the addresses of the translators or their translation teams.
Every program has a place where it shows the bug report address. For GNU programs, it is the code which handles the ‘--help’ option, typically in a function called ‘usage’. In this place, instruct the translator to add her own bug reporting address. For example, if that code has a statement
```c
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
```
you can add some translator instructions like this:
```c
/* TRANSLATORS: The placeholder indicates the bug-reporting address
   for this package. Please add _another line_ saying
   "Report translation bugs to <...>\n" with the address for translation
   bugs (typically your translation team's web or email address). */
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
```
When xgettext is run with the option ‘--add-comments=TRANSLATORS:’, these comments will be extracted, leading to a .pot file that contains this:
```
#. TRANSLATORS: The placeholder indicates the bug-reporting address
#. for this package. Please add _another line_ saying
#. "Report translation bugs to <...>\n" with the address for translation
#. bugs (typically your translation team's web or email address).
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
```
Should names of persons, cities, locations etc. be marked for translation or not? People who only know languages that can be written with Latin letters (English, Spanish, French, German, etc.) are tempted to say “no”, because names usually do not change when transported between these languages. However, in general when translating from one script to another, names are translated too, usually phonetically or by transliteration. For example, Russian or Greek names are converted to the Latin alphabet when being translated to English, and English or French names are converted to the Katakana script when being translated to Japanese. This is necessary because the speakers of the target language in general cannot read the script the name is originally written in.
As a programmer, you should therefore make sure that names are marked for translation, with a special comment telling the translators that it is a proper name and how to pronounce it. In its simple form, it looks like this:
```c
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        _("Francois Pinard"));
```
The GNU gnulib library offers a module ‘propername’ (https://www.gnu.org/software/gnulib/MODULES.html#module=propername) which takes care to automatically append the original name, in parentheses, to the translated name. For names that cannot be written in ASCII, it also frees the translator from the task of entering the appropriate non-ASCII characters if no script change is needed. In this more comfortable form, it looks like this:
```c
printf (_("Written by %s and %s.\n"),
        proper_name ("Ulrich Drepper"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
```
You can also write the original name directly in Unicode (rather than with Unicode escapes or HTML entities) and denote the pronunciation using the International Phonetic Alphabet (see https://en.wikipedia.org/wiki/International_Phonetic_Alphabet).
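To illustrate what appending the original name amounts to, here is a deliberately simplified sketch of a helper with similar behavior. It is hypothetical and is not the gnulib propername code, whose actual rules may differ. It takes the already translated name and the original spelling, and adds the original in parentheses unless the translation already contains it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib.  Writes into BUF (of SIZE
   bytes) either TRANSLATED alone, when it already contains ORIGINAL
   (which includes the case where both are identical), or
   "TRANSLATED (ORIGINAL)".  Returns BUF.  */
const char *
name_with_original (char *buf, size_t size,
                    const char *translated, const char *original)
{
  if (strstr (translated, original) != NULL)
    snprintf (buf, size, "%s", translated);
  else
    snprintf (buf, size, "%s (%s)", translated, original);
  return buf;
}
```

For an untranslated name such as "Ulrich Drepper" the result is just the name, while a made-up transliteration "Frasoa Pinar" of "Francois Pinard" would come out as Frasoa Pinar (Francois Pinard).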
As a translator, you should use some care when translating names, because it is frustrating if people see their names mutilated or distorted.
If your language uses the Latin script, all you need to do is to reproduce the name as perfectly as you can within the usual character set of your language. In this particular case, this means to provide a translation containing the c-cedilla character. If your language uses a different script and the people speaking it don't usually read Latin words, it means transliteration. If the programmer used the simple case, you should still give, in parentheses, the original writing of the name, for the sake of the people who do read the Latin script. If the programmer used the ‘propername’ module mentioned above, you don't need to give the original writing of the name in parentheses, because the program will already do so. Here is an example, using Greek as the target script:
```
#. This is a proper name. See the gettext
#. manual, section Names. Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"
```
Because translation of names is such a sensitive domain, it is a good idea to test your translation before submitting it.
When you are preparing a library, not a program, for the use of gettext, only a few details are different. Here we assume that the library has a translation domain and a POT file of its own. (If it uses the translation domain and POT file of the main program, then the previous sections apply without changes.)
The library code doesn't call setlocale (LC_ALL, ""). It's the responsibility of the main program to set the locale. The library's documentation should mention this fact, so that developers of programs using the library are aware of it.
The library code doesn't call textdomain (PACKAGE), because it would interfere with the text domain set by the main program.
The initialization code for a program, as shown earlier, is

```c
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
```
For a library it is reduced to

```c
bindtextdomain (PACKAGE, LOCALEDIR);
```
If your library's API doesn't already have an initialization function, you need to create one, containing at least the bindtextdomain invocation. However, you usually don't need to export and document this initialization function: it is sufficient that all entry points of the library call the initialization function if it hasn't been called before. The typical idiom used to achieve this is a static boolean variable that indicates whether the initialization function has been called. If the library is meant to be used in multithreaded applications, this variable needs to be marked volatile, so that its value gets propagated between threads. Like this:
```c
#include <libintl.h>
#include <stdbool.h>

static volatile bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* This function is part of the exported API. */
struct foo *
create_foo (...)
{
  /* Must ensure the initialization is performed. */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API. The argument must be
   non-NULL and have been created through create_foo(). */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before. */
  ...
}
```
The more general solution for initialization functions, POSIX pthread_once, is not needed in this case.
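The fragment above leaves the object type and the bodies of the exported functions open. A compilable version could look like the sketch below, in which struct foo, its refcount field, and the values of PACKAGE and LOCALEDIR are hypothetical choices made only for illustration. It keeps the volatile flag described above and, as a library should, calls only bindtextdomain.

```c
#include <libintl.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical values; a real library takes them from config.h or
   the Makefile.  */
#define PACKAGE "libfoo"
#define LOCALEDIR "/usr/local/share/locale"

static volatile bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  /* No setlocale and no textdomain: those belong to the main program.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical object type for this example.  */
struct foo
{
  int refcount;
};

/* Exported entry point: performs the initialization on first use.  */
struct foo *
create_foo (void)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  struct foo *result = malloc (sizeof *result);
  if (result != NULL)
    result->refcount = 1;
  return result;
}

/* Exported entry point that needs no check: ARGUMENT must come from
   create_foo, so the initialization has already happened.  */
int
foo_refcount (const struct foo *argument)
{
  return argument->refcount;
}
```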
The usual declaration of the ‘_’ macro in each source file is

```c
#include <libintl.h>
#define _(String) gettext (String)
```
for a program. For a library, which has its own translation domain, it reads like this:

```c
#include <libintl.h>
#define _(String) dgettext (PACKAGE, String)
```
In other words, dgettext is used instead of gettext. Similarly, the dngettext function should be used in place of the ngettext function.
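As a sketch, a library source file using its own domain might combine the _ macro with dngettext as shown here. The domain name "libfoo" and the function describe_files are hypothetical. Without an installed catalog, dngettext returns the singular msgid when n is 1 and the plural msgid otherwise, so the English forms come out.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical domain name; normally PACKAGE comes from config.h.  */
#define PACKAGE "libfoo"
#define _(String) dgettext (PACKAGE, String)

/* Hypothetical library function: describes a count of N files, looking
   both messages up in the library's own domain.  Returns the length
   that snprintf reports.  */
int
describe_files (char *buf, size_t size, unsigned long n)
{
  if (n == 0)
    return snprintf (buf, size, "%s", _("no files"));
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu file", "%lu files", n), n);
}
```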
This document was generated by Bruno Haible on February 21, 2024 using texi2html 1.78a.