455 lines
20 KiB
Plaintext
455 lines
20 KiB
Plaintext
This is a simple, respectively stupid Perl package that shows how the
|
|
complete internationalization process for a Perl package *could* be
|
|
done. It does not claim to be the smartest or the only possible
|
|
solution, but it provides at least a skeleton for real packages. If
|
|
libintl-perl should someday become an "established" Perl package, it
|
|
would probably be a lot better to seamlessly integrate the process
|
|
into ExtUtils::MakeMaker, but for now it's all we have.
|
|
|
|
The example focuses on the packaging process, i. e. on the things you
|
|
have to do to maintain an internationalized Perl package, so that
|
|
users of your package will benefit from translations you provide. It
|
|
therefore doesn't make use of any of the nitty-gritty details of
|
|
message translation like plural handling or the like.
|
|
|
|
Requirements
|
|
------------
|
|
|
|
The only requirement is a Perl aware version of GNU gettext. Perl
|
|
support was introduced only recently in GNU gettext, and you will have
|
|
to check whether your copy of GNU gettext already supports Perl.
|
|
Support for Perl was introduced in version 0.12.2 of GNU gettext. If
|
|
your version is older, you have to update GNU gettext.
|
|
|
|
First test
|
|
----------
|
|
|
|
The subdirectory "simplecal" contains a regular Perl package like the
|
|
ones you will find on the CPAN. You should first try to build and
|
|
use the package:
|
|
|
|
cd simplecal
|
|
perl Makefile.PL
|
|
make
|
|
|
|
If you see a warning that the prerequisite Locale::TextDomain is not
|
|
found, then you have to install libintl-perl first.
|
|
|
|
You should never "make install", the package is only a stupid example
|
|
and you will not really want to install it. You can simply try it out
|
|
from the installation directory itself:
|
|
|
|
perl -Ilib bin/simplecal.pl
|
|
|
|
It should print a crude calendar representation in English, or even in
|
|
your preferred language, depending on your system settings.
|
|
|
|
The Programming
|
|
---------------
|
|
|
|
Now we should dig into the sources. All relevant files are commented
|
|
and should give you a pretty good idea of what's going on. Change
|
|
your directory to the package directory "simplecal" and inspect the
|
|
source files.
|
|
|
|
The heart of the library is found in the file lib/SimpleCal.pm. This
|
|
Perl module defines functions that map numeric values to month names
|
|
or abbreviated week day names. You will find nothing unusual in this
|
|
module except for a line at the beginning of the file that reads:
|
|
|
|
use Locale::TextDomain qw (com.cantanea.simplecal);
|
|
|
|
In case you are not familiar with the operator "qw", this is an
|
|
equivalent writing of
|
|
|
|
use Locale::TextDomain ('com.cantanea.simplecal');
|
|
|
|
That line in the code does three things: It imports the module
|
|
Locale::TextDomain, *and* it states that the text domain (or
|
|
identifier) for this package is "com.cantanea.simplecal", *and* it says
|
|
that the translations for this package can be found in the
|
|
subdirectory "LocaleDate" of any component of @INC (unless it can be
|
|
found in one of the system locations). See the POD in
|
|
Locale::TextDomain for more information.
|
|
|
|
You may also find out that some strings have a "__" or a "N__" in
|
|
front of them. The explanation to these funny things has two sides:
|
|
First, they mark the following strings as being translatable, so that the
|
|
parser "xgettext" included in GNU gettext can find them. Yet, at runtime
|
|
both "__" and "N__" are really function names, and they will look up
|
|
their argument in the translation database. There is more
|
|
documentation available on this. Guess where! Yepp, in the POD of
|
|
Locale::TextDomain.
|
|
|
|
The library is used by a Perl script "bin/simplecal.pl". Let's have a
|
|
look at that script now. The first remarkable line is the one that
|
|
calls POSIX::setlocale():
|
|
|
|
setlocale (LC_MESSAGES, '');
|
|
|
|
The POD of the POSIX module gives additional information on the
|
|
function setlocale(). In brief, that call initializes the locale
|
|
settings for the category "LC_MESSAGES" to the pre-selected user
|
|
settings (this is indicated by the empty second argument). The
|
|
constant LC_MESSAGES is exported by Locale::Messages, which is always
|
|
a safe choice. If your script is only intended to run with Perl 5.8
|
|
or better, you can also import LC_MESSAGES from the POSIX module.
|
|
|
|
The rest of the program only prints a calendar for the current month.
|
|
It retrieves the name of the month and the abbreviated weekday names
|
|
from our little SimpleCal.pm module which provides this information in
|
|
a localized form.
|
|
|
|
A Dutch Calendar
|
|
----------------
|
|
|
|
We want to see the calendar in Dutch now. All you have to do is to
|
|
set the environment variable LANGUAGE to the value "nl". If you don't
|
|
know how to do this, add the following line somewhere at the top of
|
|
"bin/simplecal.pl":
|
|
|
|
$ENV{LANGUAGE} = "nl";
|
|
|
|
Now run the script again:
|
|
|
|
perl -Ilib bin/simplecal.pl
|
|
|
|
It should print out the calendar in Dutch. Look at the *.po files in
|
|
the subdirectory "po" for a list of other translations I have
|
|
prepared. You can try them out in a similar manner.
|
|
|
|
Please see the file "README-NLS" in subdirectory "sample/simplecal"
|
|
for details on how to set the language via environment variables.
|
|
|
|
The Subdirectory "po"
|
|
---------------------
|
|
|
|
This directory contains the raw translations and a Makefile that will
|
|
compile and install them. If you enter this directory and type "make"
|
|
you will see a list of the available Makefile targets.
|
|
|
|
The first one is the target "pot", a so-called phony target, i. e. it
|
|
is not related to a file with the name of "pot". The command "make
|
|
pot" will remake the master catalog of the package and place the
|
|
result in the file "com.cantanea.simplecal.pot"
|
|
("com.cantanea.simplecal" is the text domain resp. identifier for our
|
|
package). Type the command "make pot" now to see how the master
|
|
catalog is actually generated. If the output says something like
|
|
"nothing to be done for `pot'", then delete the file
|
|
"com.cantanea.simplecal.pot" and try again.
|
|
|
|
You should see now that the target file "com.cantanea.simplecal.pot" is
|
|
generated by the program xgettext with a plethora of options:
|
|
|
|
xgettext --output=./com.cantanea.simplecal.pox --from-code=utf-8 \
|
|
--add-comments=TRANSLATORS: --files-from=./POTFILES \
|
|
--copyright-holder="Imperia AG Huerth/Germany" \
|
|
--keyword --keyword='$__' --keyword=__ --keyword=__x \
|
|
--keyword=__n:1,2 --keyword=__nx:1,2 --keyword=__xn \
|
|
--keyword=N__ --language=perl && \
|
|
rm -f com.cantanea.simplecal.pot && \
|
|
mv com.cantanea.simplecal.pox com.cantanea.simplecal.pot
|
|
|
|
Type "xgettext --help" for a detailled explanation of the command line
|
|
options. In brief this invocation causes xgettext to read a list of
|
|
files from the file "POTFILES", extract all messages from these
|
|
source files and place the result in the output file
|
|
"com.cantanea.simplecal.pox". If the command succeeds, the old ".pot"
|
|
file is replaced by the new ".pox" file.
|
|
|
|
Yes, this is complicated, and that is why this skeleton Makefile is
|
|
provided here. You can copy it without any modification into your
|
|
package to use it.
|
|
|
|
The file POTFILES contains a list of source files to be scanned for
|
|
translatable strings. Have a look at it, and you will understand it.
|
|
|
|
The Makefile also includes a file called "PACKAGE". This file contains
|
|
all package-dependent information in a couple of Makefile variables:
|
|
|
|
- TEXTDOMAIN
|
|
This Makefile variable should contain the text domain/identifier
|
|
for your package. Please see the POD of Locale::TextDomain for advice
|
|
on a reasonable naming.
|
|
|
|
- LINGUAS
|
|
The language codes of all languages supported by your package. Each
|
|
entry corresponds to a po file in the po subdirectory.
|
|
|
|
- COPYRIGHT_HOLDER
|
|
Usually your name. Whatever you put here will be included as the
|
|
copyright holder in the header of the po files.
|
|
|
|
- MSGID_BUGS_ADDRESS
|
|
Usually your name and e-mail address. It will also be included in the
|
|
po header and translators will check this entry when they come across
|
|
a bug in a msgid, or when they have difficulties to translate a certain
|
|
message because of awkward coding on your side.
|
|
|
|
Okay, after "make pot" we have updated the master message catalog
|
|
TEXTDOMAIN.pot, in our case "com.cantanea.simplecal.pot". Have a look
|
|
into the file now. It contains the original English messages that
|
|
xgettext has extracted from our source files and blank translations.
|
|
The po files (the files the names of which end with ".po") contain
|
|
previous translations provided by our package translators. Whenever
|
|
you change the Perl sources, the list of messages may change. This
|
|
results in a maybe new .pot file and requires an update of all po
|
|
files. Try that now and type "make update-po"
|
|
|
|
You will see confusing output from "make" but you may get the idea
|
|
that every single po file (every language that the package supports)
|
|
gets updated, and the new strings are inserted into the po
|
|
files. Since nothing really changed here (we did not change the source
|
|
files yet) you can now try to update the compiled po files which end in
|
|
".gmo" (for GNU mo format) with "make update-mo".
|
|
|
|
Again, you will see maybe cryptic output from "make" that signifies
|
|
that all compiled files are re-generated now by a program called
|
|
"msgfmt".
|
|
|
|
The last step requires that you copy the (possibly changed) mo files
|
|
into your package by "make install". This will copy the gmo files as
|
|
".mo" files into the subdirectory "LocaleData" of your package so that
|
|
libintl-perl is able to find them at runtime.
|
|
|
|
You can perform all these steps at once by typing "make all" although
|
|
this is mostly useful for testing purposes. In reality the workflow
|
|
is different:
|
|
|
|
- You change your source files, messages may have been added, deleted
|
|
or modified. You will have to update the master message catalog by
|
|
typing "make pot".
|
|
|
|
- Since the translations may have gotten out-of-date, you will have to
|
|
merge your changes into all po files by "make update-po".
|
|
|
|
- Your translators will get copies of the po files, reflect your
|
|
changes in the po files and send them back to you.
|
|
|
|
- When you have received the updates, it is time to compile the po
|
|
files into a binary representation with "make update-mo".
|
|
|
|
- These binary mo files have to be installed under "LocaleData", and
|
|
you have to "make install". Note that "make install" installs the
|
|
mo files in your source package, not in the system location!
|
|
|
|
- Now that you have updated the translations for your package, you
|
|
will want to upload a new version to the CPAN.
|
|
|
|
Note that all these steps are *only* necessary for package
|
|
maintainers. As a user of the package, you will only see the
|
|
resulting mo files under "LocaleData". End users do *not* need any of
|
|
the gettext tools, and they do not have to perform any of the above
|
|
steps theirselves!
|
|
|
|
Changing the Sources
|
|
--------------------
|
|
|
|
You may wonder whether your translators have to re-translate
|
|
everything from scratch whenever you change your Perl sources. This
|
|
is, of course, not the case. Let's say, you want to add a welcome and
|
|
a good-bye message to the program output. Have a look into
|
|
"bin/simplecal.pl" and you will see that this is already prepared but
|
|
commented out (search for "Welcome to" and "Bye" if you can't find
|
|
it). Uncomment these lines and see what happens to the po files in
|
|
that case.
|
|
|
|
Before you proceed, you should have a look at the Dutch translation
|
|
file "nl.po". At the bottom you will find some lines that are
|
|
commented out with "#~" and that proove that I have already prepared
|
|
that case. The comment sign "#~" in po files signifies that a
|
|
particular translation is obsoleted, i. e. no longer needed because it
|
|
is no longer present in the source files.
|
|
|
|
Say, that you have really changed your mind, and you want to
|
|
re-introduce the welcome and good-bye messages to your program and you
|
|
uncomment the corresponding lines in "bin/simplecal.pl". You will
|
|
have to re-make the master catalog "com.cantanea.simplecal.pot" by
|
|
"make pot", and then "make update-po" to update the po files. In
|
|
fact, "make update-po" is sufficient because it will also update the
|
|
pot file if it is out-of-date (i. e. if any of the source files have
|
|
changed in the meantime).
|
|
|
|
Type "make update-po" now, and look again at "po/nl.po". You will see
|
|
that the previously translated welcome and good-bye messages have been
|
|
re-activated from the obsoleted entries. In fact your translators
|
|
will have nothing to do, because their old translations are still
|
|
valid. Type "make install" and then re-run "perl -Ilib
|
|
bin/simplecal.pl", set the environment variable "LANG" to any of the
|
|
available languages, and things will still work perfectly.
|
|
|
|
Of course, it is a rare case that messages are discarded and later
|
|
re-activated in programming sources. It is more likely that you will
|
|
modify a message, or maybe add a message that is similar to former
|
|
ones. Let's say that you want to change the exclamation mark in the
|
|
good-bye message at the bottom of the script to a simple full stop.
|
|
Look for the line that reads
|
|
|
|
print __"Bye!\n";
|
|
|
|
and change it into
|
|
|
|
print __"Bye.\n";
|
|
|
|
Change into the directory "po", update the translation files with
|
|
"make update-po" and inspect the file "nl.po". At first glance, you
|
|
may not see any change. But then: The entry for the good-bye message
|
|
has an additional comment "#, fuzzy". The fuzzy mark signifies that
|
|
the msgerge program has found that a message is very similar to a
|
|
previous message (even obsoleted ones are taken into account), and
|
|
that it proposes an old translation here. The translator will
|
|
normally modify the translation accordingly (without having to re-type
|
|
everything), remove the fuzzy mark and send back the translation to
|
|
you.
|
|
|
|
In fact you could also install translations that have not been revised
|
|
by the translator and are still marked as fuzzy. This is not
|
|
recommended however! The algorithm used in msgmerge is quite smart
|
|
and seldom fails to detect minimal changes in the source message and
|
|
propose the old translation. However, it often proposes translations
|
|
from other valid or obsoleted entries that are only vaguely related to
|
|
the real meaning. You should understand the fuzzy merging mechanism
|
|
as a helpful feature to the translator only and never install fuzzy
|
|
translations unless you absolutely know what you are doing.
|
|
|
|
Pass Comments to Translators
|
|
----------------------------
|
|
|
|
The po files contain references for every message to the corresponding
|
|
source files as comments. But you still may feel a need for giving
|
|
hints to the translators. You may want to tell the translators, that
|
|
the good-bye message can be somewhat sloppy (or whatever you like).
|
|
This is simple to do. Have a look at the good-bye message in
|
|
"bin/simplecal.pl" and you will see that it is preceded by a comment
|
|
introduced with the string "TRANSLATORS:". If you start your Perl
|
|
comment like this, it will end up as a comment for translators in the
|
|
resulting po (resp. pot) file and may serve as a hint for translators.
|
|
|
|
In fact, the string "TRANSLATORS:" is arbitrarily chosen. If you
|
|
prefer another string, change it in the invocation of "xgettext" in
|
|
the skeleton Makefile provided here.
|
|
|
|
Informational Files
|
|
-------------------
|
|
|
|
You should put two additional files in your distribution. The first
|
|
one is "README-NLS". It should be a verbatim copy of the most recent
|
|
version found in the "simplecal" sample package. Please send
|
|
corrections or improvements to this file to the maintainer Guido
|
|
Flohr <guido.flohr@cantanea.com>, and add package-specific notes to your
|
|
documentation instead. Users expect this file to have a standard
|
|
contents, and they will not check it for changes on a regular basis.
|
|
|
|
The file "TRANSLATIONS" should reflect the current translation status
|
|
of your package. It should list all currently availabe translations,
|
|
their completeness, and it should also inform your user which translations
|
|
are actively maintained, and which are not. You can find a sample
|
|
in the "simplecal" sample package.
|
|
|
|
Bringing It All Together
|
|
------------------------
|
|
|
|
The above sounds definitely more complicated than it is. In practice
|
|
you code as before but mark all your strings with "__" and friends
|
|
like described in the POD of Locale::TextDomain. Before a new release
|
|
you change into the directory "po" of your distribution and type "make
|
|
update-po" to update the available translations. Distribute the
|
|
modified po files to your translators, and once you have collected
|
|
them all, type "make install" to add them to your distribution.
|
|
|
|
That's all, all translations will be available in your package now.
|
|
|
|
Internationalizing Existing Packages
|
|
------------------------------------
|
|
|
|
Internationalizing an already existing package with libintl-perl is
|
|
less painful than you think. The following roadmap should do it with
|
|
minimal effort.
|
|
|
|
First create a subdirectory "po" in your sources, copy the "Makefile"
|
|
from this sample, and copy and edit the files "TEXTDOMAIN" and
|
|
"LINGUAS" (LINGUAS can set the Makefile variable "LINGUAS" to the
|
|
empty string and TEXTDOMAIN should set "TEXTDOMAIN" to a name as
|
|
advised in the POD of Locale::TextDomain).
|
|
|
|
Next you have to mark the translatable strings in your sources with
|
|
"__" and friends. You can do that by hand, but isn't that the kind of
|
|
job that you have bought a computer for? List your source files in
|
|
"po/POTFILES" and then try
|
|
|
|
xgettext -a --files-from=POTFILES -o all.pot
|
|
|
|
The option "-a" instructs xgettext to extract *all* strings from your
|
|
sources. This option may miss a few strings (consider a bug report in
|
|
that case), it will issue a lot of warnings about "illegal variable
|
|
interpolations" (see the POD of Locale::TextDomain for workarounds)
|
|
and will put a lot of strings extracted from your sources into the
|
|
file "all.pot".
|
|
|
|
Now, load the file "all.pot" into an editor of your choice. If your
|
|
choice is "GNU emacs" you will have maximum comfort: Select an entry,
|
|
type "s" and you can cycle through the source files that this
|
|
particular entry originates from. Other PO editors like KBabel or
|
|
PO-Edit provide similar functionality. But even with the "Notepad" on
|
|
MS-DOS you will be able to navigate to the corresponding source file.
|
|
Once you have found the origin in your sources, you have to decide
|
|
whether this is a false positive, and you simply ignore it. If it is
|
|
a translatable string you either simply mark it with "__" or you
|
|
"repair" it.
|
|
|
|
What does "repair" mean? Again, the POD of Locale::TextDomain... In
|
|
brief: Your Perl sources will be full of stuff like:
|
|
|
|
die "Cannot open file '$filename': $!\n";
|
|
|
|
This string is not suitable for translation, because it is not
|
|
constant. It may change depending on the value of the variable
|
|
$filename and the value of $!. You will have to change that into
|
|
something like:
|
|
|
|
die __x ("Cannot open file '{filename}': {err}\n",
|
|
filename => $filename, err => $err);
|
|
|
|
Once you are done with marking the strings, you can try to run your
|
|
scripts/modules and you will see a lot of complaints by Perl that it
|
|
doesn't know about "__" (in various incarnations). Remember that "__"
|
|
is really a function call and you have to import the function "__" and
|
|
its relatives into your namespace.
|
|
|
|
What you have to do is to invent an identifier for your package (see
|
|
Locale::TextDomain for hints) and then add the following line to all
|
|
of your source files that produced errors:
|
|
|
|
use Locale::TextDomain ('Name-Of-My-Package');
|
|
|
|
You will be happy if "Name-Of-My-Package" is the same as the Makefile
|
|
variable "TEXTDOMAIN" in the file "po/TEXTDOMAIN" that you have
|
|
created in the beginning.
|
|
|
|
For the common case of a pure library: Is that really all I have to
|
|
do? Yes! What about POSIX::setlocale(), don't I have to make a call
|
|
somewhere? No, not for a library! And what about calls to textdomain()
|
|
and bindtextdomain() that I know from C or other languages? No, this
|
|
is all hidden in "use TextDomain (PACKAGENAME)" for Perl.
|
|
|
|
To make it clear again: A library should NEVER change the locale
|
|
settings. The script that uses a library (or multiple libraries)
|
|
should do that, and this boils down to three lines of Perl:
|
|
|
|
use POSIX qw (setlocale);
|
|
use Locale::Messages (LC_MESSAGES);
|
|
|
|
setlocale (LC_MESSAGES, "");
|
|
|
|
That means: The *calling* Perl script, the one that uses possibly
|
|
internationalized libraries, should initialize the locale settings to
|
|
the user preferences. Libraries should honor that setting but should
|
|
never change it. If a script misses a call to setlocale(), your
|
|
internationalized library will happily continue to work flawlessly
|
|
with the original English messages, it is up to the client programmer
|
|
to reveal the i18n features in your code!
|
|
|
|
Good luck!
|
|
|
|
Guido
|