#! /bin/false

# vim: set autoindent shiftwidth=4 tabstop=4:

# High-level interface to Perl i18n.
# Copyright (C) 2002-2017 Guido Flohr <guido.flohr@cantanea.com>,
# all rights reserved.

# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU Library General Public License as published
# by the Free Software Foundation; either version 2, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Library General Public License for more details.

# You should have received a copy of the GNU Library General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335,
# USA.

=head1 NAME

Locale::TextDomain::FAQ - Frequently asked questions for libintl-perl

=head1 DESCRIPTION

This FAQ answers frequently asked questions about libintl-perl.

=head1 QUESTIONS AND ANSWERS

=head2 Why is libintl-perl so big? Why don't you use Encode(3pm) for character
set conversion instead of rolling your own version?

Encode(3pm) requires at least Perl 5.7.x, whereas libintl-perl needs
to be operational on Perl 5.004. Internally, libintl-perl uses Encode(3pm)
if it is available.

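The "use Encode(3pm) if it is available" pattern can be sketched as
follows. This is only an illustration and not the code that libintl-perl
actually uses: the helper name latin1_to_utf8 is hypothetical, and the
pure Perl fallback is assumed to handle ISO-8859-1 input only.

```perl
    use strict;
    use warnings;

    # Probe for Encode at run time instead of requiring it unconditionally.
    my $have_encode = eval { require Encode; 1 } ? 1 : 0;

    # Hypothetical helper: convert ISO-8859-1 bytes to UTF-8 bytes.
    sub latin1_to_utf8 {
        my ($bytes) = @_;

        if ($have_encode) {
            # from_to() converts its first argument in place.
            Encode::from_to($bytes, 'iso-8859-1', 'UTF-8');
            return $bytes;
        }

        # Pure Perl fallback: every byte above 0x7f becomes two bytes.
        $bytes =~ s/([\x80-\xff])/
            chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/gex;
        return $bytes;
    }

    my $converted = latin1_to_utf8("caf\xE9");
    printf "%d bytes after conversion\n", length $converted;
```

Both branches return a plain byte string, so callers do not need to know
which implementation did the work.
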
=head2 Why do the gettext functions always unset the utf-8 flag on the strings
they return?

Because the gettext functions do not know whether the string is encoded
in utf-8 or not. Rather than guess, they unset the flag.

=head2 Can I set the utf-8 flag on strings returned by the gettext family of
functions?

Yes, but it is not recommended. If you absolutely want to do it,
use the function bind_textdomain_filter in Locale::Messages for it.

The strings returned by gettext and friends are by default encoded in
the preferred charset for the user's locale, but there is no portable
way to find out whether this is utf-8 or not. That means you either
have to enforce utf-8 as the output character set (by means of
bind_textdomain_codeset() and/or the environment variable
OUTPUT_CHARSET) and override the user preference, or you run the risk
of marking strings as utf-8 which really aren't utf-8.

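To illustrate that risk, here is a small sketch that uses only
Encode(3pm). The helper is_valid_utf8 is hypothetical and not part of
libintl-perl. It shows that the ISO-8859-1 bytes for an accented letter
are not well-formed utf-8, so flagging them as utf-8 would produce a
broken string. Note that passing this check does not prove that a string
I<is> utf-8: pure ASCII, for example, is valid in both charsets.

```perl
    use strict;
    use warnings;
    use Encode ();

    # Hypothetical helper: true if the bytes decode cleanly as UTF-8.
    sub is_valid_utf8 {
        my ($bytes) = @_;

        return eval {
            # FB_CROAK makes decode() die on malformed input,
            # LEAVE_SRC keeps it from modifying $bytes.
            Encode::decode('UTF-8', $bytes,
                           Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        } ? 1 : 0;
    }

    my $latin1_bytes = "\xE9";      # e acute encoded in ISO-8859-1
    my $utf8_bytes   = "\xC3\xA9";  # e acute encoded in UTF-8

    for my $bytes ($latin1_bytes, $utf8_bytes) {
        printf "%d byte(s): %s\n", length $bytes,
            is_valid_utf8($bytes) ? 'well-formed utf-8' : 'not utf-8';
    }
```
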
The whole concept behind that utf-8 flag introduced in Perl 5.6 is
seriously broken, and the dilemma described above is proof of that.
The best thing you can do with that flag is to get rid of it by turning
it off. Your code will benefit from it and become less error-prone,
more portable and faster.

=head2 Why do non-ASCII characters in my Gtk2 application look messed up?

The Perl binding of Gtk2 has a design flaw. It expects all UI messages
to be in UTF-8 and it also expects messages to be flagged as utf-8. The
only solution for you is to make sure that all your po files are encoded
in utf-8 (convert them manually, if you need to), and also to enforce that
charset in your application, regardless of the user's locale settings.
Assuming that your textdomain is "org.bar.foo", you have to code the
following into your main module or script:

    use Locale::Messages qw (bind_textdomain_filter
        bind_textdomain_codeset turn_utf_8_on);

    BEGIN {
        bind_textdomain_filter 'org.bar.foo', \&turn_utf_8_on;
        bind_textdomain_codeset 'org.bar.foo', 'utf-8';
    }

See the file GTestRunner.pm of Test::Unit::GTestRunner(3pm) for details.

=head2 How do I interface Glade2 UI definitions with libintl-perl?

Gtk2::GladeXML(3pm) seems to ignore calls to bind_textdomain().
See the file GTestRunner.pm of Test::Unit::GTestRunner(3pm) for a
possible solution.

=head2 Why does Locale::TextDomain use a double underscore? I am used
to a single underscore from C or other languages.

A function whose name consists of exactly one non-alphanumeric character
is automatically global in Perl. Besides, early design drafts of Perl 6
proposed the underscore instead of the dot as the concatenation operator
(the language eventually settled on the tilde).

=head2 How do I switch languages or force a certain language independently
from user settings read from the environment?

The simple answer is:

    use POSIX qw (setlocale LC_ALL);

    my $language = 'fr';
    my $country = 'FR';
    my $charset = 'iso-8859-1';

    setlocale LC_ALL, "${language}_$country.$charset";

Sadly enough, this will fail in many cases. The problem is that locale
identifiers are not standardized and are completely system-dependent:
not only their overall format, but also other details like case-sensitivity.
Some systems are very forgiving about the format - for example normalizing
charset descriptions - others are very strict. In order to be reasonably
platform independent, you should try a list of possible locale identifiers
for your desired settings. This is about what I would try for achieving the
above:

    my @tries = qw (
        fr_FR.iso-8859-1 fr_FR.iso8859-1 fr_FR.iso88591
        fr_FR.ISO-8859-1 fr_FR.ISO8859-1 fr_FR.ISO88591
        fr.iso-8859-1 fr.iso8859-1 fr.iso88591
        fr.ISO-8859-1 fr.ISO8859-1 fr.ISO88591
        fr_FR
        French_France.iso-8859-1 French_France.iso8859-1 French_France.iso88591
        French_France.ISO-8859-1 French_France.ISO8859-1 French_France.ISO88591
        French.iso-8859-1 French.iso8859-1 French.iso88591
        French.ISO-8859-1 French.ISO8859-1 French.ISO88591
    );
    foreach my $try (@tries) {
        last if setlocale LC_ALL, $try;
    }

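If you would rather not type such a list by hand, it can be generated.
The following is a rough sketch under my own assumptions: the helpers
charset_variants and locale_candidates are hypothetical, and they only
produce the ISO-style identifiers. The long names like "French_France"
would need a lookup table that is left out here.

```perl
    use strict;
    use warnings;
    use POSIX qw (setlocale LC_ALL);

    # Hypothetical helper: spelling variants of one charset name,
    # e.g. iso-8859-1, iso8859-1, iso88591 and their upper case forms.
    sub charset_variants {
        my ($charset) = @_;

        my @variants;
        foreach my $base (lc $charset, uc $charset) {
            (my $first_hyphen_removed = $base) =~ s/-//;
            (my $no_hyphens = $base) =~ tr/-//d;
            push @variants, $base, $first_hyphen_removed, $no_hyphens;
        }

        # Drop duplicates, keep the original order.
        my %seen;
        return grep { !$seen{$_}++ } @variants;
    }

    # Hypothetical helper: candidate locale identifiers, most specific first.
    sub locale_candidates {
        my ($language, $country, $charset) = @_;

        my @charsets = charset_variants ($charset);
        my @candidates;
        foreach my $prefix ("${language}_$country", $language) {
            push @candidates, map { "$prefix.$_" } @charsets;
        }
        push @candidates, "${language}_$country";

        return @candidates;
    }

    # Try each candidate until the system accepts one.
    foreach my $try (locale_candidates ('fr', 'FR', 'iso-8859-1')) {
        last if setlocale (LC_ALL(), $try);
    }
```

For 'fr', 'FR' and 'iso-8859-1' this yields the first thirteen entries
of the hand-written list, in the same order.
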
See Locale::Util(3pm) for functions that help you with this.

Alternatively, you can force a certain language by setting the environment
variables LANGUAGE, LANG and OUTPUT_CHARSET, but this is only guaranteed
to work if you use the pure Perl implementation of gettext (see the
documentation for select_package() in Locale::Messages(3pm)). You would
do the above like this:

    use POSIX qw (setlocale LC_MESSAGES);
    use Locale::Messages qw (nl_putenv);

    # LANGUAGE is a colon-separated list of languages.
    nl_putenv("LANGUAGE=fr_FR");

    # If LANGUAGE is set, LANG should be set to the primary language.
    # This is not needed for gettext, but for other parts of the system
    # it is.
    nl_putenv("LANG=fr_FR");

    # Force an output charset like this:
    nl_putenv("OUTPUT_CHARSET=iso-8859-1");

    setlocale (LC_MESSAGES, 'C');

These environment variables are GNU extensions, and they are also
honored by libintl-perl. Still, you should always try to set the
locale with setlocale for the catch-all category LC_ALL. If you fail
to do so, your program's output may be cluttered, mixing languages
and charsets, if the system runs in a locale that is not compatible
with your own language settings.

Remember that these environment variables are not guaranteed to
work if you use an XS version of gettext. In order to force usage
of the pure Perl implementation, do the following:

    Locale::Messages->select_package ('gettext_pp');

If you think this is brain-damaged, you are right, but I cannot help
you. Actually there should be a more flexible API than setlocale,
but at the time of this writing there isn't. Until then, the recommendation
goes like this:

    1) Try setting LC_ALL with Locale::Util.
    2) If that does not succeed, either give up or ...
    3) Reset LC_MESSAGES to C/POSIX.
    4) Switch to pure Perl for gettext.
    5) Set the environment variables LANGUAGE, LANG,
       and OUTPUT_CHARSET to your desired values.

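The steps above could be sketched roughly as follows. Treat this as an
illustration, not as a tested recipe: the function force_language and its
return values are hypothetical, and the call to set_locale() assumes the
interface documented in Locale::Util(3pm), that is category, language,
country and charset. The sketch returns early if libintl-perl cannot be
loaded at all.

```perl
    use strict;
    use warnings;
    use POSIX qw (setlocale LC_ALL LC_MESSAGES);

    # Hypothetical helper: returns a label naming the strategy that was used.
    sub force_language {
        my ($language, $country, $charset) = @_;

        # Without libintl-perl there is nothing to configure.
        return 'unavailable'
            unless eval { require Locale::Util; require Locale::Messages; 1 };

        # 1) Try setting LC_ALL with Locale::Util.
        return 'setlocale'
            if Locale::Util::set_locale (LC_ALL(), $language,
                                         $country, $charset);

        # 2) That did not succeed. Instead of giving up ...
        # 3) ... reset LC_MESSAGES to C/POSIX,
        setlocale (LC_MESSAGES(), 'C');

        # 4) switch to pure Perl for gettext,
        Locale::Messages->select_package ('gettext_pp');

        # 5) and set the environment variables.
        Locale::Messages::nl_putenv ("LANGUAGE=${language}_$country");
        Locale::Messages::nl_putenv ("LANG=${language}_$country");
        Locale::Messages::nl_putenv ("OUTPUT_CHARSET=$charset");

        return 'environment';
    }

    print force_language ('fr', 'FR', 'iso-8859-1'), "\n";
```
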
=head2 What is the advantage of libintl-perl over Locale::Maketext?

Of course, I can only give my personal opinion as an answer.

Locale::Maketext claims to fix design flaws in gettext. These alleged
design flaws, however, boil down to one pathological case which always
has a workaround. But both programmers and translators pay for this
fix with an unnecessarily complicated interface.

The paramount advantage of libintl-perl is that it uses a proven
technology and concept. Except for Java(tm) programs, this is the
state-of-the-art concept for localizing Un*x software. Programmers
who have already localized software in C, C++, C#, Python, PHP,
or a number of other languages will feel instantly at home when
localizing software written in Perl with libintl-perl. The same
holds true for the translators, because the files they deal with
have exactly the same format as those for other programming languages.
They can use the same set of tools, and even the commands they have
to execute are the same.

With libintl-perl, refactoring of the software is painless, even if
you modify, add or delete translatable strings. The gettext tools
are powerful enough to reduce the effort of the translators to the
bare minimum. Maintaining the message catalogs of Locale::Maketext
in larger-scale projects is, IMHO, infeasible.

Editing the message catalogs of Locale::Maketext - they are really
Perl modules - asks too much from most translators, unless
they are programmers. The portable object (po) files used by
libintl-perl have a simple syntax, and there are a bunch of specialized
GUI editors for these files that facilitate the translation process
and hide most complexity from the user.

Furthermore, libintl-perl makes it possible to mix programming
languages without a paradigm shift in localization. Without any special
effort, you can write localized software that has modules written
in C, modules in Perl, and builds a Gtk user interface with Glade.
All translatable strings end up in one single message catalog.

Last but not least, the interface used by libintl-perl is very
simple: Prepend translatable strings with a double underscore,
and you are done in most cases.

=head2 Why do single-quoted strings not work?

You probably write something like this:

    print __'Hello';

And you get an error message like "Can't find string terminator "'" anywhere
before EOF at ...", or even "Bareword found where operator expected at
... Might be a runaway multi-line '' string starting on". The above line
is (really!) essentially the same as writing:

    print __::Hello';

A lesser-known feature of Perl is that you can use a single quote ("'") as
the separator in packages instead of the double colon ("::"). What the
Perl parser sees in the first example is a valid package name ("__")
followed by the separator ("'"), then another valid package name ("Hello")
followed by a lone single quote. It is therefore not a problem in
libintl-perl but simply invalid Perl syntax. You have two correct
alternatives:

    print __ 'Hello'; # Insert a space to disambiguate.

Or use double-quotes:

    print __"Hello";

Thanks to Slavi Agafonkin for pointing me to the solution of this mystery.

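You can observe the old-style package separator without libintl-perl.
The following sketch uses a made-up package Greeter and calls the same
function once with the double colon and once with the single quote.
Newer Perl releases warn that this separator is deprecated, which is why
the sketch silences that warning category.

```perl
    use strict;
    use warnings;
    no warnings 'deprecated';   # Newer Perls warn about the old separator.

    package Greeter;

    sub hello { return 'Hello' }

    package main;

    # Both lines call the very same function Greeter::hello().
    my $new_style = Greeter::hello();
    my $old_style = Greeter'hello();

    print "Both separators reach the same function.\n"
        if $new_style eq $old_style;
```
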
=head2 What options should be used with xgettext?

More precisely, the question should be which '--keyword' and '--flag'
options for xgettext should be used. All other options are completely
dependent on your use-case.

If you are using L<Locale::Messages> or L<Locale::Gettext> for localizing
Perl code, the default keywords and default flags built into xgettext
are correct.

If you are using L<Locale::TextDomain> you have to use a long list
of command-line options for xgettext. Beginning with libintl-perl 1.28
you can use the library itself to produce these options:

    perl -MLocale::TextDomain -e 'print Locale::TextDomain->options'

If you want to disable the use of the built-in default keywords, precede
the output of the above command with '--keyword=""'. That will reset
the keywords for xgettext.

=head2 Why Isn't There A Function N__x(), N__nx(), or N__px()?

The sole purpose of these functions would be to set proper flags in the
output of B<xgettext(1)>. You probably thought of something like this:

    xgettext --keyword=N__x --flag=N__x:1:perl-brace-format filename.pl

First of all, xgettext(1) will I<always> set the flag correctly if the
argument to N__() I<looks> like a brace format string.

Second, you can set any flag you want on the PO entry with a source code
comment:

    # xgettext: no-perl-brace-format
    my $msg = N__("Placeholders are enclosed in {curly} braces.");

When B<xgettext(1)> extracts the string, it will appear like this in the
F<.pot> file:

    #: filename.pl:2304
    #, no-perl-brace-format
    msgid "Placeholders are enclosed in {curly} braces."
    msgstr ""

No reason to pollute the namespace with N__x functions.