#! /bin/false

# vim: set autoindent shiftwidth=4 tabstop=4:

# High-level interface to Perl i18n.
# Copyright (C) 2002-2017 Guido Flohr <guido.flohr@cantanea.com>,
# all rights reserved.

# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU Library General Public License as published
# by the Free Software Foundation; either version 2, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Library General Public License for more details.

# You should have received a copy of the GNU Library General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.

=head1 NAME

Locale::TextDomain::FAQ - Frequently asked questions for libintl-perl

=head1 DESCRIPTION

This FAQ collects frequently asked questions about libintl-perl
together with their answers.

=head1 QUESTIONS AND ANSWERS

=head2 Why is libintl-perl so big?  Why don't you use Encode(3pm) for character
set conversion instead of rolling your own version?

Encode(3pm) requires at least Perl 5.7.x, whereas libintl-perl needs
to be operational on Perl 5.004.  Internally, libintl-perl uses Encode(3pm)
if it is available.


=head2 Why do the gettext functions always unset the utf-8 flag on the strings
they return?

Because the gettext functions do not know whether a string is encoded
in utf-8 or not.  Rather than guess, they unset the flag.


=head2 Can I set the utf-8 flag on strings returned by the gettext family of
functions?

Yes, but it is not recommended.  If you absolutely want to do it,
use the function bind_textdomain_filter() in Locale::Messages.

The strings returned by gettext and friends are by default encoded in
the preferred charset for the user's locale, but there is no portable
way to find out whether this is utf-8 or not.  That means you either
have to enforce utf-8 as the output character set (by means of
bind_textdomain_codeset() and/or the environment variable
OUTPUT_CHARSET) and override the user preference, or you run the risk
of marking strings as utf-8 which really aren't utf-8.
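
As a minimal sketch of a more defensive filter, the hypothetical function
flag_if_utf8() below (it is not part of libintl-perl) sets the flag only
when the returned octets are well-formed UTF-8 and leaves everything else
untouched.  It assumes that the filter receives the translated string as
its first argument.  Note that this is still a guess: a string in another
charset can happen to be valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical filter, not part of libintl-perl: flag the string as
# utf-8 only if its octets form well-formed UTF-8.
sub flag_if_utf8 {
    my ($octets) = @_;

    my $copy = $octets;

    # utf8::decode() works in place and returns false for malformed
    # input, in which case the original octets are returned unchanged.
    return utf8::decode($copy) ? $copy : $octets;
}

# Assumed usage with the textdomain "org.bar.foo":
#
#     use Locale::Messages qw (bind_textdomain_filter);
#     bind_textdomain_filter 'org.bar.foo', \&flag_if_utf8;
```

Enforcing utf-8 with bind_textdomain_codeset() remains the only way to
know for sure what the returned strings contain.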

The whole concept behind the utf-8 flag introduced in Perl 5.6 is
seriously broken, and the dilemma described above is proof of that.
The best thing you can do with that flag is to get rid of it and turn
it off.  Your code will benefit from it and become less error-prone,
more portable and faster.


=head2 Why do non-ASCII characters in my Gtk2 application look messed up?

The Perl binding of Gtk2 has a design flaw.  It expects all UI messages
to be in UTF-8, and it also expects messages to be flagged as utf-8.  The
only solution is to make sure that all your po files are encoded
in utf-8 (convert them manually if you need to), and to enforce that
charset in your application, regardless of the user's locale settings.
Assuming that your textdomain is "org.bar.foo", you have to put the
following into your main module or script:

  use Locale::Messages qw (bind_textdomain_filter bind_textdomain_codeset
                           turn_utf_8_on);

  BEGIN {
      bind_textdomain_filter 'org.bar.foo', \&turn_utf_8_on;
      bind_textdomain_codeset 'org.bar.foo', 'utf-8';
  }

See the file GTestRunner.pm of Test::Unit::GTestRunner(3pm) for details.


=head2 How do I interface Glade2 UI definitions with libintl-perl?

Gtk2::GladeXML(3pm) seems to ignore calls to bind_textdomain().
See the file GTestRunner.pm of Test::Unit::GTestRunner(3pm) for a
possible solution.


=head2 Why does Locale::TextDomain use a double underscore?  I am used
to a single underscore from C or other languages.

A function whose name consists of exactly one non-alphanumerical
character is automatically global in Perl.  Besides, when this decision
was made, the Perl 6 design proposed the underscore as the concatenation
operator in place of the dot.

=head2 How do I switch languages or force a certain language independently
of user settings read from the environment?

The simple answer is:

    use POSIX qw (setlocale LC_ALL);

    my $language = 'fr';
    my $country = 'FR';
    my $charset = 'iso-8859-1';

    setlocale LC_ALL, "${language}_$country.$charset";

Sadly enough, this will fail in many cases.  The problem is that locale
identifiers are not standardized and are completely system-dependent, not
only in their overall format but also in details such as case sensitivity.
Some systems are very forgiving about the format, for example normalizing
charset descriptions, while others are very strict.  In order to be
reasonably platform independent, you should try a list of possible locale
identifiers for your desired settings.  This is roughly what I would try
in order to achieve the above:

    use POSIX qw (setlocale LC_ALL);

    my @tries = qw (
        fr_FR.iso-8859-1 fr_FR.iso8859-1 fr_FR.iso88591
        fr_FR.ISO-8859-1 fr_FR.ISO8859-1 fr_FR.ISO88591
        fr.iso-8859-1 fr.iso8859-1 fr.iso88591
        fr.ISO-8859-1 fr.ISO8859-1 fr.ISO88591
        fr_FR
        French_France.iso-8859-1 French_France.iso8859-1 French_France.iso88591
        French_France.ISO-8859-1 French_France.ISO8859-1 French_France.ISO88591
        French.iso-8859-1 French.iso8859-1 French.iso88591
        French.ISO-8859-1 French.ISO8859-1 French.ISO88591
    );

    # Stop at the first identifier that the system accepts.
    foreach my $try (@tries) {
        last if setlocale LC_ALL, $try;
    }

See Locale::Util(3pm) for functions that help you with this.
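
If you prefer not to spell out such a list by hand, it can be generated.
The following is only a sketch using core modules: the helpers
locale_candidates() and try_locales() are hypothetical names, and the
long names such as "French_France" are left out because they cannot be
derived from the language code alone.

```perl
use strict;
use warnings;
use POSIX qw (setlocale LC_ALL);

# Hypothetical helper: build locale identifiers to try for a language,
# a country and a charset.
sub locale_candidates {
    my ($language, $country, $charset) = @_;

    # Charset spellings: as given, without the first hyphen, and
    # without any hyphen; each in lower and in upper case.
    (my $one_hyphen = $charset) =~ s/-//;
    (my $no_hyphen = $charset) =~ s/-//g;
    my @charsets = map { (lc $_, uc $_) }
        ($charset, $one_hyphen, $no_hyphen);

    my @candidates;
    foreach my $base ("${language}_$country", $language) {
        push @candidates, map { "$base.$_" } @charsets;
    }
    push @candidates, "${language}_$country";

    return @candidates;
}

# Return the first identifier accepted by the system, or nothing.
sub try_locales {
    foreach my $try (@_) {
        return $try if setlocale (LC_ALL, $try);
    }
    return;
}

my $locale = try_locales (locale_candidates ('fr', 'FR', 'iso-8859-1'));
print defined $locale ? "Locale set to $locale\n" : "No locale found\n";
```

Which identifier succeeds, if any, depends entirely on the locales
installed on the system.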

Alternatively, you can force a certain language by setting the environment
variables LANGUAGE, LANG and OUTPUT_CHARSET, but this is only guaranteed
to work if you use the pure Perl implementation of gettext (see the
documentation for select_package() in Locale::Messages(3pm)).  You would
do the above like this:

    use POSIX qw (setlocale LC_MESSAGES);
    use Locale::Messages qw (nl_putenv);

    # LANGUAGE is a colon-separated list of languages.
    nl_putenv("LANGUAGE=fr_FR");

    # If LANGUAGE is set, LANG should be set to the primary language.
    # This is not needed for gettext, but for other parts of the system
    # it is.
    nl_putenv("LANG=fr_FR");

    # Force an output charset like this:
    nl_putenv("OUTPUT_CHARSET=iso-8859-1");

    # Reset the locale for messages to C/POSIX.
    setlocale (LC_MESSAGES, 'C');

These environment variables are GNU extensions, and they are also
honored by libintl-perl.  Still, you should always try to set the
locale with setlocale for the catch-all category LC_ALL.  If you fail
to do so, your program's output may be garbled, mixing languages
and charsets, if the system runs in a locale that is not compatible
with your own language settings.

Remember that these environment variables are not guaranteed to
work if you use an XS version of gettext.  In order to force usage
of the pure Perl implementation, do the following:

    Locale::Messages->select_package ('gettext_pp');

If you think this is brain-damaged, you are right, but I cannot help
you.  Actually, there should be a more flexible API than setlocale,
but at the time of this writing there isn't.  Until then, the
recommendation goes like this:

    1) Try setting LC_ALL with Locale::Util.
    2) If that does not succeed, either give up or ...
    3) Reset LC_MESSAGES to C/POSIX.
    4) Switch to pure Perl for gettext.
    5) Set the environment variables LANGUAGE, LANG,
       and OUTPUT_CHARSET to your desired values.
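
The steps above can be sketched roughly as follows.  The helper
force_language() is a hypothetical name, and the sketch assumes that
set_locale() from Locale::Util returns a true value in scalar context
when it succeeds; check Locale::Util(3pm) before relying on that.

```perl
use strict;
use warnings;
use POSIX qw (setlocale LC_ALL LC_MESSAGES);
use Locale::Messages qw (nl_putenv);
use Locale::Util qw (set_locale);

# Hypothetical helper that follows the steps above.  Returns 'locale'
# if the locale could be set, 'environment' if the fallback was used.
sub force_language {
    my ($language, $country, $charset) = @_;

    # 1) Try setting LC_ALL with Locale::Util.
    return 'locale'
        if set_locale (LC_ALL, $language, $country, $charset);

    # 2) That did not succeed, so instead of giving up ...

    # 3) Reset LC_MESSAGES to C/POSIX.
    setlocale (LC_MESSAGES, 'C');

    # 4) Switch to pure Perl for gettext.
    Locale::Messages->select_package ('gettext_pp');

    # 5) Set the environment variables to the desired values.
    nl_putenv ("LANGUAGE=${language}_$country");
    nl_putenv ("LANG=${language}_$country");
    nl_putenv ("OUTPUT_CHARSET=$charset");

    return 'environment';
}

my $how = force_language ('fr', 'FR', 'iso-8859-1');
print "Language forced via $how\n";
```

Returning the strategy that was used lets the caller decide whether the
result is good enough or whether it should warn the user.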

=head2 What is the advantage of libintl-perl over Locale::Maketext?

Of course, I can only give my personal opinion as an answer.

Locale::Maketext claims to fix design flaws in gettext.  These alleged
design flaws, however, boil down to one pathological case which always
has a workaround.  But both programmers and translators pay for this
fix with an unnecessarily complicated interface.

The paramount advantage of libintl-perl is that it uses a proven
technology and concept.  Except for Java(tm) programs, this is the
state-of-the-art concept for localizing Un*x software.  Programmers
who have already localized software in C, C++, C#, Python, PHP,
or a number of other languages will feel instantly at home when
localizing software written in Perl with libintl-perl.  The same
holds true for the translators, because the files they deal with
have exactly the same format as those for other programming languages.
They can use the same set of tools, and even the commands they have
to execute are the same.

With libintl-perl, refactoring of the software is painless, even if
you modify, add or delete translatable strings.  The gettext tools
are powerful enough to reduce the effort of the translators to the
bare minimum.  Maintaining the message catalogs of Locale::Maketext
in larger-scale projects is, IMHO, infeasible.

Editing the message catalogs of Locale::Maketext, which are really
Perl modules, asks too much from most translators, unless
they are programmers.  The portable object (po) files used by
libintl-perl have a simple syntax, and there are a number of specialized
GUI editors for these files that facilitate the translation process
and hide most of the complexity from the user.

Furthermore, libintl-perl makes it possible to mix programming
languages without a paradigm shift in localization.  Without any special
effort, you can write localized software that has modules written
in C, modules in Perl, and builds a Gtk user interface with Glade.
All translatable strings end up in one single message catalog.

Last but not least, the interface used by libintl-perl is plain
and simple:  Prefix translatable strings with a double underscore,
and you are done in most cases.
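
A short sketch of what this looks like in practice, using the textdomain
"org.bar.foo" as a placeholder.  The variable names are made up for
illustration, and as long as no message catalog is installed for that
textdomain, the original strings are returned.

```perl
use strict;
use warnings;
use Locale::TextDomain qw (org.bar.foo);

my $user = 'World';
my $count = 3;

# Plain string.
my $greeting = __"Hello!";

# String with a named placeholder.
my $welcome = __x ("Hello, {name}!", name => $user);

# Singular and plural forms, selected by $count.
my $status = __nx ("One file deleted.", "{count} files deleted.",
                   $count, count => $count);

print "$greeting\n$welcome\n$status\n";
```

See Locale::TextDomain(3pm) for the complete list of functions.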

=head2 Why do single-quoted strings not work?

You probably write something like this:

    print __'Hello';

And you get an error message like "Can't find string terminator "'" anywhere
before EOF at ...", or even "Bareword found where operator expected at
... Might be a runaway multi-line '' string starting on".  The above line
is (really!) essentially the same as writing:

    print __::Hello';

A lesser-known feature of Perl is that you can use a single quote ("'") as
the separator in package names instead of the double colon ("::").  What the
Perl parser sees in the first example is a valid package name ("__")
followed by the separator ("'"), then another valid package name ("Hello")
followed by a lone single quote.  It is therefore not a problem in
libintl-perl but simply wrong Perl syntax.  You have two correct
alternatives:

    print __ 'Hello';   # Insert a space to disambiguate.

Or use double-quotes:

    print __"Hello";

Thanks to Slavi Agafonkin for pointing me to the solution of this mystery.

=head2 What options should be used with xgettext?

More precisely, the question should be which '--keyword' and '--flag'
options for xgettext should be used.  All other options are completely
dependent on your use case.

If you are using L<Locale::Messages> or L<Locale::Gettext> for localizing
Perl code, the default keywords and default flags built into xgettext
are correct.

If you are using L<Locale::TextDomain>, you have to use a long list
of command-line options for xgettext.  Beginning with libintl-perl 1.28,
you can use the library itself to produce these options:

    perl -MLocale::TextDomain -e 'print Locale::TextDomain->options'

If you want to disable the use of the built-in default keywords, precede
the output of the above command with '--keyword=""'.  That will reset
the keywords for xgettext.

=head2 Why isn't there a function N__x(), N__nx(), or N__px()?

The sole purpose of these functions would be to set proper flags in the
output of B<xgettext(1)>.  You probably thought of something like this:

    xgettext --keyword=N__x --flag=N__x:1:perl-brace-format filename.pl

First of all, xgettext(1) will I<always> set the flag correctly if the
argument to N__() I<looks> like a brace format string.

Second, you can set any flag you want on the PO entry with a source code
comment:

   # xgettext: no-perl-brace-format
   my $msg = N__("Placeholders are enclosed in {curly} braces.");

When B<xgettext(1)> extracts the string, it will appear like this in the
F<.pot> file:

    #: filename.pl:2304
    #, no-perl-brace-format
    msgid "Placeholders are enclosed in {curly} braces."
    msgstr ""

There is therefore no reason to pollute the namespace with N__x()
functions.
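
As a small sketch of how this works out at run time: the string is marked
with N__() and the placeholders are expanded later with __x().  The
textdomain "org.bar.foo" and the variable names are placeholders, and
the sketch assumes that no catalog is installed for that textdomain, so
the original string is used.

```perl
use strict;
use warnings;
use Locale::TextDomain qw (org.bar.foo);

# N__() only marks the string for extraction and returns it unchanged.
my $template = N__("Hello, {name}!");

# Translation and placeholder expansion happen later, at run time.
my $message = __x ($template, name => 'World');

print "$message\n";
```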