Using Perl's Unicode encoding with the Template Toolkit

If we have a template which is encoded in UTF-8, such as the following:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

[% pinyin %]

and we try to run the Template Toolkit on it using a variable which is marked as a character string using use utf8:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;

my $tt = Template->new ({
    INCLUDE_PATH => ['.'],
});

my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);

$tt->process ('unicode-file.txt', \%vars, 'tt-no-utf8.txt')
    or die $tt->error ();

then Perl prints the warning "Wide character in print", and the output looks like the following:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
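What happens when a byte string meets a character string can be sketched with core Perl alone, without Template. The sketch assumes that, with no ENCODING option, the template text reaches Perl as undecoded UTF-8 bytes; the variable names are purely illustrative, and the non-ASCII characters are written as escapes so that the example does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for template text read as raw UTF-8 bytes (an assumption about
# what happens without ENCODING). U+81EA is the first character of the poem
# and occupies three bytes in UTF-8.
my $template_bytes = encode('UTF-8', "\x{81EA}");

# Stand-in for a value from a script with "use utf8": a character string.
# This is "zì jīng" written with escapes.
my $pinyin_chars = "z\x{EC} j\x{12B}ng";

# Joining the two makes Perl treat every template byte as one Latin-1
# character, so the three bytes become three unrelated characters.
my $joined = $template_bytes . $pinyin_chars;

printf "template: %d, pinyin: %d, joined: %d\n",
    length $template_bytes, length $pinyin_chars, length $joined;
# prints "template: 3, pinyin: 7, joined: 10"
```

The joined string holds ten characters where eight were intended, which is why the template's own text is at risk as soon as any character string is mixed into it.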

To use Perl's character strings with the Template Toolkit module, Template, and UTF-8 encoded template files, specify the encoding of the input templates with the ENCODING option when creating the template object:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;

# Give the object an argument ENCODING

my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});


my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);

$tt->process ('unicode-file.txt', \%vars, 'tt-no-binmode.txt')
    or die $tt->error ();

Now the output is OK:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…

However, Perl still prints the warning "Wide character in print":

Wide character in print at /home/ben/software/install/lib/perl5/site_perl/5.18.1/i386-freebsd/Template.pm line 201.

To fix this, pass a further option to the object's process method, binmode => ':encoding(utf8)':

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;

my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});

my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);

$tt->process ('unicode-file.txt', \%vars, 'tt-fixed.txt',
              binmode => ':encoding(utf8)')
    or die $tt->error ();

The output looks the same, but the warning is gone:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
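The effect of an output layer on the warning can be reproduced in core Perl, independently of Template. The following is only a sketch: the helper warnings_when_printing is a hypothetical name invented for this example, and it writes to temporary files rather than to a template's output file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# "jīng…" written with escapes; U+012B and U+2026 do not fit in one byte.
my $text = "j\x{12B}ng\x{2026}";

# Hypothetical helper: print $text to a temporary file through the given
# layer and return how many warnings that print produced.
sub warnings_when_printing {
    my ($layer) = @_;
    my ($fh, $filename) = tempfile(UNLINK => 1);
    binmode $fh, $layer or die "binmode failed: $!";
    my $before = @warnings;
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return @warnings - $before;
}

my $raw_count     = warnings_when_printing(':raw');
my $encoded_count = warnings_when_printing(':encoding(UTF-8)');

print "no encoding layer: $raw_count warning(s)\n";
print "encoding layer:    $encoded_count warning(s)\n";
```

On a perl with default settings, the handle without an encoding layer should produce one "Wide character in print" warning and the handle with the layer none, which mirrors the difference the binmode option makes.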

Problems can also occur in yet another case, where you tell Template that your file is in UTF-8 but forget to mark the strings in your program as UTF-8 with use utf8;:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;

my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});

# We don't have "use utf8;" so Perl doesn't treat the following as a
# character string:

my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);

$tt->process ('unicode-file.txt', \%vars, 'tt-no-use-utf8.txt',
              binmode => ':encoding(utf8)')
    or die $tt->error ();

In this case, the strings from your program are corrupted:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
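This kind of corruption is usually double encoding: bytes that are already UTF-8 get treated as single characters and encoded a second time on output. A small core-Perl sketch of the effect follows; the two-byte string stands in for what Perl sees for a literal "ī" in a UTF-8 source file without use utf8, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Without "use utf8", a literal "ī" in UTF-8 source is just these two bytes.
my $literal_bytes = "\xC4\xAB";

# An output encoding layer treats each byte as a character and encodes it
# again, turning two bytes into four.
my $double_encoded = encode('UTF-8', $literal_bytes);

# Decoding the bytes into a character first gives the intended result.
my $char         = decode('UTF-8', $literal_bytes);
my $encoded_once = encode('UTF-8', $char);

# Helper for display only: show a byte string as hexadecimal.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

print "double encoded: ", hex_bytes($double_encoded), "\n";  # C3 84 C2 AB
print "encoded once:   ", hex_bytes($encoded_once), "\n";    # C4 AB
```

Adding use utf8 has the same effect as the decode step here: the literal becomes a single character before the output layer encodes it.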

Finally, it is also possible to get correct results using none of the above, that is, with no use utf8, no ENCODING option, and no binmode option:

#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;

my $tt = Template->new ({
    INCLUDE_PATH => ['.'],
});

my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);

$tt->process ('unicode-file.txt', \%vars, 'tt-all-off.txt')
    or die $tt->error ();

The output looks like this:

# Extracted from "http://www.chinese-poems.com/bo3.html"

自河南经乱关内阻饥兄弟离散各在一处…

zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…

However, in this case it is up to the programmer to ensure that the template and the Perl code are already in the same encoding, such as UTF-8; otherwise the result will be garbled. Perl does not check the encoding of either the input or the output.
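Since nothing is checked in this mode, one option is to validate the bytes yourself before using them. The following sketch uses the core Encode module; is_valid_utf8 is a hypothetical helper name, not part of Template or Encode.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: return 1 if the byte string is well-formed UTF-8,
# 0 otherwise. decode with a CHECK flag may modify its argument, so it
# works on a copy.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $good = "\xE8\x87\xAA";   # U+81EA encoded as UTF-8
my $bad  = "abc\xFF";        # 0xFF never occurs in well-formed UTF-8

print "good: ", is_valid_utf8($good), "\n";
print "bad:  ", is_valid_utf8($bad), "\n";
```

A check like this could be run on both the template file's contents and the program's strings before they are combined, so that a mismatch is caught early rather than showing up as garbled output.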

Copyright © Ben Bullock 2009-2024. All rights reserved. For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com).