Using Perl's Unicode encoding with the Template Toolkit
If we have a template in Unicode, encoded as UTF-8, such as the following:
# Extracted from "http://www.chinese-poems.com/bo3.html"
自河南经乱关内阻饥兄弟离散各在一处…
[% pinyin %]
and we try to run the Template Toolkit on it with a variable which has been
marked as a character string by use utf8:
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;
my $tt = Template->new ({
    INCLUDE_PATH => ['.'],
});
my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);
$tt->process ('unicode-file.txt', \%vars, 'tt-no-utf8.txt')
    or die $tt->error ();
the warning "Wide character in print" is printed, and the output looks
like the following:
# Extracted from "http://www.chinese-poems.com/bo3.html"
èªæ²³åç»ä¹±å ³å é»é¥¥å å¼ç¦»æ£åå¨ä¸å¤â¦
zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
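The way the Chinese text is garbled while the pinyin survives can be reproduced in a few lines of core Perl. This is a minimal sketch which assumes, as the output above suggests, that the template arrives as raw UTF-8 bytes when no ENCODING is given; the variable names are made up for illustration. When such a byte string is joined to a character string, Perl treats each byte as a separate Latin-1 character:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: a template read without ENCODING is a byte string. The UTF-8
# bytes of the poem's first character, 自 (U+81EA), stand in for it here.
my $template_bytes = encode('UTF-8', "\x{81EA}");   # bytes E8 87 AA

# With "use utf8", a pinyin literal is a character string. "j\x{12B}" is
# the same string as 'jī'; U+012B is above 0xFF.
my $pinyin = "j\x{12B}";

# Joining the two upgrades the byte string: each byte becomes one
# character (read as Latin-1), so 自 turns into three characters.
my $joined = $template_bytes . ' ' . $pinyin;

printf "%d characters\n", length $joined;   # 6
print join(' ', map { sprintf('U+%04X', ord $_) } split //, $joined), "\n";
# U+00E8 U+0087 U+00AA U+0020 U+006A U+012B
```

The first three code points, U+00E8, U+0087 (an invisible control character) and U+00AA, are the "è", nothing and "ª" that begin the garbled line above.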
To use Perl's internal Unicode encoding with the Template Toolkit (the
Template module) and UTF-8 encoded files, it is necessary to specify
the encoding of the input templates with the ENCODING option when
creating the template object:
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;
# Give the object an argument ENCODING
my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});
my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);
$tt->process ('unicode-file.txt', \%vars, 'tt-no-binmode.txt')
    or die $tt->error ();
Now the output is OK:
# Extracted from "http://www.chinese-poems.com/bo3.html"
自河南经乱关内阻饥兄弟离散各在一处…
zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
But this still prints the warning "Wide character in print":
Wide character in print at /home/ben/software/install/lib/perl5/site_perl/5.18.1/i386-freebsd/Template.pm line 201.
To fix this, pass a further option, binmode => ':encoding(utf8)', to
the object's process method:
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
use utf8;
my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});
my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);
$tt->process ('unicode-file.txt', \%vars, 'tt-fixed.txt',
              binmode => ':encoding(utf8)')
    or die $tt->error ();
The output looks the same, but this time without the warning:
# Extracted from "http://www.chinese-poems.com/bo3.html"
自河南经乱关内阻饥兄弟离散各在一处…
zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
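The warning and its cure are ordinary Perl I/O behaviour rather than anything specific to Template. As a sketch, using in-memory filehandles in place of files (a choice made here so the example needs no disk access), printing a string containing a character above U+00FF to a handle without an encoding layer warns, while an :encoding(UTF-8) layer does not. ':encoding(UTF-8)' is Perl's strict spelling of the encoding; ':encoding(utf8)', used above, is the lax variant.

```perl
use strict;
use warnings;

# 'jīng' from the pinyin variable; U+012B is above 0xFF, a "wide" character.
my $text = "j\x{12B}ng";

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# No encoding layer: Perl warns, then writes its internal UTF-8 bytes anyway.
open my $plain, '>', \my $plain_bytes or die "open: $!";
print $plain $text;
close $plain;

# An :encoding(UTF-8) layer encodes the characters, so there is no warning.
open my $layered, '>:encoding(UTF-8)', \my $layered_bytes or die "open: $!";
print $layered $text;
close $layered;

print scalar(@warnings), " warning(s): $warnings[0]";
print $plain_bytes eq $layered_bytes ? "same bytes\n" : "different bytes\n";
```

Both handles end up holding the same bytes, which is why the output file looked correct even while the warning was being printed.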
There is yet another case where problems can occur: you tell Template
that your file is in UTF-8, but forget to add use utf8; to your
program, so its strings are not marked as character strings:
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
my $tt = Template->new ({
    ENCODING => 'utf8',
    INCLUDE_PATH => ['.'],
});
# We don't have "use utf8;", so Perl doesn't treat the following as a
# character string:
my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);
$tt->process ('unicode-file.txt', \%vars, 'tt-no-use-utf8.txt',
              binmode => ':encoding(utf8)')
    or die $tt->error ();
In this case, the strings from your program are corrupted:
# Extracted from "http://www.chinese-poems.com/bo3.html"
自河南经乱关内阻饥兄弟离散各在一处…
zì hé nán jÄ«ng luà n guÄn nèi zÇ jÄ« xiÅng dì là sÇn gè zà i yÄ« chùâ¦
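The corrupted pinyin is double encoding. Without use utf8, Perl reads each non-ASCII literal as its individual UTF-8 bytes, and the :encoding(utf8) output layer then encodes each of those bytes a second time. A small core-Perl sketch of the effect, with the bytes of 'jīng' written as escapes so it does not depend on how the script is saved, and of undoing it with Encode's decode (roughly what use utf8 does for literals):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Without "use utf8", the literal 'jīng' in a UTF-8 source file is seen as
# the raw bytes j, C4, AB, n, g.
my $bytes = "j\xC4\xABng";
printf "without decoding: %d characters\n", length $bytes;   # 5

# Writing it through an :encoding(UTF-8) layer encodes each byte again.
open my $out, '>:encoding(UTF-8)', \my $doubled or die "open: $!";
print $out $bytes;
close $out;
# $doubled now holds "j\xC3\x84\xC2\xABng": double-encoded.

# Decoding the bytes first gives the intended character string.
my $chars = decode('UTF-8', $bytes);
printf "after decoding: %d characters\n", length $chars;     # 4
```

The extra bytes, C3 84 and C2 AB, are the "Ä«" visible in "jÄ«ng" above. In a real program the simpler fix is to add use utf8, as in the earlier examples.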
Finally, it is also possible to get correct results by using none of the above, that is, without use utf8, ENCODING or binmode:
#!/home/ben/software/install/bin/perl
use warnings;
use strict;
use Template;
my $tt = Template->new ({
    INCLUDE_PATH => ['.'],
});
my %vars = (
    pinyin => 'zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…',
);
$tt->process ('unicode-file.txt', \%vars, 'tt-all-off.txt')
    or die $tt->error ();
The output looks like this:
# Extracted from "http://www.chinese-poems.com/bo3.html"
自河南经乱关内阻饥兄弟离散各在一处…
zì hé nán jīng luàn guān nèi zǔ jī xiōng dì lí sǎn gè zài yī chù…
However, in this case it is up to the programmer to ensure that the template and the Perl code are already in the same encoding, such as UTF-8; otherwise the result will be garbled. Perl does no checking at all of the encoding of either the input or the output.
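If you take this route, some of the missing checking can be added by hand. A sketch, assuming the input is available as a byte string (for example, a template file read in :raw mode); is_valid_utf8 is a hypothetical helper, not part of Template or Encode. It relies on Encode's decode with the FB_CROAK flag, which makes decoding die on malformed input:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops it
    # from modifying the string it is given.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

print is_valid_utf8("\xE8\x87\xAA")       ? "valid\n" : "invalid\n";  # "自"
print is_valid_utf8("caf\xE9 au lait")    ? "valid\n" : "invalid\n";  # Latin-1
```

This only tells you that the bytes are valid UTF-8, not that they were meant to be UTF-8, so it catches the common Latin-1 mix-up but cannot prove the template and program agree.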
Web links
- Documentation for the "process" method at template-toolkit.org.
- Documentation for the "process" method at search.cpan.org