Using Unicode in a Perl CGI script
Output Unicode
To use Unicode in a Perl CGI (Common Gateway Interface) program, the most convenient approach is to encode the data as UTF-8. The Content-Type header which the CGI program prints should take the form
Content-Type: text/html; charset=UTF-8
With the CGI module from CPAN, loaded with use CGI ':standard'; so that the header function is imported, this header may be obtained by using the -charset option when printing the header:
print header (-charset => 'UTF-8');
Alternatively, add the following to the program's HTML output:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
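This tells the browser to treat the page as though it had been sent with the Content-Type header shown above. It does not change the actual header which the CGI program prints, and if that header specifies a different charset, the header takes precedence.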
To print strings which contain Unicode characters, tell Perl to encode its output as UTF-8 using
binmode STDOUT, ":utf8";
Without this, Perl prints warnings of the form "Wide character in print" for characters above U+00FF. These warnings usually go to the web server's error log file.
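The header and the output layer can be combined in a minimal sketch like the following. It prints the header by hand rather than through the CGI module, and the helper name build_response is hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':utf8';

# Hypothetical helper: build the whole response as a string of characters.
sub build_response {
    my ($text) = @_;
    return "Content-Type: text/html; charset=UTF-8\n\n"
         . "<!DOCTYPE html>\n"
         . "<html><head><title>Unicode test</title></head>\n"
         . "<body><p>$text</p></body></html>\n";
}

# U+65E5 U+672C U+8A9E, written as escapes so the source file stays ASCII.
print build_response("\x{65E5}\x{672C}\x{8A9E}");
```

Because the characters are written as \x{...} escapes, this sketch does not depend on the encoding of the program file itself.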
If the source code of the CGI program itself contains Unicode characters, add use utf8; to the program. This tells Perl that the program text is encoded as UTF-8, so that non-ASCII characters in string literals are read as characters rather than as separate bytes.
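The effect of use utf8; can be seen by measuring the same string literal with and without it. This is a small sketch which assumes the program file is saved as UTF-8; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my ($bytes, $chars);
{
    no utf8;                     # the literal is read as raw UTF-8 bytes
    $bytes = length "日本語";
}
{
    use utf8;                    # the literal is read as characters
    $chars = length "日本語";
}

# Three characters, each stored as three bytes in UTF-8.
print "without utf8: $bytes, with utf8: $chars\n";
# prints "without utf8: 9, with utf8: 3"
```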
Input Unicode
If you have a CGI script using the GET method, the input comes from the query string itself (as found in $ENV{QUERY_STRING}). If the input contains Unicode in the form of percent-encoded UTF-8 bytes, like "input=%E6%97%A5", you can decode it using
use URI::Escape;
use Encode 'decode_utf8';
my $query_string = $ENV{QUERY_STRING};
$query_string = uri_unescape ($query_string);
$query_string = decode_utf8 ($query_string);
In practice it is necessary to split the query string into its parameters before unescaping it. The percent-encoded parts of the query string may contain equals signs or ampersands, and once these are unescaped it is impossible to distinguish them from the ones which separate the form parameters.
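The order of operations described above (split first, then unescape, then decode) might be sketched as follows. The function name parse_query is hypothetical, and a regular expression stands in for uri_unescape so that the sketch needs only the core Encode module; it is not a replacement for a full query parser.

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical helper: turn a raw query string into a hash of decoded values.
sub parse_query {
    my ($query_string) = @_;
    my %param;
    # Split into name=value pairs while everything is still percent-encoded.
    for my $pair (split /[&;]/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                            # forms send spaces as "+"
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # percent-escapes to raw bytes
            $_ = decode ('UTF-8', $_);          # raw bytes to Perl characters
        }
        # A repeated name overwrites the earlier value in this simple sketch.
        $param{$name} = $value;
    }
    return \%param;
}

my $param = parse_query ('input=%E6%97%A5&q=a%26b%3Dc+d');
print length ($param->{input}), "\n";   # one character, U+65E5
```

Because the splitting happens first, the value a%26b%3Dc+d comes back as the single string "a&b=c d" rather than being broken into extra parameters.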
Using the CGI module, which parses the query string itself,
use CGI ':standard';
use Encode 'decode_utf8';
my $value = param ('input');
$value = decode_utf8 ($value);
This can be simplified using the -utf8 option to CGI:
use CGI qw(:standard -utf8);
my $value = param ('input');
(According to CGI's documentation, the -utf8 option may cause problems with POST requests containing binary files.)