A mod_speling using Perl and CGI
This is an example of using Text::Fuzzy to provide an
alternative to mod_speling with a Perl CGI script. If this CGI
script is called something like "misspelt-web-page.cgi" and put in the
top directory, it can be used to handle "Not Found" errors with
a .htaccess
file containing the line
ErrorDocument 404 /misspelt-web-page.cgi
The redirection script is as follows:
#!/home/ben/software/install/bin/perl use warnings; use strict; use Text::Fuzzy; # The directory of files served by the web server. my $web_root = '/usr/local/www/data'; # If the query is "http://www.example.com/abc/xyz.html", $path_info is # "abc/xyz.html". my $path_info = $ENV{REQUEST_URI}; if (! defined $path_info) { fail ("No path info"); } if ($0 =~ /$path_info/) { fail ("Don't redirect to self"); } # This is the list of files under the main page. my @allfiles = get_all_files ($web_root, ''); # This is our spelling search engine. my $tf = Text::Fuzzy->new ($path_info); my $nearest = $tf->nearest (\@allfiles, max => 5); if (defined $nearest) { redirect ($allfiles[$nearest]); } else { fail ("Nothing like $path_info was found"); } exit; # Read all the files under "$root/$dir". This is recursive. The return # value is an array containing all files found. sub get_all_files { my ($root, $dir) = @_; my @allfiles; my $full_dir = "$root/$dir"; if (! -d $full_dir) { fail ("$full_dir is not a directory"); } opendir DIR, $full_dir or fail ("Can't open directory $full_dir: $!"); my @files = grep !/^\./, readdir DIR; closedir DIR or fail ("Can't close $full_dir: $!"); for my $file (@files) { my $dir_file = "$dir/$file"; my $full_file = "$root/$dir_file"; if (-d $full_file) { push @allfiles, get_all_files ($root, $dir_file); } else { push @allfiles, $dir_file; } } return @allfiles; } # Print a "permanent redirect" to the respelt name, then exit. sub redirect { my ($url) = @_; print <<EOF; Status: 301 Location: $url EOF exit; } # Print an error message for the sake of the requester, and print a # message to the error log, then exit. sub fail { my ($error) = @_; print <<EOF; Content-Type: text/plain $error EOF # Add the name of the program and the time to the error message, # otherwise the error log will get awfully confusing-looking. my $time = scalar gmtime (); print STDERR "$0: $time: $error\n"; exit; }
For the sake of this web server, the script is named with a suffix ".pl".
The script works by building a list of all the files in the directory, and then searching through this list using a Text::Fuzzy object for the nearest match. If it finds a match, it prints a redirection header which sends the user's browser to the matching file.
Copyright © Ben Bullock 2009-2024. All
rights reserved.
For comments, questions, and corrections, please email
Ben Bullock
(benkasminbullock@gmail.com).
/
Privacy /
Disclaimer