How to handle building and parsing HTTP URLs / URIs / paths in Perl

Posted by Robert S. Barnes on Stack Overflow
Published on 2010-04-19T12:13:40Z

I have a wget-like script which downloads a page and then retrieves all the files linked in img tags on that page.

Given the URL of the original page and the link extracted from an img tag in that page, I need to build the URL for the image file I want to retrieve. Currently I use a function I wrote:

sub build_url {
    my ( $base, $path ) = @_;

    # if the path is absolute just prepend the domain to it
    if ($path =~ /^\//) {
        ($base) = $base =~ /^(?:http:\/\/)?(\w+(?:\.\w+)+)/;
        return "$base$path";
    }

    my @base = split '/', $base;
    my @path = split '/', $path;

    # remove a trailing filename
    pop @base if $base =~ /[[:alnum:]]+\/[\w\d]+\.[\w]+$/;

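    # count the "../" segments in the path; the list assignment forces
    # a match count (a plain scalar assignment only yields true/false)
    my $relcount = () = $path =~ /\.\.\//g;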
    while ( $relcount-- ) {
        pop @base;
        shift @path;
    }
    return join '/', @base, @path;
}

I'm surely not the first person to solve this problem, and it's such a general one that I assume there must be a better, more standard way of dealing with it, using either a core module or something from CPAN, although a core module would be preferable. I was thinking about File::Spec but wasn't sure whether it has all the functionality I would need.
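
For comparison, here is a minimal sketch of the same task using the URI distribution from CPAN (it is not a core module, and File::Spec is aimed at filesystem paths rather than URLs). It assumes URI is installed. The helper name resolve_link and the example.com URLs are placeholders made up for illustration, not part of the original script. URI->new_abs resolves a possibly relative reference against a base URL, which covers the absolute-path, "../" and bare-filename cases that build_url handles by hand.

```perl
use strict;
use warnings;
use URI;    # CPAN distribution, not part of the Perl core

# Hypothetical helper: resolve a link found in a page against
# the URL of that page and return the absolute URL as a string.
sub resolve_link {
    my ( $base, $link ) = @_;
    return URI->new_abs( $link, $base )->as_string;
}

# Placeholder page URL for illustration only.
my $page = 'http://example.com/a/b/page.html';

# Relative, root-relative and bare-filename links.
print resolve_link( $page, '../img/logo.png' ), "\n";
print resolve_link( $page, '/img/logo.png' ),   "\n";
print resolve_link( $page, 'logo.png' ),        "\n";

# Parsing: a URI object exposes the individual components.
my $uri = URI->new($page);
print join( ' ', $uri->scheme, $uri->host, $uri->path ), "\n";
```

Because the result keeps the scheme and host from the base URL, it can be passed straight to whatever performs the download.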
