How can I get the file extensions from relative links in HTML text using Perl?

Posted by Structure on Stack Overflow
Published on 2010-03-26T15:13:47Z

While scanning the contents of an HTML page with a Perl regular expression, I want to match all file extensions but not the TLDs in domain names. To do this, I am assuming that every file extension appears inside double quotes.

I came up with the following pattern. It works, but I cannot figure out how to exclude the TLDs in domain names, so it also returns "com", "net", etc.

m/"[^<>]+\.([0-9A-Za-z]*)"/g
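
To make the problem concrete, here is a minimal sketch that runs the pattern above against a small, made-up HTML snippet (the file names and the example.com link are assumptions for illustration only). It shows the TLD being captured alongside the real file extensions.

```perl
use strict;
use warnings;

# Made-up sample input: two relative links and one absolute URL.
my $html = <<'HTML';
<a href="../docs/manual.pdf">Manual</a>
<img src="images/logo.png">
<a href="http://www.example.com">Home</a>
HTML

# In list context, /g returns every $1 capture from the pattern in the question.
my @ext = $html =~ m/"[^<>]+\.([0-9A-Za-z]*)"/g;

print "@ext\n";   # prints: pdf png com
```

The last value, "com", is the TLD that should be excluded.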

Is it possible to negate the match when the quoted string contains more than one period separated by text? (i.e., the rule should catch foo.bar.com but should not be triggered by ./ or ../)

Edit: I am using $1 to return the value captured by the parentheses.
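
One possible direction, offered only as a sketch and not as a definitive answer: capture each double-quoted value first, then decide in ordinary Perl code whether it looks like a relative link before pulling out the extension. The helper name `relative_extension`, the sample HTML, and the rule "skip anything with a scheme or a leading `//`" are all assumptions made for illustration. A bare host name such as "foo.bar.com" with no scheme would still slip through.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the extension of a relative link, or undef
# for absolute URLs and for links that have no extension (such as "./").
sub relative_extension {
    my ($link) = @_;

    # Assumption: a scheme ("http:") or a leading "//" marks an absolute URL.
    return if $link =~ m{^(?:[A-Za-z][A-Za-z0-9+.-]*:|//)};

    $link =~ s/[?#].*//s;    # drop any query string or fragment

    # The extension is whatever follows the final period at the end of the path.
    return $link =~ m{\.([0-9A-Za-z]+)$} ? $1 : undef;
}

# Made-up sample input for illustration.
my $html = <<'HTML';
<a href="../docs/manual.pdf">Manual</a>
<img src="images/logo.png">
<a href="http://www.example.com">Home</a>
<a href="./">Up</a>
HTML

my @ext;
while ($html =~ m/"([^"<>]*)"/g) {        # each double-quoted value
    my $ext = relative_extension($1);
    push @ext, $ext if defined $ext;
}

print "@ext\n";   # prints: pdf png
```

Splitting the work into two steps keeps each pattern simple. For real pages, an HTML parser would likely be more robust than matching quotes with a regular expression.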

© Stack Overflow or respective owner
