How can I get the file extensions from relative links in HTML text using Perl?

Posted by Structure on Stack Overflow See other posts from Stack Overflow or by Structure
Published on 2010-03-26T15:13:47Z Indexed on 2010/03/27 13:03 UTC
Read the original article Hit count: 140

Filed under:

perl

|

regex

For example, scanning the contents of an HTML page with a Perl regular expression, I want to match all file extensions but not TLD's in domain names. To do this I am making the assumption that all file extensions must be within double quotes.

I came up with the following, and it is working, however, I am failing to figure out a way to exclude the TLDs in the domains. This will return "com", "net", etc.

m/"[^<>]+\.([0-9A-Za-z]*)"/g

Is it possible to negate the match if there is more than one period between the quotes that are separated by text? (ie: match foo.bar.com but not ./ or ../)

Edit I am using $1 to return the value within parentheses.

© Stack Overflow or respective owner

Related posts about perl

Munin on Centos 6 - missing perl MODULE_COMPAT_5.8.8

as seen on Server Fault - Search for 'Server Fault'
I'm trying to install Munin on a new VPS through yum install munin but I keep getting an error about a missing perl module: Requires: perl(:MODULE_COMPAT_5.8.8). This is the perl version currently installed: v5.10.1. I've searched all around and still haven't found a solution for this. Here's the… >>> More
Pain removing a perl rootkit

as seen on Server Fault - Search for 'Server Fault'
So, we host a geoservice webserver thing at the office. Someone apparently broke into this box (probably via ftp or ssh), and put some kind of irc-managed rootkit thing. Now I'm trying to clean the whole thing up, I found the process pid who tries to connect via irc, but i can't figure out who's… >>> More
How To Avoid a Perl script calling an Another Perl Script

as seen on Stack Overflow - Search for 'Stack Overflow'
Hello, i am calling a perl script client.pl from a main script to capture the output of client.pl in @output. is there anyway to avoid the use of these two files so i can use the output of client.pl in main.pl itself here is my code.... main.pl ======= my @output = readpipe("client.pl"); client… >>> More
Perl :how to sort dates in perl

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, How can I sort the dates in perl. my @dates = ( "02/11/2009" , "12/20/2001" , "11/21/2010" ) ; I have above dates in my array . How can I sort those dates... ? My date format is dd/mm/YYYY. >>> More
please suggest a perl book exclusively for perl programs

as seen on Stack Overflow - Search for 'Stack Overflow'
I want tha name of a perl book for only PERL PROGRAMS. The reason behind is I want to improve my programming skill in perl >>> More

Related posts about regex

Find multiple regex in each line and skip result if one of the regex doesn't match

as seen on Stack Overflow - Search for 'Stack Overflow'
I have a list of variables: variables = ['VariableA', 'VariableB','VariableC'] which I'm going to search for, line by line ifile = open("temp.txt",'r') d = {} match = zeros(len(variables)) for line in ifile: emptyCells=0 for i in range(len(variables)): regex = r'('+variables[i]+r')[:|=|\(](-… >>> More
OWASP Regex Repository: Is this regex correct?

as seen on Stack Overflow - Search for 'Stack Overflow'
I was looking at the regular expression for validating various data types from the (OWASP Regex Repository). One of the regular expressions in there is called safetext and looks like: ^[a-zA-Z0-9\s.\-]+$ My first question is: Is this regular expression correct? complementary question If this… >>> More
Make a Perl-style regex interpreter behave like a basic or extended regex interpreter

as seen on Stack Overflow - Search for 'Stack Overflow'
I am writing a tool to help students learn regular expressions. I will probably be writing it in Java. The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough. But I want to support several different regex… >>> More
JS regex isn't matching, even thought it works with a regex tester

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm writing a piece of client-side javascript code that takes a function and finds the derivative of it, however, the regex that's supposed to match with the power rule fails to work in the context of the javascript program, even though it sucessfully matches when it's used with an independent regex… >>> More
c# RegEx with "|"

as seen on Stack Overflow - Search for 'Stack Overflow'
I need to be able to check for a pattern with | in them. For example an expression like d*|*t should return true for a string like "dtest|test". I'm no regex hero so I just tried a couple of things, like: Regex Pattern = new Regex("s*\|*d"); //unable to build because of single backslash Regex Pattern… >>> More