Extract a pattern from the output of curl
Posted by allentown on Stack Overflow, 2010-04-25.

I would like to use curl on the command line to grab a URL, pipe the output through a pattern match, and get back a list of URLs that match that pattern.
I am running into problems with the greedy aspects of the pattern and cannot seem to get past them. Any help on this would be appreciated.
curl http://www.reddit.com/r/pics/ | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"
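For example, if I run the same grep over a single made-up line containing two image links, the greedy .+ seems to swallow everything from the first link through the last extension instead of stopping at the first one:

echo '<a href="http://imgur.com/abc12.jpg">one</a> <a href="http://imgur.com/xyz99.png">two</a>' \
  | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"
# prints one long run-on match ending at the last "png",
# rather than two separate image URLs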
So: grab the data from the URL, which comes back as a mess of HTML that may need some line breaks inserted somehow, unless the regex can return more than one match from a single line. The pattern is pretty simple: any string that...
- starts with http://imgur.com/
- has A-Z, a-z, 0-9 (maybe some others); the IDs are 5 chars long so far, and 8 should cover it forever if I wanted to limit that aspect of the pattern, which I don't
- ends in a graphic file format extension (.jpg, .jpeg, .gif, .png)
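In case it helps to see what I'm aiming at, this is a rough sketch of the tightened pattern I've been trying to arrive at (the [a-z0-9] class and the escaped dot before the extension are guesses on my part, not something I've confirmed against the live page):

curl -s http://www.reddit.com/r/pics/ \
  | grep -ioE "http://imgur\.com/[a-z0-9]+\.(jpg|jpeg|gif|png)" \
  | sort -u
# -i makes [a-z0-9] cover upper case too, -o prints only the matched part,
# \. anchors the literal dot before the extension, sort -u drops duplicates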
 
That's about it. At that URL, with default settings, I should generally get back a good set of images. I would not object to using the RSS feed URL for the same page; it may actually be easier to parse.
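If the feed route is simpler, I imagine it would look something like this (assuming the .rss suffix is the right way to get the feed for that page, and reusing the same guessed pattern as above):

curl -s http://www.reddit.com/r/pics/.rss \
  | grep -ioE "http://imgur\.com/[a-z0-9]+\.(jpg|jpeg|gif|png)" \
  | sort -u
# same extraction, just run over the feed instead of the full HTML page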
Thanks everyone!
© Stack Overflow or respective owner