Retrieving the first picture with a HTML parser
        Posted  
        
            by justin01
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by justin01
        
        
        
        Published on 2010-06-03T19:02:37Z
        Indexed on 
            2010/06/03
            19:04 UTC
        
        
        Read the original article
        Hit count: 209
        
Hey guys,
(Not a native english speaker)
I'm doing a personal project in PHP in which I use the Simple HTML Parser to parse the HTML of a given URL and retrieve the first image in a DIV that have a specific ID or class (maincontent, content, main, wrapper, etc. - it's all in an array) and ignore ads. The goal is to take this image and make a thumbnail with it, pretty much like on Digg and others.
I thought everything was working fine until I tried my script with the website Snopes ("http://www.snopes.com/photos/animals/luckycoyote.asp" <- this page more exactly).
The source of the first image it gets is: " graphics/luckycoyote1.jpg ". So far, to correct this problem I created a little function that gets the domain name of the given URL and insert it before the IMG's source attribute. So for sites like Snopes.com, it gives me: "http://www.snopes.com/graphics/luckycoyote1.jpg" ... while the real URL for this image is "http://www.snopes.com*/photos/animals/graphics/luckycoyote1.jpg*" (or, more precisely: " http://graphics1.snopes.com/photos/animals/graphics/luckycoyote1.jpg " - note the subdomain here).
So my main question is: how can I externally/dynamically retrieve the full URL address of an image ("absolute path") when I am only given the "relative path"? I'm pretty sure this is possible, since when I paste the link in Facebook's "What are you doing?" field for example, it gives me the correct path to the image while on the website, the source of the image is only (example) "image/photo/example.jpg".
Thank you for your time.
© Stack Overflow or respective owner