Visual analysis of web pages in Ruby
Posted by Clint Miller on Stack Overflow
Published on 2011-01-06T18:39:32Z
        
I'm looking to write some code that does visual analysis of web pages, preferably using Ruby. My code will need to determine the top, left, width, height, background color, color, and font size of every element in the DOM. Of course, these values can only be calculated once all CSS is applied. Nokogiri only parses the markup and does not apply CSS or compute layout, so I don't think it is up for the job. Ultimately, I'm trying to use this data in a VIPS-like (Vision-Based Page Segmentation) algorithm in an attempt to find the main content in downloaded news articles.
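To make the target data concrete, here is a rough sketch of the per-element record I have in mind, plus a deliberately naive stand-in for the segmentation step. `ElementBox`, `area`, and `largest_text_block` are hypothetical names, the sample values are made up, and "largest block above a font-size threshold" is only a placeholder heuristic, not VIPS itself.

```ruby
# Hypothetical record for one rendered DOM element (names are illustrative).
ElementBox = Struct.new(:tag, :top, :left, :width, :height,
                        :background_color, :color, :font_size,
                        keyword_init: true) do
  # Rendered area in square pixels.
  def area
    width * height
  end
end

# Naive placeholder for a VIPS-like step: among the boxes whose font size is
# at least min_font_size, return the one with the largest area (nil if none).
def largest_text_block(boxes, min_font_size: 12)
  boxes.select { |b| b.font_size >= min_font_size }.max_by(&:area)
end

# Made-up sample data standing in for values extracted from a real page.
boxes = [
  ElementBox.new(tag: 'NAV', top: 0, left: 0, width: 960, height: 80,
                 background_color: 'rgb(0, 0, 0)', color: 'rgb(255, 255, 255)',
                 font_size: 11),
  ElementBox.new(tag: 'ARTICLE', top: 80, left: 0, width: 640, height: 2000,
                 background_color: 'rgb(255, 255, 255)', color: 'rgb(0, 0, 0)',
                 font_size: 16),
  ElementBox.new(tag: 'ASIDE', top: 80, left: 660, width: 300, height: 2000,
                 background_color: 'rgb(240, 240, 240)', color: 'rgb(0, 0, 0)',
                 font_size: 13)
]

main = largest_text_block(boxes)
puts main.tag
```

A real segmentation pass would also need the DOM hierarchy and visual separators between blocks; the flat list here is only meant to show what the extracted data would look like.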
I've considered using Watir to drive Chrome or Firefox and then extract the data. The problem is that, as far as I know, browsers can't be run headless through Watir. This code will eventually run on an array of Linux servers in a data center, so it won't have easy access to an X server for displaying the browser.
I suppose one solution is to use Watir and run a headless X Server on the Linux servers. That's a bit of a pain, but it looks like my best option right now.
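A minimal sketch of that approach, under some assumptions: the browser object responds to `execute_script` (Watir's browser does), and the in-page JavaScript uses the standard `getBoundingClientRect` and `getComputedStyle` calls. `GEOMETRY_SCRIPT` and `collect_geometry` are names I made up, and the commented usage assumes the `watir` and `headless` gems plus Xvfb are installed; I have not run that part.

```ruby
# JavaScript executed inside the page: one plain object per element, holding
# its position, size, and the computed styles of interest.
GEOMETRY_SCRIPT = <<~JS
  return Array.prototype.map.call(document.querySelectorAll('*'), function (el) {
    var r = el.getBoundingClientRect();
    var s = window.getComputedStyle(el);
    return {
      tag: el.tagName,
      top: r.top + window.pageYOffset,
      left: r.left + window.pageXOffset,
      width: r.width,
      height: r.height,
      background_color: s.backgroundColor,
      color: s.color,
      font_size: s.fontSize
    };
  });
JS

# Hypothetical helper: runs the script through any object that responds to
# execute_script and returns an array of symbol-keyed hashes.
def collect_geometry(browser)
  browser.execute_script(GEOMETRY_SCRIPT).map do |row|
    box = row.transform_keys(&:to_sym)
    box[:font_size] = box[:font_size].to_f # computed value such as "16px" becomes 16.0
    box
  end
end

# Possible usage on a server without a display (assumed setup, untested):
#
#   require 'watir'
#   require 'headless'
#
#   headless = Headless.new            # starts and manages an Xvfb display
#   headless.start
#   browser = Watir::Browser.new :firefox
#   browser.goto 'http://example.com/article'
#   boxes = collect_geometry(browser)
#   browser.close
#   headless.destroy
```

Doing the extraction in a single `execute_script` call avoids one round trip per element, which I would expect to matter on pages with thousands of nodes.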
Does anyone have any better ideas?