Hpricot throws an exception when parsing a URL whose page has a noscript tag

Posted by anusuya on Stack Overflow
Published on 2010-04-08T11:40:21Z Indexed on 2010/04/08 11:43 UTC

I use the hpricot gem in Ruby on Rails to parse a web page and extract the contents of a meta tag. But if the page has a <noscript> tag right after the <head> tag, it throws an exception:

Exception: undefined method `[]' for nil:NilClass

I even updated the gem to the latest version, but the result is still the same.

This is the sample code I use:

require 'rubygems'
require 'hpricot'
require 'open-uri'
begin
       index_page = Hpricot(open("http://sample.com"))
       puts index_page.at("/html/head/meta[@name='verification']")['content'].gsub(/\s/, "")
rescue Exception => e
       puts "Exception: #{e}"
end

I was thinking of removing the noscript tag before passing the page to hpricot. Or is there any other way to do it?
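
For reference, here is a rough sketch of what the pre-stripping idea could look like. The error message means that `at` returned nil, so `['content']` was called on nil; presumably the <noscript> tag changes how the <head> is parsed and the meta tag is no longer found at that path. Both `strip_noscript` and `meta_content` are hypothetical helper names made up for this sketch, not part of Hpricot, and the regex is a simple approximation that assumes the noscript blocks are not nested.

```ruby
# Hypothetical helper (not part of Hpricot): removes every
# <noscript>...</noscript> block from the raw HTML string.
# Assumes the blocks are not nested.
def strip_noscript(html)
  html.gsub(%r{<noscript\b[^>]*>.*?</noscript\s*>}mi, "")
end

# Hypothetical helper: returns the content attribute with all whitespace
# removed, or nil when the element was not found (avoids calling [] on nil).
def meta_content(element)
  return nil if element.nil?
  element['content'].to_s.gsub(/\s/, "")
end

# Possible usage with the code from the question (needs the hpricot gem):
#   html = open("http://sample.com").read
#   index_page = Hpricot(strip_noscript(html))
#   puts meta_content(index_page.at("/html/head/meta[@name='verification']"))
```

With the nil check in place, a page without the meta tag gives nil instead of raising, whether or not stripping the noscript block turns out to be enough.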

© Stack Overflow or respective owner
