Scraping &#151 character (long dash) error in Nokogiri

Posted by DavidP6 on Stack Overflow
Published on 2010-05-12T18:43:19Z Indexed on 2010/05/12 18:44 UTC

I am having trouble scraping a certain long dash that is encoded as &#151; on the Time magazine site. It looks like this: —. Scraping works fine when the dash is encoded as &mdash;, but when the problem dash is scraped, it comes back as unknown characters. I am using Nokogiri and am wondering whether I need to specify some special encoding. The page says it is encoded as UTF-8.
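
One possible explanation, offered as an assumption rather than a confirmed diagnosis: 151 is the em dash in Windows-1252, but as a Unicode code point it is a C1 control character, so a parser that takes the numeric reference literally may return something unprintable. A minimal sketch of a workaround is to rewrite references in the 128-159 range to their Windows-1252 meaning before parsing. The helper name `fix_cp1252_references` is hypothetical and not part of Nokogiri.

```ruby
# Hypothetical helper (not a Nokogiri API): rewrite numeric character
# references in the range 128-159 to the characters they denote in
# Windows-1252, e.g. &#151; becomes U+2014 (em dash).
def fix_cp1252_references(html)
  html.gsub(/&#(\d+);?/) do |match|
    code = Regexp.last_match(1).to_i
    # Leave every reference outside the C1 range untouched.
    next match unless (128..159).cover?(code)

    begin
      # Treat the number as a single Windows-1252 byte and convert to UTF-8.
      [code].pack('C').force_encoding(Encoding::Windows_1252)
            .encode(Encoding::UTF_8)
    rescue EncodingError
      match # a few bytes in this range have no Windows-1252 mapping
    end
  end
end

puts fix_cp1252_references('Time&#151;the magazine')
```

The cleaned string could then be handed to the parser, for example `Nokogiri::HTML(fix_cp1252_references(raw_html), nil, 'UTF-8')`, where `raw_html` stands for the downloaded page source. Whether this resolves the issue on the Time page has not been verified here.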

© Stack Overflow or respective owner
