How to set the mechanize page encoding?

Posted by Juan Medín on Stack Overflow
Published on 2009-12-12T03:31:43Z Indexed on 2010/03/08 8:06 UTC

Hi,

I'm trying to fetch a page encoded in ISO-8859-1 by clicking on a link, so the code is similar to this:

page_result = page.link_with( :text => 'link_text' ).click

So far the result comes back with the wrong encoding, so I see characters like:

'T?tulo:' instead of 'Título:'
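
To make the symptom concrete, here is a minimal sketch of how those characters can arise. It assumes Ruby 2.1 or later (for String#scrub) and assumes the server really sends ISO-8859-1 bytes; the sample string is a stand-in, not data from the real page.

```ruby
# "Título:" as ISO-8859-1 bytes: the í is the single byte 0xED.
latin1_bytes = "T\xEDtulo:".b

# Treat those bytes as UTF-8, which is roughly what a UTF-8 terminal does.
mislabeled = latin1_bytes.dup.force_encoding('UTF-8')
is_valid   = mislabeled.valid_encoding?   # 0xED alone is not valid UTF-8
garbled    = mislabeled.scrub('?')        # invalid byte replaced by '?'

# Label the bytes correctly, then transcode them to UTF-8.
fixed = latin1_bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
```

Here garbled ends up as 'T?tulo:' and fixed as 'Título:', which matches what I see versus what I expect.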

I've tried several approaches, including:

  • Stating the encoding in the first request through the agent, like:

    @page_search = @agent.get(
      :url => 'http://www.server.com',
      :headers => { 'Accept-Charset' => 'ISO-8859-1' } )
    
  • Stating the encoding for the page itself:

      page_result.encoding = 'ISO-8859-1'
    

But I must be doing something wrong: a simple puts always shows the wrong characters.

Do you know how to state the encoding?

Thanks in advance,

Added: Executable example:

require 'rubygems'
require 'mechanize'

WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"

@agent = WWW::Mechanize.new

@page = @agent.get(
  :url => 'http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&layout=busquedaisbn&language=es',
  :headers => { 'Accept-Charset' => 'utf-8' } )

puts @page.body
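
For comparison, a minimal sketch of transcoding the body by hand before printing it. This sidesteps Mechanize's own encoding handling rather than configuring it, and it assumes Ruby 1.9 or later and a body that really is ISO-8859-1. The helper name to_utf8 is hypothetical (it is not part of Mechanize), and the body string below stands in for @page.body.

```ruby
# Hypothetical helper, not a Mechanize API: relabel raw bytes with their
# assumed source encoding and transcode the result to UTF-8.
def to_utf8(raw, source_encoding = 'ISO-8859-1')
  raw.dup
     .force_encoding(source_encoding)
     .encode('UTF-8', invalid: :replace, undef: :replace)
end

# Stand-in for @page.body: raw bytes as they would arrive from the server.
body = "<b>T\xEDtulo:</b>".b

puts to_utf8(body)   # prints <b>Título:</b>
```

With the real page this would be called as puts to_utf8(@page.body), on the assumption that body returns the undecoded response bytes.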
