Trouble with encoding and urllib

Posted by Ockonal on Stack Overflow
Published on 2010-05-14T14:05:56Z

Hello, I'm loading a web page using urllib. The page contains Russian characters, and its encoding is 'utf-8'. I tried two approaches, and both fail:

1

pageData = unicode(requestHandler.read()).decode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 262: ordinal not in range(128)
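
A likely cause of this error: in Python 2, unicode() called on a byte string without an encoding argument decodes it with the default ASCII codec, so the call fails on the first Cyrillic byte (0xd0) before .decode('utf-8') ever runs. A minimal sketch of decoding the bytes directly, assuming the page really is UTF-8; raw_bytes is a hypothetical stand-in for the result of requestHandler.read():

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for requestHandler.read(): the UTF-8 bytes of a Russian word.
raw_bytes = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

# Decode the raw bytes directly; do not wrap them in unicode() first,
# because that would trigger an implicit ASCII decode.
page_data = raw_bytes.decode('utf-8')
```

With the real response, the same idea would read pageData = requestHandler.read().decode('utf-8').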

2

pageData = requestHandler.read()
soupHandler = BeautifulSoup(pageData)
print soupHandler.findAll(...)

UnicodeEncodeError: 'ascii' codec can't encode characters in position 340-345: ordinal not in range(128)
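
This second error is raised on output rather than on input, which suggests the text is being implicitly encoded to ASCII when it is printed (for example, because the terminal or pipe does not report a UTF-8 encoding). One possible workaround, sketched under that assumption, is to encode explicitly before printing; encode_for_output and sample_text are hypothetical names used only for illustration:

```python
# -*- coding: utf-8 -*-

def encode_for_output(text, encoding='utf-8'):
    """Return text as bytes in the given encoding (hypothetical helper)."""
    return text.encode(encoding)

# Hypothetical stand-in for a string extracted from the parsed page.
sample_text = u'\u041f\u0440\u0438\u0432\u0435\u0442'

# Encoding explicitly avoids the implicit ASCII step that raises the error.
encoded = encode_for_output(sample_text)
```

In Python 2, print encoded then writes the UTF-8 bytes unchanged; whether they display correctly still depends on the terminal being set to UTF-8.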

© Stack Overflow or respective owner
