How to prevent BeautifulSoup from stripping lines

Posted by Oli on Stack Overflow
Published on 2010-06-07T09:14:03Z Indexed on 2010/06/07 9:32 UTC

I'm trying to convert an online HTML page to plain text.

I have a problem with this structure:

<div align="justify"><b>Available in  
<a href="http://www.example.com.be/book.php?number=1">
French</a> and 
<a href="http://www.example.com.be/book.php?number=5">
English</a>.
</div>

Here is its representation as a Python string:

'<div align="justify"><b>Available in  \r\n<a href="http://www.example.com.be/book.php?number=1">\r\nFrench</a> and \r\n<a href="http://www.example.com.be/book.php?number=5">\r\nEnglish</a>.\r\n</div>'

When using:

# get_html_div_from_above() is a placeholder for the string shown above
html_content = get_html_div_from_above()
para = BeautifulSoup(html_content)
txt = para.text

BeautifulSoup translates it (in the 'txt' variable) as:

u'Available inFrenchandEnglish.'

It appears to strip the whitespace from each line of the original HTML string, so the words run together.

Is there a clean solution to this problem?

Thanks.
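
For comparison, here is a minimal sketch of the output being asked for ('Available in French and English.'). It deliberately uses Python 3's standard-library html.parser instead of BeautifulSoup, so it shows a swapped-in technique and not BeautifulSoup's own behaviour. The names TextCollector and html_to_text are hypothetical, and the sample string assumes the " and " separator from the HTML snippet above. The idea is to keep every text node unstripped and collapse runs of whitespace only at the very end.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep each text node exactly as it appears, whitespace included.
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content with each whitespace run collapsed to one space."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    raw = "".join(collector.chunks)
    # str.split() with no argument splits on any run of whitespace
    # (spaces, \r, \n) and drops leading/trailing whitespace.
    return " ".join(raw.split())


html_content = (
    '<div align="justify"><b>Available in  \r\n'
    '<a href="http://www.example.com.be/book.php?number=1">\r\nFrench</a> and \r\n'
    '<a href="http://www.example.com.be/book.php?number=5">\r\nEnglish</a>.\r\n</div>'
)

print(html_to_text(html_content))  # Available in French and English.
```

The same final step, " ".join(text.split()), could in principle be applied to the output of any text extractor, provided that extractor has not already stripped the individual text nodes.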

© Stack Overflow or respective owner