How to remove all text nodes and preserve only the structure of an HTML page with Nokogiri

Posted by user58948 on Stack Overflow
Published on 2010-12-25T10:34:44Z

I want to remove all text from an HTML page that I load with Nokogiri. For example, if a page has the following:

<body><script>var x = 10;</script><div>Hello</div><div><h1>Hi</h1></div></body>

I want to process it with Nokogiri and get back HTML like the following, with the text stripped:

<body><script>var x = 10;</script><div></div><div><h1></h1></div></body>

(That is, remove the actual h1 text, the text inside divs, the text in p elements, etc., but keep the tags. Also, don't remove the text inside script tags.)

How can I do that?

© Stack Overflow or respective owner