Using regex, how can I remove certain characters from inside the tags in a string of HTML?

Posted by Iain Fraser on Stack Overflow
Published on 2010-05-12T08:01:46Z Indexed on 2010/05/12 8:04 UTC

Suppose I have a string of HTML that contains a number of control characters, and I want to remove them only from inside tags, leaving the ones outside the tags alone.

For example, here the numeral "1" stands in for the control character.

Input

The quick 1<strong>orange</strong> lemming <sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog

Desired Output

The quick 1<strong>orange</strong> lemming <span class='jumper'>jumps over</span> 1the idle 1frog

So far I can match the tags that contain the control character, but I can't remove the characters themselves in a single regex. I guess I could run a second regex over each match, but I'd really like to know if there's a better way.
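To illustrate the two-step idea (match each tag, then clean up inside the match), here is a minimal Python sketch that uses a replacement callback rather than a second regex. The function name `strip_in_tags_callback` and the simple tag pattern `<[^>]*>` are assumptions made for this example; the pattern assumes no `>` ever appears inside an attribute value.

```python
import re

# Simplistic tag matcher: assumes '>' never occurs inside an attribute value.
TAG = re.compile(r"<[^>]*>")

def strip_in_tags_callback(html: str, control: str = "1") -> str:
    """Remove `control` from inside tags only, leaving text outside tags untouched."""
    # For each matched tag, return the tag text with the control character removed.
    return TAG.sub(lambda m: m.group(0).replace(control, ""), html)

sample = ("The quick 1<strong>orange</strong> lemming "
          "<sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog")
print(strip_in_tags_callback(sample))
```

On the sample input this should give the desired output shown above, because only the text between `<` and `>` is ever passed to `replace`.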

My regex

Bear in mind this one only matches tags which contain the control character.

<(([^>])*?`([^>])*?)*?>
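One possible single-pattern alternative is a lookahead: remove the control character only when the next angle bracket after it is a closing `>`. The Python sketch below is hedged accordingly. The name `strip_in_tags_lookahead` is made up for this example, "1" is the stand-in control character from the example input, and the approach assumes every `<` and `>` in the string belongs to a tag (a stray `>` in the text would cause false matches).

```python
import re

def strip_in_tags_lookahead(html: str, control: str = "1") -> str:
    """Remove `control` wherever the next angle bracket after it is '>'."""
    # (?=[^<>]*>) succeeds only if a '>' comes before any other '<' or '>',
    # which, under the assumption above, means we are inside a tag.
    pattern = re.compile(re.escape(control) + r"(?=[^<>]*>)")
    return pattern.sub("", html)

sample = ("The quick 1<strong>orange</strong> lemming "
          "<sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog")
print(strip_in_tags_lookahead(sample))
```

The "1" immediately before `<strong>` is kept because the next angle bracket after it is `<`, and the trailing "1the" and "1frog" are kept because no `>` follows them at all.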

Thanks very much for your time and consideration.

Iain Fraser

© Stack Overflow or respective owner