RegEx to extract all HTML tag attributes including inline JavaScript

Posted by Mike on Stack Overflow
Published on 2010-03-08T10:04:42Z


I found this useful regex while looking for a way to parse HTML tag attributes:

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

It works great, but it's missing one key thing that I need. Some attributes are event handlers that contain inline JavaScript, like this:

onclick="doSomething(this, 'foo', 'bar');return false;"

Or:

onclick='doSomething(this, "foo", "bar");return false;'

I can't figure out how to make the original expression ignore the quotes in the JS (single or double) when they are nested inside the quotes that delimit the attribute's value.
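
One possible direction, sketched here in Python rather than taken from the original expression: match each quoting style with its own alternative, so that a double-quoted value may contain single quotes and vice versa. The names ATTR_RE and extract_attributes are made up for this illustration, and the sketch assumes the input is a single tag whose values do not contain the same quote character that delimits them. It is not a substitute for a real HTML parser.

```python
import re

# Hypothetical pattern: an attribute name, "=", then exactly one of three
# value forms. Each quoted form only stops at its own delimiter, so the
# other kind of quote can appear freely inside the value.
ATTR_RE = re.compile(
    r"""([^\s=<>"'/]+)     # group 1: attribute name
        \s*=\s*
        (?:"([^"]*)"       # group 2: double-quoted value
          |'([^']*)'       # group 3: single-quoted value
          |([^\s"'>]+))    # group 4: unquoted value
    """,
    re.VERBOSE,
)


def extract_attributes(tag):
    """Return a list of (name, value) pairs found in one tag string."""
    pairs = []
    for match in ATTR_RE.finditer(tag):
        name = match.group(1)
        # Exactly one of groups 2-4 took part in the match; pick that one.
        value = next(g for g in match.groups()[1:] if g is not None)
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    tag = """<a href="#" onclick="doSomething(this, 'foo', 'bar');return false;" class=btn>"""
    for name, value in extract_attributes(tag):
        print(name, "=>", value)
```

Because each match consumes the whole quoted value, anything inside the JavaScript that looks like `name=value` is not picked up as a separate attribute.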

© Stack Overflow or respective owner