Splitting a string according to a delimiter when elements in the string can contain the delimiter
- by Vivin Paliath
I have a string that looks like this:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
I'd like to get back
[#Text(), #SomeMoreText(), #TextThatContainsDelimiter(#blah), #SomethingElse()]
One way I thought about doing this was to require that any # inside an element be escaped as \#, which makes the input string:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"
I can then split it using /[^\\]#/, which gives me:
[#Text(), SomeMoreText(), TextThatContainsDelimiter(\#blah), SomethingElse()]
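To make the behaviour of the escape-based split concrete, here is a minimal sketch in Python. The language is an assumption (the pattern above is written as a bare regex literal), and split_escaped is a hypothetical helper name. It shows why only the first element keeps its #: the pattern consumes one non-backslash character plus the #, and the leading # has no character before it to match.

```python
import re

def split_escaped(text):
    # Hypothetical helper: split on '#' preceded by any character
    # other than a backslash. The preceding character (here, the
    # space) is consumed as part of the match.
    return re.split(r'[^\\]#', text)

s = r"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"
parts = split_escaped(s)
print(parts)
```

Note that this pattern would also swallow the last character of an element if the elements were not separated by a space.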
The first element still contains its leading #, but I can strip it out. However, is there a cleaner way to do this that doesn't require escaping the # and that ensures the first element does not contain a #? Basically, I'd like to split on # only if the # is not enclosed in parentheses.
My hunch is that since the meaning of the # depends on its context (inside or outside parentheses) and regular expressions are only suited to regular languages, this may not be the right tool. If so, would I have to write a grammar for this and roll my own parser/lexer?
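For illustration, the rule "split on # only when it is not inside parentheses" may not need a full grammar. A single pass that tracks parenthesis depth could be enough. The sketch below is in Python (an assumed language) and the name split_outside_parens is hypothetical. It assumes parentheses are balanced, and it keeps the leading # on each element to match the desired output shown earlier.

```python
def split_outside_parens(text, delimiter="#"):
    """Split text at each delimiter that is not enclosed in parentheses.

    Assumes balanced parentheses; each element keeps its leading delimiter.
    """
    parts = []
    current = []
    depth = 0  # current parenthesis nesting level

    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1

        if ch == delimiter and depth == 0:
            # A top-level delimiter starts a new element.
            piece = "".join(current).strip()
            if piece:
                parts.append(piece)
            current = [ch]
        else:
            current.append(ch)

    # Flush whatever remains after the last delimiter.
    piece = "".join(current).strip()
    if piece:
        parts.append(piece)
    return parts

s = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
print(split_outside_parens(s))
```

With this approach the input needs no escaping, because a # at depth greater than zero is treated as ordinary text. Any non-blank text before the first top-level # would be returned as its own element.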