Can the CSV format be defined by a regex?

Posted by Spencer Rathbun on Programmers
Published on 2012-09-27T16:37:46Z Indexed on 2012/09/27 21:50 UTC

A colleague and I recently argued over whether a pure regex can fully capture the CSV format, that is, whether a single regex can parse every valid file for any given escape character, quote character, and separator character.

The regex does not need to support changing these characters after it has been built, but it must not fail on any edge case.

I argued that this is impossible for a regex used purely as a tokenizer. The only regex that might manage it is a very complex PCRE-style pattern that goes beyond tokenizing.

I am looking for something along the lines of:

... the CSV format is a context-free grammar and, as such, it is impossible to parse with regex alone ...

Or am I wrong? Is it possible to parse CSV with just a POSIX regex?

For example, if both the escape character and the quote character are ", then these two lines are valid CSV:

"""this is a test.""",""
"and he said,""What will be, will be."", to which I replied, ""Surely not!""","moving on to the next field here..."

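To make the example concrete, here is a minimal sketch of what a candidate pattern could look like for this one dialect only (separator `,`, quote and escape both `"`). It is written with Python's `re` module, which is not a POSIX engine, but the pattern deliberately uses nothing beyond grouping, alternation, bracket expressions, and `*`. The names `FIELD_SRC`, `RECORD`, `unquote`, and `split_record` are made up for illustration, and handling the two lines above does not by itself settle the general question.

```python
import re

# Assumed dialect: separator ",", quote char '"', escape char '"' (doubled quote).
# Only grouping, alternation, bracket expressions and "*" are used.
FIELD_SRC = r'"([^"]|"")*"|[^,"\n]*'
FIELD = re.compile(FIELD_SRC)
RECORD = re.compile(rf'({FIELD_SRC})(,({FIELD_SRC}))*')


def unquote(raw):
    """Strip the outer quotes of a quoted field and collapse doubled quotes."""
    if raw.startswith('"'):
        return raw[1:-1].replace('""', '"')
    return raw


def split_record(line):
    """Split one CSV record into fields; raise ValueError if it is malformed."""
    if RECORD.fullmatch(line) is None:
        raise ValueError(f"not a valid record: {line!r}")
    fields, pos = [], 0
    while True:
        # Always matches: the unquoted branch may match the empty string.
        m = FIELD.match(line, pos)
        fields.append(unquote(m.group()))
        pos = m.end()
        if pos == len(line):
            return fields
        pos += 1  # skip the separator that RECORD guaranteed is here


if __name__ == "__main__":
    print(split_record('"""this is a test.""",""'))
    print(split_record(
        '"and he said,""What will be, will be."", to which I replied, '
        '""Surely not!""","moving on to the next field here..."'
    ))
```

Note that the regex only recognizes records and field boundaries; collapsing the doubled quotes back into single ones happens in ordinary code afterwards, which is part of what the tokenizer-versus-parser argument is about.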
© Programmers or respective owner