Find Lines with N occurrences of a char

Posted by Martín Marconcini on Stack Overflow
Published on 2010-06-18T10:50:10Z Indexed on 2010/06/18 10:53 UTC

I have a txt file that I’m trying to import as a flat file into SQL Server 2008. It looks like this:

"123456","some text"
"543210","some more text"
"111223","other text"
etc…

The file has more than 300,000 rows and the text is long (usually 200-500 chars), so scanning the file by hand is very time-consuming and error-prone. Other similar (and even more complex) files were imported successfully.

The problem with this one is that “some lines” contain quotes inside the text. (The file came from an export of an old SuperBase DB that didn’t let you specify a text qualifier; there’s nothing I can do with the file other than clean it up and try to import it.)

So the “offending” lines look like this:

"123456","this text "contains" a quote"
"543210","And the "above" text is bad"
etc…

You can see the problem here.

Now, 300,000 lines is not too many: if I could search with a text editor that supports regex, I’d manually remove the extra quotes from each offending line. The problem is not the number of offending lines but the impossibility of finding them with a simple search. I’m sure there are fewer than 500, but spread those across a 300,000-line txt file and you know what I mean.

Based upon that, what would be the best regex I could use to identify these lines?

My first thought is: tell me which lines contain more than 4 quotes (").

But I couldn’t come up with anything (I’m not good at Regex beyond the basics).
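
For illustration, here is a minimal sketch of the "more than 4 quotes" idea in Python. It assumes the file uses plain ASCII double quotes; the pattern name and the helper find_offending_lines are made up for this example. The same pattern, `^(?:[^"]*"){5,}`, should also work in most regex-capable text editors, though that depends on the editor's regex flavor.

```python
import re

# A line matches when it contains at least five double quotes,
# i.e. more than the 4 that a well-formed two-field row has.
MORE_THAN_FOUR_QUOTES = re.compile(r'^(?:[^"\n]*"){5,}')


def find_offending_lines(lines, pattern=MORE_THAN_FOUR_QUOTES):
    """Return (line_number, text) pairs for lines with too many quotes."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if pattern.match(text)
    ]


# Sample rows taken from the question; a real run would iterate over
# the open file object instead of this list.
sample = [
    '"123456","some text"\n',
    '"123456","this text "contains" a quote"\n',
    '"543210","And the "above" text is bad"\n',
]

for number, text in find_offending_lines(sample):
    print(number, text)
```

Reporting line numbers rather than rewriting the file keeps the manual review step: each flagged line can still be fixed by hand in an editor.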
