Regular Expression Newbie

Posted by Registered User on Stack Overflow See other posts from Stack Overflow or by Registered User
Published on 2010-03-27T20:07:31Z Indexed on 2010/03/27 20:13 UTC
Read the original article Hit count: 284

Filed under:
|

I need to parse strings inputs where the columns are separated by columns and any field that contains a comma in the data is wrapped in quotes (commas separated, quoted text identifiers). For this project I need to remove the quotes and any commas that occur between pairs of quotes. Basically, I need to remove commas and quotes that are contained in fields while preserving the commas that are used to separate the fields. Here's a little code I put together that handles the simple scenario:

    // Sample input 1: This works and covers 99% of the records that I need to parse.
    string str1 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"This Address Works, Suite 200\",Some City,TN,09876-5432,9795551212x123,XYZ";
    str1 = Regex.Replace(str1, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2");
    Console.WriteLine(str1);
    // Outputs: [email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,This Address Works Suite 200,Some City,TN,09876-5432,9795551212x123,XYZ

Although this code works for most of my records, it doesn't work when a field contains more than one commas. What I would like to do is modify the code so that it remove each instance of a comma contained within the column no matter how many commas there are in the field. I don't want to hard code only handling 2 commas, or 3 commas, or 25 commas. The code should just remove all the commas in the field. Below is an example of what my code doesn't handle properly.

    // Sample input 2: This doesn't work since there is more than 1 comma between the quotes.
    string str2 = "[email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,\"i,l,k,e, c,o,m,m,a,s, i,n ,m,y, f,i,e,l,d\",Some City,TN,09876-5432,9795551212x123,XYZ";
    str2 = Regex.Replace(str2, "\"([^\"^,]*),([^\"^,]*)\"", "$1$2");
    Console.WriteLine(str2);
    // Desired output: [email protected],2010/03/27 12:2:02,,some_first_name,some_last_name,,i like commas in my field,Some City,TN,09876-5432,9795551212x123,XYZ

Any help would be appreciated for this Regular Expression newbie.

© Stack Overflow or respective owner

Related posts about c#

Related posts about regex