Parsing large delimited files with dynamic number of columns

Posted by annelie on Stack Overflow
Published on 2010-05-06T16:07:01Z Indexed on 2010/05/08 11:18 UTC

Hi,

What would be the best approach to parse a delimited file when the columns are unknown before parsing the file?

The file format is Rightmove v3 (.blm); the structure looks like this:

#HEADER#
Version : 3
EOF : '^'
EOR : '~'
#DEFINITION#
AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~ // can be any number of columns
#DATA#
agent1^the address^the postcode^an image~
agent2^the address^the postcode^^~      // the records have to have the same number of columns as specified in the definition, however they can be empty
etc
#END#
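
Since the #DEFINITION# line names the columns, one way to cope with an unknown layout is to build a name-to-position lookup from it once and reuse it for every record. A minimal sketch, assuming the column names are unique; `BlmDefinition` and `BuildColumnIndex` are hypothetical names, not part of any Rightmove library:

```csharp
using System;
using System.Collections.Generic;

public static class BlmDefinition
{
    // Hypothetical helper: maps each column name from the #DEFINITION# line
    // to its zero-based position. fieldEnd and recordEnd are the EOF and EOR
    // characters declared in #HEADER#.
    public static Dictionary<string, int> BuildColumnIndex(
        string definitionLine, char fieldEnd, char recordEnd)
    {
        // Keep only the text before the end-of-record marker.
        int end = definitionLine.IndexOf(recordEnd);
        if (end >= 0)
        {
            definitionLine = definitionLine.Substring(0, end);
        }

        // Assumption: column names are matched case-insensitively.
        var index = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string[] names = definitionLine.Split(fieldEnd);
        for (int i = 0; i < names.Length; i++)
        {
            string name = names[i].Trim();
            if (name.Length > 0) // skip the empty piece left by a trailing delimiter
            {
                index[name] = i;
            }
        }
        return index;
    }
}
```

With the definition line from the sample, `BuildColumnIndex(line, '^', '~')["POSTCODE1"]` is 2, so a record's post code can be read as `fields[columns["POSTCODE1"]]` wherever that column happens to sit.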

The files can be very large: the example file I have is 40 MB, but they could be several hundred megabytes. Below is the code I had started on before I realised the columns were dynamic. I'm opening a FileStream because I read that this is the best way to handle large files. I'm not sure my idea of putting every record in a list and then processing them is any good, though; I don't know whether it will work with such large files.

List<string> recordList = new List<string>();
List<Property> propertyList = new List<Property>(); // used in the second loop below

try
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        StreamReader file = new StreamReader(fs);
        string line;
        while ((line = file.ReadLine()) != null)
        {
            string[] records = line.Split('~');

            foreach (string item in records)
            {
                if (item != String.Empty)
                {
                    recordList.Add(item);
                }
            }

        }
    }
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}

foreach (string r in recordList)
{
    Property property = new Property();

    string[] fields = r.Split('^');

    // can't do this as I don't know which field is the post code
    property.PostCode = fields[2];
    // etc

    propertyList.Add(property);
}
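
One possible direction, as a rough sketch rather than a finished parser: read the file line by line and yield each record as soon as it is parsed, keyed by the column names from #DEFINITION#, so that only the current record is held in memory. It assumes one record per line (as in the sample above), that the section markers sit on their own lines, and that a single empty field after the last column is just a trailing delimiter; `BlmReader` and `ReadRecords` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlmReader
{
    // Lazily yields one record at a time as a column-name -> value map.
    public static IEnumerable<Dictionary<string, string>> ReadRecords(TextReader reader)
    {
        char fieldEnd = '^', recordEnd = '~'; // defaults, overridden by #HEADER#
        string section = null;
        string[] columns = null;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();

            // Section markers such as #HEADER#, #DEFINITION#, #DATA#, #END#.
            if (trimmed.Length > 2 && trimmed[0] == '#' && trimmed[trimmed.Length - 1] == '#')
            {
                if (trimmed == "#END#") yield break;
                section = trimmed;
                continue;
            }

            if (section == "#HEADER#")
            {
                // Header lines look like:  EOF : '^'
                int colon = trimmed.IndexOf(':');
                if (colon < 0) continue;
                string key = trimmed.Substring(0, colon).Trim();
                string value = trimmed.Substring(colon + 1).Trim().Trim('\'');
                if (value.Length == 0) continue;
                if (key == "EOF") fieldEnd = value[0];
                else if (key == "EOR") recordEnd = value[0];
                continue;
            }

            // Assumption: one record per line; text after the EOR marker is ignored.
            int end = line.IndexOf(recordEnd);
            if (end < 0) continue;
            string[] fields = line.Substring(0, end).Split(fieldEnd);

            if (section == "#DEFINITION#")
            {
                // Column names cannot be empty, so drop a trailing empty piece.
                if (fields.Length > 1 && fields[fields.Length - 1].Length == 0)
                    Array.Resize(ref fields, fields.Length - 1);
                columns = fields;
            }
            else if (section == "#DATA#" && columns != null)
            {
                // One extra empty field is treated as a trailing delimiter.
                if (fields.Length == columns.Length + 1 && fields[fields.Length - 1].Length == 0)
                    Array.Resize(ref fields, columns.Length);
                if (fields.Length != columns.Length)
                    throw new InvalidDataException("Expected " + columns.Length +
                        " fields but found " + fields.Length + ".");

                var record = new Dictionary<string, string>(columns.Length);
                for (int i = 0; i < columns.Length; i++)
                    record[columns[i]] = fields[i];
                yield return record;
            }
        }
    }
}
```

Used with `foreach (var record in BlmReader.ReadRecords(new StreamReader(path)))`, the post code becomes `record["POSTCODE1"]` instead of `fields[2]`, and nothing has to be collected in a list first. Allocating a dictionary per record is a convenience choice; yielding the raw `string[]` alongside a shared name-to-index map would be lighter.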

Any ideas on how to do this better? It's C# 3.0 and .NET 3.5, if that helps.

Thanks,

Annelie

© Stack Overflow or respective owner
