Help parsing a long (3.5 million lines) text file line by line and storing data; need a strategy

Posted by Jarrod on Programmers
Published on 2012-12-08T20:35:44Z Indexed on 2012/12/08 23:35 UTC

Filed under: design | php

This is a question about a particular problem I am struggling with. I am parsing a long list of text data, line by line, for a business app in PHP (a cron script run on the CLI). The file follows this format:

    HD: Some text here {text here too}

    DC: A description here
    DC: the description continues here
    DC: and it ends here.

    DT: 2012-08-01

    HD: Next header here {supplemental text}

    ... this repeats over and over for a few hundred megs

I have to read each line, find the HD: lines, and grab the text on each one. I then compare this text against data stored in a database. When a match is found, I want to record the DC: lines that follow the matched HD: line.
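To make the header-parsing step concrete, here is a minimal sketch of pulling the text out of an HD: line. The function name `parseHeaderLine` is hypothetical, and the sketch assumes the braces hold supplemental text and appear at most once, at the end of the line; the real file may need a different pattern.

```php
<?php
// Hypothetical helper: splits an "HD:" line into its title and the optional
// supplemental text in braces. Returns null for any other kind of line.
function parseHeaderLine(string $line): ?array
{
    // Assumed layout: "HD: <title> {<supplemental>}", with the braces optional.
    if (!preg_match('/^\s*HD:\s*(.*?)\s*(?:\{(.*)\})?\s*$/', $line, $m)) {
        return null;
    }

    return [
        'title'        => $m[1],
        // The second group is absent from $m when the line has no braces.
        'supplemental' => $m[2] ?? null,
    ];
}
```

Calling `parseHeaderLine('HD: Some text here {text here too}')` would give a title of `Some text here` and supplemental text of `text here too`, while a DC: or DT: line yields `null`.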

Pseudo code:

    while ( the_file_pointer_isnt_end_of_file) {
        line = getCurrentLineFromFile
        title = parseTitleFrom(line)
        matched = searchForMatchInDB(title)
        if ( matched ) {
            recordTheDCLines  // <- Best way to do this?
        }
    }

My problem is this: because I am reading line by line, what is the best way to trigger the script to start saving DC: lines and then, once they are finished, save them to the database?

I have a vague idea, but have yet to properly implement it. I would love to hear the community's ideas and suggestions!
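For reference, one possible shape for this is a small state machine: remember whether the most recent HD: line matched, buffer DC: lines while it did, and flush the buffer when the next HD: line or the end of the file arrives. The sketch below is only an illustration under assumptions. The names `collectDescriptions` and `readLines` and the callbacks `$isMatch` and `$save` are hypothetical stand-ins for the database lookup and write; the whole text after `HD:` is used as the title; and DT: lines are ignored.

```php
<?php
// Sketch of a line-by-line state machine. $isMatch and $save are hypothetical
// callbacks standing in for the database lookup and the database write.
function collectDescriptions(iterable $lines, callable $isMatch, callable $save): void
{
    $title = null;  // text of the matched HD: line being collected, if any
    $buffer = [];   // DC: lines gathered for that block

    // Writes out the buffered block (if one is open) and resets the state.
    $flush = function () use (&$title, &$buffer, $save): void {
        if ($title !== null) {
            $save($title, implode("\n", $buffer));
        }
        $title = null;
        $buffer = [];
    };

    foreach ($lines as $line) {
        $line = trim($line);
        if (strncmp($line, 'HD:', 3) === 0) {
            $flush(); // a new header ends the previous block
            $candidate = trim(substr($line, 3));
            $title = $isMatch($candidate) ? $candidate : null;
        } elseif ($title !== null && strncmp($line, 'DC:', 3) === 0) {
            $buffer[] = trim(substr($line, 3)); // kept only inside a matched block
        }
        // DT: lines and blank lines are ignored in this sketch.
    }

    $flush(); // the last block has no following HD: line to trigger a flush
}

// Hypothetical generator so the large file is never loaded into memory whole.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    } finally {
        fclose($handle);
    }
}
```

It could be driven with something like `collectDescriptions(readLines($path), $isMatch, $save)`, where `$path` points at the data file. Because only one block is buffered at a time, memory use should stay flat regardless of file size. With 3.5 million lines, it may also be worth loading the database titles into an in-memory lookup once instead of issuing a query per HD: line.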

Thank you.

© Programmers or respective owner