Help parsing a long (3.5 million line) text file line by line and storing the data: need a strategy
Posted by Jarrod on Programmers
Published on 2012-12-08T20:35:44Z
This is a question about a particular problem I am struggling with. I am parsing a long list of text data, line by line, for a business app written in PHP (a cron script run on the CLI). The file follows this format:
    HD: Some text here {text here too}
    DC: A description here
    DC: the description continues here
    DC: and it ends here.
    DT: 2012-08-01
    HD: Next header here {supplemental text}
    ... this repeats over and over for a few hundred megs
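For reference, here is a minimal sketch (in PHP, to match the script) of splitting one line into its tag and its text. It assumes every tagged line starts with two uppercase letters and a colon, and that an HD: line has at most one trailing {...} group. The helper names `parseLine` and `splitHeader` are hypothetical.

```php
<?php
// Hypothetical helper: split one line into its tag (HD, DC, DT) and the text
// after the colon. Assumes tags are two uppercase letters followed by ":".
function parseLine(string $line): ?array
{
    if (!preg_match('/^([A-Z]{2}):\s*(.*)$/', rtrim($line, "\r\n"), $m)) {
        return null; // not a tagged line; the caller decides whether to skip it
    }
    return ['tag' => $m[1], 'text' => $m[2]];
}

// Hypothetical helper: split an HD: value into the title and the optional
// {supplemental text}. Assumes at most one {...} group, at the end of the line.
function splitHeader(string $text): array
{
    if (preg_match('/^(.*?)\s*\{(.*)\}\s*$/', $text, $m)) {
        return ['title' => $m[1], 'extra' => $m[2]];
    }
    return ['title' => trim($text), 'extra' => null];
}

$parsed = parseLine("HD: Some text here {text here too}\n");
$header = splitHeader($parsed['text']);
echo $header['title'], ' | ', $header['extra'], PHP_EOL;
// prints "Some text here | text here too"
```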
I have to read each line, find the HD: lines, and extract the text on each one. I then compare this text against data stored in a database. When a match is found, I want to record the DC: lines that follow the matched HD: line.
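If the titles to match against fit in memory, one alternative to querying the database once per HD: line (a swapped-in technique, not something the post describes) is to load them once and keep them as keys of a PHP array, so each check is an `isset()` hash lookup. This sketch assumes exact, case-sensitive matching; `buildTitleSet` and `isKnownTitle` are hypothetical names, and the literal array stands in for the real query result.

```php
<?php
// Build a lookup set from the titles stored in the database. In the real
// script the list would come from a single query, for example
// $pdo->query('SELECT title FROM items')->fetchAll(PDO::FETCH_COLUMN);
// (the table and column names there are hypothetical).
function buildTitleSet(array $titles): array
{
    // array_fill_keys turns ['a', 'b'] into ['a' => true, 'b' => true],
    // so each lookup is a hash probe instead of a database round trip.
    return array_fill_keys($titles, true);
}

function isKnownTitle(array $titleSet, string $title): bool
{
    return isset($titleSet[$title]);
}

$titleSet = buildTitleSet(['Some text here', 'Next header here']);
var_dump(isKnownTitle($titleSet, 'Some text here')); // bool(true)
var_dump(isKnownTitle($titleSet, 'Something else')); // bool(false)
```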
Pseudo code:
    while ( the_file_pointer_isnt_end_of_file ) {
        line = getCurrentLineFromFile
        if ( line starts with "HD:" ) {
            title = parseTitleFrom(line)
            matched = searchForMatchInDB(title)
            if ( matched ) {
                recordTheDCLines  // <- Best way to do this?
            }
        }
    }
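One common way to fill in `recordTheDCLines` is a small state machine: keep the record currently being collected in a variable, append DC: lines to it only while it is open, and treat the next HD: line (or the end of the file) as the signal that the record is finished. This sketch assumes the file format shown above; `matchedRecords` and the `$isMatch` callback are hypothetical, with `$isMatch` standing in for `searchForMatchInDB`. Because it is a generator, only one record is held in memory at a time.

```php
<?php
// State machine: $current holds the record being collected, or null while an
// unmatched block is being skipped. Each HD: line closes the previous record
// and decides whether to open a new one.
function matchedRecords($handle, callable $isMatch): Generator
{
    $current = null;
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if (strncmp($line, 'HD:', 3) === 0) {
            if ($current !== null) {
                yield $current; // previous matched block is complete
            }
            // Title = text after "HD:" with any trailing "{...}" group removed.
            $title = trim(preg_replace('/\{.*\}\s*$/', '', substr($line, 3)));
            $current = $isMatch($title)
                ? ['title' => $title, 'description' => [], 'date' => null]
                : null;
        } elseif ($current !== null && strncmp($line, 'DC:', 3) === 0) {
            $current['description'][] = trim(substr($line, 3));
        } elseif ($current !== null && strncmp($line, 'DT:', 3) === 0) {
            $current['date'] = trim(substr($line, 3));
        }
    }
    if ($current !== null) {
        yield $current; // the last block ends at EOF, not at another HD:
    }
}

// Demo with an in-memory stream; the cron script would use fopen() on the file.
$sample = "HD: Some text here {text here too}\nDC: A description here\n"
    . "DC: the description continues here\nDC: and it ends here.\n"
    . "DT: 2012-08-01\nHD: Next header here {supplemental text}\nDC: skipped\n";
$fh = fopen('php://memory', 'r+');
fwrite($fh, $sample);
rewind($fh);
$records = iterator_to_array(
    matchedRecords($fh, fn (string $t) => $t === 'Some text here'),
    false
);
fclose($fh);
echo implode(' ', $records[0]['description']), PHP_EOL;
// prints "A description here the description continues here and it ends here."
```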
My problem: since I am reading line by line, what is the best way to make the script start saving DC lines after a match, and then, once they are finished, save them to the database?
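For the saving step, one option (an assumption about the app, not something the post specifies) is to hand finished records to the database in batches, so a few hundred inserts share one transaction instead of each paying for its own. `saveInBatches` and the `$saveBatch` callback are hypothetical; with PDO, the callback would typically call `beginTransaction()`, execute a prepared INSERT for each record, and then `commit()`.

```php
<?php
// Hand completed records to $saveBatch in groups of $batchSize, so the
// database sees a few large writes instead of one write per record.
// Returns the total number of records passed to $saveBatch.
function saveInBatches(iterable $records, callable $saveBatch, int $batchSize = 500): int
{
    $batch = [];
    $saved = 0;
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) >= $batchSize) {
            $saveBatch($batch);
            $saved += count($batch);
            $batch = []; // release memory before reading further
        }
    }
    if ($batch !== []) {
        $saveBatch($batch); // flush the final, partially filled batch
        $saved += count($batch);
    }
    return $saved;
}

// Demo: record batch sizes instead of writing to a real database.
$sizes = [];
$total = saveInBatches(
    ['r1', 'r2', 'r3', 'r4', 'r5'],
    function (array $batch) use (&$sizes) { $sizes[] = count($batch); },
    2
);
echo $total, ' records in batches of ', implode(',', $sizes), PHP_EOL;
// prints "5 records in batches of 2,2,1"
```

Fed from a streaming reader such as a generator, this keeps memory bounded by the batch size; the default of 500 is an arbitrary starting point to tune.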
I have a vague idea, but I have yet to implement it properly. I would love to hear the community's ideas and suggestions!
Thank you.
© Programmers or respective owner