An algorithm for filtering out raw txt files

Posted by Roman Luštrik on Stack Overflow
Published on 2011-01-07T18:50:04Z

Imagine you have a .txt file of the following structure:

>>> header
>>> header
>>> header
K L M
200 0.1 1
201 0.8 1
202 0.01 3
...
800 0.4 2
>>> end of file
50 0.1 1
75 0.78 5
...

I would like to read all the data except the lines that start with >>> and everything below the >>> end of file line. So far I've solved this using read.table(comment.char = ">", skip = x, nrows = y), where x and y are currently hard-coded. This reads the data between the header and the >>> end of file line.

However, I would like to make my function more flexible regarding the number of rows. The data may contain values larger than 800, and consequently more rows.

I could use scan or readLines on the file, find which row holds the >>> end of file marker, and from that calculate the number of lines to read. What approach would you use?
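For illustration, the readLines idea might look something like the minimal sketch below. The function name read_until_marker and its marker argument are hypothetical, and the sketch assumes the column names (K L M) sit on the first line that does not start with >>>, and that the literal "..." lines in the example above stand for omitted rows rather than real file content.

```r
# Hypothetical helper: read the block between the ">>>" header lines and
# the ">>> end of file" marker, however many rows it contains.
read_until_marker <- function(path, marker = ">>> end of file") {
  lines <- readLines(path)

  # Index of the first line that starts with the end marker (NA if absent).
  end_idx <- which(startsWith(lines, marker))[1]
  if (is.na(end_idx)) {
    stop("end marker not found in ", path)
  }

  # Keep only the lines above the marker, then drop ">>>" lines and blanks.
  body <- lines[seq_len(end_idx - 1)]
  body <- body[!startsWith(body, ">>>") & nzchar(trimws(body))]

  # Parse the remaining lines; the first one is assumed to be the header row.
  read.table(text = body, header = TRUE)
}
```

Calling read_until_marker("data.txt") (the file name is a placeholder) would then return a data frame with columns K, L and M, without fixing skip or nrows in advance. Because the whole file is read into memory first, this is probably best suited to files of moderate size.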

© Stack Overflow or respective owner
