Parsing multiple files at a time in Perl

Posted by sfactor on Stack Overflow
Published on 2010-12-31T14:23:06Z

I have a large data set (around 90GB) to work with. There is a tab-delimited data file for each hour of each day, and I need to perform operations on the entire data set, for example getting the share of each OS, which is given in one of the columns. I tried merging all the files into one huge file and running a simple count on it, but it was simply too large for the server's memory.

So I guess I need to perform the operation one file at a time and then add up the results at the end. I am new to Perl and especially naive about performance issues. How do I do such operations in a case like this?

As an example, two columns of the file are:

ID      OS
1       Windows
2       Linux
3       Windows
4       Windows

Let's do something simple: counting the share of each OS in the data set. Each .txt file has millions of these lines, and there are many such files. What would be the most efficient way to operate on all of the files?
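
To make the "one file at a time, then add up" idea concrete, here is a minimal sketch of what I have in mind; it is not a tested or tuned solution. It assumes every file starts with a header row that contains a column named OS (as in the example above) and that the file paths are passed on the command line. The sub names count_os and print_shares are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally the OS column across many tab-delimited files, reading one line
# at a time so that no file is ever held in memory as a whole.
# ASSUMPTION: each file has a header row with a column named "OS".
sub count_os {
    my ($files, $counts) = @_;
    for my $file (@$files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $header = <$fh>;
        next unless defined $header;              # skip empty files
        $header =~ s/\r?\n\z//;
        my @names = split /\t/, $header;
        # Index of the first header cell that is exactly "OS"
        my ($col) = grep { $names[$_] eq 'OS' } 0 .. $#names;
        die "No OS column in $file" unless defined $col;
        while (my $line = <$fh>) {
            $line =~ s/\r?\n\z//;
            my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
            next unless defined $fields[$col] && length $fields[$col];
            $counts->{ $fields[$col] }++;
        }
        close $fh;
    }
    return $counts;
}

# Print each OS with its count and its percentage of all counted rows.
sub print_shares {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return unless $total;
    for my $os (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
        printf "%s\t%d\t%.2f%%\n", $os, $counts->{$os}, 100 * $counts->{$os} / $total;
    }
}

# Run only when executed as a script, e.g.: perl count_os.pl data/*.txt
unless (caller) {
    my %counts;
    count_os(\@ARGV, \%counts);
    print_shares(\%counts);
}
```

The script name count_os.pl and the data/*.txt path are placeholders. The hash holds one entry per distinct OS value, so memory use depends on the number of different OSes rather than on the 90GB of input, and the files never need to be merged.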

© Stack Overflow or respective owner
