Improving the performance of grepping over a huge file

Posted by rogerio_marcio on Programmers
I have FILE_A, which has over 300K lines, and FILE_B, which has over 30M lines. I created a bash script that greps for each line of FILE_A in FILE_B and writes the result of each grep to a new file.

The whole process takes more than 5 hours.

I'm looking for suggestions on how to improve the performance of my script.

I'm using grep -F -m 1 as the grep command. FILE_A looks like this:

123456789 
123455321

and FILE_B looks like this:

123456789,123456789,730025400149993,
123455321,123455321,730025400126097,

So, with bash, I have a while loop that picks the next line from FILE_A and greps for it in FILE_B. When the pattern is found in FILE_B, I write the matching line to result.txt.

# for each key from the 300K-line file, print the first matching line from the 30M-line file
while read -r line; do
   grep -F -m1 "$line" 30MFile
done < 300KFile > result.txt
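
For comparison, I've been wondering whether a single pass that feeds all of FILE_A to one grep invocation as fixed-string patterns would be faster, since the 30M-line file would only be scanned once. A minimal sketch (this drops the per-pattern -m 1 behaviour, and I haven't benchmarked it):

# read all 300K keys as fixed-string patterns and scan the 30M-line file once
grep -F -f 300KFile 30MFile > result.txt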

Thanks a lot in advance for your help.
