Good approach for hundreds of consumers and big files

Posted by ????? ??????? on Programmers
Published on 2014-05-30T21:19:39Z Indexed on 2014/05/31 3:51 UTC

I have several files (nearly 1 GB each) of data. Each record is a single line of text.

I need to process each of these files with several hundred consumers. Each consumer does its own processing, different from the others. Consumers do not write to any shared location; they only need the input string, and after processing it they update their own local buffers. Consumers can therefore easily run in parallel.

Important: for any given file, each consumer has to process every line (without skipping any) in the order the lines appear in the file. The order in which different files are processed doesn't matter.

Processing a single line by one consumer is comparatively fast; I expect less than 50 microseconds on a Core i5.

So now I'm looking for a good approach to this problem. This is going to be part of a .NET project, so please let's stick with .NET only (C# is preferable).

I know about TPL and Dataflow. I guess the most relevant block would be BroadcastBlock. But I think the problem there is that for each line I would have to wait for all consumers to finish before posting the next one, which I guess would not be very efficient.
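
For reference, here is a minimal sketch of the BroadcastBlock idea, assuming the System.Threading.Tasks.Dataflow package is referenced. As far as the documented behaviour goes, BroadcastBlock does not wait for consumers: it offers each message to every linked target and moves on, and each ActionBlock keeps the lines in its own input queue. The names `BroadcastSketch` and `RunAsync` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper, not part of any library.
public static class BroadcastSketch
{
    // Sends every line to every consumer; completes when all consumers are done.
    public static async Task RunAsync(IEnumerable<string> lines,
                                      IReadOnlyList<Action<string>> consumers)
    {
        // The cloning function is the identity because strings are immutable.
        var broadcast = new BroadcastBlock<string>(line => line);
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        var workers = new List<ActionBlock<string>>();
        foreach (var consume in consumers)
        {
            // Default options: unbounded input queue and MaxDegreeOfParallelism = 1,
            // so each consumer handles one line at a time, in arrival order.
            var worker = new ActionBlock<string>(consume);
            broadcast.LinkTo(worker, link);
            workers.Add(worker);
        }

        foreach (var line in lines)
            broadcast.Post(line);

        // Completion propagates to the workers once the pending lines are delivered.
        broadcast.Complete();
        await Task.WhenAll(workers.Select(w => w.Completion));
    }
}
```

The catch is memory rather than waiting: with unbounded targets there is no back-pressure, so a slow consumer's queue can grow toward the size of the file. Setting BoundedCapacity on the targets does not fix this, because BroadcastBlock overwrites its current message and a target that is full at that moment simply misses the line.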

I think the ideal situation would be something like this:

  1. One thread reads from file and writes to the buffer.
  2. Each consumer, when it is ready, reads the next line from the buffer concurrently with the others and processes it.
  3. An entry shouldn't be deleted from the buffer when one consumer reads it. It can be deleted only after all consumers have processed it.
  4. TPL schedules consumer threads itself.
  5. If one consumer outperforms the others, it shouldn't wait and can read more recent entries from the buffer.
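
The steps above can be approximated with plain BCL types. This is only a sketch under assumptions, and it swaps the single shared buffer for one bounded BlockingCollection per consumer: all queues hold references to the same string objects, so a line becomes collectable once the last consumer has dequeued it (step 3), and a fast consumer is held back only by the reader, never by a slower consumer (step 5). `FanOutSketch`, `Run`, and `queueCapacity` are illustrative names, and there is no error handling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class FanOutSketch
{
    public static void Run(IEnumerable<string> lines,
                           IReadOnlyList<Action<string>> consumers,
                           int queueCapacity = 1024)
    {
        // One bounded FIFO queue per consumer; the bound caps memory use.
        var queues = consumers
            .Select(_ => new BlockingCollection<string>(queueCapacity))
            .ToArray();

        // Each consumer drains its own queue in order on a dedicated thread.
        var tasks = consumers
            .Select((consume, i) => Task.Factory.StartNew(() =>
            {
                foreach (var line in queues[i].GetConsumingEnumerable())
                    consume(line);
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        try
        {
            // Single reader: Add blocks only while that consumer's queue is full.
            foreach (var line in lines)
                foreach (var queue in queues)
                    queue.Add(line);
        }
        finally
        {
            // Lets every GetConsumingEnumerable loop finish after draining.
            foreach (var queue in queues)
                queue.CompleteAdding();
        }

        Task.WaitAll(tasks);
        foreach (var queue in queues)
            queue.Dispose();
    }
}
```

For a real file the `lines` argument could be `File.ReadLines(path)`, which streams the file lazily instead of loading it whole. Two trade-offs to be aware of: LongRunning gives each consumer its own thread, which departs from step 4 (letting TPL schedule the work) and may be heavy with hundreds of consumers, and the slowest consumer eventually stalls the reader once its queue fills up.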

Am I right with this kind of approach? Either way, how can I implement a good solution?

This was already discussed a bit on Stack Overflow: link

© Programmers or respective owner
