Understanding VS2010 C# parallel profiling results
- by Haggai
I have a program with many independent computations, so I decided to parallelize it using Parallel.For and Parallel.ForEach.
The results were okay for a dual-core machine - CPU utilization of about 80%-90% most of the time.
However, on a dual Xeon machine (i.e. 8 cores) I get only about 30%-40% CPU utilization, although the program spends quite a lot of time (sometimes more than 10 seconds) in the parallel sections, and I see that it uses about 20-30 more threads in those sections than in the serial sections. Each thread takes more than 1 second to complete, so I see no reason why they would not run in parallel - unless there is a synchronization problem.
I used the built-in profiler of VS2010, and the results are strange.
Even though I use locks only in one place, the profiler reports that about 85% of the program's time is spent on synchronization (also 5-7% sleep, 5-7% execution, under 1% IO).
The locked code is only a cache (a dictionary) get/add:
bool esn_found;
lock (lock_load_esn)
    esn_found = cache.TryGetValue(st, out esn);
if(!esn_found)
{
    esn = pData.esa_inv_idx.esa[term_idx];
    esn.populate(pData.esa_inv_idx.datafile);
    lock (lock_load_esn)
    {
        if (!cache.ContainsKey(st))
            cache.Add(st, esn);
    }
}
lock_load_esn is a static member of the class, of type Object.
esn.populate reads from a file using a separate StreamReader for each thread.  
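For reference, here is a minimal self-contained sketch of the same locking pattern as it sits inside a parallel loop. The names Esn, Populate, CacheSketch, GetOrLoad and the integer key are stand-ins made up for this post (the real code reads from pData.esa_inv_idx and a data file, and the slow load is simulated here with a sleep), so it only illustrates the shape of the code, not the actual program.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real cached object.
class Esn
{
    public int Key;
    public bool Populated;

    // Stand-in for esn.populate(datafile): simulated slow work,
    // performed outside the lock just like in the snippet above.
    public void Populate()
    {
        Thread.Sleep(10);
        Populated = true;
    }
}

static class CacheSketch
{
    static readonly object lock_load_esn = new object();
    static readonly Dictionary<int, Esn> cache = new Dictionary<int, Esn>();

    // Same shape as the original: locked lookup, unlocked load, locked add.
    public static Esn GetOrLoad(int key)
    {
        Esn esn;
        bool esn_found;
        lock (lock_load_esn)
            esn_found = cache.TryGetValue(key, out esn);
        if (!esn_found)
        {
            esn = new Esn { Key = key };
            esn.Populate();
            lock (lock_load_esn)
            {
                // Another thread may have added the same key in the meantime.
                if (!cache.ContainsKey(key))
                    cache.Add(key, esn);
            }
        }
        return esn;
    }

    public static int Count
    {
        get { lock (lock_load_esn) return cache.Count; }
    }
}

static class Program
{
    static void Main()
    {
        // 200 iterations over 20 distinct keys, so most lookups hit the cache.
        Parallel.For(0, 200, i =>
        {
            CacheSketch.GetOrLoad(i % 20);
        });
        Console.WriteLine("Cached entries: " + CacheSketch.Count);
    }
}
```

Both locks are held only for the dictionary lookup or add; the slow load runs outside them, which is why I would expect very little contention from this code.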
However, when I press the Synchronization button to see what causes the most delay, the profiler reports function entry lines rather than the locked sections themselves.
It doesn't even list the function that contains the above code (a reminder: the only lock in the program) in the blocking profile when the noise level is 2%. With the noise level at 0% it lists every function in the program, and I don't understand why they all count as blocking synchronization.
So my question is - what is going on here?
How can it be that 85% of the time is spent on synchronization?
How do I find out what really is the problem with the parallel sections of my program?
Thanks.