Search Results

Search found 6 results on 1 pages for 'haggai'.

Page 1/1 | 1 

  • Optimizing sparse dot-product in C#

    - by Haggai
    Hello. I'm trying to calculate the dot-product of two very sparse associative arrays. The arrays contain an ID and a value, so the calculation should be done only on those IDs that are common to both arrays, e.g. <(1, 0.5), (3, 0.7), (12, 1.3) * <(2, 0.4), (3, 2.3), (12, 4.7) = 0.7*2.3 + 1.3*4.7 . My implementation (call it dict) currently uses Dictionaries, but it is too slow to my taste. double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2) { double res = 0; double val2; foreach (KeyValuePair<int, double> p in arr1) if (arr2.TryGetValue(p.Key, out val2)) res += p.Value * val2; return res; } The full arrays have about 500,000 entries each, while the sparse ones are only tens to hundreds entries each. I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat"). Then I tried to change the implementation of the associative array multiplication using an int[] ID array and a double[] values array, walking together on both ID arrays and multiplying when they are equal (let's call this "double"). I then tried to run all three versions with debug or release, with F5 or Ctrl-F5. The results are as follows: debug F5: dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double) debug ^F5: dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double) release F5: dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double) release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double) I don't understand these results. Why isn't the dictionary version optimized in release F5 as do the double and flat versions? Why is it only slightly optimized in the release ^F5 version while the other two are heavily optimized? Also, since converting my code into the "double" scheme would mean lots of work - do you have any suggestions how to optimize the dictionary one? Thanks! Haggai

    Read the article

  • Understanding VS2010 C# parallel profiling results

    - by Haggai
    I have a program with many independent computations so I decided to parallelize it. I use Parallel.For/Each. The results were okay for a dual-core machine - CPU utilization of about 80%-90% most of the time. However, with a dual Xeon machine (i.e. 8 cores) I get only about 30%-40% CPU utilization, although the program spends quite a lot of time (sometimes more than 10 seconds) on the parallel sections, and I see it employs about 20-30 more threads in those sections compared to serial sections. Each thread takes more than 1 second to complete, so I see no reason for them to work in parallel - unless there is a synchronization problem. I used the built-in profiler of VS2010, and the results are strange. Even though I use locks only in one place, the profiler reports that about 85% of the program's time is spent on synchronization (also 5-7% sleep, 5-7% execution, under 1% IO). The locked code is only a cache (a dictionary) get/add: bool esn_found; lock (lock_load_esn) esn_found = cache.TryGetValue(st, out esn); if(!esn_found) { esn = pData.esa_inv_idx.esa[term_idx]; esn.populate(pData.esa_inv_idx.datafile); lock (lock_load_esn) { if (!cache.ContainsKey(st)) cache.Add(st, esn); } } lock_load_esn is a static member of the class of type Object. esn.populate reads from a file using a separate StreamReader for each thread. However, when I press the Synchronization button to see what causes the most delay, I see that the profiler reports lines which are function entrance lines, and doesn't report the locked sections themselves. It doesn't even report the function that contains the above code (reminder - the only lock in the program) as part of the blocking profile with noise level 2%. With noise level at 0% it reports all the functions of the program, which I don't understand why they count as blocking synchronizations. So my question is - what is going on here? How can it be that 85% of the time is spent on synchronization? How do I find out what really is the problem with the parallel sections of my program? Thanks.

    Read the article

  • C# wrapper for objects

    - by Haggai
    I'm looking for a way to create a generic wrapper for any object. The wrapper object will behave just like the class it wraps, but will be able to have more properties, variable, methods etc., for e.g. object counting, caching etc. Say the wrapper class be called Wrapper, and the class to be wrapped be called Square and has the constructor Square(double edge_len) and the properties/methods EdgeLength and Area, I would like to use it as follows: Wrapper<Square> mySquare = new Wrapper<Square>(2.5); /* or */ new Square(2.5); Console.Write("Edge {0} -> Area {1}", mySquare.EdgeLength, mySquare.Area); Obviously I can create such a wrapper class for each class I want to wrap, but I'm looking for a general solution, i.e. Wrapper<T> which can handle both primitive and compound types (although in my current situation I would be happy with just wrapping my own classes). Suggestions? Thanks.

    Read the article

  • How to build a client to Google wave

    - by Haggai
    Hi, By looking at current Google wave APIs, I can't find a way to create an alternative client. It's not a robot or gadget, and the embed API is very slim. Nevertheless, I do see some clients out there - such as Waver and Waveboard. How do they do it ? is it based on XMPP ?

    Read the article

1