Search Results

Search found 46 results on 2 pages for 'plinq'.

Page 1/2 | 1 2  | Next Page >

  • Parallelism in .NET – Part 7, Some Differences between PLINQ and LINQ to Objects

    - by Reed
    In my previous post on Declarative Data Parallelism, I mentioned that PLINQ extends LINQ to Objects to support parallel operations.  Although nearly all of the same operations are supported, there are some differences between PLINQ and LINQ to Objects.  By introducing Parallelism to our declarative model, we add some extra complexity.  This, in turn, adds some extra requirements that must be addressed. In order to illustrate the main differences, and why they exist, let’s begin by discussing some differences in how the two technologies operate, and look at the underlying types involved in LINQ to Objects and PLINQ . LINQ to Objects is mainly built upon a single class: Enumerable.  The Enumerable class is a static class that defines a large set of extension methods, nearly all of which work upon an IEnumerable<T>.  Many of these methods return a new IEnumerable<T>, allowing the methods to be chained together into a fluent style interface.  This is what allows us to write statements that chain together, and lead to the nice declarative programming model of LINQ: double min = collection .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Other LINQ variants work in a similar fashion.  For example, most data-oriented LINQ providers are built upon an implementation of IQueryable<T>, which allows the database provider to turn a LINQ statement into an underlying SQL query, to be performed directly on the remote database. PLINQ is similar, but instead of being built upon the Enumerable class, most of PLINQ is built upon a new static class: ParallelEnumerable.  When using PLINQ, you typically begin with any collection which implements IEnumerable<T>, and convert it to a new type using an extension method defined on ParallelEnumerable: AsParallel().  This method takes any IEnumerable<T>, and converts it into a ParallelQuery<T>, the core class for PLINQ.  There is a similar ParallelQuery class for working with non-generic IEnumerable implementations. This brings us to our first subtle, but important difference between PLINQ and LINQ – PLINQ always works upon specific types, which must be explicitly created. Typically, the type you’ll use with PLINQ is ParallelQuery<T>, but it can sometimes be a ParallelQuery or an OrderedParallelQuery<T>.  Instead of dealing with an interface, implemented by an unknown class, we’re dealing with a specific class type.  This works seamlessly from a usage standpoint – ParallelQuery<T> implements IEnumerable<T>, so you can always “switch back” to an IEnumerable<T>.  The difference only arises at the beginning of our parallelization.  When we’re using LINQ, and we want to process a normal collection via PLINQ, we need to explicitly convert the collection into a ParallelQuery<T> by calling AsParallel().  There is an important consideration here – AsParallel() does not need to be called on your specific collection, but rather any IEnumerable<T>.  This allows you to place it anywhere in the chain of methods involved in a LINQ statement, not just at the beginning.  This can be useful if you have an operation which will not parallelize well or is not thread safe.  For example, the following is perfectly valid, and similar to our previous examples: double min = collection .AsParallel() .Select(item => item.SomeOperation()) .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); However, if SomeOperation() is not thread safe, we could just as easily do: double min = collection .Select(item => item.SomeOperation()) .AsParallel() .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); In this case, we’re using standard LINQ to Objects for the Select(…) method, then converting the results of that map routine to a ParallelQuery<T>, and processing our filter (the Where method) and our aggregation (the Min method) in parallel. PLINQ also provides us with a way to convert a ParallelQuery<T> back into a standard IEnumerable<T>, forcing sequential processing via standard LINQ to Objects.  If SomeOperation() was thread-safe, but PerformComputation() was not thread-safe, we would need to handle this by using the AsEnumerable() method: double min = collection .AsParallel() .Select(item => item.SomeOperation()) .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .AsEnumerable() .Min(item => item.PerformComputation()); Here, we’re converting our collection into a ParallelQuery<T>, doing our map operation (the Select(…) method) and our filtering in parallel, then converting the collection back into a standard IEnumerable<T>, which causes our aggregation via Min() to be performed sequentially. This could also be written as two statements, as well, which would allow us to use the language integrated syntax for the first portion: var tempCollection = from item in collection.AsParallel() let e = item.SomeOperation() where (e.SomeProperty > 6 && e.SomeProperty < 24) select e; double min = tempCollection.AsEnumerable().Min(item => item.PerformComputation()); This allows us to use the standard LINQ style language integrated query syntax, but control whether it’s performed in parallel or serial by adding AsParallel() and AsEnumerable() appropriately. The second important difference between PLINQ and LINQ deals with order preservation.  PLINQ, by default, does not preserve the order of of source collection. This is by design.  In order to process a collection in parallel, the system needs to naturally deal with multiple elements at the same time.  Maintaining the original ordering of the sequence adds overhead, which is, in many cases, unnecessary.  Therefore, by default, the system is allowed to completely change the order of your sequence during processing.  If you are doing a standard query operation, this is usually not an issue.  However, there are times when keeping a specific ordering in place is important.  If this is required, you can explicitly request the ordering be preserved throughout all operations done on a ParallelQuery<T> by using the AsOrdered() extension method.  This will cause our sequence ordering to be preserved. For example, suppose we wanted to take a collection, perform an expensive operation which converts it to a new type, and display the first 100 elements.  In LINQ to Objects, our code might look something like: // Using IEnumerable<SourceClass> collection IEnumerable<ResultClass> results = collection .Select(e => e.CreateResult()) .Take(100); If we just converted this to a parallel query naively, like so: IEnumerable<ResultClass> results = collection .AsParallel() .Select(e => e.CreateResult()) .Take(100); We could very easily get a very different, and non-reproducable, set of results, since the ordering of elements in the input collection is not preserved.  To get the same results as our original query, we need to use: IEnumerable<ResultClass> results = collection .AsParallel() .AsOrdered() .Select(e => e.CreateResult()) .Take(100); This requests that PLINQ process our sequence in a way that verifies that our resulting collection is ordered as if it were processed serially.  This will cause our query to run slower, since there is overhead involved in maintaining the ordering.  However, in this case, it is required, since the ordering is required for correctness. PLINQ is incredibly useful.  It allows us to easily take nearly any LINQ to Objects query and run it in parallel, using the same methods and syntax we’ve used previously.  There are some important differences in operation that must be considered, however – it is not a free pass to parallelize everything.  When using PLINQ in order to parallelize your routines declaratively, the same guideline I mentioned before still applies: Parallelization is something that should be handled with care and forethought, added by design, and not just introduced casually.

    Read the article

  • Parallelism in .NET – Part 10, Cancellation in PLINQ and the Parallel class

    - by Reed
    Many routines are parallelized because they are long running processes.  When writing an algorithm that will run for a long period of time, its typically a good practice to allow that routine to be cancelled.  I previously discussed terminating a parallel loop from within, but have not demonstrated how a routine can be cancelled from the caller’s perspective.  Cancellation in PLINQ and the Task Parallel Library is handled through a new, unified cooperative cancellation model introduced with .NET 4.0. Cancellation in .NET 4 is based around a new, lightweight struct called CancellationToken.  A CancellationToken is a small, thread-safe value type which is generated via a CancellationTokenSource.  There are many goals which led to this design.  For our purposes, we will focus on a couple of specific design decisions: Cancellation is cooperative.  A calling method can request a cancellation, but it’s up to the processing routine to terminate – it is not forced. Cancellation is consistent.  A single method call requests a cancellation on every copied CancellationToken in the routine. Let’s begin by looking at how we can cancel a PLINQ query.  Supposed we wanted to provide the option to cancel our query from Part 6: double min = collection .AsParallel() .Min(item => item.PerformComputation()); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } We would rewrite this to allow for cancellation by adding a call to ParallelEnumerable.WithCancellation as follows: var cts = new CancellationTokenSource(); // Pass cts here to a routine that could, // in parallel, request a cancellation try { double min = collection .AsParallel() .WithCancellation(cts.Token) .Min(item => item.PerformComputation()); } catch (OperationCanceledException e) { // Query was cancelled before it finished } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, if the user calls cts.Cancel() before the PLINQ query completes, the query will stop processing, and an OperationCanceledException will be raised.  Be aware, however, that cancellation will not be instantaneous.  When cts.Cancel() is called, the query will only stop after the current item.PerformComputation() elements all finish processing.  cts.Cancel() will prevent PLINQ from scheduling a new task for a new element, but will not stop items which are currently being processed.  This goes back to the first goal I mentioned – Cancellation is cooperative.  Here, we’re requesting the cancellation, but it’s up to PLINQ to terminate. If we wanted to allow cancellation to occur within our routine, we would need to change our routine to accept a CancellationToken, and modify it to handle this specific case: public void PerformComputation(CancellationToken token) { for (int i=0; i<this.iterations; ++i) { // Add a check to see if we've been canceled // If a cancel was requested, we'll throw here token.ThrowIfCancellationRequested(); // Do our processing now this.RunIteration(i); } } With this overload of PerformComputation, each internal iteration checks to see if a cancellation request was made, and will throw an OperationCanceledException at that point, instead of waiting until the method returns.  This is good, since it allows us, as developers, to plan for cancellation, and terminate our routine in a clean, safe state. This is handled by changing our PLINQ query to: try { double min = collection .AsParallel() .WithCancellation(cts.Token) .Min(item => item.PerformComputation(cts.Token)); } catch (OperationCanceledException e) { // Query was cancelled before it finished } PLINQ is very good about handling this exception, as well.  There is a very good chance that multiple items will raise this exception, since the entire purpose of PLINQ is to have multiple items be processed concurrently.  PLINQ will take all of the OperationCanceledException instances raised within these methods, and merge them into a single OperationCanceledException in the call stack.  This is done internally because we added the call to ParallelEnumerable.WithCancellation. If, however, a different exception is raised by any of the elements, the OperationCanceledException as well as the other Exception will be merged into a single AggregateException. The Task Parallel Library uses the same cancellation model, as well.  Here, we supply our CancellationToken as part of the configuration.  The ParallelOptions class contains a property for the CancellationToken.  This allows us to cancel a Parallel.For or Parallel.ForEach routine in a very similar manner to our PLINQ query.  As an example, we could rewrite our Parallel.ForEach loop from Part 2 to support cancellation by changing it to: try { var cts = new CancellationTokenSource(); var options = new ParallelOptions() { CancellationToken = cts.Token }; Parallel.ForEach(customers, options, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // Check for cancellation here options.CancellationToken.ThrowIfCancellationRequested(); // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); } catch (OperationCanceledException e) { // The loop was cancelled } Notice that here we use the same approach taken in PLINQ.  The Task Parallel Library will automatically handle our cancellation in the same manner as PLINQ, providing a clean, unified model for cancellation of any parallel routine.  The TPL performs the same aggregation of the cancellation exceptions as PLINQ, as well, which is why a single exception handler for OperationCanceledException will cleanly handle this scenario.  This works because we’re using the same CancellationToken provided in the ParallelOptions.  If a different exception was thrown by one thread, or a CancellationToken from a different CancellationTokenSource was used to raise our exception, we would instead receive all of our individual exceptions merged into one AggregateException.

    Read the article

  • Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

    - by Reed
    Parallel LINQ and the Task Parallel Library contain many options for configuration.  Although the default configuration options are often ideal, there are times when customizing the behavior is desirable.  Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine.  By default, PLINQ and the TPL both use the ThreadPool to schedule tasks.  Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal.  However, there are times that the default behavior is not appropriate.  For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system.  Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class.  All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image.  If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system.  This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system.  Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception.  I also did not hard code a specific value for the MaxDegreeOfParallelism property.  One of our goals when parallelizing a routine is allowing it to scale on better hardware.  Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system.  The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads.  In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation.  If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options.  By default, PLINQ will occasionally revert to processing a query in parallel.  This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent.  By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel.  This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread.  However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query.  There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly.  In these cases, you can override the default behavior by using the WithExecutionMode extension method.  This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>.  We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned.  When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result.  For example, the method above returns a new, reversed collection.  In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead.  IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues.  There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller.  This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach.  However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot.  The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest.  However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes.  By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain.  Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks.  This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes.  This can be done by using the WithMergeOptions extension method.  For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring.  This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query.  By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

    Read the article

  • Parallelism in .NET – Part 8, PLINQ’s ForAll Method

    - by Reed
    Parallel LINQ extends LINQ to Objects, and is typically very similar.  However, as I previously discussed, there are some differences.  Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits.  Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data.  LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar.  Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation.  However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>.  Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects.  It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea.  For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized.  The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles.  Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method.  Without ForAll, we would take a fairly serious performance hit in many situations.  Often, we need to perform some filtering or grouping, then perform an action using the results of our filter.  Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized.  However, there is still a large bottleneck in place here.  The problem lies with my comment “perform an action once the filter completes”.  Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes.  Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element.  By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play.  By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll.  This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task.  Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

    Read the article

  • Parallel LINQ - PLINQ

    - by nmarun
    Turns out now with .net 4.0 we can run a query like a multi-threaded application. Say you want to query a collection of objects and return only those that meet certain conditions. Until now, we basically had one ‘control’ that iterated over all the objects in the collection, checked the condition on each object and returned if it passed. We obviously agree that if we can ‘break’ this task into smaller ones, assign each task to a different ‘control’ and ask all the controls to do their job - in-parallel, the time taken the finish the entire task will be much lower. Welcome to PLINQ. Let’s take some examples. I have the following method that uses our good ol’ LINQ. 1: private static void Linq(int lowerLimit, int upperLimit) 2: { 3: // populate an array with int values from lowerLimit to the upperLimit 4: var source = Enumerable.Range(lowerLimit, upperLimit); 5:  6: // Start a timer 7: Stopwatch stopwatch = new Stopwatch(); 8: stopwatch.Start(); 9:  10: // set the expectation => build the expression tree 11: var evenNumbers =   from num in source 12: where IsDivisibleBy(num, 2) 13: select num; 14: 15: // iterate over and print the returned items 16: foreach (var number in evenNumbers) 17: { 18: Console.WriteLine(string.Format("** {0}", number)); 19: } 20:  21: stopwatch.Stop(); 22:  23: // check the metrics 24: Console.WriteLine(String.Format("Elapsed {0}ms", stopwatch.ElapsedMilliseconds)); 25: } I’ve added comments for the major steps, but the only thing I want to talk about here is the IsDivisibleBy() method. I know I could have just included the logic directly in the where clause. I called a method to add ‘delay’ to the execution of the query - to simulate a loooooooooong operation (will be easier to compare the results). 1: private static bool IsDivisibleBy(int number, int divisor) 2: { 3: // iterate over some database query 4: // to add time to the execution of this method; 5: // the TableB has around 10 records 6: for (int i = 0; i < 10; i++) 7: { 8: DataClasses1DataContext dataContext = new DataClasses1DataContext(); 9: var query = from b in dataContext.TableBs select b; 10: 11: foreach (var row in query) 12: { 13: // Do NOTHING (wish my job was like this) 14: } 15: } 16:  17: return number % divisor == 0; 18: } Now, let’s look at how to modify this to PLINQ. 1: private static void Plinq(int lowerLimit, int upperLimit) 2: { 3: // populate an array with int values from lowerLimit to the upperLimit 4: var source = Enumerable.Range(lowerLimit, upperLimit); 5:  6: // Start a timer 7: Stopwatch stopwatch = new Stopwatch(); 8: stopwatch.Start(); 9:  10: // set the expectation => build the expression tree 11: var evenNumbers = from num in source.AsParallel() 12: where IsDivisibleBy(num, 2) 13: select num; 14:  15: // iterate over and print the returned items 16: foreach (var number in evenNumbers) 17: { 18: Console.WriteLine(string.Format("** {0}", number)); 19: } 20:  21: stopwatch.Stop(); 22:  23: // check the metrics 24: Console.WriteLine(String.Format("Elapsed {0}ms", stopwatch.ElapsedMilliseconds)); 25: } That’s it, this is now in PLINQ format. Oh and if you haven’t found the difference, look line 11 a little more closely. You’ll see an extension method ‘AsParallel()’ added to the ‘source’ variable. Couldn’t be more simpler right? So this is going to improve the performance for us. Let’s test it. So in my Main method of the Console application that I’m working on, I make a call to both. 1: static void Main(string[] args) 2: { 3: // set lower and upper limits 4: int lowerLimit = 1; 5: int upperLimit = 20; 6: // call the methods 7: Console.WriteLine("Calling Linq() method"); 8: Linq(lowerLimit, upperLimit); 9: 10: Console.WriteLine(); 11: Console.WriteLine("Calling Plinq() method"); 12: Plinq(lowerLimit, upperLimit); 13:  14: Console.ReadLine(); // just so I get enough time to read the output 15: } YMMV, but here are the results that I got:    It’s quite obvious from the above results that the Plinq() method is taking considerably less time than the Linq() version. I’m sure you’ve already noticed that the output of the Plinq() method is not in order. That’s because, each of the ‘control’s we sent to fetch the results, reported with values as and when they obtained them. This is something about parallel LINQ that one needs to remember – the collection cannot be guaranteed to be undisturbed. This could be counted as a negative about PLINQ (emphasize ‘could’). Nevertheless, if we want the collection to be sorted, we can use a SortedSet (.net 4.0) or build our own custom ‘sorter’. Either way we go, there’s a good chance we’ll end up with a better performance using PLINQ. And there’s another negative of PLINQ (depending on how you see it). This is regarding the CPU cycles. See the usage for Linq() method (used ResourceMonitor): I have dual CPU’s and see the height of the peak in the bottom two blocks and now compare to what happens when I run the Plinq() method. The difference is obvious. Higher usage, but for a shorter duration (width of the peak). Both these points make sense in both cases. Linq() runs for a longer time, but uses less resources whereas Plinq() runs for a shorter time and consumes more resources. Even after knowing all these, I’m still inclined towards PLINQ. PLINQ rocks! (no hard feelings LINQ)

    Read the article

  • NHibernate LINQ + PLINQ

    - by cvista
    Hi i've just started reading up on PLINQ and find it fasinating. I'm using NHib-Linq in my projects - does anyone know if there's any benefit/problems using PLINQ type queries with NHLinq? w://

    Read the article

  • PLINQ Adventure Land - WaitForAll

    - by adweigert
    PLINQ is awesome for getting a lot of work done fast, but one thing I haven't figured out yet is how to start work with PLINQ but only let it execute for a maximum amount of time and react if it is taking too long. So, as I must admit I am still learning PLINQ, I created this extension in that ignorance. It behaves similar to ForAll<> but takes a timeout and returns false if the threads don't complete in the specified amount of time. Hope this helps someone else take PLINQ further, it definitely has helped for me ...  public static bool WaitForAll<T>(this ParallelQuery<T> query, TimeSpan timeout, Action<T> action) { Contract.Requires(query != null); Contract.Requires(action != null); var exception = (Exception)null; var cts = new CancellationTokenSource(); var forAllWithCancellation = new Action(delegate { try { query.WithCancellation(cts.Token).ForAll(action); } catch (OperationCanceledException) { // NOOP } catch (AggregateException ex) { exception = ex; } }); var mrs = new ManualResetEvent(false); var callback = new AsyncCallback(delegate { mrs.Set(); }); var result = forAllWithCancellation.BeginInvoke(callback, null); if (mrs.WaitOne(timeout)) { forAllWithCancellation.EndInvoke(result); if (exception != null) { throw exception; } return true; } else { cts.Cancel(); return false; } }

    Read the article

  • PLINQ delayed execution

    - by tbischel
    I'm trying to understand how parallelism might work using PLINQ, given delayed execution. Here is a simple example. string[] words = { "believe", "receipt", "relief", "field" }; bool result = words.AsParallel().Any(w => w.Contains("ei")); With LINQ, I would expect the execution to reach the "receipt" value and return true, without executing the query for rest of the values. If we do this in parallel, the evaluation of "relief" may have began before the result of "receipt" has returned. But once the query knows that "receipt" will cause a true result, will the other threads yield immediately? In my case, this is important because the "any" test may be very expensive, and I would want to free up the processors for execution of other tasks.

    Read the article

  • Using PLINQ to calculate and update values within the enclosure does not work

    - by Keith
    I recently needed to do a running total on a report. Where for each group, I order the rows and then calculate the running total based on the previous rows within the group. Aha! I thought, a perfect use case for PLINQ! However, when I wrote up the code, I got some strange behavior. The values I was modifying showed as being modified when stepping through the debugger, but when they were accessed they were always zero. Sample Code: class Item { public int PortfolioID; public int TAAccountID; public DateTime TradeDate; public decimal Shares; public decimal RunningTotal; } List<Item> itemList = new List<Item> { new Item { PortfolioID = 1, TAAccountID = 1, TradeDate = new DateTime(2010, 5, 1), Shares = 5.335m, }, new Item { PortfolioID = 1, TAAccountID = 1, TradeDate = new DateTime(2010, 5, 2), Shares = -2.335m, }, new Item { PortfolioID = 2, TAAccountID = 1, TradeDate = new DateTime(2010, 5, 1), Shares = 7.335m, }, new Item { PortfolioID = 2, TAAccountID = 1, TradeDate = new DateTime(2010, 5, 2), Shares = -3.335m, }, }; var found = (from i in itemList where i.TAAccountID == 1 select new Item { TAAccountID = i.TAAccountID, PortfolioID = i.PortfolioID, Shares = i.Shares, TradeDate = i.TradeDate, RunningTotal = 0 }); found.AsParallel().ForAll(x => { var prevItems = found.Where(i => i.PortfolioID == x.PortfolioID && i.TAAccountID == x.TAAccountID && i.TradeDate <= x.TradeDate); x.RunningTotal = prevItems.Sum(s => s.Shares); }); foreach (Item i in found) { Console.WriteLine("Running total: {0}", i.RunningTotal); } Console.ReadLine(); If I change the select for found to be .ToArray(), then it works fine and I get calculated reuslts. Any ideas what I am doing wrong?

    Read the article

  • Is it OK to try to use Plinq in all Linq queries?

    - by Tony_Henrich
    I read that PLinq will automatically use non parallel Linq if it finds PLinq to be more expensive. So I figured then why not use PLinq for everything (when possible) and let the runtime decide which one to use. The apps will be deployed to multicore servers and I am OK to develop a little more code to deal with parallelism. What are the pitfalls of using plinq as a default?

    Read the article

  • Use Session state within pLinq queries

    - by Dima
    Hi, I have a fairly simple Linq query (simplified code): dim x = From Product In lstProductList.AsParallel Order By Product.Price.GrossPrice Descending Select Product Product is a class. Product.Price is a child class and GrossPrice is one of its properties. In order to work out the price I need to use Session("exchange_rate"). So for each item in lstProductList there's a function that does the following: NetPrice=NetPrice * Session("exchange_rate") (and then GrossPrice returns NetPrice+VatAmount) No matter what I've tried I cannot access session state. I have tried HttpContext.Current - but that returns Nothing. I've tried Implements IRequiresSessionState on the class (which helps in a similar situation in generic http handlers [.ashx]) - no luck. I'm using simple InProc session state mode. Exchange rate has to be user specific. What can I do? I'm working with: web development, .Net 4, VB.net Step-by-step: page_load (in .aspx) dim objSearch as new SearchClass() dim output = objSearch.renderProductsFound() then in objSearch.renderProductsFound: lstProductList.Add(objProduct(1)) ... lstProductList.Add(objProduct(n)) dim x = From Product In lstProductList.AsParallel Order By Product.Price.GrossPrice Descending Select Product In Product.Price.GrossPrice Get : return me.NetPrice+me.VatAmount In Product.Price.NetPrice Get: return NetBasePrice*Session("exchange_rate") Again, simplified code, too much to paste in here. Works fine if I unwrap the query into For loops.

    Read the article

  • Parallelism in .NET – Part 6, Declarative Data Parallelism

    - by Reed
    When working with a problem that can be decomposed by data, we have a collection, and some operation being performed upon the collection.  I’ve demonstrated how this can be parallelized using the Task Parallel Library and imperative programming using imperative data parallelism via the Parallel class.  While this provides a huge step forward in terms of power and capabilities, in many cases, special care must still be given for relative common scenarios. C# 3.0 and Visual Basic 9.0 introduced a new, declarative programming model to .NET via the LINQ Project.  When working with collections, we can now write software that describes what we want to occur without having to explicitly state how the program should accomplish the task.  By taking advantage of LINQ, many operations become much shorter, more elegant, and easier to understand and maintain.  Version 4.0 of the .NET framework extends this concept into the parallel computation space by introducing Parallel LINQ. Before we delve into PLINQ, let’s begin with a short discussion of LINQ.  LINQ, the extensions to the .NET Framework which implement language integrated query, set, and transform operations, is implemented in many flavors.  For our purposes, we are interested in LINQ to Objects.  When dealing with parallelizing a routine, we typically are dealing with in-memory data storage.  More data-access oriented LINQ variants, such as LINQ to SQL and LINQ to Entities in the Entity Framework fall outside of our concern, since the parallelism there is the concern of the data base engine processing the query itself. LINQ (LINQ to Objects in particular) works by implementing a series of extension methods, most of which work on IEnumerable<T>.  The language enhancements use these extension methods to create a very concise, readable alternative to using traditional foreach statement.  For example, let’s revisit our minimum aggregation routine we wrote in Part 4: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re doing a very simple computation, but writing this in an imperative style.  This can be loosely translated to English as: Create a very large number, and save it in min Loop through each item in the collection. For every item: Perform some computation, and save the result If the computation is less than min, set min to the computation Although this is fairly easy to follow, it’s quite a few lines of code, and it requires us to read through the code, step by step, line by line, in order to understand the intention of the developer. We can rework this same statement, using LINQ: double min = collection.Min(item => item.PerformComputation()); Here, we’re after the same information.  However, this is written using a declarative programming style.  When we see this code, we’d naturally translate this to English as: Save the Min value of collection, determined via calling item.PerformComputation() That’s it – instead of multiple logical steps, we have one single, declarative request.  This makes the developer’s intentions very clear, and very easy to follow.  The system is free to implement this using whatever method required. Parallel LINQ (PLINQ) extends LINQ to Objects to support parallel operations.  This is a perfect fit in many cases when you have a problem that can be decomposed by data.  To show this, let’s again refer to our minimum aggregation routine from Part 4, but this time, let’s review our final, parallelized version: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Here, we’re doing the same computation as above, but fully parallelized.  Describing this in English becomes quite a feat: Create a very large number, and save it in min Create a temporary object we can use for locking Call Parallel.ForEach, specifying three delegates For the first delegate: Initialize a local variable to hold the local state to a very large number For the second delegate: For each item in the collection, perform some computation, save the result If the result is less than our local state, save the result in local state For the final delegate: Take a lock on our temporary object to protect our min variable Save the min of our min and local state variables Although this solves our problem, and does it in a very efficient way, we’ve created a set of code that is quite a bit more difficult to understand and maintain. PLINQ provides us with a very nice alternative.  In order to use PLINQ, we need to learn one new extension method that works on IEnumerable<T> – ParallelEnumerable.AsParallel(). That’s all we need to learn in order to use PLINQ: one single method.  We can write our minimum aggregation in PLINQ very simply: double min = collection.AsParallel().Min(item => item.PerformComputation()); By simply adding “.AsParallel()” to our LINQ to Objects query, we converted this to using PLINQ and running this computation in parallel!  This can be loosely translated into English easily, as well: Process the collection in parallel Get the Minimum value, determined by calling PerformComputation on each item Here, our intention is very clear and easy to understand.  We just want to perform the same operation we did in serial, but run it “as parallel”.  PLINQ completely extends LINQ to Objects: the entire functionality of LINQ to Objects is available.  By simply adding a call to AsParallel(), we can specify that a collection should be processed in parallel.  This is simple, safe, and incredibly useful.

    Read the article

  • MapReduce in DryadLINQ and PLINQ

    - by JoshReuben
    MapReduce See http://en.wikipedia.org/wiki/Mapreduce The MapReduce pattern aims to handle large-scale computations across a cluster of servers, often involving massive amounts of data. "The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The developer expresses the computation as two Func delegates: Map and Reduce. Map - takes a single input pair and produces a set of intermediate key/value pairs. The MapReduce function groups results by key and passes them to the Reduce function. Reduce - accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's Reduce function via an iterator." the canonical MapReduce example: counting word frequency in a text file.     MapReduce using DryadLINQ see http://research.microsoft.com/en-us/projects/dryadlinq/ and http://connect.microsoft.com/Dryad DryadLINQ provides a simple and straightforward way to implement MapReduce operations. This The implementation has two primary components: A Pair structure, which serves as a data container. A MapReduce method, which counts word frequency and returns the top five words. The Pair Structure - Pair has two properties: Word is a string that holds a word or key. Count is an int that holds the word count. The structure also overrides ToString to simplify printing the results. The following example shows the Pair implementation. public struct Pair { private string word; private int count; public Pair(string w, int c) { word = w; count = c; } public int Count { get { return count; } } public string Word { get { return word; } } public override string ToString() { return word + ":" + count.ToString(); } } The MapReduce function  that gets the results. the input data could be partitioned and distributed across the cluster. 1. Creates a DryadTable<LineRecord> object, inputTable, to represent the lines of input text. For partitioned data, use GetPartitionedTable<T> instead of GetTable<T> and pass the method a metadata file. 2. Applies the SelectMany operator to inputTable to transform the collection of lines into collection of words. The String.Split method converts the line into a collection of words. SelectMany concatenates the collections created by Split into a single IQueryable<string> collection named words, which represents all the words in the file. 3. Performs the Map part of the operation by applying GroupBy to the words object. The GroupBy operation groups elements with the same key, which is defined by the selector delegate. This creates a higher order collection, whose elements are groups. In this case, the delegate is an identity function, so the key is the word itself and the operation creates a groups collection that consists of groups of identical words. 4. Performs the Reduce part of the operation by applying Select to groups. This operation reduces the groups of words from Step 3 to an IQueryable<Pair> collection named counts that represents the unique words in the file and how many instances there are of each word. Each key value in groups represents a unique word, so Select creates one Pair object for each unique word. IGrouping.Count returns the number of items in the group, so each Pair object's Count member is set to the number of instances of the word. 5. Applies OrderByDescending to counts. This operation sorts the input collection in descending order of frequency and creates an ordered collection named ordered. 6. Applies Take to ordered to create an IQueryable<Pair> collection named top, which contains the 100 most common words in the input file, and their frequency. Test then uses the Pair object's ToString implementation to print the top one hundred words, and their frequency.   public static IQueryable<Pair> MapReduce( string directory, string fileName, int k) { DryadDataContext ddc = new DryadDataContext("file://" + directory); DryadTable<LineRecord> inputTable = ddc.GetTable<LineRecord>(fileName); IQueryable<string> words = inputTable.SelectMany(x => x.line.Split(' ')); IQueryable<IGrouping<string, string>> groups = words.GroupBy(x => x); IQueryable<Pair> counts = groups.Select(x => new Pair(x.Key, x.Count())); IQueryable<Pair> ordered = counts.OrderByDescending(x => x.Count); IQueryable<Pair> top = ordered.Take(k);   return top; }   To Test: IQueryable<Pair> results = MapReduce(@"c:\DryadData\input", "TestFile.txt", 100); foreach (Pair words in results) Debug.Print(words.ToString());   Note: DryadLINQ applications can use a more compact way to represent the query: return inputTable         .SelectMany(x => x.line.Split(' '))         .GroupBy(x => x)         .Select(x => new Pair(x.Key, x.Count()))         .OrderByDescending(x => x.Count)         .Take(k);     MapReduce using PLINQ The pattern is relevant even for a single multi-core machine, however. We can write our own PLINQ MapReduce in a few lines. the Map function takes a single input value and returns a set of mapped values àLINQ's SelectMany operator. These are then grouped according to an intermediate key à LINQ GroupBy operator. The Reduce function takes each intermediate key and a set of values for that key, and produces any number of outputs per key à LINQ SelectMany again. We can put all of this together to implement MapReduce in PLINQ that returns a ParallelQuery<T> public static ParallelQuery<TResult> MapReduce<TSource, TMapped, TKey, TResult>( this ParallelQuery<TSource> source, Func<TSource, IEnumerable<TMapped>> map, Func<TMapped, TKey> keySelector, Func<IGrouping<TKey, TMapped>, IEnumerable<TResult>> reduce) { return source .SelectMany(map) .GroupBy(keySelector) .SelectMany(reduce); } the map function takes in an input document and outputs all of the words in that document. The grouping phase groups all of the identical words together, such that the reduce phase can then count the words in each group and output a word/count pair for each grouping: var files = Directory.EnumerateFiles(dirPath, "*.txt").AsParallel(); var counts = files.MapReduce( path => File.ReadLines(path).SelectMany(line => line.Split(delimiters)), word => word, group => new[] { new KeyValuePair<string, int>(group.Key, group.Count()) });

    Read the article

  • Using TPL and PLINQ to raise performance of feed aggregator

    - by DigiMortal
    In this posting I will show you how to use Task Parallel Library (TPL) and PLINQ features to boost performance of simple RSS-feed aggregator. I will use here only very basic .NET classes that almost every developer starts from when learning parallel programming. Of course, we will also measure how every optimization affects performance of feed aggregator. Feed aggregator Our feed aggregator works as follows: Load list of blogs Download RSS-feed Parse feed XML Add new posts to database Our feed aggregator is run by task scheduler after every 15 minutes by example. We will start our journey with serial implementation of feed aggregator. Second step is to use task parallelism and parallelize feeds downloading and parsing. And our last step is to use data parallelism to parallelize database operations. We will use Stopwatch class to measure how much time it takes for aggregator to download and insert all posts from all registered blogs. After every run we empty posts table in database. Serial aggregation Before doing parallel stuff let’s take a look at serial implementation of feed aggregator. All tasks happen one after other. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();           for (var index = 0; index <blogs.Count; index++)         {              ImportFeed(blogs[index]);         }     }       private void ImportFeed(BlogDto blog)     {         if(blog == null)             return;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                 }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)         {             SaveRssFeedItem(item, blog.Id, blog.CreatedById);         }     }       private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } Serial implementation of feed aggregator downloads and inserts all posts with 25.46 seconds. Task parallelism Task parallelism means that separate tasks are run in parallel. You can find out more about task parallelism from MSDN page Task Parallelism (Task Parallel Library) and Wikipedia page Task parallelism. Although finding parts of code that can run safely in parallel without synchronization issues is not easy task we are lucky this time. Feeds import and parsing is perfect candidate for parallel tasks. We can safely parallelize feeds import because importing tasks doesn’t share any resources and therefore they don’t also need any synchronization. After getting the list of blogs we iterate through the collection and start new TPL task for each blog feed aggregation. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {          var uri = new Uri(blog.RssUrl);          var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)          {              SaveRssFeedItem(item, blog.Id, blog.CreatedById);          }     }     private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } You should notice first signs of the power of TPL. We made only minor changes to our code to parallelize blog feeds aggregating. On my machine this modification gives some performance boost – time is now 17.57 seconds. Data parallelism There is one more way how to parallelize activities. Previous section introduced task or operation based parallelism, this section introduces data based parallelism. By MSDN page Data Parallelism (Task Parallel Library) data parallelism refers to scenario in which the same operation is performed concurrently on elements in a source collection or array. In our code we have independent collections we can process in parallel – imported feed entries. As checking for feed entry existence and inserting it if it is missing from database doesn’t affect other entries the imported feed entries collection is ideal candidate for parallelization. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           feed.Channel.Items.AsParallel().ForAll(a =>         {             SaveRssFeedItem(a, blog.Id, blog.CreatedById);         });      }        private void ImportAtomFeed(BlogDto blog)      {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           feed.Entries.AsParallel().ForAll(a =>         {              SaveAtomFeedEntry(a, blog.Id, blog.CreatedById);         });      } } We did small change again and as the result we parallelized checking and saving of feed items. This change was data centric as we applied same operation to all elements in collection. On my machine I got better performance again. Time is now 11.22 seconds. Results Let’s visualize our measurement results (numbers are given in seconds). As we can see then with task parallelism feed aggregation takes about 25% less time than in original case. When adding data parallelism to task parallelism our aggregation takes about 2.3 times less time than in original case. More about TPL and PLINQ Adding parallelism to your application can be very challenging task. You have to carefully find out parts of your code where you can safely go to parallel processing and even then you have to measure the effects of parallel processing to find out if parallel code performs better. If you are not careful then troubles you will face later are worse than ones you have seen before (imagine error that occurs by average only once per 10000 code runs). Parallel programming is something that is hard to ignore. Effective programs are able to use multiple cores of processors. Using TPL you can also set degree of parallelism so your application doesn’t use all computing cores and leaves one or more of them free for host system and other processes. And there are many more things in TPL that make it easier for you to start and go on with parallel programming. In next major version all .NET languages will have built-in support for parallel programming. There will be also new language constructs that support parallel programming. Currently you can download Visual Studio Async to get some idea about what is coming. Conclusion Parallel programming is very challenging but good tools offered by Visual Studio and .NET Framework make it way easier for us. In this posting we started with feed aggregator that imports feed items on serial mode. With two steps we parallelized feed importing and entries inserting gaining 2.3 times raise in performance. Although this number is specific to my test environment it shows clearly that parallel programming may raise the performance of your application significantly.

    Read the article

  • Going Parallel with the Task Parallel Library and PLINQ

    With more and more computers using a multi-core processor, the free lunch of increased clock speeds and the inherent performance gains are over. Software developers must instead make sure their applications take use of all the cores available in an efficient manner. New features in .NET 4.0 mean that managed code developers too can join the party.

    Read the article

  • Improve your Application Performance with .NET Framework 4.0

    Nice Article on CodeGuru. This processors we use today are quite different from those of just a few years ago, as most processors today provide multiple cores and/or multiple threads. With multiple cores and/or threads we need to change how we tackle problems in code. Yes we can still continue to write code to perform an action in a top down fashion to complete a task. This apprach will continue to work; however, you are not taking advantage of the extra processing power available. The best way to take advantage of the extra cores prior to .NET Framework 4.0 was to create threads and/or utilize the ThreadPool. For many developers utilizing Threads or the ThreadPool can be a little daunting. The .NET 4.0 Framework drastically simplified the process of utilizing the extra processing power through the Task Parallel Library (TPL). This article talks following topics “Data Parallelism”, “Parallel LINQ (PLINQ)” and “Task Parallelism”. span.fullpost {display:none;}

    Read the article

  • Improve your Application Performance with .NET Framework 4.0

    Nice Article on CodeGuru. This processors we use today are quite different from those of just a few years ago, as most processors today provide multiple cores and/or multiple threads. With multiple cores and/or threads we need to change how we tackle problems in code. Yes we can still continue to write code to perform an action in a top down fashion to complete a task. This apprach will continue to work; however, you are not taking advantage of the extra processing power available. The best way to take advantage of the extra cores prior to .NET Framework 4.0 was to create threads and/or utilize the ThreadPool. For many developers utilizing Threads or the ThreadPool can be a little daunting. The .NET 4.0 Framework drastically simplified the process of utilizing the extra processing power through the Task Parallel Library (TPL). This article talks following topics “Data Parallelism”, “Parallel LINQ (PLINQ)” and “Task Parallelism”. span.fullpost {display:none;}

    Read the article

  • The WaitForAll Roadshow

    - by adweigert
    OK, so I took for granted some imaginative uses of WaitForAll but lacking that, here is how I am using. First, I have a nice little class called Parallel that allows me to spin together a list of tasks (actions) and then use WaitForAll, so here it is, WaitForAll's 15 minutes of fame ... First Parallel that allows me to spin together several Action delegates to execute, well in parallel.   public static class Parallel { public static ParallelQuery Task(Action action) { return new Action[] { action }.AsParallel(); } public static ParallelQuery> Task(Action action) { return new Action[] { action }.AsParallel(); } public static ParallelQuery Task(this ParallelQuery actions, Action action) { var list = new List(actions); list.Add(action); return list.AsParallel(); } public static ParallelQuery> Task(this ParallelQuery> actions, Action action) { var list = new List>(actions); list.Add(action); return list.AsParallel(); } }   Next, this is an example usage from an app I'm working on that just is rendering some basic computer information via WMI and performance counters. The WMI calls can be expensive given the distance and link speed of some of the computers it will be trying to communicate with. This is the actual MVC action from my controller to return the data for an individual computer.  public PartialViewResult Detail(string computerName) { var computer = this.Computers.Get(computerName); var perf = Factory.GetInstance(); var detail = new ComputerDetailViewModel() { Computer = computer }; try { var work = Parallel .Task(delegate { // Win32_ComputerSystem var key = computer.Name + "_Win32_ComputerSystem"; var system = this.Cache.Get(key); if (system == null) { using (var impersonation = computer.ImpersonateElevatedIdentity()) { system = computer.GetWmiContext().GetInstances().Single(); } this.Cache.Set(key, system); } detail.TotalMemory = system.TotalPhysicalMemory; detail.Manufacturer = system.Manufacturer; detail.Model = system.Model; detail.NumberOfProcessors = system.NumberOfProcessors; }) .Task(delegate { // Win32_OperatingSystem var key = computer.Name + "_Win32_OperatingSystem"; var os = this.Cache.Get(key); if (os == null) { using (var impersonation = computer.ImpersonateElevatedIdentity()) { os = computer.GetWmiContext().GetInstances().Single(); } this.Cache.Set(key, os); } detail.OperatingSystem = os.Caption; detail.OSVersion = os.Version; }) // Performance Counters .Task(delegate { using (var impersonation = computer.ImpersonateElevatedIdentity()) { detail.AvailableBytes = perf.GetSample(computer, "Memory", "Available Bytes"); } }) .Task(delegate { using (var impersonation = computer.ImpersonateElevatedIdentity()) { detail.TotalProcessorUtilization = perf.GetValue(computer, "Processor", "% Processor Time", "_Total"); } }).WithExecutionMode(ParallelExecutionMode.ForceParallelism); if (!work.WaitForAll(TimeSpan.FromSeconds(15), task => task())) { return PartialView("Timeout"); } } catch (Exception ex) { this.LogException(ex); return PartialView("Error.ascx"); } return PartialView(detail); }

    Read the article

  • How can one manage to fully use the newly enhanced Parallelism features in .NET 4.0?

    - by Will Marcouiller
    I am pretty much interested into using the newly enhanced Parallelism features in .NET 4.0. I have also seen some possibilities of using it in F#, as much as in C#. Despite, I can only see what PLINQ has to offer with, for example, the following: var query = from c in Customers.AsParallel() where (c.Name.Contains("customerNameLike") select c; There must for sure be some other use of this parallelism thing. Have you any other examples of using it? Is this particularly turned toward PLINQ, or are there other usage as easy as PLINQ? Thanks! =)

    Read the article

  • How to use IObservable/IObserver with ConcurrentQueue or ConcurrentStack

    - by James Black
    I realized that when I am trying to process items in a concurrent queue using multiple threads while multiple threads can be putting items into it, the ideal solution would be to use the Reactive Extensions with the Concurrent data structures. My original question is at: http://stackoverflow.com/questions/2997797/while-using-concurrentqueue-trying-to-dequeue-while-looping-through-in-parallel/ So I am curious if there is any way to have a LINQ (or PLINQ) query that will continuously be dequeueing as items are put into it. I am trying to get this to work in a way where I can have n number of producers pushing into the queue and a limited number of threads to process, so I don't overload the database. If I could use Rx framework then I expect that I could just start it, and if 100 items are placed in within 100ms, then the 20 threads that are part of the PLINQ query would just process through the queue. There are three technologies I am trying to work together: Rx Framework (Reactive LINQ) PLING System.Collections.Concurrent structures

    Read the article

  • What's the next big thing after LINQ?

    - by Leniel Macaferi
    I started using LINQ (Language Integrated Query) when it was still in beta, more specifically Microsoft .NET LINQ Preview (May 2006). Almost 4 years have passed and here we are using LINQ in a lot of projects for the most diverse tasks. I even wrote my final college project based on LINQ. You see how I like it. LINQ and more recently PLINQ (Parallel LINQ) give our jobs a great boost when it comes to more programming power and less lines of code leading us to more expressive and readable code. I keep thinking what could be the next big language improvement for C# after LINQ. I know there are some promissing language features coming as Code Contracts, etc, but nothing having the impact that LINQ had. What do you think could be the next big thing?

    Read the article

1 2  | Next Page >