Search Results

Search found 7 results on 1 pages for 'parallelquery'.

Page 1/1 | 1 

  • The WaitForAll Roadshow

    - by adweigert
    OK, so I took for granted some imaginative uses of WaitForAll but lacking that, here is how I am using. First, I have a nice little class called Parallel that allows me to spin together a list of tasks (actions) and then use WaitForAll, so here it is, WaitForAll's 15 minutes of fame ... First Parallel that allows me to spin together several Action delegates to execute, well in parallel.   public static class Parallel { public static ParallelQuery Task(Action action) { return new Action[] { action }.AsParallel(); } public static ParallelQuery> Task(Action action) { return new Action[] { action }.AsParallel(); } public static ParallelQuery Task(this ParallelQuery actions, Action action) { var list = new List(actions); list.Add(action); return list.AsParallel(); } public static ParallelQuery> Task(this ParallelQuery> actions, Action action) { var list = new List>(actions); list.Add(action); return list.AsParallel(); } }   Next, this is an example usage from an app I'm working on that just is rendering some basic computer information via WMI and performance counters. The WMI calls can be expensive given the distance and link speed of some of the computers it will be trying to communicate with. This is the actual MVC action from my controller to return the data for an individual computer.  public PartialViewResult Detail(string computerName) { var computer = this.Computers.Get(computerName); var perf = Factory.GetInstance(); var detail = new ComputerDetailViewModel() { Computer = computer }; try { var work = Parallel .Task(delegate { // Win32_ComputerSystem var key = computer.Name + "_Win32_ComputerSystem"; var system = this.Cache.Get(key); if (system == null) { using (var impersonation = computer.ImpersonateElevatedIdentity()) { system = computer.GetWmiContext().GetInstances().Single(); } this.Cache.Set(key, system); } detail.TotalMemory = system.TotalPhysicalMemory; detail.Manufacturer = system.Manufacturer; detail.Model = system.Model; detail.NumberOfProcessors = system.NumberOfProcessors; }) .Task(delegate { // Win32_OperatingSystem var key = computer.Name + "_Win32_OperatingSystem"; var os = this.Cache.Get(key); if (os == null) { using (var impersonation = computer.ImpersonateElevatedIdentity()) { os = computer.GetWmiContext().GetInstances().Single(); } this.Cache.Set(key, os); } detail.OperatingSystem = os.Caption; detail.OSVersion = os.Version; }) // Performance Counters .Task(delegate { using (var impersonation = computer.ImpersonateElevatedIdentity()) { detail.AvailableBytes = perf.GetSample(computer, "Memory", "Available Bytes"); } }) .Task(delegate { using (var impersonation = computer.ImpersonateElevatedIdentity()) { detail.TotalProcessorUtilization = perf.GetValue(computer, "Processor", "% Processor Time", "_Total"); } }).WithExecutionMode(ParallelExecutionMode.ForceParallelism); if (!work.WaitForAll(TimeSpan.FromSeconds(15), task => task())) { return PartialView("Timeout"); } } catch (Exception ex) { this.LogException(ex); return PartialView("Error.ascx"); } return PartialView(detail); }

    Read the article

  • PLINQ Adventure Land - WaitForAll

    - by adweigert
    PLINQ is awesome for getting a lot of work done fast, but one thing I haven't figured out yet is how to start work with PLINQ but only let it execute for a maximum amount of time and react if it is taking too long. So, as I must admit I am still learning PLINQ, I created this extension in that ignorance. It behaves similar to ForAll<> but takes a timeout and returns false if the threads don't complete in the specified amount of time. Hope this helps someone else take PLINQ further, it definitely has helped for me ...  public static bool WaitForAll<T>(this ParallelQuery<T> query, TimeSpan timeout, Action<T> action) { Contract.Requires(query != null); Contract.Requires(action != null); var exception = (Exception)null; var cts = new CancellationTokenSource(); var forAllWithCancellation = new Action(delegate { try { query.WithCancellation(cts.Token).ForAll(action); } catch (OperationCanceledException) { // NOOP } catch (AggregateException ex) { exception = ex; } }); var mrs = new ManualResetEvent(false); var callback = new AsyncCallback(delegate { mrs.Set(); }); var result = forAllWithCancellation.BeginInvoke(callback, null); if (mrs.WaitOne(timeout)) { forAllWithCancellation.EndInvoke(result); if (exception != null) { throw exception; } return true; } else { cts.Cancel(); return false; } }

    Read the article

  • Parallelism in .NET – Part 7, Some Differences between PLINQ and LINQ to Objects

    - by Reed
    In my previous post on Declarative Data Parallelism, I mentioned that PLINQ extends LINQ to Objects to support parallel operations.  Although nearly all of the same operations are supported, there are some differences between PLINQ and LINQ to Objects.  By introducing Parallelism to our declarative model, we add some extra complexity.  This, in turn, adds some extra requirements that must be addressed. In order to illustrate the main differences, and why they exist, let’s begin by discussing some differences in how the two technologies operate, and look at the underlying types involved in LINQ to Objects and PLINQ . LINQ to Objects is mainly built upon a single class: Enumerable.  The Enumerable class is a static class that defines a large set of extension methods, nearly all of which work upon an IEnumerable<T>.  Many of these methods return a new IEnumerable<T>, allowing the methods to be chained together into a fluent style interface.  This is what allows us to write statements that chain together, and lead to the nice declarative programming model of LINQ: double min = collection .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Other LINQ variants work in a similar fashion.  For example, most data-oriented LINQ providers are built upon an implementation of IQueryable<T>, which allows the database provider to turn a LINQ statement into an underlying SQL query, to be performed directly on the remote database. PLINQ is similar, but instead of being built upon the Enumerable class, most of PLINQ is built upon a new static class: ParallelEnumerable.  When using PLINQ, you typically begin with any collection which implements IEnumerable<T>, and convert it to a new type using an extension method defined on ParallelEnumerable: AsParallel().  This method takes any IEnumerable<T>, and converts it into a ParallelQuery<T>, the core class for PLINQ.  There is a similar ParallelQuery class for working with non-generic IEnumerable implementations. This brings us to our first subtle, but important difference between PLINQ and LINQ – PLINQ always works upon specific types, which must be explicitly created. Typically, the type you’ll use with PLINQ is ParallelQuery<T>, but it can sometimes be a ParallelQuery or an OrderedParallelQuery<T>.  Instead of dealing with an interface, implemented by an unknown class, we’re dealing with a specific class type.  This works seamlessly from a usage standpoint – ParallelQuery<T> implements IEnumerable<T>, so you can always “switch back” to an IEnumerable<T>.  The difference only arises at the beginning of our parallelization.  When we’re using LINQ, and we want to process a normal collection via PLINQ, we need to explicitly convert the collection into a ParallelQuery<T> by calling AsParallel().  There is an important consideration here – AsParallel() does not need to be called on your specific collection, but rather any IEnumerable<T>.  This allows you to place it anywhere in the chain of methods involved in a LINQ statement, not just at the beginning.  This can be useful if you have an operation which will not parallelize well or is not thread safe.  For example, the following is perfectly valid, and similar to our previous examples: double min = collection .AsParallel() .Select(item => item.SomeOperation()) .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); However, if SomeOperation() is not thread safe, we could just as easily do: double min = collection .Select(item => item.SomeOperation()) .AsParallel() .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .Min(item => item.PerformComputation()); In this case, we’re using standard LINQ to Objects for the Select(…) method, then converting the results of that map routine to a ParallelQuery<T>, and processing our filter (the Where method) and our aggregation (the Min method) in parallel. PLINQ also provides us with a way to convert a ParallelQuery<T> back into a standard IEnumerable<T>, forcing sequential processing via standard LINQ to Objects.  If SomeOperation() was thread-safe, but PerformComputation() was not thread-safe, we would need to handle this by using the AsEnumerable() method: double min = collection .AsParallel() .Select(item => item.SomeOperation()) .Where(item => item.SomeProperty > 6 && item.SomeProperty < 24) .AsEnumerable() .Min(item => item.PerformComputation()); Here, we’re converting our collection into a ParallelQuery<T>, doing our map operation (the Select(…) method) and our filtering in parallel, then converting the collection back into a standard IEnumerable<T>, which causes our aggregation via Min() to be performed sequentially. This could also be written as two statements, as well, which would allow us to use the language integrated syntax for the first portion: var tempCollection = from item in collection.AsParallel() let e = item.SomeOperation() where (e.SomeProperty > 6 && e.SomeProperty < 24) select e; double min = tempCollection.AsEnumerable().Min(item => item.PerformComputation()); This allows us to use the standard LINQ style language integrated query syntax, but control whether it’s performed in parallel or serial by adding AsParallel() and AsEnumerable() appropriately. The second important difference between PLINQ and LINQ deals with order preservation.  PLINQ, by default, does not preserve the order of of source collection. This is by design.  In order to process a collection in parallel, the system needs to naturally deal with multiple elements at the same time.  Maintaining the original ordering of the sequence adds overhead, which is, in many cases, unnecessary.  Therefore, by default, the system is allowed to completely change the order of your sequence during processing.  If you are doing a standard query operation, this is usually not an issue.  However, there are times when keeping a specific ordering in place is important.  If this is required, you can explicitly request the ordering be preserved throughout all operations done on a ParallelQuery<T> by using the AsOrdered() extension method.  This will cause our sequence ordering to be preserved. For example, suppose we wanted to take a collection, perform an expensive operation which converts it to a new type, and display the first 100 elements.  In LINQ to Objects, our code might look something like: // Using IEnumerable<SourceClass> collection IEnumerable<ResultClass> results = collection .Select(e => e.CreateResult()) .Take(100); If we just converted this to a parallel query naively, like so: IEnumerable<ResultClass> results = collection .AsParallel() .Select(e => e.CreateResult()) .Take(100); We could very easily get a very different, and non-reproducable, set of results, since the ordering of elements in the input collection is not preserved.  To get the same results as our original query, we need to use: IEnumerable<ResultClass> results = collection .AsParallel() .AsOrdered() .Select(e => e.CreateResult()) .Take(100); This requests that PLINQ process our sequence in a way that verifies that our resulting collection is ordered as if it were processed serially.  This will cause our query to run slower, since there is overhead involved in maintaining the ordering.  However, in this case, it is required, since the ordering is required for correctness. PLINQ is incredibly useful.  It allows us to easily take nearly any LINQ to Objects query and run it in parallel, using the same methods and syntax we’ve used previously.  There are some important differences in operation that must be considered, however – it is not a free pass to parallelize everything.  When using PLINQ in order to parallelize your routines declaratively, the same guideline I mentioned before still applies: Parallelization is something that should be handled with care and forethought, added by design, and not just introduced casually.

    Read the article

  • MapReduce in DryadLINQ and PLINQ

    - by JoshReuben
    MapReduce See http://en.wikipedia.org/wiki/Mapreduce The MapReduce pattern aims to handle large-scale computations across a cluster of servers, often involving massive amounts of data. "The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The developer expresses the computation as two Func delegates: Map and Reduce. Map - takes a single input pair and produces a set of intermediate key/value pairs. The MapReduce function groups results by key and passes them to the Reduce function. Reduce - accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's Reduce function via an iterator." the canonical MapReduce example: counting word frequency in a text file.     MapReduce using DryadLINQ see http://research.microsoft.com/en-us/projects/dryadlinq/ and http://connect.microsoft.com/Dryad DryadLINQ provides a simple and straightforward way to implement MapReduce operations. This The implementation has two primary components: A Pair structure, which serves as a data container. A MapReduce method, which counts word frequency and returns the top five words. The Pair Structure - Pair has two properties: Word is a string that holds a word or key. Count is an int that holds the word count. The structure also overrides ToString to simplify printing the results. The following example shows the Pair implementation. public struct Pair { private string word; private int count; public Pair(string w, int c) { word = w; count = c; } public int Count { get { return count; } } public string Word { get { return word; } } public override string ToString() { return word + ":" + count.ToString(); } } The MapReduce function  that gets the results. the input data could be partitioned and distributed across the cluster. 1. Creates a DryadTable<LineRecord> object, inputTable, to represent the lines of input text. For partitioned data, use GetPartitionedTable<T> instead of GetTable<T> and pass the method a metadata file. 2. Applies the SelectMany operator to inputTable to transform the collection of lines into collection of words. The String.Split method converts the line into a collection of words. SelectMany concatenates the collections created by Split into a single IQueryable<string> collection named words, which represents all the words in the file. 3. Performs the Map part of the operation by applying GroupBy to the words object. The GroupBy operation groups elements with the same key, which is defined by the selector delegate. This creates a higher order collection, whose elements are groups. In this case, the delegate is an identity function, so the key is the word itself and the operation creates a groups collection that consists of groups of identical words. 4. Performs the Reduce part of the operation by applying Select to groups. This operation reduces the groups of words from Step 3 to an IQueryable<Pair> collection named counts that represents the unique words in the file and how many instances there are of each word. Each key value in groups represents a unique word, so Select creates one Pair object for each unique word. IGrouping.Count returns the number of items in the group, so each Pair object's Count member is set to the number of instances of the word. 5. Applies OrderByDescending to counts. This operation sorts the input collection in descending order of frequency and creates an ordered collection named ordered. 6. Applies Take to ordered to create an IQueryable<Pair> collection named top, which contains the 100 most common words in the input file, and their frequency. Test then uses the Pair object's ToString implementation to print the top one hundred words, and their frequency.   public static IQueryable<Pair> MapReduce( string directory, string fileName, int k) { DryadDataContext ddc = new DryadDataContext("file://" + directory); DryadTable<LineRecord> inputTable = ddc.GetTable<LineRecord>(fileName); IQueryable<string> words = inputTable.SelectMany(x => x.line.Split(' ')); IQueryable<IGrouping<string, string>> groups = words.GroupBy(x => x); IQueryable<Pair> counts = groups.Select(x => new Pair(x.Key, x.Count())); IQueryable<Pair> ordered = counts.OrderByDescending(x => x.Count); IQueryable<Pair> top = ordered.Take(k);   return top; }   To Test: IQueryable<Pair> results = MapReduce(@"c:\DryadData\input", "TestFile.txt", 100); foreach (Pair words in results) Debug.Print(words.ToString());   Note: DryadLINQ applications can use a more compact way to represent the query: return inputTable         .SelectMany(x => x.line.Split(' '))         .GroupBy(x => x)         .Select(x => new Pair(x.Key, x.Count()))         .OrderByDescending(x => x.Count)         .Take(k);     MapReduce using PLINQ The pattern is relevant even for a single multi-core machine, however. We can write our own PLINQ MapReduce in a few lines. the Map function takes a single input value and returns a set of mapped values àLINQ's SelectMany operator. These are then grouped according to an intermediate key à LINQ GroupBy operator. The Reduce function takes each intermediate key and a set of values for that key, and produces any number of outputs per key à LINQ SelectMany again. We can put all of this together to implement MapReduce in PLINQ that returns a ParallelQuery<T> public static ParallelQuery<TResult> MapReduce<TSource, TMapped, TKey, TResult>( this ParallelQuery<TSource> source, Func<TSource, IEnumerable<TMapped>> map, Func<TMapped, TKey> keySelector, Func<IGrouping<TKey, TMapped>, IEnumerable<TResult>> reduce) { return source .SelectMany(map) .GroupBy(keySelector) .SelectMany(reduce); } the map function takes in an input document and outputs all of the words in that document. The grouping phase groups all of the identical words together, such that the reduce phase can then count the words in each group and output a word/count pair for each grouping: var files = Directory.EnumerateFiles(dirPath, "*.txt").AsParallel(); var counts = files.MapReduce( path => File.ReadLines(path).SelectMany(line => line.Split(delimiters)), word => word, group => new[] { new KeyValuePair<string, int>(group.Key, group.Count()) });

    Read the article

  • Getting length of ParellQuery collection

    - by dotnetdev
    Hi, Is there a way to get the length of a collection from this? ParallelQuery<string> Lines = File.ReadAllLines("Topics.txt").AsParallel<string>(); This has no length property. There is a count method but it takes a Func. If I don't pass a Func paremeter, I could get all the properties in the collection, but how could I not pass one in? Thanks

    Read the article

  • Parallelism in .NET – Part 8, PLINQ’s ForAll Method

    - by Reed
    Parallel LINQ extends LINQ to Objects, and is typically very similar.  However, as I previously discussed, there are some differences.  Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits.  Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data.  LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar.  Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation.  However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>.  Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects.  It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea.  For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized.  The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles.  Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method.  Without ForAll, we would take a fairly serious performance hit in many situations.  Often, we need to perform some filtering or grouping, then perform an action using the results of our filter.  Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized.  However, there is still a large bottleneck in place here.  The problem lies with my comment “perform an action once the filter completes”.  Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes.  Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element.  By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play.  By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll.  This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task.  Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

    Read the article

  • Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

    - by Reed
    Parallel LINQ and the Task Parallel Library contain many options for configuration.  Although the default configuration options are often ideal, there are times when customizing the behavior is desirable.  Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine.  By default, PLINQ and the TPL both use the ThreadPool to schedule tasks.  Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal.  However, there are times that the default behavior is not appropriate.  For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system.  Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class.  All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image.  If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system.  This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system.  Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception.  I also did not hard code a specific value for the MaxDegreeOfParallelism property.  One of our goals when parallelizing a routine is allowing it to scale on better hardware.  Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system.  The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads.  In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation.  If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options.  By default, PLINQ will occasionally revert to processing a query in parallel.  This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent.  By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel.  This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread.  However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query.  There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly.  In these cases, you can override the default behavior by using the WithExecutionMode extension method.  This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>.  We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned.  When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result.  For example, the method above returns a new, reversed collection.  In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead.  IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues.  There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller.  This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach.  However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot.  The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest.  However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes.  By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain.  Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks.  This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes.  This can be done by using the WithMergeOptions extension method.  For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring.  This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query.  By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

    Read the article

1