Search Results

Search found 233 results on 10 pages for 'parallelism'.

Page 4/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • Physical Cores vs Virtual Cores in Parallelism

    - by Code Curiosity
    When it comes to virtualization, I have been deliberating on the relationship between the physical cores and the virtual cores, especially in how it effects applications employing parallelism. For example, in a VM scenario, if there are less physical cores than there are virtual cores, if that's possible, what's the effect or limits placed on the application's parallel processing? I'm asking, because in my environment, it's not disclosed as to what the physical architecture is. Is there still much advantage to parallelizing if the application lives on a dual core VM hosted on a single core physical machine?

    Read the article

  • The limits of parallelism

    - by psihodelia
    Is it possible to solve a problem of O(n!) complexity within a reasonable time given infinite number of processing units and infinite space? The typical example of O(n!) problem is brute-force search: trying all permutations (ordered combinations).

    Read the article

  • Parallelism on two duo-core processor system

    - by Qin
    I wrote a Java program that draw the Mandelbrot image. To make it interesting, I divided the for loop that calculates the color of each pixel into 2 halves; each half will be executed as a thread thus parallelizing the task. On a two core one cpu system, the performance of using two thread approach vs just one main thread is nearly two fold. My question is on a two dual-core processor system, will the parallelized task be split among different processor instead of just utilize the two core on one processor? I suppose the former scenario will be slower than the latter one simply because the latency of communicating between 2 CPU over the motherboard wires. Any ideas? Thanks

    Read the article

  • Parallel shell loops

    - by brubelsabs
    Hi, I want to process many files and since I've here a bunch of cores I want to do it in parallel: for i in *.myfiles; do do_something $i `derived_params $i` other_params; done I know of a Makefile solution but my commands needs the arguments out of the shell globbing list. What I found is: > function pwait() { > while [ $(jobs -p | wc -l) -ge $1 ]; do > sleep 1 > done > } > To use it, all one has to do is put & after the jobs and a pwait call, the parameter gives the number of parallel processes: > for i in *; do > do_something $i & > pwait 10 > done But this doesn't work very well, e.g. I tried it with e.g. a for loop converting many files but giving me error and left jobs undone. I can't belive that this isn't done yet since the discussion on zsh mailing list is so old by now. So do you know any better?

    Read the article

  • TPL v/s Reactive Framework

    - by Abhijeet Patel
    When would one choose to use Rx over TPL or are the 2 frameworks orthogonal? From what I understand Rx is primarily intended to provide an abstraction over events and allow composition but it also allows for providing an abstraction over async operations. using the Createxx overloads and the Fromxxx overloads and cancellation via disposing the IDisposable returned. TPL also provides an abstraction for operations via Task and cancellation abilities. My dilemma is when to use which and for what scenarios?

    Read the article

  • Faking a Single Address Space

    - by dsimcha
    I have a large scientific computing task that parallelizes very well with SMP, but at too fine grained a level to be easily parallelized via explicit message passing. I'd like to parallelize it across address spaces and physical machines. Is it feasible to create a scheduler that would parallelize already multithreaded code across multiple physical computers under the following conditions: The code is already multithreaded and can scale pretty well on SMP configurations. The fact that not all of the threads are running in the same address space or on the same physical machine must be transparent to the program, even if this comes at a significant performance penalty in some use cases. You may assume that all of the physical machines involved are running operating systems and CPU architectures that are binary compatible. Things like locks and atomic operations may be slow (having network latency to deal with and all) but must "just work".

    Read the article

  • Using TPL and PLINQ to raise performance of feed aggregator

    - by DigiMortal
    In this posting I will show you how to use Task Parallel Library (TPL) and PLINQ features to boost performance of simple RSS-feed aggregator. I will use here only very basic .NET classes that almost every developer starts from when learning parallel programming. Of course, we will also measure how every optimization affects performance of feed aggregator. Feed aggregator Our feed aggregator works as follows: Load list of blogs Download RSS-feed Parse feed XML Add new posts to database Our feed aggregator is run by task scheduler after every 15 minutes by example. We will start our journey with serial implementation of feed aggregator. Second step is to use task parallelism and parallelize feeds downloading and parsing. And our last step is to use data parallelism to parallelize database operations. We will use Stopwatch class to measure how much time it takes for aggregator to download and insert all posts from all registered blogs. After every run we empty posts table in database. Serial aggregation Before doing parallel stuff let’s take a look at serial implementation of feed aggregator. All tasks happen one after other. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();           for (var index = 0; index <blogs.Count; index++)         {              ImportFeed(blogs[index]);         }     }       private void ImportFeed(BlogDto blog)     {         if(blog == null)             return;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                 }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)         {             SaveRssFeedItem(item, blog.Id, blog.CreatedById);         }     }       private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } Serial implementation of feed aggregator downloads and inserts all posts with 25.46 seconds. Task parallelism Task parallelism means that separate tasks are run in parallel. You can find out more about task parallelism from MSDN page Task Parallelism (Task Parallel Library) and Wikipedia page Task parallelism. Although finding parts of code that can run safely in parallel without synchronization issues is not easy task we are lucky this time. Feeds import and parsing is perfect candidate for parallel tasks. We can safely parallelize feeds import because importing tasks doesn’t share any resources and therefore they don’t also need any synchronization. After getting the list of blogs we iterate through the collection and start new TPL task for each blog feed aggregation. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {          var uri = new Uri(blog.RssUrl);          var feed = RssFeed.Create(uri);           foreach (var item in feed.Channel.Items)          {              SaveRssFeedItem(item, blog.Id, blog.CreatedById);          }     }     private void ImportAtomFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           foreach (var item in feed.Entries)         {             SaveAtomFeedEntry(item, blog.Id, blog.CreatedById);         }     } } You should notice first signs of the power of TPL. We made only minor changes to our code to parallelize blog feeds aggregating. On my machine this modification gives some performance boost – time is now 17.57 seconds. Data parallelism There is one more way how to parallelize activities. Previous section introduced task or operation based parallelism, this section introduces data based parallelism. By MSDN page Data Parallelism (Task Parallel Library) data parallelism refers to scenario in which the same operation is performed concurrently on elements in a source collection or array. In our code we have independent collections we can process in parallel – imported feed entries. As checking for feed entry existence and inserting it if it is missing from database doesn’t affect other entries the imported feed entries collection is ideal candidate for parallelization. internal class FeedClient {     private readonly INewsService _newsService;     private const int FeedItemContentMaxLength = 255;       public FeedClient()     {          ObjectFactory.Initialize(container =>          {              container.PullConfigurationFromAppConfig = true;          });           _newsService = ObjectFactory.GetInstance<INewsService>();     }       public void Execute()     {         var blogs = _newsService.ListPublishedBlogs();                var tasks = new Task[blogs.Count];           for (var index = 0; index <blogs.Count; index++)         {             tasks[index] = new Task(ImportFeed, blogs[index]);             tasks[index].Start();         }           Task.WaitAll(tasks);     }       private void ImportFeed(object blogObject)     {         if(blogObject == null)             return;         var blog = (BlogDto)blogObject;         if (string.IsNullOrEmpty(blog.RssUrl))             return;           var uri = new Uri(blog.RssUrl);         SyndicationContentFormat feedFormat;           feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri);           if (feedFormat == SyndicationContentFormat.Rss)             ImportRssFeed(blog);         if (feedFormat == SyndicationContentFormat.Atom)             ImportAtomFeed(blog);                }       private void ImportRssFeed(BlogDto blog)     {         var uri = new Uri(blog.RssUrl);         var feed = RssFeed.Create(uri);           feed.Channel.Items.AsParallel().ForAll(a =>         {             SaveRssFeedItem(a, blog.Id, blog.CreatedById);         });      }        private void ImportAtomFeed(BlogDto blog)      {         var uri = new Uri(blog.RssUrl);         var feed = AtomFeed.Create(uri);           feed.Entries.AsParallel().ForAll(a =>         {              SaveAtomFeedEntry(a, blog.Id, blog.CreatedById);         });      } } We did small change again and as the result we parallelized checking and saving of feed items. This change was data centric as we applied same operation to all elements in collection. On my machine I got better performance again. Time is now 11.22 seconds. Results Let’s visualize our measurement results (numbers are given in seconds). As we can see then with task parallelism feed aggregation takes about 25% less time than in original case. When adding data parallelism to task parallelism our aggregation takes about 2.3 times less time than in original case. More about TPL and PLINQ Adding parallelism to your application can be very challenging task. You have to carefully find out parts of your code where you can safely go to parallel processing and even then you have to measure the effects of parallel processing to find out if parallel code performs better. If you are not careful then troubles you will face later are worse than ones you have seen before (imagine error that occurs by average only once per 10000 code runs). Parallel programming is something that is hard to ignore. Effective programs are able to use multiple cores of processors. Using TPL you can also set degree of parallelism so your application doesn’t use all computing cores and leaves one or more of them free for host system and other processes. And there are many more things in TPL that make it easier for you to start and go on with parallel programming. In next major version all .NET languages will have built-in support for parallel programming. There will be also new language constructs that support parallel programming. Currently you can download Visual Studio Async to get some idea about what is coming. Conclusion Parallel programming is very challenging but good tools offered by Visual Studio and .NET Framework make it way easier for us. In this posting we started with feed aggregator that imports feed items on serial mode. With two steps we parallelized feed importing and entries inserting gaining 2.3 times raise in performance. Although this number is specific to my test environment it shows clearly that parallel programming may raise the performance of your application significantly.

    Read the article

  • MongoDB: What's the point of using MapReduce without parallelism?

    - by netvope
    Quoting http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Parallelism As of right now, MapReduce jobs on a single mongod process are single threaded Without parallelism, what are the benefits of MapReduce compared to simpler or more traditional methods for queries and data aggregation? To avoid confusion: the question is NOT "what are the benefits of document-oriented DB over traditional relational DB"

    Read the article

  • Is it possible that an insert would use parallelism?

    - by Lieven Cardoen
    In what situation can inserting a simple record (the table has got some 10 columns, two of them are xml) use parallelism? CREATE TABLE [dbo].[PackageSessions]( [PackageSessionId] [int] IDENTITY(1,1) NOT NULL, [PackageId] [int] NOT NULL, [UserId] [int] NOT NULL, [StartDateTime] [datetime] NULL, [StopDateTime] [datetime] NULL, [Score] [float] NOT NULL, [ScoreMax] [float] NOT NULL, [CompletionStatus] [int] NOT NULL, [ReviewPlayerContextId] [int] NOT NULL, [PlayerContextId] [int] NOT NULL, [ReducedScore] [float] NOT NULL, [ReducedScoreMax] [float] NOT NULL, [PackageSnapShot] [xml] NULL, [InterfaceLanguageId] [int] NOT NULL, [Data] [xml] NULL, [PackageSessionLanguageId] [int] NOT NULL, CONSTRAINT [PackageSessions_PK] PRIMARY KEY CLUSTERED ( [PackageSessionId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]

    Read the article

  • Output = MAXDOP 1

    - by Dave Ballantyne
    It is widely know that data modifications on table variables do not support parallelism, Peter Larsson has a good example of that here .  Whilst tracking down a performance issue,  I saw that using the OUTPUT clause also causes parallelism to not be used. By way of example,  first lets create two tables with a simple parent and child (one to one) relationship, and then populate them with 100,000 rows. Drop table ParentDrop table Childgocreate table Parent(id integer identity Primary Key,data1 char(255))Create Table Child(id integer Primary Key)goinsert into Parent(data1)Select top 1000000 NULL from sys.columns a cross join sys.columns b insert into ChildSelect id from Parentgo If we then execute update Parent set data1 =''from Parentjoin Child on Parent.Id = Child.Id where Parent.Id %100 =1 and Child.id %100 =1 We should see an execution plan using parallelism such as   However,  if the OUTPUT clause is now used update Parent set data1 =''output inserted.idfrom Parentjoin Child on Parent.Id = Child.Id where Parent.Id %100 =1 and Child.id %100 =1   The execution plan shows that Parallelism was not used Make of that what you will, but i thought that this was a pretty unexpected outcome. Update : Laurence Hoff has mailed me to note that when the OUTPUT results are captured to a temporary table using the INTO clause,  then parallelism is used.  Naturally if you use a table variable then there is still no parallelism  

    Read the article

  • How can one manage to fully use the newly enhanced Parallelism features in .NET 4.0?

    - by Will Marcouiller
    I am pretty much interested into using the newly enhanced Parallelism features in .NET 4.0. I have also seen some possibilities of using it in F#, as much as in C#. Despite, I can only see what PLINQ has to offer with, for example, the following: var query = from c in Customers.AsParallel() where (c.Name.Contains("customerNameLike") select c; There must for sure be some other use of this parallelism thing. Have you any other examples of using it? Is this particularly turned toward PLINQ, or are there other usage as easy as PLINQ? Thanks! =)

    Read the article

  • SQL SERVER – Reducing CXPACKET Wait Stats for High Transactional Database

    - by pinaldave
    While engaging in a performance tuning consultation for a client, a situation occurred where they were facing a lot of CXPACKET Waits Stats. The client asked me if I could help them reduce this huge number of wait stats. I usually receive this kind of request from other client as well, but the important thing to understand is whether this question has any merits or benefits, or not. Before we continue the resolution, let us understand what CXPACKET Wait Stats are. The official definition suggests that CXPACKET Wait Stats occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if a conflict concerning this wait type develops into a problem. (from BOL) In simpler words, when a parallel operation is created for SQL Query, there are multiple threads for a single query. Each query deals with a different set of the data (or rows). Due to some reasons, one or more of the threads lag behind, creating the CXPACKET Wait Stat. Threads which came first have to wait for the slower thread to finish. The Wait by a specific completed thread is called CXPACKET Wait Stat. Note that CXPACKET Wait is done by completed thread and not the one which are unfinished. “Note that not all the CXPACKET wait types are bad. You might experience a case when it totally makes sense. There might also be cases when this is also unavoidable. If you remove this particular wait type for any query, then that query may run slower because the parallel operations are disabled for the query.” Now let us see what the best practices to reduce the CXPACKET Wait Stats are. The suggestions, with which you will find that if you search online through the browser, would play a major role as and might be asked about their jobs In addition, might tell you that you should set ‘maximum degree of parallelism’ to 1. I do agree with these suggestions, too; however, I think this is not the final resolutions. As soon as you set your entire query to run on single CPU, you will get a very bad performance from the queries which are actually performing okay when using parallelism. The best suggestion to this is that you set ‘the maximum degree of parallelism’ to a lower number or 1 (be very careful with this – it can create more problems) but tune the queries which can be benefited from multiple CPU’s. You can use query hint OPTION (MAXDOP 0) to run the server to use parallelism. Here is the two-quick script which helps to resolve these issues: Change MAXDOP at Server Level EXEC sys.sp_configure N'max degree of parallelism', N'1' GO RECONFIGURE WITH OVERRIDE GO Run Query with all the CPU (using parallelism) USE AdventureWorks GO SELECT * FROM Sales.SalesOrderDetail ORDER BY ProductID OPTION (MAXDOP 0) GO Below is the blog post which will help you to find all the parallel query in your server. SQL SERVER – Find Queries using Parallelism from Cached Plan Please note running Queries in single CPU may worsen your performance and it is not recommended at all. Infect this can be very bad advise. I strongly suggest that you identify the queries which are offending and tune them instead of following any other suggestions. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL White Papers, SQLAuthority News, T SQL, Technology

    Read the article

  • Sql Server 2000 Stored Procedure Prevent Parallelism or something?

    - by user187305
    I have a huge disgusting stored procedure that wasn't slow a couple months ago, but now is. I barely know what this thing does and I am in no way interested in rewriting it. I do know that if I take the body of the stored procedure and then declare/set the values of the parameters and run it in query analyzer that it runs more than 20x faster. From the internet, I've read that this is probably due to a bad cached query plan. So, I've tried running the sp with "WITH RECOMPILE" after the EXEC and I've also tried putting the "WITH RECOMPLE" inside the sp, but neither of those helped even a little bit. When I look at the execution plan of the sp vs the query, the biggest difference is that the sp has "Parallelism" operations all over the place and the query doesn't have any. Can this be the cause of the difference in speeds? Thank you, any ideas would be great... I'm stuck.

    Read the article

  • Sql Server 2000 Stored Procedure Prevents Parallelism or something?

    - by user187305
    I have a huge disgusting stored procedure that wasn't slow a couple months ago, but now is. I barely know what this thing does and I am in no way interested in rewriting it. I do know that if I take the body of the stored procedure and then declare/set the values of the parameters and run it in query analyzer that it runs more than 20x faster. From the internet, I've read that this is probably due to a bad cached query plan. So, I've tried running the sp with "WITH RECOMPILE" after the EXEC and I've also tried putting the "WITH RECOMPLE" inside the sp, but neither of those helped even a little bit. When I look at the execution plan of the sp vs the query, the biggest difference is that the sp has "Parallelism" operations all over the place and the query doesn't have any. Can this be the cause of the difference in speeds? Thank you, any ideas would be great... I'm stuck.

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • Tips for measuring the parallelism speed-up in multi-core development.

    - by fnCzar
    I have read many of the good questions and answers around multi-core programming how-tos etc. I am familiar with concurrency, IPC, MPI etc but what I need is advice on how to measure speed-up which will help in making a business case of spending the time to write such code. Please don't answer with "well run it with single-core code then multi-core code and figure out the difference". This is neither a scientific nor a reliable way to measure performance improvement. If you know of tools that will do some of the heavy lifting please mention them. Answers pertaining to methodology will be more fitting but listing tools is ok as well.

    Read the article

  • Execution plan different on two different Sql Servers

    - by Lieven Cardoen
    On our sql server, a query takes 20 ms. If I look at the execution plan, parallelism is used, hash match, Bitmap create, ... a lot of images with two arrows pointing to the left. On sql server of a customer where our product runs, the same query takes 2500 ms. If I look at their execution plan, no parallelism or any of the things with arrows are used... I've been searching for a couple of days no why the query runs so much slower on their sql server. Is parallelism and all of the other things something that can be configured on their sql server? And how to you configure that? What are the dangers of using parallelism? Another strange thing is that on our server, the query needs some 1200 reads and 0 writes. On their sql server it needs 1.5 million reads and some 1500 writes. Why these writes when a read query is done?

    Read the article

  • PPL and TPL sessions on channel9

    - by Daniel Moth
    Back in June there was an internal conference in Redmond ("Engineering Forum") aimed at Microsoft engineers, and delivered by Microsoft engineers. I was asked to put together a track on Multi-Core development, so I picked 6 parallelism experts and we created 6 awesome sessions (we won the top spot in the Top 10 :-)). Two of the speakers kept the content fairly external-friendly, so we received permission to publish their recordings publicly. Enjoy (best to download the High Quality WMV): Don McCrady - Parallelism in C++ Using the Concurrency Runtime Stephen Toub - Implementing Parallel Patterns using .NET 4 To get notified on future videos on parallelism (or to browse the archive) stay tuned on this channel9 parallel computing feed. Comments about this post welcome at the original blog.

    Read the article

  • The Uganda .NET Usergroup meeting for January 2011 - a look back.

    - by Malisa L. Ncube
    We had a very interesting meeting on Friday 28th last week. We had 10 attendees and two speakers. The first topic presented was Cloud Computing, presented by Allan Rwakatungu @arwakatungu who works with MTN Uganda. He gave a very brilliant outline of how Cloud computing and service oriented applications had begun changing the platform for operating business and the costs it saves because of scalability and elasticity. He went on to demonstrate the steps you would take if you are beginning a new Windows Azure project. He explained the history and evolution of the Windows Azure, SQL Azure and cloud services offered by Amazon and google.com. The attendees had many questions to ask (obviously), but they were all answered very well. We once again thank Allan, for taking time to prepare the presentation and demonstrating for us. We recorded a video on the entire presentation and after doing some editing we will publish it. One wish which was echoed by most members was that Microsoft should open the cloud services and development for Africa. Microsoft currently does not even have servers here in Africa and so far, that does not put African developers in the same platform as other developers in other continents. Now is the time considering the improvements in network speeds and joining of the Seacom network and broadband.   I presented on Parallelism and Multithreading using .NET 4.0, I also gave some details on the language changes in C# 5.0 and the async keyword and the TaskEx class. I explained the Task, Scheduling of parallel tasks and demonstrated problems that may arise from using parallelism inappropriately. I also demonstrated the performance improvements that may be achieved by taking advantage of multi-core processors. You may download the presentation on Parallelism and Multi-threading from here. The resolution of the meeting was that we should meet more than once a month and begin other activities which should be more fun. e.g. Geek Dinner, Geek Beer or CodeCamp. Based on that we all agreed we shall have a mid-month meeting starting from February. Cheers folks! del.icio.us Tags: .net,usergroup,cloud computing,parallelism,multi-threading

    Read the article

  • Programação paralela no .NET Framework 4 – Parte I

    - by anobre
    Introdução O avanço de tecnologia nos últimos anos forneceu, a baixo custo, acesso  a workstations com inúmeros CPUs. Facilmente encontramos hoje máquinas clientes com 2, 4 e até 8 núcleos, sem considerar os “super-servidores” com até 36 processadores :) Da wikipedia: A Unidade central de processamento (CPU, de acordo com as iniciais em inglês) ou o processador é a parte de um sistema de computador que executa as instruções de um programa de computador, e é o elemento primordial na execução das funções de um computador. Este termo tem sido usado na indústria de computadores pelo menos desde o início dos anos 1960[1]. A forma, desenho e implementação de CPUs têm mudado dramaticamente desde os primeiros exemplos, mas o seu funcionamento fundamental permanece o mesmo. Fazendo uma analogia, seria muito interessante delegarmos tarefas no mundo real que podem ser executadas independentemente a pessoas diferentes, atingindo desta forma uma  maior performance / produtividade na sua execução. A computação paralela se baseia na idéia que um problema maior pode ser dividido em problemas menores, sendo resolvidos de forma paralela. Este pensamento é utilizado há algum tempo por HPC (High-performance computing), e através das facilidades dos últimos anos, assim como a preocupação com consumo de energia, tornaram esta idéia mais atrativa e de fácil acesso a qualquer ambiente. No .NET Framework A plataforma .NET apresenta um runtime, bibliotecas e ferramentas para fornecer uma base de acesso fácil e rápido à programação paralela, sem trabalhar diretamente com threads e thread pool. Esta série de posts irá apresentar todos os recursos disponíveis, iniciando os estudos pela TPL, ou Task Parallel Library. Task Parallel Library A TPL é um conjunto de tipos localizados no namespace System.Threading e System.Threading.Tasks, a partir da versão 4 do framework. A partir da versão 4 do framework, o TPL é a maneira recomendada para escrever código paralelo e multithreaded. http://msdn.microsoft.com/en-us/library/dd460717(v=VS.100).aspx Task Parallelism O termo “task parallelism”, ou em uma tradução live paralelismo de tarefas, se refere a uma ou mais tarefas sendo executadas de forma simultanea. Considere uma tarefa como um método. A maneira mais fácil de executar tarefas de forma paralela é o código abaixo: Parallel.Invoke(() => TrabalhoInicial(), () => TrabalhoSeguinte()); O que acontece de verdade? Por trás nos panos, esta instrução instancia de forma implícita objetos do tipo Task, responsável por representar uma operação assíncrona, não exatamente paralela: public class Task : IAsyncResult, IDisposable É possível instanciar Tasks de forma explícita, sendo uma alternativa mais complexa ao Parallel.Invoke. var task = new Task(() => TrabalhoInicial()); task.Start(); Outra opção de instanciar uma Task e já executar sua tarefa é: var t = Task<int>.Factory.StartNew(() => TrabalhoInicialComValor());var t2 = Task<int>.Factory.StartNew(() => TrabalhoSeguinteComValor()); A diferença básica entre as duas abordagens é que a primeira tem início conhecido, mais utilizado quando não queremos que a instanciação e o agendamento da execução ocorra em uma só operação, como na segunda abordagem. Data Parallelism Ainda parte da TPL, o Data Parallelism se refere a cenários onde a mesma operação deva ser executada paralelamente em elementos de uma coleção ou array, através de instruções paralelas For e ForEach. A idéia básica é pegar cada elemento da coleção (ou array) e trabalhar com diversas threads concomitantemente. A classe-chave para este cenário é a System.Threading.Tasks.Parallel // Sequential version foreach (var item in sourceCollection) { Process(item); } // Parallel equivalent Parallel.ForEach(sourceCollection, item => Process(item)); Complicado né? :) Demonstração Acesse aqui um vídeo com exemplos (screencast). Cuidado! Apesar da imensa vontade de sair codificando, tome cuidado com alguns problemas básicos de paralelismo. Neste link é possível conhecer algumas situações. Abraços.

    Read the article

  • Understanding and Controlling Parallel Query Processing in SQL Server

    Data warehousing and general reporting applications tend to be CPU intensive because they need to read and process a large number of rows. To facilitate quick data processing for queries that touch a large amount of data, Microsoft SQL Server exploits the power of multiple logical processors to provide parallel query processing operations such as parallel scans. Through extensive testing, we have learned that, for most large queries that are executed in a parallel fashion, SQL Server can deliver linear or nearly linear response time speedup as the number of logical processors increases. However, some queries in high parallelism scenarios perform suboptimally. There are also some parallelism issues that can occur in a multi-user parallel query workload. This white paper describes parallel performance problems you might encounter when you run such queries and workloads, and it explains why these issues occur. In addition, it presents how data warehouse developers can detect these issues, and how they can work around them or mitigate them.

    Read the article

  • SQL SERVER – Wait Stats – Wait Types – Wait Queues – Day 0 of 28

    - by pinaldave
    This blog post will have running account of the all the blog post I will be doing in this month related to SQL Server Wait Types and Wait Queues. SQL SERVER – Introduction to Wait Stats and Wait Types – Wait Type – Day 1 of 28 SQL SERVER – Signal Wait Time Introduction with Simple Example – Wait Type – Day 2 of 28 SQL SERVER – DMV – sys.dm_os_wait_stats Explanation – Wait Type – Day 3 of 28 SQL SERVER – DMV – sys.dm_os_waiting_tasks and sys.dm_exec_requests – Wait Type – Day 4 of 28 SQL SERVER – Capturing Wait Types and Wait Stats Information at Interval – Wait Type – Day 5 of 28 SQL SERVER – CXPACKET – Parallelism – Usual Solution – Wait Type – Day 6 of 28 SQL SERVER – CXPACKET – Parallelism – Advanced Solution – Wait Type – Day 7 of 28 SQL SERVER – SOS_SCHEDULER_YIELD – Wait Type – Day 8 of 28 SQL SERVER – PAGEIOLATCH_DT, PAGEIOLATCH_EX, PAGEIOLATCH_KP, PAGEIOLATCH_SH, PAGEIOLATCH_UP – Wait Type – Day 9 of 28 SQL SERVER – IO_COMPLETION – Wait Type – Day 10 of 28 SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28 SQL SERVER – PAGELATCH_DT, PAGELATCH_EX, PAGELATCH_KP, PAGELATCH_SH, PAGELATCH_UP – Wait Type – Day 12 of 28 SQL SERVER – FT_IFTS_SCHEDULER_IDLE_WAIT – Full Text – Wait Type – Day 13 of 28 SQL SERVER – BACKUPIO, BACKUPBUFFER – Wait Type – Day 14 of 28 SQL SERVER – LCK_M_XXX – Wait Type – Day 15 of 28 Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >