Search Results

Search found 233 results on 10 pages for 'parallelism'.

Page 2/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • Parallelism in .NET – Part 17, Think Continuations, not Callbacks

    - by Reed
    In traditional asynchronous programming, we’d often use a callback to handle notification of a background task’s completion.  The Task class in the Task Parallel Library introduces a cleaner alternative to the traditional callback: continuation tasks. Asynchronous programming methods typically required callback functions.  For example, MSDN’s Asynchronous Delegates Programming Sample shows a class that factorizes a number.  The original method in the example has the following signature: public static bool Factorize(int number, ref int primefactor1, ref int primefactor2) { //... .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } However, calling this is quite “tricky”, even if we modernize the sample to use lambda expressions via C# 3.0.  Normally, we could call this method like so: int primeFactor1 = 0; int primeFactor2 = 0; bool answer = Factorize(10298312, ref primeFactor1, ref primeFactor2); Console.WriteLine("{0}/{1} [Succeeded {2}]", primeFactor1, primeFactor2, answer); If we want to make this operation run in the background, and report to the console via a callback, things get tricker.  First, we need a delegate definition: public delegate bool AsyncFactorCaller( int number, ref int primefactor1, ref int primefactor2); Then we need to use BeginInvoke to run this method asynchronously: int primeFactor1 = 0; int primeFactor2 = 0; AsyncFactorCaller caller = new AsyncFactorCaller(Factorize); caller.BeginInvoke(10298312, ref primeFactor1, ref primeFactor2, result => { int factor1 = 0; int factor2 = 0; bool answer = caller.EndInvoke(ref factor1, ref factor2, result); Console.WriteLine("{0}/{1} [Succeeded {2}]", factor1, factor2, answer); }, null); This works, but is quite difficult to understand from a conceptual standpoint.  To combat this, the framework added the Event-based Asynchronous Pattern, but it isn’t much easier to understand or author. Using .NET 4’s new Task<T> class and a continuation, we can dramatically simplify the implementation of the above code, as well as make it much more understandable.  We do this via the Task.ContinueWith method.  This method will schedule a new Task upon completion of the original task, and provide the original Task (including its Result if it’s a Task<T>) as an argument.  Using Task, we can eliminate the delegate, and rewrite this code like so: var background = Task.Factory.StartNew( () => { int primeFactor1 = 0; int primeFactor2 = 0; bool result = Factorize(10298312, ref primeFactor1, ref primeFactor2); return new { Result = result, Factor1 = primeFactor1, Factor2 = primeFactor2 }; }); background.ContinueWith(task => Console.WriteLine("{0}/{1} [Succeeded {2}]", task.Result.Factor1, task.Result.Factor2, task.Result.Result)); This is much simpler to understand, in my opinion.  Here, we’re explicitly asking to start a new task, then continue the task with a resulting task.  In our case, our method used ref parameters (this was from the MSDN Sample), so there is a little bit of extra boiler plate involved, but the code is at least easy to understand. That being said, this isn’t dramatically shorter when compared with our C# 3 port of the MSDN code above.  However, if we were to extend our requirements a bit, we can start to see more advantages to the Task based approach.  For example, supposed we need to report the results in a user interface control instead of reporting it to the Console.  This would be a common operation, but now, we have to think about marshaling our calls back to the user interface.  This is probably going to require calling Control.Invoke or Dispatcher.Invoke within our callback, forcing us to specify a delegate within the delegate.  The maintainability and ease of understanding drops.  However, just as a standard Task can be created with a TaskScheduler that uses the UI synchronization context, so too can we continue a task with a specific context.  There are Task.ContinueWith method overloads which allow you to provide a TaskScheduler.  This means you can schedule the continuation to run on the UI thread, by simply doing: Task.Factory.StartNew( () => { int primeFactor1 = 0; int primeFactor2 = 0; bool result = Factorize(10298312, ref primeFactor1, ref primeFactor2); return new { Result = result, Factor1 = primeFactor1, Factor2 = primeFactor2 }; }).ContinueWith(task => textBox1.Text = string.Format("{0}/{1} [Succeeded {2}]", task.Result.Factor1, task.Result.Factor2, task.Result.Result), TaskScheduler.FromCurrentSynchronizationContext()); This is far more understandable than the alternative.  By using Task.ContinueWith in conjunction with TaskScheduler.FromCurrentSynchronizationContext(), we get a simple way to push any work onto a background thread, and update the user interface on the proper UI thread.  This technique works with Windows Presentation Foundation as well as Windows Forms, with no change in methodology.

    Read the article

  • SQL SERVER – CXPACKET – Parallelism – Usual Solution – Wait Type – Day 6 of 28

    - by pinaldave
    CXPACKET has to be most popular one of all wait stats. I have commonly seen this wait stat as one of the top 5 wait stats in most of the systems with more than one CPU. Books On-Line: Occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if contention on this wait type becomes a problem. CXPACKET Explanation: When a parallel operation is created for SQL Query, there are multiple threads for a single query. Each query deals with a different set of the data (or rows). Due to some reasons, one or more of the threads lag behind, creating the CXPACKET Wait Stat. There is an organizer/coordinator thread (thread 0), which takes waits for all the threads to complete and gathers result together to present on the client’s side. The organizer thread has to wait for the all the threads to finish before it can move ahead. The Wait by this organizer thread for slow threads to complete is called CXPACKET wait. Note that not all the CXPACKET wait types are bad. You might experience a case when it totally makes sense. There might also be cases when this is unavoidable. If you remove this particular wait type for any query, then that query may run slower because the parallel operations are disabled for the query. Reducing CXPACKET wait: We cannot discuss about reducing the CXPACKET wait without talking about the server workload type. OLTP: On Pure OLTP system, where the transactions are smaller and queries are not long but very quick usually, set the “Maximum Degree of Parallelism” to 1 (one). This way it makes sure that the query never goes for parallelism and does not incur more engine overhead. EXEC sys.sp_configure N'cost threshold for parallelism', N'1' GO RECONFIGURE WITH OVERRIDE GO Data-warehousing / Reporting server: As queries will be running for long time, it is advised to set the “Maximum Degree of Parallelism” to 0 (zero). This way most of the queries will utilize the parallel processor, and long running queries get a boost in their performance due to multiple processors. EXEC sys.sp_configure N'cost threshold for parallelism', N'0' GO RECONFIGURE WITH OVERRIDE GO Mixed System (OLTP & OLAP): Here is the challenge. The right balance has to be found. I have taken a very simple approach. I set the “Maximum Degree of Parallelism” to 2, which means the query still uses parallelism but only on 2 CPUs. However, I keep the “Cost Threshold for Parallelism” very high. This way, not all the queries will qualify for parallelism but only the query with higher cost will go for parallelism. I have found this to work best for a system that has OLTP queries and also where the reporting server is set up. Here, I am setting ‘Cost Threshold for Parallelism’ to 25 values (which is just for illustration); you can choose any value, and you can find it out by experimenting with the system only. In the following script, I am setting the ‘Max Degree of Parallelism’ to 2, which indicates that the query that will have a higher cost (here, more than 25) will qualify for parallel query to run on 2 CPUs. This implies that regardless of the number of CPUs, the query will select any two CPUs to execute itself. EXEC sys.sp_configure N'cost threshold for parallelism', N'25' GO EXEC sys.sp_configure N'max degree of parallelism', N'2' GO RECONFIGURE WITH OVERRIDE GO Read all the post in the Wait Types and Queue series. Additionally a must read comment of Jonathan Kehayias. Note: The information presented here is from my experience and I no way claim it to be accurate. I suggest you all to read the online book for further clarification. All the discussion of Wait Stats over here is generic and it varies from system to system. It is recommended that you test this on the development server before implementing on the production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: DMV, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

    Read the article

  • SQL SERVER – Relationship with Parallelism with Locks and Query Wait – Question for You

    - by Pinal Dave
    Today, I have one very simple question based on following image. A full disclaimer is that I have no idea why it is like that. I tried to reach out to few of my friends who know a lot about SQL Server but no one has any answer. Here is the question: If you go to server properties and click on Advanced you will see the following screen. Under the Parallelism section if you noticed there are four options: Cost Threshold for Parallelism Locks Max Degree of Parallelism Query Wait I can clearly understand why Cost Threshold for Parallelism and Max Degree of Parallelism belongs to Parallelism but I am not sure why we have two other options Locks and Query Wait belongs to Parallelism section. I can see that the options are ordered alphabetically but I do not understand the reason for locks and query wait to list under Parallelism. Here is the question for you – Why Locks and Query Wait options are listed under Parallelism section in SQL Server Advanced Properties? Please leave a comment with your explanation. I will publish valid answers on this blog with due credit. Reference: Pinal Dave (http://blog.sqlauthority.com)   Filed under: PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Parallelism implies concurrency but not the other way round right?

    - by Cedric Martin
    I often read that parallelism and concurrency are different things. Very often the answerers/commenters go as far as writing that they're two entirely different things. Yet in my view they're related but I'd like some clarification on that. For example if I'm on a multi-core CPU and manage to divide the computation into x smaller computation (say using fork/join) each running in its own thread, I'll have a program that is both doing parallel computation (because supposedly at any point in time several threads are going to run on several cores) and being concurrent right? While if I'm simply using, say, Java and dealing with UI events and repaints on the Event Dispatch Thread plus running the only thread I created myself, I'll have a program that is concurrent (EDT + GC thread + my main thread etc.) but not parallel. I'd like to know if I'm getting this right and if parallelism (on a "single but multi-cores" system) always implies concurrency or not? Also, are multi-threaded programs running on multi-cores CPU but where the different threads are doing totally different computation considered to be using "parallelism"?

    Read the article

  • Queries barely over the Cost Threshold for Parallelism

    - by jchang
    I had discussed SQL Server parallelism in Oct 2010, with my thoughts on the best settings for: Cost Threshold for Parallelism (CTP) and Max Degrees of Parallelism (MAXDOP) in Parallelism Strategy and Comments . At the time, I had intended to follow up with detailed measurements. So now a mere 2 years later, here it is. The general thought was that CTP should be raised from the default value of 5, and MAXDOP should be changed from unrestricted, on modern systems with very many cores, and most especially...(read more)

    Read the article

  • Queries barely over the Cost Threshold for Parallelism

    - by jchang
    Previously I had discussed SQL Server parallelism, with my thoughts on the best settings for: Cost Threshold for Parallelism (CTP) and Max Degrees of Parallelism (MAXDOP) in Parallelism Strategy and Comments . At the time, I had intended to follow up with detailed measurements. So now a mere 2 years later, here it is. The general thought was that CTP should be raised from the default value of 5, and MAXDOP should be changed from unrestricted, on modern systems with very many cores, and most especially...(read more)

    Read the article

  • What about parallelism across network using multiple PCs?

    - by MainMa
    Parallel computing is used more and more, and new framework features and shortcuts make it easier to use (for example Parallel extensions which are directly available in .NET 4). Now what about the parallelism across network? I mean, an abstraction of everything related to communications, creation of processes on remote machines, etc. Something like, in C#: NetworkParallel.ForEach(myEnumerable, () => { // Computing and/or access to web ressource or local network database here }); I understand that it is very different from the multi-core parallelism. The two most obvious differences would probably be: The fact that such parallel task will be limited to computing, without being able for example to use files stored locally (but why not a database?), or even to use local variables, because it would be rather two distinct applications than two threads of the same application, The very specific implementation, requiring not just a separate thread (which is quite easy), but spanning a process on different machines, then communicating with them over local network. Despite those differences, such parallelism is quite possible, even without speaking about distributed architecture. Do you think it will be implemented in a few years? Do you agree that it enables developers to easily develop extremely powerfull stuff with much less pain? Example: Think about a business application which extracts data from the database, transforms it, and displays statistics. Let's say this application takes ten seconds to load data, twenty seconds to transform data and ten seconds to build charts on a single machine in a company, using all the CPU, whereas ten other machines are used at 5% of CPU most of the time. In a such case, every action may be done in parallel, resulting in probably six to ten seconds for overall process instead of forty.

    Read the article

  • How can parallelism affect number of results?

    - by spender
    I have a fairly complex query that looks something like this: create table Items(SomeOtherTableID int,SomeField int) create table SomeOtherTable(Id int,GroupID int) with cte1 as ( select SomeOtherTableID,COUNT(*) SubItemCount from Items t where t.SomeField is not null group by SomeOtherTableID ),cte2 as ( select tc.SomeOtherTableID,ROW_NUMBER() over (partition by a.GroupID order by tc.SubItemCount desc) SubItemRank from Items t inner join SomeOtherTable a on a.Id=t.SomeOtherTableID inner join cte1 tc on tc.SomeOtherTableID=t.SomeOtherTableID where t.SomeField is not null ),cte3 as ( select SomeOtherTableID from cte2 where SubItemRank=1 ) select * from cte3 t1 inner join cte3 t2 on t1.SomeOtherTableID<t2.SomeOtherTableID option (maxdop 1) The query is such that cte3 is filled with 6222 distinct results. In the final select, I am performing a cross join on cte3 with itself, (so that I can compare every value in the table with every other value in the table at a later point). Notice the final line : option (maxdop 1) Apparently, this switches off parallelism. So, with 6222 results rows in cte3, I would expect (6222*6221)/2, or 19353531 results in the subsequent cross joining select, and with the final maxdop line in place, that is indeed the case. However, when I remove the maxdop line, the number of results jumps to 19380454. I have 4 cores on my dev box. WTF? Can anyone explain why this is? Do I need to reconsider previous queries that cross join in this way?

    Read the article

  • Gilda Garretón, a Java Developer and Parallelism Computing Researcher

    - by Yolande
    In a new interview titled “Gilda Garretón, a Java Developer and Parallelism Computing Research,” Garretón shares her first-hand experience developing with Java and Java 7 for very large-scale integration (VLSI) of computer-aided design (CAD). Garretón gives an insightful overview of how Java is contributing to the parallelism development and to the Electric VLSI Design Systems, an open source VLSI CAD application used as a research platform for new CAD algorithms as well as the research flow for hardware test chips.  Garretón considers that parallelism programming is hard and complex, yet important developments are taking place.  "With the addition of the concurrent package in Java SE 6 and the Fork/Join feature in Java SE 7, developers have a chance to rely more on existing frameworks and dedicate more time to the essence of their parallel algorithms." Read the full article here  

    Read the article

  • Max Degree of Parallelism Server-Side Setting

    - by Tara Kizer
    Recently I opened a case with Microsoft PSS to help us through a severe performance problem on a new system.  As part of that case, the PSS engineer checked our “max degree of parallelism” server-side setting.  It is our standard to use 4 on our production systems that have 16 CPUs (2 sockets, quad-core, hyper-threaded).  The PSS engineer had me run the below query to get Microsoft’s recommended value of “max degree of parallelism” server-side setting for our 16-CPU system: select case when cpu_count / hyperthread_ratio > 8 then 8 else cpu_count / hyperthread_ratio end as optimal_maxdop_setting from sys.dm_os_sys_info; The query returned 2.  I made the change using sp_configure, and it did not resolve our issue.  We have decided to leave it in place for now.   Do you agree with this query?  What are your thoughts on this? If you decide to change your setting to reflect the output of this query, please test it first to ensure there are no negative side effects.

    Read the article

  • Understanding and Using Parallelism in SQL Server

    SQL Server is able to make implicit use of parallelism to speed SQL queries. Quite how it does it, and how you can be sure that it is doing so, isn't entirely obvious to most of us. Paul White begins a series that makes it all seem simple, starting at the gentle level of counting Jelly Beans. Free trial of SQL Backup™“SQL Backup was able to cut down my backup time significantly AND achieved a 90% compression at the same time!” Joe Cheng. Download a free trial now.

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

    Read the article

  • Parallelism in Python

    - by fmark
    What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism: Message passing processes, possibly distributed across a cluster, e.g. MPI. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et. al Implicit shared memory parallelism, using OpenMP. Deciding on an approach to use is an exercise in trade-offs. In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets. In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

    Read the article

  • C++ Accelerated Massive Parallelism

    - by Daniel Moth
    At AMD's Fusion conference Herb Sutter announced in his keynote session a technology that our team has been working on that we call C++ Accelerated Massive Parallelism (C++ AMP) and during the keynote I showed a brief demo of an app built with our technology. After the keynote, I go deeper into the technology in my breakout session. If you read both those abstracts, you'll get some information about what C++ AMP is, without being too explicit since we published the abstracts before the technology was announced. You can find the official online announcement at Soma's blog post. Here, I just wanted to capture the key points about C++ AMP that can serve as an introduction and an FAQ. So, in no particular order… C++ AMP lowers the barrier to entry for heterogeneous hardware programmability and brings performance to the mainstream, without sacrificing developer productivity or solution portability. is designed not only to help you address today's massively parallel hardware (i.e. GPUs and APUs), but it also future proofs your code investments with a forward looking design. is part of Visual C++. You don't need to use a different compiler or learn different syntax. is modern C++. Not C or some other derivative. is integrated and supported fully in Visual Studio vNext. Editing, building, debugging, profiling and all the other goodness of Visual Studio work well with C++ AMP. provides an STL-like library as part of the existing concurrency namespace and delivered in the new amp.h header file. makes it extremely easy to work with large multi-dimensional data on heterogeneous hardware; in a manner that exposes parallelization. introduces only one core C++ language extension. builds on DirectX (and DirectCompute in particular) which offers a great hardware abstraction layer that is ubiquitous and reliable. The architecture is such, that this point can be thought of as an implementation detail that does not surface to the API layer. Stay tuned on my blog for more over the coming months where I will switch from just talking about C++ AMP to showing you how to use the API with code examples… Comments about this post welcome at the original blog.

    Read the article

  • SQL SERVER – CXPACKET – Parallelism – Advanced Solution – Wait Type – Day 7 of 28

    - by pinaldave
    Earlier we discussed about the what is the common solution to solve the issue with CXPACKET wait time. Today I am going to talk about few of the other suggestions which can help to reduce the CXPACKET wait. If you are going to suggest that I should focus on MAXDOP and COST THRESHOLD – I totally agree. I have covered them in details in yesterday’s blog post. Today we are going to discuss few other way CXPACKET can be reduced. Potential Reasons: If data is heavily skewed, there are chances that query optimizer may estimate the correct amount of the data leading to assign fewer thread to query. This can easily lead to uneven workload on threads and may create CXPAKCET wait. While retrieving the data one of the thread face IO, Memory or CPU bottleneck and have to wait to get those resources to execute its tasks, may create CXPACKET wait as well. Data which is retrieved is on different speed IO Subsystem. (This is not common and hardly possible but there are chances). Higher fragmentations in some area of the table can lead less data per page. This may lead to CXPACKET wait. As I said the reasons here mentioned are not the major cause of the CXPACKET wait but any kind of scenario can create the probable wait time. Best Practices to Reduce CXPACKET wait: Refer earlier article regarding MAXDOP and Cost Threshold. De-fragmentation of Index can help as more data can be obtained per page. (Assuming close to 100 fill-factor) If data is on multiple files which are on multiple similar speed physical drive, the CXPACKET wait may reduce. Keep the statistics updated, as this will give better estimate to query optimizer when assigning threads and dividing the data among available threads. Updating statistics can significantly improve the strength of the query optimizer to render proper execution plan. This may overall affect the parallelism process in positive way. Bad Practice: In one of the recent consultancy project, when I was called in I noticed that one of the ‘experienced’ DBA noticed higher CXPACKET wait and to reduce them, he has increased the worker threads. The reality was increasing worker thread has lead to many other issues. With more number of the threads, more amount of memory was used leading memory pressure. As there were more threads CPU scheduler faced higher ‘Context Switching’ leading further degrading performance. When I explained all these to ‘experienced’ DBA he suggested that now we should reduce the number of threads. Not really! Lower number of the threads may create heavy stalling for parallel queries. I suggest NOT to touch the setting of number of the threads when dealing with CXPACKET wait. Read all the post in the Wait Types and Queue series. Note: The information presented here is from my experience and I no way claim it to be accurate. I suggest reading book on-line for further clarification. All the discussion of Wait Stats over here is generic and it varies by system to system. You are recommended to test this on development server before implementing to production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: DMV, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

    Read the article

  • RiverTrail - JavaScript GPPGU Data Parallelism

    - by JoshReuben
    Where is WebCL ? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some protype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications via a familiar JavaScript programming paradigm. It extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. With its high-level JS API, programmers do not have to learn a new language or explicitly manage threads, orchestrate shared data synchronization or scheduling. It has been proposed as a draft specification to ECMA a (known as ECMA strawman). RiverTrail runs in all popular browsers (except I.E. of course). To get started, download a prebuilt version https://github.com/downloads/RiverTrail/RiverTrail/rivertrail-0.17.xpi , install Intel's OpenCL SDK http://www.intel.com/go/opencl and try out the interactive River Trail shell http://rivertrail.github.com/interactive For a video overview, see  http://www.youtube.com/watch?v=jueg6zB5XaM . ParallelArray the ParallelArray type is the central component of this API & is a JS object that contains ordered collections of scalars – i.e. multidimensional uniform arrays. A shape property describes the dimensionality and size– e.g. a 2D RGBA image will have shape [height, width, 4]. ParallelArrays are immutable & fluent – they are manipulated by invoking methods on them which produce new ParallelArray objects. ParallelArray supports several constructors over arrays, functions & even the canvas. // Create an empty Parallel Array var pa = new ParallelArray(); // pa0 = <>   // Create a ParallelArray out of a nested JS array. // Note that the inner arrays are also ParallelArrays var pa = new ParallelArray([ [0,1], [2,3], [4,5] ]); // pa1 = <<0,1>, <2,3>, <4.5>>   // Create a two-dimensional ParallelArray with shape [3, 2] using the comprehension constructor var pa = new ParallelArray([3, 2], function(iv){return iv[0] * iv[1];}); // pa7 = <<0,0>, <0,1>, <0,2>>   // Create a ParallelArray from canvas.  This creates a PA with shape [w, h, 4], var pa = new ParallelArray(canvas); // pa8 = CanvasPixelArray   ParallelArray exposes fluent API functions that take an elemental JS function for data manipulation: map, combine, scan, filter, and scatter that return a new ParallelArray. Other functions are scalar - reduce  returns a scalar value & get returns the value located at a given index. The onus is on the developer to ensure that the elemental function does not defeat data parallelization optimization (avoid global var manipulation, recursion). For reduce & scan, order is not guaranteed - the onus is on the dev to provide an elemental function that is commutative and associative so that scan will be deterministic – E.g. Sum is associative, but Avg is not. map Applies a provided elemental function to each element of the source array and stores the result in the corresponding position in the result array. The map method is shape preserving & index free - can not inspect neighboring values. // Adding one to each element. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.map(function inc(v) {     return v+1; }); //<2,3,4,5,6> combine Combine is similar to map, except an index is provided. This allows elemental functions to access elements from the source array relative to the one at the current index position. While the map method operates on the outermost dimension only, combine, can choose how deep to traverse - it provides a depth argument to specify the number of dimensions it iterates over. The elemental function of combine accesses the source array & the current index within it - element is computed by calling the get method of the source ParallelArray object with index i as argument. It requires more code but is more expressive. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.combine(function inc(i) { return this.get(i)+1; }); reduce reduces the elements from an array to a single scalar result – e.g. Sum. // Calculate the sum of the elements var source = new ParallelArray([1,2,3,4,5]); var sum = source.reduce(function plus(a,b) { return a+b; }); scan Like reduce, but stores the intermediate results – return a ParallelArray whose ith elements is the results of using the elemental function to reduce the elements between 0 and I in the original ParallelArray. // do a partial sum var source = new ParallelArray([1,2,3,4,5]); var psum = source.scan(function plus(a,b) { return a+b; }); //<1, 3, 6, 10, 15> scatter a reordering function - specify for a certain source index where it should be stored in the result array. An optional conflict function can prevent an exception if two source values are assigned the same position of the result: var source = new ParallelArray([1,2,3,4,5]); var reorder = source.scatter([4,0,3,1,2]); // <2, 4, 5, 3, 1> // if there is a conflict use the max. use 33 as a default value. var reorder = source.scatter([4,0,3,4,2], 33, function max(a, b) {return a>b?a:b; }); //<2, 33, 5, 3, 4> filter // filter out values that are not even var source = new ParallelArray([1,2,3,4,5]); var even = source.filter(function even(iv) { return (this.get(iv) % 2) == 0; }); // <2,4> Flatten used to collapse the outer dimensions of an array into a single dimension. pa = new ParallelArray([ [1,2], [3,4] ]); // <<1,2>,<3,4>> pa.flatten(); // <1,2,3,4> Partition used to restore the original shape of the array. var pa = new ParallelArray([1,2,3,4]); // <1,2,3,4> pa.partition(2); // <<1,2>,<3,4>> Get return value found at the indices or undefined if no such value exists. var pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]) pa.get([1,1]); // 11 pa.get([1]); // <10,11,12,13,14>

    Read the article

  • SQLbits London 2012 - Demos

    - by Adam Machanic
    Thanks to everyone who attended my sessions last Friday and Saturday at SQLbits! It was great to meet many new people, not to mention spending some time exploring one of my favorite cities, London. Attached are the demos for each of the two talks I delivered: Query Tuning Mastery: The Art of and Science of Manhandling Parallelism As a database developer, your job boils down to one word: performance. And in today's multi-core-driven world, query performance is very much determined by how well you're...(read more)

    Read the article

  • Geek City: A Hint of Degrees

    - by Kalen Delaney
    This is just a quick post to describe a test I just ran to satisfy my own curiosity. I remember when Microsoft first introduced the query hint OPTION (MAXDOP N). We already had the configuration option ‘max degree of parallelism’, so there were lots of questions about how the hint interacted with the configuration option. Some people thought the configuration option set an absolute maximum, and the hint could only specify something less than that value to be meaningful. Other people thought differently,...(read more)

    Read the article

  • Code for Parallelism Features Tour

    Last year I linked to a screencast that shows off many VS2010 features delivered by the Parallel Computing team.There have been requests for the code used to demonstrate the features. Like with all my screencasts, you can see all the code in action, so you could simply type it in. To save you doing that though, you may download the two files with the demo code here: MM.cs and Program.cs. HTH. Comments about this post welcome at the original blog.

    Read the article

  • SQL SERVER – Partition Parallelism Support in expressor 3.6

    - by pinaldave
    I am very excited to learn that there is a new version of expressor’s data integration platform coming out in March of this year.  It will be version 3.6, and I look forward to using it and telling everyone about it.  Let me describe a little bit more about what will be so great in expressor 3.6: Greatly enhanced user interface Parallel Processing Bulk Artifact Upgrading The User Interface First let me cover the most obvious enhancements. The expressor Studio user interface (UI) has had some significant work done. Kudos to the expressor Engineering team; the entire UI is a visual masterpiece that is very responsive and intuitive. The improvements are more than just eye candy; they provide significant productivity gains when developing expressor Dataflows. Operator shape icons now include a description that identifies the function of each operator, instead of having to guess at the function by the icon. Operator shapes and highlighting depict the current function and status: Disabled, enabled, complete, incomplete, and error. Each status displays an appropriate message in the message panel with correction suggestions. Floating or docking property panels provide descriptive tool tips for each property as well as auto resize when adjusting the canvas, without having to search Help or the need to scroll around to get access to the property. Progress and status indicators let you know when an operation is working. “No limit” canvas with snap-to-grid allows automatic sizing and accurate positioning when you have numerous operators in the Dataflow. The inline tool bar offers quick access to pan, zoom, fit and overview functions. Selecting multiple artifacts with a right click context allows you to easily manage your workspace more efficiently. Partitioning and Parallel Processing Partitioning allows each operator to process multiple subsets of records in parallel as opposed to processing all records that flow through that operator in a single sequential set. This capability allows the user to configure the expressor Dataflow to run in a way that most efficiently utilizes the resources of the hardware where the Dataflow is running. Partitions can exist in most individual operators. Using partitions increases the speed of an expressor data integration application, therefore improving performance and load times. With the expressor 3.6 Enterprise Edition, expressor simplifies enabling parallel processing by adding intuitive partition settings that are easy to configure. Bulk Artifact Upgrading Bulk Artifact Upgrading sounds a bit intimidating, but it actually is not and it is a welcome addition to expressor Studio. In past releases, users were prompted to confirm that they wanted to upgrade their individual artifacts only when opened. This was a cumbersome and repetitive process. Now with bulk artifact upgrading, a user can easily select what artifact or group of artifacts to upgrade all at once. As you can see, there are many new features and upgrade options that will prove to make expressor Studio quicker and more efficient.  I hope I’m not the only one who is excited about all these new upgrades, and that I you try expressor and share your experience with me. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • SSIS and Parallelism: The Unseen Minions

    Sometimes, a procedural database process cannot easily be reduced to a set-based algorithm in order to reduce the time it takes. Then, you have to find other ways to parallelise it. Other ways? Josef shows how to use SSIS to drastically reduce the time that such a process takes.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >