Search Results

Search found 87926 results on 3518 pages for 'deft code'.

Page 537/3518 | < Previous Page | 533 534 535 536 537 538 539 540 541 542 543 544  | Next Page >

  • parallel_for_each from amp.h – part 1

    - by Daniel Moth
    This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are:    (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.   That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Google+ Platform Office Hours for June 13th, 2012

    Google+ Platform Office Hours for June 13th, 2012 Here are the show notes for this week's office hours. This week was devoted to your questions and our answers. We covered a wide breadth of topics. 0:43 - Introductions 2:54 - About Tabletop Forge's KickStarter - goo.gl 10:00 - Can I run multiple Hangout Apps at the same time? 12:28 - Is Google looking into adding more powerful Hangout moderation controls? 13:47 - How do you use Hangout Apps with Hangouts on Air? - +Fraser Cain's tips and tricks for Hangouts on Air: goo.gl 23:40 - I have an Android game. How do I port it to the Hangouts API? 27:57 - Pre-hangout Apps, Hangouts on Air pre-rolls, scheduling hangouts and other ways to help viewers find your Hangouts on Air 33:55 - How do I bookmark useful Google+ posts with Google+? 38:13 - Can you add a host ID field to the Hangouts API? When will the overlay garbage collection improve? 40:17 - Hand movement tracking as part of the Hangouts API Thanks to everyone who joined the hangout and asked questions on Google+! From: GoogleDevelopers Views: 698 18 ratings Time: 44:16 More in Science & Technology

    Read the article

  • Geo for Good Summit Highlights

    Geo for Good Summit Highlights The last week of September, Google hosted the Geo for Good User Summit, for nonprofit mapping and technology specialists to update the nonprofit community about new and special features of Google's mapping products. In this week's Maps Developers Live event, Mano Marks from Maps Developer Relations and Raleigh Seamster, Program Manager with the Google Earth Outreach team will talk about the highlights of the Summit and show off some great examples of people using Maps to help the world. From: GoogleDevelopers Views: 0 0 ratings Time: 00:00 More in Education

    Read the article

  • How to become a good team player?

    - by Nick
    I've been programming (obsessively) since I was 12. I am fairly knowledgeable across the spectrum of languages out there, from assembly, to C++, to Javascript, to Haskell, Lisp, and Qi. But all of my projects have been by myself. I got my degree in chemical engineering, not CS or computer engineering, but for the first time this fall I'll be working on a large programming project with other people, and I have no clue how to prepare. I've been using Windows all of my life, but this project is going to be very unix-y, so I purchased a Mac recently in the hopes of familiarizing myself with the environment. I was fortunate to participate in a hackathon with some friends this past year -- both CS majors -- and excitingly enough, we won. But I realized as I worked with them that their workflow was very different from mine. They used Git for version control. I had never used it at the time, but I've since learned all that I can about it. They also used a lot of frameworks and libraries. I had to learn what Rails was pretty much overnight for the hackathon (on the other hand, they didn't know what lexical scoping or closures were). All of our code worked well, but they didn't understand mine, and I didn't understand theirs. I hear references to things that real programmers do on a daily basis -- unit testing, code reviews, but I only have the vaguest sense of what these are. I normally don't have many bugs in my little projects, so I have never needed a bug tracking system or tests for them. And the last thing is that it takes me a long time to understand other people's code. Variable naming conventions (that vary with each new language) are difficult (__mzkwpSomRidicAbbrev), and I find the loose coupling difficult. That's not to say I don't loosely couple things -- I think I'm quite good at it for my own work, but when I download something like the Linux kernel or the Chromium source code to look at it, I spend hours trying to figure out how all of these oddly named directories and files connect. It's a programming sin to reinvent the wheel, but I often find it's just quicker to write up the functionality myself than to spend hours dissecting some library. Obviously, people who do this for a living don't have these problems, and I'll need to get to that point myself. Question: What are some steps that I can take to begin "integrating" with everyone else? Thanks!

    Read the article

  • Running C++ AMP kernels on the CPU

    - by Daniel Moth
    One of the FAQs we receive is whether C++ AMP can be used to target the CPU. For targeting multi-core we have a technology we released with VS2010 called PPL, which has had enhancements for VS 11 – that is what you should be using! FYI, it also has a Linux implementation via Intel's TBB which conforms to the same interface. When you choose to use C++ AMP, you choose to take advantage of massively parallel hardware, through accelerators like the GPU. Having said that, you can always use the accelerator class to check if you are running on a system where the is no hardware with a DirectX 11 driver, and decide what alternative code path you wish to follow.  In fact, if you do nothing in code, if the runtime does not find DX11 hardware to run your code on, it will choose the WARP accelerator which will run your code on the CPU, taking advantage of multi-core and SSE2 (depending on the CPU capabilities WARP also uses SSE3 and SSE 4.1 – it does not currently use AVX and on such systems you hopefully have a DX 11 GPU anyway). A few things to know about WARP It is our fallback CPU solution, not intended as a primary target of C++ AMP. WARP stands for Windows Advanced Rasterization Platform and you can read old info on this MSDN page on WARP. What is new in Windows 8 Developer Preview is that WARP now supports DirectCompute, which is what C++ AMP builds on. It is not currently clear if we will have a CPU fallback solution for non-Windows 8 platforms when we ship. When you create a WARP accelerator, its is_emulated property returns true. WARP does not currently support double precision.   BTW, when we refer to WARP, we refer to this accelerator described above. If we use lower case "warp", that refers to a bunch of threads that run concurrently in lock step and share the same instruction. In the VS 11 Developer Preview, the size of warp in our Ref emulator is 4 – Ref is another emulator that runs on the CPU, but it is extremely slow not intended for production, just for debugging. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Load Balance and Parallel Performance

    Load balancing an application workload among threads is critical to performance. However, achieving perfect load balance is non-trivial, and it depends on the parallelism within the application, workload, the number of threads, load balancing policy, and the threading implementation.

    Read the article

  • Google I/O 2012 - Writing Polished Apps that have Deep Integration into the Google Drive UI

    Google I/O 2012 - Writing Polished Apps that have Deep Integration into the Google Drive UI Mike Procopio, Steve Bazyl We'll go through how to implement complete Drive apps. This is not an introduction to Drive apps, but rather how to build your product into Google Drive, and ensure that the experience is seamless for a user. We will also discuss how to effectively distribute your app in the Chrome Web Store. The example app built in this talk will demonstrate an example use case, but otherwise be production-ready. For all I/O 2012 sessions, go to developers.google.com From: GoogleDevelopers Views: 829 5 ratings Time: 50:59 More in Science & Technology

    Read the article

  • How to remove the boundary effects arising due to zero padding in scipy/numpy fft?

    - by Omkar
    I have made a python code to smoothen a given signal using the Weierstrass transform, which is basically the convolution of a normalised gaussian with a signal. The code is as follows: #Importing relevant libraries from __future__ import division from scipy.signal import fftconvolve import numpy as np def smooth_func(sig, x, t= 0.002): N = len(x) x1 = x[-1] x0 = x[0] # defining a new array y which is symmetric around zero, to make the gaussian symmetric. y = np.linspace(-(x1-x0)/2, (x1-x0)/2, N) #gaussian centered around zero. gaus = np.exp(-y**(2)/t) #using fftconvolve to speed up the convolution; gaus.sum() is the normalization constant. return fftconvolve(sig, gaus/gaus.sum(), mode='same') If I run this code for say a step function, it smoothens the corner, but at the boundary it interprets another corner and smoothens that too, as a result giving unnecessary behaviour at the boundary. I explain this with a figure shown in the link below. Boundary effects This problem does not arise if we directly integrate to find convolution. Hence the problem is not in Weierstrass transform, and hence the problem is in the fftconvolve function of scipy. To understand why this problem arises we first need to understand the working of fftconvolve in scipy. The fftconvolve function basically uses the convolution theorem to speed up the computation. In short it says: convolution(int1,int2)=ifft(fft(int1)*fft(int2)) If we directly apply this theorem we dont get the desired result. To get the desired result we need to take the fft on a array double the size of max(int1,int2). But this leads to the undesired boundary effects. This is because in the fft code, if size(int) is greater than the size(over which to take fft) it zero pads the input and then takes the fft. This zero padding is exactly what is responsible for the undesired boundary effects. Can you suggest a way to remove this boundary effects? I have tried to remove it by a simple trick. After smoothening the function I am compairing the value of the smoothened signal with the original signal near the boundaries and if they dont match I replace the value of the smoothened func with the input signal at that point. It is as follows: i = 0 eps=1e-3 while abs(smooth[i]-sig[i])> eps: #compairing the signals on the left boundary smooth[i] = sig[i] i = i + 1 j = -1 while abs(smooth[j]-sig[j])> eps: # compairing on the right boundary. smooth[j] = sig[j] j = j - 1 There is a problem with this method, because of using an epsilon there are small jumps in the smoothened function, as shown below: jumps in the smooth func Can there be any changes made in the above method to solve this boundary problem?

    Read the article

  • Google I/O 2012 - Playing with Patterns

    Google I/O 2012 - Playing with Patterns "Marco Paglia Best-in-class application designers and developers will talk about their experience in developing for Android, showing screenshots from their app, exploring the challenges they faced, and offering creative solutions congruent with the Android Design guide. Guests will be invited to show examples of visual and interaction patterns in their application that manage to keep it simultaneously consistent and personal. For all I/O 2012 sessions, go to developers.google.com From: GoogleDevelopers Views: 1 0 ratings Time: 02:13:20 More in Science & Technology

    Read the article

< Previous Page | 533 534 535 536 537 538 539 540 541 542 543 544  | Next Page >