Search Results

Search found 77 results on 4 pages for 'gpgpu'.

Page 1/4 | 1 2 3 4  | Next Page >

  • GPGPU

    WhatGPU obviously stands for Graphics Processing Unit (the silicon powering the display you are using to read this blog post). The extra GP in front of that stands for General Purpose computing.So, altogether GPGPU refers to computing we can perform on GPU for purposes beyond just drawing on the screen. In effect, we can use a GPGPU a bit like we already use a CPU: to perform some calculation (that doesn’t have to have any visual element to it). The attraction is that a GPGPU can be orders of magnitude faster than a CPU.WhyWhen I was at the SuperComputing conference in Portland last November, GPGPUs were all the rage. A quick online search reveals many articles introducing the GPGPU topic. I'll just share 3 here: pcper (ignoring all pages except the first, it is a good consumer perspective), gizmodo (nice take using mostly layman terms) and vizworld (answering the question on "what's the big deal").The GPGPU programming paradigm (from a high level) is simple: in your CPU program you define functions (aka kernels) that take some input, can perform the costly operation and return the output. The kernels are the things that execute on the GPGPU leveraging its power (and hence execute faster than what they could on the CPU) while the host CPU program waits for the results or asynchronously performs other tasks.However, GPGPUs have different characteristics to CPUs which means they are suitable only for certain classes of problem (i.e. data parallel algorithms) and not for others (e.g. algorithms with branching or recursion or other complex flow control). You also pay a high cost for transferring the input data from the CPU to the GPU (and vice versa the results back to the CPU), so the computation itself has to be long enough to justify the overhead transfer costs. If your problem space fits the criteria then you probably want to check out this technology.HowSo where can you get a graphics card to start playing with all this? At the time of writing, the two main vendors ATI (owned by AMD) and NVIDIA are the obvious players in this industry. You can read about GPGPU on this AMD page and also on this NVIDIA page. NVIDIA's website also has a free chapter on the topic from the "GPU Gems" book: A Toolkit for Computation on GPUs.If you followed the links above, then you've already come across some of the choices of programming models that are available today. Essentially, AMD is offering their ATI Stream technology accessible via a language they call Brook+; NVIDIA offers their CUDA platform which is accessible from CUDA C. Choosing either of those locks you into the GPU vendor and hence your code cannot run on systems with cards from the other vendor (e.g. imagine if your CPU code would run on Intel chips but not AMD chips). Having said that, both vendors plan to support a new emerging standard called OpenCL, which theoretically means your kernels can execute on any GPU that supports it. To learn more about all of these there is a website: gpgpu.org. The caveat about that site is that (currently) it completely ignores the Microsoft approach, which I touch on next.On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the DirectX API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels that run on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as a "kernel". For a comprehensive collection of links about this (including tutorials, videos and samples) please see my blog post: DirectCompute.Note that there are many efforts to build even higher level languages on top of DirectX that aim to expose GPGPU programming to a wider audience by making it as easy as today's mainstream programming models. I'll mention here just two of those efforts: Accelerator from MSR and Brahma by Ananth. Comments about this post welcome at the original blog.

    Read the article

  • gpgpu vs. physX for physics simulation

    - by notabene
    Hello First theoretical question. What is better (faster)? Develop your own gpgpu techniques for physics simulation (cloth, fluids, colisions...) or to use PhysX? (If i say develop i mean implement existing algorithms like navier-strokes...) I don't care about what will take more time to develop. What will be faster for end user? As i understand that physx are accelerated through PPU units in gpu, does it mean that physical simulation can run in paralel with rastarization? Are PPUs different units than unified shader units used as vertex/geometry/pixel/gpgpu shader units? And little non-theoretical question: Is physx able to do sofisticated simulation equal to lets say Autodesk's Maya fluid solver? Are there any c++ gpu accelerated physics frameworks to try? (I am interested in both physx and gpgpu, commercial engines are ok too).

    Read the article

  • Using SurfaceFormat.Single and HLSL for GPGPU with XNA

    - by giancarlo todone
    I'm trying to implement a so-called ping-pong technique in XNA; you basically have two RenderTarget2D A and B and at each iteration you use one as texture and the other as target - and vice versa - for a quad rendered through an HLSL pixel shader. step1: A--PS--B step2: B--PS--A step3: A--PS--B ... In my setup, both RenderTargets are SurfaceFormat.Single. In my .fx file, I have a tachnique to do the update, and another to render the "current buffer" to the screen. Before starting the "ping-pong", buffer A is filled with test data with SetData<float>(float[]) function: this seems to work properly, because if I render a quad on the screen through the "Draw" pixel shader, i do see the test data being correctly rendered. However, if i do update buffer B, something does not function proerly and the next rendering to screen will be all black. For debug purposes, i replaced the "Update" HLSL pixel shader with one that should simply copy buffer A into B (or B into A depending on which among "ping" and "pong" phases we are...). From some examples i found on the net, i see that in order to correctly fetch a float value from a texture sampler from HLSL code, i should only need to care for the red channel. So, basically the debug "Update" HLSL function is: float4 ComputePS(float2 inPos : TEXCOORD0) : COLOR0 { float v1 = tex2D(bufSampler, inPos.xy).r; return float4(v1,0,0,1); } which still doesn't work and results in a all-zeroes ouput. Here's the "Draw" function that seems to properly display initial data: float4 DrawPS(float2 inPos : TEXCOORD0) : COLOR0 { float v1 = tex2D(bufSampler, inPos.xy).r; return float4(v1,v1,v1,1); } Now: playing around with HLSL doesn't change anything, so maybe I'm missing something on the c# side of this, so here's the infamous Update() function: _effect.Parameters["bufTexture"].SetValue(buf[_currentBuf]); _graphicsDevice.SetRenderTarget(buf[1 - _currentBuf]); _graphicsDevice.Clear(Color.Black); // probably not needed since RenderTargetUsage is DiscardContents _effect.CurrentTechnique = _computeTechnique; _computeTechnique.Passes[0].Apply(); _quadRender.Render(); _graphicsDevice.SetRenderTarget(null); _currentBuf = 1 - _currentBuf; Any clue?

    Read the article

  • Financial applications on GPGPU

    - by CUDA-dev
    I want to know what sort of financial applications can be implemented using a GPGPU. I'm aware of Option pricing/ Stock price estimation using Monte Carlo simulation on GPGPU using CUDA. Can someone enumerate the various possibilities of utilizing GPGPU for any application in Finance domain,

    Read the article

  • Are you a GPGPU developer? Participate in our UX study

    - by Daniel Moth
    You know that I work on the parallel debugger in Visual Studio and I've talked about GPGPU before and I have also mentioned UX. Below is a request from my UX colleagues that pulls all of it together. If you write and debug parallel code that uses GPUs for non-graphical, computationally intensive operations keep reading. The Microsoft Visual Studio Parallel Computing team is seeking developers for a 90-minute research study. The study will take place via LiveMeeting or at a usability lab in Redmond, depending on your preference. We will walk you through an example of debugging GPGPU code in Visual Studio with you giving us step-by-step feedback. ("Is this what you would you expect?", "Are we showing you the things that would help you?", "How would you improve this") The walkthrough utilizes a “paper” version of our current design. After the walkthrough, we would then show you some additional design ideas and seek your input on various design tradeoffs. Are you interested or know someone who might be a good fit? Let us know at this address: [email protected]. Those who participate (and those who referred them), will receive a gratuity item from a list of current Microsoft products. Comments about this post welcome at the original blog.

    Read the article

  • Are you a GPGPU developer? Participate in our UX study

    - by Daniel Moth
    You know that I work on the parallel debugger in Visual Studio and I've talked about GPGPU before and I have also mentioned UX. Below is a request from my UX colleagues that pulls all of it together. If you write and debug parallel code that uses GPUs for non-graphical, computationally intensive operations keep reading. The Microsoft Visual Studio Parallel Computing team is seeking developers for a 90-minute research study. The study will take place via LiveMeeting or at a usability lab in Redmond, depending on your preference. We will walk you through an example of debugging GPGPU code in Visual Studio with you giving us step-by-step feedback. ("Is this what you would you expect?", "Are we showing you the things that would help you?", "How would you improve this") The walkthrough utilizes a “paper” version of our current design. After the walkthrough, we would then show you some additional design ideas and seek your input on various design tradeoffs. Are you interested or know someone who might be a good fit? Let us know at this address: [email protected]. Those who participate (and those who referred them), will receive a gratuity item from a list of current Microsoft products. Comments about this post welcome at the original blog.

    Read the article

  • Best approach for GPGPU/CUDA/OpenCL in Java?

    - by Frederik
    General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing. I'd love to use GPGPU for image processing, particles, and fast geometric operations. Right now, it seems the two contenders in this space are CUDA and OpenCL. I'd like to know: Is OpenCL usable yet from Java on Windows/Mac? What are the libraries ways to interface to OpenCL/CUDA? Is using JNA directly an option? Am I forgetting something? Any real-world experience/examples/war stories are appreciated.

    Read the article

  • GPGPU programming with OpenGL ES 2.0

    - by Albus Dumbledore
    I am trying to do some image processing on the GPU, e.g. median, blur, brightness, etc. The general idea is to do something like this framework from GPU Gems 1. I am able to write the GLSL fragment shader for processing the pixels as I've been trying out different things in an effect designer app. I am not sure however how I should do the other part of the task. That is, I'd like to be working on the image in image coords and then outputting the result to a texture. I am aware of the gl_FragCoords variable. As far as I understand it it goes like that: I need to set up a view (an orthographic one maybe?) and a quad in such a way so that the pixel shader would be applied once to each pixel in the image and so that it would be rendering to a texture or something. But how can I achieve that considering there's depth that may make things somewhat awkward to me... I'd be very grateful if anyone could help me with this rather simple task as I am really frustrated with myself. UPDATE It seems I'll have to use an FBO, getting one like this: glBindFramebuffer(...)

    Read the article

  • Untrusted GPGPU code (OpenCL etc) - is it safe? What risks?

    - by Grzegorz Wierzowiecki
    There are many approaches when it goes about running untrusted code on typical CPU : sandboxes, fake-roots, virtualization... What about untrusted code for GPGPU (OpenCL,cuda or already compiled one) ? Assuming that memory on graphics card is cleared before running such third-party untrusted code, are there any security risks? What kind of risks? Any way to prevent them ? (Possible sandboxing on gpgpu or other technique?) P.S. I am more interested in gpu binary code level security rather than hight-level gpgpu programming language security (But those solutions are welcome as well). What I mean is that references to gpu opcodes (a.k.a machine code) are welcome.

    Read the article

  • DirectCompute

    In my previous blog post I introduced the concept of GPGPU ending with:On Windows, there is already a cross-GPU-vendor way of programming GPUs and that is the Direct X API. Specifically, on Windows Vista and Windows 7, the DirectX 11 API offers a dedicated subset of the API for GPGPU programming: DirectCompute. You use this API on the CPU side, to set up and execute the kernels on the GPU. The kernels are written in a language called HLSL (High Level Shader Language). You can use DirectCompute with HLSL to write a "compute shader", which is the term DirectX uses for what I've been referring to in this post as "kernel".In this post I want to share some links to get you started with DirectCompute and HLSL.1. Watch the recording of the PDC 09 session: DirectX11 DirectCompute.2. If session recordings is your thing there are two more on DirectCompute from nvidia's GTC09 conference 1015 (pdf, mp4) and 1411 (mp4 plus the presenter's written version of the session).3. Over at gamedev there is an old Compute Shader tutorial. At the same site, there is a 3-part blog post on Compute Shader: Introduction, Resources and Addressing.4. From PDC, you can also download the DirectCompute Hands On Lab.5. When you are ready to get your hands even dirtier, download the latest Windows DirectX SDK (at the time of writing the latest is dated Feb 2010).6. Within the SDK you'll find a Compute Shader Overview and samples such as: Basic, Sort, OIT, NBodyGravity, HDR Tone Mapping.7. Talking of DX11/DirectCompute samples, there are also a couple of good ones on this URL.8. The documentation of the various APIs is available online. Here are just some good (but far from complete) taster entry points into that: numthreads, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID, SV_GroupIndex, D3D11CreateDevice, D3DX11CompileFromFile, CreateComputeShader, Dispatch, D3D11_BIND_FLAG, GSSetShader. Comments about this post welcome at the original blog.

    Read the article

  • .NET access to the GPU for compute purposes

    - by Daniel Moth
    In the distant past I talked about GPGPU and Microsoft's then approach of DirectCompute. Since then of course we now have C++ AMP coming out with Visual Studio 11, so there is a mainstream easier way for developers to access the GPU for compute purposes, using C++. The question occasionally arises of how can a .NET developer access the GPU for compute purposes from their C# (or VB) code. The answer is by interoping from the managed code to a native DLL and in the native DLL use C++ AMP. As a long term .NET developer myself, I can tell you this is straightforward. Sure, there could have been a managed wrapper for C++ AMP, but honestly that is the reason we have interop – it doesn't make much sense to invest resources to solve a problem that is already solved (most developer customers would prefer investments in other areas of Visual Studio!). Besides, interoping from C# to C++ is much easier than interoping to some of the other older approaches of GPGPU programming ;-) To help you get started with the interop approach, Igor Ostrovsky has previously shared the "Hello World" version of interoping from C# to C++ AMP in his blog post: How to use C++ AMP from C# …we then were asked specifically about how to interop from C# to C++ AMP in a Metro style application on Windows 8, so Igor delivered again with this post: How to use C++ AMP from C# using WinRT Have fun! Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • What DX level does my graphics card support? Does it go to 11?

    - by Daniel Moth
    Recently I run into a situation that I have run into quite a few times. Someone encounters a machine and the question arises: "Is there a DirectX 11 card in this machine?". Typically the reason you are interested in that is because cards with DirectX 11 drivers fully support DirectCompute (and by extension C++ AMP) for GPGPU programming. The driver specifically is WDDM (1.1 on Windows 7 and Windows 8 introduces WDDM 1.2 with cool new capabilities). There are many ways for figuring out if you have a DirectX11 card, so here are the approaches that you can use, with a bonus right at the end of the post. Run DxDiag WindowsKey + R, type DxDiag and hit Enter. That is the DirectX diagnostic tool, which unfortunately, only tells you on the "System" tab what is the highest version of DirectX installed on your machine. So if it reports DirectX 11, that doesn't mean you have a DX11 driver! The "Display" tab has a promising "DDI version" label, but unfortunately that doesn't seem to be accurate on the machines I've tested it with (or I may be misinterpreting its use). Either way, this tool is not the one you want for this purpose, although it is good for telling you the WDDM version among other things. Use the Microsoft hardware page There is a Microsoft Windows 7 compatibility center, that lists all hardware (tip: use the advanced search) and you could try and locate your device there… good luck. Use Wikipedia or the hardware vendor's website Use the Wikipedia page for the vendor cards, for both nvidia and amd. Often this information will also be in the specifications for the cards on the IHV site, but is is nice that wikipedia has a single page per vendor that you can search etc. There is a column in the tables for API support where you can see the DirectX version. Check if it is one of these recommended DX11 cards You may not have a DirectX 11 card and are interested in purchasing one. While I am in no position to make recommendations, I will list here some cards from two big IHVs that we know are DirectX 11 capable. Some AMD (aka ATI) cards Low end, inexpensive DX11 hardware: Radeon 5450, 5550, 6450, 6570 Mid range (decent perf, single precision): Radeon 5750, 5770, 6770, 6790 High end (capable of double precision): Radeon 5850, 5870, 6950, 6970 Single precision APUs: AMD E-Series APUs AMD A-Series APUs Some NVIDIA cards Low end, inexpensive DX11 hardware: GeForce GT430, GT 440, GT520, GTS 450 Quadro 400, 600 Mid-range (decent perf, single precision): GeForce GTX 460, GTX 550 Ti, GTX 560, GTX 560 Ti Quadro 2000 High end (capable of double precision): GeForce GTX 480, GTX 570, GTX 580, GTX 590, GTX 595 Quadro 4000, 5000, 6000 Tesla C2050, C2070, C2075 Get the DirectX SDK and run DirectX Caps Viewer Download and install the June 2010 DirectX SDK. As part of that you now have the DirectX Capabilities Viewer utility (find it in your start menu by searching for "DirectX Caps Viewer", the filename is DXCapsViewer.exe). It will list all your devices (emulated, and real hardware ones) under the first node. Expand the hardware entries and then expand again the Direct3D 11 folder. If you see D3D_FEATURE_LEVEL_11_ under that, then your card supports feature level 11 which means it supports DirectCompute and C++ AMP. In the following screenshot of one of my old laptops, the card only goes to feature level 10. Run a utility from the web that just tells you! Of course, writing some C++ AMP code that enumerates accelerators and lists the ones that are capable is trivial. However that requires that you have redistributed the runtime, so a more broadly applicable approach is to use the DX APIs directly to enumerate the DX11 capable cards. That is exactly what the development lead for C++ AMP has done and he describes and shares that utility at this post. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • What DX level does my graphics card support? Does it go to 11?

    - by Daniel Moth
    Recently I run into a situation that I have run into quite a few times. Someone encounters a machine and the question arises: "Is there a DirectX 11 card in this machine?". Typically the reason you are interested in that is because cards with DirectX 11 drivers fully support DirectCompute (and by extension C++ AMP) for GPGPU programming. The driver specifically is WDDM (1.1 on Windows 7 and Windows 8 introduces WDDM 1.2 with cool new capabilities). There are many ways for figuring out if you have a DirectX11 card, so here are the approaches that you can use, with a bonus right at the end of the post. Run DxDiag WindowsKey + R, type DxDiag and hit Enter. That is the DirectX diagnostic tool, which unfortunately, only tells you on the "System" tab what is the highest version of DirectX installed on your machine. So if it reports DirectX 11, that doesn't mean you have a DX11 driver! The "Display" tab has a promising "DDI version" label, but unfortunately that doesn't seem to be accurate on the machines I've tested it with (or I may be misinterpreting its use). Either way, this tool is not the one you want for this purpose, although it is good for telling you the WDDM version among other things. Use the Microsoft hardware page There is a Microsoft Windows 7 compatibility center, that lists all hardware (tip: use the advanced search) and you could try and locate your device there… good luck. Use Wikipedia or the hardware vendor's website Use the Wikipedia page for the vendor cards, for both nvidia and amd. Often this information will also be in the specifications for the cards on the IHV site, but is is nice that wikipedia has a single page per vendor that you can search etc. There is a column in the tables for API support where you can see the DirectX version. Check if it is one of these recommended DX11 cards You may not have a DirectX 11 card and are interested in purchasing one. While I am in no position to make recommendations, I will list here some cards from two big IHVs that we know are DirectX 11 capable. Some AMD (aka ATI) cards Low end, inexpensive DX11 hardware: Radeon 5450, 5550, 6450, 6570 Mid range (decent perf, single precision): Radeon 5750, 5770, 6770, 6790 High end (capable of double precision): Radeon 5850, 5870, 6950, 6970 Single precision APUs: AMD E-Series APUs AMD A-Series APUs Some NVIDIA cards Low end, inexpensive DX11 hardware: GeForce GT430, GT 440, GT520, GTS 450 Quadro 400, 600 Mid-range (decent perf, single precision): GeForce GTX 460, GTX 550 Ti, GTX 560, GTX 560 Ti Quadro 2000 High end (capable of double precision): GeForce GTX 480, GTX 570, GTX 580, GTX 590, GTX 595 Quadro 4000, 5000, 6000 Tesla C2050, C2070, C2075 Get the DirectX SDK and run DirectX Caps Viewer Download and install the June 2010 DirectX SDK. As part of that you now have the DirectX Capabilities Viewer utility (find it in your start menu by searching for "DirectX Caps Viewer", the filename is DXCapsViewer.exe). It will list all your devices (emulated, and real hardware ones) under the first node. Expand the hardware entries and then expand again the Direct3D 11 folder. If you see D3D_FEATURE_LEVEL_11_ under that, then your card supports feature level 11 which means it supports DirectCompute and C++ AMP. In the following screenshot of one of my old laptops, the card only goes to feature level 10. Run a utility from the web that just tells you! Of course, writing some C++ AMP code that enumerates accelerators and lists the ones that are capable is trivial. However that requires that you have redistributed the runtime, so a more broadly applicable approach is to use the DX APIs directly to enumerate the DX11 capable cards. That is exactly what the development lead for C++ AMP has done and he describes and shares that utility at this post. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • What kind of CPU/GPU integration is offered by APUs?

    - by clabacchio
    I'm truly fascinated by the idea of GPGPU and using the GPU for heavy processing. I'm seeing that also APUs (Accelerated Processing Units, CPU+GPU on the same chip) are gaining a consistent popularity. Are all of the APUs using a GPGPU? Can it be used for processing? And is it seamless or it requires special code (like Cuda) to have the hard work made by the GPU? I'm not interested in bare graphic performance, but more about how much the GPU can accelerate the "normal" CPU work.

    Read the article

  • C++ AMP Video Overview

    - by Daniel Moth
    I hope to be recording some C++ AMP screencasts for channel9 soon (you'll find them through my regular screencasts link on the left), and in all of them I will assume you have watched this short interview overview of C++ AMP.   Note: I think there were some technical problems with streaming so best to download the "High Quality WMV" or switch to progressive format. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Screencasts introducing C++ AMP

    - by Daniel Moth
    It has been almost 2.5 years since I last recorded a screencast, and I had forgotten how time consuming they are to plan/record/edit/produce/publish, but at the same time so much fun to see the end result! So below are links to 4 screencasts to teach you C++ AMP basics from scratch (even if you class yourself as a .NET developer you'll be able to follow). Setup code - part 1 array_view, extent, index - part 2 parallel_for_each - part 3 accelerator - part 4 If you have comments/questions about what is shown in each video, please leave them at each video recoding. If you have generic questions about C++ AMP, please ask in the C++ AMP MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • GPU Debugging with VS 11

    - by Daniel Moth
    With VS 11 Developer Preview we have invested tremendously in parallel debugging for both CPU (managed and native) and GPU debugging. I'll be doing a whole bunch of blog posts on those topics, and in this post I just wanted to get people started with GPU debugging, i.e. with debugging C++ AMP code. First I invite you to watch 6 minutes of a glimpse of the C++ AMP debugging experience though this video (ffw to minute 51:54, up until minute 59:16). Don't read the rest of this post, just go watch that video, ideally download the High Quality WMV. Summary GPU debugging essentially means debugging the lambda that you pass to the parallel_for_each call (plus any functions you call from the lambda, of course). CPU debugging means debugging all the code above and below the parallel_for_each call, i.e. all the code except the restrict(direct3d) lambda and the functions that it calls. With VS 11 you have to choose what debugger you want to use for a particular debugging session, CPU or GPU. So you can place breakpoints all over your code, then choose what debugger you want (CPU or GPU), and you'll only be able to hit breakpoints for the code type that the debugger engine understands – the remaining breakpoints will appear as unbound. If you want to hit the unbound breakpoints, you'd have to stop debugging, and start again with the other debugger. Sorry. We suck. We know. But once you are past that limitation, I think you'll find the experience truly rewarding – seriously! Switching debugger engines With the Developer Preview bits, one way to switch the debugger engine is through the project properties – see the screenshots that follow. This one is showing the CPU option selected, which is basically the default that you are all familiar with: This screenshot is showing the GPU option selected, by changing the debugger launcher (notice that this applies for both the local and remote case): You actually do not have to open the project properties just for switching the debugger engine, you can switch the selection from the toolbar in VS 11 Developer Preview too – see following screenshot (the effect is the same as if you opened the project properties and switched there) Breakpoint behavior Here are two screenshots, one showing a debugging session for CPU and the other a debugging session for GPU (notice the unbound breakpoints in each case) …and here is the GPU case (where we cannot bind the CPU breakpoints but can the GPU breakpoint, which is actually hit) Give C++ AMP debugging a try So to debug your C++ AMP code, pull down the drop down under the 'play' button to select the 'GPU C++ Direct3D Compute Debugger' menu option, then hit F5 (or the 'play' button itself). Then you can explore debugging by exploring the menus under the Debug and under the Debug->Windows menus. One way to do that exploration is through the C++ AMP debugging walkthrough on MSDN. Another way to explore the C++ AMP debugging experience, you can use the moth.cpp code file, which is what I used in my BUILD session debugger demo. Note that for my demo I was using the latest internal VS11 bits, so your experience with the Developer Preview bits won't be identical to what you saw me demonstrate, but it shouldn't be far off. Stay tuned for a lot more content on the parallel debugger in VS 11, both CPU and GPU, both managed and native. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • My MSDN magazine articles are live

    - by Daniel Moth
    Five years ago I wrote my first MSDN magazine article, and 21 months later I wrote my second MSDN Magazine article (during the VS 2010 Beta). By my calculation, that makes it two and a half years without having written anything for my favorite developer magazine! So, I came back with a vengeance, and in this month's April issue of the MSDN magazine you can find two articles from yours truly - enjoy: A Code-Based Introduction to C++ AMP Introduction to Tiling in C++ AMP For more on C++ AMP, please remember that I blog about it on our team blog, and we take questions in our MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Get started with C++ AMP

    - by Daniel Moth
    With the imminent release of Visual Studio 2012, even if you do not classify yourself as a C++ developer, C++ AMP is something you should learn so you can understand how to speed up your loops by offloading to the GPU the computation performed in the loop (assuming you have large number of iterations/data). We have many C# customers who are using C++ AMP through pinvoke, and of course many more directly from C++. So regardless of your programming language, I hope you'll find helpful these short videos that help you get started with C++ AMP C++ AMP core API introduction... from scratch Tiling Introduction - C++ AMP Matrix Multiplication with C++ AMP GPU debugging in Visual Studio 2012 In particular the work we have done for parallel and GPU debugging in Visual Studio 2012 is market leading, so check it out! Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Attend my Fusion sessions

    - by Daniel Moth
    The inaugural Fusion conference was 1 year ago in June 2011 and I was there doing a demo in the keynote, and also presenting a breakout session. If you look at the abstract and title for that session you won't see the term "C++ AMP" in there because the technology wasn't announced and we didn't want to spill the beans ahead of the keynote, where the technology was announced. It was only an announcement, we did not give any bits out, and in fact the first bits came three months later in September 2011 with the Beta following in February 2012. So it really feels great 1 year later, to be back at Fusion presenting two sessions on C++ AMP, demonstrating our progress from that announcement, to the Visual Studio 2012 Release Candidate that came out last week. If you are attending Fusion (in person or virtually later), be sure to watch my two-part session. Part 1 is PT-3601 on Tuesday 4pm and part 2 is PT-3602 on Wednesday 4pm. Here is the shared abstract for both parts: Harnessing GPU Compute with C++ AMP C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 2012. After a quick overview of the technology understanding its goals and its differentiation compared with other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release. See you there! Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • BUILD apps that use C++ AMP

    - by Daniel Moth
    If you are a developer on the Microsoft platform, you are hopefully attending (live or virtually) the sessions of the BUILD conference, aka //build/ in Anaheim, CA. The conference sold out not long after it opened registration, and it achieved that without sharing *any* session details nor a meaningful agenda up until after the keynote today – impressive! I am speaking at BUILD and hope you'll catch my talk at 9am on Friday (the last day of the conference) at Marriott Elite 2 Ballroom. Session details follow. 802 - Taming GPU compute with C++ AMP Developers today inject parallelism into their compute-intensive applications in order to take advantage of multi-core CPU hardware. Beyond CPUs, however, compute accelerators such as general-purpose GPUs can provide orders of magnitude speed-ups for data parallel algorithms. How can you as a C++ developer fully utilize this heterogeneous hardware from your Visual Studio environment?  How can you benefit from this tremendous performance boost in your Visual C++ solutions without sacrificing developer productivity?  The answers will be presented in this session about C++ Accelerated Massive Parallelism. I'll be covering a lot of the material I've been recently blogging about on my blog that you are reading, which I have also indexed over on our team blog under the title: "C++ AMP in a nutshell". Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • Implementing algorithms via compute shaders vs. pipeline shaders

    - by TravisG
    With the availability of compute shaders for both DirectX and OpenGL it's now possible to implement many algorithms without going through the rasterization pipeline and instead use general purpose computing on the GPU to solve the problem. For some algorithms this seems to become the intuitive canonical solution because they're inherently not rasterization based, and rasterization-based shaders seemed to be a workaround to harness GPU power (simple example: creating a noise texture. No quad needs to be rasterized here). Given an algorithm that can be implemented both ways, are there general (potential) performance benefits over using compute shaders vs. going the normal route? Are there drawbacks that we should watch out for (for example, is there some kind of unusual overhead to switching from/to compute shaders at runtime)? Are there perhaps other benefits or drawbacks to consider when choosing between the two?

    Read the article

  • Screencasts introducing C++ AMP

    - by Daniel Moth
    It has been almost 2.5 years since I last recorded a screencast, and I had forgotten how time consuming they are to plan/record/edit/produce/publish, but at the same time so much fun to see the end result! So below are links to 4 screencasts to teach you C++ AMP basics from scratch (even if you class yourself as a .NET developer you'll be able to follow). Setup code - part 1 array_view, extent, index - part 2 parallel_for_each - part 3 accelerator - part 4 If you have comments/questions about what is shown in each video, please leave them at each video recoding. If you have generic questions about C++ AMP, please ask in the C++ AMP MSDN forum. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • concurrency::accelerator_view

    - by Daniel Moth
    Overview We saw previously that accelerator represents a target for our C++ AMP computation or memory allocation and that there is a notion of a default accelerator. We ended that post by introducing how one can obtain accelerator_view objects from an accelerator object through the accelerator class's default_view property and the create_view method. The accelerator_view objects can be thought of as handles to an accelerator. You can also construct an accelerator_view given another accelerator_view (through the copy constructor or the assignment operator overload). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator_view objects between them to determine if they refer to the same underlying accelerator. We'll see later that when we use concurrency::array objects, the allocation of data takes place on an accelerator at array construction time, so there is a constructor overload that accepts an accelerator_view object. We'll also see later that a new concurrency::parallel_for_each function overload can take an accelerator_view object, so it knows on what target to execute the computation (represented by a lambda that the parallel_for_each also accepts). Beyond normal usage, accelerator_view is a quality of service concept that offers isolation to multiple "consumers" of an accelerator. If in your code you are accessing the accelerator from multiple threads (or, in general, from different parts of your app), then you'll want to create separate accelerator_view objects for each thread. flush, wait, and queuing_mode When you create an accelerator_view via the create_view method of the accelerator, you pass in an option of immediate or deferred, which are the two members of the queuing_mode enum. At any point you can access this value from the queuing_mode property of the accelerator_view. When the queuing_mode value is immediate (which is the default), any commands sent to the device such as kernel invocations and data transfers (e.g. parallel_for_each and copy, as we'll see in future posts), will get submitted as soon as the runtime sees fit (that is the definition of immediate). When the value of queuing_mode is deferred, the commands will be batched up. To send all buffered commands to the device for execution, there is a non-blocking flush method that you can call. If you wish to block until all the commands have been sent, there is a wait method you can call. Deferring is a more advanced scenario aimed at performance gains when you are submitting many device commands and you want to avoid the tiny overhead of flushing/submitting each command separately. Querying information Just like accelerator, accelerator_view exposes the is_debug and version properties. In fact, you can always access the accelerator object from the accelerator property on the accelerator_view class to access the accelerator interface we looked at previously. Interop with D3D (aka DX) In a later post I'll show an example of an app that uses C++ AMP to compute data that is used in pixel shaders. In those scenarios, you can benefit by integrating C++ AMP into your graphics pipeline and one of the building blocks for that is being able to use the same device context from both the compute kernel and the other shaders. You can do that by going from accelerator_view to device context (and vice versa), through part of our interop API in amp.h: *get_device, create_accelerator_view. More on those in a later post. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • concurrency::accelerator

    - by Daniel Moth
    Overview An accelerator represents a "target" on which C++ AMP code can execute and where data can reside. Typically (but not necessarily) an accelerator is a GPU device. Accelerators are represented in C++ AMP as objects of the accelerator class. For many scenarios, you do not need to obtain an accelerator object, since the runtime has a notion of a default accelerator, which is what it thinks is the best one in the system. Examples where you need to deal with accelerator objects are if you need to pick your own accelerator (based on your specific criteria), or if you need to use more than one accelerators from your app. Construction and operator usage You can query and obtain a std::vector of all the accelerators on your system, which the runtime discovers on startup. Beyond enumerating accelerators, you can also create one directly by passing to the constructor a system-wide unique path to a device if you know it (i.e. the “Device Instance Path” property for the device in Device Manager), e.g. accelerator acc(L"PCI\\VEN_1002&DEV_6898&SUBSYS_0B001002etc"); There are some predefined strings (for predefined accelerators) that you can pass to the accelerator constructor (and there are corresponding constants for those on the accelerator class itself, so you don’t have to hardcode them every time). Examples are the following: accelerator::default_accelerator represents the default accelerator that the C++ AMP runtime picks for you if you don’t pick one (the heuristics of how it picks one will be covered in a future post). Example: accelerator acc; accelerator::direct3d_ref represents the reference rasterizer emulator that simulates a direct3d device on the CPU (in a very slow manner). This emulator is available on systems with Visual Studio installed and is useful for debugging. More on debugging in general in future posts. Example: accelerator acc(accelerator::direct3d_ref); accelerator::direct3d_warp represents a target that I will cover in future blog posts. Example: accelerator acc(accelerator::direct3d_warp); accelerator::cpu_accelerator represents the CPU. In this first release the only use of this accelerator is for using the staging arrays technique that I'll cover separately. Example: accelerator acc(accelerator::cpu_accelerator); You can also create an accelerator by shallow copying another accelerator instance (via the corresponding constructor) or simply assigning it to another accelerator instance (via the operator overloading of =). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator objects between them to determine if they refer to the same underlying device. Querying accelerator characteristics Given an accelerator object, you can access its description, version, device path, size of dedicated memory in KB, whether it is some kind of emulator, whether it has a display attached, whether it supports double precision, and whether it was created with the debugging layer enabled for extensive error reporting. Below is example code that accesses some of the properties; in your real code you'd probably be checking one or more of them in order to pick an accelerator (or check that the default one is good enough for your specific workload): void inspect_accelerator(concurrency::accelerator acc) { std::wcout << "New accelerator: " << acc.description << std::endl; std::wcout << "is_debug = " << acc.is_debug << std::endl; std::wcout << "is_emulated = " << acc.is_emulated << std::endl; std::wcout << "dedicated_memory = " << acc.dedicated_memory << std::endl; std::wcout << "device_path = " << acc.device_path << std::endl; std::wcout << "has_display = " << acc.has_display << std::endl; std::wcout << "version = " << (acc.version >> 16) << '.' << (acc.version & 0xFFFF) << std::endl; } accelerator_view In my next blog post I'll cover a related class: accelerator_view. Suffice to say here that each accelerator may have from 1..n related accelerator_view objects. You can get the accelerator_view from an accelerator via the default_view property, or create new ones by invoking the create_view method that creates an accelerator_view object for you (by also accepting a queuing_mode enum value of deferred or immediate that we'll also explore in the next blog post). Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

1 2 3 4  | Next Page >