Search Results

Search found 77 results on 4 pages for 'gpgpu'.

Page 3/4 | < Previous Page | 1 2 3 4 | Next Page >

Why aren't we programming on the GPU???

- by Chris

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

Read the article
How to properly use glDiscardFramebufferEXT

- by Rafael Spring

This question relates to the OpenGL ES 2.0 Extension EXT_discard_framebuffer. It is unclear to me which cases justify the use of this extension. If I call glDiscardFramebufferEXT() and it puts the specified attachable images in an undefined state this means that either: - I don't care about the content anymore since it has been used with glReadPixels() already, - I don't care about the content anymore since it has been used with glCopyTexSubImage() already, - I shouldn't have made the render in the first place. Clearly, only the 1st two cases make sense or are there other cases in which glDiscardFramebufferEXT() is useful? If yes, which are these cases?

Read the article
GLSL shader render to texture not saving alpha value

- by quadelirus

I am rendering to a texture using a GLSL shader and then sending that texture as input to a second shader. For the first texture I am using RGB channels to send color data to the second GLSL shader, but I want to use the alpha channel to send a floating point number that the second shader will use as part of its program. The problem is that when I read the texture in the second shader the alpha value is always 1.0. I tested this in the following way: at the end of the first shader I did this: gl_FragColor(r, g, b, 0.1); and then in the second texture I read the value of the first texture using something along the lines of vec4 f = texture2D(previous_tex, pos); if (f.a != 1.0) { gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0); return; } No pixels in my output are black, whereas if I change the above code to read gl_FragColor(r, g, 0.1, 1.0); //Notice I'm now sending 0.1 for blue and in the second shader vec4 f = texture2D(previous_tex, pos); if (f.b != 1.0) { gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0); return; } All the appropriate pixels are black. This means that for some reason when I set the alpha value to something other than 1.0 in the first shader and render to a texture, it is still seen as being 1.0 by the second shader. Before I render to texture I glDisable(GL_BLEND); It seems pretty clear to me that the problem has to do with OpenGL handling alpha values in some way that isn't obvious to me since I can use the blue channel in the way I want, and figured someone out there will instantly see the problem.

Read the article
GPU YUV to RGB. Worth the effort?

- by Jaime Pardos

Hello, I have to convert several full PAL videos (720x576@25) from YUV 4:2:2 to RGB, in real time, and probably a custom resize for each. I have thought of using the GPU, as I have seen some example that does just this (except that it's 4:4:4 so the bpp is the same in source and destiny)-- http://www.fourcc.org/source/YUV420P-OpenGL-GLSLang.c However, I don't have any experience with using GPU's and I'm not sure of what can be done. The example, as I understand it, just converts the video frame to YUV and displays it in the screen. Is it possible to get the processed frame instead? Would it be worth the effort to send it to the GPU, get it transformed, and sending it again to main memory, or would it kill performance? Being a bit platform-specific, assuming I work on windows, is it possible to get an OpenGL or DirectDraw surface from a window so the GPU can draw directly to it?

Read the article
OpenCL and CUDA

- by scatman

Should i learn OpenCL if i only want to program NVIDIA GPUs ?

Read the article
Do the parallel-for in .net 4.0 takes privilege of GPU computing automatically?

- by Gulshan

Do the parallel-for in .net 4.0 takes privilege of GPU computing automatically? Or I have to configure with some drivers so that it uses GPU.

Read the article
Double precision floating point in CUDA

- by cuda-dev

Does CUDA support double precision floating point numbers ? Also need reasons for the same.

Read the article
How to optimize Conway's game of life for CUDA?

- by nlight

I've written this CUDA kernel for Conway's game of life: global void gameOfLife(float* returnBuffer, int width, int height) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y; float p = tex2D(inputTex, x, y); float neighbors = 0; neighbors += tex2D(inputTex, x+1, y); neighbors += tex2D(inputTex, x-1, y); neighbors += tex2D(inputTex, x, y+1); neighbors += tex2D(inputTex, x, y-1); neighbors += tex2D(inputTex, x+1, y+1); neighbors += tex2D(inputTex, x-1, y-1); neighbors += tex2D(inputTex, x-1, y+1); neighbors += tex2D(inputTex, x+1, y-1); __syncthreads(); float final = 0; if(neighbors < 2) final = 0; else if(neighbors 3) final = 0; else if(p != 0) final = 1; else if(neighbors == 3) final = 1; __syncthreads(); returnBuffer[x + y*width] = final; } I am looking for errors/optimizations. Parallel programming is quite new to me and I am not sure if I get how to do it right. The rest of the app is: Memcpy input array to a 2d texture inputTex stored in a CUDA array. Output is memcpy-ed from global memory to host and then dealt with. As you can see a thread deals with a single pixel. I am unsure if that is the fastest way as some sources suggest doing a row or more per thread. If I understand correctly NVidia themselves say that the more threads, the better. I would love advice on this on someone with practical experience.

Read the article
NVIDIA GPUs and PhysX engine

- by Lucas

How is the NVIDIA PhysX engine implemented in the NVIDIA GPUs: It's a co-processor or the physical algorithms are implemented as fragment programs to be executed in the GPU pipeline ?

Read the article
What's the most trivial function that would benfit from being computed on a GPU?

- by hanDerPeder

Hi. I'm just starting out learning OpenCL. I'm trying to get a feel for what performance gains to expect when moving functions/algorithms to the GPU. The most basic kernel given in most tutorials is a kernel that takes two arrays of numbers and sums the value at the corresponding indexes and adds them to a third array, like so: __kernel void add(__global float *a, __global float *b, __global float *answer) { int gid = get_global_id(0); answer[gid] = a[gid] + b[gid]; } __kernel void sub(__global float* n, __global float* answer) { int gid = get_global_id(0); answer[gid] = n[gid] - 2; } __kernel void ranksort(__global const float *a, __global float *answer) { int gid = get_global_id(0); int gSize = get_global_size(0); int x = 0; for(int i = 0; i < gSize; i++){ if(a[gid] > a[i]) x++; } answer[x] = a[gid]; } I am assuming that you could never justify computing this on the GPU, the memory transfer would out weight the time it would take computing this on the CPU by magnitudes (I might be wrong about this, hence this question). What I am wondering is what would be the most trivial example where you would expect significant speedup when using a OpenCL kernel instead of the CPU?

Read the article
Easiest way to sign/certify text file in C++?

- by user355008

I want to verify if the text log files created by my program being run at my customer's site have been tampered with. How do you suggest I go about doing this? I searched a bunch here and google but couldn't find my answer. Thanks!

Read the article
The best way to predict performance without actually porting the code?

- by ardiyu07

I believe there are people with the same experience with me, where he/she must give a (estimated) performance report of porting a program from sequential to parallel with some designated multicore hardwares, with a very few amount of time given. For instance, if a 10K LoC sequential program was given and executes on Intel i7-3770k (not vectorized) in 100 ms, how long would it take to run if one parallelizes the code to a Tesla C2075 with NVIDIA CUDA, given that all kinds of parallelizing optimization techniques were done? (but you're only given 2-4 days to report the performance? assume that you didn't know the algorithm at all. Or perhaps it'd be safer if we just assume that it's an impossible situation to finish the job) Therefore, I'm wondering, what most likely be the fastest way to give such performance report? Is it safe to calculate solely by the hardware's capability, such as GFLOPs peak and memory bandwidth rate? Is there a mathematical way to calculate it? If there is, please prove your method with the corresponding problem description and the algorithm, and also the target hardwares' specifications. Or perhaps there already exists such tool to (roughly) estimate code porting? (Please don't the answer: 'kill yourself is the fastest way.')

Read the article
CUDA threads for inner loop

- by Manolete

I've got this kernel __global__ void kernel1(int keep, int include, int width, int* d_Xco, int* d_Xnum, bool* d_Xvalid, float* d_Xblas) { int i, k; i = threadIdx.x + blockIdx.x * blockDim.x; if(i < keep){ for(k = 0; k < include ; k++){ int val = (d_Xblas[i*include + k] >= 1e5); int aux = d_Xnum[i]; d_Xblas[i*include + k] *= (!val); d_Xco[i*width + aux] = k; d_Xnum[i] +=val; d_Xvalid[i*include + k] = (!val); } } } launched with int keep = 9000; int include = 23000; int width = 0.2*include; int threads = 192; int blocks = keep+threads-1/threads; kernel1 <<< blocks,threads >>>( keep, include, width, d_Xco, d_Xnum, d_Xvalid, d_Xblas ); This kernel1 works fine but it is obviously not totally optimized. I thought it would be straight forward to eliminate the inner loop k but for some reason it doesn't work fine. My first idea was: __global__ void kernel2(int keep, int include, int width, int* d_Xco, int* d_Xnum, bool* d_Xvalid, float* d_Xblas) { int i, k; i = threadIdx.x + blockIdx.x * blockDim.x; k = threadIdx.y + blockIdx.y * blockDim.y; if((i < keep) && (k < include) ) { int val = (d_Xblas[i*include + k] >= 1e5); int aux = d_Xnum[i]; d_Xblas[i*include + k] *= (float)(!val); d_Xco[i*width + aux] = k; atomicAdd(&d_Xnum[i], val); d_Xvalid[i*include + k] = (!val); } } launched with a 2D grid: int keep = 9000; int include = 23000; int width = 0.2*include; int th = 32; dim3 threads(th,th); dim3 blocks (keep+threads.x-1/threads.x, include+threads.y-1/threads.y); kernel2 <<< blocks,threads >>>( keep, include, width, d_Xco, d_Xnum, d_Xvalid, d_Xblas ); Although I believe the idea is fine, it does not work and I am running out of ideas here. Could you please help me out here? I also think the problem could be in d_Xco which stores the position k in a smaller array , so the order matters, but I can't think of any other way of doing it...

Read the article
Best of "The Moth" 2010

- by Daniel Moth

It is the time again (like in 2004, 2005, 2006, 2007, 2008, 2009) to look back at my blog for the past year and identify areas of interest that seem to be more prominent than others. After doing so, representative posts follow in my top 5 list (in random order). 1. This was the year where I had to move for the first time since 2004 my blog engine (blogger.com –> dasBlog), host provider (zen –> godaddy), web server technology and OS (apache on Linux –> IIS on Windows Server). My goal was not to break any permalinks or the look and feel of this website. A series of posts covered how I achieved that goal, culminating in a tool for others to use if they wanted to do the same: Tool to convert blogger.com content to dasBlog. Going forward I aim to be sharing more small code utilities like that one… 2. At work I am known for being fairly responsive on email, and more importantly never dropping email balls on the floor. This is due to my email processing system, which I shared here: Processing Email in Outlook. I will be sharing more tips with regards to making the best of the Office products. 3. There is no doubt in my mind that this is the year people will remember as the one where Microsoft finally fights back in the mobile space. Even though the new platform means my Windows Mobile book sales will dwindle :-), I am ecstatic about Windows Phone 7 both as a consumer and as a developer. On the release day, to get you started I shared the top 10 Windows Phone 7 developer resources. I will be sharing my tips from my experience in writing code for and consuming this new platform… 4. For my HPC developer friends using Visual Studio, I shared Slides and code for MPI Cluster Debugger and also gave you all the links you need for getting started with Dryad and DryadLINQ from MSR. Expect more from me on cluster development in the coming year… 5. Still in the HPC space, but actually also in the game and even mainstream development, the big disruption and opportunity comes in the form of GPGPU and, on the Microsoft platform, (currently) DirectCompute. Expect more from me on gpgpu development in the coming year… Subscribe via the link on the left to stay tuned for 2011… I wish you a very Happy New Year (with whatever definition of happiness works for you)! Comments about this post welcome at the original blog.

Read the article
Grpahic hardwares

- by Vanangamudi

Which vendor provides better GPGPU. my requirements are confined to rendering utilising the GPU for BSDF building for e.g. Intel started providing Ivy Bridge chipset GPU, which are comparably fast to HD5960 cards. I'm not that against nvidia or amd. but I'm a fan of Intel. how it compares to nvidia in price and performance. if possible may I know, how all of them perform with OpenCL?? I'm not sure if it is right to ask it here. but I don't know where to ask.

Read the article
How are you keeping abreast with latest HPC trends?

- by Fanatic23

For those of you primarily working on high performance computing for non-scientific purposes (as in no fortran) what are the ways by which you keep yourself abreast of the latest trends out there? Industry gossip? Random blogs from google? Search SO? I am into HPC recently and looking for the high level architectural landscape. Given that everyone from GPGPU coders to FPGA folks to Erlang developers claim they are into HPC, how are you keeping up with so much information?

Read the article
Are you at Super Computing 10?

- by Daniel Moth

Like last year, I was going to attend SC this year, but other events are unfortunately keeping me here in Seattle next week. If you are going to be in New Orleans, have fun and be sure not to miss out on the following two opportunities. MPI Debugging UX Study Throughout the week, my team is conducting 90-minute studies on debugging MPI applications within Visual Studio. In exchange for your feedback (under NDA) you will receive a Microsoft Gratuity (and the knowledge that you are impacting the development of Visual Studio). If you are interested, sign up at the Microsoft Information Desk in the Exhibitor Hall during exhibit hours. Outside of exhibit hours, send email to [email protected]. If you took part in the GPGPU study, this is very similar except it is for MPI. Microsoft High Performance Computing Summit On Monday 15th, the Microsoft annual user group meeting takes place. Shuttle transportation and lunch is provided. For full details of this event and to register, please visit the official event page. Comments about this post welcome at the original blog.

Read the article
C++ Numerical Recipes – A New Adventure!

- by JoshReuben

I am about to embark on a great journey – over the next 6 weeks I plan to read through C++ Numerical Recipes 3rd edition http://amzn.to/YtdpkS I'll be reading this with an eye to C++ AMP, thinking about implementing the suitable subset (non-recursive, additive, commutative) to run on the GPU. APIs supporting HPC, GPGPU or MapReduce are all useful – providing you have the ability to choose the correct algorithm to leverage on them. I really think this is the most fascinating area of programming – a lot more exciting than LOB CRUD !!! When you think about it , everything is a function – we categorize & we extrapolate. As abstractions get higher & less leaky, sooner or later information systems programming will become a non-programmer task – you will be using WYSIWYG designers to build: GUIs MVVM service mapping & virtualization workflows ORM Entity relations In the data source SharePoint / LightSwitch are not there yet, but every iteration gets closer. For information workers, managed code is a race to the bottom. As MS futures are a bit shaky right now, the provider agnostic nature & higher barriers of entry of both C++ & Numerical Analysis seem like a rational choice to me. Its also fascinating – stepping outside the box. This is not the first time I've delved into numerical analysis. 6 months ago I read Numerical methods with Applications, which can be found for free online: http://nm.mathforcollege.com/ 2 years ago I learned the .NET Extreme Optimization library www.extremeoptimization.com – not bad 2.5 years ago I read Schaums Numerical Analysis book http://amzn.to/V5yuLI - not an easy read, as topics jump back & forth across chapters: 3 years ago I read Practical Numerical Methods with C# http://amzn.to/V5yCL9 (which is a toy learning language for this kind of stuff) I also read through AI a Modern Approach 3rd edition END to END http://amzn.to/V5yQSp - this took me a few years but was the most rewarding experience. I'll post progress updates – see you on the other side !

Read the article
Join our team at Microsoft

- by Daniel Moth

If you are looking for a SDE or SDET job at Microsoft, keep on reading. Back in January I posted a Dev Lead opening on our team, which was quickly filled internally (by Maria Blees). Our team is part of the recently announced Microsoft Technical Computing group. Specifically, we are working on new debugger functionality, integrated with Visual Studio (we are starting work on the next version), aimed to address HPC and GPGPU scenarios (and continuing the Parallel Debugging scenarios we started addressing with VS2010). We now have many more openings on our debugger team. We posted three of those on the careers website: Software Development Engineer Software Development Engineer II Software Development Engineer in Test II (don't let the word "Test" fool you: An SDET on our team is no different than a developer in any way, including the skills required) Please do read the contents of the links above. Specifically, note that for both positions you need to be as proficient in writing C++ code as you are with managed code (WPF experience is a plus). If you think you have what it takes, you wish to join a quality and schedule driven project, and want to contribute features to a product that has global impact, then send me your resume and I'll pass it on to the hiring managers. Comments about this post welcome at the original blog.

Read the article
SRV from UAV on the same texture in directx

- by notabene

I'm programming gpgpu raymarching (volumetric raytracing) in directx11. I succesfully perform compute shader and save raymarched volume data to texture. Then i want to use same texture as SRV in normal graphic pipeline. But it doesnt work, texture is not visible. Texture is ok, when i save it file it is what i expect. Texture rendering is ok too, when i render another SRV, it is ok. So problem is only in UAV-SRV. I also triple checked if pointers are ok. Please help, i'm getting mad about this. Here is some code: //before dispatch D3D11_TEXTURE2D_DESC textureDesc; ZeroMemory( &textureDesc, sizeof( textureDesc ) ); textureDesc.Width = xr; textureDesc.Height = yr; textureDesc.MipLevels = 1; textureDesc.ArraySize = 1; textureDesc.SampleDesc.Count = 1; textureDesc.SampleDesc.Quality = 0; textureDesc.Usage = D3D11_USAGE_DEFAULT; textureDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE ; textureDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; D3D->CreateTexture2D( &textureDesc, NULL, &pTexture ); D3D11_UNORDERED_ACCESS_VIEW_DESC viewDescUAV; ZeroMemory( &viewDescUAV, sizeof( viewDescUAV ) ); viewDescUAV.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; viewDescUAV.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D; viewDescUAV.Texture2D.MipSlice = 0; D3DD->CreateUnorderedAccessView( pTexture, &viewDescUAV, &pTextureUAV ); //the getSRV function after dispatch. D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc ; ZeroMemory( &srvDesc, sizeof( srvDesc ) ); srvDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D; srvDesc.Texture2D.MipLevels = 1; D3DD->CreateShaderResourceView( pTexture, &srvDesc, &pTextureSRV);

Read the article
Are we in a functional programming fad?

- by TraumaPony

I use both functional and imperative languages daily, and it's rather amusing to see the surge of adoption of functional languages from both sides of the fence. It strikes me, however, that it looks rather like a fad. Do you think that it's a fad? I know the reasons for using functional languages at times and imperative languages in others, but do you really think that this trend will continue due to the cliched "many-core" revolution that has been only "18 months from now" since 2004 (sort of like communism's Radiant Future), or do you think that it's only temporary; a fascination of the mainstream developer that will be quickly replaced by the next shiny idea, like Web 3.0 or GPGPU? Note, that I'm not trying to start a flamewar or anything (sorry if it sounds bitter), I'm just curious as to whether people will think functional or functional/imperative languages will become mainstream. Edit: By mainstream, I mean, equal number of programmers to say, Python, Java, C#, etc

Read the article
Fast, Vectorizable method of taking floating point number modulus of special primes?

- by caffiend

Is there a fast method for taking the modulus of a floating point number? With integers, there are tricks for Mersenne primes, so that its possible to calculate y = x MOD 2^31 without needing division. Can any similar tricks be applied for floating point numbers? Preferably, in a way that can be converted into vector/SIMD operations, or moved into GPGPU code. The primes I'm interested in would be 2^7 and 2^31, although if there are more efficient ones for floating point numbers, those would be welcome.

Read the article
Computer science undergraduate project ideas

- by Mehrdad Afshari

Hopefully, I'm going to finish my undergraduate studies next semester and I'm thinking about the topic of my final project. And yes, I've read the questions with duplicate title. I'm asking this from a bit different viewpoint, so it's not an exact dupe. I've spent at least half of my life coding stuff in different languages and frameworks so I'm not looking at this project as a way to learn much about coding and preparing for real world apps or such. I've done lots of those already. But since I have to do it to complete my degree, I felt I should spend my time doing something useful instead of throwing the whole thing out. I'm planning to make it an open source project or a hosted Web app (depending on the type) if I can make a high quality thing out of it, so I decided to ask StackOverflow what could make a useful project. Situation I've plenty of freedom about the topic. They also require 30-40 pages of text describing the project. I have the following points in mind (the more satisfied, the better): Something useful for software development Something that benefits the community Having academic value is great Shouldn't take more than a month of development (I know I'm lazy). Shouldn't be related to advanced theoretical stuff (soft computing, fuzzy logic, neural networks, ...). I've been a business-oriented software developer. It should be software oriented. While I love hacking microcontrollers and other fun embedded electronic things, I'm not really good at soldering and things like that. I'm leaning toward a Web application (think StackOverflow, PasteBin, NerdDinner, things like those). Technology It's probably going to be done in .NET (C#, F#) and Windows platform. If I really like the project (cool low level hacking), I might actually slip to C/C++. But really, C# is what I'm efficient at. Ideas Programming language, parsing and compiler related stuff: Designing a domain specific programming language and compiler Templating language compiled to C# or IL Database tools and related code generation stuff Web related technologies: ASP.NET MVC View engine doing something cool (don't know what exactly...) Specific-purpose, small, fast ASP.NET-based Web framework Applications: Visual Studio plugin to integrate with Bazaar (it's too much work, I think). ASP.NET based, jQuery-powered issue tracker (and possibly, project lifecycle management as a whole - poor man's TFS) Others: Something related to GPGPU Looking forward for great ideas! Unfortunately, I can't help on a currently existing project. I need to start my own to prevent further problems (as it's an undergrad project, nevertheless).

Read the article
RiverTrail - JavaScript GPPGU Data Parallelism

- by JoshReuben

Where is WebCL ? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some protype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications via a familiar JavaScript programming paradigm. It extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. With its high-level JS API, programmers do not have to learn a new language or explicitly manage threads, orchestrate shared data synchronization or scheduling. It has been proposed as a draft specification to ECMA a (known as ECMA strawman). RiverTrail runs in all popular browsers (except I.E. of course). To get started, download a prebuilt version https://github.com/downloads/RiverTrail/RiverTrail/rivertrail-0.17.xpi , install Intel's OpenCL SDK http://www.intel.com/go/opencl and try out the interactive River Trail shell http://rivertrail.github.com/interactive For a video overview, see http://www.youtube.com/watch?v=jueg6zB5XaM . ParallelArray the ParallelArray type is the central component of this API & is a JS object that contains ordered collections of scalars – i.e. multidimensional uniform arrays. A shape property describes the dimensionality and size– e.g. a 2D RGBA image will have shape [height, width, 4]. ParallelArrays are immutable & fluent – they are manipulated by invoking methods on them which produce new ParallelArray objects. ParallelArray supports several constructors over arrays, functions & even the canvas. // Create an empty Parallel Array var pa = new ParallelArray(); // pa0 = <> // Create a ParallelArray out of a nested JS array. // Note that the inner arrays are also ParallelArrays var pa = new ParallelArray([ [0,1], [2,3], [4,5] ]); // pa1 = <<0,1>, <2,3>, <4.5>> // Create a two-dimensional ParallelArray with shape [3, 2] using the comprehension constructor var pa = new ParallelArray([3, 2], function(iv){return iv[0] * iv[1];}); // pa7 = <<0,0>, <0,1>, <0,2>> // Create a ParallelArray from canvas. This creates a PA with shape [w, h, 4], var pa = new ParallelArray(canvas); // pa8 = CanvasPixelArray ParallelArray exposes fluent API functions that take an elemental JS function for data manipulation: map, combine, scan, filter, and scatter that return a new ParallelArray. Other functions are scalar - reduce returns a scalar value & get returns the value located at a given index. The onus is on the developer to ensure that the elemental function does not defeat data parallelization optimization (avoid global var manipulation, recursion). For reduce & scan, order is not guaranteed - the onus is on the dev to provide an elemental function that is commutative and associative so that scan will be deterministic – E.g. Sum is associative, but Avg is not. map Applies a provided elemental function to each element of the source array and stores the result in the corresponding position in the result array. The map method is shape preserving & index free - can not inspect neighboring values. // Adding one to each element. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.map(function inc(v) { return v+1; }); //<2,3,4,5,6> combine Combine is similar to map, except an index is provided. This allows elemental functions to access elements from the source array relative to the one at the current index position. While the map method operates on the outermost dimension only, combine, can choose how deep to traverse - it provides a depth argument to specify the number of dimensions it iterates over. The elemental function of combine accesses the source array & the current index within it - element is computed by calling the get method of the source ParallelArray object with index i as argument. It requires more code but is more expressive. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.combine(function inc(i) { return this.get(i)+1; }); reduce reduces the elements from an array to a single scalar result – e.g. Sum. // Calculate the sum of the elements var source = new ParallelArray([1,2,3,4,5]); var sum = source.reduce(function plus(a,b) { return a+b; }); scan Like reduce, but stores the intermediate results – return a ParallelArray whose ith elements is the results of using the elemental function to reduce the elements between 0 and I in the original ParallelArray. // do a partial sum var source = new ParallelArray([1,2,3,4,5]); var psum = source.scan(function plus(a,b) { return a+b; }); //<1, 3, 6, 10, 15> scatter a reordering function - specify for a certain source index where it should be stored in the result array. An optional conflict function can prevent an exception if two source values are assigned the same position of the result: var source = new ParallelArray([1,2,3,4,5]); var reorder = source.scatter([4,0,3,1,2]); // <2, 4, 5, 3, 1> // if there is a conflict use the max. use 33 as a default value. var reorder = source.scatter([4,0,3,4,2], 33, function max(a, b) {return a>b?a:b; }); //<2, 33, 5, 3, 4> filter // filter out values that are not even var source = new ParallelArray([1,2,3,4,5]); var even = source.filter(function even(iv) { return (this.get(iv) % 2) == 0; }); // <2,4> Flatten used to collapse the outer dimensions of an array into a single dimension. pa = new ParallelArray([ [1,2], [3,4] ]); // <<1,2>,<3,4>> pa.flatten(); // <1,2,3,4> Partition used to restore the original shape of the array. var pa = new ParallelArray([1,2,3,4]); // <1,2,3,4> pa.partition(2); // <<1,2>,<3,4>> Get return value found at the indices or undefined if no such value exists. var pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]) pa.get([1,1]); // 11 pa.get([1]); // <10,11,12,13,14>

Read the article
CodePlex Daily Summary for Wednesday, September 19, 2012

CodePlex Daily Summary for Wednesday, September 19, 2012Popular ReleasesWinRT XAML Toolkit: WinRT XAML Toolkit - 1.2.3: WinRT XAML Toolkit based on the Windows 8 RTM SDK. Download the latest source from the SOURCE CODE page. For compiled version use NuGet. You can add it to your project in Visual Studio by going to View/Other Windows/Package Manager Console and entering: PM> Install-Package winrtxamltoolkit Features AsyncUI extensions Controls and control extensions Converters Debugging helpers Imaging IO helpers VisualTree helpers Samples Recent changes NOTE: Namespace changes DebugConsol...Python Tools for Visual Studio: 1.5 RC: PTVS 1.5RC Available! We’re pleased to announce the release of Python Tools for Visual Studio 1.5 RC. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python language. PTVS supports a broad range of features including CPython/IronPython, Edit/Intellisense/Debug/Profile, Cloud, HPC, IPython, etc. support. The primary new feature for the 1.5 release is Django including Azure support! The http://www.djangoproject.com is a pop...Launchbar: Lanchbar 4.0.0: First public release.AssaultCube Reloaded: 2.5.4 -: Linux has Ubuntu 11.10 32-bit precompiled binaries and Ubuntu 10.10 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, please wait while we try to package for those OSes. Try to compile it. If it fails, download a virtual machine. The server pack is ready for both Windows and Linux, but you might need to compile your own for Linux (source included) Changelog: New logo Improved airstrike! Reset nukes...JayData - The cross-platform HTML5 data-management library for JavaScript: JayData 1.2: JayData is a unified data access library for JavaScript to CRUD + Query data from different sources like OData, MongoDB, WebSQL, SqLite, Facebook or YQL. The library can be integrated with Knockout.js or Sencha Touch 2 and can be used on Node.js as well. See it in action in this 6 minutes video Sencha Touch 2 example app using JayData: Netflix browser. What's new in JayData 1.2 For detailed release notes check the release notes. JayData core: all async operations now support promises JayDa...fastJSON: v2.0.5: 2.0.5 - fixed number parsing for invariant format - added a test for German locale number testing (,. problems)????????API for .Net SDK: SDK for .Net ??? Release 4: 2012?9?17??? ?????，???????????????。 ?????Release 3??????，???????，???，??? ??????????????????SDK，????????。 ??，??????? That's all.VidCoder: 1.4.0 Beta: First Beta release! Catches up to HandBrake nightlies with SVN 4937. Added PGS (Blu-ray) subtitle support. Additional framerates available: 30, 50, 59.94, 60 Additional sample rates available: 8, 11.025, 12 and 16 kHz Additional higher bitrates available for audio. Same as Source Constant Framerate available. Added Apple TV 3 preset. Added new Bob deinterlacing option. Introduced process isolation for encodes. Now if HandBrake crashes, VidCoder will keep running and continue pro...DNN Metro7 style Skin package: Metro7 style Skin for DotNetNuke 06.02.01: Stabilization release fixed this issues: Links not worked on FF, Chrome and Safari Modified packaging with own manifest file for install and source package. Moved the user Image on the Login to the left side. Moved h2 font-size to 24px. Note : This release Comes w/o source package about we still work an a solution. Who Needs the Visual Studio source files please go to source and download it from there. Known 16 CSS issues that related to the skin.css. All others are DNN default o...Visual Studio Icon Patcher: Version 1.5.1: This fixes a bug in the 1.5 release where it would crash when no language packs were installed for VS2010.sheetengine - Isometric HTML5 JavaScript Display Engine: sheetengine v1.1.0: This release of sheetengine introduces major drawing optimizations. A background canvas is created with the full drawn scenery onto which only the changed parts are redrawn. For example a moving object will cause only its bounding box to be redrawn instead of the full scene. This background canvas is copied to the main canvas in each iteration. For this reason the size of the bounding box of every object needs to be defined and also the width and height of the background canvas. The example...VFPX: Desktop Alerts 1.0.2: This update for the Desktop Alerts contains changes to behavior for setting custom sounds for alerts. I have removed ALERTWAV.TXT from the project, and also removed DA_DEFAULTSOUND from the VFPALERT.H file. The AlertManager class and Alert class both have a "default" cSound of ADDBS(JUSTPATH(_VFP.ServerName))+"alert.wav" --- so, as long as you distribute a sound file with the file name "alert.wav" along with the EXE, that file will be used. You can set your own sound file globally by setti...MCEBuddy 2.x: MCEBuddy 2.2.15: Changelog for 2.2.15 (32bit and 64bit) 1. Added support for %originalfilepath% to get the source file full path. Used for custom commands only. 2. Added support for better parsing of Media Portal XML files to extract ShowName and Episode Name and download additional details from TVDB (like Season No, Episode No etc). 3. Added support for TVDB seriesID in metadata 4. Added support for eMail non blocking UI testCrashReporter.NET : Exception reporting library for C# and VB.NET: CrashReporter.NET 1.2: *Added html mail format which shows hierarchical exception report for better understanding.PDF Viewer Web part: PDF Viewer Web Part: PDF Viewer Web PartIIS Express Manager: IIS Express Manager v 0.5B: Several added features, including adding site and right click menu for sites; which allows you to start/stop site, view it directly in browser etc.Chris on SharePoint Solutions: View Grid Banding - v1.0: Initial release of the View Creation and Management Page Column Selector Banding solution.Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.67: Fix issue #18629 - incorrectly handling null characters in string literals and not throwing an error when outside string literals. update for Issue #18600 - forgot to make the ///#DEBUG= directive also set a known-global for the given debug namespace. removed the kill-switch for disregarding preprocessor define-comments (///#IF and the like) and created a separate CodeSettings.IgnorePreprocessorDefines property for those who really need to turn that off. Some people had been setting -kil...Lakana - WPF Framework: Lakana V2: Lakana V2 contains : - Lakana WPF Forms (with sample project) - Lakana WPF Navigation (with sample project)Microsoft SQL Server Product Samples: Database: OData QueryFeed workflow activity: The OData QueryFeed sample activity shows how to create a workflow activity that consumes an OData resource, and renders entity properties in a Microsoft Excel 2010 worksheet or Microsoft Word 2010 document. Using the sample QueryFeed activity, you can consume any OData resource. The sample activity uses LINQ to project OData metadata into activity designer expression items. By setting activity expressions, a fully qualified OData query string is constructed consisting of Resource, Filter, Or...New ProjectsCachalote-Todo: Cachalote TodoCommerce Server Tools: A collection of tools and samples for Microsoft and Ascentium Commerce ServerCopiarArquivosNaRede: facilitate the process of copying files from one machine to host several machines,should be used primarily for network administrators and support teams.CS 3750 Team Ghana: Weber State University's Computer Science Department students are working on feature completing the Ghana Hospital inventory system in Asp.Net C#.CUDAFY TSP: Solving the Traveling Salesman Problem in C# on the GPGPU.Deixei Software Factory: Line Of Business application are one of the most important assets in a enterprise environment. You do not need to build the basis over and over. FocusMeter: Tiny tray application for tracking distractions.FrontTest: ????????generalshop: Host a number of ASP.NET controls I developed overtime: 1, RolloverImageButtonjschome: ??????。OVS Web App: OVS web appPeoplePicker Port Tester: The PeoplePicker Port Tester helps with troubleshooting PeoplePicker issues.Project Demo: hajshjdhfjkhdskfhdkhfkahdkjProyectoLenguaje2: proyecto de lenguaje de programación II Por: Moreira Kennedy Palma LuisSharePoint 2010 and 2013 jQuery / JS code samples: Various code samples for SharePoint 2010 and SharePoint 2013Strom: A projectsuperScheduler: A TDD project to help the contributors learn and have fun. The result should be something that looks like a cross platform task scheduler.......Troll Face SDK: Application project using the Face SDK for Windows Phone for demonstration purposeUseful PowerShell Scripts: Powerful & useful PowerShell scriptsWeiTalk: Sina weibo for Windows PhoneXML.NET Serializer: Striving for: - Great performance - Very easy to use - Very flexible and configurable - Ability to configure both via configuration file and/or attributes

Read the article

Search Results

Search found 77 results on 4 pages for 'gpgpu'.

Page 3/4 | < Previous Page | 1 2 3 4 | Next Page >

- by Chris

- by Rafael Spring

- by quadelirus

- by Jaime Pardos

- by scatman

- by Gulshan

- by cuda-dev

- by nlight

- by Lucas

- by hanDerPeder

- by user355008

- by ardiyu07

- by Manolete

- by Daniel Moth

- by Vanangamudi

- by Fanatic23

- by Daniel Moth

- by JoshReuben

- by Daniel Moth

- by notabene

- by TraumaPony

- by caffiend

- by Mehrdad Afshari

- by JoshReuben

< Previous Page | 1 2 3 4 | Next Page >