Optimize code performance when odd/even threads are doing different things in CUDA

Posted by Orion Nebula on Stack Overflow
Published on 2010-05-18T11:29:15Z

Filed under: cuda | odd | thread | Performance | optimization

Hi all!

I have two large vectors and I am trying to do an element-wise multiplication with neighbouring elements swapped: each even-indexed element of the first vector (counting from zero, as in the code further down) is multiplied by the next odd-indexed element of the second vector, and each odd-indexed element of the first vector is multiplied by the preceding even-indexed element of the second vector.

Example (elements numbered from 1 here, so V1(1) is the element at index 0):

vector 1 is V1(1) V1(2) V1(3) V1(4)

vector 2 is V2(1) V2(2) V2(3) V2(4)

V1(1) * V2(2)

V1(3) * V2(4)

V1(2) * V2(1)

V1(4) * V2(3)

I have written CUDA code to do this (Pds holds the elements of the first vector in shared memory, Nds those of the second vector):

// Instead of using % 2, I test the lowest bit of tx to decide whether the index is odd or even, which I expect to be faster

if ((tx & 0x0001) ==  0x0000)
    Nds[tx+1] = Pds[tx] * Nds[tx+1];
else
    Nds[tx-1] = Pds[tx] * Nds[tx-1];
__syncthreads();
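
For reference, here is a minimal, self-contained sketch of the same pairing with the if/else replaced by an XOR on the thread index. The kernel name pairSwapMultiply, the host helper pairSwapMultiplyHost, the tile size TILE and the global-memory layout are assumptions made for illustration and are not part of my actual code. It assumes an even vector length and omits CUDA error checking. Whether it runs faster than the branching version has not been measured; the compiler may already turn such a short branch into predicated instructions.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile size (not from the original post). It must be even so
// that every thread's partner (tx ^ 1) lives in the same block.
constexpr int TILE = 256;

// Hypothetical kernel: mirrors the in-place update of Nds shown above,
// so the result at index j is v1[j ^ 1] * v2[j].
__global__ void pairSwapMultiply(const float* v1, const float* v2,
                                 float* out, int n)
{
    __shared__ float Pds[TILE];
    __shared__ float Nds[TILE];

    int tx = threadIdx.x;
    int i  = blockIdx.x * TILE + tx;

    // Stage both vectors in shared memory, padding the tail with zeros.
    Pds[tx] = (i < n) ? v1[i] : 0.0f;
    Nds[tx] = (i < n) ? v2[i] : 0.0f;
    __syncthreads();

    // Flipping the lowest bit maps 0<->1, 2<->3, ... so even and odd
    // threads run the same instructions with no if/else.
    int partner = tx ^ 1;
    Nds[partner] = Pds[tx] * Nds[partner];
    __syncthreads();

    if (i < n) out[i] = Nds[tx];
}

// Hypothetical host wrapper; error checking omitted for brevity.
std::vector<float> pairSwapMultiplyHost(const std::vector<float>& v1,
                                        const std::vector<float>& v2)
{
    assert(v1.size() == v2.size());
    int n = static_cast<int>(v1.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    size_t bytes = n * sizeof(float);
    float *d1 = nullptr, *d2 = nullptr, *dOut = nullptr;
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(d1, v1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, v2.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + TILE - 1) / TILE;
    pairSwapMultiply<<<blocks, TILE>>>(d1, d2, dOut, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dOut);
    return out;
}
```

Because tx ^ 1 is a one-to-one mapping within the block, each thread reads and writes a different element of Nds, just as in the branching version. Calling pairSwapMultiplyHost with V1 = {1, 2, 3, 4} and V2 = {5, 7, 11, 13} should give V1(2)*V2(1), V1(1)*V2(2), V1(4)*V2(3), V1(3)*V2(4), stored at the positions of the V2 elements.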

Is there any way to further accelerate this code or avoid divergence?

Thanks

© Stack Overflow or respective owner
