Why is rvalue write in shared memory array serialised?

Posted by CJM on Ask Ubuntu
Published on 2012-04-05T21:25:39Z

I'm using CUDA 4.0 on a GPU with compute capability 2.1. One of my device functions is the following:

    __device__ void test(int n, int* itemp)  // itemp is a shared memory pointer
    {
        const int tid  = threadIdx.x;
        const int bdim = blockDim.x;

        int i, j, k;
        bool flag = false;

        itemp[tid] = 0;
        for (i = tid; i < n; i += bdim)
        {
            // { code that produces some values of "flag" }
        }
        itemp[tid] = flag;
    }

Each thread checks some conditions and produces a 0/1 flag. Then each thread writes its flag to the tid-th element of a shared int array. The store "itemp[tid] = flag;" gets serialized, while the store "itemp[tid] = 0;" does not. Since each thread writes its own consecutive 32-bit word, the writes should fall into distinct banks, so there should be no bank conflicts and no serialization here. Yet this one line causes a huge slowdown, and I want to get rid of it. Please help.
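To make the pattern concrete, here is a minimal, self-contained sketch of a complete kernel built around the same device function. The original condition code is elided, so the extra data parameter and the "any negative element" test are hypothetical stand-ins, as are the names flagKernel and countFlags. It assumes a single block launched with blockDim.x ints of dynamic shared memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the elided condition code: a thread's flag is
// set if any element it visits in its strided loop is negative. The extra
// "data" parameter is an assumption, not part of the original signature.
__device__ void test(int n, const int* data, int* itemp)  // itemp points to shared memory
{
    const int tid  = threadIdx.x;
    const int bdim = blockDim.x;

    bool flag = false;
    for (int i = tid; i < n; i += bdim) {
        if (data[i] < 0) flag = true;
    }
    itemp[tid] = flag;  // one 32-bit word per thread, consecutive addresses
}

// Hypothetical kernel: one block, blockDim.x ints of dynamic shared memory.
__global__ void flagKernel(int n, const int* data, int* count)
{
    extern __shared__ int itemp[];
    test(n, data, itemp);
    __syncthreads();  // make every thread's flag visible to thread 0

    if (threadIdx.x == 0) {
        int sum = 0;
        for (unsigned t = 0; t < blockDim.x; ++t) sum += itemp[t];
        *count = sum;  // writing to global memory keeps the work observable
    }
}

// Hypothetical host helper: returns how many threads set their flag,
// or -1 if any CUDA call fails.
int countFlags(const int* hostData, int n, int threads)
{
    int* dData  = nullptr;
    int* dCount = nullptr;
    int  count  = -1;

    if (cudaMalloc(&dData, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&dCount, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(dData, hostData, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess) {
        // third launch parameter: dynamic shared memory, one int per thread
        flagKernel<<<1, threads, threads * sizeof(int)>>>(n, dData, dCount);
        if (cudaGetLastError() != cudaSuccess ||
            cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
            count = -1;
        }
    }
    cudaFree(dData);   // cudaFree(nullptr) is a no-op
    cudaFree(dCount);
    return count;
}
```

For instance, with the 8 ints {1, -2, 3, 4, -5, 6, 7, 8} and 4 threads, thread 0 visits 1 and -5 and thread 1 visits -2 and 6, so countFlags returns 2. The count is copied back to global memory on purpose: if nothing observable depended on flag, the compiler would be free to drop the loop that computes it, so timing a variant that stores only 0 against one that stores flag may compare very different amounts of work.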
