CUDA Kernel Not Updating Global Variable
- by Taher Khokhawala
I am facing the following problem in a CUDA kernel. There is an array "cu_fx" in global memory. Each thread has a unique identifier jj, a local loop variable ii, and a local float variable temp.
The following code does not work: it never changes cu_fx[jj], which is still 0 at the end of the loop.
ii = 0;
cu_fx[jj] = 0;
while(ii < l)
{
    if(cu_y[ii] > 0)
        cu_fx[jj] += (cu_mu[ii]*cu_Kernel[(jj-start_row)*Kernel_w + ii]);
    else
        cu_fx[jj] -= (cu_mu[ii]*cu_Kernel[(jj-start_row)*Kernel_w + ii]);
    ii++;
}
But when I rewrite it using a temporary variable temp, it works fine.
ii = 0;
temp = 0;
while(ii < l)
{
    if(cu_y[ii] > 0)
        temp += (cu_mu[ii]*cu_Kernel[(jj-start_row)*Kernel_w + ii]);
    else
        temp -= (cu_mu[ii]*cu_Kernel[(jj-start_row)*Kernel_w + ii]);
    ii++;
}
cu_fx[jj] = temp;
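For anyone who wants to try this, here is a minimal, self-contained sketch of how the working version might sit inside a complete program. It is not my actual code: the kernel name compute_fx, the parameter list, the n_rows bound, the way jj is derived from the thread index, the host function run_example, and the input data are all assumptions made up for illustration. It also checks the launch status, so that a launch that failed is not mistaken for a kernel that ran and wrote nothing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel wrapper: the name compute_fx, the parameter list, the
// n_rows bound, and the way jj is derived are assumptions for illustration.
__global__ void compute_fx(float *cu_fx, const float *cu_y, const float *cu_mu,
                           const float *cu_Kernel, int l, int Kernel_w,
                           int start_row, int n_rows)
{
    int jj = start_row + blockIdx.x * blockDim.x + threadIdx.x;
    if (jj >= start_row + n_rows) return;   // extra threads do nothing

    float temp = 0.0f;                      // per-thread accumulator
    for (int ii = 0; ii < l; ++ii) {
        float term = cu_mu[ii] * cu_Kernel[(jj - start_row) * Kernel_w + ii];
        temp += (cu_y[ii] > 0) ? term : -term;
    }
    cu_fx[jj] = temp;                       // single write to global memory
}

// Hypothetical host helper: runs the kernel on a tiny made-up data set and
// copies the two results into out. Returns 0 on success, 1 on a CUDA error.
int run_example(float *out)
{
    const int l = 3, Kernel_w = 3, start_row = 0, n_rows = 2;
    const float y[l]  = {1.0f, -1.0f, 1.0f};
    const float mu[l] = {1.0f, 2.0f, 3.0f};
    const float K[n_rows * Kernel_w] = {1.0f, 1.0f, 1.0f, 2.0f, 2.0f, 2.0f};

    float *d_fx = nullptr, *d_y = nullptr, *d_mu = nullptr, *d_K = nullptr;
    cudaMalloc(&d_fx, n_rows * sizeof(float));
    cudaMalloc(&d_y, sizeof(y));
    cudaMalloc(&d_mu, sizeof(mu));
    cudaMalloc(&d_K, sizeof(K));
    cudaMemset(d_fx, 0, n_rows * sizeof(float));
    cudaMemcpy(d_y, y, sizeof(y), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mu, mu, sizeof(mu), cudaMemcpyHostToDevice);
    cudaMemcpy(d_K, K, sizeof(K), cudaMemcpyHostToDevice);

    compute_fx<<<1, 32>>>(d_fx, d_y, d_mu, d_K, l, Kernel_w, start_row, n_rows);

    // Picks up errors from the setup calls above and from the launch itself,
    // then waits for the kernel so that execution errors surface as well.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, d_fx, n_rows * sizeof(float),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_fx);
    cudaFree(d_y);
    cudaFree(d_mu);
    cudaFree(d_K);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    float fx[2];
    if (run_example(fx) != 0) return 1;
    printf("fx[0] = %g, fx[1] = %g\n", fx[0], fx[1]);
    return 0;
}
```

This should build with nvcc (for example, nvcc example.cu -o example, where the file name is arbitrary). With the made-up input above, the two results work out to 2 and 4.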
Can somebody please help with this problem? Thanks in advance.