Search Results

Search found 1 results on 1 pages for 'xnov'.

Page 1/1 | 1 

  • thread management in nbody code of cuda-sdk

    - by xnov
    When I read the nbody code in Cuda-SDK, I went through some lines in the code and I found that it is a little bit different than their paper in GPUGems3 "Fast N-Body Simulation with CUDA". My questions are: First, why the blockIdx.x is still involved in loading memory from global to share memory as written in the following code? for (int tile = blockIdx.y; tile < numTiles + blockIdx.y; tile++) { sharedPos[threadIdx.x+blockDim.x*threadIdx.y] = multithreadBodies ? positions[WRAP(blockIdx.x + q * tile + threadIdx.y, gridDim.x) * p + threadIdx.x] : //this line positions[WRAP(blockIdx.x + tile, gridDim.x) * p + threadIdx.x]; //this line __syncthreads(); // This is the "tile_calculation" function from the GPUG3 article. acc = gravitation(bodyPos, acc); __syncthreads(); } isn't it supposed to be like this according to paper? I wonder why sharedPos[threadIdx.x+blockDim.x*threadIdx.y] = multithreadBodies ? positions[WRAP(q * tile + threadIdx.y, gridDim.x) * p + threadIdx.x] : positions[WRAP(tile, gridDim.x) * p + threadIdx.x]; Second, in the multiple threads per body why the threadIdx.x is still involved? Isn't it supposed to be a fix value or not involving at all because the sum only due to threadIdx.y if (multithreadBodies) { SX_SUM(threadIdx.x, threadIdx.y).x = acc.x; //this line SX_SUM(threadIdx.x, threadIdx.y).y = acc.y; //this line SX_SUM(threadIdx.x, threadIdx.y).z = acc.z; //this line __syncthreads(); // Save the result in global memory for the integration step if (threadIdx.y == 0) { for (int i = 1; i < blockDim.y; i++) { acc.x += SX_SUM(threadIdx.x,i).x; //this line acc.y += SX_SUM(threadIdx.x,i).y; //this line acc.z += SX_SUM(threadIdx.x,i).z; //this line } } } Can anyone explain this to me? Is it some kind of optimization for faster code?

    Read the article

1