My kernel only works in block (0,0)

Posted by ZeroDivide on Stack Overflow
Published on 2010-06-09T18:20:50Z

Filed under: cuda

I am trying to write a simple matrix multiplication application that multiplies two square matrices using CUDA. My kernel only computes correct results in block (0,0) of the grid.

This is my invocation code:

dim3 dimBlock(4,4,1);
dim3 dimGrid(4,4,1);
// Launch the kernel
MatrixMulKernel<<<dimGrid,dimBlock>>>(Md,Nd,Pd,Width);

This is my kernel function:

__global__ void MatrixMulKernel(int* Md, int* Nd, int* Pd, int Width)
{
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;
    const int bx = blockIdx.x;
    const int by = blockIdx.y;
    // Global row and column of the Pd element this thread computes
    const int row = by * blockDim.y + ty;
    const int col = bx * blockDim.x + tx;

    // Skip threads that fall outside the Width x Width matrix
    if (row >= Width || col >= Width)
        return;

    // Pvalue accumulates the Pd element computed by this thread
    int Pvalue = 0;

    for (int k = 0; k < Width; k++)
    {
        Pvalue += Md[row * Width + k] * Nd[k * Width + col];
    }

    // Each thread writes one element of the result to device memory;
    // no __syncthreads() is needed because no shared memory is used
    Pd[row * Width + col] = Pvalue;
}

I think the problem may have something to do with memory, but I'm a bit lost. What should I do to make this code work across several blocks?
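The row and column indexing in the kernel already accounts for blockIdx and blockDim, so the cause may lie in host code that is not shown in the question. A minimal host-side sketch of the usual things to check follows: that allocations and copies are sized in bytes (sizeof(int) * Width * Width), that the grid is derived from Width rather than hard-coded, and that launch errors are inspected. The names MatrixMulKernelGuarded and multiplyOnGpu are hypothetical and introduced only for this illustration. The sketch assumes row-major square matrices and a CUDA toolkit recent enough to provide cudaDeviceSynchronize.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical variant of the kernel: same row/column indexing as in the
// question, with a bounds guard so extra threads in edge blocks do nothing.
__global__ void MatrixMulKernelGuarded(const int* Md, const int* Nd, int* Pd, int Width)
{
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= Width || col >= Width)
        return;

    int Pvalue = 0;
    for (int k = 0; k < Width; k++)
        Pvalue += Md[row * Width + k] * Nd[k * Width + col];
    Pd[row * Width + col] = Pvalue;
}

// Hypothetical helper: computes P = M * N on the GPU for row-major
// Width x Width matrices. Returns true only if every CUDA call succeeded.
bool multiplyOnGpu(const std::vector<int>& M, const std::vector<int>& N,
                   std::vector<int>& P, int Width)
{
    const size_t count = static_cast<size_t>(Width) * Width;
    const size_t bytes = count * sizeof(int);  // sizes are in bytes, not elements
    P.resize(count);

    int *Md = nullptr, *Nd = nullptr, *Pd = nullptr;
    bool ok = cudaMalloc(&Md, bytes) == cudaSuccess
           && cudaMalloc(&Nd, bytes) == cudaSuccess
           && cudaMalloc(&Pd, bytes) == cudaSuccess
           && cudaMemcpy(Md, M.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(Nd, N.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 dimBlock(4, 4, 1);
        // Round up so the grid covers the whole matrix for any Width.
        dim3 dimGrid((Width + dimBlock.x - 1) / dimBlock.x,
                     (Width + dimBlock.y - 1) / dimBlock.y, 1);
        MatrixMulKernelGuarded<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);

        // Check both the launch itself and the kernel's execution.
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(P.data(), Pd, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe after a partial failure.
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
    return ok;
}
```

Calling multiplyOnGpu with two 16 x 16 inputs and comparing the result against a plain CPU triple loop is a quick way to see whether elements outside block (0,0) are correct. If only the first 4 x 4 tile matches, the byte counts passed to cudaMalloc and cudaMemcpy and the value passed as Width are reasonable places to look first.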
