Matrix Multiplication with Threads: Why is it not faster?

Posted by prelic on Stack Overflow See other posts from Stack Overflow or by prelic
Published on 2010-06-06T22:29:35Z Indexed on 2010/06/06 22:32 UTC
Read the original article Hit count: 270

Filed under:
|

Hey all,

So I've been playing around with pthreads, specifically trying to calculate the product of two matrices. My code is extremely messy because it was just supposed to be a quick little fun project for myself, but the thread theory I used was very similar to:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define M 3
#define K 2
#define N 3
#define NUM_THREADS 10

int A [M][K] = { {1,4}, {2,5}, {3,6} };
int B [K][N] = { {8,7,6}, {5,4,3} };
int C [M][N];

struct v {
   int i; /* row */
   int j; /* column */
};

void *runner(void *param); /* the thread */

int main(int argc, char *argv[]) {

   int i,j, count = 0;
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         //Assign a row and column for each thread
         struct v *data = (struct v *) malloc(sizeof(struct v));
         data->i = i;
         data->j = j;
         /* Now create the thread passing it data as a parameter */
         pthread_t tid;       //Thread ID
         pthread_attr_t attr; //Set of thread attributes
         //Get the default attributes
         pthread_attr_init(&attr);
         //Create the thread
         pthread_create(&tid,&attr,runner,data);
         //Make sure the parent waits for all thread to complete
         pthread_join(tid, NULL);
         count++;
      }
   }

   //Print out the resulting matrix
   for(i = 0; i < M; i++) {
      for(j = 0; j < N; j++) {
         printf("%d ", C[i][j]);
     }
      printf("\n");
   }
}

//The thread will begin control in this function
void *runner(void *param) {
   struct v *data = param; // the structure that holds our data
   int n, sum = 0; //the counter and sum

   //Row multiplied by column
   for(n = 0; n< K; n++){
      sum += A[data->i][n] * B[n][data->j];
   }
   //assign the sum to its coordinate
   C[data->i][data->j] = sum;

   //Exit the thread
   pthread_exit(0);
}

source: http://macboypro.com/blog/2009/06/29/matrix-multiplication-in-c-using-pthreads-on-linux/

For the non-threaded version, I used the same setup (3 2-d matrices, dynamically allocated structs to hold r/c), and added a timer. First trials indicated that the non-threaded version was faster. My first thought was that the dimensions were too small to notice a difference, and it was taking longer to create the threads. So I upped the dimensions to about 50x50, randomly filled, and ran it, and I'm still not seeing any performance upgrade with the threaded version.

What am I missing here?

© Stack Overflow or respective owner

Related posts about c

    Related posts about threads