I have a large, dense matrix A, and I want to solve the linear system Ax = b using an iterative method (the plan was to use MATLAB's built-in GMRES). For more than 10,000 rows, this is too much for my computer to store in memory, but I know that the entries of A are constructed from two known vectors x and y of length N, and the entries satisfy:
A(i,j) = 0.5*((x[i]-x[j])^2 + (y[i]-y[j])^2) * log((x[i]-x[j])^2 + (y[i]-y[j])^2)
MATLAB's GMRES accepts as input a function handle that computes the matrix-vector product A*x, which lets me handle larger matrices than I can store in memory. To write the matrix-vector product function, I first tried going row by row in MATLAB with some vectorization, while avoiding forming the entire array A (since it would be too large). Unfortunately this was fairly slow in my GMRES application. My plan was then to write a MEX file for MATLAB, which is written in C and ideally should be significantly faster than the MATLAB code. I'm rather new to C, though, so this went rather poorly, and my naive attempt at writing the code in C was slower than my partially vectorized MATLAB attempt.
#include <math.h>
#include "mex.h"

/* Computes b = A*x row by row, where A(i,j) = 0.5*r2*log(r2) and r2 is the
   squared distance between points i and j.  The diagonal (i == j) is skipped
   because 0*log(0) is taken to be 0.                                        */
void Aproduct(double *x, double *ctrs_x, double *ctrs_y, double *b, mwSize n)
{
    mwSize i;
    mwSize j;
    double val;
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++) {
            val = pow(ctrs_x[i]-ctrs_x[j],2) + pow(ctrs_y[i]-ctrs_y[j],2);
            b[i] = b[i] + .5 * val * log(val) * x[j];
        }
        for (j = i+1; j < n; j++) {
            val = pow(ctrs_x[i]-ctrs_x[j],2) + pow(ctrs_y[i]-ctrs_y[j],2);
            b[i] = b[i] + .5 * val * log(val) * x[j];
        }
    }
}
The above is the computational portion of the code for the MATLAB MEX file (which is slightly modified C, if I understand correctly). Note that I skip the case i = j, since there the variable val would be 0 and the term becomes 0*log(0), which should be interpreted as 0 for my purposes.
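In case it is relevant, the kernel above is driven by a gateway routine; a minimal version would look something like the following (a sketch appended to the same file as Aproduct, so no extra includes are needed; the MATLAB-side call b = Aprod_mex(x, ctrs_x, ctrs_y), the name Aprod_mex, and the argument order are just illustrative assumptions, and input checking is omitted):

/* Minimal MEX gateway (sketch): b = Aprod_mex(x, ctrs_x, ctrs_y) */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n       = mxGetNumberOfElements(prhs[0]);
    double *x      = mxGetPr(prhs[0]);
    double *ctrs_x = mxGetPr(prhs[1]);
    double *ctrs_y = mxGetPr(prhs[2]);
    double *b;

    /* mxCreateDoubleMatrix zero-initializes, so b accumulates from 0 */
    plhs[0] = mxCreateDoubleMatrix(n, 1, mxREAL);
    b = mxGetPr(plhs[0]);

    Aproduct(x, ctrs_x, ctrs_y, b, n);
}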
Is there a more efficient or faster way to write this? When I call this C function via the MEX file in MATLAB, it is quite slow, slower even than the MATLAB method I used. This surprises me, since I expected C code to be much faster than MATLAB.
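One direction that looks promising to me is exploiting the symmetry A(i,j) = A(j,i), so each pair of points is visited only once, and replacing pow(...,2) with plain multiplications. Here is a rough, untested sketch of that variant (the name Aproduct_sym is mine, and unlike the version above it zeroes b itself instead of relying on the caller):

#include <math.h>
#include "mex.h"

/* Symmetric version: A(i,j) == A(j,i), so each pair (i,j) with j < i is
   visited once and contributes to both b[i] and b[j].  This roughly halves
   the number of log() evaluations compared with the two-loop version above. */
void Aproduct_sym(double *x, double *ctrs_x, double *ctrs_y, double *b, mwSize n)
{
    mwSize i, j;
    for (i = 0; i < n; i++)
        b[i] = 0.0;                       /* start from zero explicitly */
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++) {
            double dx = ctrs_x[i] - ctrs_x[j];
            double dy = ctrs_y[i] - ctrs_y[j];
            double r2 = dx*dx + dy*dy;    /* squared distance, no pow() */
            double a  = 0.5 * r2 * log(r2);
            b[i] += a * x[j];             /* contribution of column j to row i */
            b[j] += a * x[i];             /* symmetric contribution to row j   */
        }
    }
}

Since each log() is evaluated once per pair instead of twice, I would hope for roughly a factor-of-two improvement over my version, but I have not measured it.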
The alternative, partially vectorized MATLAB method that I am comparing against is:
function Ax = Aprod(x,ctrs)
    n = length(x);
    Ax = zeros(n,1);
    for j = 1:(n-3)
        % 0.5*r2*log(r2), where r2 is the squared distance from point j to every point
        v = .5*((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2).*log((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2);
        v(j) = 0;                   % the j-th entry is 0*log(0), taken to be 0
        Ax(j) = dot(v,x(1:n-3));
    end
(The n-3 is because there are actually 3 extra components, but they are dealt with separately, so I excluded that code.) This is partly vectorized and only needs one for loop, so it makes some sense that it is faster. However, I was hoping I could go even faster with a C MEX file.
Any suggestions or help would be greatly appreciated! Thanks!
EDIT: I should be clearer. I am open to any faster method that helps me use GMRES to solve with this matrix; what I need is a faster way of computing the matrix-vector product without explicitly forming the full array in memory. Thanks!
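(One more idea I have been wondering about, in case it is useful: parallelizing the row loop with OpenMP. Below is a rough, untested sketch; it assumes the MEX file is compiled with OpenMP enabled, which is compiler- and platform-dependent, e.g. adding -fopenmp to the compiler and linker flags when building with gcc. The name Aproduct_omp and the guard r2 > 0.0, which skips the i == j term, are my own choices.)

#include <math.h>
#include "mex.h"

/* Same kernel as before, with the row loop parallelized.  Each iteration i
   writes only b[i], so the iterations are independent and safe to run in
   parallel.  The pragma is ignored if OpenMP is not enabled at compile time. */
void Aproduct_omp(double *x, double *ctrs_x, double *ctrs_y, double *b, mwSize n)
{
    mwSignedIndex i, j;                 /* signed indices for the OpenMP loop */
    #pragma omp parallel for private(j) schedule(static)
    for (i = 0; i < (mwSignedIndex)n; i++) {
        double acc = 0.0;
        for (j = 0; j < (mwSignedIndex)n; j++) {
            double dx = ctrs_x[i] - ctrs_x[j];
            double dy = ctrs_y[i] - ctrs_y[j];
            double r2 = dx*dx + dy*dy;
            if (r2 > 0.0)               /* skips i == j, where 0*log(0) := 0 */
                acc += 0.5 * r2 * log(r2) * x[j];
        }
        b[i] = acc;
    }
}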