I recently asked about trying to optimise a Python loop for a scientific application, and received an excellent, smart way of recoding it within NumPy which reduced execution time by a factor of around 100 for me!
However, calculation of the B value is actually nested within a few other loops, because it is evaluated at a regular grid of positions. Is there a similarly smart NumPy rewrite to shave time off this procedure?
I suspect the performance gain for this part would be less marked, and the disadvantages would presumably be that it would not be possible to report back to the user on the progress of the calculation, that the results could not be written to the output file until the end of the calculation, and possibly that doing this in one enormous step would have memory implications? Is it possible to circumvent any of these?
import numpy as np
import time
def reshape_vector(v):
    b = np.empty((3,1))
    for i in range(3):
        b[i][0] = v[i]
    return b
def unit_vectors(r):
     return r / np.sqrt((r*r).sum(0))
def calculate_dipole(mu, r_i, mom_i):
    relative = mu - r_i
    r_unit = unit_vectors(relative)
    A = 1e-7
    num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i)
    den = np.sqrt(np.sum(relative*relative, 0))**3
    B = np.sum(num/den, 1)
    return B
N = 20000 # number of dipoles
r_i = np.random.random((3,N)) # positions of dipoles
mom_i = np.random.random((3,N)) # moments of dipoles
a = np.random.random((3,3)) # three basis vectors for this crystal
n = [10,10,10] # points at which to evaluate sum
gamma_mu = 135.5 # a constant
t_start = time.clock()
for i in range(n[0]):
    r_frac_x = np.float(i)/np.float(n[0])
    r_test_x = r_frac_x * a[0]
    for j in range(n[1]):
        r_frac_y = np.float(j)/np.float(n[1])
        r_test_y = r_frac_y * a[1]
        for k in range(n[2]):
            r_frac_z = np.float(k)/np.float(n[2])
            r_test = r_test_x +r_test_y + r_frac_z * a[2]
            r_test_fast = reshape_vector(r_test)
            B = calculate_dipole(r_test_fast, r_i, mom_i)
            omega = gamma_mu*np.sqrt(np.dot(B,B))
            # write r_test, B and omega to a file
    frac_done = np.float(i+1)/(n[0]+1)
    t_elapsed = (time.clock()-t_start)
    t_remain = (1-frac_done)*t_elapsed/frac_done
    print frac_done*100,'% done in',t_elapsed/60.,'minutes...approximately',t_remain/60.,'minutes remaining'