I'm trying to write a function that supports broadcasting and is fast at the same time. However, numpy's zero-rank arrays are causing trouble as usual. I couldn't find anything useful on google, or by searching here.
So, I'm asking you. How should I implement broadcasting efficiently and handle zero-rank arrays at the same time?
This whole post became larger than anticipated, sorry.
Details:
To clarify what I'm talking about I'll give a simple example:
Say I want to implement a Heaviside step-function. I.e. a function that acts on the real axis, which is 0 on the negative side, 1 on the positive side, and from case to case either 0, 0.5, or 1 at the point 0.
Implementation
Masking
The most efficient way I found so far is the following. It uses boolean arrays as masks to assign the correct values to the corresponding slots in the output vector.
from numpy import *
def step_mask(x, limit=+1):
    """Heaviside step-function.
    y = 0  if x < 0
    y = 1  if x > 0
    See below for x == 0.
    Arguments:
        x       Evaluate the function at these points.
        limit   Which limit at x == 0?
                limit >  0:  y = 1
                limit == 0:  y = 0.5
                limit <  0:  y = 0
    Return:
        The values corresponding to x.
    """
    b = broadcast(x, limit)
    out = zeros(b.shape)
    out[x>0] = 1
    mask = (limit > 0) & (x == 0)
    out[mask] = 1
    mask = (limit == 0) & (x == 0)
    out[mask] = 0.5
    mask = (limit < 0) & (x == 0)
    out[mask] = 0
    return out
List Comprehension
The following-the-numpy-docs way is to use a list comprehension on the flat iterator of the broadcast object. However, list comprehensions become absolutely unreadable for such complicated functions.
def step_comprehension(x, limit=+1):
    b = broadcast(x, limit)
    out = empty(b.shape)
    out.flat = [ ( 1 if x_ > 0 else
                  ( 0 if x_ < 0 else
                   ( 1 if l_ > 0 else
                    ( 0.5 if l_ ==0 else
                     ( 0 ))))) for x_, l_ in b ]
    return out
For Loop
And finally, the most naive way is a for loop. It's probably the most readable option. However, Python for-loops are anything but fast. And hence, a really bad idea in numerics.
def step_for(x, limit=+1):
    b = broadcast(x, limit)
    out = empty(b.shape)
    for i, (x_, l_) in enumerate(b):
        if x_ > 0:
            out[i] = 1
        elif x_ < 0:
            out[i] = 0
        elif l_ > 0:
            out[i] = 1
        elif l_ < 0:
            out[i] = 0
        else:
            out[i] = 0.5
    return out
Test
First of all a brief test to see if the output is correct.
>>> x = array([-1, -0.1, 0, 0.1, 1])
>>> step_mask(x, +1)
array([ 0.,  0.,  1.,  1.,  1.])
>>> step_mask(x, 0)
array([ 0. ,  0. ,  0.5,  1. ,  1. ])
>>> step_mask(x, -1)
array([ 0.,  0.,  0.,  1.,  1.])
It is correct, and the other two functions give the same output.
Performance
How about efficiency? These are the timings:
In [45]: xl = linspace(-2, 2, 500001)
In [46]: %timeit step_mask(xl)
10 loops, best of 3: 19.5 ms per loop
In [47]: %timeit step_comprehension(xl)
1 loops, best of 3: 1.17 s per loop
In [48]: %timeit step_for(xl)
1 loops, best of 3: 1.15 s per loop
The masked version performs best as expected. However, I'm surprised that the comprehension is on the same level as the for loop.
Zero Rank Arrays
But, 0-rank arrays pose a problem. Sometimes you want to use a function scalar input. And preferably not have to worry about wrapping all scalars in at least 1-D arrays.
>>> step_mask(1)
Traceback (most recent call last):
  File "<ipython-input-50-91c06aa4487b>", line 1, in <module>
    step_mask(1)
  File "script.py", line 22, in step_mask
    out[x>0] = 1
IndexError: 0-d arrays can't be indexed.
>>> step_for(1)
Traceback (most recent call last):
  File "<ipython-input-51-4e0de4fcb197>", line 1, in <module>
    step_for(1)
  File "script.py", line 55, in step_for
    out[i] = 1
IndexError: 0-d arrays can't be indexed.
>>> step_comprehension(1)
array(1.0)
Only the list comprehension can handle 0-rank arrays. The other two versions
would need special case handling for 0-rank arrays.
Numpy gets a bit messy when you want to use the same code for arrays and scalars. However, I really like to have functions that work on as arbitrary input as possible. Who knows which parameters I'll want to iterate over at some point.
Question:
What is the best way to implement a function as the one above? Is there a way to avoid if scalar then like special cases?
I'm not looking for a built-in Heaviside. It's just a simplified example. In my code the above pattern appears in many places to make parameter iteration as simple as possible without littering the client code with for loops or comprehensions.
Furthermore, I'm aware of Cython, or weave & Co., or implementation directly in C. However, the performance of the masked version above is sufficient for the moment. And for the moment I would like to keep things as simple as possible.