Trying to reduce the speed overhead of an almost-but-not-quite-int number class

Posted by Fumiyo Eda on Stack Overflow See other posts from Stack Overflow or by Fumiyo Eda
Published on 2011-02-19T14:16:56Z Indexed on 2011/02/19 15:25 UTC
Read the original article Hit count: 238

I have implemented a C++ class which behaves very similarly to the standard int type. The difference is that it has an additional concept of "epsilon" which represents some tiny value that is much less than 1, but greater than 0. One way to think of it is as a very wide fixed point number with 32 MSBs (the integer parts), 32 LSBs (the epsilon parts) and a huge sea of zeros in between.

The following class works, but introduces a ~2x speed penalty in the overall program. (The program includes code that has nothing to do with this class, so the actual speed penalty of this class is probably much greater than 2x.) I can't paste the code that is using this class, but I can say the following:

+, -, +=, <, > and >= are the only heavily used operators. Use of setEpsilon() and getInt() is extremely rare. * is also rare, and does not even need to consider the epsilon values at all.

Here is the class:

#include <limits>

struct int32Uepsilon {
typedef int32Uepsilon Self;

int32Uepsilon () { _value = 0;
                   _eps   = 0; }
int32Uepsilon (const int &i) { _value = i;
                               _eps   = 0; }
void setEpsilon() { _eps = 1; }
Self operator+(const Self &rhs) const { Self result = *this;
                                      result._value += rhs._value;
                                      result._eps   += rhs._eps;
                                      return result; }
Self operator-(const Self &rhs) const { Self result = *this;
                                      result._value -= rhs._value;
                                      result._eps   -= rhs._eps;
                                      return result; }
Self operator-(               ) const { Self result = *this;
                                      result._value = -result._value;
                                      result._eps   = -result._eps;
                                      return result; }
Self operator*(const Self &rhs) const { return this->getInt() * rhs.getInt(); } // XXX: discards epsilon

bool operator<(const Self &rhs) const { return (_value < rhs._value) ||
                                             (_value == rhs._value && _eps < rhs._eps); }
bool operator>(const Self &rhs) const { return (_value > rhs._value) ||
                                             (_value == rhs._value && _eps > rhs._eps); }
bool operator>=(const Self &rhs) const { return (_value >= rhs._value) ||
                                             (_value == rhs._value && _eps >= rhs._eps); }

Self &operator+=(const Self &rhs) { this->_value += rhs._value;
                                  this->_eps   += rhs._eps;
                                  return *this; }
Self &operator-=(const Self &rhs) { this->_value -= rhs._value;
                                  this->_eps   -= rhs._eps;
                                  return *this; }
int getInt() const { return(_value); }

private:
  int _value;
  int _eps;
};

namespace std {
template<>
struct numeric_limits<int32Uepsilon> {
  static const bool is_signed  = true;
  static int max() { return 2147483647; }
}
};

The code above works, but it is quite slow. Does anyone have any ideas on how to improve performance? There are a few hints/details I can give that might be helpful:

  • 32 bits are definitely insufficient to hold both _value and _eps. In practice, up to 24 ~ 28 bits of _value are used and up to 20 bits of _eps are used.
  • I could not measure a significant performance difference between using int32_t and int64_t, so memory overhead itself is probably not the problem here.
  • Saturating addition/subtraction on _eps would be cool, but isn't really necessary.
  • Note that the signs of _value and _eps are not necessarily the same! This broke my first attempt at speeding this class up.
  • Inline assembly is no problem, so long as it works with GCC on a Core i7 system running Linux!

© Stack Overflow or respective owner

Related posts about c++

Related posts about optimization