Why is Double.Parse so slow?

Posted by alexhildyard on Geeks with Blogs
Published on Wed, 27 Jun 2012 17:17:47 GMT

I was recently investigating a bottleneck in one of my applications, which read a CSV file from disk using a TextReader a line at a time, split the tokens, called Double.Parse on each one, then shunted the results into an object list. I was surprised to find it was actually the Double.Parse which seemed to be taking up most of the time.
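
For context, the read loop described above might look roughly like the sketch below. The names CsvSketch and ReadRows are hypothetical, and the comma delimiter and invariant culture are assumptions rather than details of the original application.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class CsvSketch
{
    // Reads delimited doubles one line at a time from any TextReader.
    // Assumptions: comma-delimited tokens, invariant-culture number format.
    public static List<double[]> ReadRows(TextReader reader)
    {
        var rows = new List<double[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }

            string[] tokens = line.Split(',');
            var values = new double[tokens.Length];
            for (int i = 0; i < tokens.Length; i++)
            {
                // This is the call that dominated the profile.
                values[i] = Double.Parse(tokens[i], CultureInfo.InvariantCulture);
            }
            rows.Add(values);
        }
        return rows;
    }
}
```

A StringReader or a StreamReader can be passed in equally well, since both derive from TextReader.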

Googling turned up this, which is a little unfocused in places but throws out some excellent ideas:

  • It makes more sense to work with a binary format directly than to coerce strings into doubles
  • There is a significant performance improvement in composing doubles directly from the byte stream via long intermediaries
  • String.Split is inefficient on fixed-length records
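
To illustrate the second idea, here is a minimal sketch of storing doubles as raw 64-bit patterns and rebuilding them through long intermediaries. The BinaryDoubles class and its method names are hypothetical, and the 8-bytes-per-value layout is an assumption, not the format discussed in the linked article.

```csharp
using System;
using System.IO;

static class BinaryDoubles
{
    // Writes each double as its raw 64-bit pattern (8 bytes per value).
    public static byte[] Write(double[] values)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (double value in values)
            {
                writer.Write(BitConverter.DoubleToInt64Bits(value));
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads 8 bytes at a time as a long, then reinterprets the bits as a double.
    // No string parsing is involved at any point.
    public static double[] Read(byte[] bytes)
    {
        var values = new double[bytes.Length / sizeof(long)];
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            for (int i = 0; i < values.Length; i++)
            {
                values[i] = BitConverter.Int64BitsToDouble(reader.ReadInt64());
            }
        }
        return values;
    }
}
```

Because the bit pattern is copied verbatim, special values such as NaN pass through the same code path as every other value.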

In fact, my problem turned out to be both more insidious and more mundane: a simple case of bad data in, bad data out. I had been serialising my Doubles as strings, so when I inadvertently divided by zero and produced a NaN, it was written out as the string "NaN" without error. And because I was reading the file back with Double.Parse, these "NaN" fields were (correctly) parsed into real Double values, again without error. The issue is that Double.Parse("NaN") is incredibly slow: of the order of 2000x slower per call than parsing a valid double. For example, the code below took 357ms to parse 1,000 NaNs, versus 15ms to parse 100,000 valid doubles (roughly 0.36ms per NaN against 0.00015ms per valid value).

            // Requires: using System; using System.Diagnostics;
            const int invalid_iterations = 1000;
            const int valid_iterations = invalid_iterations * 100;
            const string invalid_string = "NaN";
            const string valid_string = "3.14159265";

            // Stopwatch is intended for measuring elapsed time;
            // DateTime.Now has much coarser resolution.
            Stopwatch timer = Stopwatch.StartNew();

            for (int i = 0; i < invalid_iterations; i++)
            {
                double invalid_double = Double.Parse(invalid_string);
            }

            // ElapsedMilliseconds is the total elapsed time. TimeSpan.Milliseconds,
            // by contrast, is only the 0-999 millisecond component of an interval.
            Console.WriteLine("{0} iterations of invalid double, time taken (ms): {1}",
                invalid_iterations,
                timer.ElapsedMilliseconds);

            timer.Restart();

            for (int i = 0; i < valid_iterations; i++)
            {
                double valid_double = Double.Parse(valid_string);
            }

            Console.WriteLine("{0} iterations of valid double, time taken (ms): {1}",
                valid_iterations,
                timer.ElapsedMilliseconds);
 
I think the moral is to look at the context, specifically the data, as well as the code itself. Once I had corrected my data, the performance of Double.Parse was perfectly acceptable. It could clearly still have been improved, but it was now sufficient for my needs.
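
Where the data cannot be corrected at source, one possible workaround is to test for the "NaN" token before calling Double.Parse. This is only a sketch and not something I measured: NaNAwareParser is a hypothetical helper, and it assumes the token is exactly "NaN" (the invariant-culture symbol) with no surrounding whitespace.

```csharp
using System;
using System.Globalization;

static class NaNAwareParser
{
    // Returns double.NaN for the literal token "NaN" without calling
    // Double.Parse; every other token is parsed as usual.
    public static double Parse(string token)
    {
        if (string.Equals(token, "NaN", StringComparison.Ordinal))
        {
            return double.NaN;
        }
        return Double.Parse(token, CultureInfo.InvariantCulture);
    }
}
```

Counting how often the first branch is taken is also a cheap way to find out how much bad data a file contains.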

© Geeks with Blogs or respective owner