jhenning - Developer IT

IBM "per core" comparisons for SPECjEnterprise2010

- by jhenning

I recently stumbled upon a blog entry from Roman Kharkovski (an IBM employee) comparing some SPECjEnterprise2010 results for IBM vs. Oracle. Mr. Kharkovski's blog claims that SPARC delivers half the transactions per core vs. POWER7. Prior to any argument, I should say that my predisposition is to like Mr. Kharkovski, because he says that his blog is intended to be factual; that the intent is to try to avoid marketing hype and FUD tactic; and mostly because he features a picture of himself wearing a bike helmet (me too). Therefore, in a spirit of technical argument, rather than FUD fight, there are a few areas in his comparison that should be discussed. Scaling is not free For any benchmark, if a small system scores 13k using quantity R1 of some resource, and a big system scores 57k using quantity R2 of that resource, then, sure, it's tempting to divide: is 13k/R1 > 57k/R2 ? It is tempting, but not necessarily educational. The problem is that scaling is not free. Building big systems is harder than building small systems. Scoring 13k/R1 on a little system provides no guarantee whatsoever that one can sustain that ratio when attempting to handle more than 4 times as many users. Choosing the denominator radically changes the picture When ratios are used, one can vastly manipulate appearances by the choice of denominator. In this case, lots of choices are available for the resource to be compared (R1 and R2 above). IBM chooses to put cores in the denominator. Mr. Kharkovski provides some reasons for that choice in his blog entry. And yet, it should be noted that the very concept of a core is: arbitrary: not necessarily comparable across vendors; fluid: modern chips shift chip resources in response to load; and invisible: unless you have a microscope, you can't see it. By contrast, one can actually see processor chips with the naked eye, and they are a bit easier to count. If we put chips in the denominator instead of cores, we get: 13161.07 EjOPS / 4 chips = 3290 EjOPS per chip for IBM vs 57422.17 EjOPS / 16 chips = 3588 EjOPS per chip for Oracle The choice of denominator makes all the difference in the appearance. Speaking for myself, dividing by chips just seems to make more sense, because: I can see chips and count them; and I can accurately compare the number of chips in my system to the count in some other vendor's system; and Tthe probability of being able to continue to accurately count them over the next 10 years of microprocessor development seems higher than the probability of being able to accurately and comparably count "cores". SPEC Fair use requirements Speaking as an individual, not speaking for SPEC and not speaking for my employer, I wonder whether Mr. Kharkovski's blog article, taken as a whole, meets the requirements of the SPEC Fair Use rule www.spec.org/fairuse.html section I.D.2. For example, Mr. Kharkovski's footnote (1) begins Results from http://www.spec.org as of 04/04/2013 Oracle SUN SPARC T5-8 449 EjOPS/core SPECjEnterprise2010 (Oracle's WLS best SPECjEnterprise2010 EjOPS/core result on SPARC). IBM Power730 823 EjOPS/core (World Record SPECjEnterprise2010 EJOPS/core result) The questionable tactic, from a Fair Use point of view, is that there is no such metric at the designated location. At www.spec.org, You can find the SPEC metric 57422.17 SPECjEnterprise2010 EjOPS for Oracle and You can also find the SPEC metric 13161.07 SPECjEnterprise2010 EjOPS for IBM. Despite the implication of the footnote, you will not find any mention of 449 nor anything that says 823. SPEC says that you can, under its fair use rule, derive your own values; but it emphasizes: "The context must not give the appearance that SPEC has created or endorsed the derived value." Substantiation and transparency Although SPEC disclaims responsibility for non-SPEC information (section I.E), it says that non-SPEC data and methods should be accurate, should be explained, should be substantiated. Unfortunately, it is difficult or impossible for the reader to independently verify the pricing: Were like units compared to like (e.g. list price to list price)? Were all components (hw, sw, support) included? Were all fees included? Note that when tpc.org shows IBM pricing, there are often items such as "PROCESSOR ACTIVATION" and "MEMORY ACTIVATION". Without the transparency of a detailed breakdown, the pricing claims are questionable. T5 claim for "Fastest Processor" Mr. Kharkovski several times questions Oracle's claim for fastest processor, writing You see, when you publish industry benchmarks, people may actually compare your results to other vendor's results. Well, as we performance people always say, "it depends". If you believe in performance-per-core as the primary way of looking at the world, then yes, the POWER7+ is impressive, spending its chip resources to support up to 32 threads (8 cores x 4 threads). Or, it just might be useful to consider performance-per-chip. Each SPARC T5 chip allows 128 hardware threads to be simultaneously executing (16 cores x 8 threads). The Industry Standard Benchmark that focuses specifically on processor chip performance is SPEC CPU2006. For this very well known and popular benchmark, SPARC T5: provides better performance than both POWER7 and POWER7+, for 1 chip vs. 1 chip, for 8 chip vs. 8 chip, for integer (SPECint_rate2006) and floating point (SPECfp_rate2006), for Peak tuning and for Base tuning. For example, at the 8-chip level, integer throughput (SPECint_rate2006) is: 3750 for SPARC 2170 for POWER7+. You can find the details at the March 2013 BestPerf CPU2006 page SPEC is a trademark of the Standard Performance Evaluation Corporation, www.spec.org. The two specific results quoted for SPECjEnterprise2010 are posted at the URLs linked from the discussion. Results for SPEC CPU2006 were verified at spec.org 1 July 2013, and can be rechecked here.

Developer IT