Can the STREAM and GUPS (single CPU) benchmarks use non-local memory on a NUMA machine?

Posted by osgx on Stack Overflow
Published on 2010-03-25T17:04:04Z Indexed on 2010/03/26 11:13 UTC

Filed under: hpcc, benchmarking, stream, gups, hpc

Hello

I want to run two tests from HPCC: STREAM and GUPS.

They test memory bandwidth, latency, and throughput (in terms of random accesses).

Can I run the Single CPU STREAM or Single CPU GUPS test on a NUMA node with memory interleaving enabled? (Is that allowed by the rules of HPCC, the High Performance Computing Challenge?)

Using non-local memory can increase GUPS results, because it gives a 2- or 4-fold increase in the number of memory banks available for random accesses. (GUPS is typically limited by a non-ideal memory subsystem and by slow opening/closing of memory banks. With more banks, it can update one bank while the other banks are opening/closing.)

Thanks.

UPDATE:

(you may not reorder the memory accesses that the program makes).

But can the compiler reorder the loop nesting? E.g. in hpcc/RandomAccess.c:

  /* Perform updates to main table.  The scalar equivalent is:
   *
   *     u64Int ran;
   *     ran = 1;
   *     for (i=0; i<NUPDATE; i++) {
   *       ran = (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0);
   *       table[ran & (TableSize-1)] ^= stable[ran >> (64-LSTSIZE)];
   *     }
   */
  for (j=0; j<128; j++)
    ran[j] = starts ((NUPDATE/128) * j);
  for (i=0; i<NUPDATE/128; i++) {
/* #pragma ivdep */
    for (j=0; j<128; j++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
    }
  }

The outer loop here is for (i=0; i<NUPDATE/128; i++) { and the nested loop is for (j=0; j<128; j++) {. Using the 'loop interchange' optimization, the compiler can convert this code to

  for (j=0; j<128; j++) {
    for (i=0; i<NUPDATE/128; i++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
    }
  }

It can be done because this loop nest is a perfect loop nest. Is such an optimization prohibited by the rules of HPCC?
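To make the question concrete, here is a minimal sketch that runs both loop orders on a tiny table and compares the results. It only shows that the final table contents are the same, which follows because each ran[j] stream is independent of the table and XOR updates commute. It says nothing about what the HPCC rules permit, and the two versions still touch memory in a different order. Everything outside the two loop nests is an assumption made for illustration: the table size, the POLY value, the stable[] initialisation, and seed_at(), which is a naive stand-in for starts().

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64Int;
typedef int64_t  s64Int;

#define POLY       0x0000000000000007ULL  /* assumed polynomial value */
#define LSTSIZE    9
#define STSIZE     (1 << LSTSIZE)
#define TABLE_SIZE (1ULL << 10)           /* tiny table, illustration only */
#define NUPDATE    (4 * TABLE_SIZE)
#define NSTREAMS   128

static u64Int stable[STSIZE];

/* One step of the shift-register sequence used in the loops above. */
static u64Int next_ran(u64Int ran) {
  return (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0);
}

/* Naive stand-in for starts(n): step the sequence n times from 1. */
static u64Int seed_at(u64Int n) {
  u64Int ran = 1;
  while (n--) ran = next_ran(ran);
  return ran;
}

/* Reset the table and the 128 stream states to the same starting point. */
static void init(u64Int *table, u64Int *ran) {
  u64Int i;
  int j;
  for (i = 0; i < TABLE_SIZE; i++) table[i] = i;
  for (j = 0; j < NSTREAMS; j++)
    ran[j] = seed_at((NUPDATE / NSTREAMS) * (u64Int) j);
}

/* Original order: i outer, j inner. */
static void update_original(u64Int *table, u64Int *ran) {
  u64Int i;
  int j;
  for (i = 0; i < NUPDATE / NSTREAMS; i++)
    for (j = 0; j < NSTREAMS; j++) {
      ran[j] = next_ran(ran[j]);
      table[ran[j] & (TABLE_SIZE - 1)] ^= stable[ran[j] >> (64 - LSTSIZE)];
    }
}

/* Interchanged order: j outer, i inner. */
static void update_interchanged(u64Int *table, u64Int *ran) {
  u64Int i;
  int j;
  for (j = 0; j < NSTREAMS; j++)
    for (i = 0; i < NUPDATE / NSTREAMS; i++) {
      ran[j] = next_ran(ran[j]);
      table[ran[j] & (TABLE_SIZE - 1)] ^= stable[ran[j] >> (64 - LSTSIZE)];
    }
}

/* Returns 1 if both loop orders leave identical table contents. */
int interchange_preserves_table(void) {
  static u64Int a[TABLE_SIZE], b[TABLE_SIZE];
  u64Int ran[NSTREAMS];
  int k;

  /* Assumed stable[] contents; any fixed values work for this comparison. */
  stable[0] = 0;
  for (k = 1; k < STSIZE; k++)
    stable[k] = stable[k - 1] + 0x0123456789abcdefULL;

  init(a, ran);
  update_original(a, ran);
  init(b, ran);
  update_interchanged(b, ran);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling interchange_preserves_table() from a main() returns 1 here. The interchanged version walks one random stream to completion before starting the next, so the access pattern seen by the memory system is not the one the original code produces, which is exactly why the question about the rules matters.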
