Search Results

Search found 1 results on 1 pages for 'gups'.

Page 1/1 | 1 

  • Can the STREAM and GUPS (single CPU) benchmark use non-local memory in NUMA machine

    - by osgx
    Hello I want to run some tests from HPCC, STREAM and GUPS. They will test memory bandwidth, latency, and throughput (in term of random accesses). Can I start Single CPU test STREAM or Single CPU GUPS on NUMA node with memory interleaving enabled? (Is it allowed by the rules of HPCC - High Performance Computing Challenge?) Usage of non-local memory can increase GUPS results, because it will increase 2- or 4- fold the number of memory banks, available for random accesses. (GUPS typically limited by nonideal memory-subsystem and by slow memory bank opening/closing. With more banks it can do update to one bank, while the other banks are opening/closing.) Thanks. UPDATE: (you may nor reorder the memory accesses that the program makes). But can compiler reorder loops nesting? E.g. hpcc/RandomAccess.c /* Perform updates to main table. The scalar equivalent is: * * u64Int ran; * ran = 1; * for (i=0; i<NUPDATE; i++) { * ran = (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0); * table[ran & (TableSize-1)] ^= stable[ran >> (64-LSTSIZE)]; * } */ for (j=0; j<128; j++) ran[j] = starts ((NUPDATE/128) * j); for (i=0; i<NUPDATE/128; i++) { /* #pragma ivdep */ for (j=0; j<128; j++) { ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0); Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)]; } } The main loop here is for (i=0; i<NUPDATE/128; i++) { and the nested loop is for (j=0; j<128; j++) {. Using 'loop interchange' optimization, compiler can convert this code to for (j=0; j<128; j++) { for (i=0; i<NUPDATE/128; i++) { ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0); Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)]; } } It can be done because this loop nest is perfect loop nest. Is such optimization prohibited by rules of HPCC?

    Read the article

1