gentiradentes - Developer IT

Benefit of using multiple SIMD instruction sets simultaneously

- by GenTiradentes

I'm writing a highly parallel application that's multithreaded. I've already got an SSE accelerated thread class written. If I were to write an MMX accelerated thread class, then run both at the same time (one SSE thread and one MMX thread per core) would the performance improve noticeably? I would think that this setup would help hide memory latency, but I'd like to be sure before I start pouring time into it.

Read the article

Concurrent Generation of Sequential Keys

- by GenTiradentes

I'm working on a project which generates a very large number of sequential text strings, in a very tight loop. My application makes heavy use of SIMD instruction set extensions like SSE and MMX, in other parts of the program, but the key generator is plain C++. The way my key generator works is I have a keyGenerator class, which holds a single char array that stores the current key. To get the next key, there is a function called "incrementKey," which treats the string as a number, adding one to the string, carrying where necessary. Now, the problem is, the keygen is somewhat of a bottleneck. It's fast, but it would be nice if it were faster. One of the biggest problems is that when I'm generating a set of sequential keys to be processed using my SSE2 code, I have to have the entire set stored in an array, which means I have to sequentially generate and copy 12 strings into an array, one by one, like so: char* keys[12]; for(int i = 0; i < 12; i++) { keys[i] = new char[16]; strcmp(keys[i], keygen++); } So how would you efficiently generate these plaintext strings in order? I need some ideas to help move this along. Concurrency would be nice; as my code is right now, each successive key depends on the previous one, which means that the processor can't start work on the next key until the current one has been completely generated. Here is the code relevant to the key generator: KeyGenerator.h class keyGenerator { public: keyGenerator(unsigned long long location, characterSet* charset) : location(location), charset(charset) { for(int i = 0; i < 16; i++) key[i] = 0; charsetStr = charset->getCharsetStr(); integerToKey(); } ~keyGenerator() { } inline void incrementKey() { register size_t keyLength = strlen(key); for(register char* place = key; place; place++) { if(*place == charset->maxChar) { // Overflow, reset char at place *place = charset->minChar; if(!*(place+1)) { // Carry, no space, insert char *(place+1) = charset->minChar; ++keyLength; break; } else { continue; } } else { // Space available, increment char at place if(*place == charset->charSecEnd[0]) *place = charset->charSecBegin[0]; else if(*place == charset->charSecEnd[1]) *place = charset->charSecBegin[1]; (*place)++; break; } } } inline char* operator++() // Pre-increment { incrementKey(); return key; } inline char* operator++(int) // Post-increment { memcpy(postIncrementRetval, key, 16); incrementKey(); return postIncrementRetval; } void integerToKey() { register unsigned long long num = location; if(!num) { key[0] = charsetStr[0]; } else { num++; while(num) { num--; unsigned int remainder = num % charset->length; num /= charset->length; key[strlen(key)] = charsetStr[remainder]; } } } inline unsigned long long keyToInteger() { // TODO return 0; } inline char* getKey() { return key; } private: unsigned long long location; characterSet* charset; std::string charsetStr; char key[16]; // We need a place to store the key for the post increment operation. char postIncrementRetval[16]; }; CharacterSet.h struct characterSet { characterSet() { } characterSet(unsigned int len, int min, int max, int charsec0, int charsec1, int charsec2, int charsec3) { init(length, min, max, charsec0, charsec1, charsec2, charsec3); } void init(unsigned int len, int min, int max, int charsec0, int charsec1, int charsec2, int charsec3) { length = len; minChar = min; maxChar = max; charSecEnd[0] = charsec0; charSecBegin[0] = charsec1; charSecEnd[1] = charsec2; charSecBegin[1] = charsec3; } std::string getCharsetStr() { std::string retval; for(int chr = minChar; chr != maxChar; chr++) { for(int i = 0; i < 2; i++) if(chr == charSecEnd[i]) chr = charSecBegin[i]; retval += chr; } return retval; } int minChar, maxChar; // charSec = character set section int charSecEnd[2], charSecBegin[2]; unsigned int length; };

Read the article

Declaring pointers; asterisk on the left or right of the space between the type and name?

- by GenTiradentes

I've seen mixed versions of this in a lot of code. (This applies to C and C++, by the way.) People seem to declare pointers in one of two ways, and I have no idea which one is correct, of if it even matters. The first way it to put the asterisk adjacent the type name, like so: someType* somePtr; The second way is to put the asterisk adjacent the name of the variable, like so: someType *somePtr; This has been driving me nuts for some time now. Is there any standard way of declaring pointers? Does it even matter how pointers are declared? I've used both declarations before, and I know that the compiler doesn't care which way it is. However, the fact that I've seen pointers declared in two different ways leads me to believe that there's a reason behind it. I'm curious if either method is more readable or logical in some way that I'm missing.

Read the article

What's a good algorithm for searching arrays N and M, in order to find elements in N that also exist

- by GenTiradentes

I have two arrays, N and M. they are both arbitrarily sized, though N is usually smaller than M. I want to find out what elements in N also exist in M, in the fastest way possible. To give you an example of one possible instance of the program, N is an array 12 units in size, and M is an array 1,000 units in size. I want to find which elements in N also exist in M. (There may not be any matches.) The more parallel the solution, the better. I used to use a hash map for this, but it's not quite as efficient as I'd like it to be. Typing this out, I just thought of running a binary search of M on sizeof(N) independent threads. (Using CUDA) I'll see how this works, though other suggestions are welcome.

Read the article

Function calls in virtual machine killing performance

- by GenTiradentes

I wrote a virtual machine in C, which has a call table populated by pointers to functions that provide the functionality of the VM's opcodes. When the virtual machine is run, it first interprets a program, creating an array of indexes corresponding to the appropriate function in the call table for the opcode provided. It then loops through the array, calling each function until it reaches the end. Each instruction is extremely small, typically one line. Perfect for inlining. The problem is that the compiler doesn't know when any of the virtual machine's instructions are going to be called, as it's decided at runtime, so it can't inline them. The overhead of function calls and argument passing is killing the performance of my VM. Any ideas on how to get around this?

Developer IT