Search Results

Search found 7 results on 1 page for 'porgarmingduod'.


  • Boost interprocess cached pools

    - by porgarmingduod
    I'm trying to figure out if my reading of the docs for boost interprocess allocators is correct. When using cached_adaptive_pool to allocate memory:

        typedef cached_adaptive_pool<int, managed_shared_memory::segment_manager> pool_allocator_t;
        pool_allocator_t pool_allocator(segment.get_segment_manager());

        // Allocate an integer in the shared memory segment
        pool_allocator_t::pointer ptr = pool_allocator.allocate_one();

    My understanding is that with multiple processes one can allocate and deallocate freely: that is, if I have a cached pool allocator for integers in one process, it can deallocate integers allocated by similar pools in other processes (provided, of course, that they are working on the same shared memory segment). It may be a stupid question, but working with multiple processes and shared memory is hard enough, so I'd like to know 100% whether I got the basics right.
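
    A minimal sketch of the scenario described in the question. The segment name "MySegment", its size, and the use of a handle to pass the address between processes are illustrative assumptions, not part of the original question:

        #include <boost/interprocess/managed_shared_memory.hpp>
        #include <boost/interprocess/allocators/cached_adaptive_pool.hpp>

        namespace bip = boost::interprocess;

        typedef bip::managed_shared_memory::segment_manager segment_manager_t;
        typedef bip::cached_adaptive_pool<int, segment_manager_t> pool_allocator_t;

        // Process A: create the segment and allocate an int through its own cached pool.
        bip::managed_shared_memory::handle_t producer() {
            bip::managed_shared_memory segment(bip::open_or_create, "MySegment", 65536);
            pool_allocator_t alloc(segment.get_segment_manager());
            pool_allocator_t::pointer p = alloc.allocate_one();
            *p = 42;
            // Handles are position independent, so one can be handed to another process.
            return segment.get_handle_from_address(&*p);
        }

        // Process B: attach to the same segment and free the int through its own pool.
        void consumer(bip::managed_shared_memory::handle_t handle) {
            bip::managed_shared_memory segment(bip::open_only, "MySegment");
            pool_allocator_t alloc(segment.get_segment_manager());
            int* raw = static_cast<int*>(segment.get_address_from_handle(handle));
            alloc.deallocate_one(raw); // this memory came from process A's pool
        }

    One caveat relevant to the question: a cached allocator keeps recently freed nodes in a per-process cache, so a node freed in process B may be reused by B before it is returned to the shared pool.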

  • Memory Bandwidth Performance for Modern Machines

    - by porgarmingduod
    I'm designing a real-time system that occasionally has to duplicate a large amount of memory. The memory consists of non-tiny regions, so I expect the copying performance will be fairly close to the maximum bandwidth the relevant components (CPU, RAM, motherboard) can manage. This led me to wonder what kind of raw memory bandwidth modern commodity machines can muster. My aging Core2Duo gives me 1.5 GB/s if I use one thread to memcpy() (and understandably less if I memcpy() with both cores simultaneously).

    While 1.5 GB is a fair amount of data, the real-time application I'm working on will only have something like 1/50th of a second, which means 30 MB. Basically, almost nothing. And perhaps worst of all, as I add more cores, I can process a lot more data without any increase in performance for the needed duplication step. But a low-end Core2Duo isn't exactly hot stuff these days.

    Are there any sites with information, such as actual benchmarks, on raw memory bandwidth for current and near-future hardware? Furthermore, for duplicating large amounts of data in memory, are there any shortcuts, or is memcpy() as good as it will get? Given a bunch of cores with nothing to do but duplicate as much memory as possible in a short amount of time, what's the best I can do?
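
    A rough single-threaded probe along the lines of the measurement quoted above; the buffer size and iteration count are arbitrary assumptions, and the result will vary with the compiler's memcpy implementation:

        #include <chrono>
        #include <cstddef>
        #include <cstdio>
        #include <cstring>
        #include <vector>

        int main() {
            const std::size_t size = 256 * 1024 * 1024; // 256 MiB per buffer
            const int iterations = 10;
            std::vector<char> src(size, 1), dst(size, 0);

            auto start = std::chrono::steady_clock::now();
            for (int i = 0; i < iterations; ++i)
                std::memcpy(dst.data(), src.data(), size);
            auto stop = std::chrono::steady_clock::now();

            volatile char sink = dst[size / 2]; // keep the copies observable
            (void)sink;

            double seconds = std::chrono::duration<double>(stop - start).count();
            double gib = double(size) * iterations / (1024.0 * 1024.0 * 1024.0);
            std::printf("memcpy bandwidth: %.2f GiB/s\n", gib / seconds);
        }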

  • Learning to read GCC assembler output

    - by porgarmingduod
    I'm considering picking up a very rudimentary understanding of assembly. My current goal is simple: a VERY BASIC understanding of GCC assembler output when compiling C/C++ with the -S switch. Just enough to do simple things, such as looking at a single function and verifying whether GCC optimizes away things I expect to disappear. Does anyone have/know of a truly concise introduction to assembly, relevant to GCC and specifically for the purpose of reading it, and a list of the most important instructions anyone casually reading assembly should know?
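
    For what it's worth, a tiny example of the workflow described in the question; the flags are just one reasonable choice (-fverbose-asm annotates the output with variable names, which helps when starting out):

        // example.cpp -- compile with: g++ -S -O2 -fverbose-asm example.cpp
        // This writes example.s; extern "C" keeps the symbol name unmangled,
        // which makes the function easy to find in the output.
        extern "C" int square_plus_one(int x) {
            int unused = 42; // expect this to leave no trace at -O2
            return x * x + 1;
        }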

  • Alias for a C++ template?

    - by porgarmingduod
        typedef boost::interprocess::managed_shared_memory::segment_manager segment_manager_t; // Works fine, segment_manager is a class
        typedef boost::interprocess::adaptive_pool allocator_t; // Can't do this, adaptive_pool is a template

    The idea is that if I want to switch between boost interprocess' several different options for shared memory and allocators, I just modify the typedefs. Unfortunately the allocators are templates, so I can't typedef the allocator I want to use. Is there a way to achieve an alias to a template in C++? (Except for the obvious #define ALLOCATOR_T boost::interprocess::adaptive_pool)
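
    Two workarounds commonly used for this, sketched against the typedef above (the allocator's remaining template parameters are left at their defaults):

        #include <boost/interprocess/managed_shared_memory.hpp>
        #include <boost/interprocess/allocators/adaptive_pool.hpp>

        typedef boost::interprocess::managed_shared_memory::segment_manager segment_manager_t;

        // (1) Alias template (requires a C++0x/C++11 compiler):
        //     allocator_t<int> names the concrete allocator type.
        template <typename T>
        using allocator_t = boost::interprocess::adaptive_pool<T, segment_manager_t>;

        // (2) Pre-C++11 "template typedef" idiom: allocator<int>::type does the same
        //     through a nested typedef, at the cost of spelling out ::type at each use.
        template <typename T>
        struct allocator {
            typedef boost::interprocess::adaptive_pool<T, segment_manager_t> type;
        };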

  • Overwriting a range of bits in an integer in a generic way

    - by porgarmingduod
    Given two integers X and Y, I want to overwrite the bits from position P to P+N. Example:

        int x = 0xAAAA;      // 0b1010101010101010
        int y = 0x0C30;      // 0b0000110000110000
        int result = 0xAC3A; // 0b1010110000111010

    Does this procedure have a name? If I have masks, the operation is easy enough:

        int mask_x = 0xF00F; // 0b1111000000001111
        int mask_y = 0x0FF0; // 0b0000111111110000
        int result = (x & mask_x) | (y & mask_y);

    What I can't quite figure out is how to write it in a generic way, such as in the following generic C++ function:

        template<typename IntType>
        IntType OverwriteBits(IntType dst, IntType src, int pos, int len) {
            // If:
            //   dst = 0xAAAA; // 0b1010101010101010
            //   src = 0x0C30; // 0b0000110000110000
            //   pos = 4                            ^
            //   len = 8                     ^-------
            // Then:
            //   result = 0xAC3A; // 0b1010110000111010
        }

    The problem is that I cannot figure out how to construct the masks properly when all the variables, including the width of the integer, are variable. Does anyone know how to write the above function properly?
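
    A sketch of one way to build the mask generically. It assumes 0 <= pos, 0 <= len, and pos + len no larger than the number of value bits; an unsigned IntType is the safe choice, since left-shifting into a signed type's sign bit is not well defined:

        #include <limits>

        template <typename IntType>
        IntType OverwriteBits(IntType dst, IntType src, int pos, int len) {
            const int width = std::numeric_limits<IntType>::digits;
            // Build len consecutive one-bits; shifting by the full width is
            // undefined behaviour, so that case is handled separately.
            IntType ones = (len >= width) ? IntType(~IntType(0))
                                          : IntType((IntType(1) << len) - 1);
            IntType mask = IntType(ones << pos); // ones at bit positions [pos, pos+len)
            return (dst & ~mask) | (src & mask);
        }

        // OverwriteBits(0xAAAAu, 0x0C30u, 4, 8) == 0xAC3A, matching the example above.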

  • Inlining an array of non-default constructible objects in a C++ class

    - by porgarmingduod
    C++ doesn't allow a class containing an array of items that are not default constructible:

        class Gordian {
        public:
            int member;
            Gordian(int must_have_variable) : member(must_have_variable) {}
        };

        class Knot {
            Gordian* pointer_array[8]; // Sure, this works.
            Gordian inlined_array[8];  // Won't compile. Can't be initialized.
        };

    As even beginner C++ users know, the language guarantees that all members are initialized when constructing a class. And it doesn't trust the user to initialize everything in the constructor - one has to provide valid arguments to the constructors of all members before the body of the constructor even starts. Generally, that's a great idea as far as I'm concerned, but I've come across a situation where it would be a lot easier if I could actually have an array of non-default constructible objects.

    The obvious solution: have an array of pointers to the objects. This is not optimal in my case, as I am using shared memory. It would force me to do extra allocation from an already contended resource (that is, the shared memory). The entire reason I want to have the array inlined in the object is to reduce the number of allocations.

    This is a situation where I would be willing to use a hack, even an ugly one, provided it works. One possible hack I am thinking about would be:

        class Knot {
        public:
            struct dummy { char padding[sizeof(Gordian)]; };
            dummy inlined_array[8];

            Gordian* get(int index) {
                return reinterpret_cast<Gordian*>(&inlined_array[index]);
            }

            Knot() {
                for (int x = 0; x != 8; x++) {
                    new (get(x)) Gordian(x*x);
                }
            }
        };

    Sure, it compiles, but I'm not exactly an experienced C++ programmer. That is, I couldn't possibly trust my hacks less. So, the questions:

    1) Does the hack I came up with seem workable? What are the issues? (I'm mainly concerned with C++0x on newer versions of GCC.)
    2) Is there a better way to inline an array of non-default constructible objects in a class?
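
    On question 1, the main issues with the char-buffer hack are alignment (a char array carries no alignment guarantee for Gordian) and the fact that the constructed objects are never destroyed (harmless for this Gordian, but not in general). A variant that addresses both, using std::aligned_storage from C++0x/C++11, is sketched below; it is not a drop-in replacement, since copying and assignment would still need to be handled explicitly:

        #include <new>
        #include <type_traits>

        class Gordian {
        public:
            int member;
            explicit Gordian(int must_have_variable) : member(must_have_variable) {}
        };

        class Knot {
            // Raw, correctly aligned storage for 8 Gordians; objects are created
            // with placement new and destroyed explicitly in the destructor.
            std::aligned_storage<sizeof(Gordian), alignof(Gordian)>::type storage_[8];

        public:
            Knot() {
                for (int i = 0; i != 8; ++i)
                    new (&storage_[i]) Gordian(i * i);
            }
            ~Knot() {
                for (int i = 0; i != 8; ++i)
                    get(i)->~Gordian();
            }
            Gordian* get(int i) {
                return reinterpret_cast<Gordian*>(&storage_[i]);
            }
        };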

  • Does a C/C++ compiler optimize constant divisions by a power-of-two value into shifts?

    - by porgarmingduod
    Question says it all. Does anyone know if the following...

        size_t div(size_t value) {
            const size_t x = 64;
            return value / x;
        }

    ...is optimized into this?

        size_t div(size_t value) {
            return value >> 6;
        }

    Do compilers do this? (My interest lies in GCC.) Are there situations where it does and others where it doesn't? I would really like to know, because every time I write a division that could be optimized like this, I spend some mental energy wondering whether precious nanoseconds are wasted doing a division where a shift would suffice.
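
    One way to answer this for a specific GCC is to look at the output directly, as in the earlier question about reading assembler. A small sketch; the instructions mentioned in the comments are what one would typically see on x86-64, not a guarantee:

        // div.cpp -- compile with: g++ -O2 -S div.cpp and inspect div.s
        #include <cstddef>

        std::size_t div_unsigned(std::size_t value) {
            return value / 64; // unsigned: typically a single shift, e.g. shrq $6
        }

        long div_signed(long value) {
            return value / 64; // signed: no divide either, but extra instructions
                               // are needed to make the result round toward zero
        }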
