Why is thread local storage so slow?

Posted by dsimcha on Stack Overflow
Published on 2009-02-03T05:28:37Z Indexed on 2010/03/20 10:51 UTC

I'm working on a custom mark-release style memory allocator for the D programming language that allocates from thread-local regions. Thread-local storage (TLS) appears to be the bottleneck: allocating from these regions is about 50% slower than in an otherwise identical single-threaded version of the code, even though I designed the code to do only one TLS lookup per allocation/deallocation. The measurement comes from allocating and freeing memory a large number of times in a loop, and I'm trying to figure out whether the slowdown is an artifact of my benchmarking method. My understanding is that thread-local storage should basically just involve accessing something through an extra layer of indirection, similar to accessing a variable via a pointer. Is this incorrect? How much overhead does thread-local storage typically have?

Note: Although I mention D, I'm also interested in general answers that aren't specific to D, since D's implementation of thread-local storage will likely improve if it is slower than the best implementations.
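
To make the setup concrete, here is a minimal sketch of this kind of benchmark, written in C++ rather than D. It is not the actual allocator: `Region`, `sharedRegion`, `threadRegion`, `sharedAccess`, `tlsAccess` and `timeAllocLoop` are hypothetical names chosen for illustration, and the 64 KiB capacity and 16-byte alignment are arbitrary assumptions. The loop does one region lookup per allocate/release pair, so timing it with each accessor should isolate the cost of the lookup, assuming the optimizer does not hoist the lookup out of the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size mark-release region (a simple bump allocator).
struct Region {
    static constexpr std::size_t kCapacity = 1 << 16;  // assumed size
    alignas(16) unsigned char buffer[kCapacity] = {};
    std::size_t used = 0;

    // Bump-allocates n bytes rounded up to 16; nullptr if the region is full.
    void* allocate(std::size_t n) {
        n = (n + 15) & ~static_cast<std::size_t>(15);
        if (n > kCapacity - used) return nullptr;
        void* p = buffer + used;
        used += n;
        return p;
    }

    // Records the current top of the region.
    std::size_t mark() const { return used; }

    // Frees everything allocated since the matching mark().
    void release(std::size_t m) {
        assert(m <= used);
        used = m;
    }
};

Region sharedRegion;               // single-threaded baseline
thread_local Region threadRegion;  // one region per thread

Region& sharedAccess() { return sharedRegion; }
Region& tlsAccess() { return threadRegion; }  // the TLS lookup under test

// Runs `iterations` mark/allocate/release cycles, fetching the region
// through `access` once per cycle. Returns elapsed nanoseconds.
template <typename Accessor>
long long timeAllocLoop(Accessor access, int iterations) {
    using Clock = std::chrono::steady_clock;
    std::uintptr_t sink = 0;
    auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        Region& r = access();
        std::size_t m = r.mark();
        sink += reinterpret_cast<std::uintptr_t>(r.allocate(32));
        r.release(m);
    }
    auto stop = Clock::now();
    volatile std::uintptr_t keep = sink;  // keep the loop's result observable
    (void)keep;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Comparing `timeAllocLoop(sharedAccess, n)` with `timeAllocLoop(tlsAccess, n)` for a large `n` gives a rough per-lookup cost. The numbers depend heavily on the compiler, the platform's TLS model, and whether the code lives in a shared library, so a micro-benchmark like this is only a starting point.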

© Stack Overflow or respective owner
