CUDA: When to use shared memory and when to rely on L1 caching?

Posted by Roger Dahl on Stack Overflow

Since Compute Capability 2.0 (Fermi) was released, I've wondered whether there are any use cases left for shared memory. That is, when is it better to use shared memory than to just let L1 perform its magic in the background?

Is shared memory simply there to let algorithms designed for CC < 2.0 run efficiently without modifications?

To collaborate via shared memory, threads in a block write to shared memory and synchronize with __syncthreads(). Why not simply write to global memory (through L1) and synchronize with __threadfence_block()? The latter option should be easier to implement, since it doesn't have to track values in two different locations, and it should be faster because there is no explicit copy from global to shared memory. Since the data gets cached in L1, threads don't have to wait for it to actually make it all the way out to global memory.
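For concreteness, here is a minimal sketch of the two patterns I have in mind (the kernel names, the 256-thread block size, and the scratch buffer are mine, for illustration). Note that in the global-memory variant I still call __syncthreads() as an execution barrier, since __threadfence_block() by itself only orders the calling thread's memory writes; it doesn't wait for the other threads:

```
// Variant A: stage through shared memory, synchronize with __syncthreads().
__global__ void neighborSumShared(const float *in, float *out)
{
    __shared__ float buf[256];            // assumes blockDim.x == 256
    int i = threadIdx.x;
    int g = blockIdx.x * blockDim.x + i;

    buf[i] = in[g];
    __syncthreads();                      // every buf[] write is done and visible

    out[g] = buf[i] + buf[(i + 1) % blockDim.x];
}

// Variant B: write straight to a global scratch buffer and let L1 cache it.
// __threadfence_block() orders this thread's writes, but a barrier is still
// needed so the neighbor's write has actually happened before we read it.
__global__ void neighborSumGlobal(const float *in, float *scratch, float *out)
{
    int i = threadIdx.x;
    int g = blockIdx.x * blockDim.x + i;

    scratch[g] = in[g];
    __threadfence_block();
    __syncthreads();

    out[g] = scratch[g] + scratch[blockIdx.x * blockDim.x + (i + 1) % blockDim.x];
}
```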

With shared memory, a value placed there is guaranteed to remain resident for the lifetime of the block. Values in L1, by contrast, get evicted if they are not used often enough. Are there any cases where it's better to cache such rarely used data in shared memory than to let L1 manage it based on the access pattern the algorithm actually has?
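One concrete instance of what I mean (the table size and access pattern here are hypothetical): a small lookup table that is read only occasionally, but throughout the lifetime of the block. Staged in shared memory it can never be evicted; left to L1, it competes with all other traffic:

```
#define TABLE_SIZE 256

__global__ void sparseLookup(const float *table, const int *indices,
                             float *out, int n)
{
    // Stage the table in shared memory once, cooperatively.
    __shared__ float lut[TABLE_SIZE];
    for (int t = threadIdx.x; t < TABLE_SIZE; t += blockDim.x)
        lut[t] = table[t];
    __syncthreads();

    // Grid-stride loop; the lookups are sparse and irregular, so the
    // table would be a poor fit for L1's usage-based eviction.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        out[i] = lut[indices[i] % TABLE_SIZE];
}
```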
