Search Results

Search found 127 results on 6 pages for 'opencl'.

Page 2/6 | < Previous Page | 1 2 3 4 5 6  | Next Page >

  • openmp vs opencl for computer vision

    - by user1235711
    I am creating a computer vision application that detects objects via a web camera, and I am currently focusing on its performance. My problem is in the part of the application that generates the XML cascade file from Haar training; this is very slow and takes about 6 days. To get around this I decided to use multiprocessing to cut the total time needed to generate the Haar training XML file. I found two options: OpenCL, and OpenMP together with OpenMPI. Now I'm confused about which one to use. I read that OpenCL uses multiple CPUs and the GPU, but only on the same machine. Is that so? OpenMP, on the other hand, is for shared-memory multithreading, and with OpenMPI we can use multiple CPUs over the network, but OpenMP has no GPU support. Can you please explain the pros and cons of using either of these libraries?
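
    As a point of reference for the OpenMP side of the comparison, here is a minimal sketch of a CPU-parallel loop (the per-sample work and all names are illustrative, not taken from the question; compile with e.g. gcc -fopenmp):

      #include <omp.h>
      #include <stdio.h>

      #define NUM_SAMPLES 1000

      /* Hypothetical per-sample work standing in for one training step. */
      static double process_sample(int i) {
          double acc = 0.0;
          for (int k = 0; k < 10000; ++k)
              acc += (i + k) * 0.5;
          return acc;
      }

      int main(void) {
          double total = 0.0;
          /* OpenMP splits the iterations across the CPU cores of this one
             machine; the reduction clause sums per-thread partials safely. */
          #pragma omp parallel for reduction(+:total)
          for (int i = 0; i < NUM_SAMPLES; ++i)
              total += process_sample(i);
          printf("total = %f (max threads: %d)\n", total, omp_get_max_threads());
          return 0;
      }

    Crossing machine boundaries is where OpenMPI comes in; OpenCL would instead express the per-sample work as a kernel and run it on a GPU or multicore CPU device.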

    Read the article

  • Will Apple abandon OpenCL?

    - by John
    I am developing OpenCL applications, amongst others for Mac OS. The new 13-inch MacBook Pro comes with an Intel HD Graphics 3000 card, so it seems reasonable to assume all their mainstream computers like the MacBook and Mac mini will also come with this Intel graphics card soon. OpenCL is not available for Intel graphics cards. Intel has a terrible reputation for graphics drivers, and Apple knows this, which makes me wonder whether Apple is already abandoning OpenCL again, especially considering OpenCL should run anywhere, not only on high-end systems. Developing applications only for the high-end Macs with dedicated graphics hardware, or for previous-generation hardware with the GeForce 320M, would not be a feasible option for me. Does anybody have any thoughts on this?

    Read the article

  • Limit on number of kernel arguments in OpenCL

    - by Rakesh K
    Hi, I wanted to know if there is any limit on the number of arguments that can be set on a kernel function in OpenCL. I am getting the error INVALID_ARG_INDEX while setting arguments. I am setting 9 arguments on the kernel function. Please help me in this regard. Thanks, Rakesh.
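
    There is no small fixed argument-count limit by itself; CL_INVALID_ARG_INDEX usually means an argument index that does not match the kernel's parameter list. A minimal sketch (the variable names are illustrative) of checking each clSetKernelArg return value and the device's total argument-size budget:

      #include <stdio.h>
      #include <CL/cl.h>

      /* 'device', 'kernel', 'buf' and 'n' are assumed to exist already. */
      void check_args(cl_device_id device, cl_kernel kernel, cl_mem buf, cl_uint n)
      {
          /* Total bytes available for all arguments set with clSetKernelArg. */
          size_t max_param_size = 0;
          clGetDeviceInfo(device, CL_DEVICE_MAX_PARAMETER_SIZE,
                          sizeof(max_param_size), &max_param_size, NULL);
          printf("CL_DEVICE_MAX_PARAMETER_SIZE = %zu bytes\n", max_param_size);

          /* Check every clSetKernelArg return code; the index must match the
             0-based position of the parameter in the kernel source. */
          cl_int err;
          err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
          if (err != CL_SUCCESS) printf("arg 0 failed: %d\n", err);
          err = clSetKernelArg(kernel, 1, sizeof(cl_uint), &n);
          if (err != CL_SUCCESS) printf("arg 1 failed: %d\n", err);
      }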

    Read the article

  • OpenCL: Strange buffer or image behaviour with NVIDIA but not AMD

    - by Alex R.
    I have a big problem (on Linux): I create a buffer with defined data, then an OpenCL kernel takes this data and puts it into an image2d_t. When working on an AMD C50 (Fusion CPU/GPU) the program works as desired, but on my GeForce 9500 GT the given kernel computes the correct result very rarely. Sometimes the result is correct, but very often it is incorrect. Sometimes it depends on very strange changes like removing unused variable declarations or adding a newline. I realized that disabling the optimization will increase the probability to fail. I have the most actual display driver in both systems. Here is my reduced code: #include <CL/cl.h> #include <string> #include <iostream> #include <sstream> #include <cmath> void checkOpenCLErr(cl_int err, std::string name){ const char* errorString[] = { "CL_SUCCESS", "CL_DEVICE_NOT_FOUND", "CL_DEVICE_NOT_AVAILABLE", "CL_COMPILER_NOT_AVAILABLE", "CL_MEM_OBJECT_ALLOCATION_FAILURE", "CL_OUT_OF_RESOURCES", "CL_OUT_OF_HOST_MEMORY", "CL_PROFILING_INFO_NOT_AVAILABLE", "CL_MEM_COPY_OVERLAP", "CL_IMAGE_FORMAT_MISMATCH", "CL_IMAGE_FORMAT_NOT_SUPPORTED", "CL_BUILD_PROGRAM_FAILURE", "CL_MAP_FAILURE", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "CL_INVALID_VALUE", "CL_INVALID_DEVICE_TYPE", "CL_INVALID_PLATFORM", "CL_INVALID_DEVICE", "CL_INVALID_CONTEXT", "CL_INVALID_QUEUE_PROPERTIES", "CL_INVALID_COMMAND_QUEUE", "CL_INVALID_HOST_PTR", "CL_INVALID_MEM_OBJECT", "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR", "CL_INVALID_IMAGE_SIZE", "CL_INVALID_SAMPLER", "CL_INVALID_BINARY", "CL_INVALID_BUILD_OPTIONS", "CL_INVALID_PROGRAM", "CL_INVALID_PROGRAM_EXECUTABLE", "CL_INVALID_KERNEL_NAME", "CL_INVALID_KERNEL_DEFINITION", "CL_INVALID_KERNEL", "CL_INVALID_ARG_INDEX", "CL_INVALID_ARG_VALUE", "CL_INVALID_ARG_SIZE", "CL_INVALID_KERNEL_ARGS", "CL_INVALID_WORK_DIMENSION", "CL_INVALID_WORK_GROUP_SIZE", "CL_INVALID_WORK_ITEM_SIZE", "CL_INVALID_GLOBAL_OFFSET", "CL_INVALID_EVENT_WAIT_LIST", "CL_INVALID_EVENT", "CL_INVALID_OPERATION", "CL_INVALID_GL_OBJECT", "CL_INVALID_BUFFER_SIZE", "CL_INVALID_MIP_LEVEL", "CL_INVALID_GLOBAL_WORK_SIZE", }; if (err != CL_SUCCESS) { std::stringstream str; str << errorString[-err] << " (" << err << ")"; throw std::string(name)+(str.str()); } } int main(){ try{ cl_context m_context; cl_platform_id* m_platforms; unsigned int m_numPlatforms; cl_command_queue m_queue; cl_device_id m_device; cl_int error = 0; // Used to handle error codes clGetPlatformIDs(0,NULL,&m_numPlatforms); m_platforms = new cl_platform_id[m_numPlatforms]; error = clGetPlatformIDs(m_numPlatforms,m_platforms,&m_numPlatforms); checkOpenCLErr(error, "getPlatformIDs"); // Device error = clGetDeviceIDs(m_platforms[0], CL_DEVICE_TYPE_GPU, 1, &m_device, NULL); checkOpenCLErr(error, "getDeviceIDs"); // Context cl_context_properties properties[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)(m_platforms[0]), 0}; m_context = clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL); // m_private->m_context = clCreateContext(properties, 1, &m_private->m_device, NULL, NULL, &error); checkOpenCLErr(error, "Create context"); // Command-queue m_queue = clCreateCommandQueue(m_context, m_device, 0, &error); checkOpenCLErr(error, "Create command queue"); //Build program and kernel const char* source = "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n" "\n" "__kernel void bufToImage(__global unsigned char* in, __write_only image2d_t out, const unsigned int offset_x, const unsigned int image_width , const unsigned int maxval ){\n" "\tint i = 
get_global_id(0);\n" "\tint j = get_global_id(1);\n" "\tint width = get_global_size(0);\n" "\tint height = get_global_size(1);\n" "\n" "\tint pos = j*image_width*3+(offset_x+i)*3;\n" "\tif( maxval < 256 ){\n" "\t\tfloat4 c = (float4)(in[pos],in[pos+1],in[pos+2],1.0f);\n" "\t\tc.x /= maxval;\n" "\t\tc.y /= maxval;\n" "\t\tc.z /= maxval;\n" "\t\twrite_imagef(out, (int2)(i,j), c);\n" "\t}else{\n" "\t\tfloat4 c = (float4)(255.0f*in[2*pos]+in[2*pos+1],255.0f*in[2*pos+2]+in[2*pos+3],255.0f*in[2*pos+4]+in[2*pos+5],1.0f);\n" "\t\tc.x /= maxval;\n" "\t\tc.y /= maxval;\n" "\t\tc.z /= maxval;\n" "\t\twrite_imagef(out, (int2)(i,j), c);\n" "\t}\n" "}\n" "\n" "__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;\n" "\n" "__kernel void imageToBuf(__read_only image2d_t in, __global unsigned char* out, const unsigned int offset_x, const unsigned int image_width ){\n" "\tint i = get_global_id(0);\n" "\tint j = get_global_id(1);\n" "\tint pos = j*image_width*3+(offset_x+i)*3;\n" "\tfloat4 c = read_imagef(in, imageSampler, (int2)(i,j));\n" "\tif( c.x <= 1.0f && c.y <= 1.0f && c.z <= 1.0f ){\n" "\t\tout[pos] = c.x*255.0f;\n" "\t\tout[pos+1] = c.y*255.0f;\n" "\t\tout[pos+2] = c.z*255.0f;\n" "\t}else{\n" "\t\tout[pos] = 200.0f;\n" "\t\tout[pos+1] = 0.0f;\n" "\t\tout[pos+2] = 255.0f;\n" "\t}\n" "}\n"; cl_int err; cl_program prog = clCreateProgramWithSource(m_context,1,&source,NULL,&err); if( -err != CL_SUCCESS ) throw std::string("clCreateProgramWithSources"); err = clBuildProgram(prog,0,NULL,"-cl-opt-disable",NULL,NULL); if( -err != CL_SUCCESS ) throw std::string("clBuildProgram(fromSources)"); cl_kernel kernel = clCreateKernel(prog,"bufToImage",&err); checkOpenCLErr(err,"CreateKernel"); cl_uint imageWidth = 8; cl_uint imageHeight = 9; //Initialize datas cl_uint maxVal = 255; cl_uint offsetX = 0; int size = imageWidth*imageHeight*3; int resSize = imageWidth*imageHeight*4; cl_uchar* data = new cl_uchar[size]; cl_float* expectedData = new cl_float[resSize]; for( int i = 0,j=0; i < size; i++,j++ ){ data[i] = (cl_uchar)i; expectedData[j] = (cl_float)i/255.0f; if ( i%3 == 2 ){ j++; expectedData[j] = 1.0f; } } cl_mem inBuffer = clCreateBuffer(m_context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,size*sizeof(cl_uchar),data,&err); checkOpenCLErr(err, "clCreateBuffer()"); clFinish(m_queue); cl_image_format imgFormat; imgFormat.image_channel_order = CL_RGBA; imgFormat.image_channel_data_type = CL_FLOAT; cl_mem outImg = clCreateImage2D( m_context, CL_MEM_READ_WRITE, &imgFormat, imageWidth, imageHeight, 0, NULL, &err ); checkOpenCLErr(err,"get2DImage()"); clFinish(m_queue); size_t kernelRegion[]={imageWidth,imageHeight}; size_t kernelWorkgroup[]={1,1}; //Fill kernel with data clSetKernelArg(kernel,0,sizeof(cl_mem),&inBuffer); clSetKernelArg(kernel,1,sizeof(cl_mem),&outImg); clSetKernelArg(kernel,2,sizeof(cl_uint),&offsetX); clSetKernelArg(kernel,3,sizeof(cl_uint),&imageWidth); clSetKernelArg(kernel,4,sizeof(cl_uint),&maxVal); //Run kernel err = clEnqueueNDRangeKernel(m_queue,kernel,2,NULL,kernelRegion,kernelWorkgroup,0,NULL,NULL); checkOpenCLErr(err,"RunKernel"); clFinish(m_queue); //Check resulting data for validty cl_float* computedData = new cl_float[resSize];; size_t region[]={imageWidth,imageHeight,1}; const size_t offset[] = {0,0,0}; err = clEnqueueReadImage(m_queue,outImg,CL_TRUE,offset,region,0,0,computedData,0,NULL,NULL); checkOpenCLErr(err, "readDataFromImage()"); clFinish(m_queue); for( int i = 0; i < resSize; i++ ){ if( 
fabs(expectedData[i]-computedData[i])>0.1 ){ std::cout << "Expected: \n"; for( int j = 0; j < resSize; j++ ){ std::cout << expectedData[j] << " "; } std::cout << "\nComputed: \n"; std::cout << "\n"; for( int j = 0; j < resSize; j++ ){ std::cout << computedData[j] << " "; } std::cout << "\n"; throw std::string("Error, computed and expected data are not the same!\n"); } } }catch(std::string& e){ std::cout << "\nCaught an exception: " << e << "\n"; return 1; } std::cout << "Works fine\n"; return 0; } I also uploaded the source code to make it easier for you to test it: http://www.file-upload.net/download-3513797/strangeOpenCLError.cpp.html Can you please tell me if I've done anything wrong? Is there any mistake in the code, or is this a bug in my driver? Best regards, Alex

    Read the article

  • solve a classic map-reduce problem with opencl?

    - by liuliu
    I am trying to parallelize a classic map-reduce problem (which parallelizes well with MPI) with OpenCL, namely the AMD implementation, but the result bothers me. Let me briefly describe the problem first. Two types of data flow into the system: the feature set (30 parameters each) and the sample set (9000+ dimensions each). It is a classic map-reduce problem in the sense that I need to calculate the score of every feature on every sample (map) and then sum up the overall score for every feature (reduce). There are around 10k features and 30k samples. I tried different ways to solve the problem. First, I tried to decompose the problem by features. The problem is that the score calculation consists of random memory accesses (picking some of the 9000+ dimensions and doing additions/subtractions). Since I cannot coalesce the memory accesses, it is costly. Then I tried to decompose the problem by samples. The problem is that, to sum up the overall score, all threads compete for a few score variables; they keep overwriting the scores, which turn out to be incorrect. (I cannot compute the individual scores first and sum them up later, because that would require 10k * 30k * 4 bytes, roughly 1.2 GB.) The first method gives me about the same performance as an i7 860 CPU running 8 threads. However, I don't think the problem is unsolvable: it is remarkably similar to ray tracing (where you calculate millions of rays against millions of triangles). Any ideas?
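
    One common way around the contended score variables is a two-stage reduction: each work-group sums its partial scores in local memory, then the per-group results are added in a second pass or on the host. A hedged sketch of the work-group stage (the buffer and kernel names are illustrative, and the work-group size is assumed to be a power of two):

      // 'scratch' is a __local buffer sized on the host with
      // clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL).
      __kernel void reduce_scores(__global const float* partial,
                                  __global float* group_results,
                                  __local float* scratch,
                                  const unsigned int n)
      {
          unsigned int gid = get_global_id(0);
          unsigned int lid = get_local_id(0);

          scratch[lid] = (gid < n) ? partial[gid] : 0.0f;
          barrier(CLK_LOCAL_MEM_FENCE);

          // Tree reduction within the work-group.
          for (unsigned int stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
              if (lid < stride)
                  scratch[lid] += scratch[lid + stride];
              barrier(CLK_LOCAL_MEM_FENCE);
          }

          if (lid == 0)
              group_results[get_group_id(0)] = scratch[0];
      }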

    Read the article

  • OpenCL - incremental summation during compute

    - by user1721997
    I'm an absolute novice in OpenCL programming. For my app (a molecular simulation) I wrote a kernel that calculates the intermolecular potential of a Lennard-Jones liquid. In this kernel I need to compute the cumulative potential of all particles with one particle: __kernel void Molsim(__global const float* inmatrix, __global float* fi, const int c, const float r1, const float r2, const float r3, const float rc, const float epsilon, const float sigma, const float h1, const float h23) { float fi0; float fi1; float d; unsigned int i = get_global_id(0); //number of particles (typically 2000) if(c!=i) { // potential before particle movement d=sqrt(pow((0.5*h1-fabs(0.5*h1-fabs(inmatrix[c*3]-inmatrix[i*3]))),2.0)+pow((0.5*h23-fabs(0.5*h23-fabs(inmatrix[c*3+1]-inmatrix[i*3+1]))),2.0)+pow((0.5*h23-fabs(0.5*h23-fabs(inmatrix[c*3+2]-inmatrix[i*3+2]))),2.0)); if(d<rc) { fi0=4.0*epsilon*(pow(sigma/d,12.0)-pow(sigma/d,6.0)); } else { fi0=0; } // potential after particle movement d=sqrt(pow((0.5*h1-fabs(0.5*h1-fabs(r1-inmatrix[i*3]))),2.0)+pow((0.5*h23-fabs(0.5*h23-fabs(r2-inmatrix[i*3+1]))),2.0)+pow((0.5*h23-fabs(0.5*h23-fabs(r3-inmatrix[i*3+2]))),2.0)); if(d<rc) { fi1=4.0*epsilon*(pow(sigma/d,12.0)-pow(sigma/d,6.0)); } else { fi1=0; } // cumulative difference of potentials fi[0]+=fi1-fi0; } } My problem is in the line fi[0]+=fi1-fi0;. The one-element vector fi[0] contains wrong results. I read something about sum reduction, but I do not know how to do it during the calculation. Is there any simple solution to my problem?
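
    Since every work-item adds to fi[0] at the same time, the additions race and overwrite each other. With roughly 2000 particles, the simplest fix is to give each work-item its own output slot and sum the array afterwards on the host (or with a separate reduction kernel). A hedged sketch of the change; 'host_fi' and 'nparticles' are illustrative names:

      // In the kernel, replace the racy accumulation
      //     fi[0] += fi1 - fi0;
      // with a per-work-item slot (fi must now hold one float per particle):
      //     fi[i] = fi1 - fi0;
      // and also write fi[i] = 0.0f in the c == i branch so every slot is defined.

      /* Host side (plain C), after clEnqueueReadBuffer has copied fi into host_fi: */
      float total = 0.0f;
      for (unsigned int k = 0; k < nparticles; ++k)
          total += host_fi[k];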

    Read the article

  • OpenCL Matrix Multiplication - Getting wrong answer

    - by Yash
    Here's a simple OpenCL matrix multiplication kernel which is driving me crazy: __kernel void matrixMul( __global int* C, __global int* A, __global int* B, int wA, int wB){ int row = get_global_id(1); //2D Threas ID x int col = get_global_id(0); //2D Threas ID y //Perform dot-product accumulated into value int value; for ( int k = 0; k < wA; k++ ){ value += A[row*wA + k] * B[k*wB+col]; } C[row*wA+col] = value; //Write to the device memory } Where (inputs) A = [72 45 75 61] B = [26 53 46 76] Output I am getting: C = [3942 7236 3312 5472] But the output should be: C = [3943 7236 4756 8611] The problem I am facing is that for any array dimensions the elements of the first row of the resulting matrix are correct, while the elements of all the other rows are wrong. By the way, I am using PyOpenCL. I don't know what mistake I am making here. I have spent the entire day on this with no luck. Please help me with this.
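
    For what it is worth, two things stand out in the kernel as posted: value is never initialized, so rows whose accumulator starts with garbage come out wrong, and the output index uses wA where the width of C is wB (the two only coincide for square matrices). A hedged corrected sketch:

      __kernel void matrixMul(__global int* C, __global int* A, __global int* B,
                              int wA, int wB)
      {
          int row = get_global_id(1);
          int col = get_global_id(0);

          int value = 0;                    // accumulator must start at zero
          for (int k = 0; k < wA; k++)
              value += A[row * wA + k] * B[k * wB + col];

          C[row * wB + col] = value;        // C has width wB, not wA
      }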

    Read the article

  • OpenCL: does it play well with OpenMP, can I connect other languages to it, etc.

    - by Cem Karan
    The 1.0 spec for OpenCL just came out a few days ago (Spec is here) and I've just started to read through it. I want to know if it plays well with other high performance multiprocessing APIs like OpenMP (spec) and I want to know what I should learn. So, here are my basic questions: If I am already using OpenMP, will that break OpenCL or vice-versa? Is OpenCL more powerful than OpenMP? Or are they intended to be complementary? Is there a standard way of connecting an OpenCL program to a standard C99 program (or any other language)? What is it? Does anyone know if anyone is writing an OpenCL book? I'm reading the spec, but I've found books to be more helpful.
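
    On the question of connecting OpenCL to a standard C99 program: the host API is itself a plain C API, so a C99 program only needs the header and the OpenCL library. A minimal hedged sketch (the compile line will vary by platform):

      /* Compile with something like: cc -std=c99 probe.c -lOpenCL */
      #include <stdio.h>
      #include <CL/cl.h>

      int main(void)
      {
          cl_uint num_platforms = 0;
          cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);
          if (err != CL_SUCCESS) {
              printf("clGetPlatformIDs failed: %d\n", err);
              return 1;
          }
          printf("%u OpenCL platform(s) found\n", num_platforms);
          return 0;
      }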

    Read the article

  • OpenCL or CUDA: Which way to go?

    - by holydiver
    I'm investigating ways of using the GPU to process streaming data. I had two choices but couldn't decide which way to go. My criteria are as follows: ease of use (a good API); community and documentation; performance; future prospects. I'll code in C and C++.

    Read the article

  • Using OpenCL to jiggle the Pipe

    - by TOAOGG
    I've got the idea of using OpenCL to program a simple renderer. A clear argument against this is that the approach won't benefit from the fixed-function hardware of the device (I think). Would it be useful to do this in OpenCL? Let's say we want to cull as early as possible so we won't have many per-vertex operations. Is it correct that culling is done after the vertex shader? For static vertices that won't get affected by the shader, it could be interesting to cull them earlier. Another idea would be a deferred renderer. So the main question is: would it make sense to program a renderer in OpenCL (setting aside the effort)? The resulting picture would be drawn in OpenGL.

    Read the article

  • The right way to set up Visual Studio 2010 for OpenCL

    - by LonliLokli
    What is the right way to set up Visual Studio 2010 for working with *.cl files? I have added *.cl under Tools/Text Editor/File Extensions and copied usertype.dat into the Common7/IDE folder, but VS still underlines keywords like float4 or cross. Is it necessary to add some key to the registry, or can somebody point me to a tutorial? Thanks in advance. PS: I have already asked a similar question before (the old question), but now I am looking explicitly for a solution for VS2010. It is not a big problem, but it really annoys me and distracts me from my programming tasks.

    Read the article

  • Simple OpenCL Kernel

    - by Yuuta
    I'm trying to write a kernel which is a little long, but the output is not correct. I took out almost everything and finally narrowed it down to an initialization problem, and I found that the following works: __kernel void k_sIntegral(__global const float* eData,__global const unsigned long* elements,__global float* area){ const int i = get_global_id(0); if(i < elements[0]){ area[i] = i; } } But the following does not work: __kernel void k_sIntegral(__global const float* eData,__global const unsigned long* elements,__global float* area){ const int i = get_global_id(0); if(i < elements[0]){ __local float a,b,c,j,k,h,s; area[i] = i; } } Using the first kernel, I get: area[1] = 1. Using the second kernel, I get: area[1] = 0 (from calloc). Update: It seems like the code does work, but I need to change the function name, otherwise it somehow calls the previous function even though it was not compiled (?). Any leads on why that happens? If anyone can let me know what might be the problem I'll be really grateful, thanks in advance!
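
    One likely culprit in the second version: OpenCL C only allows __local variables to be declared at kernel function scope, not inside a block such as the if body, so that kernel may simply fail to build and a previously built binary gets picked up instead (which would also explain the function-name observation). Checking the clBuildProgram error code and build log should confirm or rule this out. A hedged sketch of the legal placement:

      __kernel void k_sIntegral(__global const float* eData,
                                __global const unsigned long* elements,
                                __global float* area)
      {
          // __local variables must be declared at kernel function scope,
          // not inside the if block.
          __local float a, b, c, j, k, h, s;

          const int i = get_global_id(0);
          if (i < elements[0]) {
              area[i] = i;
          }
      }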

    Read the article

  • OpenCL 1.2 specification released: separate compilation/linking, device partitioning, and support for new device types

    OpenCL 1.2 specification released: separate compilation/linking, device partitioning, and support for new device types. The Khronos Group has just ratified and published the OpenCL 1.2 (Open Computing Language) specification, the standardized API and C language extension for GPU development and distributed parallel programming across several types of compatible processors. Among the new features of this version: device partitioning, which allows a device to be divided into several sub-devices in order to directly control the tasks assigned to each compute unit; separation of compilation and ...

    Read the article

  • OpenCL 1.1 backward compatible, enhanced performance

    Linux Magazine: "The Khronos Group today announced OpenCL 1.1, a backwards compatible update that boosts performance in the parallel programming standard. OpenCL is a free programming standard designed from the ground up to optimize coding in multicore processors."

    Read the article

  • I want to run the examples of QtOpenCL, i.e. Qt with OpenCL. Installation and setup help?

    - by Skkard
    The thing is, I have to run the QtOpenCL examples given here: http://labs.trolltech.com/blogs/2010/04/07/using-opencl-with-qt/. The problem is that I have no clue where to start. I downloaded the source for QtOpenCL, but it needs a valid OpenCL installation. I have Qt installed already. How do I install OpenCL? Unfortunately I don't have a GPU at home and need to run it on my CPU for now; I later have to give a presentation, where I will be supplied with a system that has a GPU. How do I go about installing OpenCL? Thank you.

    Read the article

  • My OpenCL kernel is slower on faster hardware.. But why?

    - by matdumsa
    Hi folks, as I was finishing the code for my multicore programming class project I came upon something really weird that I wanted to discuss with you. We were asked to create any program that would show a significant improvement when programmed for a multi-core platform. I decided to try and code something on the GPU to try out OpenCL. I chose the matrix convolution problem since I'm quite familiar with it (I've parallelized it before with open_mpi, with great speedup for large images). So here it is: I select a large GIF file (2.5 MB) [2816x2112], run the sequential version (the original code), and get an average of 15.3 seconds. I then run the new OpenCL version I just wrote on my MBP's integrated GeForce 9400M and get timings of 1.26 s on average. So far so good: it's a speedup of 12x! But now I go into my Energy Saver panel and turn on "Graphic Performance Mode". That mode turns off the GeForce 9400M and turns on the GeForce 9600M GT my system has. Apple says this card is twice as fast as the integrated one. Guess what, my timings using the kick-ass graphics card are 3.2 seconds on average... My 9600M GT seems to be more than two times slower than the 9400M. For those of you who are OpenCL-inclined: I copy all data to remote buffers before starting, so the actual computation doesn't require round trips to main RAM. Also, I let OpenCL determine the optimal local work size, as I've read the implementation is pretty good at figuring that parameter out. Anyone have a clue? Edit: full source code with makefiles is here: http://www.mathieusavard.info/convolution.zip (cd gimage; make; cd ../clconvolute; make; then put a large input.gif in clconvolute and run it to see the results).
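
    One thing worth checking, offered as a hedged suggestion rather than a diagnosis: when the local work size is left to the runtime, the value chosen can differ between the two GPUs. Querying the kernel's limit and passing an explicit local size makes the comparison more controlled ('kernel', 'device', 'queue' and the image dimensions are assumed to exist already):

      /* How large a work-group this kernel supports on this device. */
      size_t max_wg_size = 0;
      clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                               sizeof(max_wg_size), &max_wg_size, NULL);

      /* Pick an explicit 2D local size that divides the global size and stays
         within the limit; 16x16 = 256 is a common starting point. */
      size_t local[2]  = {16, 16};
      size_t global[2] = {image_width, image_height};  /* assumed multiples of 16 */

      cl_int err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                          global, local, 0, NULL, NULL);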

    Read the article

  • How do I test OpenCL on GPU when logged in remotely on Mac?

    - by Christopher Bruns
    My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, the output is as follows (notice the "GeForce 9400" line): 2 devices found Device #0 name = GeForce 9400 Device #1 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz but if it is just me, over ssh, there is no GeForce 9400 entry: 1 devices found Device #0 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz I would like to test my code on the GPU without having to be root. Is that possible? Simplified GPU finding program below: #include <stdio.h> #include <OpenCL/opencl.h> int main(int argc, char** argv) { char dname[500]; size_t namesize; cl_device_id devices[10]; cl_uint num_devices; int d; clGetDeviceIDs(0, CL_DEVICE_TYPE_ALL, 10, devices, &num_devices); printf("%d devices found\n", num_devices); for (d = 0; d < num_devices; ++d) { clGetDeviceInfo(devices[d], CL_DEVICE_NAME, 500, dname, &namesize); printf("Device #%d name = %s\n", d, dname); } return 0; } EDIT: I found essentially the same question being asked on nvidia's forums. Unfortunately, the only answer was of the form "this is the wrong forum".

    Read the article

  • How do I ensure GUI responsiveness when using OpenCL on the display GPU?

    - by pauldoo
    In my relatively short time learning OpenCL I frequently see my application cause the operating system UI to become significantly less responsive (several seconds for a window to respond to a drag, for example). I have encountered this problem on Windows Vista and Mac OS X, both with NVIDIA GPUs. What can I do, when using OpenCL on the same GPU as the display, to ensure that my application does not significantly degrade UI responsiveness like this? Also, can this be done without taking needless performance losses within my application? (I.e., if the user is not doing some UI-intensive task, then I would not expect my application to run any slower than it does now.) I understand that any answers will be very platform-specific (where "platform" includes the OS/GPU/driver combination).
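
    One common mitigation, offered as a hedged sketch rather than a guaranteed fix: split long-running work into many short kernel launches and flush between them, so the display driver gets regular opportunities to schedule GUI work on the same GPU. The chunk size, the 'offset' kernel argument and the other names are illustrative:

      const size_t CHUNK = 64 * 1024;      /* tune so each launch runs only a few ms */
      for (size_t start = 0; start < total_items; start += CHUNK) {
          size_t global = (total_items - start < CHUNK) ? (total_items - start)
                                                        : CHUNK;
          cl_uint offset = (cl_uint)start;
          clSetKernelArg(kernel, 1, sizeof(cl_uint), &offset); /* kernel adds it to its id */
          clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
          clFlush(queue);                  /* submit this slice right away */
      }
      clFinish(queue);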

    Read the article

  • C++11: thread_local or array of OpenCL 1.2 cl_kernel objects?

    - by user926918
    I need to run several C++11 threads (GCC 4.7.1) in parallel on the host. Each of them needs to use a device, say a GPU. As per the OpenCL 1.2 spec (p. 357): "All OpenCL API calls are thread-safe except clSetKernelArg. clSetKernelArg is safe to call from any host thread, and is safe to call re-entrantly so long as concurrent calls operate on different cl_kernel objects. However, the behavior of the cl_kernel object is undefined if clSetKernelArg is called from multiple host threads on the same cl_kernel object at the same time." An elegant way would be to use thread_local cl_kernel objects; the other way I can think of is to use an array of these objects such that the i'th thread uses the i'th object. As I have not implemented either of these before, I was wondering whether either of the two is a good approach, or whether there are better ways of getting things done. TIA, S
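
    A minimal sketch of the thread_local approach, assuming a shared cl_program and one command queue per thread ('my_kernel' and the other names are illustrative; with older GCC such as 4.7 the GNU __thread extension works for a plain pointer like cl_kernel):

      #include <CL/cl.h>

      // Each host thread lazily creates its own cl_kernel, so clSetKernelArg
      // is never called concurrently on the same object.
      thread_local cl_kernel tls_kernel = nullptr;

      cl_kernel get_thread_kernel(cl_program program)
      {
          if (tls_kernel == nullptr) {
              cl_int err = CL_SUCCESS;
              tls_kernel = clCreateKernel(program, "my_kernel", &err);
          }
          return tls_kernel;   // clReleaseKernel at thread exit omitted for brevity
      }

      void worker(cl_program program, cl_command_queue queue, cl_mem buf, size_t n)
      {
          cl_kernel k = get_thread_kernel(program);
          clSetKernelArg(k, 0, sizeof(cl_mem), &buf);  // safe: this thread's own kernel
          clEnqueueNDRangeKernel(queue, k, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
          clFinish(queue);
      }

    Each std::thread that calls worker gets its own tls_kernel automatically; the fixed-size array indexed by thread number works just as well and is easier to clean up.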

    Read the article

  • OpenCL 1.1 now available: adds, among other things, improved interoperability...

    Yesterday the Khronos Group published the first update to its OpenCL specification. After 18 months, here are the main changes on the menu: improved parallelization; a C++ wrapper; several optional features are now standardized. More information at: http://www.khronos.org/news/press/re...uting-standard What do you think of these changes? In what ways do you use it?...

    Read the article

< Previous Page | 1 2 3 4 5 6  | Next Page >