Search Results

Search found 3521 results on 141 pages for 'parallel computing'.


  • How to synchronize CUDA threads when they are in the same loop and we need to synchronize them to ex

    - by Vickey
    Hi all, I have written some code and now I want to implement it on a CUDA GPU, but I'm new to synchronization, so please help me with this; it's a little urgent for me. Below I'm presenting the code. I want LOOP 1 to be executed by all threads (so that this portion takes advantage of CUDA), while the remaining portion (everything outside LOOP 1) is executed by only a single thread.

        do{
            point_set = master_Q[(*num_mas) - 1].q;
            List* temp = point_set;
            List* pa = point_set;
            if(master_Q[num_mas[0] - 1].max)
                max_level = (int) (ceilf(il2 * log(master_Q[num_mas[0] - 1].max)));
            *num_mas = (*num_mas) - 1;
            while(point_set){
                List* insert_ele = temp;
                while(temp){
                    insert_ele = temp;
                    if((insert_ele->dist[insert_ele->dist_index-1] <= pow(2, max_level-1)) || (top_level == max_level)){
                        if(point_set == temp){
                            point_set = temp->next;
                            pa = temp->next;
                        }
                        else{
                            pa->next = temp->next;
                        }
                        temp = NULL;
                        List* new_point_set = point_set;
                        float maximum_dist = 0;
                        if(parent->p_index != insert_ele->point_index){
                            List* tmp = new_point_set;
                            float *b = &(data[(insert_ele->point_index)*point_len]);
                            /* LOOP 1: */
                            while(tmp){
                                float *c = &(data[(tmp->point_index)*point_len]);
                                float sum = 0.;
                                for(int j = 0; j < point_len; j+=2){
                                    float d1 = b[j] - c[j];
                                    float d2 = b[j+1] - c[j+1];
                                    d1 *= d1;
                                    d2 *= d2;
                                    sum = sum + d1 + d2;
                                }
                                tmp->dist[tmp->dist_index] = sqrt(sum);
                                if(maximum_dist < tmp->dist[tmp->dist_index])
                                    maximum_dist = tmp->dist[tmp->dist_index];
                                tmp->dist_index = tmp->dist_index+1;
                                tmp = tmp->next;
                            }
                            max_distance = maximum_dist;
                        }
                        while(new_point_set || insert_ele){
                            List* far, *par, *tmp, *tmp_new;
                            far = NULL;
                            tmp = new_point_set;
                            tmp_new = NULL;
                            float level_dist = pow(2, max_level-1);
                            float maxdist = 0, maxp = 0;
                            while(tmp){
                                if(tmp->dist[(tmp->dist_index)-1] > level_dist){
                                    if(maxdist < tmp->dist[tmp->dist_index-1])
                                        maxdist = tmp->dist[tmp->dist_index-1];
                                    if(tmp == new_point_set){
                                        new_point_set = tmp->next;
                                        par = tmp->next;
                                    }
                                    else{
                                        par->next = tmp->next;
                                    }
                                    if(far == NULL){
                                        far = tmp;
                                        tmp_new = far;
                                    }
                                    else{
                                        tmp_new->next = tmp;
                                        tmp_new = tmp;
                                    }
                                    if(parent->p_index != insert_ele->point_index)
                                        tmp->dist_index = tmp->dist_index - 1;
                                    tmp = tmp->next;
                                    tmp_new->next = NULL;
                                }
                                else{
                                    par = tmp;
                                    if(maxp < tmp->dist[(tmp->dist_index)-1])
                                        maxp = tmp->dist[(tmp->dist_index)-1];
                                    tmp = tmp->next;
                                }
                            }
                            if(0 == maxp){
                                tmp = new_point_set;
                                aloc_mem[*tree_index].p_index = insert_ele->point_index;
                                aloc_mem[*tree_index].no_child = 0;
                                aloc_mem[*tree_index].level = max_level--;
                                parent->children_index[parent->no_child++] = *tree_index;
                                parent = &(aloc_mem[*tree_index]);
                                tree_index[0] = tree_index[0]+1;
                                while(tmp){
                                    aloc_mem[*tree_index].p_index = tmp->point_index;
                                    aloc_mem[(*tree_index)].no_child = 0;
                                    aloc_mem[(*tree_index)].level = master_Q[(*cur_count_Q)-1].level;
                                    parent->children_index[parent->no_child] = *tree_index;
                                    parent->no_child = parent->no_child + 1;
                                    (*tree_index)++;
                                    tmp = tmp->next;
                                }
                                cur_count_Q[0] = cur_count_Q[0]-1;
                                new_point_set = NULL;
                            }
                            master_Q[*num_mas].q = far;
                            master_Q[*num_mas].parent = parent;
                            master_Q[*num_mas].valid = true;
                            master_Q[*num_mas].max = maxdist;
                            master_Q[*num_mas].level = max_level;
                            num_mas[0] = num_mas[0]+1;
                            if(0 != maxp){
                                aloc_mem[*tree_index].p_index = insert_ele->point_index;
                                aloc_mem[*tree_index].no_child = 0;
                                aloc_mem[*tree_index].level = max_level;
                                parent->children_index[parent->no_child++] = *tree_index;
                                parent = &(aloc_mem[*tree_index]);
                                tree_index[0] = tree_index[0]+1;
                                if(maxp){
                                    int new_level = ((int) (ceilf(il2 * log(maxp)))) +1;
                                    if (new_level < (max_level-1))
                                        max_level = new_level;
                                    else
                                        max_level--;
                                }
                                else
                                    max_level--;
                            }
                            if( 0 == maxp )
                                insert_ele = NULL;
                        }
                    }
                    else{
                        if(NULL == temp->next){
                            master_Q[*num_mas].q = point_set;
                            master_Q[*num_mas].parent = parent;
                            master_Q[*num_mas].valid = true;
                            master_Q[*num_mas].level = max_level;
                            num_mas[0] = num_mas[0]+1;
                        }
                        pa = temp;
                        temp = temp->next;
                    }
                }
                if((*num_mas) > 1){
                    List *temp2 = master_Q[(*num_mas)-1].q;
                    while(temp2){
                        List* temp3 = master_Q[(*num_mas)-2].q;
                        master_Q[(*num_mas)-2].q = temp2;
                        if((master_Q[(*num_mas)-1].parent)->p_index != (master_Q[(*num_mas)-2].parent)->p_index){
                            temp2->dist_index = temp2->dist_index - 1;
                        }
                        temp2 = temp2->next;
                        master_Q[(*num_mas)-2].q->next = temp3;
                    }
                    num_mas[0] = num_mas[0]-1;
                }
                point_set = master_Q[(*num_mas)-1].q;
                temp = point_set;
                pa = point_set;
                parent = master_Q[(*num_mas)-1].parent;
                max_level = master_Q[(*num_mas)-1].level;
                if(master_Q[(*num_mas)-1].max)
                    if( max_level > ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1)
                        max_level = ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1;
                num_mas[0] = num_mas[0]-1;
            }
        }while(*num_mas > 0);
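    A minimal sketch of the split being asked for, assuming the point list has first been flattened into plain arrays (pointer-chasing linked lists do not map well onto the GPU); all names here are hypothetical:

        __global__ void insert_step(const float *data, float *dist,
                                    int point_len, int num_points, int insert_idx)
        {
            int tid = blockIdx.x * blockDim.x + threadIdx.x;

            /* LOOP 1 equivalent: each thread computes one point's distance
               to the point being inserted */
            if (tid < num_points) {
                const float *b = &data[insert_idx * point_len];
                const float *c = &data[tid * point_len];
                float sum = 0.0f;
                for (int j = 0; j < point_len; j++) {
                    float d = b[j] - c[j];
                    sum += d * d;
                }
                dist[tid] = sqrtf(sum);
            }

            __syncthreads();  /* barrier for the threads of THIS block only */

            /* serial portion: a single designated thread does the rest */
            if (blockIdx.x == 0 && threadIdx.x == 0) {
                /* tree/queue bookkeeping that must not run in parallel */
            }
        }

    Note that __syncthreads() only synchronizes within one block; if the parallel part spans multiple blocks, the usual pattern is to end the kernel after the loop and run the serial portion in a second kernel launch (or on the host), since there is no grid-wide barrier inside a single launch.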

    Read the article

  • MongoDB: What's the point of using MapReduce without parallelism?

    - by netvope
    Quoting http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Parallelism: "As of right now, MapReduce jobs on a single mongod process are single threaded." Without parallelism, what are the benefits of MapReduce compared to simpler or more traditional methods for queries and data aggregation? To avoid confusion: the question is NOT "what are the benefits of a document-oriented DB over a traditional relational DB".

    Read the article

  • How can one manage to fully use the newly enhanced Parallelism features in .NET 4.0?

    - by Will Marcouiller
    I am quite interested in using the newly enhanced parallelism features in .NET 4.0. I have also seen some possibilities of using them in F#, as much as in C#. So far, I can only see what PLINQ has to offer, for example:

        var query = from c in Customers.AsParallel()
                    where c.Name.Contains("customerNameLike")
                    select c;

    There must surely be some other use of this parallelism. Have you any other examples of using it? Is this particularly geared toward PLINQ, or are there other usages as easy as PLINQ? Thanks! =)
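    Beyond PLINQ, .NET 4.0 also ships the Task Parallel Library; a small hedged sketch of the two most common shapes (the lambda bodies are placeholders):

        using System;
        using System.Threading.Tasks;

        class Demo
        {
            static int ExpensiveComputation() { return 42; }  // stand-in

            static void Main()
            {
                // data parallelism: Parallel.For partitions the range across cores
                Parallel.For(0, 1000, i =>
                {
                    // CPU-bound work on element i
                });

                // task parallelism: independent work scheduled on the thread pool
                Task<int> t = Task.Factory.StartNew(() => ExpensiveComputation());
                int result = t.Result;  // blocks until the task finishes
            }
        }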

    Read the article

  • Expanding Git SHA1 information into a checkin without archiving?

    - by Tim Lin
    Is there a way to include git commit hashes inside a file every time I commit? I can only find out how to do this during archiving, but I haven't been able to find out how to do it for every commit. I'm doing scientific programming with git as revision control, so this kind of functionality would be very helpful for reproducibility reasons (i.e., having the git hash automatically included in all result files and figures).
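    For what it's worth, git's built-in ident attribute only expands $Id$ to the blob hash, not the commit hash, so the commit hash usually has to be injected by a hook or a build step. A sketch of the hook route (version.txt is a hypothetical name; the file is left untracked, since committing it would itself change the hash):

        #!/bin/sh
        # .git/hooks/post-commit
        git rev-parse HEAD > version.txt

    The programs then read version.txt at run time and stamp it into their output files.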

    Read the article

  • What should I do to accommodate large-scale data storage and retrieval?

    - by kailashbuki
    There are two columns in the table inside a MySQL database. The first column contains the fingerprint, while the second one contains the list of documents which have that fingerprint. It's much like an inverted index built by search engines. An instance of a record inside the table is shown below:

        34 "doc1, doc2, doc45"

    The number of fingerprints is very large (it can range up to trillions). There are basically the following operations on the database: inserting/updating a record, and retrieving a record according to a match on the fingerprint. The table-definition Python snippet is:

        self.cursor.execute("CREATE TABLE IF NOT EXISTS `fingerprint` (fp BIGINT, documents TEXT)")

    And the snippet for the insert/update operation is:

        if self.cursor.execute("UPDATE `fingerprint` SET documents=CONCAT(documents,%s) WHERE fp=%s",
                               (","+newDocId, thisFP)) == 0L:
            self.cursor.execute("INSERT INTO `fingerprint` VALUES (%s, %s)", (thisFP, newDocId))

    The only bottleneck I have observed so far is the query time in MySQL. My whole application is web based, so time is a critical factor. I have also thought of using Cassandra but have little knowledge of it. Please suggest a better way to tackle this problem.
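    One thing worth noting about the schema as shown: there is no index on fp, so every UPDATE ... WHERE fp=%s is a full table scan. Before switching databases, a first step would be declaring fp as the primary key; a sketch of the amended definition:

        self.cursor.execute("""
            CREATE TABLE IF NOT EXISTS `fingerprint` (
                fp BIGINT NOT NULL,
                documents TEXT,
                PRIMARY KEY (fp)
            )
        """)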

    Read the article

  • How to printf a time_t variable as a floating point number?

    - by soneangel
    Hi guys, I'm using a time_t variable in C (OpenMP environment) to keep CPU execution time. I define a float value sum_tot_time to sum the times for all CPUs; I mean, sum_tot_time is the sum of the CPUs' time_t values. The problem is that when printing the value of sum_tot_time it appears as an integer or long, that is, without its decimal part! I tried these ways: printing sum_tot_time as a double, it being a double value; as a float, it being a float value; as a double, it being a time_t value; and as a float, it being a time_t value. Please help me!!
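    A sketch of the two usual fixes: on most platforms time_t is an integral count of whole seconds, so it carries no decimal part and must be cast to double to print with %f, while in an OpenMP program omp_get_wtime() already returns wall-clock seconds as a double:

        #include <stdio.h>
        #include <time.h>
        #include <omp.h>

        int main(void)
        {
            time_t t = time(NULL);
            printf("%f\n", (double)t);   /* cast: %f expects a double */

            double start = omp_get_wtime();
            /* ... work ... */
            double elapsed = omp_get_wtime() - start;  /* sub-second resolution */
            printf("elapsed: %f s\n", elapsed);
            return 0;
        }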

    Read the article

  • SDCC and malloc() - allocating much less memory than is available

    - by Duncan Bayne
    When I compile this code with SDCC 3.1.0 and run it on an Amstrad CPC 464 (under emulation, with WinCPC 0.9.26 running on Wine):

        void _test_malloc()
        {
            long idx = 0;
            while (1) {
                if (malloc(5)) {
                    printf("%ld\r\n", ++idx);
                } else {
                    printf("done");
                    break;
                }
            }
        }

    ... it consistently taps out at 92 malloc()s. I make that 460 bytes, which leads me to a couple of questions: What is malloc() doing on this system? I was sort of hoping for an order of magnitude more storage, even on a 64kB system. The behaviour is consistent on 64kB and 128kB systems; do I have to perform some sort of magic to access the additional memory, like manual bank switching?
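    One hedged observation: SDCC's malloc allocates from a fixed-size heap and attaches a per-block header, so 92 five-byte allocations can plausibly exhaust a small default heap long before physical memory runs out. A common workaround, sketched below, is to grab one large block and slice it yourself (POOL_SIZE is a guess to tune):

        #include <stdlib.h>

        #define POOL_SIZE 4096  /* tune to whatever single malloc() succeeds */
        #define OBJ_SIZE  5

        static char *pool;
        static unsigned int next;

        /* hand out 5-byte slices of a single allocation: one header total */
        char *pool_alloc(void)
        {
            if (!pool)
                pool = malloc(POOL_SIZE);
            if (!pool || next + OBJ_SIZE > POOL_SIZE)
                return 0;  /* pool exhausted */
            next += OBJ_SIZE;
            return pool + next - OBJ_SIZE;
        }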

    Read the article

  • Are batch mutations atomic in Cassandra?

    - by user317459
    The Cassandra API supports batch mutations: batch_mutate(keyspace, mutation_map, consistency_level) executes the specified mutations on the keyspace. mutation_map is a map: the outer map maps the key to the inner map, which maps the column family to the Mutations; it can be read as map<key, map<column_family, list<Mutation>>>. To be more specific, the outer map key is a row key and the inner map key is the column family name. A Mutation specifies either columns to insert or columns to delete. See Mutation and Deletion above for more details. Are all mutations that are executed in a batch executed atomically? So if one of the mutations fails, do the others fail too?

    Read the article

  • Preallocate memory for a program in Linux before it gets started

    - by Fyg
    Hi, folks, I have a program that repeatedly solves large systems of linear equations using Cholesky decomposition. Characteristic of it is that I sometimes need to store the complete factorisation, which can exceed about 20 GB of memory. The factorisation happens inside a library that I call. Furthermore, the matrix and the resulting factorisation change quite frequently, and with them the memory requirements. I am not the only person using this compute node. Therefore, is there a way to start the program under Linux and preallocate free memory for the process? Something like:

        $: prealloc -m 25G ./program
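    No standard prealloc tool exists, as far as I know, but the usual in-process approximation is to reserve and touch the whole budget at startup, so the program fails fast instead of hours into a run; a sketch (mlock is optional and needs a suitable RLIMIT_MEMLOCK):

        #include <stdlib.h>
        #include <string.h>
        #include <sys/mman.h>

        /* grab and fault in the whole budget up front; keep the block
           around as an arena so the pages stay with this process */
        void *reserve(size_t bytes)
        {
            void *pool = malloc(bytes);
            if (!pool)
                return NULL;         /* fail fast, before the real work */
            memset(pool, 0, bytes);  /* touch every page */
            mlock(pool, bytes);      /* optional: pin the pages */
            return pool;
        }

    The caveat is that the library's internal allocations will not draw from this arena unless its allocator is interposed, so this guarantees the memory exists at startup, not that it stays free for the library.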

    Read the article

  • What are the advantages / disadvantages of a Cloud-based / Web-based IDE?

    - by Gabe
    I'm writing this as DevConnections in Las Vegas is happening. Visual Studio 2010 has been released, and I now have this 3GB beast installed on my machine. (I'll admit, it has some nice features.) However, while the install was monopolizing my computer's resources, I began to wish that my IDE worked more like Google Documents (instantly available, available anywhere, easy to share, easy to collaborate, naturally versioned). A few Google (and StackOverflow) searches led me to CodeRun and Bespin. I'm well aware that these IDEs are missing a lot of what exists in VS 2010. However, that isn't my question. Instead, I'm wondering what benefits a web-based IDE might have. Assuming a company invests the time to create the missing features, what is the downside?

    Read the article

  • Efficiency of manually written loops vs operator overloads (C++)

    - by Sagekilla
    Hi all, in the program I'm working on I have 3-element arrays, which I use as mathematical vectors for all intents and purposes. Through the course of writing my code, I was tempted to just roll my own Vector class with simple +, -, *, / etc. overloads so I can simplify statements like:

        for (int i = 0; i < 3; i++)
            r[i] = r1[i] - r2[i];
        // becomes:
        r = r1 - r2;

    which should be more or less identical in generated code. But when it comes to more complicated things, could this really impact my performance heavily? One example that I have in my code is this. Manually written version:

        for (int j = 0; j < 3; j++) {
            p.vel[j] = p.oldVel[j] + (p.oldAcc[j] + p.acc[j]) * dt2 + (p.oldJerk[j] - p.jerk[j]) * dt12;
            p.pos[j] = p.oldPos[j] + (p.oldVel[j] + p.vel[j]) * dt2 + (p.oldAcc[j] - p.acc[j]) * dt12;
        }

    Using a Vector class with operator overloads:

        p.vel = p.oldVel + (p.oldAcc + p.acc) * dt2 + (p.oldJerk - p.jerk) * dt12;
        p.pos = p.oldPos + (p.oldVel + p.vel) * dt2 + (p.oldAcc - p.acc) * dt12;

    I am compiling my code for maximum possible speed, as it's extremely important that this code runs quickly and calculates accurately. So will relying on my Vector class for these sorts of things really affect me? For those curious, this is part of some numerical integration code which is not trivial to run in my program. Any insight would be appreciated, as would any idioms or tricks I'm unaware of.
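    For reference, a minimal sketch of the kind of class in question: a small fixed-size value type with inline operators, which optimizing compilers typically reduce to the same code as the hand-written loops (the temporary each operator returns is the main thing to check in profiles):

        #include <cmath>

        struct Vec3 {
            double v[3];
            double& operator[](int i) { return v[i]; }
            double operator[](int i) const { return v[i]; }
        };

        inline Vec3 operator+(const Vec3& a, const Vec3& b) {
            Vec3 r;
            for (int i = 0; i < 3; ++i) r.v[i] = a.v[i] + b.v[i];
            return r;
        }

        inline Vec3 operator-(const Vec3& a, const Vec3& b) {
            Vec3 r;
            for (int i = 0; i < 3; ++i) r.v[i] = a.v[i] - b.v[i];
            return r;
        }

        inline Vec3 operator*(const Vec3& a, double s) {
            Vec3 r;
            for (int i = 0; i < 3; ++i) r.v[i] = a.v[i] * s;
            return r;
        }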

    Read the article

  • Erlang message loops

    - by Roger Alsing
    How do message loops in Erlang work? Are they synchronous when it comes to processing messages? As far as I understand, the loop will start by "receive"-ing a message, then perform something, and then hit another iteration of the loop. So that has to be synchronous, right? If multiple clients send messages to the same message loop, are all those messages queued and processed one after another? To process multiple messages in parallel, you would have to spawn multiple message loops in different processes, right? Or did I misunderstand all of it?
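    That understanding matches the canonical receive loop; as a sketch (handle/1 is a hypothetical callback), each message is pulled from the process mailbox and fully handled before the tail call goes back to receive the next one:

        loop(State) ->
            receive
                {request, From, Msg} ->
                    From ! {reply, handle(Msg)},  % finishes before next receive
                    loop(State)
            end.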

    Read the article

  • Subdomain forwarding using .htaccess

    - by RJ
    I want to redirect a particular subdomain to the main domain: http(s)://dl.example.com/par1/par2 to http(s)://www.example.com/par1/par2. How do I achieve the above using .htaccess? Why I want to do this: whenever a user downloads a file from my server, if the file is huge, the user cannot do any other operation until the file is downloaded completely... so the solution I have thought of is to forward the download request through the subdomain so that the browser may continue with the rest of the operations. Thanks
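    A minimal mod_rewrite sketch for the .htaccess at the dl host's document root, assuming overrides are enabled (the https case additionally needs a %{HTTPS} condition to choose the scheme):

        RewriteEngine On
        RewriteCond %{HTTP_HOST} ^dl\.example\.com$ [NC]
        RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]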

    Read the article

  • Help me understand CUDA

    - by scatman
    I am having some trouble understanding threads in the NVIDIA GPU architecture with CUDA. Could anybody please clarify this info: an 8800 GPU has 16 SMs with 8 SPs each, so we have 128 SPs. I was viewing Stanford's video presentation, and it said that every SP is capable of running 96 threads concurrently. Does this mean that it (the SP) can run 96/32 = 3 warps concurrently? Moreover, since every SP can run 96 threads and we have 8 SPs in every SM, does this mean that every SM can run 96*8 = 768 threads concurrently?? But if every SM can run a single block at a time, and the maximum number of threads in a block is 512, then what is the purpose of running 768 threads concurrently while having a max of 512 threads? A more general question is: how are blocks, threads, and warps distributed to SMs and SPs? I read that every SM gets a single block to execute at a time, threads in a block are divided into warps (32 threads), and SPs execute warps.
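    For concreteness, a sketch of the programmer-visible side: you only choose the grid/block geometry, and the hardware maps whole blocks to SMs (an SM can host more than one resident block at a time, which is how the 768-resident-thread figure can exceed the 512-per-block limit) and slices each block into 32-thread warps for the SPs:

        // hypothetical kernel; only the launch geometry matters here
        __global__ void scale(float *x, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id
            if (i < n)
                x[i] *= 2.0f;
        }

        // 256 threads per block = 8 warps; enough blocks to cover n elements
        int n = 1 << 20;
        scale<<<(n + 255) / 256, 256>>>(d_x, n);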

    Read the article

  • Cloud Apps and Single Sign-On (AD integration)

    - by Pablo Alvim
    I've been investigating some cloud vendors and the ability to implement single sign-on with them, especially when it comes to AD (Active Directory) integration. So far I've learned that with Azure this is possible through ADFS and the AppFabric Access Control offering. In AWS, since it is possible to create a VPN and see EC2 instances as a natural extension of a private datacenter, I believe implementing SSO would be rather simple (not sure if I'm right on this one... please correct me if I'm wrong). With App Engine, though, even though there is some documentation on AD synchronization (not full integration) for Google Apps, I'm struggling to find out whether AD integration is possible at all... Is there any strategy for that? Any bit of information on cloud apps and AD integration will be appreciated!

    Read the article

  • Higher speed options for executing very large (20 GB) .sql file in MySQL

    - by Jonogan
    My firm was delivered a 20+ GB .sql file in response to a request for data from the gov't. I don't have many options for getting the data in a different format, so I need options for how to import it in a reasonable amount of time. I'm running it on a high-end server (Win 2008 64-bit, MySQL 5.1) using Navicat's batch execution tool. It's been running for 14 hours and shows no signs of being near completion. Does anyone know of any higher-speed options for such a transaction? Or is this what I should expect, given the large file size? Thanks
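    Not a silver bullet, but piping the file straight into the command-line client (mysql dbname < file.sql) often beats GUI batch tools, and the standard knobs for replaying a large dump look something like this (a sketch; disabling the checks assumes InnoDB tables and a dump you trust):

        SET autocommit = 0;
        SET unique_checks = 0;
        SET foreign_key_checks = 0;
        SOURCE d:/file.sql;
        COMMIT;
        SET unique_checks = 1;
        SET foreign_key_checks = 1;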

    Read the article

  • Problem with Boost::Asio for C++

    - by Martin Lauridsen
    Hi there, for my bachelor's thesis I am implementing a distributed version of an algorithm for factoring large integers (finding the prime factorisation). This has applications in e.g. the security of the RSA cryptosystem. My vision is that clients (Linux or Windows) will download an application and compute some numbers (these are independent, thus suited for parallelization). The numbers (not found very often) will be sent to a master server that collects them. Once enough numbers have been collected by the master server, it will do the rest of the computation, which cannot be easily parallelized. Anyhow, to the technicalities. I was thinking of using Boost::Asio for a socket client/server implementation, for the clients' communication with the master server. Since I want to compile for both Linux and Windows, I thought Windows would be as good a place to start as any. So I downloaded the Boost library and compiled it as it said on the Boost Getting Started page:

        bootstrap
        .\bjam

    It all compiled just fine. Then I tried to compile one of the tutorial examples, client.cpp, from Asio (edit: can't post the link because of restrictions). I am using the Visual C++ compiler from Microsoft Visual Studio 2008, like this:

        cl /EHsc /I D:\Downloads\boost_1_42_0 client.cpp

    But I get this error:

        /out:client.exe
        client.obj
        LINK : fatal error LNK1104: cannot open file 'libboost_system-vc90-mt-s-1_42.lib'

    Anyone have any idea what could be wrong, or how I could move forward? I have been trying pretty much all week to get a simple client/server socket program for C++ working, but with no luck. Serious frustration kicking in. Thank you in advance.
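    A hedged reading of the error: the -mt-s- in the missing library name means MSVC's auto-linking is asking for the Boost variant built against the static runtime, which the default bjam invocation does not stage. Two usual ways out (paths are the poster's own):

        rem build the static-runtime variant that auto-link is requesting
        bjam --with-system toolset=msvc runtime-link=static link=static stage

        rem ...or compile against the DLL runtime and point the linker at the stage dir
        cl /EHsc /MD /I D:\Downloads\boost_1_42_0 client.cpp /link /LIBPATH:D:\Downloads\boost_1_42_0\stage\lib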

    Read the article

  • Is a Service Bus a good option to communicate with multiple external servers?

    - by PFreitas
    We are developing an application that communicates, via web services or TCP/IP sockets, with multiple servers (up to 50 different external companies). Basically, the exchanged messages are the same (XML), but depending on the inputs of our application we may call one or more external servers. The benefits we would expect from introducing a Service Bus into the architecture are: 1) removing the need to manage all the point-to-point configurations (all 50 endpoints); 2) simplifying the communication layer of our application by having only one server to talk to. Is a Service Bus a good architectural option for this scenario? What is the best (simplest) Service Bus for this kind of communication? I read a few MSDN articles on Azure Service Bus Relay, but it didn't seem to fit our needs. Am I wrong? Thanks for your help.

    Read the article

  • MPI Large Data all to all transfer

    - by csslayer
    My MPI application has some processes that generate large data. Say we have N+1 processes (one for master control; the others are workers), and each worker process generates large data, which is for now simply written to normal files, named file1, file2, ..., fileN. The size of each file may be quite different. Now I need to send each fileM to the rank M process to do the next job, so it's just like an all-to-all data transfer. My problem is how I should use the MPI API to send these files efficiently. I used to use a Windows shared folder to transfer these before, but I don't think that's a good idea. I have thought about MPI_File and MPI_Alltoall, but these functions seem not to be very suitable for my case. Simple MPI_Send and MPI_Recv seem hard to use because every process needs to transfer large data, and I don't want to use a distributed file system for now.
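    MPI_Send/MPI_Recv do remain workable if each file is streamed in bounded chunks instead of as one giant message; a sketch of the sending side (the receiver loops on MPI_Recv with MPI_Get_count until the zero-length tag-1 message marks end of file):

        #include <mpi.h>
        #include <stdio.h>

        #define CHUNK (1 << 20)  /* 1 MiB per message keeps buffering bounded */

        /* stream one file to one rank; tag 0 = data, tag 1 = end of file */
        void send_file(const char *path, int dest)
        {
            static char buf[CHUNK];
            FILE *f = fopen(path, "rb");
            size_t n;

            while ((n = fread(buf, 1, CHUNK, f)) > 0)
                MPI_Send(buf, (int)n, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(buf, 0, MPI_BYTE, dest, 1, MPI_COMM_WORLD);
            fclose(f);
        }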

    Read the article

  • Improve performance writing 10 million records to text file using windows service

    - by user1039583
    I'm fetching more than 10 million records from a database and writing them to a text file. It takes hours to complete this operation. Is there any option to use TPL features here? It would be great if someone could get me started implementing this with the TPL.

        using (FileStream fStream = new FileStream("d:\\file.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
        {
            BufferedStream bStream = new BufferedStream(fStream);
            TextWriter writer = new StreamWriter(bStream);
            for (int i = 0; i < 100000000; i++)
            {
                writer.WriteLine(i);
            }
            bStream.Flush();
            writer.Flush(); // empty buffer
            fStream.Flush();
        }
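    Writing to a single file is inherently serial, so the realistic TPL win is overlapping the database reads with the disk writes; a sketch using a bounded producer/consumer pipeline (FetchRecords is a hypothetical stand-in for the actual DB reader):

        using System.Collections.Concurrent;
        using System.IO;
        using System.Threading.Tasks;

        var queue = new BlockingCollection<string>(boundedCapacity: 100000);

        // producer: the DB read runs on a thread-pool thread
        var producer = Task.Factory.StartNew(() =>
        {
            foreach (string record in FetchRecords())  // hypothetical DB reader
                queue.Add(record);                     // blocks when queue is full
            queue.CompleteAdding();
        });

        // consumer: stream to disk while the producer keeps fetching
        using (var writer = new StreamWriter(@"d:\file.txt"))
        {
            foreach (string line in queue.GetConsumingEnumerable())
                writer.WriteLine(line);
        }
        producer.Wait();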

    Read the article
