Search Results

Search found 5954 results on 239 pages for 'cpu cores'.


  • How do I write tasks? (parallel code)

    - by acidzombie24
    I am impressed with Intel Threading Building Blocks. I like how I write tasks instead of thread code, and I like how it works under the hood, with my limited understanding: tasks sit in a pool, so there won't be 100 threads on 4 cores; a task is not guaranteed to run, because it isn't on its own thread and may be far back in the pool; but it may be run together with another related task, so you can't do the bad things typical of thread-unsafe code. I wanted to know more about writing tasks. I liked the 'Task-based Multithreading - How to Program for 100 cores' video here: http://www.gdcvault.com/sponsor.php?sponsor_id=1 (currently the second-to-last link; be warned, it isn't great). My favourite part was 'solving the maze is better done in parallel', around the 48-minute mark (you can click the link on the left side). However, I'd like to see more code examples and some API for how to write tasks. Does anyone have a good resource? I have no idea what a class or piece of code might look like after being pushed onto a pool, how strange the code might look when you need to make a copy of everything, or how much of everything gets pushed onto a pool.
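
    A minimal sketch of what task-based code can look like with TBB's task_group (Node and sum_tree are invented here for illustration; the maze example from the talk would have the same shape):

        #include <tbb/task_group.h>

        struct Node { Node* left; Node* right; int value; };

        // Recursively sum a tree: each left subtree becomes a task in the pool.
        // The scheduler maps tasks onto a fixed pool of worker threads, so
        // spawning thousands of tasks never creates thousands of threads.
        long sum_tree(Node* n) {
            if (!n) return 0;
            long left_sum = 0;
            tbb::task_group g;
            g.run([&] { left_sum = sum_tree(n->left); }); // may run on another core
            long right_sum = sum_tree(n->right);          // this thread keeps working
            g.wait();                                     // join the spawned task
            return left_sum + right_sum + n->value;
        }

    Note the copy question raised above: the lambda here captures by reference, which is only safe because wait() is called before the captured locals go out of scope; capturing by value is the safer default when a task may outlive its spawner.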

    Read the article

  • Using Xcode and Instruments to improve iPhone app performance

    - by MrDatabase
    I've been experimenting with Instruments off and on for a while, and I still can't do the following (with any sensible results): determine or estimate the average runtime of a function that's called many times. For example, if I'm driving my gameLoop at 60 Hz with a CADisplayLink, I'd like to see how long the loop takes to run on average... 10 ms? 30 ms? etc. I've come close with the "CPU activity" instrument, but the results are inconsistent or don't make sense. The Time Profiler seems promising, but all I can get is "% of runtime"... and I'd like an actual runtime.
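
    If Instruments won't give an absolute number, manual instrumentation will. A sketch assuming a gameLoop() function holding the real CADisplayLink callback body (a hypothetical name); mach_absolute_time is the usual high-resolution clock here:

        #include <mach/mach_time.h>
        #include <stdint.h>
        #include <stdio.h>

        extern void gameLoop(void);        // hypothetical: your real loop body

        static uint64_t total_ticks = 0;
        static uint64_t frames = 0;

        void timedGameLoop(void) {
            uint64_t t0 = mach_absolute_time();
            gameLoop();
            total_ticks += mach_absolute_time() - t0;
            if (++frames % 600 == 0) {     // report every ~10 s at 60 Hz
                mach_timebase_info_data_t tb;
                mach_timebase_info(&tb);   // tick-to-nanosecond conversion factors
                double avg_ms = (double)total_ticks / frames
                                * tb.numer / tb.denom / 1e6;
                printf("average loop time: %.2f ms\n", avg_ms);
            }
        }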

    Read the article

  • Fastest inline-assembly spinlock

    - by sigvardsen
    I'm writing a multithreaded application in C++ where performance is critical. I need to use a lot of locking while copying small structures between threads, and for this I have chosen to use spinlocks. I have done some research and speed testing, and I found that most implementations are roughly equally fast: Microsoft's CRITICAL_SECTION with SpinCount set to 1000 scores about 140 time units; implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units; and some inline assembly with __asm {}, using something like this code, scores about 70 time units, though I am not sure that a proper memory barrier has been created. Edit: the times given here are how long it takes for 2 threads to lock and unlock the spinlock 1,000,000 times. I know this isn't a big difference, but since a spinlock is such a heavily used object, one would think programmers would have agreed on the fastest possible way to make one; Googling it leads to many different approaches, however. I would think the aforementioned method would be the fastest if implemented in inline assembly, using the instruction CMPXCHG8B instead of comparing 32-bit registers. Furthermore, memory barriers must be taken into account; this could be done with LOCK CMPXCHG8B (I think?), which guarantees "exclusive rights" to the shared memory between cores. Finally, some suggest that busy waits should be accompanied by REP NOP (the PAUSE instruction), which would let Hyper-Threading processors switch to the other thread, but I am not sure whether this is true or not. From my performance test of different spinlocks there is not much difference, but for purely academic purposes I would like to know which one is fastest. However, as I have extremely limited experience with assembly language and memory barriers, I would be happy if someone could write the assembly code for the last example I provided, with LOCK CMPXCHG8B and proper memory barriers, in the following template:

        __asm {
        spin_lock:
            ; locking code
        spin_unlock:
            ; unlocking code
        }
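
    Not the raw __asm the question asks for, but a sketch of the same lock using MSVC intrinsics, which carry the barrier semantics implicitly: on x86, _InterlockedExchange emits a LOCK-prefixed instruction (a full barrier), and _mm_pause is exactly the REP NOP / PAUSE mentioned above.

        #include <intrin.h>

        // Test-and-test-and-set spinlock: spin on a plain read and attempt
        // the atomic exchange only when the lock looks free.
        struct Spinlock {
            volatile long state;                  // 0 = free, 1 = held
            Spinlock() : state(0) {}

            void lock() {
                for (;;) {
                    if (_InterlockedExchange(&state, 1) == 0)
                        return;                   // acquired
                    while (state != 0)
                        _mm_pause();              // PAUSE: eases hyper-thread contention
                }
            }
            void unlock() {
                _InterlockedExchange(&state, 0);  // full barrier on release
            }
        };

    On x86 the release could instead be a plain store plus a compiler barrier, which is cheaper; micro-decisions like that are one reason "the fastest spinlock" has never been settled.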

    Read the article

  • Scala REPL is slow on Vista

    - by Jacques René Mesrine
    I installed scala-2.8.0.RC3 by extracting the tgz file into my Cygwin (Vista) home directory. I made sure to add scala-2.8.0.RC3/bin to $PATH. I start the REPL by typing:

        $ scala
        Welcome to Scala version 2.8.0.RC3 (Java HotSpot(TM) Client VM, Java 1.6.0_20).
        Type in expressions to have them evaluated.
        Type :help for more information.

        scala>

    Now when I try to enter an expression:

        scala> 1 + 'a'

    the cursor just hangs there without any response. Granted, I have Chrome open with a million tabs and VLC playing in the background, but CPU utilization was 12% and virtual memory was about 75% utilized. What's going on? Do I have to set the CLASSPATH or perform other steps?

    Read the article

  • Optimal Sharing of heavy computation job using Snow and/or multicore

    - by James
    Hi, I have the following problem. First, my environment: I have two 24-CPU servers to work with and one big job (resampling a large dataset) to share between them. I've set up multicore and a (socket) snow cluster on each. As a high-level interface I'm using foreach. What is the optimal sharing of the job? Should I set up a snow cluster using CPUs from both machines and split the job that way (i.e. use doSNOW for the foreach loop)? Or should I use the two servers separately and use multicore on each (i.e. split the job into two chunks, run one on each server, and then stitch the results back together)? Basically, what is an easy way to: 1. keep communication between the servers down (since this is probably the slowest bit), and 2. ensure that the random numbers generated on the servers are not highly correlated?

    Read the article

  • Recreation of DB using "mysql mydb < mydb.sql" is really slow when the table has tens of millions of records

    - by Jian Lin
    It seems that when a MySQL database has a table with tens of millions of records, mysqldump some_db > some_db.sql produces one big INSERT INTO statement when backing up the database (is it a single INSERT statement that handles all the records?). So when reconstructing the DB using mysql some_db < some_db.sql, the CPU is hardly busy (about 1.8% usage by the mysql process... I don't see a mysqld either?) and the hard disk doesn't seem to be too busy either... Last time, the whole restore process took 5 hours. Is there a way to make it faster? For example, when doing mysqldump, can it break the INSERT statement into shorter ones, so that mysql doesn't have to parse such enormous lines when restoring the DB?

    Read the article

  • jQuery.keypad Performance Issues

    - by John Duff
    I am working on a kiosk touch-screen application using the jQuery.keypad plugin, and I'm noticing some major performance issues. If you click a number of buttons in rapid succession, the CPU gets pegged, the button clicks don't keep up with the clicking, and some button presses even get lost. On my dev machine this isn't as noticeable, but on the kiosk itself, with 1 GB of RAM, it's painful. The demo keypad at http://keith-wood.name/keypad.html#inline, specifically the one with multiple targets (which is my case), has the exact same issues. Does anyone have any suggestions on how we might improve this? The kiosk runs Firefox only, so something specific to that would work. I'm using v1.2.1 of jquery.keypad and just upgraded to v1.4.2 of jQuery.

    Read the article

  • PagedDataSource does not support serialization - how can I enforce this?

    - by Darkyo
    It sounds like I want to override a law of physics, but it is at least the most reasonable solution for my ASP.NET project in terms of CPU, disk and RAM. I have a PagedDataSource and a custom data reader that supports paginated data. The truth is that my data live in a ViewState variable, because they are re-used in an UpdatePanel. When I try to use them in my PagedDataSource, ASP.NET 3.5 kills me with: 'System.Web.UI.WebControls.PagedDataSource' in Assembly 'System.Web, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' is not marked as serializable. Cool exception... So I'd rather not offend Newton, because I know he'll always win, but I would need some help working around this PagedDataSource law that seems so unbelievable, unless someone has an explanation.

    Read the article

  • How to force programs out of swap file when a resources-intensive batch finishes?

    - by sharptooth
    We use employees' desktops for CPU-intensive simulation during the night. The desktops run Windows, usually Windows XP. Employees don't log off; they just lock the desktops, switch off their monitors and go. Every employee has a configuration file he can edit to specify when he is most likely out of the office. When that time comes, a background program grabs data for simulation from the server, spawns worker processes, watches them, collects the results and sends them to the server. When the time specified by the employee elapses, the simulation stops so that normal desktop usage is not interfered with. The problem is that the simulation consumes a lot of memory, so while the worker processes run they force other programs into the swap file. When the employee arrives, all the programs he left open are sluggish until he brings them up one by one so that they are swapped back in. Is there a way the program can force other programs out of the swap file when it stops the simulation, so that they run smoothly again?
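
    As far as I know, Windows can't be told to page another process back into RAM from the outside; the practical move is for the batch program to stop competing for memory the instant it finishes, by trimming its own working set. A minimal Win32 sketch:

        #include <windows.h>

        // Call when the nightly simulation stops: (SIZE_T)-1 for both limits
        // asks Windows to remove as many pages as possible from this process's
        // working set, freeing physical memory for the user's applications.
        void releaseSimulationMemory(void)
        {
            SetProcessWorkingSetSize(GetCurrentProcess(), (SIZE_T)-1, (SIZE_T)-1);
        }

    The employee's programs still fault their pages back in on first use, but at least they no longer fight the simulator for the physical memory needed to do it.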

    Read the article

  • Time with and without OpenMP

    - by was
    I have a question. I tried to improve a well-known algorithm in C, the Fox algorithm for matrix multiplication (the version without OpenMP: http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program used only MPI, and I inserted OpenMP into the matrix multiplication routine in order to improve the computation time. (This program runs on a cluster whose machines have 2 cores, so I created 2 threads.) The problem is that there is no difference in time with and without OpenMP; I observed that with OpenMP the time is sometimes equal to or even greater than without it. I tried multiplying two 600x600 matrices.

        void Local_matrix_multiply(
                LOCAL_MATRIX_T* local_A /* in  */,
                LOCAL_MATRIX_T* local_B /* in  */,
                LOCAL_MATRIX_T* local_C /* out */) {
            int i, j, k;
            chunk = CHUNKSIZE; /* 100 */
        #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) \
                private(i, j, k, tid) num_threads(2)
            {
                /*
                tid = omp_get_thread_num();
                if (tid == 0) {
                    nthreads = omp_get_num_threads();
                    printf("Matrix multiplication starts with %d threads\n", nthreads);
                }
                printf("Thread %d uses the matrix:\n", tid);
                */
        #pragma omp for schedule(static, chunk)
                for (i = 0; i < Order(local_A); i++)
                    for (j = 0; j < Order(local_A); j++)
                        for (k = 0; k < Order(local_B); k++)
                            Entry(local_C,i,j) = Entry(local_C,i,j)
                                               + Entry(local_A,i,k) * Entry(local_B,k,j);
            } /* end omp parallel */
        } /* Local_matrix_multiply */

    Read the article

  • For single-producer, single-consumer should I use a BlockingCollection or a ConcurrentQueue?

    - by Jonathan Allen
    For single-producer, single-consumer, should I use a BlockingCollection or a ConcurrentQueue? Concerns:
    * My goal is to pull up to 100 items at a time and send them as a batch to the next step.
    * If I use a ConcurrentQueue, I have to manually make it sleep when there is no work to be done; otherwise I waste CPU cycles on spinning.
    * If I use a BlockingCollection and I only have 99 work items, it could block indefinitely until the 100th item arrives.
    http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx
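
    The 99-item worry is usually answered with a timeout: take whatever is available, but never wait past a deadline for a full batch (in .NET, BlockingCollection.TryTake with a timeout plays this role). The pattern itself is language-agnostic; a C++ sketch of the shape:

        #include <chrono>
        #include <condition_variable>
        #include <deque>
        #include <mutex>
        #include <vector>

        // Single-producer/single-consumer batching queue: the consumer takes up
        // to max_batch items, but flushes whatever it has after max_wait, so a
        // partial batch is never stuck waiting for the item that completes it.
        template <typename T>
        class BatchingQueue {
            std::mutex m;
            std::condition_variable cv;
            std::deque<T> q;
        public:
            void push(T item) {
                { std::lock_guard<std::mutex> lk(m); q.push_back(std::move(item)); }
                cv.notify_one();
            }
            std::vector<T> take_batch(size_t max_batch,
                                      std::chrono::milliseconds max_wait) {
                std::unique_lock<std::mutex> lk(m);
                // Sleep (no spinning) until at least one item arrives or we time out.
                cv.wait_for(lk, max_wait, [&] { return !q.empty(); });
                std::vector<T> batch;
                while (!q.empty() && batch.size() < max_batch) {
                    batch.push_back(std::move(q.front()));
                    q.pop_front();
                }
                return batch;  // may be empty if nothing arrived within max_wait
            }
        };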

    Read the article

  • Timeout with GAE Java

    - by user242153
    Hi, I am having some issues with an app I have deployed on GAE. Specifically, I am intermittently running into the DeadlineExceededException, where the server does not respond within the required 30 seconds. What is odd is that the code is not overly complex; it should run in milliseconds. My guess is that the delay is in dealing with the persistence manager and accessing the datastore. Two questions: 1) What is the best way to track where all of the CPU time on the server is being used up? Log files do not seem helpful, and to make things more complicated, the code runs very fast when I run it locally. 2) Any tips / best practices for dealing with the 30-second exception? What are the biggest drivers of this? Datastore? HTTP requests / responses? Thanks

    Read the article

  • Any ideas for developing a RISC-processor-friendly string allocator?

    - by Richard Fabian
    I'm working on some tools to enable high-throughput data-oriented development, and one thing I don't have an immediate answer for is how to allocate strings quickly. On RISC processors you have the additional implementation problem that the CPU doesn't like branching, which is what I'm trying to minimise or avoid. Also, cache coherence is important on most CPUs, so that has to influence the design too. So, how would you go about reducing the overhead of a generic string allocator? Sometimes it's easier to solve a more explicit problem, so: any ideas for string sizes of 5-30?
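
    One common shape for this, sketched under the question's own assumption of 5-30 byte strings: fixed 32-byte slots carved from large blocks, so the hot path is a freelist pop or pointer bump, with a single predictable branch and cache-adjacent allocations. An illustration of the technique, not a tuned allocator:

        #include <cstddef>
        #include <cstring>
        #include <new>

        // Size-class string pool: every string gets a fixed 32-byte slot
        // (covers 5-30 bytes plus a NUL), allocated out of 64 KiB blocks.
        // Blocks are intentionally never returned; a real allocator would
        // track and release them.
        class StringPool {
            static const size_t kSlot  = 32;
            static const size_t kBlock = 64 * 1024;
            char*  cursor    = nullptr;   // bump pointer in the current block
            char*  block_end = nullptr;
            void** free_list = nullptr;   // recycled slots

        public:
            char* alloc(const char* s, size_t len) {   // len <= 30 assumed
                void* slot;
                if (free_list) {                        // reuse a freed slot
                    slot = free_list;
                    free_list = static_cast<void**>(*free_list);
                } else {
                    if (cursor == block_end) {          // rare path: new block
                        cursor = static_cast<char*>(::operator new(kBlock));
                        block_end = cursor + kBlock;
                    }
                    slot = cursor;
                    cursor += kSlot;
                }
                char* out = static_cast<char*>(slot);
                std::memcpy(out, s, len);
                out[len] = '\0';
                return out;
            }
            void free(char* p) {                        // push slot onto freelist
                void** slot = reinterpret_cast<void**>(p);
                *slot = free_list;
                free_list = slot;
            }
        };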

    Read the article

  • MySQL Locking Up

    - by Ian
    I've got an InnoDB table that gets a lot of reads and almost no writes (roughly 1 write for every 400,000 reads). I'm running into a pretty big problem, though, when I do an INSERT into the table: MySQL completely locks up. It uses 100% CPU, and every single other table (even in other databases) has its status set to "Locked" until the INSERT is done. This is a big problem because MySQL stays locked up for up to 4 minutes. I'm using version 5.1.47 (RPM from mysql.com). Any ideas?

    Read the article

  • HTML Audio performance

    - by user1888309
    I'm working on an HTML drum machine and I've met some performance issues: the rhythm starts to break when the BPM is higher than 110, but I'm expecting it to work at BPM over 180. I guess it could be related to the format or codec of the audio files, but it may also be that my code is not very optimised (as I can see from JS CPU profiling, it isn't). So I'm hoping you guys can give me a code review or some optimisation hints. That said, all the similar projects I've found on the internet didn't work well either, so maybe it's just a restriction of the Audio API. By the way, it's very raw, and sound works only in Chrome under Mac OS, so any advice on audio encoding for the web would also be great.
    Project on GitHub Pages
    Screenshot of a groove that breaks
    UPDATE: I've found that I was encoding the audio files incorrectly; after fixing that, the rhythm stopped breaking, and it also started working in Mozilla. But there are still issues on Windows.

    Read the article

  • Considering getting into reverse engineering/disassembly

    - by Zombies
    Assuming a decent understanding of assembly on common CPU architectures (e.g. x86), how can one explore a potential path (career, fun and profit, etc.) into the field of reverse engineering? There are so few educational guides out there that it is difficult to understand what uses it has today (e.g. is searching for buffer-overflow exploits still common, or do stack-monitoring programs make this obsolete?). I am not looking for a step-by-step program, just some relevant information, such as tips on how to efficiently find a specific area of a program, basic things in the trade, as well as what it is currently used for today. So, to recap: what current uses does reverse engineering have today? And how can one find some basic information on how to learn the trade? (Again, it doesn't have to be step-by-step; anything that could throw a clue would be helpful.)

    Read the article

  • C#: Efficiently search a large string for occurrences of other strings

    - by Jon
    Hi, I'm using C# to continuously search for multiple string "keywords" within large strings (>= 4 KB). This code loops constantly, and sleeps aren't cutting CPU usage down enough while maintaining a reasonable speed. The bottleneck is the keyword-matching method. I've found a few possibilities, and all of them give similar efficiency: 1) http://tomasp.net/articles/ahocorasick.aspx - I do not have enough keywords for this to be the most efficient algorithm. 2) Regex, using an instance-level, compiled regex - provides more functionality than I require, and not quite enough efficiency. 3) String.IndexOf - I would need a "smart" version of this for it to provide enough efficiency; looping through each keyword and calling IndexOf doesn't cut it. Does anyone know of any algorithms or methods I can use to attain my goal?
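
    One "smart IndexOf" that often beats per-keyword scans when the keyword count is modest: index the keywords by first byte, so each text position only tests the keywords that could possibly start there. A sketch of the idea in C++ (it ports directly to C#):

        #include <array>
        #include <string>
        #include <vector>

        // NOTE: stores pointers into the caller's vector; the keyword list
        // must outlive the index.
        struct KeywordIndex {
            std::array<std::vector<const std::string*>, 256> by_first;

            explicit KeywordIndex(const std::vector<std::string>& keywords) {
                for (const auto& k : keywords)
                    if (!k.empty())
                        by_first[(unsigned char)k[0]].push_back(&k);
            }

            // Position of the first keyword occurrence, or std::string::npos.
            size_t find_any(const std::string& text) const {
                for (size_t i = 0; i < text.size(); ++i)
                    for (const std::string* k : by_first[(unsigned char)text[i]])
                        if (text.compare(i, k->size(), *k) == 0)
                            return i;  // only keywords starting with text[i] are tested
                return std::string::npos;
            }
        };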

    Read the article

  • Untrusted GPGPU code (OpenCL etc) - is it safe? What risks?

    - by Grzegorz Wierzowiecki
    There are many approaches when it comes to running untrusted code on a typical CPU: sandboxes, fake roots, virtualization... What about untrusted code for GPGPU (OpenCL, CUDA, or already-compiled code)? Assuming that the memory on the graphics card is cleared before running such third-party untrusted code, are there any security risks? What kinds of risks? Any way to prevent them? (Possible sandboxing on the GPGPU, or some other technique?) P.S. I am more interested in GPU binary-code-level security than in high-level GPGPU programming-language security (but those solutions are welcome as well). What I mean is that references to GPU opcodes (a.k.a. machine code) are welcome.

    Read the article

  • unroll nested for loops in C++

    - by Hristo
    How would I unroll the following nested loops?

        for (k = begin; k != end; ++k) {
            for (j = 0; j < Emax; ++j) {
                for (i = 0; i < N; ++i) {
                    if (j >= E[i]) continue;
                    array[k] += foo(i, tr[k][i], ex[j][i]);
                }
            }
        }

    I tried the following, but my output isn't the same, and it should be:

        for (k = begin; k != end; ++k) {
            for (j = 0; j < Emax; ++j) {
                for (i = 0; i + 4 < N; i += 4) {
                    if (j >= E[i]) continue;
                    array[k] += foo(i,   tr[k][i],   ex[j][i]);
                    array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
                    array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
                    array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
                }
                if (i < N) {
                    for (; i < N; ++i) {
                        if (j >= E[i]) continue;
                        array[k] += foo(i, tr[k][i], ex[j][i]);
                    }
                }
            }
        }

    I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this finishes running, another function prints out what is in array[], and right now, with my unrolling, the output isn't identical. Any help is appreciated. Thanks, Hristo
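
    A hedged guess at the mismatch: the unrolled body hoists the j >= E[i] guard, so E is only tested for the first index of each group of four before i+1..i+3 are accumulated unconditionally, and the i + 4 < N condition also pushes one full block into the cleanup loop. A sketch that keeps the per-index guard:

        for (k = begin; k != end; ++k) {
            for (j = 0; j < Emax; ++j) {
                for (i = 0; i + 4 <= N; i += 4) {
                    // guard each index separately, exactly as the rolled loop does
                    if (j < E[i])   array[k] += foo(i,   tr[k][i],   ex[j][i]);
                    if (j < E[i+1]) array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
                    if (j < E[i+2]) array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
                    if (j < E[i+3]) array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
                }
                for (; i < N; ++i)          // remainder when N % 4 != 0
                    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
            }
        }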

    Read the article

  • Uploadify and Image Compression

    - by Ilya Biryukov
    Hi, I am using Uploadify on one of my client's web sites to allow them to upload a large number of pictures at once to their photo gallery, and I am seeing issues lately: they tend to upload large photographs (3 MB and above). I am wondering, is it possible to compress them (reduce their size) on the client side instead of doing it on the server (just like Facebook does)? I know I could easily do it on the server, but I am working on another project right now where I am expecting a large flow of photo uploads, and it would require a significant amount of CPU time to process them all. So I thought I'd ask about client-side processing. Thanks.

    Read the article

  • Does the number of busy worker threads in the CLR ThreadPool affect performance of I/O threads?

    - by andrej351
    We have a Windows service that hosts a number of WCF services and, in an unrelated part of the app, makes extensive use of the TPL Task class to asynchronously do relatively short bits of work. It is my understanding that WCF uses managed I/O threads from the ThreadPool to execute requests. I noticed that after deploying a feature that significantly raised the application's use of Tasks, and thus its use of ThreadPool worker threads, the performance of a couple of web services became very slow: we're talking minutes instead of less than a second. The number of Tasks actually trying to run at any one time can range between 20 and 1000, which makes me think that any new (last-in) work needing some CPU time could be forced to wait quite a while. Does the (in my case extremely large) number of busy ThreadPool worker threads affect the ThreadPool's managed I/O threads? Or could these two be connected in any way? Thanks!

    Read the article

  • How to profile a Silverlight application?

    - by rudigrobler
    Are there any profilers that support Silverlight? I have tried ANTS (version 3.1) without any success. Does version 4 support it? Any other products I can try? Update: since the release of Silverlight 4, it is now possible to do full profiling of SL applications; check out this article on the topic: "At PDC, I announced that Silverlight 4 came with the new CoreCLR capability of being profile-able by the VS2010 profilers: this means that for the first time, we give you the power to profile the managed and native code (user or platform) used by a Silverlight application. Woohoo. Kudos to the CLR team." Side note: from Silverlight 1-3, one could only use things like xperf (see "XPerf: A CPU Sampler for Silverlight"), which is very powerful for seeing the layout/text/media/gfx/etc. pipelines, but only gives the native call stack. From SilverLite (PDC video, TechEd Iceland, VS2010, profiling, Silverlight 4)

    Read the article

  • Trying to right-click on code in VS2008 causes a lockup

    - by Adam Haile
    I'm working on a Win32 DLL using Visual Studio 2008 SP1, and since yesterday, whenever I try to right-click on code (to go to a variable definition, for example), VS completely locks up and I have to kill the process manually. To make it even weirder, whenever this happens the devenv.exe process uses exactly 25% of the CPU. And I mean exactly: never 24%, never 26%, always 25%. Also, I've run ProcMon to see if devenv is actually doing something, but it's doing absolutely nothing external to the process: no disk, network or registry access. Nothing. This is getting really aggravating, because I have a large code base to deal with, and the only other way of jumping to a definition is to search for it first. Has anyone run into a similar issue? And, better yet, does anyone know a fix?

    Read the article

  • (Newbie) Amazon Web Services Apache Server

    - by Samnsparky
    Hello! I am trying to get a feel for the cost of running Apache on AWS continuously. Assuming the service is scarcely used, does anyone know how many CPU hours it would eat up in a month just by sitting there and running? I understand this is slightly impractical, but I am trying to figure out the cost of entry for deploying an application on this platform (as compared to GAE). I suspect it is small, but I would like to know. Thank you for your help, Sam

    Read the article

  • What limits scaling in this simple OpenMP program?

    - by Douglas B. Staple
    I'm trying to understand the limits to parallelization on a 48-core system (4x AMD Opteron 6348, 2.8 GHz, 12 cores per CPU). I wrote this tiny OpenMP code to test the speedup in what I thought would be the best possible situation (the task is embarrassingly parallel):

        // Compile with: gcc scaling.c -std=c99 -fopenmp -O3
        #include <stdio.h>
        #include <stdint.h>

        int main() {
            const uint64_t umin = 1;
            const uint64_t umax = 10000000000LL;
            double sum = 0.;
            #pragma omp parallel for reduction(+:sum)
            for (uint64_t u = umin; u < umax; u++)
                sum += 1./u/u;
            printf("%e\n", sum);
        }

    I was surprised to find that the scaling is highly nonlinear: the code takes about 2.9 s to run with 48 threads, 3.1 s with 36 threads, 3.7 s with 24 threads, 4.9 s with 12 threads, and 57 s with 1 thread. Unfortunately I have to add that there is one process on the computer using 100% of one core, which might be affecting it. It's not my process, so I can't end it to test the difference, but somehow I doubt that's making the difference between a 19-20x speedup and the ideal 48x. To make sure it wasn't an OpenMP issue, I ran two copies of the program at the same time with 24 threads each (one with umin=1, umax=5000000000, and the other with umin=5000000000, umax=10000000000). In that case both copies finish after 2.9 s, so it's exactly the same as running 48 threads with a single instance of the program. What's preventing linear scaling in this simple program?

    Read the article
