Search Results

Search found 5954 results on 239 pages for 'cpu cores'.

Page 181/239 | < Previous Page | 177 178 179 180 181 182 183 184 185 186 187 188  | Next Page >

  • Fastest inline-assembly spinlock

    - by sigvardsen
    I'm writing a multithreaded application in c++, where performance is critical. I need to use a lot of locking while copying small structures between threads, for this I have chosen to use spinlocks. I have done some research and speed testing on this and I found that most implementations are roughly equally fast: Microsofts CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units Implementing this algorithm with Microsofts InterlockedCompareExchange scores about 95 time units Ive also tried to use some inline assembly with __asm {} using something like this code and it scores about 70 time units, but I am not sure that a proper memory barrier has been created. Edit: The times given here are the time it takes for 2 threads to lock and unlock the spinlock 1,000,000 times. I know this isn't a lot of difference but as a spinlock is a heavily used object, one would think that programmers would have agreed on the fastest possible way to make a spinlock. Googling it leads to many different approaches however. I would think this aforementioned method would be the fastest if implemented using inline assembly and using the instruction CMPXCHG8B instead of comparing 32bit registers. Furthermore memory barriers must be taken into account, this could be done by LOCK CMPXHG8B (I think?), which guarantees "exclusive rights" to the shared memory between cores. At last [some suggests] that for busy waits should be accompanied by NOP:REP that would enable Hyper-threading processors to switch to another thread, but I am not sure whether this is true or not? From my performance-test of different spinlocks, it is seen that there is not much difference, but for purely academic purpose I would like to know which one is fastest. However as I have extremely limited experience in the assembly-language and with memory barriers, I would be happy if someone could write the assembly code for the last example I provided with LOCK CMPXCHG8B and proper memory barriers in the following template: __asm { spin_lock: ;locking code. spin_unlock: ;unlocking code. }

    Read the article

  • Can 3D OpenGL game written in Python look good and run fast?

    - by praavDa
    I am planning to write an simple 3d(isometric view) game in Java using jMonkeyEngine - nothing to fancy, I just want to learn something about OpenGL and writing efficient algorithms (random map generating ones). When I was planning what to do, I started wondering about switching to Python. I know that Python didn't come into existence to be a tool to write 3d games, but is it possible to write good looking games with this language? I have in mind 3d graphics, nice effects and free CPU time to power to rest of game engine? I had seen good looking java games - and too be honest, I was rather shocked when I saw level of detail achieved in Runescape HD. On the other hand, pygame.org has only 2d games, with some starting 3d projects. Are there any efficient 3d game engines for python? Is pyopengl the only alternative? Good looking games in python aren't popular or possible to achieve? I would be grateful for any information / feedback.

    Read the article

  • scala REPL is slow on vista

    - by Jacques René Mesrine
    I installed scala-2.8.0.RC3 by extracting the tgz file into my cygwin (vista) home directory. I made sure to set $PATH to scala-2.8.0.RC3/bin. I start the REPL by typing: $ scala Welcome to Scala version 2.8.0.RC3 (Java HotSpot(TM) Client VM, Java 1.6.0_20). Type in expressions to have them evaluated. Type :help for more information. scala> Now when I tried to enter an expression scala> 1 + 'a' the cursor hangs there without any response. Granted that I have chrome open with a million tabs and VLC playing in the background, but CPU utilization was 12% and virtual memory was about 75% utilized. What's going on ? Do I have to set the CLASSPATH or perform other steps.

    Read the article

  • Optimal Sharing of heavy computation job using Snow and/or multicore

    - by James
    Hi, I have the following problem. First my environment, I have two 24-CPU servers to work with and one big job (resampling a large dataset) to share among them. I've setup multicore and (a socket) Snow cluster on each. As a high-level interface I'm using foreach. What is the optimal sharing of the job? Should I setup a Snow cluster using CPUs from both machines and split the job that way (i.e. use doSNOW for the foreach loop). Or should I use the two servers separately and use multicore on each server (i.e. split the job in two chunks, run them on each server and then stich it back together). Basically what is an easy way to: 1. Keep communication between servers down (since this is probably the slowest bit). 2. Ensure that the random numbers generated in the servers are not highly correlated.

    Read the article

  • How to improve performance

    - by Ram
    Hi, In one of mine applications I am dealing with graphics objects. I am using open source GPC library to clip/merge two shapes. To improve accuracy I am sampling (adding multiple points between two edges) existing shapes. But before displaying back the merged shape I need to remove all the points between two edges. But I am not able to find an efficient algorithm that will remove all points between two edges which has same slope with minimum CPU utilization. Currently all points are of type PointF Any pointer on this will be a great help.

    Read the article

  • 2008 Datacenter Word Automation issue

    - by Brad
    We have an application that uses word automation. It works fine under Windows XP, but does not work on our Windows Server 2008 64-bit virtual machine running on VMware ESX unless it is running as the domain administrator. Under any other account (including a local admin), Word starts, uses a lot of CPU for 40 seconds when opening a document, and then just hangs. Our application does not access anything not on the local machine, and this machine is not being used for anything else (not a domain controller, etc). I know others have posted similar issues, with the solution of creating a Desktop folder somewhere under the windows directory. We did this, and it did not solve the problem (Word did not get as far as it did before we did this though). Please don't turn this into a thread about why I am trying to do this, whether I should do this, or whether I need to. For argument sake, I don't need to do this, but understanding what privilege a local admin does not have that is needed to do this is a legitimate concern.

    Read the article

  • Timeout with GAE Java

    - by user242153
    Hi, I am having some issues with an app I have deployed on GAE. Specifically, I am intermittently running into the DeadlineExceededException where the server is not responding within the 30 seconds required. What is odd is that the code is not overly complex, it should run in milliseconds. My guess is that the delay is in dealing with the persistence manager and accessing the datastore. 2 questions: 1) What is the best way to track where all of the CPU time on the server is being used up? Log files do not seem helpful and to make things more complicated the code runs very fast when I am running it locally 2) Any tips / best practices in dealing with the 30 second exception? What are the biggest drivers of this? Datastore? HTTP requests / responses? Thanks

    Read the article

  • For single-producer, single-consumer should I use a BlockingCollection or a ConcurrentQueue?

    - by Jonathan Allen
    For single-producer, single-consumer should I use a BlockingCollection or a ConcurrentQueue? Concerns: * My goal is to pull up to 100 items at a time and send them as a batch to the next step. * If I use a ConcurrentQueue, I have to manually cause it to go asleep when there is no work to be done. Otherwise I waste CPU cycles on spinning. * If I use a BlockingQueue and I only have 99 work items, it could indefinitely block until there the 100th item arrives. http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx

    Read the article

  • How to force programs out of swap file when a resources-intensive batch finishes?

    - by sharptooth
    We use employees' desktops for CPU-intensive simulation during the night. Desktops run Windows - usually Windows XP. Employees don't log off, they just lock the desktops, switch off their monitors and go. Every employee has a configuration file which he can edit to specify when he is most likely out of office. When that time comes a background program grabs data for simulation from the server, spawns worker processes, watches them, gets results and sends them to the server. When the time specified by the employee elapses simulation stops so that normal desktop usage is not interfered. The problem is that simuation consumes a lot of memory, so when the worker processes run they force other programs into the swap file. So when the employee comes all the programs he left are luggish and slow until he opens them one by one so that they are unswapped. Is there a way the program can force other programs out of swap file when it stops simulation so that they again run smoothly?

    Read the article

  • HTML Audio performance

    - by user1888309
    I'm working on HTML drum machine, and I`ve met some performance issues, rhythm start to break if BPM is higher than 110 but I'm expecting to make it work on BPM over 180. I guess that it can be related with format or codec of audio files, however it also maybe that my code is not very optimised (as I can see from JS CPU profiling it's not). So I'm expecting you guys give me some code review or some hints on optimisation. Although all similar projects I've found on internet didn't work good and maybe it's just restrictions of Audio API. By the way, it's very raw and sounds works only on Chrome under Mac OS, so any advise on audio encoding for web also would be great Project on Github pages Screenshot of Groove which breaks UPDATE Ok, I've found that I was encoding audio files incorrectly, after fixing that rhythm stopped breaking, and also it started working in Mozilla. But still there are issues on windows OS.

    Read the article

  • MySQL Locking Up

    - by Ian
    I've got a innodb table that gets a lot of reads and almost no writes (like, 1 write for every 400,000 reads approx). I'm running into a pretty big problem though when I do INSERT into the table. MySQL completely locks up. It uses 100% cpu, and every single other table (in other databases even) have their statuses set to "Locked" until the INSERT is done. This is a big problem because MySQL stays locked up for up to 4 minutes. I'm using version 5.1.47 (rpm from mysql.com). Any ideas?

    Read the article

  • Recreation of DB using "mysql mydb < mydb.sql" is really slow when the table has tens of millions of

    - by Jian Lin
    It seems that a MySQL database that has a table with tens of millions of records will get a big INSERT INTO statement when the following mysqldump some_db > some_db.sql is done to back up the database. (is it 1 insert statement that handles all the records?) So when reconstructing the DB using mysql some_db < some_db.sql then the CPU is hardly busy (about 1.8% usage by the mysql process... I don't see a mysqld either?) and also the hard disk doesn't seem to be too busy... Last time, the whole restore process took 5 hours. Is there a way to make it faster? Such as, when doing mysqldump, can it break the INSERT statement into shorter ones, so that the mysql doesn't need to parse the line so hard when restoring the DB?

    Read the article

  • PagedDataSource does not support serialization - how can I enforce this ?

    - by Darkyo
    Sounds like I want to override a physics law, but at least it is the most reasonnable solution, cpu / HDD and Ram effective for my asp.net project. In fact, I got a pageddataSource and a customDataReader that supports paginated data. The truth is my data are in a viewstate variable, because it is re-used in an update panel. When I intend to use it into my pageddatasource, asp.net 3.5 kills me with a System.Web.UI.WebControls.PagedDataSource' in Assembly 'System.Web, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' is not marked as serializable. cool exception... So I'd rather not offend newton because I know he'll always win, but I would need some help to enforce this pagedDataSource law, that seems so unbelievable, except if someone has an explanation.

    Read the article

  • How do i write task? (parallel code)

    - by acidzombie24
    I am impressed with intel thread building blocks. I like how i should write task and not thread code and i like how it works under the hood with my limited understanding (task are in a pool, there wont be 100 threads on 4cores, a task is not guaranteed to run because it isnt on its own thread and may be far into the pool. But it may be run with another related task so you cant do bad things like typical thread unsafe code). I wanted to know more about writing task. I like the 'Task-based Multithreading - How to Program for 100 cores' video here http://www.gdcvault.com/sponsor.php?sponsor_id=1 (currently second last link. WARNING it isnt 'great'). My fav part was 'solving the maze is better done in parallel' which is around the 48min mark (you can click the link on the left side). However i like to see more code examples and some API of how to write task. Does anyone have a good resource? I have no idea how a class or pieces of code may look after pushing it onto a pool or how weird code may look when you need to make a copy of everything and how much of everything is pushed onto a pool.

    Read the article

  • What is the optimal number of threads for performing IO operations in java?

    - by marc
    In Goetz's "Java Concurrency in Practice", in a footnote on page 101, he writes "For computational problems like this that do not I/O and access no shared data, Ncpu or Ncpu+1 threads yield optimal throughput; more threads do not help, and may in fact degrade performance..." My question is, when performing I/O operations such as file writing, file reading, file deleting, etc, are there guidelines for the number of threads to use to achieve maximum performance? I understand this will be just a guide number, since disk speeds and a host of other factors play into this. Still, I'm wondering: can 20 threads write 1000 separate files to disk faster than 4 threads can on a 4-cpu machine?

    Read the article

  • How to create real-life robots?

    - by Click Upvote
    Even before I learnt programming I've been fascinated with how robots could work. Now I know how the underlying programming instructions would be written, but what I don't understand is how those intructions are followed by the robot. For example, if I wrote this code: object=Robot.ScanSurroundings(300,400); if (Objects.isEatable(object)) { Robot.moveLeftArm(300,400); Robot.pickObject(object); } How would this program be followed by the CPU in a way that would make the robot do the physical action of looking to the left, moving his arm, and such? Is it done primarily in binary language/ASM? Lastly, where would i go if I wanted to learn how to create a robot?

    Read the article

  • jQuery.keypad Performance Issues

    - by John Duff
    I am working on a Kiosk Touch Screen application and using the JQuery.keypad plugin and noticing some major performance issues. If you click a number of buttons in rapid succession the CPU gets pegged, the button clicks don't keep up with the clicking and some button presses even get lost. On my dev machine this isn't as noticeable, but on the Kiosk itself with 1 gig of ram it's painful. Trying the demo keypad at http://keith-wood.name/keypad.html#inline the one with multiple targets (which is the case with mine) has the exact same issues. Does anyone have any suggestions on how we might be able to improve this? The Kiosk runs in Firefox only so something specific to that would work. I'm using v1.2.1 of jquery.keypad and just upgraded to v1.4.2 of jquery.

    Read the article

  • Why doesn't infinite recursion hit a stack overflow exception in F#?

    - by Amazingant
    I know this is somewhat the reverse of the issue people are having when they ask about a stack overflow issue, but if I create a function and call it as follows, I never receive any errors, and the application simply grinds up a core of my CPU until I force-quit it: let rec recursionTest x = recursionTest x recursionTest 1 Of course I can change this out so it actually does something like this: let rec recursionTest (x: uint64) = recursionTest (x + 1UL) recursionTest 0UL This way I can occasionally put a breakpoint in my code and see the value of x is going up rather quickly, but it still doesn't complain. Does F# not mind infinite recursion?

    Read the article

  • Does the number of busy worker threads in the CLR ThreadPool affect performance of I/O threads?

    - by andrej351
    We have a Windows Service which hosts a number of WCF services and, in an unrelated part of the app, makes extensive use of the TPL Task class to asynchronously do relatively short bits of work. It is my understanding that WCF uses managed I/O threads from the ThreadPool to execute requests. I noticed that after deploying a feature which significantly raised the applications use of Tasks, and as such the use of ThreadPool worker threads as well, performance of a couple of web services has become very slow. We're talking minutes instead of less than a second. The number of Tasks actually trying to run at any one time can range between 20 and 1000, which makes me think that any new (last in) work needing some CPU time could be forced to wait for quite some time. Does the (in my case extremely large) number of busy ThreadPool worker threads affect the ThreadPool's managed I/O threads? Or could these two be connected in any way? Thanks!

    Read the article

  • C#: Efficiently search a large string for occurences of other strings

    - by Jon
    Hi, I'm using C# to continuously search for multiple string "keywords" within large strings, which are = 4kb. This code is constantly looping, and sleeps aren't cutting down CPU usage enough while maintaining a reasonable speed. The bog-down is the keyword matching method. I've found a few possibilities, and all of them give similar efficiency. 1) http://tomasp.net/articles/ahocorasick.aspx -I do not have enough keywords for this to be the most efficient algorithm. 2) Regex. Using an instance level, compiled regex. -Provides more functionality than I require, and not quite enough efficiency. 3) String.IndexOf. -I would need to do a "smart" version of this for it provide enough efficiency. Looping through each keyword and calling IndexOf doesn't cut it. Does anyone know of any algorithms or methods that I can use to attain my goal?

    Read the article

  • Uploadify and Image Compression

    - by Ilya Biryukov
    Hi, I am using Uploadify on one of my client's web sites to allow them to upload a large amount of pictures at once to their photo gallery. I am seeing issues lately. They seem to upload large photographs (3 MB and above). I am wondering, is it possible to compress (reduce their size) on the client side, instead of doing it on the server (just like facebook does it). I know I could easily do it on the server, but I am working on another project right now, where I am expecting a large flow of photo uploads. It would require significant amount of CPU time to process them all. So I thought, I'd ask about the client side processing. Thanks.

    Read the article

  • Considering getting into reverse engineering/disassembly

    - by Zombies
    Assuming a decent understanding of assembly on common CPU architectures (eg: x86), how can one explore a potential path (career, fun and profit, etc) into the field of reverse engineering? There is so little educational guides out there so it is difficult to understand what potential uses this has today (eg: is searching for buffer overflow exploits still common, or do stack monitoring programs make this obselete?). I am not looking for any step by step program, just some relevant information such as tips on how to efficiently find a specific area of a program. Basic things in the trade. As well as what it is currently being used for today. So to recap, what current uses does reverse engineering yield today? And how can one find some basic information on how to learn the trade (again it doesn't have to be step-by-step, just anything which can through a clue would be helpful).

    Read the article

  • Any ideas for developing a Risc Processor friendly string allocator?

    - by Richard Fabian
    I'm working on some tools to enable high throughput data-oriented development, and one thing that I've not got an immediate answer for is how you go about allocating strings quickly. On risc processors you've got another problem of implementation that the CPU doesn't like branching, which is what I'm trying to minimise or avoid. Also, cache coherence is important on most CPUs, so that's gotta be influential in the design too. So, how would you go about reducing the overhead for a generic string allocator? Sometimes it's easier to solve a more explicit problem, so any ideas for string sizes of 5-30?

    Read the article

  • Python: Taking an array and break it into subarrays based on some criteria

    - by randombits
    I have an array of files. I'd like to be able to break that array down into one array with multiple subarrays, each subarray contains files that were created on the same day. So right now if the array contains files from March 1 - March 31, I'd like to have an array with 31 subarrays (assuming there is at least 1 file for each day). In the long run, I'm trying to find the file from each day with the latest creation/modification time. If there is a way to bundle that into the iterations that are required above to save some CPU cycles, that would be even more ideal. Then I'd have one flat array with 31 files, one for each day, for the latest file created on each individual day.

    Read the article

  • unroll nested for loops in C++

    - by Hristo
    How would I unroll the following nested loops? for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } I tried the following, but my output isn't the same, and it should be: for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i+4 < N; i+=4) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]); array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]); array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]); } if (i < N) { for (; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } } I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated. Thanks, Hristo

    Read the article

< Previous Page | 177 178 179 180 181 182 183 184 185 186 187 188  | Next Page >