# Search Results

• ### Optimizing hash lookup & memory performance in Go

##### - by Moishe
As an exercise, I'm implementing HashLife in Go. In brief, HashLife works by memoizing nodes in a quadtree so that once a given node's value in the future has been calculated, it can just be looked up instead of being re-calculated. So eg. if you have a node at the 8x8 level, you remember it by its four children (each at the 2x2 level). So next time you see an 8x8 node, when you calculate the next generation, you first check if you've already seen a node with those same four children. This is extended up through all levels of the quadtree, which gives you some pretty amazing optimizations if eg. you're 10 levels above the leaves. Unsurprisingly, it looks like the perfmance crux of this is the lookup of nodes by child-node values. Currently I have a hashmap of {&upper_left_node,&upper_right_node,&lower_left_node,&lower_right_node} -> node So my lookup function is this: func FindNode(ul, ur, ll, lr *Node) *Node { var node *Node var ok bool nc := NodeChildren{ul, ur, ll, lr} node, ok = NodeMap[nc] if ok { return node } node = &Node{ul, ur, ll, lr, 0, ul.Level + 1, nil} NodeMap[nc] = node return node } What I'm trying to figure out is if the "nc := NodeChildren..." line causes a memory allocation each time the function is called. If it does, can I/should I move the declaration to the global scope and just modify the values each time this function is called? Or is there a more efficient way to do this? Any advice/feedback would be welcome. (even coding style nits; this is literally the first thing I've written in Go so I'd love any feedback)

• ### Ubuntu with KDE and kernel 3.6.2 - performance issues

##### - by Pelda
I have recently swiched to Ubuntu and installed KDe on it. I like software included in Ubuntu but interface from Kubuntu. My problem is, that after installation of kernel 3.6.2 (deafult ubuntu 12.04 is 3.2 I think) - whole KDE interface is laggy and I have to render using Xrended because Opel AL doesnt work. So please tell me - I didint find it anywhere - Does KDE has some problems with new kernel? should i downgrade back to 3.x.x? Thank you for answers and your time. Pelda

• ### AWR Performance Report and Read by Other Session Waits

##### - by user702295
For the questions regarding "read by other session" and its relation to "db file sequential/scattered read", the logic is like this: When a "db file sequential/scattered read" is done, the blocks are either already in the cache or on the disk.  Since any operation on blocks is done in the cache and since and the issue is "read by other session" I will relate to the case the blocks are on the disk. Process A is reading the needed block from the disk to the cache.  During that time, if process B (and C and others) need the same block, it will wait on "read by other session".  A and B can be threads of the same process running in parallel or unrelated processes.  For example two processes doing full table scan on mdp_matrix etc. Solutions for that can be lowering the number of processes competing on the same blocks, increasing PCTFREE.  If it is a full table scan, maybe an index is missing that can result in less blocks being read from the cache and so on.

• ### Intel programming "performance" books? [closed]

##### - by user997112
I vaguely remember seeing that Intel have produced a few good books, especially with regards to low latency programming, but I cannot remember the titles. Could people suggest the titles of Intel books (or ones relating to Intel products)? Examples include books on: -Intel Compiler -Intel Assembler -Any low level programming on Intel assembler -The Intel CPU architecture -Intel threading blocks library

• ### Pain of the Week/Expert's Perspective: Performance Tuning for Backups and Restores

##### - by KKline
First off - the Pain of the Week webcast series has been renamed. It's now known as The Expert's Perspective . Please join us for future webcasts and, if you're interested in speaking, drop me a note to see if we can get you on the roster! The bigger your databases get, the longer backups take. That doesn't really seem like a huge problem — until disaster strikes and you need to restore your databases as fast as possible. Join my buddy Brent Ozar ( blog | twitter ), a Microsoft Certified Master of...(read more)

• ### Displaying performance data per engine subsystem

##### - by liortal
Our game (Android based) traces how long it takes to do the world logic updates, and how long it takes to a render a frame to the device screen. These traces are collected every frame, and displayed at a constant interval (currently every 1 second). I've seen games where on-screen data of various engine subsystems is displayed, with the time they consume (either in text) or as horizontal colored bars. I am wondering how to implement such a feature?

• ### Practical RAID Performance?

##### - by wag2639
I've always thought the following to be a general rule of thumb for RAID: RAID 0: Best performance for READ and WRITE from stripping, greatest risk RAID 1: Redundant, decent for READ (I believe it can read from different parts of a file from different hard drives), not the best for WRITE RAID 0+1 (01): combines redundancy of RAID 1 with performance of RAID 0 RAID 1+0 (10): slightly better version of RAID 0+1 RAID 5: good READ performance, bad WRITE performance, redundant IS THIS ASSUMPTION CORRECT? (and how do they compare to a JBOD setup for R/W IO performance) Are certain practical RAID setups better for different applications: gaming, video editing, database (Acccess or SQL)? I was thinking about hard disk drives but does this apply to solid state drives as well?

• ### OpenVZ vs Xen, how much difference in performance?

##### - by Aleksandr Levchuk
There is a Xen vs. KVM in performance question on ServerFault. What will be the performance difference if the choice is between Xen and OpenVZ? How is it best to measure? Some may say "you're comparing apples and oranges" but I have to choose one of the two and it needs to be wise choice. Performance is most important to us. We may switching to Xen from OpenVZ because Xen is more ubiquitous but only if performance difference is not significant. In January 2011 I'm thinking of doing a head to head performance comparison - here is my project proposal to our Bioinformatics facility director.

• ### How does a website latency simulator work

##### - by nighthawk457
Sites like webpagetest allow users to enter a website url and a test location, to run a speed test on the site from multiple locations using real browsers. Can anyone give me a basic idea of how sites like this work? You also have plugin's like Aptimize latency simulator or charles web debugging proxy app, that simulate the delay while accessing a site from different locations. I am assuming since these are plugin's these function in a different way. How do these plugin's work ?

• ### How to test the render speed of my solution in a web browser?

##### - by Cuartico
Ok, I need to test the speed of my solution in a web browser, but I have some problems, there are 2 versions of the web solution, the original one that is on server A and the "fixed" version that is on server B. I have VS2010 Ultimate, so I can make a web and load test on solution B, but I can't load the A solution on my IDE. I was trying to use fiddle2 and jmeter, but they only gave me the times of the request and response of the browsers with the server, I also want the time it takes to the browser to render the whole page. Maybe I'm misusing some of this tools... I don't know if this could be usefull but: Solution A is on VB 6.0 Solution B is on VB.Net Thanks in advance!

• ### Strange performance issue with Dell R7610 and LSI 2208 RAID controller

##### - by GregC
Connecting controller to any of the three PCIe x16 slots yield choppy read performance around 750 MB/sec Lowly PCIe x4 slot yields steady 1.2 GB/sec read Given same files, same Windows Server 2008 R2 OS, same RAID6 24-disk Seagate ES.2 3TB array on LSI 9286-8e, same Dell R7610 Precision Workstation with A03 BIOS, same W5000 graphics card (no other cards), same settings etc. I see super-low CPU utilization in both cases. SiSoft Sandra reports x8 at 5GT/sec in x16 slot, and x4 at 5GT/sec in x4 slot, as expected. I'd like to be able to rely on the sheer speed of x16 slots. What gives? What can I try? Any ideas? Please assist Cross-posted from http://en.community.dell.com/support-forums/desktop/f/3514/t/19526990.aspx Follow-up information We did some more performance testing with reading from 8 SSDs, connected directly (without an expander chip). This means that both SAS cables were utilized. We saw nearly double performance, but it varied from run to run: {2.0, 1.8, 1.6, and 1.4 GB/sec were observed, then performance jumped back up to 2.0}. The SSD RAID0 tests were conducted in a x16 PCIe slot, all other variables kept the same. It seems to me that we were getting double the performance of HDD-based RAID6 array. Just for reference: maximum possible read burst speed over single channel of SAS 6Gb/sec is 570 MB/sec due to 8b/10b encoding and protocol limitations (SAS cable provides four such channels).

• ### Looking for application performance tracking software

##### - by JavaRocky
I have multiple java-based applications which produce statistics on how long method calls take. Right now the information is being written into a log file and I analyse performance that way. However with multiple apps and more monitoring requirements this is being becoming a bit overwhelming. I am looking for an application which will collect stats and graph them so I can analyse performance and be aware of performance degradation. I have looked at Solarwinds Application Performance Monitoring, however this polls periodically to gather information. My applications are totally event based and we would like to graph and track this accordingly. I almost started hacking together some scripts to produce Google Charts but surely there are applications which do this already. Suggestions?

• ### Hyper-V performance comparisons vs physical client?

##### - by rwmnau
Are there any comparisons between Hyper-V client machines and their physical equivalent? I've looked around and can find 4000 articles about improving Hyper-V performance, but I can't find any that actually do a side-by-side comparison or give benchmarking numbers. Ideally, I'm interested in a comparison of CPU, memory, disk, and graphics performance between something like the following: Some powerful workstation (with plenty of RAM) with Windows 7 installed on it directly Same exact worksation with Hyper-V Server 2008 R2 (the bare Server role) and a full-screen Windows 7 client machine Virtual Server 2005 had performance that didn't compare at all with actual hardware, but with the advances in CPU and hardware-level virtualization, has performance improved significantly? How obvious would it be to a user of the two above scenarios that one of them was virtualized, and does anybody know of actual benchmarking of this type?

• ### Raid-5 Performance per spindle scaling

##### - by Bill N.
So I am stuck in a corner, I have a storage project that is limited to 24 spindles, and requires heavy random Write (the corresponding read side is purely sequential). Needs every bit of space on my Drives, ~13TB total in a n-1 raid-5, and has to go fast, over 2GB/s sort of fast. The obvious answer is to use a Stripe/Concat (Raid-0/1), or better yet a raid-10 in place of the raid-5, but that is disallowed for reasons beyond my control. So I am here asking for help in getting a sub optimal configuration to be as good as it can be. The array built on direct attached SAS-2 10K rpm drives, backed by a ARECA 18xx series controller with 4GB of cache. 64k array stripes and an 4K stripe aligned XFS File system, with 24 Allocation groups (to avoid some of the penalty for being raid 5). The heart of my question is this: In the same setup with 6 spindles/AG's I see a near disk limited performance on the write, ~100MB/s per spindle, at 12 spindles I see that drop to ~80MB/s and at 24 ~60MB/s. I would expect that with a distributed parity and matched AG's, the performance should scale with the # of spindles, or be worse at small spindle counts, but this array is doing the opposite. What am I missing ? Should Raid-5 performance scale with # of spindles ? Many thanks for your answers and any ideas, input, or guidance. --Bill Edit: Improving RAID performance The other relevant thread I was able to find, discusses some of the same issues in the answers, though it still leaves me with out an answer on the performance scaling.

• ### How to limit disk performance?

##### - by DrakeES
I am load-testing a web application and studying the impact of some config tweaks (related to disk i/o) on the overall app performance, i.e. the amount of users that can be handled simultaneously. But the problem is that I hit 100% CPU before I can see any effect of the disk-related config settings. I am therefore wondering if there is a way I could deliberately limit the disk performance so that it becomes the bottleneck and the tweaks I am trying to play with actually start impacting performance. Should I just make the hard disk busy with something else? What would serve the best for this purpose? More details (probably irrelevant, but anyway): PHP/Magento/Apache, studying the impact of apc.stat. Setting it to 0 makes APC not checking PHP scripts for modification which should increase performance where disk is the bottleneck. Using JMeter for benchmarking.

• ### Good C++ books regarding Performance?

##### - by Leon
Besides the books everyone knows about, like Meyer's 3 Effective C++/STL books, are there any other really good C++ books specifically aimed towards performance code? Maybe this is for gaming, telecommunications, finance/high frequency etc? When I say performance I mean things where a normal C++ book wouldnt bother advising because the gain in performance isn't worthwhile for 95% of C++ developers. Maybe suggestions like avoiding virtual pointers, going into great depth about inlining etc? A book going into great depth on C++ memory allocation or multithreading performance would obviously be very useful.

• ### Are there any resources for language independent performance tips?

##### - by Matt Pascoe
On a regular basis I have to work with people using numerous different languages and no matter who I work with or in which language performance is always an issue. Does anyone know of a general performance programming guide that is not language specific?

• ### puzzled with java if else performance

##### - by user1906966
I am doing an investigation on a method's performance and finally identified the overhead was caused by the "else" portion of the if else statement. I have written a small program to illustrate the performance difference even when the else portion of the code never gets executed: public class TestIfPerf { public static void main( String[] args ) { boolean condition = true; long time = 0L; int value = 0; // warm up test for( int count=0; count<10000000; count++ ) { if ( condition ) { value = 1 + 2; } else { value = 1 + 3; } } // benchmark if condition only time = System.nanoTime(); for( int count=0; count<10000000; count++ ) { if ( condition ) { value = 1 + 2; } } time = System.nanoTime() - time; System.out.println( "1) performance " + time ); time = System.nanoTime(); // benchmark if else condition for( int count=0; count<10000000; count++ ) { if ( condition ) { value = 1 + 2; } else { value = 1 + 3; } } time = System.nanoTime() - time; System.out.println( "2) performance " + time ); } } and run the test program with java -classpath . -Dmx=800m -Dms=800m TestIfPerf. I performed this on both Mac and Linux Java with 1.6 latest build. Consistently the first benchmark, without the else is much faster than the second benchmark with the else section even though the code is structured such that the else portion is never executed because of the condition. I understand that to some, the difference might not be significant but the relative performance difference is large. I wonder if anyone has any insight to this (or maybe there is something I did incorrectly). Linux benchmark (in nano) performance 1215488 performance 2629531 Mac benchmark (in nano) performance 1667000 performance 4208000

• ### Performance Testing Versus Unit Testing

##### - by Mystagogue
I'm reading Osherove's "The Art of Unit Testing," and though I've not yet seen him say anything about performance testing, two thoughts still cross my mind: Performance tests generally can't be unit tests, because performance tests generally need to run for long periods of time. Performance tests generally can't be unit tests, because performance issues too often manifest at an integration or system level (or at least the logic of a single unit test needed to re-create the performance of the integration environment would be too involved to be a unit test). Particularly for the first reason stated above, I doubt it makes sense for performance tests to be handled by a unit testing framework (such as NUnit). My question is: do my findings / leanings correspond with the thoughts of the community?

• ### What do you think of a performance engineer should have?

##### - by Vance
I believe performance tuning (or even testing) is one the most challenging for an engineer. Well, in lots of company, this is the lowest priority than others "important" thing. My purpose of opening this post is to know what do you think*good* performance engineer should have. I can list some things like: Solid database,programming knowledge. Do single thread performance testing. Good knowledge of using the load generator tools to simulate the concurrent loads. Use different tools to monitor/measure the app/db server performance status Understand and can debug the codes. Even tune the codes. Any more ideas are always appreciated!

• ### In MATLAB, how can 'preallocating' cell arrays improve performance?

##### - by Alex McMurray
I was reading this article on MathWorks about improving MATLAB performance and you will notice that one of the first suggestions is to preallocate arrays, which makes sense. But it also says that preallocating Cell arrays (that is arrays which may contain different, unknown datatypes) will improve performance. But how will doing so improve performance because the datatypes are unknown so it doesn't know how much contiguous memory it will require even if it knows the shape of the cell array, and therefore it can't preallocate the memory surely? So how does this result in any improvement in performance? I apologise if this question is better suited for StackOverflow than Programmers but it isn't asking about a specific problem so I thought it fit better here, please let me know if I am mistaken though. Any explanation would be greatly appreciated :)