Search Results

Search found 415 results on 17 pages for 'bottleneck'.

Page 6/17 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

Example of moving from MySQL to NoSQL?

- by OverTheRainbow

Hello, For a Facebook-like site, ie. which is write-intensive and delivers user-customized pages, I'd like to build a prototype to investigate whether the document-centric NoSQL architecture would be a good alternative to sharding and reduce the load on the single master (+ multiple slaves) that we currently use and is the bottleneck. Does someone know of a good article that would give actual, simple examples of going from a relational layout in MySQL to a NoSQL layout? Thank you.

Read the article
What's the best Free C++ Profiler for windows (if there are)

- by ugasoft

Hi, I'm looking for a profiler in order to find the bottleneck of my c++ code. I'd like to find a free, non intrusive, good profiling tool. I'm a game developer and I use PIX for Xbox360, I found it very good (but not free) I know the Intel v-Tune, but it's not free. There exists any free profiling tool?

Read the article
Why is there a large gap between Begin PreRenderComplete and End PreRenderComplete on my page?

- by Middletone

I'd like to know what can cause this kind of disparity between the begin and end PreRendercomplete events or how I migh go about locating the bottleneck. aspx.page End PreRender 0.193179639923915 0.001543 aspx.page Begin PreRenderComplete 0.193206263076064 0.000027 aspx.page End PreRenderComplete 1.96926008935549 1.776054 aspx.page Begin SaveState 2.13108461902679 0.161825

Read the article
C# Performance Pitfall – Interop Scenarios Change the Rules

- by Reed

C# and .NET, overall, really do have fantastic performance in my opinion. That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar. However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++. Often, they’re exactly backwards. For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages. If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler. Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph. For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data. As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack. However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem. In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating. At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit. In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes. This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code. For good or bad, this is typically not a good idea. The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea. In this scenario, for example, I may have been much better off leaving the original code alone. The reason for this is the garbage collector. The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages. First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code. This is something that should be avoided unless there is a significant reason. Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast. Finally, and most critically, there is a large advantage to having short lived objects. If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2). This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops. This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1. It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean. It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true. That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary. In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects. In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph. Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply. After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration. This required going back to a “native programming” mindset for optimization. Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

Read the article
Premature-Optimization and Performance Anxiety

- by James Michael Hare

While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about. After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations. Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks. In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process. At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above. I'm not talking about valid performance decisions. There are decisions one should make when designing and writing an application that are valid performance decisions. Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search). But these in my mind are macro optimizations. The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking. This would have been a mistake as I would be trading maintainability (ConcurrentDictionary requires no locking which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it. I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer. I began college in in the 90s when C and C++ was king and hardware speed and memory were still relatively priceless commodities and not to be squandered. In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance. This is why in many cases such early code-bases were very hard to maintain. I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead. Now back then, that may have been a valid concern, but with today's modern hardware even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway) the results of removing method calls to speed up performance are negligible for the great majority of applications. Now, obviously, there are those coding applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made. But I would strongly advice against such optimization because of it's cost. Many folks that are performing an optimization think it's always a win-win. That they're simply adding speed to the application, what could possibly be wrong with that? What they don't realize is the cost of their choice. For every piece of straight-forward code that you obfuscate with performance enhancements, you risk the introduction of bugs in the long term technical debt of the application. It will become so fragile over time that maintenance will become a nightmare. I've seen such applications in places I have worked. There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for their application to try to squeeze out every ounce of performance. Unfortunately, the application stability often suffers as a result and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation). To me this was almost a joke. Twice as slow sounds bad, but it almost never as bad as you think -- especially if you're gaining safety. The only time twice is really that much slower is when once was too slow to begin with. Think about it. 2 minutes is slow as a response time because 1 minute is slow. But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial! The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations). To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can. I know it can be a hard habit to break, especially if you started out your career early or in a language such as C where they are very performance conscious. But in reality, these type of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization. I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true. If you're a beginner, resist the urge to optimize at all costs. And if you are an expert, delay that decision. As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient. Chances are it will be network, database, or disk hits that will be your slow-down, not your code. As they say, 98% of your code's bottleneck is in 2% of your code so premature-optimization may add maintenance and safety debt that won't have any measurable impact. Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, then you should go back and optimize further.

Read the article
SQLAuthority News – A Successful Performance Tuning Seminar at Pune – Dec 4-5, 2010

- by pinaldave

This is report to my third of very successful seminar event on SQL Server Performance Tuning. SQL Server Performance Tuning Seminar in Colombo was oversubscribed with total of 35 attendees. You can read the details over here SQLAuthority News – SQL Server Performance Optimizations Seminar – Grand Success – Colombo, Sri Lanka – Oct 4 – 5, 2010. SQL Server Performance Tuning Seminar in Hyderabad was oversubscribed with total of 25 attendees. You can read the details over here SQL SERVER – A Successful Performance Tuning Seminar – Hyderabad – Nov 27-28, 2010. The same Seminar was offered in Pune on December 4,-5, 2010. We had another successful seminar with lots of performance talk. This seminar was attended by 30 attendees. The best part of the seminar was that along with the our agenda, we have talked about following very interesting concepts. Deadlocks Detection and Removal Dynamic SQL and Inline Code SQL Optimizations Multiple OR conditions and performance tuning Dynamic Search Condition Building and Improvement Memory Cache and Improvement Bottleneck Detections – Memory, CPU and IO Beginning Performance Tuning on Production Parametrization Improving already Super Fast Queries Convenience vs. Performance Proper way to create Indexes Hints and Disadvantages I had great time doing the seminar and sharing my performance tricks with all. The highlight of this seminar was I have explained the attendees, how I begin doing performance tuning when I go for Performance Tuning Consultations. Pinal Dave at SQL Performance Tuning Seminar SQL Server Performance Tuning Seminar Pinal Dave at SQL Performance Tuning Seminar Pinal Dave at SQL Performance Tuning Seminar SQL Server Performance Tuning Seminar SQL Server Performance Tuning Seminar This seminar series are 100% demo oriented and no usual PowerPoint talk. They are created from my experiences of various organizations for performance tuning. I am not planning any more seminar this year as it was great but I am booked currently for next 60 days at various performance tuning engagements. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, SQLAuthority News, T SQL, Technology

Read the article
DB2 insert performance - How to measure

- by svrist

[From stackoverflow] Im trying to find a way to speedup my inserts to a DB2 9.7.1 (ubuntu linux) Im watching vmstat and trying to gather some statistics via the db2 get snapshot commands but im not able to figure out which numbers im looking for to be able to see where the trouble is. I've read lits of stuff like http://www.eggheadcafe.com/software/aspnet/35692526/question-multiple-row-in.aspx, and http://www.ibm.com/developerworks/data/library/tips/dm-0403wilkins/ and tricks like ALTER TABLE lalala APPEND ON works somewhat (the difference between a dd if=/dev/zero and insert is still a factor 10) but I would like to be able to find the counters or other performance indicators that actually show why it makes sense to use those tricks. For example: What is the metric called that shows me that it is buffer pages allocation (FSCR stuff) that is the problem Where do I see that the insert time is hampered by clustered indexes? I find db2top very useful but im still searching for more direct view of "this is your bottleneck" methods

Read the article
SQL SERVER – A Successful Performance Tuning Seminar – Hyderabad – Nov 27-28, 2010 – Next Pune

- by pinaldave

My recent SQL Server Performance Tuning Seminar in Colombo was oversubscribed with total of 35 attendees. You can read the details over here SQLAuthority News – SQL Server Performance Optimizations Seminar – Grand Success – Colombo, Sri Lanka – Oct 4 – 5, 2010. I had recently completed another seminar in Hyderabad which was again blazing success. We had 25 attendees to the seminar and had wonderful time together. There is one thing very different between usual class room training and this seminar series. In this seminar series we go 100% demo oriented and real world scenario deep down. We do not talk usual theory talk-talk. The goal of this seminar to give anybody who attends a jump start and deep dive on the performance tuning subject. I will share many different examples and scenarios from my years of experience of performance tuning. The beginning of the second day is always interesting as I take attendees the server as example of the talk, and together we will attempt to identify the bottleneck and see if we can resolve the same. So far I have got excellent feedback on this unique session, where we pick database of the attendees and address the issues. I plan to do the same again in next sessions. The next Seminar is in Pune.I am very excited for the same. Date and Time: December 4-5, 2010. 10 AM to 6 PM The Pride Hotel 05, University Road, Shivaji Nagar, Pune – 411 005 Tel: 020 255 34567 Click here for the agenda of the seminar. Instead of writing more details, I will let the photos do the talk for latest Hyderabad Seminar. Hotel Amrutha Castle King Arthur's Court Pinal Presenting Seminar Pinal Presenting Seminar Seminar Attendees Pinal Presenting Seminar Group Photo of Hyderabad Seminar Attendees Seminar Support Staff - Nupur and Shaivi Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, SQLAuthority Author Visit, SQLAuthority News, T SQL, Technology

Read the article
SQL Server and Hyper-V Dynamic Memory - Part 1

- by SQLOS Team

SQL and Dynamic Memory Blog Post Series Hyper-V Dynamic Memory is a new feature in Windows Server 2008 R2 SP1 that allows the memory assigned to guest virtual machines to vary according to demand. Using this feature with SQL Server is supported, but how well does it work in an environment where available memory can vary dynamically, especially since SQL Server likes memory, and is not very eager to let go of it? The next three posts will look at this question in detail. In Part 1 Serdar Sutay, a program manager in the Windows Hyper-V team, introduces Dynamic Memory with an overview of the basic architecture, configuration and monitoring concepts. In subsequent parts we will look at SQL Server memory handling, and develop some guidelines on using SQL Server with Dynamic Memory. Part 1: Dynamic Memory Introduction In virtualized environments memory is often the bottleneck for reaching higher VM densities. In Windows Server 2008 R2 SP1 Hyper-V introduced a new feature “Dynamic Memory” to improve VM densities on Hyper-V hosts. Dynamic Memory increases the memory utilization in virtualized environments by enabling VM memory to be changed dynamically when the VM is running. This brings up the question of how to utilize this feature with SQL Server VMs as SQL Server performance is very sensitive to the memory being used. In the next three posts we’ll discuss the internals of Dynamic Memory, SQL Server Memory Management and how to use Dynamic Memory with SQL Server VMs. Memory Utilization Efficiency in Virtualized Environments The primary reason memory is usually the bottleneck for higher VM densities is that users tend to be generous when assigning memory to their VMs. Here are some memory sizing practices we’ve heard from customers: · I assign 4 GB of memory to my VMs. I don’t know if all of it is being used by the applications but no one complains. · I take the minimum system requirements and add 50% more. · I go with the recommendations provided by my software vendor. In reality correctly sizing a virtual machine requires significant effort to monitor the memory usage of the applications. Since this is not done in most environments, VMs are usually over-provisioned in terms of memory. In other words, a SQL Server VM that is assigned 4 GB of memory may not need to use 4 GB. How does Dynamic Memory help? Dynamic Memory improves the memory utilization by removing the requirement to determine the memory need for an application. Hyper-V determines the memory needed by applications in the VM by evaluating the memory usage information in the guest with Dynamic Memory. VMs can start with a small amount of memory and they can be assigned more memory dynamically based on the workload of applications running inside. Overview of Dynamic Memory Concepts · Startup Memory: Startup Memory is the starting amount of memory when Dynamic Memory is enabled for a VM. Dynamic Memory will make sure that this amount of memory is always assigned to the VMs by default. · Maximum Memory: Maximum Memory specifies the maximum amount of memory that a VM can grow to with Dynamic Memory. · Memory Demand: Memory Demand is the amount determined by Dynamic Memory as the memory needed by the applications in the VM. In Windows Server 2008 R2 SP1, this is equal to the total amount of committed memory of the VM. · Memory Buffer: Memory Buffer is the amount of memory assigned to the VMs in addition to their memory demand to satisfy immediate memory requirements and file cache needs. Once Dynamic Memory is enabled for a VM, it will start with the “Startup Memory”. After the boot process Dynamic Memory will determine the “Memory Demand” of the VM. Based on this memory demand it will determine the amount of “Memory Buffer” that needs to be assigned to the VM. Dynamic Memory will assign the total of “Memory Demand” and “Memory Buffer” to the VM as long as this value is less than “Maximum Memory” and as long as physical memory is available on the host. What happens when there is not enough physical memory available on the host? Once there is not enough physical memory on the host to satisfy VM needs, Dynamic Memory will assign less than needed amount of memory to the VMs based on their importance. A concept known as “Memory Weight” is used to determine how much VMs should be penalized based on their needed amount of memory. “Memory Weight” is a configuration setting on the VM. It can be configured to be higher for the VMs with high performance requirements. Under high memory pressure on the host, the “Memory Weight” of the VMs are evaluated in a relative manner and the VMs with lower relative “Memory Weight” will be penalized more than the ones with higher “Memory Weight”. Dynamic Memory Configuration Based on these concepts “Startup Memory”, “Maximum Memory”, “Memory Buffer” and “Memory Weight” can be configured as shown below in Windows Server 2008 R2 SP1 Hyper-V Manager. Memory Demand is automatically calculated by Dynamic Memory once VMs start running. Dynamic Memory Monitoring In Windows Server 2008 R2 SP1, Hyper-V Manager displays the memory status of VMs in the following three columns: · Assigned Memory represents the current physical memory assigned to the VM. In regular conditions this will be equal to the sum of “Memory Demand” and “Memory Buffer” assigned to the VM. When there is not enough memory on the host, this value can go below the Memory Demand determined for the VM. · Memory Demand displays the current “Memory Demand” determined for the VM. · Memory Status displays the current memory status of the VM. This column can represent three values for a VM: o OK: In this condition the VM is assigned the total of Memory Demand and Memory Buffer it needs. o Low: In this condition the VM is assigned all the Memory Demand and a certain percentage of the Memory Buffer it needs. o Warning: In this condition the VM is assigned a lower memory than its Memory Demand. When VMs are running in this condition, it’s likely that they will exhibit performance problems due to internal paging happening in the VM. So far so good! But how does it work with SQL Server? SQL Server is aggressive in terms of memory usage for good reasons. This raises the question: How do SQL Server and Dynamic Memory work together? To understand the full story, we’ll first need to understand how SQL Server Memory Management works. This will be covered in our second post in “SQL and Dynamic Memory” series. Meanwhile if you want to dive deeper into Dynamic Memory you can check the below posts from the Windows Virtualization Team Blog: http://blogs.technet.com/virtualization/archive/2010/03/18/dynamic-memory-coming-to-hyper-v.aspx http://blogs.technet.com/virtualization/archive/2010/03/25/dynamic-memory-coming-to-hyper-v-part-2.aspx http://blogs.technet.com/virtualization/archive/2010/04/07/dynamic-memory-coming-to-hyper-v-part-3.aspx http://blogs.technet.com/b/virtualization/archive/2010/04/21/dynamic-memory-coming-to-hyper-v-part-4.aspx http://blogs.technet.com/b/virtualization/archive/2010/05/20/dynamic-memory-coming-to-hyper-v-part-5.aspx http://blogs.technet.com/b/virtualization/archive/2010/07/12/dynamic-memory-coming-to-hyper-v-part-6.aspx - Serdar Sutay Originally posted at http://blogs.msdn.com/b/sqlosteam/

Read the article
SQL Server and Hyper-V Dynamic Memory - Part 1

- by SQLOS Team

SQL and Dynamic Memory Blog Post Series Hyper-V Dynamic Memory is a new feature in Windows Server 2008 R2 SP1 that allows the memory assigned to guest virtual machines to vary according to demand. Using this feature with SQL Server is supported, but how well does it work in an environment where available memory can vary dynamically, especially since SQL Server likes memory, and is not very eager to let go of it? The next three posts will look at this question in detail. In Part 1 Serdar Sutay, a program manager in the Windows Hyper-V team, introduces Dynamic Memory with an overview of the basic architecture, configuration and monitoring concepts. In subsequent parts we will look at SQL Server memory handling, and develop some guidelines on using SQL Server with Dynamic Memory. Part 1: Dynamic Memory Introduction In virtualized environments memory is often the bottleneck for reaching higher VM densities. In Windows Server 2008 R2 SP1 Hyper-V introduced a new feature “Dynamic Memory” to improve VM densities on Hyper-V hosts. Dynamic Memory increases the memory utilization in virtualized environments by enabling VM memory to be changed dynamically when the VM is running. This brings up the question of how to utilize this feature with SQL Server VMs as SQL Server performance is very sensitive to the memory being used. In the next three posts we’ll discuss the internals of Dynamic Memory, SQL Server Memory Management and how to use Dynamic Memory with SQL Server VMs. Memory Utilization Efficiency in Virtualized Environments The primary reason memory is usually the bottleneck for higher VM densities is that users tend to be generous when assigning memory to their VMs. Here are some memory sizing practices we’ve heard from customers: · I assign 4 GB of memory to my VMs. I don’t know if all of it is being used by the applications but no one complains. · I take the minimum system requirements and add 50% more. · I go with the recommendations provided by my software vendor. In reality correctly sizing a virtual machine requires significant effort to monitor the memory usage of the applications. Since this is not done in most environments, VMs are usually over-provisioned in terms of memory. In other words, a SQL Server VM that is assigned 4 GB of memory may not need to use 4 GB. How does Dynamic Memory help? Dynamic Memory improves the memory utilization by removing the requirement to determine the memory need for an application. Hyper-V determines the memory needed by applications in the VM by evaluating the memory usage information in the guest with Dynamic Memory. VMs can start with a small amount of memory and they can be assigned more memory dynamically based on the workload of applications running inside. Overview of Dynamic Memory Concepts · Startup Memory: Startup Memory is the starting amount of memory when Dynamic Memory is enabled for a VM. Dynamic Memory will make sure that this amount of memory is always assigned to the VMs by default. · Maximum Memory: Maximum Memory specifies the maximum amount of memory that a VM can grow to with Dynamic Memory. · Memory Demand: Memory Demand is the amount determined by Dynamic Memory as the memory needed by the applications in the VM. In Windows Server 2008 R2 SP1, this is equal to the total amount of committed memory of the VM. · Memory Buffer: Memory Buffer is the amount of memory assigned to the VMs in addition to their memory demand to satisfy immediate memory requirements and file cache needs. Once Dynamic Memory is enabled for a VM, it will start with the “Startup Memory”. After the boot process Dynamic Memory will determine the “Memory Demand” of the VM. Based on this memory demand it will determine the amount of “Memory Buffer” that needs to be assigned to the VM. Dynamic Memory will assign the total of “Memory Demand” and “Memory Buffer” to the VM as long as this value is less than “Maximum Memory” and as long as physical memory is available on the host. What happens when there is not enough physical memory available on the host? Once there is not enough physical memory on the host to satisfy VM needs, Dynamic Memory will assign less than needed amount of memory to the VMs based on their importance. A concept known as “Memory Weight” is used to determine how much VMs should be penalized based on their needed amount of memory. “Memory Weight” is a configuration setting on the VM. It can be configured to be higher for the VMs with high performance requirements. Under high memory pressure on the host, the “Memory Weight” of the VMs are evaluated in a relative manner and the VMs with lower relative “Memory Weight” will be penalized more than the ones with higher “Memory Weight”. Dynamic Memory Configuration Based on these concepts “Startup Memory”, “Maximum Memory”, “Memory Buffer” and “Memory Weight” can be configured as shown below in Windows Server 2008 R2 SP1 Hyper-V Manager. Memory Demand is automatically calculated by Dynamic Memory once VMs start running. Dynamic Memory Monitoring In Windows Server 2008 R2 SP1, Hyper-V Manager displays the memory status of VMs in the following three columns: · Assigned Memory represents the current physical memory assigned to the VM. In regular conditions this will be equal to the sum of “Memory Demand” and “Memory Buffer” assigned to the VM. When there is not enough memory on the host, this value can go below the Memory Demand determined for the VM. · Memory Demand displays the current “Memory Demand” determined for the VM. · Memory Status displays the current memory status of the VM. This column can represent three values for a VM: o OK: In this condition the VM is assigned the total of Memory Demand and Memory Buffer it needs. o Low: In this condition the VM is assigned all the Memory Demand and a certain percentage of the Memory Buffer it needs. o Warning: In this condition the VM is assigned a lower memory than its Memory Demand. When VMs are running in this condition, it’s likely that they will exhibit performance problems due to internal paging happening in the VM. So far so good! But how does it work with SQL Server? SQL Server is aggressive in terms of memory usage for good reasons. This raises the question: How do SQL Server and Dynamic Memory work together? To understand the full story, we’ll first need to understand how SQL Server Memory Management works. This will be covered in our second post in “SQL and Dynamic Memory” series. Meanwhile if you want to dive deeper into Dynamic Memory you can check the below posts from the Windows Virtualization Team Blog: http://blogs.technet.com/virtualization/archive/2010/03/18/dynamic-memory-coming-to-hyper-v.aspx http://blogs.technet.com/virtualization/archive/2010/03/25/dynamic-memory-coming-to-hyper-v-part-2.aspx http://blogs.technet.com/virtualization/archive/2010/04/07/dynamic-memory-coming-to-hyper-v-part-3.aspx http://blogs.technet.com/b/virtualization/archive/2010/04/21/dynamic-memory-coming-to-hyper-v-part-4.aspx http://blogs.technet.com/b/virtualization/archive/2010/05/20/dynamic-memory-coming-to-hyper-v-part-5.aspx http://blogs.technet.com/b/virtualization/archive/2010/07/12/dynamic-memory-coming-to-hyper-v-part-6.aspx - Serdar Sutay Originally posted at http://blogs.msdn.com/b/sqlosteam/

Read the article
Revisiting ANTS Performance Profiler 7.4

- by James Michael Hare

Last year, I did a small review on the ANTS Performance Profiler 6.3, now that it’s a year later and a major version number higher, I thought I’d revisit the review and revise my last post. This post will take the same examples as the original post and update them to show what’s new in version 7.4 of the profiler. Background A performance profiler’s main job is to keep track of how much time is typically spent in each unit of code. This helps when we have a program that is not running at the performance we expect, and we want to know where the program is experiencing issues. There are many profilers out there of varying capabilities. Red Gate’s typically seem to be the very easy to “jump in” and get started with very little training required. So let’s dig into the Performance Profiler. I’ve constructed a very crude program with some obvious inefficiencies. It’s a simple program that generates random order numbers (or really could be any unique identifier), adds it to a list, sorts the list, then finds the max and min number in the list. Ignore the fact it’s very contrived and obviously inefficient, we just want to use it as an example to show off the tool: 1: // our test program 2: public static class Program 3: { 4: // the number of iterations to perform 5: private static int _iterations = 1000000; 6: 7: // The main method that controls it all 8: public static void Main() 9: { 10: var list = new List<string>(); 11: 12: for (int i = 0; i < _iterations; i++) 13: { 14: var x = GetNextId(); 15: 16: AddToList(list, x); 17: 18: var highLow = GetHighLow(list); 19: 20: if ((i % 1000) == 0) 21: { 22: Console.WriteLine("{0} - High: {1}, Low: {2}", i, highLow.Item1, highLow.Item2); 23: Console.Out.Flush(); 24: } 25: } 26: } 27: 28: // gets the next order id to process (random for us) 29: public static string GetNextId() 30: { 31: var random = new Random(); 32: var num = random.Next(1000000, 9999999); 33: return num.ToString(); 34: } 35: 36: // add it to our list - very inefficiently! 37: public static void AddToList(List<string> list, string item) 38: { 39: list.Add(item); 40: list.Sort(); 41: } 42: 43: // get high and low of order id range - very inefficiently! 44: public static Tuple<int,int> GetHighLow(List<string> list) 45: { 46: return Tuple.Create(list.Max(s => Convert.ToInt32(s)), list.Min(s => Convert.ToInt32(s))); 47: } 48: } So let’s run it through the profiler and see what happens! Visual Studio Integration First, let’s look at how the ANTS profilers integrate with Visual Studio’s menu system. Once you install the ANTS profilers, you will get an ANTS menu item with several options: Notice that you can either Profile Performance or Launch ANTS Performance Profiler. These sound similar but achieve two slightly different actions: Profile Performance: this immediately launches the profiler with all defaults selected to profile the active project in Visual Studio. Launch ANTS Performance Profiler: this launches the profiler much the same way as starting it from the Start Menu. The profiler will pre-populate the application and path information, but allow you to change the settings before beginning the profile run. So really, the main difference is that Profile Performance immediately begins profiling with the default selections, where Launch ANTS Performance Profiler allows you to change the defaults and attach to an already-running application. Let’s Fire it Up! So when you fire up ANTS either via Start Menu or Launch ANTS Performance Profiler menu in Visual Studio, you are presented with a very simple dialog to get you started: Notice you can choose from many different options for application type. You can profile executables, services, web applications, or just attach to a running process. In fact, in version 7.4 we see two new options added: ASP.NET Web Application (IIS Express) SharePoint web application (IIS) So this gives us an additional way to profile ASP.NET applications and the ability to profile SharePoint applications as well. You can also choose your level of detail in the Profiling Mode drop down. If you choose Line-Level and method-level timings detail, you will get a lot more detail on the method durations, but this will also slow down profiling somewhat. If you really need the profiler to be as unintrusive as possible, you can change it to Sample method-level timings. This is performing very light profiling, where basically the profiler collects timings of a method by examining the call-stack at given intervals. Which method you choose depends a lot on how much detail you need to find the issue and how sensitive your program issues are to timing. So for our example, let’s just go with the line and method timing detail. So, we check that all the options are correct (if you launch from VS2010, the executable and path are filled in already), and fire it up by clicking the [Start Profiling] button. Profiling the Application Once you start profiling the application, you will see a real-time graph of CPU usage that will indicate how much your application is using the CPU(s) on your system. During this time, you can select segments of the graph and bookmark them, giving them mnemonic names. This can be useful if you want to compare performance in one part of the run to another part of the run. Notice that once you select a block, it will give you the call tree breakdown for that selection only, and the relative performance of those calls. Once you feel you have collected enough information, you can click [Stop Profiling] to stop the application run and information collection and begin a more thorough analysis. Analyzing Method Timings So now that we’ve halted the run, we can look around the GUI and see what we can see. By default, the times are shown in terms of percentage of time of the total run of the application, though you can change it in the View menu item to milliseconds, ticks, or seconds as well. This won’t affect the percentages of methods, it only affects what units the times are shown. Notice also that the major hotspot seems to be in a method without source, ANTS Profiler will filter these out by default, but you can right-click on the line and remove the filter to see more detail. This proves especially handy when a bottleneck is due to a method in the BCL. So now that we’ve removed the filter, we see a bit more detail: In addition, ANTS Performance Profiler gives you the ability to decompile the methods without source so that you can dive even deeper, though typically this isn’t necessary for our purposes. When looking at timings, there are generally two types of timings for each method call: Time: This is the time spent ONLY in this method, not including calls this method makes to other methods. Time With Children: This is the total of time spent in both this method AND including calls this method makes to other methods. In other words, the Time tells you how much work is being done exclusively in this method, and the Time With Children tells you how much work is being done inclusively in this method and everything it calls. You can also choose to display the methods in a tree or in a grid. The tree view is the default and it shows the method calls arranged in terms of the tree representing all method calls and the parent method that called them, etc. This is useful for when you find a hot-spot method, you can see who is calling it to determine if the problem is the method itself, or if it is being called too many times. The grid method represents each method only once with its totals and is useful for quickly seeing what method is the trouble spot. In addition, you can choose to display Methods with source which are generally the methods you wrote (as opposed to native or BCL code), or Any Method which shows not only your methods, but also native calls, JIT overhead, synchronization waits, etc. So these are just two ways of viewing the same data, and you’re free to choose the organization that best suits what information you are after. Analyzing Method Source If we look at the timings above, we see that our AddToList() method (and in particular, it’s call to the List<T>.Sort() method in the BCL) is the hot-spot in this analysis. If ANTS sees a method that is consuming the most time, it will flag it as a hot-spot to help call out potential areas of concern. This doesn’t mean the other statistics aren’t meaningful, but that the hot-spot is most likely going to be your biggest bang-for-the-buck to concentrate on. So let’s select the AddToList() method, and see what it shows in the source window below: Notice the source breakout in the bottom pane when you select a method (from either tree or grid view). This shows you the timings in this method per line of code. This gives you a major indicator of where the trouble-spot in this method is. So in this case, we see that performing a Sort() on the List<T> after every Add() is killing our performance! Of course, this was a very contrived, duh moment, but you’d be surprised how many performance issues become duh moments. Note that this one line is taking up 86% of the execution time of this application! If we eliminate this bottleneck, we should see drastic improvement in the performance. So to fix this, if we still wanted to maintain the List<T> we’d have many options, including: delay Sort() until after all Add() methods, using a SortedSet, SortedList, or SortedDictionary depending on which is most appropriate, or forgoing the sorting all together and using a Dictionary. Rinse, Repeat! So let’s just change all instances of List<string> to SortedSet<string> and run this again through the profiler: Now we see the AddToList() method is no longer our hot-spot, but now the Max() and Min() calls are! This is good because we’ve eliminated one hot-spot and now we can try to correct this one as well. As before, we can then optimize this part of the code (possibly by taking advantage of the fact the list is now sorted and returning the first and last elements). We can then rinse and repeat this process until we have eliminated as many bottlenecks as possible. Calls by Web Request Another feature that was added recently is the ability to view .NET methods grouped by the HTTP requests that caused them to run. This can be helpful in determining which pages, web services, etc. are causing hot spots in your web applications. Summary If you like the other ANTS tools, you’ll like the ANTS Performance Profiler as well. It is extremely easy to use with very little product knowledge required to get up and running. There are profilers built into the higher product lines of Visual Studio, of course, which are also powerful and easy to use. But for quickly jumping in and finding hot spots rapidly, Red Gate’s Performance Profiler 7.4 is an excellent choice. Technorati Tags: Influencers,ANTS,Performance Profiler,Profiler

Read the article
Not Playing Nice Together

- by David Douglass

One of the things I’ve noticed is that two industry trends are not playing nice together, those trends being multi-core CPUs and massive hard drives. It’s not a problem if you keep your cores busy with compute intensive work, but for software developers the beauty of multi-core CPUs (along with gobs of RAM and a 64 bit OS) is virtualization. But when you have only one hard drive (who needs another when it holds 2 TB of data?) you wind up with a serious hard drive bottleneck. A solid state drive would definitely help, and might even be a complete solution, but the cost is ridiculous. Two TB of solid state storage will set you back around $7,000! A spinning 2 TB drive is only $150. I see a couple of solutions for this. One is the mainframe concept of near and far storage: put the stuff that will be heavily access on a solid state drive and the rest on a spinning drive. Another solution is multiple spinning drives. Instead of a single 2 TB drive, get four 500 GB drives. In total, the four 500 GB drives will cost about $100 more than the single 2 TB drive. You’ll need to be smart about what drive you place things on so that the load is spread evenly. Another option, for better performance, would be four 10,000 RPM 300 GB drives, but that would cost about $800 more than the singe 2 TB drive and would deliver only 1.2 TB of space. All pricing based on Microcenter as of March 14, 2010.

Read the article
Determine nginx reverse-proxy load limits

- by Aaron

Hi all: I have an nginx server (CentOS 5.3, linux) that I'm using as a reverse-proxy load-balancer in front of 8 ruby on rails application servers. As our load on these servers increases, I'm beginning to wonder at what point will the nginx server become a bottleneck? The CPUs are hardly used, but that's to be expected. The memory seems to be fine. No IO to speak of. So is my only limitation bandwidth on the NICs? Currently, according to some cacti graphs, the server is hitting around 700Kbps ( 5 min average ) on each NIC during high load. I would think this is still pretty low. Or, will the limit be in sockets or some other resource in the operating system? Thanks for any thoughts and insights. Aaron

Read the article
Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article
PASS 13 Dispatches: Memory Optimized = On

- by Tony Davis

I'm at the PASS Summit in Charlotte for the Day 1 keynote by Quentin Clarke, Corporate VP of the data platform group at Microsoft. He's talking about how SQL Server 2014 is “pushing boundaries” and first up is SQL Server 2014's In-Memory OLTP technology (former codename “hekaton”) It is a feature that provokes a lot of interest and for good reason as, without any need for application rewrites or hardware updates, it can enable us to ensure that an application can find in memory most or all of the data it needs, and can lead to huge improvements in processing times. A good recent hekaton use cases article talks about applications that need a “Shock Absorber” when either spikes or just a high rate of incoming workload (including data in ETL scenarios) become a primary bottleneck. To get a really deep look at this technology, I would check out David DeWitt's summit keynote tomorrow (it will be live streamed). Other than that, to get started I'd recommend Kalen Delaney's whitepaper. She offers a lot of insight into how it works and how to start to define memory-optimized tables, and natively compiled stored procedures. These memory-optimized tables uses completely optimistic multi-version concurrency control – no waiting on locks! After that, Tom LaRock has compiled a useful set of links to drill deeper, and includes one to Microsoft's AMR tool to help you gauge the tables that might benefit most. Tony.

Read the article
Why Oracle Delivers More Value than IBM in Data Integration Solutions

- by irem.radzik(at)oracle.com

For data integration projects, IT organization look for a robust but an easy-to-use solution, which simplifies enterprise data architecture while providing exceptional value-- not one that adds complexity and costs. This is a major challenge today for customers who are using IBM InfoSphere products like DataStage or Change Data Capture. Whereas, Oracle consistently delivers higher level value with its data integration products such as Oracle Data Integrator, Oracle GoldenGate. There are many differentiators for Oracle's Data Integration offering in comparison to IBM. Here are the top five: Lower cost of ownership Higher performance in both real-time and bulk data movement Ease of use and flexibility Reliability Complete, Open, and Integrated Middleware Offering Architectural differences between products contribute a great deal to these differences. First of all, Oracle's ETL architecture does not require a middle-tier transformation server, something IBM does require. Not only it costs more to manage an additional transformation server including energy costs, but it adds a performance bottleneck as well. In addition, IBM's data integration products are complex and often require lengthy professional services engagements to integrate. This translates to higher costs and delayed time to market. Then there's the reliability factor. Our customers choose Oracle GoldenGate over IBM's InfoSphere Change Data Capture product because Oracle GoldenGate is designed for mission-critical systems that require guaranteed data delivery and automatic recovery in case of process interruptions. On Thursday we will discuss these key differentiators in detail and provide customer examples that chose Oracle over IBM in data integration projects. Join us on Thursday Feb 10th at 11am PT to learn how Oracle delivers more value than IBM in data integration solutions.

Read the article
Home ZFS based NAS...What processor/chipset to use?

- by MrBlargityBlarg

So, I'm building a home/personal NAS. My plan is to expose both SMB fileshares for sharing files/media between hosts, but also to carve an iSCSI target LUN out of it for use by VMWare as a datastore. I want to use ZFS (software RAID) so that means I'll either be using FreeNAS, Solaris Express, or OpenIndiana. My question is basically: How much horsepower do I need? Obviously I/O is going to be my bottleneck but I want to be sure that I am not limiting my I/O because of a slow processor or chipset. So far the hardware plan is to use an Intel i3 and motherboard with one of the H87, Q87, or Z87 chipsets, a SAS controller (JBOD, no RAID) and if budget allows, I'm also hoping to get an SSD for the ZFS L2ARC and ZIL. Does anyone think I could get away with an Intel Atom or cheaper/less-capable processor/chipset than the i3 and [HQZ]87 listed above?

Read the article
Configuring home wireless network

- by dvanaria

I'm new to setting up a home wireless network. I have Comcast tv/internet/phone service (modem included) as well as a wireless router. My question is pretty basic. How can I tell the performance of the following parts of the network? 1. incoming internet speed 2. speed of the modem 3. speed of the wireless router I basically want as fast an internet connection as possible, of course, but I'm not sure where to look for the bottleneck (and so, not sure where I can spend some money to speed things up). Right now I'm getting about 36 Mbps (as it shows in Windows). If I run an online speed test (xfinity has one) it shows Average download speed of 14.91 Mbps and Average upload speed of 5.72 Mbps. Thanks for your help.

Read the article
Whole Lotta Virtualization Goin' On

- by rickramsey

Lately we've published a lot of content about virtualization. Here's a sampling. Podcat: Technology Preview of Transcendent Memory Turns out that in a virtual environment, RAM is the bottleneck. Not because it's slow, it's not, but because each CPU still had to use its own RAM. Which gets expensive. In this podcast, Dan Magenheimer describes how Oracle and the open source community taught the guest kernel in Oracle Linux to share its memory with other CPU's. Transcendent memory will wind up saving large data centers a lot of money. Find out how. Tech Article: How to Use Oracle VM Templates This article describes how to prepare an Oracle VM environment to use Oracle VM Templates, how to obtain a template, and how to deploy the template to your Oracle VM environment. It also describes how to create a virtual machine based on that template and how you can clone the template and change the clone's configuration. Tech Article: How to Set Up a Load Balanced Application Across Two Oracle Solaris Zones Install Apache Tomcat on two Oracle Solaris zones. Connect them across a VPN. And let the Integrated Load Balancer in Oracle Solaris 11 manage traffic. Presto: high(er) availability in a single server. Tech Article: How to Install Oracle RAC on Oracle Solaris Zone Clusters Learn how to implement a multi-tiered database environment that isolates database tiers and administrative domains, while taking advantage of centralized (and simpler) cluster admin. For fans of Jerry Lee Lewis If you're a fan of Jerry Lee Lewis, you might enjoy this video. - Rick Website Newsletter Facebook Twitter

Read the article
xVelocity engines compared: VertiPaq vs ColumnStore #ssas #vertipaq #xvelocity #sql #tabular

- by Marco Russo (SQLBI)

During the last months I and Alberto worked in several projects using Analysis Services Tabular and we had to face real world issues, such as complex queries, large data volume, frequent data updates and so on. Sometime we faced the challenge of comparing Tabular performance with SQL Server. It seemed a non-sense, because even if the same core xVelocity technology is implemented in both products (SQL Server 2012 uses ColumnStore indexes, whereas Analysis Services 2012 uses VertiPaq), we initially assumed that the better optimization for the in-memory engine used by Analysis Services would have been always better than SQL Server. However, we discovered several important things: Processing time might be different and having data on SQL Server could make ColumnStore way faster for processing. Partitioning in SQL Server might be much more effective for query performance than Analysis Services. A single query can scale easily on more processor on SQL Server, whereas in Analysis Services the formula engine is single-threaded and could be a bottleneck for certain queries. In case of a large workload with many concurrent users, storage engine cache in Analysis Services could be a big advantage over SQL Server, especially for scalability As you can see, these considerations are not always obvious and you might be tempted to make other assumptions based on these information. Well, don’t do that. Before anything else, read the whitepaper VertiPaq vs ColumnStore Comparison written by Alberto Ferrari. Then, measure your workload. Finally, make some conclusion. But don’t make too many assumptions. You might be wrong, as we did at the beginning of this journey.

Read the article
SQL SERVER – A Funny Cartoon on Index

- by pinaldave

Performance Tuning has been my favorite subject and I have done it for many years now. Today I will list one of the most common conversation about Index I have heard in my life. Every single time, I am at consultation for performance tuning I hear following conversation among various team members. I want to ask you, does this kind of conversation happens in your organization? Any way, If you think Index solves all of your performance problem I think it is not true. There are many other reason one has to consider along with Indexes. For example I consider following various topic one need to understand for performance tuning. ?Logical Query Processing ?Efficient Join Techniques ?Query Tuning Considerations ?Avoiding Common Performance Tuning Issues Statistics and Best Practices ?TempDB Tuning ?Hardware Planning ?Understanding Query Processor ?Using SQL Server 2005 and 2008 Updated Feature Sets ?CPU, Memory, I/O Bottleneck Index Tuning (of course) ?Many more… Well, I have written this blog thinking I will keep this blog post a bit easy and not load up. I will in future discuss about other performance tuning concepts. Let me know what do you think about the cartoon I made. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Humor, SQL Index, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Increasing FreeBSD threads

- by sh-beta

For network apps that create one thread per connection (like Pound), threadcount can become a bottleneck on the number of concurrent connections you can server. I'm running FreeBSD 8 x64: $ sysctl kern.maxproc kern.maxproc: 6164 $ sysctl kern.threads.max_threads_per_proc kern.threads.max_threads_per_proc: 1500 $ limits Resource limits (current): cputime infinity secs filesize infinity kB datasize 33554432 kB stacksize 524288 kB coredumpsize infinity kB memoryuse infinity kB memorylocked infinity kB maxprocesses 5547 openfiles 200000 sbsize infinity bytes vmemoryuse infinity kB pseudo-terminals infinity swapuse infinity kB I want to increase kern.threads.max_threads_per_proc to 4096. Assuming each thread starts with a stack size of 512k, what else do I need to change to ensure that I don't hose my machine?

Read the article
Firefox "auto-complete" is very slow

- by netvope

Firefox version: 3.6 My places.sqlite is rather big (114MB, after being optimized by SpeedyFox.) If I turn on auto-complete, it may take 1 or 2 seconds for Firefox to accept a newly typed URL. To reproduce the issue: Type a URL into the URL bar, press enter. Nothing happens, and Firefox consumes 100% CPU (actually 50% of 2 cores) for 1 to 2 seconds Then Firefox start the network connection and load the webpage. Since it consumes 100% CPU, I don't think the bottleneck is the disk. I have some experience with SQLite and I know a 100MB DB is very small. To achieve the delay Firefox must be doing some expensive processing or inefficient queries. The issue does not appear if: auto-complete is turned off, or the URL is frequently used, or a new profile with no history is used Does anyone have any idea how to solve the problem? Should I file this as a bug? I don't want to give up my 100MB history, but I don't want to give up auto-complete either :)

Read the article
Questioning pythonic type checking

- by Pace

I've seen countless times the following approach suggested for "taking in a collection of objects and doing X if X is a Y and ignoring the object otherwise" def quackAllDucks(ducks): for duck in ducks: try: duck.quack("QUACK") except AttributeError: #Not a duck, can't quack, don't worry about it pass The alternative implementation below always gets flak for the performance hit caused by type checking def quackAllDucks(ducks): for duck in ducks: if hasattr(duck,"quack"): duck.quack("QUACK") However, it seems to me that in 99% of scenarios you would want to use the second solution because of the following: If the user gets the parameters wrong then they will not be treated like a duck and there will be no indication. A lot of time will be wasted debugging why there is no quacking going on until the user finally realizes his silly mistake. The second solution would throw a stack trace as soon the user tried to quack. If the user has any bugs in their quack() method which cause an AttributeError then those bugs will be silently swallowed. Once again time will be wasted digging for the bug when the second solution would simply give a stack trace showing the immediate issue. In fact, it seems to me that the only time you would ever want to use the first method is when: The block of code in question is in an extremely performance critical section of your application. Following the principal of "avoid premature optimization" you would only realize this of course, after you had implemented the safer approach and found it to be a bottleneck. There are many types of quacking objects out there and you are only interested in quacking objects that quack with a very specific set of arguments (this seems to be a very rare case to me). Given this, why is it that so many people prefer the first approach over the second approach? What is it that I am missing? Also, I realize there are other solutions (such as using abcs) but these are the two solutions I seem to see most often for the basic case.

Read the article
Improving SpriteBatch performance for tiles

- by Richard Rast

I realize this is a variation on what has got to be a common question, but after reading several (good answers) I'm no closer to a solution here. So here's my situation: I'm making a 2D game which has (among some other things) a tiled world, and so, drawing this world implies drawing a jillion tiles each frame (depending on resolution: it's roughly a 64x32 tile with some transparency). Now I want the user to be able to maximize the game (or fullscreen mode, actually, as its a bit more efficient) and instead of scaling textures (bleagh) this will just allow lots and lots of tiles to be shown at once. Which is great! But it turns out this makes upward of 2000 tiles on the screen each time, and this is framerate-limiting (I've commented out enough other parts of the game to make sure this is the bottleneck). It gets worse if I use multiple source rectangles on the same texture (I use a tilesheet; I believe changing textures entirely makes things worse), or if you tint the tiles, or whatever. So, the general question is this: What are some general methods for improving the drawing of thousands of repetitive sprites? Answers pertaining to XNA's SpriteBatch would be helpful but I'm equally happy with general theory. Also, any tricks pertaining to this situation in particular (drawing a tiled world efficiently) are also welcome. I really do want to draw all of them, though, and I need the SpriteMode.BackToFront to be active, because

Read the article

Search Results

Search found 415 results on 17 pages for 'bottleneck'.

Page 6/17 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by OverTheRainbow

- by ugasoft

- by Middletone

- by Reed

- by James Michael Hare

- by pinaldave

- by svrist

- by pinaldave

- by SQLOS Team

- by SQLOS Team

- by James Michael Hare

- by David Douglass

- by Aaron

- by Alois Kraus

- by Tony Davis

- by irem.radzik(at)oracle.com

- by MrBlargityBlarg

- by dvanaria

- by rickramsey

- by Marco Russo (SQLBI)

- by pinaldave

- by sh-beta

- by netvope

- by Pace

- by Richard Rast

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >