Search Results

Search found 1282 results on 52 pages for 'overhead'.

Page 7/52 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

The Kayak Framework: An easy way to speak HTTP with .NET

Kayak is a lightweight HTTP server for the CLR, and the Kayak Framework is a utility for mapping HTTP requests to C# method invocations. With Kayak, you can skip the bulk, hassle, and overhead of IIS and ASP.NET. Kayak enables you to do more with less syntax, and is easy to configure to work in any way you care to dream up.

Read the article
Finding Legitimate Work from Home

With the economy still recovering from a recession, many companies are hiring work from home contractors. To companies hiring work from home transcription contractors saves on overhead. One great wor... [Author: Jennifer Yaniz - Computers and Internet - April 08, 2010]

Read the article
GZipped Images - Is It Worth?

- by charlie

Most image formats are already compressed. But in fact, if I take an image and compress it [gzipping it], and then I compare the compressed one to the uncompressed one, there is a difference in size, even though not such a dramatic difference. The question is: is it worth gzipping images? the content size flushed down to the client's browser will be smaller, but there will be some client overhead when de-gzipping it. Please advise.

Read the article
Existent js libs for tileset / map loading and rendering?

- by ylluminate

I'm building an rts style overhead tileset game with JavaScript (particularly using Ember.js framework as a base). The map is so large that I'd very much like to be able to load and render the board and layered items in a Google Maps'esque. I'm curious as to whether there are existing libs that would be helpful and already well thought out in these regards vs trying to reinvent the wheel. Are there any such libraries or code examples that would be useful in this area of board / map management?

Read the article
Build Sales As an Exhibitor at a Marketing Expo

A marketing expo is the ideal opportunity to showcase products or services for groups that are most interested in a certain company's field of expertise. Most exhibitor spaces are relatively inexpensive compared to traditional advertising costs. It is easy to recover the booth rental and supply overhead with just a few good sales or networking connections.

Read the article
Software Programming Research Promises Faster Applications

<b>Information Week:</b> "A new approach to memory management allows computer code to operate more efficiently on multicore processors and can reduce the overhead of security checks."

Read the article
Granularity and Parallel Performance

One key to attaining good parallel performance is choosing the right granularity for the application. The goal is to determine the right granularity (usually larger is better) for parallel tasks, while avoiding load imbalance and communication overhead to achieve the best performance.

Read the article
Performance variation

- by Ree

During my time spent working with multiple machines, I have noticed that performance of the same machine doing the same tasks in the same order differs and sometimes the difference is big enough to be noticeable. This applies to all the machines I've owned and/or maintained (old and modern). Some examples (many of them you may have noticed yourself) that sometimes are completed in different time frames: POST OS installation Hardware tests and operations (usually executed within a customized OS such as one of the many DOS variants), HDD tests and "low level" formats Software installation or other tasks (such as benchmarks) within a general purpose OS (Windows, Linux, etc) I can imagine this is caused by the fact that a machine is built with many components having to communicate as a whole and since the mechanical and electronic parts aren't perfect the overhead occurs. In the last example, I assume the OS complexity and concurrently running multiple processes has some additional effect as well. However, I'm wondering if this hardware imperfection and overhead is indeed that high to be humanly noticeable? Maybe there are other factors that are influencial as much or even more? So, in short - why? To emphasize: the difference is noticeable on the same machine performing the same tasks and this applies to ANY machine in my experience. I'm not comparing machine to machine performance.

Read the article
What are the advantages and disadvantages of the various virtual machine image formats?

- by Matt

Xen and Virtualbox etc both support a range of different virtual machine image formats. These are: vmdk, vdi, qcow & qcow2, hdd & vhd. Without any bias toward a particular product, I'm wanting to know what are the advantages and disadvantages of the various formats both from a features perspective, robustness and speed? One piece of info I discovered in a forum post was this: "The major difference is that VDI uses relatively large blocks (1MB) when growing an image, and thus has less overhead for block pointers etc. but isn't ultimately space efficient in the sense that if a single byte is non-zero in such a 1MB block the entire space is used. VMDK in contrast uses 64K blocks, and thus has more management overhead and generally a bit less disk space consumption What offsets this is that VDI is more efficient when it comes to snapshots." You might be thinking, I want to know this because I want to know which format to choose? Not exactly, I'm developing some software which utilises these formats and want to support one or more of them. Simplicity, large disks and ease of development are my main drivers.

Read the article
Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

- by Reed

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine. If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine. The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data. Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism. In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library. This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple. Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter. Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel. Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition. Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns. We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change. Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine. Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method. In this case, our for loop iteration block becomes a delegate creating via a lambda expression. This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea. There is a balance to be struck when writing parallel code. We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce. In this case, we have an image of data – most likely hundreds of pixels in both dimensions. By just parallelizing our first loop, each row of pixels can be run as a single task. With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks. This introduces extra overhead with no extra gain, and will actually reduce our overall performance. This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop. I could have just as easily partitioned the inner loop. However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels). My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop. If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance. If we’re iterating through an IList<T> or an array, this is a typical approach. However, there are other iteration statements common in C#. In many situations, we’ll use foreach instead of a for loop. This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation. Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change. Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist. If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression. This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel. The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library. These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism. Iterating through collections and performing “work” is a very common pattern in nearly every application. When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls. Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

Read the article
Parallelism in .NET – Part 11, Divide and Conquer via Parallel.Invoke

- by Reed

Many algorithms are easily written to work via recursion. For example, most data-oriented tasks where a tree of data must be processed are much more easily handled by starting at the root, and recursively “walking” the tree. Some algorithms work this way on flat data structures, such as arrays, as well. This is a form of divide and conquer: an algorithm design which is based around breaking up a set of work recursively, “dividing” the total work in each recursive step, and “conquering” the work when the remaining work is small enough to be solved easily. Recursive algorithms, especially ones based on a form of divide and conquer, are often a very good candidate for parallelization. This is apparent from a common sense standpoint. Since we’re dividing up the total work in the algorithm, we have an obvious, built-in partitioning scheme. Once partitioned, the data can be worked upon independently, so there is good, clean isolation of data. Implementing this type of algorithm is fairly simple. The Parallel class in .NET 4 includes a method suited for this type of operation: Parallel.Invoke. This method works by taking any number of delegates defined as an Action, and operating them all in parallel. The method returns when every delegate has completed: Parallel.Invoke( () => { Console.WriteLine("Action 1 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 2 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 3 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); } ); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Running this simple example demonstrates the ease of using this method. For example, on my system, I get three separate thread IDs when running the above code. By allowing any number of delegates to be executed directly, concurrently, the Parallel.Invoke method provides us an easy way to parallelize any algorithm based on divide and conquer. We can divide our work in each step, and execute each task in parallel, recursively. For example, suppose we wanted to implement our own quicksort routine. The quicksort algorithm can be designed based on divide and conquer. In each iteration, we pick a pivot point, and use that to partition the total array. We swap the elements around the pivot, then recursively sort the lists on each side of the pivot. For example, let’s look at this simple, sequential implementation of quicksort: public static void QuickSort<T>(T[] array) where T : IComparable<T> { QuickSortInternal(array, 0, array.Length - 1); } private static void QuickSortInternal<T>(T[] array, int left, int right) where T : IComparable<T> { if (left >= right) { return; } SwapElements(array, left, (left + right) / 2); int last = left; for (int current = left + 1; current <= right; ++current) { if (array[current].CompareTo(array[left]) < 0) { ++last; SwapElements(array, last, current); } } SwapElements(array, left, last); QuickSortInternal(array, left, last - 1); QuickSortInternal(array, last + 1, right); } static void SwapElements<T>(T[] array, int i, int j) { T temp = array[i]; array[i] = array[j]; array[j] = temp; } Here, we implement the quicksort algorithm in a very common, divide and conquer approach. Running this against the built-in Array.Sort routine shows that we get the exact same answers (although the framework’s sort routine is slightly faster). On my system, for example, I can use framework’s sort to sort ten million random doubles in about 7.3s, and this implementation takes about 9.3s on average. Looking at this routine, though, there is a clear opportunity to parallelize. At the end of QuickSortInternal, we recursively call into QuickSortInternal with each partition of the array after the pivot is chosen. This can be rewritten to use Parallel.Invoke by simply changing it to: // Code above is unchanged... SwapElements(array, left, last); Parallel.Invoke( () => QuickSortInternal(array, left, last - 1), () => QuickSortInternal(array, last + 1, right) ); } This routine will now run in parallel. When executing, we now see the CPU usage across all cores spike while it executes. However, there is a significant problem here – by parallelizing this routine, we took it from an execution time of 9.3s to an execution time of approximately 14 seconds! We’re using more resources as seen in the CPU usage, but the overall result is a dramatic slowdown in overall processing time. This occurs because parallelization adds overhead. Each time we split this array, we spawn two new tasks to parallelize this algorithm! This is far, far too many tasks for our cores to operate upon at a single time. In effect, we’re “over-parallelizing” this routine. This is a common problem when working with divide and conquer algorithms, and leads to an important observation: When parallelizing a recursive routine, take special care not to add more tasks than necessary to fully utilize your system. This can be done with a few different approaches, in this case. Typically, the way to handle this is to stop parallelizing the routine at a certain point, and revert back to the serial approach. Since the first few recursions will all still be parallelized, our “deeper” recursive tasks will be running in parallel, and can take full advantage of the machine. This also dramatically reduces the overhead added by parallelizing, since we’re only adding overhead for the first few recursive calls. There are two basic approaches we can take here. The first approach would be to look at the total work size, and if it’s smaller than a specific threshold, revert to our serial implementation. In this case, we could just check right-left, and if it’s under a threshold, call the methods directly instead of using Parallel.Invoke. The second approach is to track how “deep” in the “tree” we are currently at, and if we are below some number of levels, stop parallelizing. This approach is a more general-purpose approach, since it works on routines which parse trees as well as routines working off of a single array, but may not work as well if a poor partitioning strategy is chosen or the tree is not balanced evenly. This can be written very easily. If we pass a maxDepth parameter into our internal routine, we can restrict the amount of times we parallelize by changing the recursive call to: // Code above is unchanged... SwapElements(array, left, last); if (maxDepth < 1) { QuickSortInternal(array, left, last - 1, maxDepth); QuickSortInternal(array, last + 1, right, maxDepth); } else { --maxDepth; Parallel.Invoke( () => QuickSortInternal(array, left, last - 1, maxDepth), () => QuickSortInternal(array, last + 1, right, maxDepth)); } We no longer allow this to parallelize indefinitely – only to a specific depth, at which time we revert to a serial implementation. By starting the routine with a maxDepth equal to Environment.ProcessorCount, we can restrict the total amount of parallel operations significantly, but still provide adequate work for each processing core. With this final change, my timings are much better. On average, I get the following timings: Framework via Array.Sort: 7.3 seconds Serial Quicksort Implementation: 9.3 seconds Naive Parallel Implementation: 14 seconds Parallel Implementation Restricting Depth: 4.7 seconds Finally, we are now faster than the framework’s Array.Sort implementation.

Read the article
Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article
Making a Case For The Command Line

- by Jesse Taber

Originally posted on: http://geekswithblogs.net/GruffCode/archive/2013/06/30/making-a-case-for-the-command-line.aspxI have had an idea percolating in the back of my mind for over a year now that I’ve just recently started to implement. This idea relates to building out “internal tools” to ease the maintenance and on-going support of a software system. The system that I currently work on is (mostly) web-based, so we traditionally we have built these internal tools in the form of pages within the app that are only accessible by our developers and support personnel. These pages allow us to perform tasks within the system that, for one reason or another, we don’t want to let our end users perform (e.g. mass create/update/delete operations on data, flipping switches that turn paid modules of the system on or off, etc). When we try to build new tools like this we often struggle with the level of effort required to build them. Effort Required Creating a whole new page in an existing web application can be a fairly large undertaking. You need to create the page and ensure it will have a layout that is consistent with the other pages in the app. You need to decide what types of input controls need to go onto the page. You need to ensure that everything uses the same style as the rest of the site. You need to figure out what the text on the page should say. Then, when you figure out that you forgot about an input that should really be present you might have to go back and re-work the entire thing. Oh, and in addition to all of that, you still have to, you know, write the code that actually performs the task. Everything other than the code that performs the task at hand is just overhead. We don’t need a fancy date picker control in a nicely styled page for the vast majority of our internal tools. We don’t even really need a page, for that matter. We just need a way to issue a command to the application and have it, in turn, execute the code that we’ve written to accomplish a given task. All we really need is a simple console application! Plumbing Problems A former co-worker of mine, John Sonmez, always advocated the Unix philosophy for building internal tools: start with something that runs at the command line, and then build a UI on top of that if you need to. John’s idea has a lot of merit, and we tried building out some internal tools as simple Console applications. Unfortunately, this was often easier said that done. Doing a “File –> New Project” to build out a tool for a mature system can be pretty daunting because that new project is totally empty. In our case, the web application code had a lot of of “plumbing” built in: it managed authentication and authorization, it handled database connection management for our multi-tenanted architecture, it managed all of the context that needs to follow a user around the application such as their timezone and regional/language settings. In addition, the configuration file for the web application (a web.config in our case because this is an ASP .NET application) is large and would need to be reproduced into a similar configuration file for a Console application. While most of these problems are could be solved pretty easily with some refactoring of the codebase, building Console applications for internal tools still potentially suffers from one pretty big drawback: you’d have to execute them on a machine with network access to all of the needed resources. Obviously, our web servers can easily communicate the the database servers and can publish messages to our service bus, but the same is not true for all of our developer and support personnel workstations. We could have everyone run these tools remotely via RDP or SSH, but that’s a bit cumbersome and certainly a lot less convenient than having the tools built into the web application that is so easily accessible. Mix and Match So we need a way to build tools that are easily accessible via the web application but also don’t require the overhead of creating a user interface. This is where my idea comes into play: why not just build a command line interface into the web application? If it’s part of the web application we get all of the plumbing that comes along with that code, and we’re executing everything on the web servers which means we’ll have access to any external resources that we might need. Rather than having to incur the overhead of creating a brand new page for each tool that we want to build, we can create one new page that simply accepts a command in text form and executes it as a request on the web server. In this way, we can focus on writing the code to accomplish the task. If the tool ends up being heavily used, then (and only then) should we consider spending the time to build a better user experience around it. To be clear, I’m not trying to downplay the importance of building great user experiences into your system; we should all strive to provide the best UX possible to our end users. I’m only advocating this sort of bare-bones interface for internal consumption by the technical staff that builds and supports the software. This command line interface should be the “back end” to a highly polished and eye-pleasing public face. Implementation As I mentioned at the beginning of this post, this is an idea that I’ve had for awhile but have only recently started building out. I’ve outlined some general guidelines and design goals for this effort as follows: Text in, text out: In the interest of keeping things as simple as possible, I want this interface to be purely text-based. Users will submit commands as plain text, and the application will provide responses in plain text. Obviously this text will be “wrapped” within the context of HTTP requests and responses, but I don’t want to have to think about HTML or CSS when taking input from the user or displaying responses back to the user. Task-oriented code only: After building the initial “harness” for this interface, the only code that should need to be written to create a new internal tool should be code that is expressly needed to accomplish the task that the tool is intended to support. If we want to encourage and enable ourselves to build good tooling, we need to lower the barriers to entry as much as possible. Built-in documentation: One of the great things about most command line utilities is the ‘help’ switch that provides usage guidelines and details about the arguments that the utility accepts. Our web-based command line utility should allow us to build the documentation for these tools directly into the code of the tools themselves. I finally started trying to implement this idea when I heard about a fantastic open-source library called CLAP (Command Line Auto Parser) that lets me meet the guidelines outlined above. CLAP lets you define classes with public methods that can be easily invoked from the command line. Here’s a quick example of the code that would be needed to create a new tool to do something within your system: 1: public class CustomerTools 2: { 3: [Verb] 4: public void UpdateName(int customerId, string firstName, string lastName) 5: { 6: //invoke internal services/domain objects/hwatever to perform update 7: } 8: } This is just a regular class with a single public method (though you could have as many methods as you want). The method is decorated with the ‘Verb’ attribute that tells the CLAP library that it is a method that can be invoked from the command line. Here is how you would invoke that code: Parser.Run(args, new CustomerTools()); Note that ‘args’ is just a string[] that would normally be passed passed in from the static Main method of a Console application. Also, CLAP allows you to pass in multiple classes that define [Verb] methods so you can opt to organize the code that CLAP will invoke in any way that you like. You can invoke this code from a command line application like this: SomeExe UpdateName -customerId:123 -firstName:Jesse -lastName:Taber ‘SomeExe’ in this example just represents the name of .exe that is would be created from our Console application. CLAP then interprets the arguments passed in order to find the method that should be invoked and automatically parses out the parameters that need to be passed in. After a quick spike, I’ve found that invoking the ‘Parser’ class can be done from within the context of a web application just as easily as it can from within the ‘Main’ method entry point of a Console application. There are, however, a few sticking points that I’m working around: Splitting arguments into the ‘args’ array like the command line: When you invoke a standard .NET console application you get the arguments that were passed in by the user split into a handy array (this is the ‘args’ parameter referenced above). Generally speaking they get split by whitespace, but it’s also clever enough to handle things like ignoring whitespace in a phrase that is surrounded by quotes. We’ll need to re-create this logic within our web application so that we can give the ‘args’ value to CLAP just like a console application would. Providing a response to the user: If you were writing a console application, you might just use Console.WriteLine to provide responses to the user as to the progress and eventual outcome of the command. We can’t use Console.WriteLine within a web application, so I’ll need to find another way to provide feedback to the user. Preferably this approach would allow me to use the same handler classes from both a Console application and a web application, so some kind of strategy pattern will likely emerge from this effort. Submitting files: Often an internal tool needs to support doing some kind of operation in bulk, and the easiest way to submit the data needed to support the bulk operation is in a file. Getting the file uploaded and available to the CLAP handler classes will take a little bit of effort. Mimicking the console experience: This isn’t really a requirement so much as a “nice to have”. To start out, the command-line interface in the web application will probably be a single ‘textarea’ control with a button to submit the contents to a handler that will pass it along to CLAP to be parsed and run. I think it would be interesting to use some javascript and CSS trickery to change that page into something with more of a “shell” interface look and feel. I’ll be blogging more about this effort in the future and will include some code snippets (or maybe even a full blown example app) as I progress. I also think that I’ll probably end up either submitting some pull requests to the CLAP project or possibly forking/wrapping it into a more web-friendly package and open sourcing that.

Read the article
Unleash the Power of Cryptography on SPARC T4

- by B.Koch

by Rob Ludeman Oracle’s SPARC T4 systems are architected to deliver enhanced value for customer via the inclusion of many integrated features. One of the best examples of this approach is demonstrated in the on-chip cryptographic support that delivers wire speed encryption capabilities without any impact to application performance. The Evolution of SPARC Encryption SPARC T-Series systems have a long history of providing this capability, dating back to the release of the first T2000 systems that featured support for on-chip RSA encryption directly in the UltraSPARC T1 processor. Successive generations have built on this approach by support for additional encryption ciphers that are tightly coupled with the Oracle Solaris 10 and Solaris 11 encryption framework. While earlier versions of this technology were implemented using co-processors, the SPARC T4 was redesigned with new crypto instructions to eliminate some of the performance overhead associated with the former approach, resulting in much higher performance for encrypted workloads. The Superiority of the SPARC T4 Approach to Crypto As companies continue to engage in more and more e-commerce, the need to provide greater degrees of security for these transactions is more critical than ever before. Traditional methods of securing data in transit by applications have a number of drawbacks that are addressed by the SPARC T4 cryptographic approach. 1. Performance degradation – cryptography is highly compute intensive and therefore, there is a significant cost when using other architectures without embedded crypto functionality. This performance penalty impacts the entire system, slowing down performance of web servers (SSL), for example, and potentially bogging down the speed of other business applications. The SPARC T4 processor enables customers to deliver high levels of security to internal and external customers while not incurring an impact to overall SLAs in their IT environment. 2. Added cost – one of the methods to avoid performance degradation is the addition of add-in cryptographic accelerator cards or external offload engines in other systems. While these solutions provide a brute force mechanism to avoid the problem of slower system performance, it usually comes at an added cost. Customers looking to encrypt datacenter traffic without the overhead and expenditure of extra hardware can rely on SPARC T4 systems to deliver the performance necessary without the need to purchase other hardware or add-on cards. 3. Higher complexity – the addition of cryptographic cards or leveraging load balancers to perform encryption tasks results in added complexity from a management standpoint. With SPARC T4, encryption keys and the framework built into Solaris 10 and 11 means that administrators generally don’t need to spend extra cycles determining how to perform cryptographic functions. In fact, many of the instructions are built-in and require no user intervention to be utilized. For example, For OpenSSL on Solaris 11, SPARC T4 crypto is available directly with a new built-in OpenSSL 1.0 engine, called the "t4 engine." For a deeper technical dive into the new instructions included in SPARC T4, consult Dan Anderson’s blog. Conclusion In summary, SPARC T4 systems offer customers much more value for applications than just increased performance. The integration of key virtualization technologies, embedded encryption, and a true Enterprise Operating System, Oracle Solaris, provides direct business benefits that supersedes the commodity approach to data center computing. SPARC T4 removes the roadblocks to secure computing by offering integrated crypto accelerators that can save IT organizations in operating cost while delivering higher levels of performance and meeting objectives around compliance. For more on the SPARC T4 family of products, go to here.

Read the article
Converting from mp4 to Xvid avi using avconv?

- by Ricardo Gladwell

I normally use avidemux to convert mp4s to Xvid AVI for my Philips Streamium SLM5500. Normally I select MPEG-4 ASP (Xvid) at Two Pass with an average bitrate f 1500kb/s for video and AC3 (lav) audio and it converts correctly. However, I'm trying to using avconv so I can automate the process with a script, but when I do this the video stutters and stops playing part way through. I have a suspicion its something to do with a faulty audio conversion. The commands I'm using are as follows: avconv -y -i video.mp4 -pass 1 -vtag xvid -c:a ac3 -b:a 128k -b:v 1500k -f avi /dev/null avconv -y -i video.mp4 -pass 2 -vtag xvid -c:a ac3 -b:a 128k -b:v 1500k -f avi video.avi There is a bewildering array of arguments for avconv. Is there something I'm doing wrong? Is there a way I can script avidemux from a headless server? Please see command line output: $ avconv -y -i video.mp4 -pass 1 -vtag xvid -an -b:v 1500k -f avi /dev/null avconv version 0.8.5-6:0.8.5-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers built on Jan 24 2013 14:49:20 with gcc 4.7.2 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2013-02-04 13:53:38 Duration: 00:44:09.16, start: 0.000000, bitrate: 669 kb/s Stream #0.0(und): Video: h264 (High), yuv420p, 720x404 [PAR 1:1 DAR 180:101], 538 kb/s, 25 fps, 25 tbr, 100 tbn, 50 tbc Metadata: creation_time : 2013-02-04 13:53:38 Stream #0.1(und): Audio: ac3, 44100 Hz, stereo, s16, 127 kb/s Metadata: creation_time : 2013-02-04 13:53:42 [buffer @ 0x7f4c40] w:720 h:404 pixfmt:yuv420p Output #0, avi, to '/dev/null': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2013-02-04 13:53:38 ISFT : Lavf53.21.1 Stream #0.0(und): Video: mpeg4, yuv420p, 720x404 [PAR 1:1 DAR 180:101], q=2-31, pass 1, 1500 kb/s, 25 tbn, 25 tbc Metadata: creation_time : 2013-02-04 13:53:38 Stream mapping: Stream #0:0 -> #0:0 (h264 -> mpeg4) Press ctrl-c to stop encoding frame=66227 fps=328 q=2.0 Lsize= 0kB time=2649.16 bitrate= 0.0kbits/s video:401602kB audio:0kB global headers:0kB muxing overhead -100.000000% $ avconv -y -i video.mp4 -pass 2 -vtag xvid -c:a ac3 -b:a 128k -b:v 1500k -f avi video.avi avconv version 0.8.5-6:0.8.5-0ubuntu0.12.10.1, Copyright (c) 2000-2012 the Libav developers built on Jan 24 2013 14:49:20 with gcc 4.7.2 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2013-02-04 13:53:38 Duration: 00:44:09.16, start: 0.000000, bitrate: 669 kb/s Stream #0.0(und): Video: h264 (High), yuv420p, 720x404 [PAR 1:1 DAR 180:101], 538 kb/s, 25 fps, 25 tbr, 100 tbn, 50 tbc Metadata: creation_time : 2013-02-04 13:53:38 Stream #0.1(und): Audio: ac3, 44100 Hz, stereo, s16, 127 kb/s Metadata: creation_time : 2013-02-04 13:53:42 [buffer @ 0x12b4f00] w:720 h:404 pixfmt:yuv420p Incompatible sample format 's16' for codec 'ac3', auto-selecting format 'flt' [mpeg4 @ 0x12b3ec0] [lavc rc] Using all of requested bitrate is not necessary for this video with these parameters. Output #0, avi, to 'video.avi': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2013-02-04 13:53:38 ISFT : Lavf53.21.1 Stream #0.0(und): Video: mpeg4, yuv420p, 720x404 [PAR 1:1 DAR 180:101], q=2-31, pass 2, 1500 kb/s, 25 tbn, 25 tbc Metadata: creation_time : 2013-02-04 13:53:38 Stream #0.1(und): Audio: ac3, 44100 Hz, stereo, flt, 128 kb/s Metadata: creation_time : 2013-02-04 13:53:42 Stream mapping: Stream #0:0 -> #0:0 (h264 -> mpeg4) Stream #0:1 -> #0:1 (ac3 -> ac3) Press ctrl-c to stop encoding Input stream #0:1 frame changed from rate:44100 fmt:s16 ch:2 to rate:44100 fmt:flt ch:2 frame=66227 fps=284 q=2.2 Lsize= 458486kB time=2649.13 bitrate=1417.8kbits/s video:413716kB audio:41393kB global headers:0kB muxing overhead 0.741969%

Read the article
When to use Aspect Oriented Architecture (AOA/AOD)

When is it appropriate to use aspect oriented architecture? I think the only honest answer to this question is that it depends on the context for which the question is being asked. There really are no hard and fast rules regarding the selection of an architectural model(s) for a project because each model provides good and bad benefits. Every system is built with a unique requirements and constraints. This context will dictate when to use one type of architecture over another or in conjunction with others. To me aspect oriented architecture models should be a sub-phase in the architectural modeling and design process especially when creating enterprise level models. Personally, I like to use this approach to create a base architectural model that is defined by non-functional requirements and system quality attributes. This general model can then be used as a starting point for additional models because it is targets all of the business key quality attributes required by the system.Aspect oriented architecture is a method for modeling non-functional requirements and quality attributes of a system known as aspects. These models do not deal directly with specific functionality. They do categorize functionality of the system. This approach allows a system to be created with a strong emphasis on separating system concerns into individual components. These cross cutting components enables a systems to create with compartmentalization in regards to non-functional requirements or quality attributes. This allows for the reduction in code because an each component maintains an aspect of a system that can be called by other aspects. This approach also allows for a much cleaner and smaller code base during the implementation and support of a system. Additionally, enabling developers to develop systems based on aspect-oriented design projects will be completed faster and will be more reliable because existing components can be shared across a system; thus, the time needed to create and test the functionality is reduced. Example of an effective use of Aspect Oriented ArchitectureIn my experiences, aspect oriented architecture can be very effective with large or more complex systems. Typically, these types of systems have a large number of concerns so the act of defining them is very beneficial for reducing the system’s complexity because components can be developed to address each concern while exposing functionality to the other system components. The benefits to using the aspect oriented approach as the starting point for a system is that it promotes communication between IT and the business due to the fact that the aspect oriented models are quality attributes focused so not much technical understanding is needed to understand the model.An example of this can be in developing a new intranet website. Common Intranet Concerns: Error Handling Security Logging Notifications Database connectivity Example of a not as effective use of Aspect Oriented ArchitectureAgain in my experiences, aspect oriented architecture is not as effective with small or less complex systems in comparison. There is no need to model concerns for a system that has a limited amount of them because the added overhead would not be justified for the actual benefits of creating the aspect oriented architecture model. Furthermore, these types of projects typically have a reduced time schedule and a limited budget. The creation of the Aspect oriented models would increase the overhead of a project and thus increase the time needed to implement the system. An example of this is seen by creating a small application to poll a network share for new files and then FTP them to a new location. The two primary concerns for this project is to monitor a network drive and FTP files to a new location. There is no need to create an aspect model for this system because there will never be a need to share functionality amongst either of these concerns. To add to my point, this system is so small that it could be created with just a few classes so the added layer of componentizing the concerns would be complete overkill for this situation. References:Brichau, Johan; D'Hondt, Theo. (2006) Aspect-Oriented Software Development (AOSD) - An Introduction. Retreived from: http://www.info.ucl.ac.be/~jbrichau/courses/introductionToAOSD.pdf

Read the article
How the number of indexes built on a table can impact performances?

- by Davide Mauri

We all know that putting too many indexes (I’m talking of non-clustered index only, of course) on table may produce performance problems due to the overhead that each index bring to all insert/update/delete operations on that table. But how much? I mean, we all agree – I think – that, generally speaking, having many indexes on a table is “bad”. But how bad it can be? How much the performance will degrade? And on a concurrent system how much this situation can also hurts SELECT performances? If SQL Server take more time to update a row on a table due to the amount of indexes it also has to update, this also means that locks will be held for more time, slowing down the perceived performance of all queries involved. I was quite curious to measure this, also because when teaching it’s by far more impressive and effective to show to attended a chart with the measured impact, so that they can really “feel” what it means! To do the tests, I’ve create a script that creates a table (that has a clustered index on the primary key which is an identity column) , loads 1000 rows into the table (inserting 1000 row using only one insert, instead of issuing 1000 insert of one row, in order to minimize the overhead needed to handle the transaction, that would have otherwise ), and measures the time taken to do it. The process is then repeated 16 times, each time adding a new index on the table, using columns from table in a round-robin fashion. Test are done against different row sizes, so that it’s possible to check if performance changes depending on row size. The result are interesting, although expected. This is the chart showing how much time it takes to insert 1000 on a table that has from 0 to 16 non-clustered indexes. Each test has been run 20 times in order to have an average value. The value has been cleaned from outliers value due to unpredictable performance fluctuations due to machine activity. The test shows that in a table with a row size of 80 bytes, 1000 rows can be inserted in 9,05 msec if no indexes are present on the table, and the value grows up to 88 (!!!) msec when you have 16 indexes on it This means a impact on performance of 975%. That’s *huge*! Now, what happens if we have a bigger row size? Say that we have a table with a row size of 1520 byte. Here’s the data, from 0 to 16 indexes on that table: In this case we need near 22 msec to insert 1000 in a table with no indexes, but we need more that 500msec if the table has 16 active indexes! Now we’re talking of a 2410% impact on performance! Now we can have a tangible idea of what’s the impact of having (too?) many indexes on a table and also how the size of a row also impact performances. That’s why the golden rule of OLTP databases “few indexes, but good” is so true! (And in fact last week I saw a database with tables with 1700bytes row size and 23 (!!!) indexes on them!) This also means that a too heavy denormalization is really not a good idea (we’re always talking about OLTP systems, keep it in mind), since the performance get worse with the increase of the row size. So, be careful out there, and keep in mind the “equilibrium” is the key world of a database professional: equilibrium between read and write performance, between normalization and denormalization, between to few and too may indexes. PS Tests are done on a VMWare Workstation 7 VM with 2 CPU and 4 GB of Memory. Host machine is a Dell Precsioni M6500 with i7 Extreme X920 Quad-Core HT 2.0Ghz and 16Gb of RAM. Database is stored on a SSD Intel X-25E Drive, Simple Recovery Model, running on SQL Server 2008 R2. If you also want to to tests on your own, you can download the test script here: Open TestIndexPerformance.sql

Read the article
Indexed view deadlocking

- by Dave Ballantyne

Deadlocks can be a really tricky thing to track down the root cause of. There are lots of articles on the subject of tracking down deadlocks, but seldom do I find that in a production system that the cause is as straightforward. That being said, deadlocks are always caused by process A needs a resource that process B has locked and process B has a resource that process A needs. There may be a longer chain of processes involved, but that is the basic premise. Here is one such (much simplified) scenario that was at first non-obvious to its cause: The system has two tables, Products and Stock. The Products table holds the description and prices of a product whilst Stock records the current stock level. USE tempdb GO CREATE TABLE Product ( ProductID INTEGER IDENTITY PRIMARY KEY, ProductName VARCHAR(255) NOT NULL, Price MONEY NOT NULL ) GO CREATE TABLE Stock ( ProductId INTEGER PRIMARY KEY, StockLevel INTEGER NOT NULL ) GO INSERT INTO Product SELECT TOP(1000) CAST(NEWID() AS VARCHAR(255)), ABS(CAST(CAST(NEWID() AS VARBINARY(255)) AS INTEGER))%100 FROM sys.columns a CROSS JOIN sys.columns b GO INSERT INTO Stock SELECT ProductID,ABS(CAST(CAST(NEWID() AS VARBINARY(255)) AS INTEGER))%100 FROM Product There is a single stored procedure of GetStock: Create Procedure GetStock as SELECT Product.ProductID,Product.ProductName FROM dbo.Product join dbo.Stock on Stock.ProductId = Product.ProductID where Stock.StockLevel <> 0 Analysis of the system showed that this procedure was causing a performance overhead and as reads of this data was many times more than writes, an indexed view was created to lower the overhead. CREATE VIEW vwActiveStock With schemabinding AS SELECT Product.ProductID,Product.ProductName FROM dbo.Product join dbo.Stock on Stock.ProductId = Product.ProductID where Stock.StockLevel <> 0 go CREATE UNIQUE CLUSTERED INDEX PKvwActiveStock on vwActiveStock(ProductID) This worked perfectly, performance was improved, the team name was cheered to the rafters and beers all round. Then, after a while, something else happened… The system updating the data changed, The update pattern of both the Stock update and the Product update used to be: BEGIN TRAN UPDATE... COMMIT BEGIN TRAN UPDATE... COMMIT BEGIN TRAN UPDATE... COMMIT It changed to: BEGIN TRAN UPDATE... UPDATE... UPDATE... COMMIT Nothing that would raise an eyebrow in even the closest of code reviews. But after this change we saw deadlocks occuring. You can reproduce this by opening two sessions. In session 1 begin transaction Update Product set ProductName ='Test' where ProductID = 998 Then in session 2 begin transaction Update Stock set Stocklevel = 5 where ProductID = 999 Update Stock set Stocklevel = 5 where ProductID = 998 Hop back to session 1 and.. Update Product set ProductName ='Test' where ProductID = 999 Looking at the deadlock graphs we could see the contention was between two processes, one updating stock and the other updating product, but we knew that all the processes do to the tables is update them. Period. There are separate processes that handle the update of stock and product and never the twain shall meet, no reason why one should be requiring data from the other. Then it struck us, AH the indexed view. Naturally, when you make an update to any table involved in a indexed view, the view has to be updated. When this happens, the data in all the tables have to be read, so that explains our deadlocks. The data from stock is read when you update product and vice-versa. The fix, once you understand the problem fully, is pretty simple, the apps did not guarantee the order in which data was updated. Luckily it was a relatively simple fix to order the updates and deadlocks went away. Note, that there is still a *slight* risk of a deadlock occurring, if both a stock update and product update occur at *exactly* the same time.

Read the article
Dependency injection: How to sell it

- by Mel

Let it be known that I am a big fan of dependency injection (DI) and automated testing. I could talk all day about it. Background Recently, our team just got this big project that is to built from scratch. It is a strategic application with complex business requirements. Of course, I wanted it to be nice and clean, which for me meant: maintainable and testable. So I wanted to use DI. Resistance The problem was in our team, DI is taboo. It has been brought up a few times, but the gods do not approve. But that did not discourage me. My Move This may sound weird but third-party libraries are usually not approved by our architect team (think: "thou shalt not speak of Unity, Ninject, NHibernate, Moq or NUnit, lest I cut your finger"). So instead of using an established DI container, I wrote an extremely simple container. It basically wired up all your dependencies on startup, injects any dependencies (constructor/property) and disposed any disposable objects at the end of the web request. It was extremely lightweight and just did what we needed. And then I asked them to review it. The Response Well, to make it short. I was met with heavy resistance. The main argument was, "We don't need to add this layer of complexity to an already complex project". Also, "It's not like we will be plugging in different implementations of components". And "We want to keep it simple, if possible just stuff everything into one assembly. DI is an uneeded complexity with no benefit". Finally, My Question How would you handle my situation? I am not good in presenting my ideas, and I would like to know how people would present their argument. Of course, I am assuming that like me, you prefer to use DI. If you don't agree, please do say why so I can see the other side of the coin. It would be really interesting to see the point of view of someone who disagrees. Update Thank you for everyone's answers. It really puts things into perspective. It's nice enough to have another set of eyes to give you feedback, fifteen is really awesome! This are really great answers and helped me see the issue from different sides, but I can only choose one answer, so I will just pick the top voted one. Thanks everyone for taking the time to answer. I have decided that it is probably not the best time to implement DI, and we are not ready for it. Instead, I will concentrate my efforts on making the design testable and attempt to present automated unit testing. I am aware that writing tests is additional overhead and if ever it is decided that the additional overhead is not worth it, personally I would still see it as a win situation since the design is still testable. And if ever testing or DI is a choice in future, the design can easily handle it.

Read the article
Qt vs WPF/.NET

- by aaronc

My company is trying to make the decision between using Qt/C++ for our GUI framework or migrating to .NET and using WPF. We have up to this point been using MFC. It seems that .NET/WPF is technically the most advanced and feature-rich platform. I do, however, have several concerns. These include: Platform support Framework longevity (i.e. future-proofing) Performance and overhead For this application we are willing to sacrifice support for Windows 2000, Macs, and Linux. But, the issue is more related to Microsoft's commitment to the framework and their extant platforms. It seems like Microsoft has a bad habit of coming up with something new, hyping it for a few years, and then relegating it to the waste-bin essentially abandoning the developers who chose it. First it was MFC and VB6, then Windows Forms, and now there's WPF. Also with .NET, versions of Windows were progressively nicked off the support list. Looks like WPF could be here to stay for a while, but since its not open source its really in Microsoft's hands. I'm also concerned about the overhead and performance of WPF since some of our applications involve processing large amounts of information and doing real-time data capture. Qt seems like a really good option, but it doesn't have all the features of WPF/.NET couldn't use languages like C#. Basically, what does the community think about Microsoft's commitment to WPF compared with previous frameworks? Are the performance considerations significant enough to avoid using it for a realtime app? And, how significant are the benefits of WPF/.NET in terms of productivity and features compared to Qt?

Read the article
WPF - LayoutUpdated event firing repeatedly

- by Drew Noakes

I've been adding a bit of animation to my WPF application. Thanks to Dan Crevier's unique solution to animating the children of a panel combined with the awesome WPF Penner animations it turned out to be fairly straightforward to make one of my controls look great and have its children move about with some nice animation. Unfortunately this all comes with a performance overhead. I'm happy to have the performance hit when items are added/removed or the control is resized, but it seems that this perf hit occurs consistently throughout the application's lifetime, even when items are completely static. The PanelLayoutAnimator class uses an attached property to hook the UIElement.LayoutUpdated.aspx) event. When this event fires, render transforms are animated to cause the children to glide to their new positions. Unfortunately it seems that the LayoutUpdated event fires every second or so, even when nothing is happening in the application (at least I don't think my code's doing anything -- the app doesn't have focus and the mouse is steady.) As the reason for the event is not immediately apparent to the event handler, all children of the control have to be reevaluated. This event is being called about once a second when idle. The frequency increases when actually using the app. So my question is, how can I improve the performance here? Any answer that assists would be appreciated, but I'm currently stuck on these sub-questions: What causes the LayoutUpdated event to fire so frequently? Is this supposed to happen, and if not, how can I find out why it's firing and curtail it? Is there a more convenient way within the handler to know whether something has happened that might have moved children? If so, I could bail out early and avoid the overhead of looping each child. For now I will work around this issue by disabling animation when there are more than N children in the panel.

Read the article
Optimizing MySQL update query

- by Jernej Jerin

This is currently my MySQL UPDATE query, which is called from program written in Java: String query = "UPDATE maxday SET DatePressureREL = (SELECT Date FROM ws3600 WHERE PressureREL = (SELECT MAX" + "(PressureREL) FROM ws3600 WHERE Date >= '" + Date + "') AND Date >= '" + Date + "' ORDER BY Date DESC LIMIT 1), " + "PressureREL = (SELECT PressureREL FROM ws3600 WHERE PressureREL = (SELECT MAX(PressureREL) FROM ws3600 " + "WHERE Date >= '" + Date + "') AND Date >= '" + Date + "' ORDER BY Date DESC LIMIT 1), ..."; try { s.execute(query); } catch (SQLException e) { System.out.println("SQL error"); } catch(Exception e) { e.printStackTrace(); } Let me explain first, what does it do. I have two tables, first is ws3600, which holds columns (Date, PressureREL, TemperatureOUT, Dewpoint, ...). Then I have second table, called maxday, which holds columns like DatePressureREL, PressureREL, DateTemperatureOUT, TemperatureOUT,... Now as you can see from an example, I update each column, the question is, is there a faster way? I am asking this, because I am calling MAX twice, first to find the Date for that value and secondly to find the actual value. Now I know that I could write like that: SELECT Date, PressureREL FROM ws3600 WHERE PressureREL = (SELECT MAX(PressureREL) FROM ws3600 WHERE Date >= '" + Date + "') AND Date >= '" + Date + "' ORDER BY Date DESC LIMIT 1 That way I get the Date of the max and the max value at the same time and then update with those values the data in maxday table. But the problem of this solution is, that I have to execute many queries, which as I understand takes alot more time compared to executing one long mysql query because of overhead in sending each query to the server. If there is no better way, which solution beetwen this two should I choose. The first, which only takes one query but is very unoptimized or the second which is beter in terms of optimization, but needs alot more queries which probably means that the preformance gain is lost because of overhead in sending each query to the server?

Read the article
Does HttpListener work well on Mono?

- by billpg

Hi everyone. I'm looking to write a small web service to run on a small Linux box. I prefer to code in C#, so I'm looking to use Mono. I don't want the overhead of running a full web server or Mono's version of ASP.NET. I'm thinking of having a single process with a thread dealing with each client connection. Shared memory between threads instead of a database. I've read a little on Microsoft's version of HttpListener and how it works with the Http.sys driver. Alas, Mono's documentation on this class is just the automated class interface with no discussion of how it works under the hood. (Linux doesn't have Http.sys, so I imagine it's implemented substantially differently.) Could anyone point me towards some resources discussing this module please? Many thanks, Bill, billpg.com (A little background to my question for the interested.) Some time ago, I asked this question, interested in keeping a long conversation open with lots of back-and-forth. I had settled on designing my own ad-hoc protocol, but people I spoke to really wanted a REST interface, even at the cost of the "Okay, send your command now" signal. So, I wondered about running ASP.NET on a Linux/Mono server, but stumbled upon HttpListener. This seemed ideal, as each "conversation" could run in a separate thread. The thread that calls HttpListener in a loop can look for which thread each incomming connection is for and pass the reference to that thread. The alternative for an ASP.NET driven service, would be to have the ASPX code pick up the state from a database, and write back the new state when it finishes. Yes, it would work, but that's a lot of overhead.

Read the article
Exiting from the Middle of an Expression Without Using Exceptions

- by Jon Purdy

Is there a way to emulate the use of flow-control constructs in the middle of an expression? Is it possible, in a comma-delimited expression x, y, for y to cause a return? Edit: I'm working on a compiler for something rather similar to a functional language, and the target language is C++. Everything is an expression in the source language, and the sanest, simplest translation to the destination language leaves as many things expressions as possible. Basically, semicolons in the target language become C++ commas. In-language flow-control constructs have presented no problems thus far; it's only return. I just need a way to prematurely exit a comma-delimited expression, and I'd prefer not to use exceptions unless someone can show me that they don't have excessive overhead in this situation. The problem of course is that most flow-control constructs are not legal expressions in C++. The only solution I've found so far is something like this: try { return x(), // x(); (1 ? throw Return(0) : 0); // return 0; } catch (Return& ret) { return ref.value; } The return statement is always there (in the event that a Return construct is not reached), and as such the throw has to be wrapped in ?: to get the compiler to shut up about its void result being used in an expression. I would really like to avoid using exceptions for flow control, unless in this case it can be shown that no particular overhead is incurred; does throwing an exception cause unwinding or anything here? This code needs to run with reasonable efficiency. I just need a function-level equivalent of exit().

Read the article
how can I save/keep-in-sync an in-memory graph of objects with the database?

- by Greg

Question - What is a good best practice approach for how can I save/keep-in-sync an jn-memory graph of objects with the database? Background: That is say I have the classes Node and Relationship, and the application is building up a graph of related objects using these classes. There might be 1000 nodes with various relationships between them. The application needs to query the structure hence an in-memory approach is good for performance no doubt (e.g. traverse the graph from Node X to find the root parents) The graph does need to be persisted however into a database with tables NODES and RELATIONSHIPS. Therefore what is a good best practice approach for how can I save/keep-in-sync an jn-memory graph of objects with the database? Ideal requirements would include: build up changes in-memory and then 'save' afterwards (mandatory) when saving, apply updates to database in correct order to avoid hitting any database constraints (mandatory) keep persistence mechanism separate from model, for ease in changing persistence layer if needed, e.g. don't just wrap an ADO.net DataRow in the Node and Relationship classes (desirable) mechanism for doing optimistic locking (desirable) Or is the overhead of all this for a smallish application just not worth it and I should just hit the database each time for everything? (assuming the response times were acceptable) [would still like to avoid if not too much extra overhead to remain somewhat scalable re performance]

Read the article

Search Results

Search found 1282 results on 52 pages for 'overhead'.

Page 7/52 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

- by charlie

- by ylluminate

- by Ree

- by Matt

- by Reed

- by Reed

- by Alois Kraus

- by Jesse Taber

- by B.Koch

- by Ricardo Gladwell

- by Davide Mauri

- by Dave Ballantyne

- by Mel

- by aaronc

- by Drew Noakes

- by Jernej Jerin

- by billpg

- by Jon Purdy

- by Greg

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >