Search Results

Search found 1282 results on 52 pages for 'overhead'.

Page 5/52 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Advantages of Hudson and Sonar over manual process or homegrown scripts.

- by Tom G

My coworker and I recently got into a debate over a proposed plan at our workplace. We've more or less finished transitioning our Java codebase into one managed and built with Maven. Now, I'd like for us to integrate with Hudson and Sonar or something similar. My reasons for this are that it'll provide a 'zero-click' build step to provide testers with new experimental builds, that it will let us deploy applications to a server more easily, that tools such as Sonar will provide us with well-needed metrics on code coverage, Javadoc, package dependencies and the like. He thinks that the overhead of getting up to speed with two new frameworks is unacceptable, and that we should simply double down on documentation and create our own scripts for deployment. Since we plan on some aggressive rewrites to pay down the technical debt previous developers incurred (gratuitous use of Java's Serializable interface as a file storage mechanism that has predictably bit us in the ass) he argues that we can document as we go, and that we'll end up changing a large swath of code in the process anyways. I contend that having accurate metrics that Sonar (or fill in your favorite similar tool) provide gives us a good place to start for any refactoring efforts, not to mention general maintenance -- after all, knowing which classes are the most poorly documented, even if it's just a starting point, is better than seat-of-the-pants guessing. Am I wrong, and trying to introduce more overhead than we really need? Some more background: an alumni of our company is working at a Navy research lab now and suggested these two tools in particular as one they've had great success with using. My coworker and I have also had our share of friendly disagreements before -- he's more of the "CLI for all, compiles Gentoo in his spare time and uses Git" and I'm more of a "Give me an intuitive GUI, plays with XNA and is fine with SVN" type, so there's definitely some element of culture clash here.

Read the article
Interfaces and Virtuals Everywhere????

- by David V. Corbin

First a disclaimer; this post is about micro-optimization of C# programs and does not apply to most common scenarios - but when it does, it is important to know. Many developers are in the habit of declaring member virtual to allow for future expansion or using interface based designs1. Few of these developers think about what the runtime performance impact of this decision is. A simple test will show that this decision can have a serious impact. For our purposes, we used a simple loop to time the execution of 1 billion calls to both non-virtual and virtual implementations of a method that took no parameters and had a void return type: Direct Call: 1.5uS Virtual Call: 13.0uS The overhead of the call increased by nearly an order of magnitude! Once again, it is important to realize that if the method does anything of significance then this ratio drops quite quickly. If the method does just 1mS of work, then the differential only accounts for a 1% decrease in performance. Additionally the method in question must be called thousands of times in order to produce a meaqsurable impact at the application level. Yet let us consider a situation such as the per-pixel processing of a graphics processing application. Here we may have a method which is called millions of times and even the slightest increase in overhead can have significant ramification. In this case using either explicit virtuals or interface based constructs is likely to be a mistake. In conclusion, good design principles should always be the driving force behind descisions such as these; but remember that these decisions do not come for free. 1) When a concrete class member implements an interface it does not need to be explicitly marked as virtual (unless, of course, it is to be overriden in a derived concerete class). Nevertheless, when accessed via the interface it behaves exactly as if it had been marked as virtual.

Read the article
What is recommended minimum object size for gzip benefits?

- by utt73

I'm working on improving page speed display times, and one of the methods is to gzip content from the webserver. Google recommends: Note that gzipping is only beneficial for larger resources. Due to the overhead and latency of compression and decompression, you should only gzip files above a certain size threshold; we recommend a minimum range between 150 and 1000 bytes. Gzipping files below 150 bytes can actually make them larger. We serve our content through Akamai, using their network for a proxy and CDN. What they've told me: Following up on your question regarding what is the minimum size Akamai will compress the requested object when sending it to the end user: The minimum size is 860 bytes. My reply: What is the reason(s) for why Akamai's minimum size is 860 bytes? And why, for example, is this not the case for files Akamai serves for facebook? (see below) Google recommends to gzip more agressively. And that seems appropriate on our site where the most frequent hits, by far, are AJAX calls that are <860 bytes. Akamai's response: The reasons 860 bytes is the minimum size for compression is twofold: (1) The overhead of compressing an object under 860 bytes outweighs performance gain. (2) Objects under 860 bytes can be transmitted via a single packet anyway, so there isn't a compelling reason to compress them. So I'm here for some fact checking. Is the 860 byte limit due to packet size the end of this reasoning? Why would high traffic sites push this lower/closer to the 150 byte limit... just to save on bandwidth costs, or is there a performance gain in doing so?

Read the article
Branching and CI Builds with Agile

- by Bob Horn

We follow many agile processes, including automated tests, continuous integration, sprint reviews, etc... We're currently having a debate about how often we should branch release builds. We've been doing two-week sprints and trying to deploy to production at the end of each sprint. Some of us think we should be branching every sprint. Some of us think that's overkill. If a project encompasses three Visual Studio solutions, and we branch every sprint, then that's three branches, and three CI builds to create every two weeks. If we do this for six months, we'll end up with 36 branches and 36 CI builds. There is overhead involved in that. For those of us that think that branching every sprint is overkill, we don't have a very good alternative. On my last project, we deployed some solutions from the Main trunk. Yeah, that's not good, but it saved on some of the overhead. What's the right way to manage branching/releasing and CI builds, using agile, when we have such short (two-week) sprint cycles?

Read the article
Analyzing Memory Usage: Java vs C++ Negligible?

- by Anthony

How does the memory usage of an integer object written in Java compare\contrast with the memory usage of a integer object written in C++? Is the difference negligible? No difference? A big difference? I'm guessing it's the same because an int is an int regardless of the language (?) The reason why I asked this is because I was reading about the importance of knowing when a program's memory requirements will prevent the programmer from solving a given problem. What fascinated me is the amount of memory required for creating a single Java object. Take for example, an integer object. Correct me if I'm wrong but a Java integer object requires 24 bytes of memory: 4 bytes for its int instance variable 16 bytes of overhead (reference to the object's class, garbage collection info & synchronization info) 4 bytes of padding As another example, a Java array (which is implemented as an object) requires 48+bytes: 24 bytes of header info 16 bytes of object overhead 4 bytes for length 4 bytes for padding plus the memory needed to store the values How do these memory usages compare with the same code written in C++? I used to be oblivious about the memory usage of the C++ and Java programs I wrote, but now that I'm beginning to learn about algorithms, I'm having a greater appreciation for the computer's resources.

Read the article
LOD in modern games

- by Firas Assaad

I'm currently working on my master's thesis about LOD and mesh simplification, and I've been reading many academic papers and articles about the subject. However, I can't find enough information about how LOD is being used in modern games. I know many games use some sort of dynamic LOD for terrain, but what about elsewhere? Level of Detail for 3D Graphics for example points out that discrete LOD (where artists prepare several models in advance) is widely used because of the performance overhead of continuous LOD. That book was published in 2002 however, and I'm wondering if things are different now. There has been some research in performing dynamic LOD using the geometry shader (this paper for example, with its implementation in ShaderX6), would that be used in a modern game? To summarize, my question is about the state of LOD in modern video games, what algorithms are used and why? In particular, is view dependent continuous simplification used or does the runtime overhead make using discrete models with proper blending and impostors a more attractive solution? If discrete models are used, is an algorithm used (e.g. vertex clustering) to generate them offline, do artists manually create the models, or perhaps a combination of both methods is used?

Read the article
How should I describe the process of learning someone else's code? (In an invoicing situation.)

- by MattyG

I have a contract to upgrade some in-house software for a large company. The company has requested multiple feature additions and a few bug fixes. This is my first freelance style job. First, I needed to become familiar with how the application worked - I learnt it as if I was a user. Next, I had to learn how the software worked. I started with broad concepts, and then narrowed down into necessary detail before working on each bug fix and feature. At least at the start of the project, it took me a lot longer to learn the existing code than it did to write the additional features. How can I describe the process of learning the existing code on the invoice? (This part of the company usually does things in-house, so doesn't have much experience dealing with software contractors like me, and I fear they may not understand the overhead of learning someone else's code). I don't want to just tack the learning time onto the actual feature upgrade, because in some cases this would make a 'simple task' look like it took me way too long. I want break the invoice into relevant steps, and communicate that I'm charging for the large overhead of learning someone else's code before being able to add my own to it. Is there a standard way of describing this sort of activity when billing for a job?

Read the article
Dealing with Fine-Grained Cache Entries in Coherence

- by jpurdy

On occasion we have seen significant memory overhead when using very small cache entries. Consider the case where there is a small key (say a synthetic key stored in a long) and a small value (perhaps a number or short string). With most backing maps, each cache entry will require an instance of Map.Entry, and in the case of a LocalCache backing map (used for expiry and eviction), there is additional metadata stored (such as last access time). Given the size of this data (usually a few dozen bytes) and the granularity of Java memory allocation (often a minimum of 32 bytes per object, depending on the specific JVM implementation), it is easily possible to end up with the case where the cache entry appears to be a couple dozen bytes but ends up occupying several hundred bytes of actual heap, resulting in anywhere from a 5x to 10x increase in stated memory requirements. In most cases, this increase applies to only a few small NamedCaches, and is inconsequential -- but in some cases it might apply to one or more very large NamedCaches, in which case it may dominate memory sizing calculations. Ultimately, the requirement is to avoid the per-entry overhead, which can be done either at the application level by grouping multiple logical entries into single cache entries, or at the backing map level, again by combining multiple entries into a smaller number of larger heap objects. At the application level, it may be possible to combine objects based on parent-child or sibling relationships (basically the same requirements that would apply to using partition affinity). If there is no natural relationship, it may still be possible to combine objects, effectively using a Coherence NamedCache as a "map of maps". This forces the application to first find a collection of objects (by performing a partial hash) and then to look within that collection for the desired object. This is most naturally implemented as a collection of entry processors to avoid pulling unnecessary data back to the client (and also to encapsulate that logic within a service layer). At the backing map level, the NIO storage option keeps keys on heap, and so has limited benefit for this situation. The Elastic Data features of Coherence naturally combine entries into larger heap objects, with the caveat that only data -- and not indexes -- can be stored in Elastic Data.

Read the article
Parallelism in .NET – Part 8, PLINQ’s ForAll Method

- by Reed

Parallel LINQ extends LINQ to Objects, and is typically very similar. However, as I previously discussed, there are some differences. Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits. Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data. LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar. Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation. However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>. Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects. It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea. For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized. The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles. Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method. Without ForAll, we would take a fairly serious performance hit in many situations. Often, we need to perform some filtering or grouping, then perform an action using the results of our filter. Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized. However, there is still a large bottleneck in place here. The problem lies with my comment “perform an action once the filter completes”. Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes. Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element. By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play. By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll. This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task. Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

Read the article
How does one find out which application is associated with an indicator icon?

- by Amos Annoy

It is trivial to do this in Ubuntu 10.04. The question is specific to Ubuntu 12.04. some pertinent references (src: answer to What is the difference between indicators and a system tray?: Here is the documentation for indicators: Application indicators | Ubuntu App Developer libindicate Reference Manual libappindicator Reference Manual also DesktopExperienceTeam/ApplicationIndicators - Ubuntu Wiki ref: How can the application that makes an indicator icon be identified? bookmark: How does one find out which application is associated with an indicator icon in Ubuntu 12.04? is a serious question for reasons & problems outlined below and for which a significant investment has been made and is necessary for remedial purposes. reviewing refs. to find an orchestrated resolution ... (an indicator ap. indicator maybe needed) This has nothing to do (does it?) with right click. How can an indicator's icon in Ubuntu 12.04 be matched with the program responsible for it's manifestation on the top panel? A list of running applications can include all processes using System Monitor. How is the correct matching process found for an indicator? How are the sub-indicator applications identified? These are the aps associated with the components of an indicators drop-down menu. (This was to be a separate question and quite naturally follows up the progression. It is included here as it is obvious there is no provisioning to track down offending either sub or indicator aps. easily.) (The examination of SM points out a rather poignant factor in the faster battery depletion and shortened run time - the ambient quiescent CPU rate in 12.04 is now well over 20% when previously, in 10.04, it was well under 10%, between 5% and 7%! - the huge inordinate cpu overhead originates from Xorg and compiz - after booting the system, only SM is run and All Processes are selected, sorting on %CPU - switching between Resources and Processes profiles the execution overhead problem - running another ap like gedit "Text Editor" briefly gives it CPU priority - going back to S&M several aps. are at the top of the list in order: gnome-system-monitor as expected, then: Xorg, compiz, unity-panel-service, hud-service, with dbus-daemon and kworker/x:y's mixed in with some expected daemons and background tasks like nm-applet - not only do Xorg and compiz require excessive CPU time but their entourage has to come along too! further exacerbating the problem - our compute bound tasks no longer work effectively in the field - reduced battery life, reduced CPU time for custom ap.s etc. - and all this precipitated from an examination of what is going on with the battery ap. indicator - this was and is not a flippant, rhetorical or idle musing but has consequences for the credible deployment of 12.04 to reduce the negative impact of its overhead in a production environment) (I have a problem with the battery indicator - it sometimes has % and other times hh:mm - it is necessary to know the ap. & v. to get more info on controlling same. ditto: There are issues with other indicator aps.: NM vs. iwlist/iwconfig conflict, BT ap. vs RF switch, Battery ap. w/ no suspend/sleep for poor battery runtime, ... the list goes on) Details from: How can I find Application Indicator ID's? suggests looking at: file:///usr/share/indicator-application/ordering-override.keyfile [Ordering Index Overrides] nm-applet=1 gnome-power-manager=2 ibus=3 gst-keyboard-xkb=4 gsd-keyboard-xkb=5 which solves the battery ap. identification, and presumably nm is NetworkManager for the rf icon, but the envelope, blue tooth and speaker indicator aps. are still a mystery. (Also, the ordering is not correlated.) Mind you, it was simple in the past to simply right click to get the About option to find the ap. & v. info. browsing around and about: file:///usr/share/indicator-application/ordering-override.keyfile examined: file:///usr/share/indicators file:///usr/share/indicators/messages/applications/ ... perhaps?/presumably? the information sought may be buried in file:///usr/share/indicators A reference in the comments was given to: What is the difference between indicators and a system tray? quoting from that source ... Unfortunately desktop indicators are not well documented yet: I couldn't find any specification doc ... Well ... the actual document https://wiki.ubuntu.com/DesktopExperienceTeam/ApplicationIndicators#Summary does not help much but it's existential information provides considerable insight ...

Read the article
Why lock-free data structures just aren't lock-free enough

- by Alex.Davies

Today's post will explore why the current ways to communicate between threads don't scale, and show you a possible way to build scalable parallel programming on top of shared memory. The problem with shared memory Soon, we will have dozens, hundreds and then millions of cores in our computers. It's inevitable, because individual cores just can't get much faster. At some point, that's going to mean that we have to rethink our architecture entirely, as millions of cores can't all access a shared memory space efficiently. But millions of cores are still a long way off, and in the meantime we'll see machines with dozens of cores, struggling with shared memory. Alex's tip: The best way for an application to make use of that increasing parallel power is to use a concurrency model like actors, that deals with synchronisation issues for you. Then, the maintainer of the actors framework can find the most efficient way to coordinate access to shared memory to allow your actors to pass messages to each other efficiently. At the moment, NAct uses the .NET thread pool and a few locks to marshal messages. It works well on dual and quad core machines, but it won't scale to more cores. Every time we use a lock, our core performs an atomic memory operation (eg. CAS) on a cell of memory representing the lock, so it's sure that no other core can possibly have that lock. This is very fast when the lock isn't contended, but we need to notify all the other cores, in case they held the cell of memory in a cache. As the number of cores increases, the total cost of a lock increases linearly. A lot of work has been done on "lock-free" data structures, which avoid locks by using atomic memory operations directly. These give fairly dramatic performance improvements, particularly on systems with a few (2 to 4) cores. The .NET 4 concurrent collections in System.Collections.Concurrent are mostly lock-free. However, lock-free data structures still don't scale indefinitely, because any use of an atomic memory operation still involves every core in the system. A sync-free data structure Some concurrent data structures are possible to write in a completely synchronization-free way, without using any atomic memory operations. One useful example is a single producer, single consumer (SPSC) queue. It's easy to write a sync-free fixed size SPSC queue using a circular buffer*. Slightly trickier is a queue that grows as needed. You can use a linked list to represent the queue, but if you leave the nodes to be garbage collected once you're done with them, the GC will need to involve all the cores in collecting the finished nodes. Instead, I've implemented a proof of concept inspired by this intel article which reuses the nodes by putting them in a second queue to send back to the producer. * In all these cases, you need to use memory barriers correctly, but these are local to a core, so don't have the same scalability problems as atomic memory operations. Performance tests I tried benchmarking my SPSC queue against the .NET ConcurrentQueue, and against a standard Queue protected by locks. In some ways, this isn't a fair comparison, because both of these support multiple producers and multiple consumers, but I'll come to that later. I started on my dual-core laptop, running a simple test that had one thread producing 64 bit integers, and another consuming them, to measure the pure overhead of the queue. So, nothing very interesting here. Both concurrent collections perform better than the lock-based one as expected, but there's not a lot to choose between the ConcurrentQueue and my SPSC queue. I was a little disappointed, but then, the .NET Framework team spent a lot longer optimising it than I did. So I dug out a more powerful machine that Red Gate's DBA tools team had been using for testing. It is a 6 core Intel i7 machine with hyperthreading, adding up to 12 logical cores. Now the results get more interesting. As I increased the number of producer-consumer pairs to 6 (to saturate all 12 logical cores), the locking approach was slow, and got even slower, as you'd expect. What I didn't expect to be so clear was the drop-off in performance of the lock-free ConcurrentQueue. I could see the machine only using about 20% of available CPU cycles when it should have been saturated. My interpretation is that as all the cores used atomic memory operations to safely access the queue, they ended up spending most of the time notifying each other about cache lines that need invalidating. The sync-free approach scaled perfectly, despite still working via shared memory, which after all, should still be a bottleneck. I can't quite believe that the results are so clear, so if you can think of any other effects that might cause them, please comment! Obviously, this benchmark isn't realistic because we're only measuring the overhead of the queue. Any real workload, even on a machine with 12 cores, would dwarf the overhead, and there'd be no point worrying about this effect. But would that be true on a machine with 100 cores? Still to be solved. The trouble is, you can't build many concurrent algorithms using only an SPSC queue to communicate. In particular, I can't see a way to build something as general purpose as actors on top of just SPSC queues. Fundamentally, an actor needs to be able to receive messages from multiple other actors, which seems to need an MPSC queue. I've been thinking about ways to build a sync-free MPSC queue out of multiple SPSC queues and some kind of sign-up mechanism. Hopefully I'll have something to tell you about soon, but leave a comment if you have any ideas.

Read the article
The Unspoken - The Why of GC Ergonomics

- by jonthecollector

Do you use GC ergonomics, -XX:+UseAdaptiveSizePolicy, with the UseParallelGC collector? The jist of GC ergonomics for that collector is that it tries to grow or shrink the heap to meet a specified goal. The goals that you can choose are maximum pause time and/or throughput. Don't get too excited there. I'm speaking about UseParallelGC (the throughput collector) so there are definite limits to what pause goals can be achieved. When you say out loud "I don't care about pause times, give me the best throughput I can get" and then say to yourself "Well, maybe 10 seconds really is too long", then think about a pause time goal. By default there is no pause time goal and the throughput goal is high (98% of the time doing application work and 2% of the time doing GC work). You can get more details on this in my very first blog. GC ergonomics The UseG1GC has its own version of GC ergonomics, but I'll be talking only about the UseParallelGC version. If you use this option and wanted to know what it (GC ergonomics) was thinking, try -XX:AdaptiveSizePolicyOutputInterval=1 This will print out information every i-th GC (above i is 1) about what the GC ergonomics to trying to do. For example, UseAdaptiveSizePolicy actions to meet *** throughput goal *** GC overhead (%) Young generation: 16.10 (attempted to grow) Tenured generation: 4.67 (attempted to grow) Tenuring threshold: (attempted to decrease to balance GC costs) = 1 GC ergonomics tries to meet (in order) Pause time goal Throughput goal Minimum footprint The first line says that it's trying to meet the throughput goal. UseAdaptiveSizePolicy actions to meet *** throughput goal *** This run has the default pause time goal (i.e., no pause time goal) so it is trying to reach a 98% throughput. The lines Young generation: 16.10 (attempted to grow) Tenured generation: 4.67 (attempted to grow) say that we're currently spending about 16% of the time doing young GC's and about 5% of the time doing full GC's. These percentages are a decaying, weighted average (earlier contributions to the average are given less weight). The source code is available as part of the OpenJDK so you can take a look at it if you want the exact definition. GC ergonomics is trying to increase the throughput by growing the heap (so says the "attempted to grow"). The last line Tenuring threshold: (attempted to decrease to balance GC costs) = 1 says that the ergonomics is trying to balance the GC times between young GC's and full GC's by decreasing the tenuring threshold. During a young collection the younger objects are copied to the survivor spaces while the older objects are copied to the tenured generation. Younger and older are defined by the tenuring threshold. If the tenuring threshold hold is 4, an object that has survived fewer than 4 young collections (and has remained in the young generation by being copied to the part of the young generation called a survivor space) it is younger and copied again to a survivor space. If it has survived 4 or more young collections, it is older and gets copied to the tenured generation. A lower tenuring threshold moves objects more eagerly to the tenured generation and, conversely a higher tenuring threshold keeps copying objects between survivor spaces longer. The tenuring threshold varies dynamically with the UseParallelGC collector. That is different than our other collectors which have a static tenuring threshold. GC ergonomics tries to balance the amount of work done by the young GC's and the full GC's by varying the tenuring threshold. Want more work done in the young GC's? Keep objects longer in the survivor spaces by increasing the tenuring threshold. This is an example of the output when GC ergonomics is trying to achieve a pause time goal UseAdaptiveSizePolicy actions to meet *** pause time goal *** GC overhead (%) Young generation: 20.74 (no change) Tenured generation: 31.70 (attempted to shrink) The pause goal was set at 50 millisecs and the last GC was 0.415: [Full GC (Ergonomics) [PSYoungGen: 2048K-0K(26624K)] [ParOldGen: 26095K-9711K(28992K)] 28143K-9711K(55616K), [Metaspace: 1719K-1719K(2473K/6528K)], 0.0758940 secs] [Times: user=0.28 sys=0.00, real=0.08 secs] The full collection took about 76 millisecs so GC ergonomics wants to shrink the tenured generation to reduce that pause time. The previous young GC was 0.346: [GC (Allocation Failure) [PSYoungGen: 26624K-2048K(26624K)] 40547K-22223K(56768K), 0.0136501 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] so the pause time there was about 14 millisecs so no changes are needed. If trying to meet a pause time goal, the generations are typically shrunk. With a pause time goal in play, watch the GC overhead numbers and you will usually see the cost of setting a pause time goal (i.e., throughput goes down). If the pause goal is too low, you won't achieve your pause time goal and you will spend all your time doing GC. GC ergonomics is meant to be simple because it is meant to be used by anyone. It was not meant to be mysterious and so this output was added. If you don't like what GC ergonomics is doing, you can turn it off with -XX:-UseAdaptiveSizePolicy, but be pre-warned that you have to manage the size of the generations explicitly. If UseAdaptiveSizePolicy is turned off, the heap does not grow. The size of the heap (and the generations) at the start of execution is always the size of the heap. I don't like that and tried to fix it once (with some help from an OpenJDK contributor) but it unfortunately never made it out the door. I still have hope though. Just a side note. With the default throughput goal of 98% the heap often grows to it's maximum value and stays there. Definitely reduce the throughput goal if footprint is important. Start with -XX:GCTimeRatio=4 for a more modest throughput goal (%20 of the time spent in GC). A higher value means a smaller amount of time in GC (as the throughput goal).

Read the article
Which of these algorithms is best for my goal?

- by JonathonG

I have created a program that restricts the mouse to a certain region based on a black/white bitmap. The program is 100% functional as-is, but uses an inaccurate, albeit fast, algorithm for repositioning the mouse when it strays outside the area. Currently, when the mouse moves outside the area, basically what happens is this: A line is drawn between a pre-defined static point inside the region and the mouse's new position. The point where that line intersects the edge of the allowed area is found. The mouse is moved to that point. This works, but only works perfectly for a perfect circle with the pre-defined point set in the exact center. Unfortunately, this will never be the case. The application will be used with a variety of rectangles and irregular, amorphous shapes. On such shapes, the point where the line drawn intersects the edge will usually not be the closest point on the shape to the mouse. I need to create a new algorithm that finds the closest point to the mouse's new position on the edge of the allowed area. I have several ideas about this, but I am not sure of their validity, in that they may have far too much overhead. While I am not asking for code, it might help to know that I am using Objective C / Cocoa, developing for OS X, as I feel the language being used might affect the efficiency of potential methods. My ideas are: Using a bit of trigonometry to project lines would work, but that would require some kind of intense algorithm to test every point on every line until it found the edge of the region... That seems too resource intensive since there could be something like 200 lines that would have each have to have as many as 200 pixels checked for black/white.... Using something like an A* pathing algorithm to find the shortest path to a black pixel; however, A* seems resource intensive, even though I could probably restrict it to only checking roughly in one direction. It also seems like it will take more time and effort than I have available to spend on this small portion of the much larger project I am working on, correct me if I am wrong and it would not be a significant amount of code (100 lines or around there). Mapping the border of the region before the application begins running the event tap loop. I think I could accomplish this by using my current line-based algorithm to find an edge point and then initiating an algorithm that checks all 8 pixels around that pixel, finds the next border pixel in one direction, and continues to do this until it comes back to the starting pixel. I could then store that data in an array to be used for the entire duration of the program, and have the mouse re-positioning method check the array for the closest pixel on the border to the mouse target position. That last method would presumably execute it's initial border mapping fairly quickly. (It would only have to map between 2,000 and 8,000 pixels, which means 8,000 to 64,000 checked, and I could even permanently store the data to make launching faster.) However, I am uncertain as to how much overhead it would take to scan through that array for the shortest distance for every single mouse move event... I suppose there could be a shortcut to restrict the number of elements in the array that will be checked to a variable number starting with the intersecting point on the line (from my original algorithm), and raise/lower that number to experiment with the overhead/accuracy tradeoff. Please let me know if I am over thinking this and there is an easier way that will work just fine, or which of these methods would be able to execute something like 30 times per second to keep mouse movement smooth, or if you have a better/faster method. I've posted relevant parts of my code below for reference, and included an example of what the area might look like. (I check for color value against a loaded bitmap that is black/white.) // // This part of my code runs every single time the mouse moves. // CGPoint point = CGEventGetLocation(event); float tX = point.x; float tY = point.y; if( is_in_area(tX,tY, mouse_mask)){ // target is inside O.K. area, do nothing }else{ CGPoint target; //point inside restricted region: float iX = 600; // inside x float iY = 500; // inside y // delta to midpoint between iX,iY and tX,tY float dX; float dY; float accuracy = .5; //accuracy to loop until reached do { dX = (tX-iX)/2; dY = (tY-iY)/2; if(is_in_area((tX-dX),(tY-dY),mouse_mask)){ iX += dX; iY += dY; } else { tX -= dX; tY -= dY; } } while (abs(dX)>accuracy || abs(dY)>accuracy); target = CGPointMake(roundf(tX), roundf(tY)); CGDisplayMoveCursorToPoint(CGMainDisplayID(),target); } Here is "is_in_area(int x, int y)" : bool is_in_area(NSInteger x, NSInteger y, NSBitmapImageRep *mouse_mask){ NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init]; NSUInteger pixel[4]; [mouse_mask getPixel:pixel atX:x y:y]; if(pixel[0]!= 0){ [pool release]; return false; } [pool release]; return true; }

Read the article
Optimizing a thread safe Java NIO / Serialization / FIFO Queue [migrated]

- by trialcodr

I've written a thread safe, persistent FIFO for Serializable items. The reason for reinventing the wheel is that we simply can't afford any third party dependencies in this project and want to keep this really simple. The problem is it isn't fast enough. Most of it is undoubtedly due to reading and writing directly to disk but I think we should be able to squeeze a bit more out of it anyway. Any ideas on how to improve the performance of the 'take'- and 'add'-methods? /** * <code>DiskQueue</code> Persistent, thread safe FIFO queue for * <code>Serializable</code> items. */ public class DiskQueue<ItemT extends Serializable> { public static final int EMPTY_OFFS = -1; public static final int LONG_SIZE = 8; public static final int HEADER_SIZE = LONG_SIZE * 2; private InputStream inputStream; private OutputStream outputStream; private RandomAccessFile file; private FileChannel channel; private long offs = EMPTY_OFFS; private long size = 0; public DiskQueue(String filename) { try { boolean fileExists = new File(filename).exists(); file = new RandomAccessFile(filename, "rwd"); if (fileExists) { size = file.readLong(); offs = file.readLong(); } else { file.writeLong(size); file.writeLong(offs); } } catch (FileNotFoundException e) { throw new RuntimeException(e); } catch (IOException e) { throw new RuntimeException(e); } channel = file.getChannel(); inputStream = Channels.newInputStream(channel); outputStream = Channels.newOutputStream(channel); } /** * Add item to end of queue. */ public void add(ItemT item) { try { synchronized (this) { channel.position(channel.size()); ObjectOutputStream s = new ObjectOutputStream(outputStream); s.writeObject(item); s.flush(); size++; file.seek(0); file.writeLong(size); if (offs == EMPTY_OFFS) { offs = HEADER_SIZE; file.writeLong(offs); } notify(); } } catch (IOException e) { throw new RuntimeException(e); } } /** * Clears overhead by moving the remaining items up and shortening the file. */ public synchronized void defrag() { if (offs > HEADER_SIZE && size > 0) { try { long totalBytes = channel.size() - offs; ByteBuffer buffer = ByteBuffer.allocateDirect((int) totalBytes); channel.position(offs); for (int bytes = 0; bytes < totalBytes;) { int res = channel.read(buffer); if (res == -1) { throw new IOException("Failed to read data into buffer"); } bytes += res; } channel.position(HEADER_SIZE); buffer.flip(); for (int bytes = 0; bytes < totalBytes;) { int res = channel.write(buffer); if (res == -1) { throw new IOException("Failed to write buffer to file"); } bytes += res; } offs = HEADER_SIZE; file.seek(LONG_SIZE); file.writeLong(offs); file.setLength(HEADER_SIZE + totalBytes); } catch (IOException e) { throw new RuntimeException(e); } } } /** * Returns the queue overhead in bytes. */ public synchronized long overhead() { return (offs == EMPTY_OFFS) ? 0 : offs - HEADER_SIZE; } /** * Returns the first item in the queue, blocks if queue is empty. */ public ItemT peek() throws InterruptedException { block(); synchronized (this) { if (offs != EMPTY_OFFS) { return readItem(); } } return peek(); } /** * Returns the number of remaining items in queue. */ public synchronized long size() { return size; } /** * Removes and returns the first item in the queue, blocks if queue is empty. */ public ItemT take() throws InterruptedException { block(); try { synchronized (this) { if (offs != EMPTY_OFFS) { ItemT result = readItem(); size--; offs = channel.position(); file.seek(0); if (offs == channel.size()) { truncate(); } file.writeLong(size); file.writeLong(offs); return result; } } return take(); } catch (IOException e) { throw new RuntimeException(e); } } /** * Throw away all items and reset the file. */ public synchronized void truncate() { try { offs = EMPTY_OFFS; file.setLength(HEADER_SIZE); size = 0; } catch (IOException e) { throw new RuntimeException(e); } } /** * Block until an item is available. */ protected void block() throws InterruptedException { while (offs == EMPTY_OFFS) { try { synchronized (this) { wait(); file.seek(LONG_SIZE); offs = file.readLong(); } } catch (IOException e) { throw new RuntimeException(e); } } } /** * Read and return item. */ @SuppressWarnings("unchecked") protected ItemT readItem() { try { channel.position(offs); return (ItemT) new ObjectInputStream(inputStream).readObject(); } catch (ClassNotFoundException e) { throw new RuntimeException(e); } catch (IOException e) { throw new RuntimeException(e); } } }

Read the article
No speed-up with useless printf's using OpenMP

- by t2k32316

I just wrote my first OpenMP program that parallelizes a simple for loop. I ran the code on my dual core machine and saw some speed up when going from 1 thread to 2 threads. However, I ran the same code on a school linux server and saw no speed-up. After trying different things, I finally realized that removing some useless printf statements caused the code to have significant speed-up. Below is the main part of the code that I parallelized: #pragma omp parallel for private(i) for(i = 2; i <= n; i++) { printf("useless statement"); prime[i-2] = is_prime(i); } I guess that the implementation of printf has significant overhead that OpenMP must be duplicating with each thread. What causes this overhead and why can OpenMP not overcome it?

Read the article
Splitting assemblies - finding the balance (avoiding overkill)

- by M.A. Hanin

I'm writing a wide component infrastructure, to be used in my projects. Since not all projects will require every component created, I've been thinking of splitting the component into discrete assemblies, so that every application developed will only be deployed with the required assemblies. I assume that creating an assembly has some storage overhead (the assembly's code, wrapping whatever is inside). Therefore, there must be some limit to the advantage gained by splitting an assembly - a certain point where splitting the assembly is worse than keeping it united (storage-wise and performance-wise). Now, here is the question: how do I know when splitting an assembly is an overkill? P.S I guess there are other overheads to assembly splitting, aside from the storage overhead. If anyone can point out these overheads, it would be much appreciated.

Read the article
Made an interview mistake. Should I try to correct after the fact?

- by AT Developer

Ever been in a situation where you were in an interview, and realized immediately afterwards (after the nervousness wore off) that you did something wrong? I had a phone interview today. I was asked an n-ary tree problem, and coded an algorithm that used a space overhead, then a different algorithm with no space overhead. However, my solution was inefficient, since I traversed the tree top-down rather than bottom-up. The interviewer said I did a good job, but I'm still wondering if he noticed and marked down for my choice of implementation. Should I follow up with an email correcting myself, or just let it and avoid making things worse?

Read the article
What's the (hidden) cost of lazy val? (Scala)

- by Jesper

One handy feature of Scala is lazy val, where the evaluation of a val is delayed until it's necessary (at first access). Ofcourse a lazy val must have some overhead - somewhere Scala must keep track of whether the value has already been evaluated and the evaluation must be synchronized, because multiple threads might try to access the value for the first time at the same time. What exactly is the cost of a lazy val - is there a hidden boolean flag associated with a lazy val to keep track if it has been evaluated or not, what exactly is synchronized and are there any more costs? And a follow-up question: Suppose I do this: class Something { lazy val (x, y) = { ... } } Is this the same as having two separate lazy vals x and y or do I get the overhead only once, for the pair (x, y)?

Read the article
Streaming audio - where to start?

- by Adam Davis

I need to develop an embedded audio streaming server. Requirements: Voice quality or better Intended for low power wifi transmission Broad support in existing software and devices (ie, windows media player, quicktime, vlc, iPhone, Android, etc). Royalty/patent free, or cheap to license Preferences: Low overhead TCP/IP based streaming protocol Voice grade codec (easy to implement in software, no DSP, 32bit CPU if needed) Would be nice if it supported HTML5 browsers, but is there any codec (such as raw) that is supported by the latest browsers that is lower overhead than MP3? Therefore: What are the relevant streaming protocols I should be looking at? What are the relevant codecs I should be looking at? What transport streams should I be looking at? What am I missing, or where else should I be looking for this type of need?

Read the article
Persistent (purely functional) Red-Black trees on disk performance

- by Waneck

I'm studying the best data structures to implement a simple open-source object temporal database, and currently I'm very fond of using Persistent Red-Black trees to do it. My main reasons for using persistent data structures is first of all to minimize the use of locks, so the database can be as parallel as possible. Also it will be easier to implement ACID transactions and even being able to abstract the database to work in parallel on a cluster of some kind. The great thing of this approach is that it makes possible implementing temporal databases almost for free. And this is something quite nice to have, specially for web and for data analysis (e.g. trends). All of this is very cool, but I'm a little suspicious about the overall performance of using a persistent data structure on disk. Even though there are some very fast disks available today, and all writes can be done asynchronously, so a response is always immediate, I don't want to build all application under a false premise, only to realize it isn't really a good way to do it. Here's my line of thought: - Since all writes are done asynchronously, and using a persistent data structure will enable not to invalidate the previous - and currently valid - structure, the write time isn't really a bottleneck. - There are some literature on structures like this that are exactly for disk usage. But it seems to me that these techniques will add more read overhead to achieve faster writes. But I think that exactly the opposite is preferable. Also many of these techniques really do end up with a multi-versioned trees, but they aren't strictly immutable, which is something very crucial to justify the persistent overhead. - I know there still will have to be some kind of locking when appending values to the database, and I also know there should be a good garbage collecting logic if not all versions are to be maintained (otherwise the file size will surely rise dramatically). Also a delta compression system could be thought about. - Of all search trees structures, I really think Red-Blacks are the most close to what I need, since they offer the least number of rotations. But there are some possible pitfalls along the way: - Asynchronous writes -could- affect applications that need the data in real time. But I don't think that is the case with web applications, most of the time. Also when real-time data is needed, another solutions could be devised, like a check-in/check-out system of specific data that will need to be worked on a more real-time manner. - Also they could lead to some commit conflicts, though I fail to think of a good example of when it could happen. Also commit conflicts can occur in normal RDBMS, if two threads are working with the same data, right? - The overhead of having an immutable interface like this will grow exponentially and everything is doomed to fail soon, so this all is a bad idea. Any thoughts? Thanks! edit: There seems to be a misunderstanding of what a persistent data structure is: http://en.wikipedia.org/wiki/Persistent_data_structure

Read the article
What are the Pros & Cons of using SQL Azure for existing apps on dedicated servers

- by Mark Redman

We currently own our own servers, and rent a rack in a datacentre. Looking at the pricing, scalabilty and SLAs for Azure SQL, I am thinking that it might be viable to only use Azure SQL but continue to use our existing applications on our own servers in a datacentres. This will enable us to not worry about the database and its infrastructure so we can concentrate on building an application server farm with disk storeage for files etc. Our application is quite big and has various windows services and parts of it used unmanaged libraries that may not be feasible in the cloud, so probably coulnt have everything in the Azure cloud. The pros: Reduced Total Cost of ownership (no database servers, no sql server licenses) The Cons: I guess there would be overhead in the transfer of data between the Azure Cloud and our datacentre (ie cloud may be in US and datacentre is in the UK) but would this overhead be usable?

Read the article
why compiling code on the fly in Java?

- by javaaudio

i saw some are writing out java files and THEN call the compiler, doesnt that produce overhead while doing it in runtime mode?

Read the article
C++ fixed point library?

- by uj2

I am looking for a free C++ fixed point library (Mainly for use with embedded devices, not for arbitrary precision math). Basically, the requirements are: No unnecessary runtime overhead: whatever can be done at compile time, should be done at compile time. Ability to transparently switch code between fixed and floating point, with no inherent overhead. Fixed point math functions. There's no much point using fixed point if you need to cast back and forth in order to take a square root. Small footprint. Any suggestions?

Read the article
java System.nanoTime is really slow. Is it possible to implement a high performance java profiler?

- by willpowerforever

I did a test and found the overhead of a function call to System.nanoTime() is at least 500 ns on my machine. Seemed that it is very hard to have a high performance java profiler. For enterprise software, suppose a function takes about 350 seconds and has 12,500,000,000 times of method calls. Therefore, the number of calls to System.nanoTime() is: 12,500,000,000 * 2 = 25,000,000,000 (one for start timestamp, one for end timestamp) And the overhead of System.nanoTime in total is: 500 ns * 25,000,000,000 = 500 * 25000 s = 12500000s. Note: all data from real case. Any better way to acquire the timestamp?

Read the article
Consuming WCF REST service in multiple ways (.Net, plain XML)

- by Jan Jongboom

I have become quite frustrated of WCF as I just want to use this simple scenario: Provide a webservice using REST, with a UriTemplate like /method/{param1}/{param2}/ and a 3th parameter that is sent to the service as XML as POST data. Use just plain XML, no SOAP overhead. Be able to generate a proxy in Visual Studio so a .Net using client can easily use the service (don't care about SOAP overhead here). I can create 1. and 2. but no way I can use 3. I tried adding both webHttpBinding and basicHttpBinding endpoints in my services config; I fooled around with the <services/> tag, but I just can't get this working. What am I missing here?! N.B. I checked out this article: http://stackoverflow.com/questions/186631/rest-soap-endpoints-for-a-wcf-service but nothing what is described there seems to work here?!

Read the article

Search Results

Search found 1282 results on 52 pages for 'overhead'.

Page 5/52 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Tom G

- by David V. Corbin

- by utt73

- by Bob Horn

- by Anthony

- by Firas Assaad

- by MattyG

- by jpurdy

- by Reed

- by Amos Annoy

- by Alex.Davies

- by jonthecollector

- by JonathonG

- by trialcodr

- by t2k32316

- by M.A. Hanin

- by AT Developer

- by Jesper

- by Adam Davis

- by Waneck

- by Mark Redman

- by javaaudio

- by uj2

- by willpowerforever

- by Jan Jongboom

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >