Search Results

Search found 26374 results on 1055 pages for 'aaron solution evangelist'.

Page 172/1055 | < Previous Page | 168 169 170 171 172 173 174 175 176 177 178 179  | Next Page >

  • Response.Redirect with a fragment identifier causes unexpected refresh when later using location.has

    - by Matt
    Hi All, I was hoping someone can assist in describing a workaround solution to the following issue I am running into on my ASP.NET website on IE. In the following I will describe the bug and clarify the requirements of the needed solution. Repro Steps: User visits A.aspx A.aspx uses Response.Redirect to bring the user to B.aspx#house On B.aspx#house, the user clicks a button that sets window.location.hash='test' Actual Results: B.aspx is loaded again. The URL now shows B.aspx#test Expected Results: No reload. The URL will just change to B.aspx#test Requirements: Page A must redirect to page B with a fragment identifier in the url Any user action on page B will set the location.hash Setting location.hash must not make page B refresh This must work on IE Notes: Bug only repros on IE (tested on ie6|7|8). Opera, FF, Chrome, Safari all have the expected results of no reload. This error may have nothing to do with ASP.NET, and everything to do with IE For any kind soul willing to have a look at this, I have created a minimal ASP.NET web project to make it easy to repro here

    Read the article

  • DataGridView Winform Auto Insert Ordinal Column

    - by pang
    I want to get some trick for this problem. I have my table like this Product (uuid, Name) and one datagridview to display this information ( dataGridView.DataSouce = Products which is assign on runtime) My problem is that I want to have "No" column on dataGridView which will show like below No | Name 1 | ProdctA 2 | ProductB 3 | ProductC What I do right now is create a "No" field on my product Model, and loop through all the row and assign value to it. I think this is not a good solution. Hope anyone can suggest better solution. Thanks,

    Read the article

  • Moving from WCF RIA Beta to RC: best practices?

    - by Duncan Bayne
    I have an existing WCF RIA project built on the Release Candidate; I'm now moving to the Release version & have discovered many changes. David Scruggs made the following comment on his (MSDN) blog: "If you’ve written anything in SIlverlight 4 RIA Services, you’ll need to rewrite it. There has been a lot of refactoring and namespace moves." Having made a brief attempt to compile the old solution with the new RIA framework I'm inclined to agree. My current plan is to: remove the Silverlight Business Application projects from the Solution rebuild the EF4 items from the database create a new Silverlight Business Application project re-add the files (XAML, CS) from the old Silverlight Business Application project Does this sound like a reasonable approach? I think it's cleaner than trying to manually alter the existing project.

    Read the article

  • z-Index overlay in google maps version 3

    - by fredrik
    Hi, I'm trying to get an overlay in google maps api v3 to appear above all markers. But it seems that the google api always put my overlay with lowest z-index priority. Only solution i've found is to iterate up through the DOM tree and manually set z-index to a higher value. Poor solution. I'm adding my a new div to my overlay with: onclick : function (e) { var index = $(e.target).index(), lngLatXYposition = $.view.overlay.getProjection().fromLatLngToDivPixel(this.getPosition()); icon = this.getIcon(), x = lngLatXYposition.x - icon.anchor.x, y = lngLatXYposition.y - icon.anchor.y $('<div>test</div>').css({ left: x, position: 'absolute', top: y + 'px', zIndex: 1000 }).appendTo('.overlay'); } I've tried every property I could think of while creating my overlay. zIndex, zPriority etc. I'm adding my overlay with: $.view.overlay = new GmapOverlay( { map: view.map.gmap }); And GmapOverlay inherits from new google.maps.OverlayView. Any ideas? ..fredrik

    Read the article

  • Turning off IE8 Compatibility Mode, Good or Bad?

    - by Mike Cornell
    Hopefully this question isn't as subjective as I think it may be. I have an Intranet application which needs to work with IE8 as the enterprise is replacing IE6 as the standard browser. Our testing team found that it did not work in IE8, little did they know that it actually did. Their browsers were set to run IE8 in compatibility mode for Intranet applications. I found that if I set the meta tag for X-UA-Compatible to IE=EmulateIE8 that I could force the browser to render this application as IE8 and the application worked fine. Are there any pitfalls that I don't know about for this solution? If so, is there a better solution?

    Read the article

  • UINavigationController back button half works in iPad landscape orientation

    - by drawnonward
    In an iPad application with a UINavigationController, everything works in portrait mode but in landscape mode the back button sometimes only goes back half a level. That is to say, on the first press the title bar animates as if it was popping a controller, and on the second press it animates the content popping. Has anyone found a solution to this? The contradictory solution in this question did not help. I have a shouldAutorotate method in the navigation controller but no others. -(BOOL) shouldAutorotateToInterfaceOrientation:(UIInterfaceOrientation)inOrientation { return YES; }

    Read the article

  • Push DVCS repository to master without needing codebase

    - by Scorchin
    To work on a client's staging environment I have to connect through a VPN which locks all normal network traffic and prevents any connection to the Internet. This would immediately prevent any of the "normal" VCS solutions from being used as it's not possible to gain access to the server. A solution to this would be to create a DVCS repository (git?) locally and then push changes to the master, as and when needed. There is one flaw in this plan. The entire codebase is around 14GB. To download all of this over the internet would take some time, especially when I'm likely to be working on 3 or 4 different machines in each case. This seems silly and overkill for a DVCS. TL;DR Can any DVCS solution allow you to push to a master server/repo without needing the codebase? Bad example: copy the .git folder (not the 14GB codebase) to another directory and push this to the master once disconnected from the VPN.

    Read the article

  • "port forwarding": redirect calls to webservice at port 8081 to port 80

    - by niba
    Hi, a colleague of mine wrote a webservice that runs on port 8081 of our Windows 2008 Server. He uses the class ServiceHost, afaik this means its a standalone host (no IIS or ASP involvement). Note: I'm new into WCF ;) Now there are some issues with clients behind a firewall blocking the requests to remote port 8081 of our server (where the webservice runs). The easiest solution would be: run the webservice host at port 80 ... But: there is also a Apache 2.2 webserver running on the Windows Server, hosting some websites. By default it runs on port 80. My solution after some researching: use a virtual host to route requests to a virtual host (lets say http://webservice.[hostname]:80) to the webservice host (http://[hostname]:8081). Is this a good idea? Can Apache handle forwards to standalone webservice hosts? It would be nice if someone could lead me on to the right track :) Best regards, Niels

    Read the article

  • hibernate and ehcache replication

    - by cachingsol
    Hi, I am working on a cache replication solution between nodes Node A - master node = Hibernate + Database + Ehcache as secondary cache Node B - regional node= Ehcache as prmiary cache. no Hibernate Node B is used only as near-by cache for query. Now I am updating data (Say SudentInfo) in Node A, it gets persisted and cached correctly. On replication side (I am using JMS) it sends a message to Node B. But the problem is, the message it sends is of instance CacheEntry(deep Inside Element), there is no way to resurrect the original object (StudentInfo). What I got in node B is CacheEntry with some attributes of Students but not actually an Student Object. Please note that I don't need Hibernate session/persistence in Node B, node B is only for fast query, persistence is done through Node A. So has anybody tried any solution like this? Is there any way to convert CacheEntry to actual object? or Tell ehcache to replicate original object rather than CacheEntry. Thanks for the help

    Read the article

  • Keeping the DI-container usage in the composition root in Silverlight and MVVM

    - by adrian hara
    It's not quite clear to me how I can design so I keep the reference to the DI-container in the composition root for a Silverlight + MVVM application. I have the following simple usage scenario: there's a main view (perhaps a list of items) and an action to open an edit view for one single item. So the main view has to create and show the edit view when the user takes the action (e.g. clicks some button). For this I have the following code: public interface IView { IViewModel ViewModel {get; set;} } Then, for each view that I need to be able to create I have an abstract factory, like so public interface ISomeViewFactory { IView CreateView(); } This factory is then declared a dependency of the "parent" view model, like so: public class SomeParentViewModel { public SomeParentViewModel(ISomeViewFactory viewFactory) { // store it } private void OnSomeUserAction() { IView view = viewFactory.CreateView(); dialogService.ShowDialog(view); } } So all is well until here, no DI-container in sight :). Now comes the implementation of ISomeViewFactory: public class SomeViewFactory : ISomeViewFactory { public IView CreateView() { IView view = new SomeView(); view.ViewModel = ???? } } The "????" part is my problem, because the view model for the view needs to be resolved from the DI-container so it gets its dependencies injected. What I don't know is how I can do this without having a dependency to the DI-container anywhere except the composition root. One possible solution would be to have either a dependency on the view model that gets injected into the factory, like so: public class SomeViewFactory : ISomeViewFactory { public SomeViewFactory(ISomeViewModel viewModel) { // store it } public IView CreateView() { IView view = new SomeView(); view.ViewModel = viewModel; } } While this works, it has the problem that since the whole object graph is wired up "statically" (i.e. the "parent" view model will get an instance of SomeViewFactory, which will get an instance of SomeViewModel, and these will live as long as the "parent" view model lives), the injected view model implementation is stateful and if the user opens the child view twice, the second time the view model will be the same instance and have the state from before. I guess I could work around this with an "Initialize" method or something similar, but it doesn't smell quite right. Another solution might be to wrap the DI-container and have the factories depend on the wrapper, but it'd still be a DI-container "in disguise" there :) Any thoughts on this are greatly appreciated. Also, please forgive any mistakes or rule-breaking, since this is my first post on stackoverflow :) Thanks! ps: my current solution is that the factories know about the DI-container, and it's only them and the composition root that have this dependency.

    Read the article

  • Web Page Rendering Capture

    - by Chaitanya
    I start with describing the problem itself. Rather than a problem I'm looking for a better solution. I have a asp.net page which has a bunch of images and a link underneath it, Each image is infact the latest rendering of the link underneath it. I scheduled a bat script which runs every hour to fetch the images through IECapt a web page rendering capture utility. One thing am annoyed about this utility is it takes a lot of time for the 20 images I have and for few because of the flash content it misses to take the actual screenshot of the website. Now I like to know can this rendering be done by traditional programming am not interested in using any utilities. I'm interested in trying this. The solution need not be necessarily a C# based am ready to try in any other language. Because it gives me a chance to learn. Thank you.

    Read the article

  • Default Database Collations got messed up

    - by dominicdinada
    I am using Ubuntu 9.10 with XAMPP ( Lampp "MYSQL 5.1.45 PHPMYADMIN 3.3.1 PHP 5.3.2 ) What my problem is, is that I set up my testing env to debug my scripts locally and when I did so there arose a problem. This problem is that I used firefox's addon SQLinject ME to test for weakness' and upon doing so it caused mysql to change the default local collations; character sets dir /opt/lampp/share/mysql/charsets/ collation connection latin1_general_ci (Global value) latin1_swedish_ci collation database latin1_swedish_ci collation server latin1_swedish_ci I have searched for quite sometime in regards to a solution to this problem and have come up with searching for the db.opt file which stores this information without success. Upon not finding a solution I removed lampp with the "sudo rm -fR /opt" command and reinstall and the problem still persists. I have tried to change the collations manually and still come up with the database displaying latin1_swedish_ci as the default language. Why is this a problem?? Why is it a problem with mysql? Because the application I am testing and debugging locally is built on the CodeIgnitor with Smarty framework and since this combination of framework is built to detect the LOCALES, Rather what the database defaults are I keep getting errors saying no language file for swedish...... Of course I could get the swedish language file to work around this problem but I do not feel the need to make this work around a perminant solution as with time when I move on to projects I will run into simular problems every time that A; When importing database files, backups etc it will default to import such databases as the locale swedish. B; As time passes on I might completly forget of this error and will be back to square one. I have found this code in searches for a fix,which seems to alter the tables to a desired Collaion; $value) { mysql_query("ALTER TABLE $value COLLATE latin1_general_ci"); }} echo "The collation of your database has been successfully changed!"; ? Which is handy to switch collations in One Schema at a time however this is not a fix when a framework doesnt care that the said database is in one langugae. It tests for the Default of the entire server. Someone with any knowledge of a purge or fix to this I would greatly appricate the help. One more final note is that when I was testing I only figured to back up the applications DataBase and not the entire Schema of the install. No matter if I uninstall or reinstall the database still seems to carry these problems.

    Read the article

  • Do Precompiled headers help with rebuilds?

    - by brickner
    I read some of the questions about precompiled headers but couldn't find a direct answer to that. I usually rebuild my entire Visual Studio 2010 solution. One of the projects in my solution is a C++/CLI project. I thought that using precompiled headers in that project will increase the speed of the compilation. After some experiments, it seems that using precompiled headers only slows the rebuild process. Do precompiled headers only help with builds that didn't completely clean the old files?

    Read the article

  • Sql server version error...655 version needed but you computer has 612 or earlier version ? error

    - by xpugur
    Hi i have a error like " dbFileName cannot be opened because it is version 655. This server supports version 612 and earlier. " what should i do ? some friend of mine done a project but i guess he done it with sql 2008 and i have sql 2005 is that the reason why i got this error? can i fix it ? if i setup a newer version of sql does it will solve the problem? www.microsoft.com/express/Database/default.aspx#Installation_Options here sql server 2008 R2 express is available can it be the solution? thank you... by the way i found a link of an update http://www.microsoft.com/downloads/details.aspx?FamilyID=E1109AEF-1AA2-408D-AA0F-9DF094F993BF&displaylang=en is this a solution to my problem ?

    Read the article

  • Windows 2008 RenderFarm Service: CreateProcessAsUser "Session 0 Isolation" and OpenGL

    - by holtavolt
    Hello, I have a legacy Windows server service and (spawned) application that works fine in XP-64 and W2K3, but fails on W2K8. I believe it is because of the new "Session 0 isolation " feature. (Note: As a StackOverflow newbie I'm being limited to one link in this post, so you'll need to scroll to bottom to lookup the links for '' items)* Consequently, I'm looking for code samples/security settings mojo that let you create a new process from a windows service for Windows 2008 Server such that I can restore (and possibly surpass) the previous behavior. I need a solution that: Creates the new process in a non-zero session to get around session-0 isolation restrictions (no access to graphics hardware from session 0) - the official MS line on this is: Because Session 0 is no longer a user session, services that are running in Session 0 do not have access to the video driver. This means that any attempt that a service makes to render graphics fails. Querying the display resolution and color depth in Session 0 reports the correct results for the system up to a maximum of 1920x1200 at 32 bits per pixel. The new process gets a windows station/desktop (e.g. winsta0/default) that can be used to create windows DCs. I've found a solution (that launches OK in an interactive session) for this here: *(Starting an Interactive Client Process in C++ - 2) The windows DC, when used as the basis for an *(OpenGL DescribePixelFormat enumeration - 3), is able to find and use the hardware-accelerated format (on a system appropriately equipped with OpenGL hardware.) Note that our current solution works OK on XP-64 and W2K3, except if a terminal services session is running (VNC works fine.) A solution that also allowed the process to work (i.e. run with OpenGL hardware acceleration even when a terminal services session is open) would be fanastic, although not required. I'm stuck at item #1 currently, and although there are some similar postings that discuss this (like *(this -4), and *(this - 5) - they are not suitable solutions, as there is no guarantee of a user session logged in already to "take" a session id from, nor am I running from a LocalSystem account (I'm running from a domain account for the service, for which I can adjust the privileges of, within reason, although I'd prefer to not have to escalate priorities to include SeTcbPrivileges.) For instance - here's a stub that I think should work, but always returns an error 1314 on the SetTokenInformation call (even though the AdjustTokenPrivileges returned no errors) I've used some alternate strategies involving "LogonUser" as well (instead of opening the existing process token), but I can't seem to swap out the session id. I'm also dubious about using the WTSActiveConsoleSessionId in all cases (for instance, if no interactive user is logged in) - although a quick test of the service running with no sessions logged in seemed to return a reasonable session value (1). I’ve removed error handling for ease of reading (still a bit messy - apologies) //Also tried using LogonUser(..) here OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES | TOKEN_ADJUST_SESSIONID | TOKEN_ADJUST_DEFAULT | TOKEN_ASSIGN_PRIMARY | TOKEN_DUPLICATE, &hToken) GetTokenInformation( hToken, TokenSessionId, &logonSessionId, sizeof(DWORD), &dwTokenLength ) DWORD consoleSessionId = WTSGetActiveConsoleSessionId(); /* Can't use this - requires very elevated privileges (LOCAL only, SeTcbPrivileges as well) if( !WTSQueryUserToken(consoleSessionId, &hToken)) ... */ DuplicateTokenEx(hToken, (TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES | TOKEN_ADJUST_SESSIONID | TOKEN_ADJUST_DEFAULT | TOKEN_ASSIGN_PRIMARY | TOKEN_DUPLICATE), NULL, SecurityIdentification, TokenPrimary, &hDupToken)) // Look up the LUID for the TCB Name privilege. LookupPrivilegeValue(NULL, SE_TCB_NAME, &tp.Privileges[0].Luid)) // Enable the TCB Name privilege in the token. tp.PrivilegeCount = 1; tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED; if (!AdjustTokenPrivileges(hDupToken, FALSE, &tp, sizeof(TOKEN_PRIVILEGES), NULL, 0)) { DisplayError("AdjustTokenPrivileges"); ... } if (GetLastError() == ERROR_NOT_ALL_ASSIGNED) { DEBUG( "Token does not have the necessary privilege.\n"); } else { DEBUG( "No error reported from AdjustTokenPrivileges!\n"); } // Never errors here DEBUG(LM_INFO, "Attempting setting of sessionId to: %d\n", consoleSessionId ); if (!SetTokenInformation(hDupToken, TokenSessionId, &consoleSessionId, sizeof(DWORD))) *** ALWAYS FAILS WITH 1314 HERE *** All the debug output looks fine up until the SetTokenInformation call - I see session 0 is my current process session, and in my case, it's trying to set session 1 (the result of the WTSGetActiveConsoleSessionId). (Note that I'm logged into the W2K8 box via VNC, not RDC) So - a the questions: Is this approach valid, or are all service-initiated processes restricted to session 0 intentionally? Is there a better approach (short of "Launch on logon" and auto-logon for the servers?) Is there something wrong with this code, or a different way to create a process token where I can swap out the session id to indicate I want to spawn the process in a new session? I did try using LogonUser instead of OpenProcessToken, but that didn't work either. (I don't care if all spawned processes share the same non-zero session or not at this point.) Any help much appreciated - thanks! (You need to replace the 'zttp' with 'http' - StackOverflow restriction on one link in my newbie post) 2: http://msdn.microsoft.com/en-us/library/aa379608(VS.85).aspx 3: http://www.opengl.org/resources/faq/technical/mswindows.htm 4: http://stackoverflow.com/questions/2237696/creating-a-process-in-a-non-zero-session-from-a-service-in-windows-2008-server 5: http://stackoverflow.com/questions/1602996/how-can-i-lauch-a-process-which-has-a-ui-from-windows-service

    Read the article

  • Designing a data model in VS2010 and generating ORM code, application

    - by Kay Zed
    Simply put: I have a database design in my head and I now want to use Visual Studio 2010 to create a WPF application. Key is to use the VS2010 tools to take much as possible manual work out of my hands. -The database engine is SQLite -ORM probably through DBLINQ -Use of LINQ -The application can create new, empty database instances -Easily maintainable (changes in data model possible) Q- How do I start designing the database model (visually) in Visual Studio 2010? Should this be an xsd? Do I do this in a separate project? Q- Next, how can I make the most use of VS2010 code generation tools to generate a business layer? Q- I suppose the business layer will be added as a Data Source (in another project?) and from there it's a rather generic data binding solution? I tried finding clear examples of this but it's a jungle out there, the hunt for a solution is NOT converging to one clear method.... :_(

    Read the article

  • Minimal way to render Wiki Syntax (for example MoinMoin Wiki)

    - by Alex
    I enjoy using to Wikis to document all kind of stuff (recently I used MoinMoin, so I am used to that syntax). No I am looking for a more light weight solution, for documents were setting up a moinmoin server is to much hassle. What is the "easiest" way to render a .txt file in Wiki syntax. (for example by displaying it, or converting it to HTML). it should work on Linux, but the more platform in-dependent, the better. Maybe there is even a Javascript based solution?

    Read the article

  • Pair programming tools that are not remote

    - by JonathanTech
    I am currently in a job where we practice serious pair programming on windows machines. We both have a set of keyboards, mice, and we have two monitors, which works well for switching who's the driver really easy, but there are some points in the session that I would like to start writing tests at the same time that my pair is writing implementation. I am wondering if there is any program that would allow me to have effectively two cursors and keyboard focuses on the same computer. If they don't exist then I am willing to experiment with my own solution, but I would like input as to how to best accomplish this. I am most familiar with .Net 3.5 technologies, but I also know Java and am willing to learn C++ to solve this problem. If I was creating the solution myself I would go down the road of being able to grab the input of one hardware device (i.e. a specific mouse that's installed) and prevent Windows from moving the pointer, and instead move my own programs pointer independently.

    Read the article

  • Resharper doesn't play well with XAML?

    - by John Weldon
    I've been noticing a pattern with ReSharper (both 4.5 and 5). Very often (almost always) when I have solution-wide analysis turned on, and WPF code in my solution, ReSharper will mark a number of the .xaml.cs files as being broken. When I navigate to the file, sometimes it magically updates and displays no errors, and other times I have to open other files that are not being correctly read and close them again to force resharper to correctly analyze them. I assume it has something to do with the temporary .cs code that is generated with XAML, but does anyone know why this is actually happening, and if there is a work around? Should I just file a bug report with JetBrains? Does anyone else experience this?

    Read the article

  • Windows gadget in WPF - show while "Show desktop" is activated

    - by Jannick
    Hi I'm trying to create a "gadget" like application using WPF. The goal is to get the same behavior as a normal Windows 7 gadget: No task-bar entry Doesn't show up when you alt+tab windows NOT always on top, applications can be on top Visible while performing 'Aero Peek' Visible while using 'Show desktop' / Windows+D I've been able to accomplish the first four goals, but have been unable to find a solution to the fifth problem. The closest I've come is by using the utility class from http://stackoverflow.com/questions/75785/how-do-you-do-appbar-docking-to-screen-edge-like-winamp-in-wpf, but this turns the app into a "toolbar", thereby banishing applications from the part of the screen where my gadget GUI is placed. I can see that similar questions has been asked previously on Stackoverflow, but those have died out before a solution was found. Posting anyway in the hope that there is now someone out there with the knowledge to solve this =)

    Read the article

  • Is there an algorithm for converting quaternion rotations to Euler angle rotations?

    - by Will Baker
    Is there an existing algorithm for converting a quaternion representation of a rotation to an Euler angle representation? The rotation order for the Euler representation is known and can be any of the six permutations (i.e. xyz, xzy, yxz, yzx, zxy, zyx). I've seen algorithms for a fixed rotation order (usually the NASA heading, bank, roll convention) but not for arbitrary rotation order. Furthermore, because there are multiple Euler angle representations of a single orientation, this result is going to be ambiguous. This is acceptable (because the orientation is still valid, it just may not be the one the user is expecting to see), however it would be even better if there was an algorithm which took rotation limits (i.e. the number of degrees of freedom and the limits on each degree of freedom) into account and yielded the 'most sensible' Euler representation given those constraints. I have a feeling this problem (or something similar) may exist in the IK or rigid body dynamics domains. Solved: I just realised that it might not be clear that I solved this problem by following Ken Shoemake's algorithms from Graphics Gems. I did answer my own question at the time, but it occurs to me it may not be clear that I did so. See the answer, below, for more detail. Just to clarify - I know how to convert from a quaternion to the so-called 'Tait-Bryan' representation - what I was calling the 'NASA' convention. This is a rotation order (assuming the convention that the 'Z' axis is up) of zxy. I need an algorithm for all rotation orders. Possibly the solution, then, is to take the zxy order conversion and derive from it five other conversions for the other rotation orders. I guess I was hoping there was a more 'overarching' solution. In any case, I am surprised that I haven't been able to find existing solutions out there. In addition, and this perhaps should be a separate question altogether, any conversion (assuming a known rotation order, of course) is going to select one Euler representation, but there are in fact many. For example, given a rotation order of yxz, the two representations (0,0,180) and (180,180,0) are equivalent (and would yield the same quaternion). Is there a way to constrain the solution using limits on the degrees of freedom? Like you do in IK and rigid body dynamics? i.e. in the example above if there were only one degree of freedom about the Z axis then the second representation can be disregarded. I have tracked down one paper which could be an algorithm in this pdf but I must confess I find the logic and math a little hard to follow. Surely there are other solutions out there? Is arbitrary rotation order really so rare? Surely every major 3D package that allows skeletal animation together with quaternion interpolation (i.e. Maya, Max, Blender, etc) must have solved exactly this problem?

    Read the article

  • How to use Maven2 with Apache Sling?

    - by Jerome Baum
    Google doesn't help, neither does a stackoverflow.com search, and browsing through Apache Sling documentation doesn't show any solution so here goes: How can I best use Maven2 with Apache Sling? Basically as far as I see the entire application code will need to be placed in the JCR repository, but I need it versioned. There is some documentation about OSGi bundles that could contain resources, but isn't it kind of overkill to create a bundle with a bunch of Java code to talk to Sling when all I want to do is load some files to corresponding URLs? What I was hoping for is something like jetty:run that will automatically deploy a WAR into a new Jetty instance. Ideally the goal for Sling would run Sling and then load the specified content -- is there some Maven2 plugin for this right now? Or is the bundle-based solution the "canonical" way of doing this?

    Read the article

  • Cross-domain policy issues after redirect in Flash

    - by ggambett
    I'm having trouble with a cross-domain policy. I'm using the AS3 Loader to fetch an image; I'm making it load the policy file, like this : var pLoader : Loader = new Loader(); var pContext : LoaderContext = new LoaderContext(); pContext.checkPolicyFile = true; pLoader.load(new URLRequest(sURL), pContext); This works fine as long as the image is directly accessible; however, when the server sends a redirect, the loader follows it but loses the checkPolicyFile flag, resulting in a SecurityException - that is, it doesn't check the cross-domain policy of the redirected URL. I've found a solution here ( http://www.stevensacks.net/2008/12/23/solution-as3-security-error-2122-with-300-redirects ) but looks fragile (that is, looks like it will fail if there's more than one redirect). What would be the correct way of doing this?

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

< Previous Page | 168 169 170 171 172 173 174 175 176 177 178 179  | Next Page >