Search Results

Search found 15301 results on 613 pages for 'global assembly cache'.

Page 137/613 | < Previous Page | 133 134 135 136 137 138 139 140 141 142 143 144  | Next Page >

  • How can I disable DNSSC for Google Apps (GMail) MX records on my authoritative domains?

    - by meinemitternacht
    I'm running a BIND Master / Slave setup with DNSSEC, but some of my domains use Google Apps for e-mail services. Google doesn't support DNSSEC and BIND doesn't like it at all. Log output: Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ALT2.ASPMX.L.GOOGLE.COM.dlv.isc.org/DLV/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ALT2.ASPMX.L.GOOGLE.COM/A/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ALT2.ASPMX.L.GOOGLE.COM/AAAA/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f755cb83950: ALT2.ASPMX.L.GOOGLE.COM AAAA: bad cache hit (ALT2.ASPMX.L.GOOGLE.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ALT2.ASPMX.L.GOOGLE.COM/AAAA/IN': 69.147.224.178#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f755ca52c30: ALT2.ASPMX.L.GOOGLE.COM A: bad cache hit (ALT2.ASPMX.L.GOOGLE.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ALT2.ASPMX.L.GOOGLE.COM/A/IN': 69.147.224.178#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f755ca52c30: ASPMX2.GOOGLEMAIL.COM AAAA: bad cache hit (ASPMX2.GOOGLEMAIL.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ASPMX2.GOOGLEMAIL.COM/AAAA/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f755cb83950: ASPMX2.GOOGLEMAIL.COM A: bad cache hit (ASPMX2.GOOGLEMAIL.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ASPMX2.GOOGLEMAIL.COM/A/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f754c1b0bd0: ASPMX2.GOOGLEMAIL.COM A: bad cache hit (ASPMX2.GOOGLEMAIL.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ASPMX2.GOOGLEMAIL.COM/A/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f754c1a6a30: ASPMX2.GOOGLEMAIL.COM AAAA: bad cache hit (ASPMX2.GOOGLEMAIL.COM.dlv.isc.org/DLV) Sep 6 17:12:51 srv549 named[5376]: error (broken trust chain) resolving 'ASPMX2.GOOGLEMAIL.COM/AAAA/IN': 70.32.45.42#53 Sep 6 17:12:51 srv549 named[5376]: validating @0x7f755cb83950: ASPMX3.GOOGLEMAIL.COM AAAA: bad cache hit (ASPMX3.GOOGLEMAIL.COM.dlv.isc.org/DLV) I'm not absolutely sure this is stopping Google Apps from working, because I just enabled all of the DNSSEC features. Does anyone here have experience with this?

    Read the article

  • Apache caching with mod_headers mod_expires

    - by Aaron Moodie
    Hi, I'm working on homework for uni and was hoping someone could clarify something for me. I need to set up the following: Configure the response header "Cache-Control" to have a "max-age" value of 7 days since access for all image files Configure the response header "Cache-Control" to have a "max-age" value of 5 days since modification for all static HTML files. Configure the response header "Cache-Control" to have a value of "public" for all static HTML and image files. Configure the response header "Cache-Control" to have a value of "private" for all PHP files. My question is whether it is better to use a FilesMatch, or the mod_expires ExpiresByType to best achieve this? I've so far used the following: <FilesMatch "\.(gif|jpe?g|png)$"> ExpiresDefault "access plus 7 days" Header set Cache-Control "public" </FilesMatch> <FilesMatch "\.(html)$"> ExpiresDefault "modification plus 5 days" Header set Cache-Control "public" </FilesMatch> <FilesMatch "\.(php)$"> Header set Cache-Control "private" </FilesMatch> Thanks.

    Read the article

  • Varnish does not start properly (crashes after startup) with no error messages

    - by Matthew Savage
    I am running Varnish (2.0.4 from the Ubuntu unstable apt repository, though I have also used the standard repository) in a test environment (Virtual Machines) on Ubuntu 9.10, soon to be 10.04. When I have a working configuration and the server starts successfully it seems like everything is fine, however if, for whatever reason, I stop and then restart the varnish daemon it doesn't always startup properly, and there are no errors going into syslog or messages to indicate what might be wrong. If I run varnish in debug mode (-d) and issue start when prompted then 7 times out of time it will run, but occasionally it will just shut down 'silently'. My startup command is (the $1 allows for me to pass -d to the script this lives in): varnishd -a :80 $1 \ -T 127.0.0.1:6082 \ -s malloc,1GB \ -f /home/deploy/mysite.vcl \ -u deploy \ -g deploy \ -p obj_workspace=4096 \ -p sess_workspace=262144 \ -p listen_depth=2048 \ -p overflow_max=2000 \ -p ping_interval=2 \ -p log_hashstring=off \ -h classic,5000009 \ -p thread_pool_max=1000 \ -p lru_interval=60 \ -p esi_syntax=0x00000003 \ -p sess_timeout=10 \ -p thread_pools=1 \ -p thread_pool_min=100 \ -p shm_workspace=32768 \ -p thread_pool_add_delay=1 and the VCL looks like this: # nginx/passenger server, HTTP:81 backend default { .host = "127.0.0.1"; .port = "81"; } sub vcl_recv { # Don't cache the /useradmin or /admin path if (req.url ~ "^/(useradmin|admin|session|sessions|login|members|logout|forgot_password)") { pipe; } # If cache is 'regenerating' then allow for old cache to be served set req.grace = 2m; # Forward to cache lookup lookup; } # This should be obvious sub vcl_hit { deliver; } sub vcl_fetch { # See link #16, allow for old cache serving set obj.grace = 2m; if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") { deliver; } remove obj.http.Set-Cookie; remove obj.http.Etag; set obj.http.Cache-Control = "no-cache"; set obj.ttl = 7d; deliver; } Any suggestions would be greatly appreciated, this is driving me absolutely crazy, especially because its such an inconsistent behaviour.

    Read the article

  • Why am I getting a Sharepoint error on a simple "hello world" web page?

    - by Fetchez la vache
    I've been granted admin access to an internal IIS server on which I need to set up a web site. Before doing anything technical I wanted to ensure that I could access the server, but when attempting to access a simple page (that does not refer to Sharepoint) at http://localhost/index.html when logged onto the server directly, I am getting Parser Error Description: An error occurred during the parsing of a resource required to service this request. Please review the following specific parse error details and modify your source file appropriately. Parser Error Message: Could not load file or assembly 'Microsoft.SharePoint' or one of its dependencies. The system cannot find the file specified. Source Error: Line 1: <%@ Assembly Name="Microsoft.SharePoint"%><%@ Application Language="C#" Inherits="Microsoft.SharePoint.ApplicationRuntime.SPHttpApplication" %> Source File: /global.asax Line: 1 Assembly Load Trace: The following information can be helpful to determine why the assembly 'Microsoft.SharePoint' could not be loaded. WRN: Assembly binding logging is turned OFF. To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1. Note: There is some performance penalty associated with assembly bind failure logging. To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog]. -------------------------------------------------------------------------------- Version Information: Microsoft .NET Framework Version:2.0.50727.5456; ASP.NET Version:2.0.50727.5456 To be quite honest I know zip about Sharepoint, so why am I getting a sharepoint error on a basic "hello world" html page? Cheers :) Update: I've since supposedly uninstalled Sharepoint, but am still getting this error. Any ideas welcome!

    Read the article

  • ASP.NET, IS7 and IE8 caching?

    - by jdege
    We're suddenly having problems with some of our sites having old versions of .css and .js files show up in the browser. Generally, these problems go away, when the user clears cache in the browser. Is there something we can do either in the code or in IIS7, to convince the browser to not used the cached files? In our weirdest case, we have one customer whose users hit our site, and get an old version of a js file. They clear cache, load the page, get the current version, and the page runs fine. Then they load the file again, and suddenly have the old version, again. Any ideas as to how that might be happening? I can think of three: The browser is somehow holding on to the old version, when we clear cache, and is putting it back in the cache, before the second page load. One of our servers has an old version of the file, and while the first page load after a clear cache pulls it from one of the servers with the current version, second and subsequent page loads pull it from the server that has the old version. The first load after a clear cache goes straight to our servers, while subsequent loads pull the file from the cache on the customer's web proxy. I have to say, all three of those scenarios seem outlandishly unlikely, but it's a repeatable behavior. Any ideas?

    Read the article

  • Does COM interop respect .NET AppDomain boundaries for assembly loading?

    - by Xiaofu
    Here's the core problem: I have a .NET application that is using COM interop in a separate AppDomain. The COM stuff seems to be loading assemblies back into the default domain, rather than the AppDomain from which the COM stuff is being called. What I want to know is: is this expected behaviour, or am I doing something wrong to cause these COM related assemblies to be loaded in the wrong AppDomain? Please see a more detailed description of the situation below... The application consists of 3 assemblies: - the main EXE, the entry point of the application. - common.dll, containing just an interface IController (in the IPlugin style) - controller.dll, containing a Controller class that implements IController and MarshalByRefObject. This class does all the work and uses COM interop to interact with another application. The relevant part of the main EXE looks like this: AppDomain controller_domain = AppDomain.CreateDomain("Controller Domain"); IController c = (IController)controller_domain.CreateInstanceFromAndUnwrap("controller.dll", "MyNamespace.Controller"); result = c.Run(); AppDomain.Unload(controller_domain); The common.dll only contains these 2 things: public enum ControllerRunResult{FatalError, Finished, NonFatalError, NotRun} public interface IController { ControllerRunResult Run(); } And the controller.dll contains this class (which also calls the COM interop stuff): public class Controller: IController, MarshalByRefObject When first running the application, Assembly.GetAssemblies() looks as expected, with common.dll being loaded in both AppDomains, and controller.dll only being loaded into the controller domain. After calling c.Run() however I see that assemblies related to the COM interop stuff have been loaded into the default AppDomain, and NOT in the AppDomain from which the COM interop is taking place. Why might this be occurring? And if you're interested, here's a bit of background: Originally this was a 1 AppDomain application. The COM stuff it interfaces with is a server API which is not stable over long periods of use. When a COMException (with no useful diagnostic information as to its cause) occurs from the COM stuff, the entire application has to restarted before the COM connection will work again. Simply reconnecting to the COM app server results in immediate COM exceptions again. To cope with this I have tried to move the COM interop stuff into a seperate AppDomain so that when the mystery COMExceptions occur I can unload the AppDomain in which it occurs, create a new one and start again, all without having to manually restart the application. That was the theory anyway...

    Read the article

  • Patterns: Local Singleton vs. Global Singleton?

    - by Mike Rosenblum
    There is a pattern that I use from time to time, but I'm not quite sure what it is called. I was hoping that the SO community could help me out. The pattern is pretty simple, and consists of two parts: A singleton factory, which creates objects based on the arguments passed to the factory method. Objects created by the factory. So far this is just a standard "singleton" pattern or "factory pattern". The issue that I'm asking about, however, is that the singleton factory in this case maintains a set of references to every object that it ever creates, held within a dictionary. These references can sometimes be strong references and sometimes weak references, but it can always reference any object that it has ever created. When receiving a request for a "new" object, the factory first searches the dictionary to see if an object with the required arguments already exits. If it does, it returns that object, if not, it returns a new object and also stores a reference to the new object within the dictionary. This pattern prevents having duplicative objects representing the same underlying "thing". This is useful where the created objects are relatively expensive. It can also be useful where these objects perform event handling or messaging - having one object per item being represented can prevent multiple messages/events for a single underlying source. There are probably other reasons to use this pattern, but this is where I've found this useful. My question is: what to call this? In a sense, each object is a singleton, at least with respect to the data it contains. Each is unique. But there are multiple instances of this class, however, so it's not at all a true singleton. In my own personal terminology, I tend to call the factory method a "global singleton". I then call the created objects "local singletons". I sometimes also say that the created objects have "reference equality", meaning that if two variables reference the same data (the same underlying item) then the reference they each hold must be to the same exact object, hence "reference equality". But these are my own invented terms, and I am not sure that they are good ones. Is there standard terminology for this concept? And if not, could some naming suggestions be made? Thanks in advance...

    Read the article

  • The DOS DEBUG Environment

    - by MarkPearl
    Today I thought I would go back in time and have a look at the DEBUG command that has been available since the beginning of dawn in DOS, MS-DOS and Microsoft Windows. up to today I always knew it was there, but had no clue on how to use it so for those that are interested this might be a great geek party trick to pull out when you want the awe the younger generation and want to show them what “real” programming is about. But wait, you will have to do it relatively quickly as it seems like DEBUG was finally dumped from the Windows group in Windows 7. Not to worry, pull out that Windows XP box which will get you even more geek points and you can still poke DEBUG a bit. So, for those that are interested and want to find out a bit about the history of DEBUG read the wiki link here. That all put aside, lets get our hands dirty.. How to Start DEBUG in Windows Make sure your version of Windows supports DEBUG. Open up a console window Make a directory where you want to play with debug – in my instance I called it C221 Enter the directory and type Debug You will get a response with a – as illustrated in the image below…   The commands available in DEBUG There are several commands available in DEBUG. The most common ones are A (Assemble) R (Register) T (Trace) G (Go) D (Dump or Display) U (Unassemble) E (Enter) P (Proceed) N (Name) L (Load) W (Write) H (Hexadecimal) I (Input) O (Output) Q (Quit) I am not going to cover all these commands, but what I will do is go through a few of them briefly. A is for Assemble Command (to write code) The A command translates assembly language statements into machine code. It is quite useful for writing small assembly programs. Below I have written a very basic assembly program. The code typed out is as follows mov ax,0015 mov cx,0023 sub cx,ax mov [120],al mov cl,[120]A nop R is for Register (to jump to a point in memory) The r command turns out to be one of the most frequent commands you will use in DEBUG. It allows you to view the contents of registers and to change their values. It can be used with the following combinations… R – Displays the contents of all the registers R f – Displays the flags register R register_name – Displays the contents of a specific register All three methods are illustrated in the image above T is for Trace (To execute a program step by step) The t command allows us to execute the program step by step. Before we can trace the program we need to point back to the beginning of the program. We do this by typing in r ip, which moves us back to memory point 100. We then type trace which executes the first line of code (line 100) (As shown in the image below starting from the red arrow). You can see from the above image that the register AX now contains 0015 as per our instruction mov ax,0015 You can also see that the IP points to line 0103 which has the MOV CX,0023 command If we type t again it will now execute the second line of the program which moves 23 in the cx register. Again, we can see that the line of code was executed and that the CX register now holds the value of 23. What I would like to highlight now is the section underlined in red. These are the status flags. The ones we are going to look at now are 1st (NV), 4th (PL), 5th (NZ) & 8th (NC) NV means no overflow, the alternate would be OV PL means that the sign of the previous arithmetic operation was Plus, the alternate would be NG (Negative) NZ means that the results of the previous arithmetic operation operation was Not Zero, the alternate would be ZR NC means that No final Carry resulted from the previous arithmetic operation. CY means that there was a final Carry. We could now follow this process of entering the t command until the entire program is executed line by line. G is for Go (To execute a program up to a certain line number) So we have looked at executing a program line by line, which is fine if your program is minuscule BUT totally unpractical if we have any decent sized program. A quicker way to run some lines of code is to use the G command. The ‘g’ command executes a program up to a certain specified point. It can be used in connection with the the reset IP command. You would set your initial point and then run the G command with the line you want to end on. P is for Proceed (Similar to trace but slightly more streamlined) Another command similar to trace is the proceed command. All that the p command does is if it is called and it encounters a CALL, INT or LOOP command it terminates the program execution. In the example below I modified our example program to include an int 20 at the end of it as illustrated in the image below… Then when executing the code when I encountered the int 20 command I typed the P command and the program terminated normally (illustrated below). D is for Dump (or for those more polite Display) So, we have all these assembly lines of code, but if you have ever opened up an exe or com file in a text/hex editor, it looks nothing like assembly code. The D command is a way that we can see what our code looks like in memory (or in a hex editor). If we examined the image above, we can see that Debug is storing our assembly code with each instruction following immediately after the previous one. For instance in memory address 110 we have int and 111 we have 20. If we examine the dump of memory we can see at memory point 110 CD is stored and at memory point 111 20 is stored. U is for Unassemble (or Convert Machine code to Assembly Code) So up to now we have gone through a bunch of commands, but probably one of the most useful is the U command. Let’s say we don’t understand machine code so well and so instead we want to see it in its equivalent assembly code. We can type the U command followed by the start memory point, followed by the end memory point and it will show us the assembly code equivalent of the machine code. E is for a bunch of things… The E command can be used for a bunch of things… One example is to enter data or machine code instructions directly into memory. It can also be used to display the contents of memory locations. I am not going to worry to much about it in this post. N / L / W is for Name, Load & Write So we have written out assembly code in debug, and now we want to save it to disk, or write it as a com file or load it. This is where the N, L & W command come in handy. The n command is used to give a name to the executable program file and is pretty simple to use. The w command is a bit trickier. It saves to disk all the memory between point bx and point cx so you need to specify the bx memory address and the cx memory address for it to write your code. Let’s look at an example illustrated below. You do this by calling the r command followed by the either bx or cx. We can then go to the directory where we were working and will see the new file with the name we specified. The L command is relatively simple. You would first specify the name of the file you would like to load using the N command, and then call the L command. Q is for Quit The last command that I am going to write about in this post is the Q command. Simply put, calling the Q command exits DEBUG. Commands we did not Cover Out of the standard DEBUG commands we covered A, T, G, D, U, E, P, R, N, L & W. The ones we did not cover were H, I & O – I might make mention of these in a later post, but for the basics they are not really needed. Some Useful Resources Please note this post is based on the COS2213 handouts for UNISA A Guide to DEBUG - http://mirror.href.com/thestarman/asm/debug/debug.htm#NT

    Read the article

  • IntelliSense for Razor Hosting in non-Web Applications

    - by Rick Strahl
    When I posted my Razor Hosting article a couple of weeks ago I got a number of questions on how to get IntelliSense to work inside of Visual Studio while editing your templates. The answer to this question is mainly dependent on how Visual Studio recognizes assemblies, so a little background is required. If you open a template just on its own as a standalone file by clicking on it say in Explorer, Visual Studio will open up with the template in the editor, but you won’t get any IntelliSense on any of your related assemblies that you might be using by default. It’ll give Intellisense on base System namespace, but not on your imported assembly types. This makes sense: Visual Studio has no idea what the assembly associations for the single file are. There are two options available to you to make IntelliSense work for templates: Add the templates as included files to your non-Web project Add a BIN folder to your template’s folder and add all assemblies required there Including Templates in your Host Project By including templates into your Razor hosting project, Visual Studio will pick up the project’s assembly references and make IntelliSense available for any of the custom types in your project and on your templates. To see this work I moved the \Templates folder from the samples from the Debug\Bin folder into the project root and included the templates in the WinForm sample project. Here’s what this looks like in Visual Studio after the templates have been included:   Notice that I take my original example and type cast the Context object to the specific type that it actually represents – namely CustomContext – by using a simple code block: @{ CustomContext Model = Context as CustomContext; } After that assignment my Model local variable is in scope and IntelliSense works as expected. Note that you also will need to add any namespaces with the using command in this case: @using RazorHostingWinForm which has to be defined at the very top of a Razor document. BTW, while you can only pass in a single Context 'parameter’ to the template with the default template I’ve provided realize that you can also assign a complex object to Context. For example you could have a container object that references a variety of other objects which you can then cast to the appropriate types as needed: @{ ContextContainer container = Context as ContextContainer; CustomContext Model = container.Model; CustomDAO DAO = container.DAO; } and so forth. IntelliSense for your Custom Template Notice also that you can get IntelliSense for the top level template by specifying an inherits tag at the top of the document: @inherits RazorHosting.RazorTemplateFolderHost By specifying the above you can then get IntelliSense on your base template’s properties. For example, in my base template there are Request and Response objects. This is very useful especially if you end up creating custom templates that include your custom business objects as you can get effectively see full IntelliSense from the ‘page’ level down. For Html Help Builder for example, I’d have a Help object on the page and assuming I have the references available I can see all the way into that Help object without even having to do anything fancy. Note that the @inherits key is a GREAT and easy way to override the base template you normally specify as the default template. It allows you to create a custom template and as long as it inherits from the base template it’ll work properly. Since the last post I’ve also made some changes in the base template that allow hooking up some simple initialization logic so it gets much more easy to create custom templates and hook up custom objects with an IntializeTemplate() hook function that gets called with the Context and a Configuration object. These objects are objects you can pass in at runtime from your host application and then assign to custom properties on your template. For example the default implementation for RazorTemplateFolderHost does this: public override void InitializeTemplate(object context, object configurationData) { // Pick up configuration data and stuff into Request object RazorFolderHostTemplateConfiguration config = configurationData as RazorFolderHostTemplateConfiguration; this.Request.TemplatePath = config.TemplatePath; this.Request.TemplateRelativePath = config.TemplateRelativePath; // Just use the entire ConfigData as the model, but in theory // configData could contain many objects or values to set on // template properties this.Model = config.ConfigData as TModel; } to set up a strongly typed Model and the Request object. You can do much more complex hookups here of course and create complex base template pages that contain all the objects that you need in your code with strong typing. Adding a Bin folder to your Template’s Root Path Including templates in your host project works if you own the project and you’re the only one modifying the templates. However, if you are distributing the Razor engine as a templating/scripting solution as part of your application or development tool the original project is likely not available and so that approach is not practical. Another option you have is to add a Bin folder and add all the related assemblies into it. You can also add a Web.Config file with assembly references for any GAC’d assembly references that need to be associated with the templates. Between the web.config and bin folder Visual Studio can figure out how to provide IntelliSense. The Bin folder should contain: The RazorHosting.dll Your host project’s EXE or DLL – renamed to .dll if it’s an .exe Any external (bin folder) dependent assemblies Note that you most likely also want a reference to the host project if it contains references that are going to be used in templates. Visual Studio doesn’t recognize an EXE reference so you have to rename the EXE to DLL to make it work. Apparently the binary signature of EXE and DLL files are identical and it just works – learn something new everyday… For GAC assembly references you can add a web.config file to your template root. The Web.config file then should contain any full assembly references to GAC components: <configuration> <system.web> <compilation debug="true"> <assemblies> <add assembly="System.Web.Mvc, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" /> <add assembly="System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" /> <add assembly="System.Web.Extensions, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" /> </assemblies> </compilation> </system.web> </configuration> And with that you should get full IntelliSense. Note that if you add a BIN folder and you also have the templates in your Visual Studio project Visual Studio will complain about reference conflicts as it’s effectively seeing both the project references and the ones in the bin folder. So it’s probably a good idea to use one or the other but not both at the same time :-) Seeing IntelliSense in your Razor templates is a big help for users of your templates. If you’re shipping an application level scripting solution especially it’ll be real useful for your template consumers/users to be able to get some quick help on creating customized templates – after all that’s what templates are all about – easy customization. Making sure that everything is referenced in your bin folder and web.config is a good idea and it’s great to see that Visual Studio (and presumably WebMatrix/Visual Web Developer as well) will be able to pick up your custom IntelliSense in Razor templates.© Rick Strahl, West Wind Technologies, 2005-2011Posted in Razor  

    Read the article

  • Using Node.js as an accelerator for WCF REST services

    - by Elton Stoneman
    Node.js is a server-side JavaScript platform "for easily building fast, scalable network applications". It's built on Google's V8 JavaScript engine and uses an (almost) entirely async event-driven processing model, running in a single thread. If you're new to Node and your reaction is "why would I want to run JavaScript on the server side?", this is the headline answer: in 150 lines of JavaScript you can build a Node.js app which works as an accelerator for WCF REST services*. It can double your messages-per-second throughput, halve your CPU workload and use one-fifth of the memory footprint, compared to the WCF services direct.   Well, it can if: 1) your WCF services are first-class HTTP citizens, honouring client cache ETag headers in request and response; 2) your services do a reasonable amount of work to build a response; 3) your data is read more often than it's written. In one of my projects I have a set of REST services in WCF which deal with data that only gets updated weekly, but which can be read hundreds of times an hour. The services issue ETags and will return a 304 if the client sends a request with the current ETag, which means in the most common scenario the client uses its local cached copy. But when the weekly update happens, then all the client caches are invalidated and they all need the same new data. Then the service will get hundreds of requests with old ETags, and they go through the full service stack to build the same response for each, taking up threads and processing time. Part of that processing means going off to a database on a separate cloud, which introduces more latency and downtime potential.   We can use ASP.NET output caching with WCF to solve the repeated processing problem, but the server will still be thread-bound on incoming requests, and to get the current ETags reliably needs a database call per request. The accelerator solves that by running as a proxy - all client calls come into the proxy, and the proxy routes calls to the underlying REST service. We could use Node as a straight passthrough proxy and expect some benefit, as the server would be less thread-bound, but we would still have one WCF and one database call per proxy call. But add some smart caching logic to the proxy, and share ETags between Node and WCF (so the proxy doesn't even need to call the servcie to get the current ETag), and the underlying service will only be invoked when data has changed, and then only once - all subsequent client requests will be served from the proxy cache.   I've built this as a sample up on GitHub: NodeWcfAccelerator on sixeyed.codegallery. Here's how the architecture looks:     The code is very simple. The Node proxy runs on port 8010 and all client requests target the proxy. If the client request has an ETag header then the proxy looks up the ETag in the tag cache to see if it is current - the sample uses memcached to share ETags between .NET and Node. If the ETag from the client matches the current server tag, the proxy sends a 304 response with an empty body to the client, telling it to use its own cached version of the data. If the ETag from the client is stale, the proxy looks for a local cached version of the response, checking for a file named after the current ETag. If that file exists, its contents are returned to the client as the body in a 200 response, which includes the current ETag in the header. If the proxy does not have a local cached file for the service response, it calls the service, and writes the WCF response to the local cache file, and to the body of a 200 response for the client. So the WCF service is only troubled if both client and proxy have stale (or no) caches.   The only (vaguely) clever bit in the sample is using the ETag cache, so the proxy can serve cached requests without any communication with the underlying service, which it does completely generically, so the proxy has no notion of what it is serving or what the services it proxies are doing. The relative path from the URL is used as the lookup key, so there's no shared key-generation logic between .NET and Node, and when WCF stores a tag it also stores the "read" URL against the ETag so it can be used for a reverse lookup, e.g:   Key Value /WcfSampleService/PersonService.svc/rest/fetch/3 "28cd4796-76b8-451b-adfd-75cb50a50fa6" "28cd4796-76b8-451b-adfd-75cb50a50fa6" /WcfSampleService/PersonService.svc/rest/fetch/3    In Node we read the cache using the incoming URL path as the key and we know that "28cd4796-76b8-451b-adfd-75cb50a50fa6" is the current ETag; we look for a local cached response in /caches/28cd4796-76b8-451b-adfd-75cb50a50fa6.body (and the corresponding .header file which contains the original service response headers, so the proxy response is exactly the same as the underlying service). When the data is updated, we need to invalidate the ETag cache – which is why we need the reverse lookup in the cache. In the WCF update service, we don't need to know the URL of the related read service - we fetch the entity from the database, do a reverse lookup on the tag cache using the old ETag to get the read URL, update the new ETag against the URL, store the new reverse lookup and delete the old one.   Running Apache Bench against the two endpoints gives the headline performance comparison. Making 1000 requests with concurrency of 100, and not sending any ETag headers in the requests, with the Node proxy I get 102 requests handled per second, average response time of 975 milliseconds with 90% of responses served within 850 milliseconds; going direct to WCF with the same parameters, I get 53 requests handled per second, mean response time of 1853 milliseconds, with 90% of response served within 3260 milliseconds. Informally monitoring server usage during the tests, Node maxed at 20% CPU and 20Mb memory; IIS maxed at 60% CPU and 100Mb memory.   Note that the sample WCF service does a database read and sleeps for 250 milliseconds to simulate a moderate processing load, so this is *not* a baseline Node-vs-WCF comparison, but for similar scenarios where the  service call is expensive but applicable to numerous clients for a long timespan, the performance boost from the accelerator is considerable.     * - actually, the accelerator will work nicely for any HTTP request, where the URL (path + querystring) uniquely identifies a resource. In the sample, there is an assumption that the ETag is a GUID wrapped in double-quotes (e.g. "28cd4796-76b8-451b-adfd-75cb50a50fa6") – which is the default for WCF services. I use that assumption to name the cache files uniquely, but it is a trivial change to adapt to other ETag formats.

    Read the article

  • Microsoft Enterprise Library Caching Application Block not thread safe?!

    - by AlanR
    Good aftenoon, I created a super simple console app to test out the Enterprise Library Caching Application Block, and the behavior is blaffling. I'm hoping I screwed something that's easy to fix in the setup. Have each item expire after 5 seconds for testing purposes. Basic setup -- "every second pick a number between 0 and 2. if the cache doesn't already have it, put it in there -- otherwise just grab it from the cache. Do this inside a LOCK statement to ensure thread safety. APP.CONFIG: <configuration> <configSections> <section name="cachingConfiguration" type="Microsoft.Practices.EnterpriseLibrary.Caching.Configuration.CacheManagerSettings, Microsoft.Practices.EnterpriseLibrary.Caching, Version=4.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" /> </configSections> <cachingConfiguration defaultCacheManager="Cache Manager"> <cacheManagers> <add expirationPollFrequencyInSeconds="1" maximumElementsInCacheBeforeScavenging="1000" numberToRemoveWhenScavenging="10" backingStoreName="Null Storage" type="Microsoft.Practices.EnterpriseLibrary.Caching.CacheManager, Microsoft.Practices.EnterpriseLibrary.Caching, Version=4.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="Cache Manager" /> </cacheManagers> <backingStores> <add encryptionProviderName="" type="Microsoft.Practices.EnterpriseLibrary.Caching.BackingStoreImplementations.NullBackingStore, Microsoft.Practices.EnterpriseLibrary.Caching, Version=4.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" name="Null Storage" /> </backingStores> </cachingConfiguration> </configuration> C#: using System; using System.Collections.Generic; using System.Linq; using System.Text; using Microsoft.Practices.EnterpriseLibrary.Common; using Microsoft.Practices.EnterpriseLibrary.Caching; using Microsoft.Practices.EnterpriseLibrary.Caching.Expirations; namespace ConsoleApplication1 { class Program { public static ICacheManager cache = CacheFactory.GetCacheManager("Cache Manager"); static void Main(string[] args) { while (true) { System.Threading.Thread.Sleep(1000); // sleep for one second. var key = new Random().Next(3).ToString(); string value; lock (cache) { if (!cache.Contains(key)) { cache.Add(key, key, CacheItemPriority.Normal, null, new SlidingTime(TimeSpan.FromSeconds(5))); } value = (string)cache.GetData(key); } Console.WriteLine("{0} --> '{1}'", key, value); //if (null == value) throw new Exception(); } } } } OUPUT -- How can I prevent the cache to returning nulls? 2 --> '2' 1 --> '1' 2 --> '2' 0 --> '0' 2 --> '2' 0 --> '0' 1 --> '' 0 --> '0' 1 --> '1' 2 --> '' 0 --> '0' 2 --> '2' 0 --> '0' 1 --> '' 2 --> '2' 1 --> '1' Press any key to continue . . . Thanks in advance, -Alan.

    Read the article

  • Motherboard/PSU crippling USB and Sata

    - by celebdor
    I very recently bought a new desktop computer. The motherboard is: Z77MX-D3H and the power supply is ocz zs series 550w. The issue I have is that once I boot to the operating system (I have tried with fedora and Ubuntu with kernels 2.6.38 - 3.4.0), my hard drive (2.5" Magnetic) occasionally makes a power switch noise and it resets. Needless to say, when this drive is the OS drive, the OS crashes. I also have a SSD that works fine with the same OS configurations, but if I have the magnetic hard drive attached as second drive, it works erratically and the reconnects result in corrupted data. I also noticed that whenever I plug an external hard drive USB2.0 or USB3.0 to the computer the issue with the reconnects is even worse: [ 52.198441] sd 7:0:0:0: [sdc] Spinning up disk... [ 57.955811] usb 4-3: USB disconnect, device number 3 [ 58.023687] .ready [ 58.023914] sd 7:0:0:0: [sdc] READ CAPACITY(16) failed [ 58.023919] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.023932] sd 7:0:0:0: [sdc] Sense not available. [ 58.024061] sd 7:0:0:0: [sdc] READ CAPACITY failed [ 58.024063] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024064] sd 7:0:0:0: [sdc] Sense not available. [ 58.024099] sd 7:0:0:0: [sdc] Write Protect is off [ 58.024101] sd 7:0:0:0: [sdc] Mode Sense: 00 00 00 00 [ 58.024135] sd 7:0:0:0: [sdc] Asking for cache data failed [ 58.024137] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 58.024400] sd 7:0:0:0: [sdc] READ CAPACITY(16) failed [ 58.024402] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024405] sd 7:0:0:0: [sdc] Sense not available. [ 58.024448] sd 7:0:0:0: [sdc] READ CAPACITY failed [ 58.024450] sd 7:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 58.024451] sd 7:0:0:0: [sdc] Sense not available. [ 58.024469] sd 7:0:0:0: [sdc] Asking for cache data failed [ 58.024471] sd 7:0:0:0: [sdc] Assuming drive cache: write through [ 58.024472] sd 7:0:0:0: [sdc] Attached SCSI disk [ 58.407725] usb 4-3: new SuperSpeed USB device number 4 using xhci_hcd [ 58.424921] scsi8 : usb-storage 4-3:1.0 [ 59.424185] scsi 8:0:0:0: Direct-Access WD My Passport 0740 1003 PQ: 0 ANSI: 6 [ 59.424406] scsi 8:0:0:1: Enclosure WD SES Device 1003 PQ: 0 ANSI: 6 [ 59.425098] sd 8:0:0:0: Attached scsi generic sg2 type 0 [ 59.425176] ses 8:0:0:1: Attached Enclosure device [ 59.425248] ses 8:0:0:1: Attached scsi generic sg3 type 13 [ 61.845836] sd 8:0:0:0: [sdc] 976707584 512-byte logical blocks: (500 GB/465 GiB) [ 61.845838] sd 8:0:0:0: [sdc] 4096-byte physical blocks [ 61.846336] sd 8:0:0:0: [sdc] Write Protect is off [ 61.846338] sd 8:0:0:0: [sdc] Mode Sense: 47 00 10 08 [ 61.846718] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.846720] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.848105] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.848106] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.857147] sdc: sdc1 [ 61.858915] sd 8:0:0:0: [sdc] No Caching mode page present [ 61.858916] sd 8:0:0:0: [sdc] Assuming drive cache: write through [ 61.858918] sd 8:0:0:0: [sdc] Attached SCSI disk [ 69.875809] usb 4-3: USB disconnect, device number 4 [ 70.275816] usb 4-3: new SuperSpeed USB device number 5 using xhci_hcd [ 70.293063] scsi9 : usb-storage 4-3:1.0 [ 71.292257] scsi 9:0:0:0: Direct-Access WD My Passport 0740 1003 PQ: 0 ANSI: 6 [ 71.292505] scsi 9:0:0:1: Enclosure WD SES Device 1003 PQ: 0 ANSI: 6 [ 71.293527] sd 9:0:0:0: Attached scsi generic sg2 type 0 [ 71.293668] ses 9:0:0:1: Attached Enclosure device [ 71.293758] ses 9:0:0:1: Attached scsi generic sg3 type 13 [ 73.323804] usb 4-3: USB disconnect, device number 5 [ 101.868078] ses 9:0:0:1: Device offlined - not ready after error recovery [ 101.868124] ses 9:0:0:1: Failed to get diagnostic page 0x50000 [ 101.868131] ses 9:0:0:1: Failed to bind enclosure -19 [ 101.868288] sd 9:0:0:0: [sdc] READ CAPACITY(16) failed [ 101.868292] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868296] sd 9:0:0:0: [sdc] Sense not available. [ 101.868428] sd 9:0:0:0: [sdc] READ CAPACITY failed [ 101.868434] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868439] sd 9:0:0:0: [sdc] Sense not available. [ 101.868468] sd 9:0:0:0: [sdc] Write Protect is off [ 101.868473] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00 [ 101.868580] sd 9:0:0:0: [sdc] Asking for cache data failed [ 101.868584] sd 9:0:0:0: [sdc] Assuming drive cache: write through [ 101.868845] sd 9:0:0:0: [sdc] READ CAPACITY(16) failed [ 101.868849] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868854] sd 9:0:0:0: [sdc] Sense not available. [ 101.868894] sd 9:0:0:0: [sdc] READ CAPACITY failed [ 101.868898] sd 9:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 101.868903] sd 9:0:0:0: [sdc] Sense not available. [ 101.868961] sd 9:0:0:0: [sdc] Asking for cache data failed [ 101.868966] sd 9:0:0:0: [sdc] Assuming drive cache: write through [ 101.868969] sd 9:0:0:0: [sdc] Attached SCSI disk Now, if I plug the same drive to the powered usb 2.0 hub of my monitor, the issue is not reproduced (at least on a 20h long operation). Also the issue of the usb reconnects is less frequent if the hard drive is plugged before I switch on the computer. Does anybody have some advice as to what I could do? Which is the faulty part/s that I should replace? As for me, I really don't know if to point my finger to the PSU or the Motherboard (I have updated to the latest firmware and checked the BIOS settings several times). EDIT: The reconnects are happening both in the Sata connected drives and the USBX connected drives.

    Read the article

  • Nginx & Apache Cannot get try_files to work with permalinks

    - by tcherokee
    I have been working on this for the past two weeks not and for some reason I cannot seem to get nginx's try_files to work with my wordpress permalinks. I am hoping someone will be able to tell me where I am going wrong and also hopefully tell me if I made any major errors with my configurations as well (I am an nginx newbie... but learning :) ). Here are my Configuration files nginx.conf user www-data; worker_processes 4; pid /var/run/nginx.pid; events { worker_connections 768; # multi_accept on; } http { ## # Basic Settings ## sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 65; types_hash_max_size 2048; # server_tokens off; # server_names_hash_bucket_size 64; # server_name_in_redirect off; include /etc/nginx/mime.types; default_type application/octet-stream; ## # Logging Settings ## # Defines the cache log format, cache log location # and the main access log location. log_format cache '***$time_local ' '$upstream_cache_status ' 'Cache-Control: $upstream_http_cache_control ' 'Expires: $upstream_http_expires ' '$host ' '"$request" ($status) ' '"$http_user_agent" ' ; access_log /var/log/nginx/access.log; error_log /var/log/nginx/error.log; include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } mydomain.com.conf server { listen 123.456.78.901:80; # IP goes here. server_name www.mydomain.com mydomain.com; #root /var/www/mydomain.com/prod; index index.php; ## mydomain.com -> www.mydomain.com (301 - Permanent) if ($host !~* ^(www|dev)) { rewrite ^/(.*)$ $scheme://www.$host/$1 permanent; } # Add trailing slash to */wp-admin requests. rewrite /wp-admin$ $scheme://$host$uri/ permanent; # All media (including uploaded) is under wp-content/ so # instead of caching the response from apache, we're just # going to use nginx to serve directly from there. location ~* ^/(wp-content|wp-includes)/(.*)\.(jpg|png|gif|jpeg|css|js|m$ root /var/www/mydomain.com/prod; } # Don't cache these pages. location ~* ^/(wp-admin|wp-login.php) { proxy_pass http://backend; } location / { if ($http_cookie ~* "wordpress_logged_in_[^=]*=([^%]+)%7C") { set $do_not_cache 1; } proxy_cache_key "$scheme://$host$request_uri $do_not_cache"; proxy_cache main; proxy_pass http://backend; proxy_cache_valid 30m; # 200, 301 and 302 will be cached. # Fallback to stale cache on certain errors. # 503 is deliberately missing, if we're down for maintenance # we want the page to display. #try_files $uri $uri/ /index.php?q=$uri$args; #try_files $uri =404; proxy_cache_use_stale error timeout invalid_header http_500 http_502 http_504 http_404; } # Cache purge URL - works in tandem with WP plugin. # location ~ /purge(/.*) { # proxy_cache_purge main "$scheme://$host$1"; # } # No access to .htaccess files. location ~ /\.ht { deny all; } } # End server gzip.conf # Gzip Configuration. gzip on; gzip_disable msie6; gzip_static on; gzip_comp_level 4; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript; proxy.conf # Set proxy headers for the passthrough proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_max_temp_file_size 0; client_max_body_size 10m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 90; proxy_read_timeout 90; proxy_buffer_size 4k; proxy_buffers 4 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; add_header X-Cache-Status $upstream_cache_status; backend.conf upstream backend { # Defines backends. # Extracting here makes it easier to load balance # in the future. Needs to be specific IP as Plesk # doesn't have Apache listening on localhost. ip_hash; server 127.0.0.1:8001; # IP goes here. } cache.conf # Proxy cache and temp configuration. proxy_cache_path /var/www/nginx_cache levels=1:2 keys_zone=main:10m max_size=1g inactive=30m; proxy_temp_path /var/www/nginx_temp; proxy_cache_key "$scheme://$host$request_uri"; proxy_redirect off; # Cache different return codes for different lengths of time # We cached normal pages for 10 minutes proxy_cache_valid 200 302 10m; proxy_cache_valid 404 1m; The two commented out try_files in location \ of the mydomain config files are the ones I tried. This error I found in the error log can be found below. ...rewrite or internal redirection cycle while internally redirecting to "/index.php" Thanks in advance

    Read the article

  • The type or namespace cannot be found (are you missing a using directive or an assembly reference?)

    - by Kumu
    I get the following error when I try to compile my C# program: The type or namespace name 'Login' could not be found (are you missing a using directive or an assembly reference?) using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Linq; using System.Text; using System.Windows.Forms; namespace FootballLeague { public partial class MainMenu : Form { FootballLeagueDatabase footballLeagueDatabase; Game game; Team team; Login login; //Error here public MainMenu() { InitializeComponent(); changePanel(1); } public MainMenu(FootballLeagueDatabase footballLeagueDatabaseIn) { InitializeComponent(); footballLeagueDatabase = footballLeagueDatabaseIn; } private void Form_Loaded(object sender, EventArgs e) { } private void gameButton_Click(object sender, EventArgs e) { int option = 0; changePanel(option); } private void scoreboardButton_Click(object sender, EventArgs e) { int option = 1; changePanel(option); } private void changePanel(int optionIn) { gamePanel.Hide(); scoreboardPanel.Hide(); string title = "Football League System"; switch (optionIn) { case 0: gamePanel.Show(); this.Text = title + " - Game Menu"; break; case 1: scoreboardPanel.Show(); this.Text = title + " - Display Menu"; break; } } private void logoutButton_Click(object sender, EventArgs e) { login = new Login(); login.Show(); this.Hide(); } Login.cs class: using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Linq; using System.Text; using System.Windows.Forms; namespace FootballLeagueSystem { public partial class Login : Form { MainMenu menu; public Login() { InitializeComponent(); } private void administratorLoginButton_Click(object sender, EventArgs e) { string username1 = "08247739"; string password1 = "08247739"; if ((userNameTxt.Text.Length) == 0) MessageBox.Show("Please enter your username!"); else if ((passwordTxt.Text.Length) == 0) MessageBox.Show("Please enter your password!"); else if (userNameTxt.Text.Equals("") || passwordTxt.Text.Equals("")) MessageBox.Show("Invalid Username or Password!"); else { if (this.userNameTxt.Text == username1 && this.passwordTxt.Text == password1) MessageBox.Show("Welcome Administrator!", "Administrator Login"); menu = new MainMenu(); menu.Show(); this.Hide(); } } private void managerLoginButton_Click(object sender, EventArgs e) { { string username2 = "1111"; string password2 = "1111"; if ((userNameTxt.Text.Length) == 0) MessageBox.Show("Please enter your username!"); else if ((passwordTxt.Text.Length) == 0) MessageBox.Show("Please enter your password!"); else if (userNameTxt.Text.Equals("") && passwordTxt.Text.Equals("")) MessageBox.Show("Invalid Username or Password!"); else { if (this.userNameTxt.Text == username2 && this.passwordTxt.Text == password2) MessageBox.Show("Welcome Manager!", "Manager Login"); menu = new MainMenu(); menu.Show(); this.Hide(); } } } private void cancelButton_Click(object sender, EventArgs e) { this.Close(); } } } Where is the error? What am I doing wrong?

    Read the article

  • How to reduce virtual memory by optimising my PHP code?

    - by iCeR
    My current code (see below) uses 147MB of virtual memory! My provider has allocated 100MB by default and the process is killed once run, causing an internal error. The code is utilising curl multi and must be able to loop with more than 150 iterations whilst still minimizing the virtual memory. The code below is only set at 150 iterations and still causes the internal server error. At 90 iterations the issue does not occur. How can I adjust my code to lower the resource use / virtual memory? Thanks! <?php function udate($format, $utimestamp = null) { if ($utimestamp === null) $utimestamp = microtime(true); $timestamp = floor($utimestamp); $milliseconds = round(($utimestamp - $timestamp) * 1000); return date(preg_replace('`(?<!\\\\)u`', $milliseconds, $format), $timestamp); } $url = 'https://www.testdomain.com/'; $curl_arr = array(); $master = curl_multi_init(); for($i=0; $i<150; $i++) { $curl_arr[$i] = curl_init(); curl_setopt($curl_arr[$i], CURLOPT_URL, $url); curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, 1); curl_setopt($curl_arr[$i], CURLOPT_SSL_VERIFYHOST, FALSE); curl_setopt($curl_arr[$i], CURLOPT_SSL_VERIFYPEER, FALSE); curl_multi_add_handle($master, $curl_arr[$i]); } do { curl_multi_exec($master,$running); } while($running > 0); for($i=0; $i<150; $i++) { $results = curl_multi_getcontent ($curl_arr[$i]); $results = explode("<br>", $results); echo $results[0]; echo "<br>"; echo $results[1]; echo "<br>"; echo udate('H:i:s:u'); echo "<br><br>"; usleep(100000); } ?> Processor Information Total processors: 8 Processor #1 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #2 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #3 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #4 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #5 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #6 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #7 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Processor #8 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Speed 1995.120 MHz Cache 6144 KB Memory Information Memory for crash kernel (0x0 to 0x0) notwithin permissible range Memory: 8302344k/9175040k available (2176k kernel code, 80272k reserved, 901k data, 228k init, 7466304k highmem) System Information Linux server3.server.com 2.6.18-194.17.1.el5PAE #1 SMP Wed Sep 29 13:31:51 EDT 2010 i686 i686 i386 GNU/Linux Physical Disks SCSI device sda: 1952448512 512-byte hdwr sectors (999654 MB) sda: Write Protect is off sda: Mode Sense: 03 00 00 08 SCSI device sda: drive cache: write back SCSI device sda: 1952448512 512-byte hdwr sectors (999654 MB) sda: Write Protect is off sda: Mode Sense: 03 00 00 08 SCSI device sda: drive cache: write back sd 0:1:0:0: Attached scsi disk sda sd 4:0:0:0: Attached scsi removable disk sdb sd 0:1:0:0: Attached scsi generic sg4 type 0 sd 4:0:0:0: Attached scsi generic sg7 type 0 Current Memory Usage total used free shared buffers cached Mem: 8306672 7847384 459288 0 487912 6444548 -/+ buffers/cache: 914924 7391748 Swap: 4095992 496 4095496 Total: 12402664 7847880 4554784 Current Disk Usage Filesystem Size Used Avail Use% Mounted on /dev/mapper/VolGroup00-LogVol00 898G 307G 546G 36% / /dev/sda1 99M 19M 76M 20% /boot none 4.0G 0 4.0G 0% /dev/shm /var/tmpMnt 4.0G 1.8G 2.0G 48% /tmp

    Read the article

  • Using Stub Objects

    - by user9154181
    Having told the long and winding tale of where stub objects came from and how we use them to build Solaris, I'd like to focus now on the the nuts and bolts of building and using them. The following new features were added to the Solaris link-editor (ld) to support the production and use of stub objects: -z stub This new command line option informs ld that it is to build a stub object rather than a normal object. In this mode, it accepts the same command line arguments as usual, but will quietly ignore any objects and sharable object dependencies. STUB_OBJECT Mapfile Directive In order to build a stub version of an object, its mapfile must specify the STUB_OBJECT directive. When producing a non-stub object, the presence of STUB_OBJECT causes the link-editor to perform extra validation to ensure that the stub and non-stub objects will be compatible. ASSERT Mapfile Directive All data symbols exported from the object must have an ASSERT symbol directive in the mapfile that declares them as data and supplies the size, binding, bss attributes, and symbol aliasing details. When building the stub objects, the information in these ASSERT directives is used to create the data symbols. When building the real object, these ASSERT directives will ensure that the real object matches the linking interface presented by the stub. Although ASSERT was added to the link-editor in order to support stub objects, they are a general purpose feature that can be used independently of stub objects. For instance you might choose to use an ASSERT directive if you have a symbol that must have a specific address in order for the object to operate properly and you want to automatically ensure that this will always be the case. The material presented here is derived from a document I originally wrote during the development effort, which had the dual goals of providing supplemental materials for the stub object PSARC case, and as a set of edits that were eventually applied to the Oracle Solaris Linker and Libraries Manual (LLM). The Solaris 11 LLM contains this information in a more polished form. Stub Objects A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be used at runtime. However, an application can be built against a stub object, where the stub object provides the real object name to be used at runtime, and then use the real object at runtime. When building a stub object, the link-editor ignores any object or library files specified on the command line, and these files need not exist in order to build a stub. Since the compilation step can be omitted, and because the link-editor has relatively little work to do, stub objects can be built very quickly. Stub objects can be used to solve a variety of build problems: Speed Modern machines, using a version of make with the ability to parallelize operations, are capable of compiling and linking many objects simultaneously, and doing so offers significant speedups. However, it is typical that a given object will depend on other objects, and that there will be a core set of objects that nearly everything else depends on. It is necessary to impose an ordering that builds each object before any other object that requires it. This ordering creates bottlenecks that reduce the amount of parallelization that is possible and limits the overall speed at which the code can be built. Complexity/Correctness In a large body of code, there can be a large number of dependencies between the various objects. The makefiles or other build descriptions for these objects can become very complex and difficult to understand or maintain. The dependencies can change as the system evolves. This can cause a given set of makefiles to become slightly incorrect over time, leading to race conditions and mysterious rare build failures. Dependency Cycles It might be desirable to organize code as cooperating shared objects, each of which draw on the resources provided by the other. Such cycles cannot be supported in an environment where objects must be built before the objects that use them, even though the runtime linker is fully capable of loading and using such objects if they could be built. Stub shared objects offer an alternative method for building code that sidesteps the above issues. Stub objects can be quickly built for all the shared objects produced by the build. Then, all the real shared objects and executables can be built in parallel, in any order, using the stub objects to stand in for the real objects at link-time. Afterwards, the executables and real shared objects are kept, and the stub shared objects are discarded. Stub objects are built from a mapfile, which must satisfy the following requirements. The mapfile must specify the STUB_OBJECT directive. This directive informs the link-editor that the object can be built as a stub object, and as such causes the link-editor to perform validation and sanity checking intended to guarantee that an object and its stub will always provide identical linking interfaces. All function and data symbols that make up the external interface to the object must be explicitly listed in the mapfile. The mapfile must use symbol scope reduction ('*'), to remove any symbols not explicitly listed from the external interface. All global data exported from the object must have an ASSERT symbol attribute in the mapfile to specify the symbol type, size, and bss attributes. In the case where there are multiple symbols that reference the same data, the ASSERT for one of these symbols must specify the TYPE and SIZE attributes, while the others must use the ALIAS attribute to reference this primary symbol. Given such a mapfile, the stub and real versions of the shared object can be built using the same command line for each, adding the '-z stub' option to the link for the stub object, and omiting the option from the link for the real object. To demonstrate these ideas, the following code implements a shared object named idx5, which exports data from a 5 element array of integers, with each element initialized to contain its zero-based array index. This data is available as a global array, via an alternative alias data symbol with weak binding, and via a functional interface. % cat idx5.c int _idx5[5] = { 0, 1, 2, 3, 4 }; #pragma weak idx5 = _idx5 int idx5_func(int index) { if ((index 4)) return (-1); return (_idx5[index]); } A mapfile is required to describe the interface provided by this shared object. % cat mapfile $mapfile_version 2 STUB_OBJECT; SYMBOL_SCOPE { _idx5 { ASSERT { TYPE=data; SIZE=4[5] }; }; idx5 { ASSERT { BINDING=weak; ALIAS=_idx5 }; }; idx5_func; local: *; }; The following main program is used to print all the index values available from the idx5 shared object. % cat main.c #include <stdio.h> extern int _idx5[5], idx5[5], idx5_func(int); int main(int argc, char **argv) { int i; for (i = 0; i The following commands create a stub version of this shared object in a subdirectory named stublib. elfdump is used to verify that the resulting object is a stub. The command used to build the stub differs from that of the real object only in the addition of the -z stub option, and the use of a different output file name. This demonstrates the ease with which stub generation can be added to an existing makefile. % cc -Kpic -G -M mapfile -h libidx5.so.1 idx5.c -o stublib/libidx5.so.1 -zstub % ln -s libidx5.so.1 stublib/libidx5.so % elfdump -d stublib/libidx5.so | grep STUB [11] FLAGS_1 0x4000000 [ STUB ] The main program can now be built, using the stub object to stand in for the real shared object, and setting a runpath that will find the real object at runtime. However, as we have not yet built the real object, this program cannot yet be run. Attempts to cause the system to load the stub object are rejected, as the runtime linker knows that stub objects lack the actual code and data found in the real object, and cannot execute. % cc main.c -L stublib -R '$ORIGIN/lib' -lidx5 -lc % ./a.out ld.so.1: a.out: fatal: libidx5.so.1: open failed: No such file or directory Killed % LD_PRELOAD=stublib/libidx5.so.1 ./a.out ld.so.1: a.out: fatal: stublib/libidx5.so.1: stub shared object cannot be used at runtime Killed We build the real object using the same command as we used to build the stub, omitting the -z stub option, and writing the results to a different file. % cc -Kpic -G -M mapfile -h libidx5.so.1 idx5.c -o lib/libidx5.so.1 Once the real object has been built in the lib subdirectory, the program can be run. % ./a.out [0] 0 0 0 [1] 1 1 1 [2] 2 2 2 [3] 3 3 3 [4] 4 4 4 Mapfile Changes The version 2 mapfile syntax was extended in a number of places to accommodate stub objects. Conditional Input The version 2 mapfile syntax has the ability conditionalize mapfile input using the $if control directive. As you might imagine, these directives are used frequently with ASSERT directives for data, because a given data symbol will frequently have a different size in 32 or 64-bit code, or on differing hardware such as x86 versus sparc. The link-editor maintains an internal table of names that can be used in the logical expressions evaluated by $if and $elif. At startup, this table is initialized with items that describe the class of object (_ELF32 or _ELF64) and the type of the target machine (_sparc or _x86). We found that there were a small number of cases in the Solaris code base in which we needed to know what kind of object we were producing, so we added the following new predefined items in order to address that need: NameMeaning ...... _ET_DYNshared object _ET_EXECexecutable object _ET_RELrelocatable object ...... STUB_OBJECT Directive The new STUB_OBJECT directive informs the link-editor that the object described by the mapfile can be built as a stub object. STUB_OBJECT; A stub shared object is built entirely from the information in the mapfiles supplied on the command line. When the -z stub option is specified to build a stub object, the presence of the STUB_OBJECT directive in a mapfile is required, and the link-editor uses the information in symbol ASSERT attributes to create global symbols that match those of the real object. When the real object is built, the presence of STUB_OBJECT causes the link-editor to verify that the mapfiles accurately describe the real object interface, and that a stub object built from them will provide the same linking interface as the real object it represents. All function and data symbols that make up the external interface to the object must be explicitly listed in the mapfile. The mapfile must use symbol scope reduction ('*'), to remove any symbols not explicitly listed from the external interface. All global data in the object is required to have an ASSERT attribute that specifies the symbol type and size. If the ASSERT BIND attribute is not present, the link-editor provides a default assertion that the symbol must be GLOBAL. If the ASSERT SH_ATTR attribute is not present, or does not specify that the section is one of BITS or NOBITS, the link-editor provides a default assertion that the associated section is BITS. All data symbols that describe the same address and size are required to have ASSERT ALIAS attributes specified in the mapfile. If aliased symbols are discovered that do not have an ASSERT ALIAS specified, the link fails and no object is produced. These rules ensure that the mapfiles contain a description of the real shared object's linking interface that is sufficient to produce a stub object with a completely compatible linking interface. SYMBOL_SCOPE/SYMBOL_VERSION ASSERT Attribute The SYMBOL_SCOPE and SYMBOL_VERSION mapfile directives were extended with a symbol attribute named ASSERT. The syntax for the ASSERT attribute is as follows: ASSERT { ALIAS = symbol_name; BINDING = symbol_binding; TYPE = symbol_type; SH_ATTR = section_attributes; SIZE = size_value; SIZE = size_value[count]; }; The ASSERT attribute is used to specify the expected characteristics of the symbol. The link-editor compares the symbol characteristics that result from the link to those given by ASSERT attributes. If the real and asserted attributes do not agree, a fatal error is issued and the output object is not created. In normal use, the link editor evaluates the ASSERT attribute when present, but does not require them, or provide default values for them. The presence of the STUB_OBJECT directive in a mapfile alters the interpretation of ASSERT to require them under some circumstances, and to supply default assertions if explicit ones are not present. See the definition of the STUB_OBJECT Directive for the details. When the -z stub command line option is specified to build a stub object, the information provided by ASSERT attributes is used to define the attributes of the global symbols provided by the object. ASSERT accepts the following: ALIAS Name of a previously defined symbol that this symbol is an alias for. An alias symbol has the same type, value, and size as the main symbol. The ALIAS attribute is mutually exclusive to the TYPE, SIZE, and SH_ATTR attributes, and cannot be used with them. When ALIAS is specified, the type, size, and section attributes are obtained from the alias symbol. BIND Specifies an ELF symbol binding, which can be any of the STB_ constants defined in <sys/elf.h>, with the STB_ prefix removed (e.g. GLOBAL, WEAK). TYPE Specifies an ELF symbol type, which can be any of the STT_ constants defined in <sys/elf.h>, with the STT_ prefix removed (e.g. OBJECT, COMMON, FUNC). In addition, for compatibility with other mapfile usage, FUNCTION and DATA can be specified, for STT_FUNC and STT_OBJECT, respectively. TYPE is mutually exclusive to ALIAS, and cannot be used in conjunction with it. SH_ATTR Specifies attributes of the section associated with the symbol. The section_attributes that can be specified are given in the following table: Section AttributeMeaning BITSSection is not of type SHT_NOBITS NOBITSSection is of type SHT_NOBITS SH_ATTR is mutually exclusive to ALIAS, and cannot be used in conjunction with it. SIZE Specifies the expected symbol size. SIZE is mutually exclusive to ALIAS, and cannot be used in conjunction with it. The syntax for the size_value argument is as described in the discussion of the SIZE attribute below. SIZE The SIZE symbol attribute existed before support for stub objects was introduced. It is used to set the size attribute of a given symbol. This attribute results in the creation of a symbol definition. Prior to the introduction of the ASSERT SIZE attribute, the value of a SIZE attribute was always numeric. While attempting to apply ASSERT SIZE to the objects in the Solaris ON consolidation, I found that many data symbols have a size based on the natural machine wordsize for the class of object being produced. Variables declared as long, or as a pointer, will be 4 bytes in size in a 32-bit object, and 8 bytes in a 64-bit object. Initially, I employed the conditional $if directive to handle these cases as follows: $if _ELF32 foo { ASSERT { TYPE=data; SIZE=4 } }; bar { ASSERT { TYPE=data; SIZE=20 } }; $elif _ELF64 foo { ASSERT { TYPE=data; SIZE=8 } }; bar { ASSERT { TYPE=data; SIZE=40 } }; $else $error UNKNOWN ELFCLASS $endif I found that the situation occurs frequently enough that this is cumbersome. To simplify this case, I introduced the idea of the addrsize symbolic name, and of a repeat count, which together make it simple to specify machine word scalar or array symbols. Both the SIZE, and ASSERT SIZE attributes support this syntax: The size_value argument can be a numeric value, or it can be the symbolic name addrsize. addrsize represents the size of a machine word capable of holding a memory address. The link-editor substitutes the value 4 for addrsize when building 32-bit objects, and the value 8 when building 64-bit objects. addrsize is useful for representing the size of pointer variables and C variables of type long, as it automatically adjusts for 32 and 64-bit objects without requiring the use of conditional input. The size_value argument can be optionally suffixed with a count value, enclosed in square brackets. If count is present, size_value and count are multiplied together to obtain the final size value. Using this feature, the example above can be written more naturally as: foo { ASSERT { TYPE=data; SIZE=addrsize } }; bar { ASSERT { TYPE=data; SIZE=addrsize[5] } }; Exported Global Data Is Still A Bad Idea As you can see, the additional plumbing added to the Solaris link-editor to support stub objects is minimal. Furthermore, about 90% of that plumbing is dedicated to handling global data. We have long advised against global data exported from shared objects. There are many ways in which global data does not fit well with dynamic linking. Stub objects simply provide one more reason to avoid this practice. It is always better to export all data via a functional interface. You should always hide your data, and make it available to your users via a function that they can call to acquire the address of the data item. However, If you do have to support global data for a stub, perhaps because you are working with an already existing object, it is still easilily done, as shown above. Oracle does not like us to discuss hypothetical new features that don't exist in shipping product, so I'll end this section with a speculation. It might be possible to do more in this area to ease the difficulty of dealing with objects that have global data that the users of the library don't need. Perhaps someday... Conclusions It is easy to create stub objects for most objects. If your library only exports function symbols, all you have to do to build a faithful stub object is to add STUB_OBJECT; and then to use the same link command you're currently using, with the addition of the -z stub option. Happy Stubbing!

    Read the article

  • How to handle BL cache for multiple web applications?

    - by Eran Betzalel
    I recently received a project that contains multiple web applications with no MVC structure. For starters I've created a library (DLL) that will contain the main Business Logic. The problem is with Caching - If I use the current web context cache object than I might end up with duplicate caching (as the web context will be different for every application). I'm currently thinking about implementing a simple caching mechanism with a singleton pattern that will allow the different web sites (aka different application domains) to share their "caching wisdom". I'd like to know what is the best way to solve this problem.

    Read the article

  • Custom ASP.NET MVC cache controllers in a shared hosting environment?

    - by Daniel Crenna
    I'm using custom controllers that cache static resources (CSS, JS, etc.) and images. I'm currently working with a hosting provider that has set me up under a full trust profile. Despite being in full trust, my controllers fail because the caching strategy relies on the File class to directly open a resource file prior to treatment and storage in memory. Is this something that would likely occur in all full trust shared hosting environments or is this specific to my host? The static files live within my application's structure and not in an arbitrary server path. It seems to me that custom caching would require code to access the file directly, and am hoping someone else has dealt with this issue.

    Read the article

  • Is it possible to cache JSP bytecode to avoid recompiles w/ Tomcat?

    - by Computer Guru
    Hi, Is there any way of caching the bytecode for JSP webapps/ In particular, using Tomcat as the Java servlet? I'm getting really fed up of Tomcat taking up all the CPU for 10 minutes while it compiles 4 different webapps every time I restart it.... I'm already using Jikes to "speed up" the compiles, but it's really killing me. The code does not change unless the webapp is upgraded (very rarely), and I cannot believe that there is no way to cache the compiled java bytecode instead of recompiling it each and every time. I'd appreciate any advice on the matter!

    Read the article

  • ASP.Net: Is it possible to cache the js-proxies generated by scriptmanager?

    - by AndreasKnudsen
    We have the following code: <asp:ScriptManager runat="server"> ... <Services> <asp:ServiceReference Path="~/JSONServices/ProfileService.svc" /> </Services> ... This results in a Javascript proxy found in /JSONServices/ProfileService.svc/js. This Javascript has content expiry set to the same time it was called (so it is never cached on the client). Is it possible to have the clients cache these proxies for some time?

    Read the article

  • Cache an FTP connection via session variables for use via AJAX?

    - by Chad Johnson
    I'm working on a Ruby web Application that uses the Net::FTP library. One part of it allows users to interact with an FTP site via AJAX. When the user does something, and AJAX call is made, and then Ruby reconnects to the FTP server, performs an action, and outputs information. Every time the AJAX call is made, Ruby has to reconnect to the FTP server, and that's slow. Is there a way I could cache this FTP connection? I've tried caching in the session hash, but "We're sorry, but something went wrong" is displayed, and a TCP dump is outputted in my logs whenever I attempt to store it in the session hash. I haven't tried memcache yet. Any suggestions?

    Read the article

  • ASP.NET - How can I cache user details for the duration of their visit?

    - by rockinthesixstring
    I've built a Repository that gets user details Public Function GetUserByOpenID(ByVal openid As String) As User Implements IUserRepository.GetUserByOpenID Dim user = (From u In dc.Users Where u.OpenID = openid Select u).FirstOrDefault Return user End Function And I'd like to be able to pull those details down IF the user is logged in AND IF the cached data is null. What is the best way to create a User object that contains all of the users details, and persist it across the entire site for the duration of their visit? I Was trying this in my Global.asax, but I'm not really happy using Session variables. I'd rather have a single object with all the details inside. Private Sub BaseGlobal_AcquireRequestState(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.AcquireRequestState If Session("UserName") Is Nothing AndAlso User.Identity.IsAuthenticated Then Dim repo As UrbanNow.Core.IUserRepository = New UrbanNow.Core.UserRepository Dim _user As New UrbanNow.Core.User _user = repo.GetUserByOpenID(User.Identity.Name) Session("UserName") = _user.UserName() Session("UserID") = _user.ID End If End Sub

    Read the article

  • Is it possible to evaluate a JSP only once per session, and cache it after that?

    - by Bears will eat you
    My site has a nav menu that is dynamically built as a separate JSP, and included in most pages via <jsp:include />. The contents and styling of the menu are determined by which pages the user does and doesn't have access to. The set of accessible pages is retrieved from the database when a user logs in, and not during the course of a session. So, there's really no need to re-evaluate the nav menu code every time the user requests a page. Is there an easy way to generate the markup from the JSP only once per session, and cache/reuse it during the session?

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Best tool to understand source

    - by cache
    I have a source code for a project. I am working on porting it to another device as the current source code is for a linux environment. I am having some error on the newly ported code. So i was thinking it would be best to once again understand the whole source code and this will help me localise the errors. Now the problem is that i tried using 'gdb' for linux to debug the code but it does not help. So is there any tool that I can use to trace the program line by line ? By doing so i can understand the program flow. Please Help !

    Read the article

< Previous Page | 133 134 135 136 137 138 139 140 141 142 143 144  | Next Page >