Search Results

Search found 44 results on 2 pages for 'alois kraus'.

Page 1/2 | 1 2 | Next Page >

digital signature - detached Pkcs#7 to XML-DSIG

- by Alois

Hi! I am struggling with the following scenario: an XML-message is created client-side and digitally signed using mozilla's window.crypto.signText. After signing, the message and the signature are transmitted via a webservice (.net) to the server. Everything is fine until this point. on the server, the XML shall be included in another XML-document, which is publicly accessible. The signature should be published as well in order to grant non-repudiation. Q: Is there a smooth option to convert the detached Pkcs#7 into XML-DSIG (e.g. functionality within the .net framework)? Q2: Or is it possible to create the XML-DSIG already client-side without using external plugins? Tnx for your help! Alois Paulin

Read the article
What Every Developer Should Know About MSI Components

- by Alois Kraus

Hopefully nothing. But if you have to do more than simple XCopy deployment and you need to support updates, upgrades and perhaps side by side scenarios there is no way around MSI. You can create Msi files with a Visual Studio Setup project which is severely limited or you can use the Windows Installer Toolset. I cannot talk about WIX with my German colleagues because WIX has a very special meaning. It is funny to always use the long name when I talk about deployment possibilities. Alternatively you can buy commercial tools which help you to author Msi files but I am not sure how good they are. Given enough pain with existing solutions you can also learn the MSI Apis and create your own packaging solution. If I were you I would use either a commercial visual tool when you do easy deployments or use the free Windows Installer Toolset. Once you know the WIX schema you can create well formed wix xml files easily with any editor. Then you can “compile” from the wxs files your Msi package. Recently I had the “pleasure” to get my hands dirty with C++ (again) and the MSI technology. Installation is a complex topic but after several month of digging into arcane MSI issues I can safely say that there should exist an easier way to install and update files as today. I am not alone with this statement as John Robbins (creator of the cool tool Paraffin) states: “.. It's a brittle and scary API in Windows …”. To help other people struggling with installation issues I present you the advice I (and others) found useful and what will happen if you ignore this advice. What is a MSI file? A MSI file is basically a database with tables which reference each other to control how your un/installation should work. The basic idea is that you declare via these tables what you want to install and MSI controls the how to get your stuff onto or off your machine. Your “stuff” consists usually of files, registry keys, shortcuts and environment variables. Therefore the most important tables are File, Registry, Environment and Shortcut table which define what will be un/installed. The key to master MSI is that every resource (file, registry key ,…) is associated with a MSI component. The actual payload consists of compressed files in the CAB format which can either be embedded into the MSI file or reside beside the MSI file or in a subdirectory below it. To examine MSI files you need Orca a free MSI editor provided by MS. There is also another free editor called Super Orca which does support diffs between MSI and it does not lock the MSI files. But since Orca comes with a shell extension I tend to use only Orca because it is so easy to right click on a MSI file and open it with this tool. How Do I Install It? Double click it. This does work for fresh installations as well as major upgrades. Updates need to be installed via the command line via msiexec /i <msi> REINSTALL=ALL REINSTALLMODE=vomus This tells the installer to reinstall all already installed features (new features will NOT be installed). The reinstallmode letters do force an overwrite of the old cached package in the %WINDIR%\Installer folder. All files, shortcuts and registry keys are redeployed if they are missing or need to be replaced with a newer version. When things did go really wrong and you want to overwrite everything unconditionally use REINSTALLMODE=vamus. How To Enable MSI Logs? You can download a MSI from Microsoft which installs some registry keys to enable full MSI logging. The log files can be found in your %TEMP% folder and are called MSIxxxx.log. Alternatively you can add to your msiexec command line the option msiexec …. /l*vx <LogFileName> Personally I find it rather strange that * does not mean full logging. To really get all logs I need to add v and x which is documented in the msiexec help but I still find this behavior unintuitive. What are MSI components? The whole MSI logic is bound to the concept of MSI components. Nearly every msi table has a Component column which binds an installable resource to a component. Below are the screenshots of the FeatureComponents and Component table of an example MSI. The Feature table defines basically the feature hierarchy. To find out what belongs to a feature you need to look at the FeatureComponents table where for each feature the components are listed which will be installed when a feature is installed. The MSI components are defined in the Component table. This table has as first column the component name and as second column the component id which is a GUID. All resources you want to install belong to a MSI component. Therefore nearly all MSI tables have a Component_ column which contains the component name. If you look e.g. a the File table you see that every file belongs to a component which is true for all other tables which install resources. The component table is the glue between all other tables which contain the resources you want to install. So far so easy. Why is MSI then so complex? Most MSI problems arise from the fact that you did violate a MSI component rule in one or the other way. When you install a feature the reference count for all components belonging to this feature will increase by one. If your component is installed by more than one feature it will get a higher refcount. When you uninstall a feature its refcount will drop by one. Interesting things happen if the component reference count reaches zero: Then all associated resources will be deleted. That looks like a reasonable thing and it is. What it makes complex are the strange component rules you have to follow. Below are some important component rules from the Tao of the Windows Installer … Rule 16: Follow Component Rules Components are a very important part of the Installer technology. They are the means whereby the Installer manages the resources that make up your application. The SDK provides the following guidelines for creating components in your package: Never create two components that install a resource under the same name and target location. If a resource must be duplicated in multiple components, change its name or target location in each component. This rule should be applied across applications, products, product versions, and companies. Two components must not have the same key path file. This is a consequence of the previous rule. The key path value points to a particular file or folder belonging to the component that the installer uses to detect the component. If two components had the same key path file, the installer would be unable to distinguish which component is installed. Two components however may share a key path folder. Do not create a version of a component that is incompatible with all previous versions of the component. This rule should be applied across applications, products, product versions, and companies. Do not create components containing resources that will need to be installed into more than one directory on the user’s system. The installer installs all of the resources in a component into the same directory. It is not possible to install some resources into subdirectories. Do not include more than one COM server per component. If a component contains a COM server, this must be the key path for the component. Do not specify more than one file per component as a target for the Start menu or a Desktop shortcut. … And these rules do not even talk about component ids, update packages and upgrades which you need to understand as well. Lets suppose you install two MSIs (MSI1 and MSI2) which have the same ComponentId but different component names. Both do install the same file. What will happen when you uninstall MSI2? Hm the file should stay there. But the component names are different. Yes and yes. But MSI uses not use the component name as key for the refcount. Instead the ComponentId column of the Component table which contains a GUID is used as identifier under which the refcount is stored. The components Comp1 and Comp2 are identical from the MSI perspective. After the installation of both MSIs the Component with the Id {100000….} has a refcount of two. After uninstallation of one MSI there is still a refcount of one which drops to zero just as expected when we uninstall the last msi. Then the file which was the same for both MSIs is deleted. You should remember that MSI keeps a refcount across MSIs for components with the same component id. MSI does manage components not the resources you did install. The resources associated with a component are then and only then deleted when the refcount of the component reaches zero. The dependencies between features, components and resources can be described as relations. m,k are numbers >= 1, n can be 0. Inside a MSI the following relations are valid Feature 1 –> n Components Component 1 –> m Features Component 1 –> k Resources These relations express that one feature can install several components and features can share components between them. Every (meaningful) component will install at least one resource which means that its name (primary key to stay in database speak) does occur in some other table in the Component column as value which installs some resource. Lets make it clear with an example. We want to install with the feature MainFeature some files a registry key and a shortcut. We can then create components Comp1..3 which are referenced by the resources defined in the corresponding tables. Feature Component Registry File Shortcuts MainFeature Comp1 RegistryKey1 MainFeature Comp2 File.txt MainFeature Comp3 File2.txt Shortcut to File2.txt It is illegal that the same resource is part of more than one component since this would break the refcount mechanism. Lets illustrate this: Feature ComponentId Resource Reference Count Feature1 {1000-…} File1.txt 1 Feature2 {2000-….} File1.txt 1 The installation part works well but what happens when you uninstall Feature2? Component {20000…} gets a refcount of zero where MSI deletes all resources belonging to this component. In this case File1.txt will be deleted. But Feature1 still has another component {10000…} with a refcount of one which means that the file was deleted too early. You just have ruined your installation. To fix it you then need to click on the Repair button under Add/Remove Programs to let MSI reinstall any missing registry keys, files or shortcuts. The vigilant reader might has noticed that there is more in the Component table. Beside its name and GUID it has also an installation directory, attributes and a KeyPath. The KeyPath is a reference to a file or registry key which is used to detect if the component is already installed. This becomes important when you repair or uninstall a component. To find out if the component is already installed MSI checks if the registry key or file referenced by the KeyPath property does exist. When it does not exist it assumes that it was either already uninstalled (can lead to problems during uninstall) or that it is already installed and all is fine. Why is this detail so important? Lets put all files into one component. The KeyPath should be then one of the files of your component to check if it was installed or not. When your installation becomes corrupt because a file was deleted you cannot repair it with the Repair button under Add/Remove Programs because MSI checks the component integrity via the Resource referenced by its KeyPath. As long as you did not delete the KeyPath file MSI thinks all resources with your component are installed and never executes any repair action. You get even more trouble when you try to remove files during an upgrade (you cannot remove files during an update) from your super component which contains all files. The only way out and therefore best practice is to assign for every resource you want to install an extra component. This ensures painless updatability and repairs and you have much less effort to remove specific files during an upgrade. In effect you get this best practice relation Feature 1 –> n Components Component 1 –> 1 Resources MSI Component Rules Rule 1 – One component per resource Every resource you want to install (file, registry key, value, environment value, shortcut, directory, …) must get its own component which does never change between versions as long as the install location is the same. Penalty If you add more than one resources to a component you will break the repair capability of MSI because the KeyPath is used to check if the component needs repair. MSI ComponentId Files MSI 1.0 {1000} File1-5 MSI 2.0 {2000} File2-5 You want to remove File1 in version 2.0 of your MSI. Since you want to keep the other files you create a new component and add them there. MSI will delete all files if the component refcount of {1000} drops to zero. The files you want to keep are added to the new component {2000}. Ok that does work if your upgrade does uninstall the old MSI first. This will cause the refcount of all previously installed components to reach zero which means that all files present in version 1.0 are deleted. But there is a faster way to perform your upgrade by first installing your new MSI and then remove the old one. If you choose this upgrade path then you will loose File1-5 after your upgrade and not only File1 as intended by your new component design. Rule 2 – Only add, never remove resources from a component If you did follow rule 1 you will not need Rule 2. You can add in a patch more resources to one component. That is ok. But you can never remove anything from it. There are tricky ways around that but I do not want to encourage bad component design. Penalty Lets assume you have 2 MSI files which install under the same component one file MSI1 MSI2 {1000} - ComponentId {1000} – ComponentId File1.txt File2.txt When you install and uninstall both MSIs you will end up with an installation where either File1 or File2 will be left. Why? It seems that MSI does not store the resources associated with each component in its internal database. Instead Windows will simply query the MSI that is currently uninstalled for all resources belonging to this component. Since it will find only one file and not two it will only uninstall one file. That is the main reason why you never can remove resources from a component! Rule 3 Never Remove A Component From an Update MSI. This is the same as if you change the GUID of a component by accident for your new update package. The resulting update package will not contain all components from the previously installed package. Penalty When you remove a component from a feature MSI will set the feature state during update to Advertised and log a warning message into its log file when you did enable MSI logging. SELMGR: ComponentId '{2DCEA1BA-3E27-E222-484C-D0D66AEA4F62}' is registered to feature 'xxxxxxx, but is not present in the Component table. Removal of components from a feature is not supported! MSI (c) (24:44) [07:53:13:436]: SELMGR: Removal of a component from a feature is not supported Advertised means that MSI treats all components of this feature as not installed. As a consequence during uninstall nothing will be removed since it is not installed! This is not only bad because uninstall does no longer work but this feature will also not get the required patches. All other features which have followed component versioning rules for update packages will be updated but the one faulty feature will not. This results in very hard to find bugs why an update was only partially successful. Things got better with Windows Installer 4.5 but you cannot rely on that nobody will use an older installer. It is a good idea to add to your update msiexec call MSIENFORCEUPGRADECOMPONENTRULES=1 which will abort the installation if you did violate this rule.

Read the article
Run Your Tests With Any NUnit Version

- by Alois Kraus

I always thought that the NUnit test runners and the test assemblies need to reference the same NUnit.Framework version. I wanted to be able to run my test assemblies with the newest GUI runner (currently 2.5.3). Ok so all I need to do is to reference both NUnit versions the newest one and the official for the current project. There is a nice article form Kent Bogart online how to reference the same assembly multiple times with different versions. The magic works by referencing one NUnit assembly with an alias which does prefix all types inside it. Then I could decorate my tests with the TestFixture and Test attribute from both NUnit versions and everything worked fine except that this was ugly. After playing a little bit around to make it simpler I found that I did not need to reference both NUnit.Framework assemblies. The test runners do not require the TestFixture and Test attribute in their specific version. That is really neat since the test runners are instructed by attributes what to do in a declarative way there is really no need to tie the runners to a specific version. At its core NUnit has this little method hidden to find matching TestFixtures and Tests public bool CanBuildFrom(Type type) { if (!(!type.IsAbstract || type.IsSealed)) { return false; } return (((Reflect.HasAttribute(type, "NUnit.Framework.TestFixtureAttribute", true) || Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TestAttribute" , true)) || Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TestCaseAttribute" , true)) || Reflect.HasMethodWithAttribute(type, "NUnit.Framework.TheoryAttribute" , true)); } That is versioning and backwards compatibility at its best. I tell NUnit what to do by decorating my tests classes with NUnit Attributes and the runner executes my intent without the need to bind me to a specific version. The contract between NUnit versions is actually a bit more complex (think of AssertExceptions) but this is also handled nicely by using not the concrete type but simply to check for the catched exception type by string. What can we learn from this? Versioning can be easy if the contract is small and the users of your library use it in a declarative way (Attributes). Everything beyond it will force you to reference several versions of the same assembly with all its consequences. Type equality is lost between versions so none of your casts will work. That means that you cannot simply use IBigInterface in two versions. You will need a wrapper to call the correct versioned one. To get out of this mess you can use one (and only one) version agnostic driver to encapsulate your business logic from the concrete versions. This is of course more work but as NUnit shows it can be easy. Simplicity is therefore not a nice thing to have but also requirement number one if you intend to make things more complex in version two and want to support any version (older and newer). Any interaction model above easy will not be maintainable. There are different approached to versioning. Below are my own personal observations how versioning works within the .NET Framwork and NUnit. Versioning Models 1. Bug Fixing and New Isolated Features When you only need to fix bugs there is no need to break anything. This is especially true when you have a big API surface. Microsoft did this with the .NET Framework 3.0 which did leave the CLR as is but delivered new assemblies for the features WPF, WCF and Windows Workflow Foundations. Their basic model was that the .NET 2.0 assemblies were declared as red assemblies which must not change (well mostly but each change was carefully reviewed to minimize the risk of breaking changes as much as possible) whereas the new green assemblies of .NET 3,3.5 did not have such obligations since they did implement new unrelated features which did not have any impact on the red assemblies. This is versioning strategy aimed at maximum compatibility and the delivery of new unrelated features. If you have a big API surface you should strive hard to do the same or you will break your customers code with every release. 2. New Breaking Features There are times when really new things need to be added to an existing product. The .NET Framework 4.0 did change the CLR in many ways which caused subtle different behavior although the API´s remained largely unchanged. Sometimes it is possible to simply recompile an application to make it work (e.g. changed method signature void Func() –> bool Func()) but behavioral changes need much more thought and cannot be automated. To minimize the impact .NET 2.0,3.0,3.5 applications will not automatically use the .NET 4.0 runtime when installed but they will keep using the “old” one. What is interesting is that a side by side execution model of both CLR versions (2 and 4) within one process is possible. Key to success was total isolation. You will have 2 GCs, 2 JIT compilers, 2 finalizer threads within one process. The two .NET runtimes cannot talk (except via the usual IPC mechanisms) to each other. Both runtimes share nothing and run independently within the same process. This enables Explorer plugins written for the CLR 2.0 to work even when a CLR 4 plugin is already running inside the Explorer process. The price for isolation is an increased memory footprint because everything is loaded and running two times. 3. New Non Breaking Features It really depends where you break things. NUnit has evolved and many different Assert, Expect… methods have been added. These changes are all localized in the NUnit.Framework assembly which can be easily extended. As long as the test execution contract (TestFixture, Test, AssertException) remains stable it is possible to write test executors which can run tests written for NUnit 10 because the execution contract has not changed. It is possible to write software which executes other components in a version independent way but this is only feasible if the interaction model is relatively simple. Versioning software is hard and it looks like it will remain hard since you suddenly work in a severely constrained environment when you try to innovate and to keep everything backwards compatible at the same time. These are contradicting goals and do not play well together. The easiest way out of this is to carefully watch what your customers are doing with your software. Minimizing the impact is much easier when you do not need to guess how many people will be broken when this or that is removed.

Read the article
Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article
Refactor This (Ugly Code)!

- by Alois Kraus

Ayende has put on his blog some ugly code to refactor. First and foremost it is nearly impossible to reason about other peoples code without knowing the driving forces behind the current code. It is certainly possible to make it much cleaner when potential sources of errors cannot happen in the first place due to good design. I can see what the intention of the code is but I do not know about every brittle detail if I am allowed to reorder things here and there to simplify things. So I decided to make it much simpler by identifying the different responsibilities of the methods and encapsulate it in different classes. The code we need to refactor seems to deal with a handler after a message has been sent to a message queue. The handler does complete the current transaction if there is any and does handle any errors happening there. If during the the completion of the transaction errors occur the transaction is at least disposed. We can enter the handler already in a faulty state where we try to deliver the complete event in any case and signal a failure event and try to resend the message again to the queue if it was not inside a transaction. All is decorated with many try/catch blocks, duplicated code and some state variables to route the program flow. It is hard to understand and difficult to reason about. In other words: This code is a mess and could be written by me if I was under pressure. Here comes to code we want to refactor: private void HandleMessageCompletion( Message message, TransactionScope tx, OpenedQueue messageQueue, Exception exception, Action<CurrentMessageInformation, Exception> messageCompleted, Action<CurrentMessageInformation> beforeTransactionCommit) { var txDisposed = false; if (exception == null) { try { if (tx != null) { if (beforeTransactionCommit != null) beforeTransactionCommit(currentMessageInformation); tx.Complete(); tx.Dispose(); txDisposed = true; } try { if (messageCompleted != null) messageCompleted(currentMessageInformation, exception); } catch (Exception e) { Trace.TraceError("An error occured when raising the MessageCompleted event, the error will NOT affect the message processing"+ e); } return; } catch (Exception e) { Trace.TraceWarning("Failed to complete transaction, moving to error mode"+ e); exception = e; } } try { if (txDisposed == false && tx != null) { Trace.TraceWarning("Disposing transaction in error mode"); tx.Dispose(); } } catch (Exception e) { Trace.TraceWarning("Failed to dispose of transaction in error mode."+ e); } if (message == null) return; try { if (messageCompleted != null) messageCompleted(currentMessageInformation, exception); } catch (Exception e) { Trace.TraceError("An error occured when raising the MessageCompleted event, the error will NOT affect the message processing"+ e); } try { var copy = MessageProcessingFailure; if (copy != null) copy(currentMessageInformation, exception); } catch (Exception moduleException) { Trace.TraceError("Module failed to process message failure: " + exception.Message+ moduleException); } if (messageQueue.IsTransactional == false)// put the item back in the queue { messageQueue.Send(message); } } You can see quite some processing and handling going on there. Yes this looks like real world code one did put together to make things work and he does not trust his callbacks. I guess these are event handlers which are optional and the delegates were extracted from an event to call them back later when necessary. Lets see what the author of this code did intend: private void HandleMessageCompletion( TransactionHandler transactionHandler, MessageCompletionHandler handler, CurrentMessageInformation messageInfo, ErrorCollector errors ) { // commit current pending transaction transactionHandler.CallHandlerAndCommit(messageInfo, errors); // We have an error for a null message do not send completion event if (messageInfo.CurrentMessage == null) return; // Send completion event in any case regardless of errors handler.OnMessageCompleted(messageInfo, errors); // put message back if queue is not transactional transactionHandler.ResendMessageOnError(messageInfo.CurrentMessage, errors); } I did not bother to write the intention here again since the code should be pretty self explaining by now. I have used comments to explain the still nontrivial procedure step by step revealing the real intention about all this complex program flow. The original complexity of the problem domain does not go away but by applying the techniques of SRP (Single Responsibility Principle) and some functional style but we can abstract the necessary complexity away in useful abstractions which make it much easier to reason about it. Since most of the method seems to deal with errors I thought it was a good idea to encapsulate the error state of our current message in an ErrorCollector object which stores all exceptions in a list along with a description what the error all was about in the exception itself. We can log it later or not depending on the log level or whatever. It is really just a simple list that encapsulates the current error state. class ErrorCollector { List<Exception> _Errors = new List<Exception>(); public void Add(Exception ex, string description) { ex.Data["Description"] = description; _Errors.Add(ex); } public Exception Last { get { return _Errors.LastOrDefault(); } } public bool HasError { get { return _Errors.Count > 0; } } } Since the error state is global we have two choices to store a reference in the other helper objects (TransactionHandler and MessageCompletionHandler)or pass it to the method calls when necessary. I did chose the latter one because a second argument does not hurt and makes it easier to reason about the overall state while the helper objects remain stateless and immutable which makes the helper objects much easier to understand and as a bonus thread safe as well. This does not mean that the stored member variables are stateless or thread safe as well but at least our helper classes are it. Most of the complexity is located the transaction handling I consider as a separate responsibility that I delegate to the TransactionHandler which does nothing if there is no transaction or Call the Before Commit Handler Commit Transaction Dispose Transaction if commit did throw In fact it has a second responsibility to resend the message if the transaction did fail. I did see a good fit there since it deals with transaction failures. class TransactionHandler { TransactionScope _Tx; Action<CurrentMessageInformation> _BeforeCommit; OpenedQueue _MessageQueue; public TransactionHandler(TransactionScope tx, Action<CurrentMessageInformation> beforeCommit, OpenedQueue messageQueue) { _Tx = tx; _BeforeCommit = beforeCommit; _MessageQueue = messageQueue; } public void CallHandlerAndCommit(CurrentMessageInformation currentMessageInfo, ErrorCollector errors) { if (_Tx != null && !errors.HasError) { try { if (_BeforeCommit != null) { _BeforeCommit(currentMessageInfo); } _Tx.Complete(); _Tx.Dispose(); } catch (Exception ex) { errors.Add(ex, "Failed to complete transaction, moving to error mode"); Trace.TraceWarning("Disposing transaction in error mode"); try { _Tx.Dispose(); } catch (Exception ex2) { errors.Add(ex2, "Failed to dispose of transaction in error mode."); } } } } public void ResendMessageOnError(Message message, ErrorCollector errors) { if (errors.HasError && !_MessageQueue.IsTransactional) { _MessageQueue.Send(message); } } } If we need to change the handling in the future we have a much easier time to reason about our application flow than before. After we did complete our transaction and called our callback we can call the completion handler which is the main purpose of the HandleMessageCompletion method after all. The responsiblity o the MessageCompletionHandler is to call the completion callback and the failure callback when some error has occurred. class MessageCompletionHandler { Action<CurrentMessageInformation, Exception> _MessageCompletedHandler; Action<CurrentMessageInformation, Exception> _MessageProcessingFailure; public MessageCompletionHandler(Action<CurrentMessageInformation, Exception> messageCompletedHandler, Action<CurrentMessageInformation, Exception> messageProcessingFailure) { _MessageCompletedHandler = messageCompletedHandler; _MessageProcessingFailure = messageProcessingFailure; } public void OnMessageCompleted(CurrentMessageInformation currentMessageInfo, ErrorCollector errors) { try { if (_MessageCompletedHandler != null) { _MessageCompletedHandler(currentMessageInfo, errors.Last); } } catch (Exception ex) { errors.Add(ex, "An error occured when raising the MessageCompleted event, the error will NOT affect the message processing"); } if (errors.HasError) { SignalFailedMessage(currentMessageInfo, errors); } } void SignalFailedMessage(CurrentMessageInformation currentMessageInfo, ErrorCollector errors) { try { if (_MessageProcessingFailure != null) _MessageProcessingFailure(currentMessageInfo, errors.Last); } catch (Exception moduleException) { errors.Add(moduleException, "Module failed to process message failure"); } } } If for some reason I did screw up the logic and we need to call the completion handler from our Transaction handler we can simple add to the CallHandlerAndCommit method a third argument to the MessageCompletionHandler and we are fine again. If the logic becomes even more complex and we need to ensure that the completed event is triggered only once we have now one place the completion handler to capture the state. During this refactoring I simple put things together that belong together and came up with useful abstractions. If you look at the original argument list of the HandleMessageCompletion method I have put many things together: Original Arguments New Arguments Encapsulate Message message CurrentMessageInformation messageInfo Message message TransactionScope tx Action<CurrentMessageInformation> beforeTransactionCommit OpenedQueue messageQueue TransactionHandler transactionHandler TransactionScope tx OpenedQueue messageQueue Action<CurrentMessageInformation> beforeTransactionCommit Exception exception, ErrorCollector errors Action<CurrentMessageInformation, Exception> messageCompleted MessageCompletionHandler handler Action<CurrentMessageInformation, Exception> messageCompleted Action<CurrentMessageInformation, Exception> messageProcessingFailure The reason is simple: Put the things that have relationships together and you will find nearly automatically useful abstractions. I hope this makes sense to you. If you see a way to make it even more simple you can show Ayende your improved version as well.

Read the article
Do Not Optimize Without Measuring

- by Alois Kraus

Recently I had to do some performance work which included reading a lot of code. It is fascinating with what ideas people come up to solve a problem. Especially when there is no problem. When you look at other peoples code you will not be able to tell if it is well performing or not by reading it. You need to execute it with some sort of tracing or even better under a profiler. The first rule of the performance club is not to think and then to optimize but to measure, think and then optimize. The second rule is to do this do this in a loop to prevent slipping in bad things for too long into your code base. If you skip for some reason the measure step and optimize directly it is like changing the wave function in quantum mechanics. This has no observable effect in our world since it does represent only a probability distribution of all possible values. In quantum mechanics you need to let the wave function collapse to a single value. A collapsed wave function has therefore not many but one distinct value. This is what we physicists call a measurement. If you optimize your application without measuring it you are just changing the probability distribution of your potential performance values. Which performance your application actually has is still unknown. You only know that it will be within a specific range with a certain probability. As usual there are unlikely values within your distribution like a startup time of 20 minutes which should only happen once in 100 000 years. 100 000 years are a very short time when the first customer tries your heavily distributed networking application to run over a slow WIFI network… What is the point of this? Every programmer/architect has a mental performance model in his head. A model has always a set of explicit preconditions and a lot more implicit assumptions baked into it. When the model is good it will help you to think of good designs but it can also be the source of problems. In real world systems not all assumptions of your performance model (implicit or explicit) hold true any longer. The only way to connect your performance model and the real world is to measure it. In the WIFI example the model did assume a low latency high bandwidth LAN connection. If this assumption becomes wrong the system did have a drastic change in startup time. Lets look at a example. Lets assume we want to cache some expensive UI resource like fonts objects. For this undertaking we do create a Cache class with the UI themes we want to support. Since Fonts are expensive objects we do create it on demand the first time the theme is requested. A simple example of a Theme cache might look like this: using System; using System.Collections.Generic; using System.Drawing; struct Theme { public Color Color; public Font Font; } static class ThemeCache { static Dictionary<string, Theme> _Cache = new Dictionary<string, Theme> { {"Default", new Theme { Color = Color.AliceBlue }}, {"Theme12", new Theme { Color = Color.Aqua }}, }; public static Theme Get(string theme) { Theme cached = _Cache[theme]; if (cached.Font == null) { Console.WriteLine("Creating new font"); cached.Font = new Font("Arial", 8); } return cached; } } class Program { static void Main(string[] args) { Theme item = ThemeCache.Get("Theme12"); item = ThemeCache.Get("Theme12"); } } This cache does create font objects only once since on first retrieve of the Theme object the font is added to the Theme object. When we let the application run it should print “Creating new font” only once. Right? Wrong! The vigilant readers have spotted the issue already. The creator of this cache class wanted to get maximum performance. So he decided that the Theme object should be a value type (struct) to not put too much pressure on the garbage collector. The code Theme cached = _Cache[theme]; if (cached.Font == null) { Console.WriteLine("Creating new font"); cached.Font = new Font("Arial", 8); } does work with a copy of the value stored in the dictionary. This means we do mutate a copy of the Theme object and return it to our caller. But the original Theme object in the dictionary will have always null for the Font field! The solution is to change the declaration of struct Theme to class Theme or to update the theme object in the dictionary. Our cache as it is currently is actually a non caching cache. The funny thing was that I found out with a profiler by looking at which objects where finalized. I found way too many font objects to be finalized. After a bit debugging I found the allocation source for Font objects was this cache. Since this cache was there for years it means that the cache was never needed since I found no perf issue due to the creation of font objects. the cache was never profiled if it did bring any performance gain. to make the cache beneficial it needs to be accessed much more often. That was the story of the non caching cache. Next time I will write something something about measuring.

Read the article
Find The Bug

- by Alois Kraus

What does this code print and why? HashSet<int> set = new HashSet<int>(); int[] data = new int[] { 1, 2, 1, 2 }; var unique = from i in data where set.Add(i) select i; // Compiles to: var unique = Enumerable.Where(data, (i) => set.Add(i)); foreach (var i in unique) { Console.WriteLine("First: {0}", i); } foreach (var i in unique) { Console.WriteLine("Second: {0}", i); } The output is: First: 1 First: 2 Why is there no output of the second loop? The reason is that LINQ does not cache the results of the collection but it does recalculate the contents for every new enumeration again. Since I have used state (the Hashset does decide which entries are part of the output) I do arrive with an empty sequence since Add of the Hashset will return false for all values I have already passed in leaving nothing to return a second time. The solution is quite simple: Use the Distinct extension method or cache the results by calling .ToList() or ToArray() for the result of the LINQ query. Lession Learned: Do never forget to think about state in Where clauses!

Read the article
Why Is Faulty Behaviour In The .NET Framework Not Fixed?

- by Alois Kraus

Here is the scenario: You have a Windows Form Application that calls a method via Invoke or BeginInvoke which throws exceptions. Now you want to find out where the error did occur and how the method has been called. Here is the output we do get when we call Begin/EndInvoke or simply Invoke The actual code that was executed was like this: private void cInvoke_Click(object sender, EventArgs e) { InvokingFunction(CallMode.Invoke); } [MethodImpl(MethodImplOptions.NoInlining)] void InvokingFunction(CallMode mode) { switch (mode) { case CallMode.Invoke: this.Invoke(new MethodInvoker(GenerateError)); The faulting method is called GenerateError which does throw a NotImplementedException exception and wraps it in a NotSupportedException. [MethodImpl(MethodImplOptions.NoInlining)] void GenerateError() { F1(); } private void F1() { try { F2(); } catch (Exception ex) { throw new NotSupportedException("Outer Exception", ex); } } private void F2() { throw new NotImplementedException("Inner Exception"); } It is clear that the method F2 and F1 did actually throw these exceptions but we do not see them in the call stack. If we directly call the InvokingFunction and catch and print the exception we can find out very easily how we did get into this situation. We see methods F1,F2,GenerateError and InvokingFunction directly in the stack trace and we see that actually two exceptions did occur. Here is for comparison what we get from Invoke/EndInvoke System.NotImplementedException: Inner Exception StackTrace: at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous) at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args) at WindowsFormsApplication1.AppForm.InvokingFunction(CallMode mode) at WindowsFormsApplication1.AppForm.cInvoke_Click(Object sender, EventArgs e) at System.Windows.Forms.Control.OnClick(EventArgs e) at System.Windows.Forms.Button.OnClick(EventArgs e) The exception message is kept but the stack starts running from our Invoke call and not from the faulting method F2. We have therefore no clue where this exception did occur! The stack starts running at the method MarshaledInvoke because the exception is rethrown with the throw catchedException which resets the stack trace. That is bad but things are even worse because if previously lets say 5 exceptions did occur .NET will return only the first (innermost) exception. That does mean that we do not only loose the original call stack but all other exceptions and all data contained therein as well. It is a pity that MS does know about this and simply closes this issue as not important. Programmers will play a lot more around with threads than before thanks to TPL, PLINQ that do come with .NET 4. Multithreading is hyped quit a lot in the press and everybody wants to use threads. But if the .NET Framework makes it nearly impossible to track down the easiest UI multithreading issue I have a problem with that. The problem has been reported but obviously not been solved. .NET 4 Beta 2 did not have changed that dreaded GetBaseException call in MarshaledInvoke to return only the innermost exception of the complete exception stack. It is really time to fix this. WPF on the other hand does the right thing and wraps the exceptions inside a TargetInvocationException which makes much more sense. But Not everybody uses WPF for its daily work and Windows forms applications will still be used for a long time. Below is the code to repro the issues shown and how the exceptions can be rendered in a meaningful way. The default Exception.ToString implementation generates a hard to interpret stack if several nested exceptions did occur. using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Linq; using System.Text; using System.Windows.Forms; using System.Threading; using System.Globalization; using System.Runtime.CompilerServices; namespace WindowsFormsApplication1 { public partial class AppForm : Form { enum CallMode { Direct = 0, BeginInvoke = 1, Invoke = 2 }; public AppForm() { InitializeComponent(); Thread.CurrentThread.CurrentUICulture = CultureInfo.InvariantCulture; Application.ThreadException += new System.Threading.ThreadExceptionEventHandler(Application_ThreadException); } void Application_ThreadException(object sender, System.Threading.ThreadExceptionEventArgs e) { cOutput.Text = PrintException(e.Exception, 0, null).ToString(); } private void cDirectUnhandled_Click(object sender, EventArgs e) { InvokingFunction(CallMode.Direct); } private void cDirectCall_Click(object sender, EventArgs e) { try { InvokingFunction(CallMode.Direct); } catch (Exception ex) { cOutput.Text = PrintException(ex, 0, null).ToString(); } } private void cInvoke_Click(object sender, EventArgs e) { InvokingFunction(CallMode.Invoke); } private void cBeginInvokeCall_Click(object sender, EventArgs e) { InvokingFunction(CallMode.BeginInvoke); } [MethodImpl(MethodImplOptions.NoInlining)] void InvokingFunction(CallMode mode) { switch (mode) { case CallMode.Direct: GenerateError(); break; case CallMode.Invoke: this.Invoke(new MethodInvoker(GenerateError)); break; case CallMode.BeginInvoke: IAsyncResult res = this.BeginInvoke(new MethodInvoker(GenerateError)); this.EndInvoke(res); break; } } [MethodImpl(MethodImplOptions.NoInlining)] void GenerateError() { F1(); } private void F1() { try { F2(); } catch (Exception ex) { throw new NotSupportedException("Outer Exception", ex); } } private void F2() { throw new NotImplementedException("Inner Exception"); } StringBuilder PrintException(Exception ex, int identLevel, StringBuilder sb) { StringBuilder builtStr = sb; if( builtStr == null ) builtStr = new StringBuilder(); if( ex == null ) return builtStr; WriteLine(builtStr, String.Format("{0}: {1}", ex.GetType().FullName, ex.Message), identLevel); WriteLine(builtStr, String.Format("StackTrace: {0}", ShortenStack(ex.StackTrace)), identLevel + 1); builtStr.AppendLine(); return PrintException(ex.InnerException, ++identLevel, builtStr); } void WriteLine(StringBuilder sb, string msg, int identLevel) { foreach (string trimmedLine in SplitToLines(msg) .Select( (line) => line.Trim()) ) { for (int i = 0; i < identLevel; i++) sb.Append('\t'); sb.Append(trimmedLine); sb.AppendLine(); } } string ShortenStack(string stack) { int nonAppFrames = 0; // Skip stack frames not part of our app but include two foreign frames and skip the rest // If our stack frame is encountered reset counter to 0 return SplitToLines(stack) .Where((line) => { nonAppFrames = line.Contains("WindowsFormsApplication1") ? 0 : nonAppFrames + 1; return nonAppFrames < 3; }) .Select((line) => line) .Aggregate("", (current, line) => current + line + Environment.NewLine); } static char[] NewLines = Environment.NewLine.ToCharArray(); string[] SplitToLines(string str) { return str.Split(NewLines, StringSplitOptions.RemoveEmptyEntries); } } }

Read the article
Exception Handling Differences Between 32/64 Bit

- by Alois Kraus

I do quite a bit of debugging .NET applications but from time to time I see things that are impossible (at a first look). I may ask you dear reader what your mental exception handling model is. Exception handling is easy after all right? Lets suppose the following code: private void F1(object sender, EventArgs e) { try { F2(); } catch (Exception ex) { throw new Exception("even worse Exception"); } } private void F2() { try { F3(); } finally { throw new Exception("other exception"); } } private void F3() { throw new NotImplementedException(); } What will the call stack look like when you break into the catch(Exception) clause in Windbg (32 and 64 bit on .NET 3.5 SP1)? The mental model I have is that when an exception is thrown the stack frames are unwound until the catch handler can execute. An exception does propagate the call chain upwards. So when F3 does throw an exception the control flow will resume at the finally handler in F2 which does throw another exception hiding the original one (that is nasty) and then the new Exception will be catched in F1 where the catch handler is executed. So we should see in the catch handler in F1 as call stack only the F1 stack frame right? Well lets try it out in Windbg. For this I created a simple Windows Forms application with one button which does execute the F1 method in its click handler. When you compile the application for 64 bit and the catch handler is reached you will find with the following commands in Windbg Load sos extension from the same path where mscorwks was loaded in the current process .loadby sos mscorwks Beak on clr exceptions sxe clr Continue execution g Dump mixed call stack container C++ and .NET Stacks interleaved 0:000> !DumpStack OS Thread Id: 0x1d8 (0) Child-SP RetAddr Call Site 00000000002c88c0 000007fefa68f0bd KERNELBASE!RaiseException+0x39 00000000002c8990 000007fefac42ed0 mscorwks!RaiseTheExceptionInternalOnly+0x295 00000000002c8a60 000007ff005dd7f4 mscorwks!JIT_Throw+0x130 00000000002c8c10 000007fefa6942e1 WindowsFormsApplication1!WindowsFormsApplication1.Form1.F1(System.Object, System.EventArgs)+0xb4 00000000002c8c60 000007fefa661012 mscorwks!ExceptionTracker::CallHandler+0x145 00000000002c8d60 000007fefa711a72 mscorwks!ExceptionTracker::CallCatchHandler+0x9e 00000000002c8df0 0000000077b055cd mscorwks!ProcessCLRException+0x25e 00000000002c8e90 0000000077ae55f8 ntdll!RtlpExecuteHandlerForUnwind+0xd 00000000002c8ec0 000007fefa637c1a ntdll!RtlUnwindEx+0x539 00000000002c9560 000007fefa711a21 mscorwks!ClrUnwindEx+0x36 00000000002c9a70 0000000077b0554d mscorwks!ProcessCLRException+0x20d 00000000002c9b10 0000000077ae5d1c ntdll!RtlpExecuteHandlerForException+0xd 00000000002c9b40 0000000077b1fe48 ntdll!RtlDispatchException+0x3cb 00000000002ca220 000007fefdaeaa7d ntdll!KiUserExceptionDispatcher+0x2e 00000000002ca7e0 000007fefa68f0bd KERNELBASE!RaiseException+0x39 00000000002ca8b0 000007fefac42ed0 mscorwks!RaiseTheExceptionInternalOnly+0x295 00000000002ca980 000007ff005dd8df mscorwks!JIT_Throw+0x130 00000000002cab30 000007fefa6942e1 WindowsFormsApplication1!WindowsFormsApplication1.Form1.F2()+0x9f 00000000002cab80 000007fefa71b5b3 mscorwks!ExceptionTracker::CallHandler+0x145 00000000002cac80 000007fefa70dcd0 mscorwks!ExceptionTracker::ProcessManagedCallFrame+0x683 00000000002caed0 000007fefa7119af mscorwks!ExceptionTracker::ProcessOSExceptionNotification+0x430 00000000002cbd90 0000000077b055cd mscorwks!ProcessCLRException+0x19b 00000000002cbe30 0000000077ae55f8 ntdll!RtlpExecuteHandlerForUnwind+0xd 00000000002cbe60 000007fefa637c1a ntdll!RtlUnwindEx+0x539 00000000002cc500 000007fefa711a21 mscorwks!ClrUnwindEx+0x36 00000000002cca10 0000000077b0554d mscorwks!ProcessCLRException+0x20d 00000000002ccab0 0000000077ae5d1c ntdll!RtlpExecuteHandlerForException+0xd 00000000002ccae0 0000000077b1fe48 ntdll!RtlDispatchException+0x3cb 00000000002cd1c0 000007fefdaeaa7d ntdll!KiUserExceptionDispatcher+0x2e 00000000002cd780 000007fefa68f0bd KERNELBASE!RaiseException+0x39 00000000002cd850 000007fefac42ed0 mscorwks!RaiseTheExceptionInternalOnly+0x295 00000000002cd920 000007ff005dd968 mscorwks!JIT_Throw+0x130 00000000002cdad0 000007ff005dd875 WindowsFormsApplication1!WindowsFormsApplication1.Form1.F3()+0x48 00000000002cdb10 000007ff005dd786 WindowsFormsApplication1!WindowsFormsApplication1.Form1.F2()+0x35 00000000002cdb60 000007ff005dbe6a WindowsFormsApplication1!WindowsFormsApplication1.Form1.F1(System.Object, System.EventArgs)+0x46 00000000002cdbc0 000007ff005dd452 System_Windows_Forms!System.Windows.Forms.Control.OnClick(System.EventArgs)+0x5a Hm okaaay. I see my method F1 two times in this call stack. Looks like we did get some recursion bug. But that can´t be given the obvious code above. Let´s try the same thing in a 32 bit process. 0:000> !DumpStack OS Thread Id: 0x33e4 (0) Current frame: KERNELBASE!RaiseException+0x58 ChildEBP RetAddr Caller,Callee 0028ed38 767db727 KERNELBASE!RaiseException+0x58, calling ntdll!RtlRaiseException 0028ed4c 68b9008c mscorwks!Binder::RawGetClass+0x20, calling mscorwks!Module::LookupTypeDef 0028ed5c 68b904ff mscorwks!Binder::IsClass+0x23, calling mscorwks!Binder::RawGetClass 0028ed68 68bfb96f mscorwks!Binder::IsException+0x14, calling mscorwks!Binder::IsClass 0028ed78 68bfb996 mscorwks!IsExceptionOfType+0x23, calling mscorwks!Binder::IsException 0028ed80 68bfbb1c mscorwks!RaiseTheExceptionInternalOnly+0x2a8, calling KERNEL32!RaiseExceptionStub 0028eda8 68ba0713 mscorwks!Module::ResolveStringRef+0xe0, calling mscorwks!BaseDomain::GetStringObjRefPtrFromUnicodeString 0028edc8 68b91e8d mscorwks!SetObjectReferenceUnchecked+0x19 0028ede0 68c8e910 mscorwks!JIT_Throw+0xfc, calling mscorwks!RaiseTheExceptionInternalOnly 0028ee44 68c8e734 mscorwks!JIT_StrCns+0x22, calling mscorwks!LazyMachStateCaptureState 0028ee54 68c8e865 mscorwks!JIT_Throw+0x1e, calling mscorwks!LazyMachStateCaptureState 0028eea4 02ffaecd (MethodDesc 0x7af08c +0x7d WindowsFormsApplication1.Form1.F1(System.Object, System.EventArgs)), calling mscorwks!JIT_Throw 0028eeec 02ffaf19 (MethodDesc 0x7af098 +0x29 WindowsFormsApplication1.Form1.F2()), calling 06370634 0028ef58 02ffae37 (MethodDesc 0x7a7bb0 +0x4f System.Windows.Forms.Control.OnClick(System.EventArgs)) That does look more familar. The call stack has been unwound and we do see only some frames into the history where the debugger was smart enough to find out that we have called F2 from F1. The exception handling on 64 bit systems does work quite differently which seems to have the nice property to remember the called methods not only during the first pass of exception filter clauses (during first pass all catch handler are called if they are going to catch the exception which is about to be thrown) but also when the actual stack unwind has taken place. This makes it possible to follow not only the call stack right at the moment but also to look into the “history” of the catch/finally clauses. In a 64 bit process you only need to look at the ExceptionTracker to find out if a catch or finally handler was called. The two frames ProcessManagedCallFrame/CallHandler does indicate a finally clause whereas CallCatchHandler/CallHandler indicates a catch clause. That was a interesting one. Oh and by the way if you manage to load the Microsoft symbols you can also find out the hidden exception which. When you encounter in the call stack a line 0016eb34 75b79617 KERNELBASE!RaiseException+0x58 ====> Exception Code e0434f4d cxr@16e850 exr@16e838 Then it is a good idea to execute .exr 16e838 !analyze –v to find out more. In the managed world it is even easier since we can dump the objects allocated on the stack which have not yet been garbage collected to look at former method parameters. The command !dso which is the abbreviation for dump stack objects will give you 0:000> !dso OS Thread Id: 0x46c (0) ESP/REG Object Name 0016dd4c 020737f0 System.Exception 0016dd98 020737f0 System.Exception 0016dda8 01f5c6cc System.Windows.Forms.Button 0016ddac 01f5d2b8 System.EventHandler 0016ddb0 02071744 System.Windows.Forms.MouseEventArgs 0016ddc0 01f5d2b8 System.EventHandler 0016ddcc 01f5c6cc System.Windows.Forms.Button 0016dddc 020737f0 System.Exception 0016dde4 01f5d2b8 System.EventHandler 0016ddec 02071744 System.Windows.Forms.MouseEventArgs 0016de40 020737f0 System.Exception 0016de80 02071744 System.Windows.Forms.MouseEventArgs 0016de8c 01f5d2b8 System.EventHandler 0016de90 01f5c6cc System.Windows.Forms.Button 0016df10 02073784 System.SByte[] 0016df5c 02073684 System.NotImplementedException 0016e2a0 02073684 System.NotImplementedException 0016e2e8 01ed69f4 System.Resources.ResourceManager From there it is easy to do 0:000> !pe 02073684 Exception object: 02073684 Exception type: System.NotImplementedException Message: Die Methode oder der Vorgang sind nicht implementiert. InnerException: <none> StackTrace (generated): SP IP Function 0016ECB0 006904AD WindowsFormsApplication2!WindowsFormsApplication2.Form1.F3()+0x35 0016ECC0 00690411 WindowsFormsApplication2!WindowsFormsApplication2.Form1.F2()+0x29 0016ECF0 0069038F WindowsFormsApplication2!WindowsFormsApplication2.Form1.F1(System.Object, System.EventArgs)+0x3f StackTraceString: <none> HResult: 80004001 to see the former exception. That´s all for today.

Read the article
The Return Of __FILE__ And __LINE__ In .NET 4.5

- by Alois Kraus

Good things are hard to kill. One of the most useful predefined compiler macros in C/C++ were __FILE__ and __LINE__ which do expand to the compilation units file name and line number where this value is encountered by the compiler. After 4.5 versions of .NET we are on par with C/C++ again. It is of course not a simple compiler expandable macro it is an attribute but it does serve exactly the same purpose. Now we do get CallerLineNumberAttribute == __LINE__ CallerFilePathAttribute == __FILE__ CallerMemberNameAttribute == __FUNCTION__ (MSVC Extension) The most important one is CallerMemberNameAttribute which is very useful to implement the INotifyPropertyChanged interface without the need to hard code the name of the property anymore. Now you can simply decorate your change method with the new CallerMemberName attribute and you get the property name as string directly inserted by the C# compiler at compile time. public string UserName { get { return _userName; } set { _userName=value; RaisePropertyChanged(); // no more RaisePropertyChanged(“UserName”)! } } protected void RaisePropertyChanged([CallerMemberName] string member = "") { var copy = PropertyChanged; if(copy != null) { copy(new PropertyChangedEventArgs(this, member)); } } Nice and handy. This was obviously the prime reason to implement this feature in the C# 5.0 compiler. You can repurpose this feature for tracing to get your hands on the method name of your caller along other stuff very fast now. All infos are added during compile time which is much faster than other approaches like walking the stack. The example on MSDN shows the usage of this attribute with an example public static void TraceMessage(string message, [CallerMemberName] string memberName = "", [CallerFilePath] string sourceFilePath = "", [CallerLineNumber] int sourceLineNumber = 0) { Console.WriteLine("Hi {0} {1} {2}({3})", message, memberName, sourceFilePath, sourceLineNumber); } When I do think of tracing I do usually want to have a API which allows me to Trace method enter and leave Trace messages with a severity like Info, Warning, Error When I do print a trace message it is very useful to print out method and type name as well. So your API must either be able to pass the method and type name as strings or extract it automatically via walking back one Stackframe and fetch the infos from there. The first glaring deficiency is that there is no CallerTypeAttribute yet because the C# compiler team was not satisfied with its performance. A usable Trace Api might therefore look like enum TraceTypes { None = 0, EnterLeave = 1 << 0, Info = 1 << 1, Warn = 1 << 2, Error = 1 << 3 } class Tracer : IDisposable { string Type; string Method; public Tracer(string type, string method) { Type = type; Method = method; if (IsEnabled(TraceTypes.EnterLeave,Type, Method)) { } } private bool IsEnabled(TraceTypes traceTypes, string Type, string Method) { // Do checking here if tracing is enabled return false; } public void Info(string fmt, params object[] args) { } public void Warn(string fmt, params object[] args) { } public void Error(string fmt, params object[] args) { } public static void Info(string type, string method, string fmt, params object[] args) { } public static void Warn(string type, string method, string fmt, params object[] args) { } public static void Error(string type, string method, string fmt, params object[] args) { } public void Dispose() { // trace method leave } } This minimal trace API is very fast but hard to maintain since you need to pass in the type and method name as hard coded strings which can change from time to time. But now we have at least CallerMemberName to rid of the explicit method parameter right? Not really. Since any acceptable usable trace Api should have a method signature like Tracexxx(… string fmt, params [] object args) we not able to add additional optional parameters after the args array. If we would put it before the format string we would need to make it optional as well which would mean the compiler would need to figure out what our trace message and arguments are (not likely) or we would need to specify everything explicitly just like before . There are ways around this by providing a myriad of overloads which in the end are routed to the very same method but that is ugly. I am not sure if nobody inside MS agrees that the above API is reasonable to have or (more likely) that the whole talk about you can use this feature for diagnostic purposes was not a core feature at all but a simple byproduct of making the life of INotifyPropertyChanged implementers easier. A way around this would be to allow for variable argument arrays after the params keyword another set of optional arguments which are always filled by the compiler but I do not know if this is an easy one. The thing I am missing much more is the not provided CallerType attribute. But not in the way you would think of. In the API above I did add some filtering based on method and type to stay as fast as possible for types where tracing is not enabled at all. It should be no more expensive than an additional method call and a bool variable check if tracing for this type is enabled at all. The data is tightly bound to the calling type and method and should therefore become part of the static type instance. Since extending the CLR type system for tracing is not something I do expect to happen I have come up with an alternative approach which allows me basically to attach run time data to any existing type object in super fast way. The key to success is the usage of generics. class Tracer<T> : IDisposable { string Method; public Tracer(string method) { if (TraceData<T>.Instance.Enabled.HasFlag(TraceTypes.EnterLeave)) { } } public void Dispose() { if (TraceData<T>.Instance.Enabled.HasFlag(TraceTypes.EnterLeave)) { } } public static void Info(string fmt, params object[] args) { } /// <summary> /// Every type gets its own instance with a fresh set of variables to describe the /// current filter status. /// </summary> /// <typeparam name="T"></typeparam> internal class TraceData<UsingType> { internal static TraceData<UsingType> Instance = new TraceData<UsingType>(); public bool IsInitialized = false; // flag if we need to reinit the trace data in case of reconfigured trace settings at runtime public TraceTypes Enabled = TraceTypes.None; // Enabled trace levels for this type } } We do not need to pass the type as string or Type object to the trace Api. Instead we define a generic Api that accepts the using type as generic parameter. Then we can create a TraceData static instance which is due to the nature of generics a fresh instance for every new type parameter. My tests on my home machine have shown that this approach is as fast as a simple bool flag check. If you have an application with many types using tracing you do not want to bring the app down by simply enabling tracing for one special rarely used type. The trace filter performance for the types which are not enabled must be therefore the fasted code path. This approach has the nice side effect that if you store the TraceData instances in one global list you can reconfigure tracing at runtime safely by simply setting the IsInitialized flag to false. A similar effect can be achieved with a global static Dictionary<Type,TraceData> object but big hash tables have random memory access semantics which is bad for cache locality and you always need to pay for the lookup which involves hash code generation, equality check and an indexed array access. The generic version is wicked fast and allows you to add more features to your tracing Api with minimal perf overhead. But it is cumbersome to write the generic type argument always explicitly and worse if you do refactor code and move parts of it to other classes it might be that you cannot configure tracing correctly. I would like therefore to decorate my type with an attribute [CallerType] class Tracer<T> : IDisposable to tell the compiler to fill in the generic type argument automatically. class Program { static void Main(string[] args) { using (var t = new Tracer()) // equivalent to new Tracer<Program>() { That would be really useful and super fast since you do not need to pass any type object around but you do have full type infos at hand. This change would be breaking if another non generic type exists in the same namespace where now the generic counterpart would be preferred. But this is an acceptable risk in my opinion since you can today already get conflicts if two generic types of the same name are defined in different namespaces. This would be only a variation of this issue. When you do think about this further you can add more features like to trace the exception in your Dispose method if the method is left with an exception with that little trick I did write some time ago. You can think of tracing as a super fast and configurable switch to write data to an output destination or to execute alternative actions. With such an infrastructure you can e.g. Reconfigure tracing at run time. Take a memory dump when a specific method is left with a specific exception. Throw an exception when a specific trace statement is hit (useful for testing error conditions). Execute a passed delegate which e.g. dumps additional state when enabled. Write data to an in memory ring buffer and dump it when specific events do occur (e.g. method is left with an exception, triggered from outside). Write data to an output device. …. This stuff is really useful to have when your code is in production on a mission critical server and you need to find the root cause of sporadic crashes of your application. It could be a buggy graphics card driver which throws access violations into your application (ok with .NET 4 not anymore except if you enable a compatibility flag) where you would like to have a minidump or you have reached after two weeks of operation a state where you need a full memory dump at a specific point in time in the middle of an transaction. At my older machine I do get with this super fast approach 50 million traces/s when tracing is disabled. When I do know that tracing is enabled for this type I can walk the stack by using StackFrameHelper.GetStackFramesInternal to check further if a specific action or output device is configured for this method which is about 2-3 times faster than the regular StackTrace class. Even with one String.Format I am down to 3 million traces/s so performance is not so important anymore since I do want to do something now. The CallerMemberName feature of the C# 5 compiler is nice but I would have preferred to get direct access to the MethodHandle and not to the stringified version of it. But I really would like to see a CallerType attribute implemented to fill in the generic type argument of the call site to augment the static CLR type data with run time data.

Read the article
Profiling Startup Of VS2012 – SpeedTrace Profiler

- by Alois Kraus

SpeedTrace is a relatively unknown profiler made a company called Ipcas. A single professional license does cost 449€+VAT. For the test I did use SpeedTrace 4.5 which is currently Beta. Although it is cheaper than dotTrace it has by far the most options to influence how profiling does work. First you need to create a tracing project which does configure tracing for one process type. You can start the application directly from the profiler or (much more interesting) it does attach to a specific process when it is started. For this you need to check “Trace the specified …” radio button and enter the process name in the “Process Name of the Trace” edit box. You can even selectively enable tracing for processes with a specific command line. Then you need to activate the trace project by pressing the Activate Project button and you are ready to start VS as usual. If you want to profile the next 10 VS instances that you start you can set the Number of Processes counter to e.g. 10. This is immensely helpful if you are trying to profile only the next 5 started processes. As you can see there are many more tabs which do allow to influence tracing in a much more sophisticated way. SpeedTrace is the only profiler which does not rely entirely on the profiling Api of .NET. Instead it does modify the IL code (instrumentation on the fly) to write tracing information to disc which can later be analyzed. This approach is not only very fast but it does give you unprecedented analysis capabilities. Once the traces are collected they do show up in your workspace where you can open the trace viewer. I do skip the other windows because this view is by far the most useful one. You can sort the methods not only by Wall Clock time but also by CPU consumption and wait time which none of the other products support in their views at the same time. If you want to optimize for CPU consumption sort by CPU time. If you want to find out where most time is spent you need Clock Total time and Clock Waiting. There you can directly see if the method did take long because it did wait on something or it did really execute stuff that did take so long. Once you have found a method you want to drill deeper you can double click on a method to get to the Caller/Callee view which is similar to the JetBrains Method Grid view. But this time you do see much more. In the middle is the clicked method. Above are the methods that call you and below are the methods that you do directly call. Normally you would then start digging deeper to find the end of the chain where the slow method worth optimizing is located. But there is a shortcut. You can press the magic button to calculate the aggregation of all called methods. This is displayed in the lower left window where you can see each method call and how long it did take. There you can also sort to see if this call stack does only contain methods (e.g. WCF connect calls which you cannot make faster) not worth optimizing. YourKit has a similar feature where it is called Callees List. In the Functions tab you have in the context menu also many other useful analysis options One really outstanding feature is the View Call History Drilldown. When you select this one you get not a sum of all method invocations but a list with the duration of each method call. This is not surprising since SpeedTrace does use tracing to get its timings. There you can get many useful graphs how this method did behave over time. Did it become slower at some point in time or was only the first call slow? The diagrams and the list will tell you that. That is all fine but what should I do when one method call was slow? I want to see from where it was coming from. No problem select the method in the list hit F10 and you get the call stack. This is a life saver if you e.g. search for serialization problems. Today Serializers are used everywhere. You want to find out from where the 5s XmlSerializer.Deserialize call did come from? Hit F10 and you get the call stack which did invoke the 5s Deserialize call. The CPU timeline tab is also useful to find out where long pauses or excessive CPU consumption did happen. Click in the graph to get the Thread Stacks window where you can get a quick overview what all threads were doing at this time. This does look like the Stack Traces feature in YourKit. Only this time you get the last called method first which helps to quickly see what all threads were executing at this moment. YourKit does generate a rather long list which can be hard to go through when you have many threads. The thread list in the middle does not give you call stacks or anything like that but you see which methods were found most often executing code by the profiler which is a good indication for methods consuming most CPU time. This does sound too good to be true? I have not told you the best part yet. The best thing about this profiler is the staff behind it. When I do see a crash or some other odd behavior I send a mail to Ipcas and I do get usually the next day a mail that the problem has been fixed and a download link to the new version. The guys at Ipcas are even so helpful to log in to your machine via a Citrix Client to help you to get started profiling your actual application you want to profile. After a 2h telco I was converted from a hater to a believer of this tool. The fast response time might also have something to do with the fact that they are actively working on 4.5 to get out of the door. But still the support is by far the best I have encountered so far. The only downside is that you should instrument your assemblies including the .NET Framework to get most accurate numbers. You can profile without doing it but then you will see very high JIT times in your process which can severely affect the correctness of the measured timings. If you do not care about exact numbers you can also enable in the main UI in the Data Trace tab logging of method arguments of primitive types. If you need to know what files at which times were opened by your application you can find it out without a debugger. Since SpeedTrace does read huge trace files in its reader you should perhaps use a 64 bit machine to be able to analyze bigger traces as well. The memory consumption of the trace reader is too high for my taste. But they did promise for the next version to come up with something much improved.

Read the article
Multiline Replacement With Visual Studio

- by Alois Kraus

I had to remove some file headers in a bigger project which were all of the form #region File Header /*[ Compilation unit ---------------------------------------------------------- Name : Class1.cs Language : C# Creation Date : Description : -----------------------------------------------------------------------------*/ /*] END */ #endregion I know that would be a cool thing to write a simple C# program use a recursive file search, read all lines skip the first n lines and write the files back to disc. But I wanted to test things first before I ruin my source files with one little typo. There comes the Visual Studio Search and Replace in Files dialog into the game. I can test my regular expression to do a multiline match with the Find button before actually breaking anything. And if something goes wrong I have the Undo button. There is a nice blog post from Paulo Morgado online who deals with Multiline Regular expressions. The Visual Studio Regular expressions are non standard so you have to adapt your usual Regex know how to the other patterns. The pattern I cam finally up with is \#region File Header:b*(.*\n)@\#endregion The Regular expression can be read as \#region File Header Match “#region File Header” \# Escapes the # character since it is a quantifier. :b* After this none or more spaces or tabs can follow (:b stands for space or tab) (.*\n)@ Match anything across lines in a non greedy way (the @ character makes it non greedy) to prevent matching too much until the #endregion somewhere in our source file. \#endregion Match everything until “#endregion” is found I had always knew that Visual Studio can do it but I never bothered to learn the non standard Regex syntax. This is powerful and it is inside Visual Studio since 2005!

Read the article
ApiChange Corporate Edition

- by Alois Kraus

In my inital announcement I could only cover a small subset what ApiChange can do for you. Lets look at how ApiChange can help you to fix bugs due to wrong usage of an Api within a fraction of time than it would take normally. It happens that software is tested and some bugs show up. One bug could be …. : We get way too man log messages during our test run. Now you have the task to find the most frequent messages and eliminate the Log calls from the source code. But what about the myriads other log calls? How can we check that the distribution of log calls is nearly equal across all developers? And if not how can we contact the developer to check his code? ApiChange can help you too connect these loose ends. It combines several information silos into one cohesive view. The picture below shows how it is able to fill the gaps. The public version does currently “only” parse the binaries and pdbs to give you for a –whousesmethod query the following colums: If it happens that you have Rational ClearCase (a source control system) in your development shop and an Active Directory in place then ApiChange will try to determine from the source file which was determined from the pdb the last check in user which should be present in your Active Directory. From there it is only a small hop to an LDAP query to your AD domain or the GC (Global Catalog) to get from the user name his Full name Email Phone number Department …. ApiChange will append this additional data all of your query results which contain source files if you add the –fileinfo option. As I said this is currently not enabled by default since the AD domain needs to be configured which are currently only some hard coded values in the SiteConstants.cs source file of ApiChange.Api.dll. Once you got this data you can generate metrics based on source file, developer, assembly, … and add additional data by drag and drop directly into the pivot tables inside Excel. This allows you to e.g. to generate a report which lists the source files with most log calls in descending order along with the developer name and email in the pivot table. Armed with this knowledge you can take meaningful measures e.g. to ask the developer if the huge number of log calls in this source file can be optimized. I am aware that this is a very specific scenario but it is a huge time saver when you are able to fill the missing gaps of information. ApiChange does this in an extensible way. namespace ApiChange.ExternalData { public interface IFileInformationProvider { UserInfo GetInformationFromFile(string fileName); } } It defines an interface where you can implement your custom information provider to close the gap between source control system and the real person I have to send an email to ask if his code needs a closer inspection.

Read the article
ApiChange Is Released!

- by Alois Kraus

I have been working on little tool to simplify my life and perhaps yours as developer as well. It is basically a command line tool that allows you to execute queries on your compiled .NET code base. The main purpose is to find out how big the impact of an api change would be if you changed this or that. Now you can do high level operations like Diff public types for breaking changes. Who uses a method? Who uses a type? Who uses implements an interface? Who references me? What format has the binary (32/64, Managed C++, Pure IL, Unmanaged)? Search for all event subscribers and unsubscribers. A unique feature is to check for event subscription imbalances. Forgotten event subscriptions are the 90% cause of managed memory leaks. It is done at a per class level. If one class does subscribe to one event more often than it does unsubscribe it is treated as possible event subscription imbalance. Another unique ability is to search for users of string literals which allows you to track users of a string constant which is not possible otherwise. For incremental builds the ShowRebuildTargets command can be used to identify the dependant targets that need a rebuild after you did compile one assembly. It has some heuristics in place to determine the impact of breaking changes and finds out which targets need to be recompiled as well. It has a ton of other features and a an API to access these things programmatically so you can build upon these simple queries create even better tools. Perhaps we get a Visual Studio plug in? You can download it from CodePlex here. It works via XCopy deployment. Simply let it run and check the command line help out. The best feature in my opinion is that the output of nearly all commands can be piped to Excel for further analysis. Since it does read also the pdbs it can show you the source file name and line number as well for all matches. The following picture shows the output of a –WhousesType query. The following command checks where type from BaseLibraryV1.dll are used inside DependantLibV1.dll. All matches are printed out with the reason and matching item along with file and line number. There is even a hyper link to the match which will open Visual Studio. ApiChange -whousestype "*" BaseLibraryV1.dll -in DependantLibV1.dll –excel The "*” is the actual query which means all types. The syntax is the same like in C# just that placeholders are allowed ;-). More info's can be found at the Codeplex Documentation. The tool was developed in a TDD style manner which means that it is heavily tested and already used by a quite large user base inside the company I do work for. Luckily for you I got the permission to make it public so you take advantage of it. It is fully instrumented with tracing. If you find bugs simply add the –trace command line switch to find out what is failing and send me the output. How is it done? Your first guess might be that it uses reflection. Wrong. It is based on Mono Cecil a free IL parser with a fantastic API to access all internals of a managed assembly. The speed is awesome and to make it even faster I did make the tool heavily multi threaded. The query above did execute in 1.8s with the Excel output. On a rather slow machine I can analyze over 1500 assemblies in less than 40s with a very low memory consumption. The true power of Mono Cecil is that I can load an assembly like any other data file. I have no problems unloading a file but if I would have used reflection I would need to unload a whole AppDomain just to get rid of one assembly in my memory. Just to give you a glimpse how ApiChange.Api.dll can be used I show you one of the unit tests: public void Can_Find_GenericMethodInvocations_With_Type_Parameters() { // 1. Create an aggregator to collect our matches UsageQueryAggregator agg = new UsageQueryAggregator(); // 2. This is the type we want to search for. Load it via the type query var decimalType = TypeQuery.GetTypeByName(TestConstants.MscorlibAssembly, "System.Decimal"); // 3. register the type query which searches for uses of the Decimal type new WhoUsesType(agg, decimalType); // 4. Search for all users of the Decimal type in the DependandLibV1Assembly agg.Analyze(TestConstants.DependandLibV1Assembly); // Extract matches and assert Assert.AreEqual(2, agg.MethodMatches.Count, "Method match count"); Assert.AreEqual("UseGenericMethod", agg.MethodMatches[0].Match.Name); Assert.AreEqual("UseGenericMethod", agg.MethodMatches[1].Match.Name); } Many thanks go from here to Jb Evian for the creation of Mono.Cecil. Without this fantastic piece of code it would have been much much harder. There are other options around like the Common Compiler Infrastructure Metadata Api which should do the same thing but it was not a real option since the Microsoft reader did fail on even simple assemblies (at least in September 2009 this was the case). Besides this I found the CCI Apis much harder to use. The only real competitor was Reflector which does support many things but does not let me access his cool high level analyze commands. So I decided to dig into the IL specs and as a result you can query your compiled binaries from the command line or programmatically. The best thing is you try it out for yourself and give me some feedback what you miss. If you want to contribute or have a cool idea what should be added drop me a mail at A Kraus1@___No [email protected]. There is much more inside the tool I did not talk about it (yet).

Read the article
Future Of F# At Jazoon 2011

- by Alois Kraus

I was at the Jazoon 2011 in Zurich (Switzerland). It was a really cool event and it had many top notch speaker not only from the Microsoft universe. One of the most interesting talks was from Don Syme with the title: F# Today/F# Tomorrow. He did show how to use F# scripting to browse through open databases/, OData Web Services, Sharepoint, …interactively. It looked really easy with the help of F# Type Providers which is the next big language feature in a future F# version. The object returned by a Type Provider is used to access the data like in usual strongly typed object model. No guessing how the property of an object is called. Intellisense will show it just as you expect. There exists a range of Type Providers for various data sources where the schema of the stored data can somehow be dynamically extracted. Lets use e.g. a free database it would be then let data = DbProvider(http://.....); data the object which contains all data from e.g. a chemical database. It has an elements collection which contains an element which has the properties: Name, AtomicMass, Picture, …. You can browse the object returned by the Type Provider with full Intellisense because the returned object is strongly typed which makes this happen. The same can be achieved of course with code generators that use an input the schema of the input data (OData Web Service, database, Sharepoint, JSON serialized data, …) and spit out the necessary strongly typed objects as an assembly. This does work but has the downside that if the schema of your data source is huge you will quickly run against a wall with traditional code generators since the generated “deserialization” assembly could easily become several hundred MB. *** The following part contains guessing how this exactly work by asking Don two questions **** Q: Can I use Type Providers within C#? D: No. Q: F# is after all a library. I can reference the F# assemblies and use the contained Type Providers? D: F# does annotate the generated types in a special way at runtime which is not a static type that C# could use. The F# type providers seem to use a hybrid approach. At compilation time the Type Provider is instantiated with the url of your input data. The obtained schema information is used by the compiler to generate static types as usual but only for a small subset (the top level classes up to certain nesting level would make sense to me). To make this work you need to access the actual data source at compile time which could be a problem if you want to keep the actual url in a config file. Ok so this explains why it does work at all. But in the demo we did see full intellisense support down to the deepest object level. It looks like if you navigate deeper into the object hierarchy the type provider is instantiated in the background and attach to a true static type the properties determined at run time while you were typing. So this type is not really static at all. It is static if you define as a static type that its properties shows up in intellisense. But since this type information is determined while you are typing and it is not used to generate a true static type and you cannot use these “intellistatic” types from C#. Nonetheless this is a very cool language feature. With the plotting libraries you can generate expressive charts from any datasource within seconds to get quickly an overview of any structured data storage. My favorite programming language C# will not get such features in the near future there is hope. If you restrict yourself to OData sources you can use LINQPad to query any OData enabled data source with LINQ with ease. There you can query Stackoverflow with The output is also nicely rendered which makes it a very good tool to explore OData sources today.

Read the article
Thread.Interrupt Is Evil

- by Alois Kraus

Recently I have found an interesting issue with Thread.Interrupt during application shutdown. Some application was crashing once a week and we had not really a clue what was the issue. Since it happened not very often it was left as is until we have got some memory dumps during the crash. A memory dump usually means WindDbg which I really like to use (I know I am one of the very few fans of it). After a quick analysis I did find that the main thread already had exited and the thread with the crash was stuck in a Monitor.Wait. Strange Indeed. Running the application a few thousand times under the debugger would potentially not have shown me what the reason was so I decided to what I call constructive debugging. I did create a simple Console application project and try to simulate the exact circumstances when the crash did happen from the information I have via memory dump and source code reading. The thread that was crashing was actually MS code from an old version of the Microsoft Caching Application Block. From reading the code I could conclude that the main thread did call the Dispose method on the CacheManger class which did call Thread.Interrupt on the cache scavenger thread which was just waiting for work to do. My first version of the repro looked like this static void Main(string[] args) { Thread t = new Thread(ThreadFunc) { IsBackground = true, Name = "Test Thread" }; t.Start(); Console.WriteLine("Interrupt Thread"); t.Interrupt(); } static void ThreadFunc() { while (true) { object value = Dequeue(); // block until unblocked or awaken via ThreadInterruptedException } } static object WaitObject = new object(); static object Dequeue() { object lret = "got value"; try { lock (WaitObject) { } } catch (ThreadInterruptedException) { Console.WriteLine("Got ThreadInterruptException"); lret = null; } return lret; } I do start a background thread and call Thread.Interrupt on it and then directly let the application terminate. The thread in the meantime does plenty of Monitor.Enter/Leave calls to simulate work on it. This first version did not crash. So I need to dig deeper. From the memory dump I did know that the finalizer thread was doing just some critical finalizers which were closing file handles. Ok lets add some long running finalizers to the sample. class FinalizableObject : CriticalFinalizerObject { ~FinalizableObject() { Console.WriteLine("Hi we are waiting to finalize now and block the finalizer thread for 5s."); Thread.Sleep(5000); } } class Program { static void Main(string[] args) { FinalizableObject fin = new FinalizableObject(); Thread t = new Thread(ThreadFunc) { IsBackground = true, Name = "Test Thread" }; t.Start(); Console.WriteLine("Interrupt Thread"); t.Interrupt(); GC.KeepAlive(fin); // prevent finalizing it too early // After leaving main the other thread is woken up via Thread.Abort // while we are finalizing. This causes a stackoverflow in the CLR ThreadAbortException handling at this time. } With this changed Main method and a blocking critical finalizer I did get my crash just like the real application. The funny thing is that this is actually a CLR bug. When the main method is left the CLR does suspend all threads except the finalizer thread and declares all objects as garbage. After the normal finalizers were called the critical finalizers are executed to e.g. free OS handles (usually). Remember that I did call Thread.Interrupt as one of the last methods in the Main method. The Interrupt method is actually asynchronous and does wake a thread up and throws a ThreadInterruptedException only once unlike Thread.Abort which does rethrow the exception when an exception handling clause is left. It seems that the CLR does not expect that a frozen thread does wake up again while the critical finalizers are executed. While trying to raise a ThreadInterrupedException the CLR goes down with an stack overflow. Ups not so nice. Why has this nobody noticed for years is my next question. As it turned out this error does only happen on the CLR for .NET 4.0 (x86 and x64). It does not show up in earlier or later versions of the CLR. I have reported this issue on connect here but so far it was not confirmed as a CLR bug. But I would be surprised if my console application was to blame for a stack overflow in my test thread in a Monitor.Wait call. What is the moral of this story? Thread.Abort is evil but Thread.Interrupt is too. It is so evil that even the CLR of .NET 4.0 contains a race condition during the CLR shutdown. When the CLR gurus can get it wrong the chances are high that you get it wrong too when you use this constructs. If you do not believe me see what Patrick Smacchia does blog about Thread.Abort and List.Sort. Not only the CLR creators can get it wrong. The BCL writers do sometimes have a hard time with correct exception handling as well. If you do tell me that you use Thread.Abort frequently and never had problems with it I do suspect that you do not have looked deep enough into your application to find such sporadic errors.

Read the article
.NET Code Evolution

- by Alois Kraus

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/07/24/153504.aspxAt my day job I do look at a lot of code written by other people. Most of the code is quite good and some is even a masterpiece. And there is also code which makes you think WTF… oh it was written by me. Hm not so bad after all. There are many excuses reasons for bad code. Most often it is time pressure followed by not enough ambition (who cares) or insufficient training. Normally I do care about code quality quite a lot which makes me a (perceived) slow worker who does write many tests and refines the code quite a lot because of the design deficiencies. Most of the deficiencies I do find by putting my design under stress while checking for invariants. It does also help a lot to step into the code with a debugger (sometimes also Windbg). I do this much more often when my tests are red. That way I do get a much better understanding what my code really does and not what I think it should be doing. This time I do want to show you how code can evolve over the years with different .NET Framework versions. Once there was time where .NET 1.1 was new and many C++ programmers did switch over to get rid of not initialized pointers and memory leaks. There were also nice new data structures available such as the Hashtable which is fast lookup table with O(1) time complexity. All was good and much code was written since then. At 2005 a new version of the .NET Framework did arrive which did bring many new things like generics and new data structures. The “old” fashioned way of Hashtable were coming to an end and everyone used the new Dictionary<xx,xx> type instead which was type safe and faster because the object to type conversion (aka boxing) was no longer necessary. I think 95% of all Hashtables and dictionaries use string as key. Often it is convenient to ignore casing to make it easy to look up values which the user did enter. An often followed route is to convert the string to upper case before putting it into the Hashtable. Hashtable Table = new Hashtable(); void Add(string key, string value) { Table.Add(key.ToUpper(), value); } This is valid and working code but it has problems. First we can pass to the Hashtable a custom IEqualityComparer to do the string matching case insensitive. Second we can switch over to the now also old Dictionary type to become a little faster and we can keep the the original keys (not upper cased) in the dictionary. Dictionary<string, string> DictTable = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); void AddDict(string key, string value) { DictTable.Add(key, value); } Many people do not user the other ctors of Dictionary because they do shy away from the overhead of writing their own comparer. They do not know that .NET has for strings already predefined comparers at hand which you can directly use. Today in the many core area we do use threads all over the place. Sometimes things break in subtle ways but most of the time it is sufficient to place a lock around the offender. Threading has become so mainstream that it may sound weird that in the year 2000 some guy got a huge incentive for the idea to reduce the time to process calibration data from 12 hours to 6 hours by using two threads on a dual core machine. Threading does make it easy to become faster at the expense of correctness. Correct and scalable multithreading can be arbitrarily hard to achieve depending on the problem you are trying to solve. Lets suppose we want to process millions of items with two threads and count the processed items processed by all threads. A typical beginners code might look like this: int Counter; void IJustLearnedToUseThreads() { var t1 = new Thread(ThreadWorkMethod); t1.Start(); var t2 = new Thread(ThreadWorkMethod); t2.Start(); t1.Join(); t2.Join(); if (Counter != 2 * Increments) throw new Exception("Hmm " + Counter + " != " + 2 * Increments); } const int Increments = 10 * 1000 * 1000; void ThreadWorkMethod() { for (int i = 0; i < Increments; i++) { Counter++; } } It does throw an exception with the message e.g. “Hmm 10.222.287 != 20.000.000” and does never finish. The code does fail because the assumption that Counter++ is an atomic operation is wrong. The ++ operator is just a shortcut for Counter = Counter + 1 This does involve reading the counter from a memory location into the CPU, incrementing value on the CPU and writing the new value back to the memory location. When we do look at the generated assembly code we will see only inc dword ptr [ecx+10h] which is only one instruction. Yes it is one instruction but it is not atomic. All modern CPUs have several layers of caches (L1,L2,L3) which try to hide the fact how slow actual main memory accesses are. Since cache is just another word for redundant copy it can happen that one CPU does read a value from main memory into the cache, modifies it and write it back to the main memory. The problem is that at least the L1 cache is not shared between CPUs so it can happen that one CPU does make changes to values which did change in meantime in the main memory. From the exception you can see we did increment the value 20 million times but half of the changes were lost because we did overwrite the already changed value from the other thread. This is a very common case and people do learn to protect their data with proper locking. void Intermediate() { var time = Stopwatch.StartNew(); Action acc = ThreadWorkMethod_Intermediate; var ar1 = acc.BeginInvoke(null, null); var ar2 = acc.BeginInvoke(null, null); ar1.AsyncWaitHandle.WaitOne(); ar2.AsyncWaitHandle.WaitOne(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Intermediate did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Intermediate() { for (int i = 0; i < Increments; i++) { lock (this) { Counter++; } } } This is better and does use the .NET Threadpool to get rid of manual thread management. It does give the expected result but it can result in deadlocks because you do lock on this. This is in general a bad idea since it can lead to deadlocks when other threads use your class instance as lock object. It is therefore recommended to create a private object as lock object to ensure that nobody else can lock your lock object. When you read more about threading you will read about lock free algorithms. They are nice and can improve performance quite a lot but you need to pay close attention to the CLR memory model. It does make quite weak guarantees in general but it can still work because your CPU architecture does give you more invariants than the CLR memory model. For a simple counter there is an easy lock free alternative present with the Interlocked class in .NET. As a general rule you should not try to write lock free algos since most likely you will fail to get it right on all CPU architectures. void Experienced() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); t1.Wait(); t2.Wait(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Experienced did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Experienced() { for (int i = 0; i < Increments; i++) { Interlocked.Increment(ref Counter); } } Since time does move forward we do not use threads explicitly anymore but the much nicer Task abstraction which was introduced with .NET 4 at 2010. It is educational to look at the generated assembly code. The Interlocked.Increment method must be called which does wondrous things right? Lets see: lock inc dword ptr [eax] The first thing to note that there is no method call at all. Why? Because the JIT compiler does know very well about CPU intrinsic functions. Atomic operations which do lock the memory bus to prevent other processors to read stale values are such things. Second: This is the same increment call prefixed with a lock instruction. The only reason for the existence of the Interlocked class is that the JIT compiler can compile it to the matching CPU intrinsic functions which can not only increment by one but can also do an add, exchange and a combined compare and exchange operation. But be warned that the correct usage of its methods can be tricky. If you try to be clever and look a the generated IL code and try to reason about its efficiency you will fail. Only the generated machine code counts. Is this the best code we can write? Perhaps. It is nice and clean. But can we make it any faster? Lets see how good we are doing currently. Level Time in s IJustLearnedToUseThreads Flawed Code Intermediate 1,5 (lock) Experienced 0,3 (Interlocked.Increment) Master 0,1 (1,0 for int[2]) That lock free thing is really a nice thing. But if you read more about CPU cache, cache coherency, false sharing you can do even better. int[] Counters = new int[12]; // Cache line size is 64 bytes on my machine with an 8 way associative cache try for yourself e.g. 64 on more modern CPUs void Master() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Master, 0); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Master, Counters.Length - 1); t1.Wait(); t2.Wait(); Counter = Counters[0] + Counters[Counters.Length - 1]; if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Master did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Master(object number) { int index = (int) number; for (int i = 0; i < Increments; i++) { Counters[index]++; } } The key insight here is to use for each core its own value. But if you simply use simply an integer array of two items, one for each core and add the items at the end you will be much slower than the lock free version (factor 3). Each CPU core has its own cache line size which is something in the range of 16-256 bytes. When you do access a value from one location the CPU does not only fetch one value from main memory but a complete cache line (e.g. 16 bytes). This means that you do not pay for the next 15 bytes when you access them. This can lead to dramatic performance improvements and non obvious code which is faster although it does have many more memory reads than another algorithm. So what have we done here? We have started with correct code but it was lacking knowledge how to use the .NET Base Class Libraries optimally. Then we did try to get fancy and used threads for the first time and failed. Our next try was better but it still had non obvious issues (lock object exposed to the outside). Knowledge has increased further and we have found a lock free version of our counter which is a nice and clean way which is a perfectly valid solution. The last example is only here to show you how you can get most out of threading by paying close attention to your used data structures and CPU cache coherency. Although we are working in a virtual execution environment in a high level language with automatic memory management it does pay off to know the details down to the assembly level. Only if you continue to learn and to dig deeper you can come up with solutions no one else was even considering. I have studied particle physics which does help at the digging deeper part. Have you ever tried to solve Quantum Chromodynamics equations? Compared to that the rest must be easy ;-). Although I am no longer working in the Science field I take pride in discovering non obvious things. This can be a very hard to find bug or a new way to restructure data to make something 10 times faster. Now I need to get some sleep ….

Read the article
Profiling Startup Of VS2012 – YourKit Profiler

- by Alois Kraus

The YourKit (v7.0.5) profiler is interesting in terms of price (79€ single place license, 409€ + 1 year support and upgrades) and feature set. You do get a performance and memory profiler in one package for which you normally need also to pay extra from the other vendors. As an interesting side note the profiler UI is written in Java because they do also sell Java profilers with the same feature set. To get all methods of a VS startup you need first to configure it to include System* in the profiled methods and you need to configure * to measure wall clock time. By default it does record only CPU times which allows you to optimize CPU hungry operations. But you will never see a Thread.Sleep(10000) in the profiler blocking the UI in this mode. It can profile as all others processes started from within the profiler but it can also profile the next or all started processes. As usual it can profile in sampling and tracing mode. But since it is a memory profiler as well it does by default also record all object allocations > 1MB. With allocation recording enabled VS2012 did crash but without allocation recording there were no problems. The CPU tab contains the time line of the application and when you click in the graph you the call stacks of all threads at this time. This is really a nice feature. When you select a time region you the CPU Usage estimation for this time window. I have seen many applications consuming 100% CPU only because they did create garbage like crazy. For this is the Garbage Collection tab interesting in conjunction with a time range. This view is like the CPU table only that the CPU graph (green) is missing. All relevant information except for GCs/s is already visible in the CPU tab. Very handy to pinpoint excessive GC or CPU bound issues. The Threads tab does show the thread names and their lifetime. This is useful to see thread interactions or which thread is hottest in terms of CPU consumption. On the CPU tab the call tree does exist in a merged and thread specific view. When you click on a method you get below a list of all called methods. There you can sort for methods with a high own time which are worth optimizing. In the Method List you can select which scope you want to see. Back Traces are the methods which did call you. Callees ist the list of methods called directly or indirectly by your method as a flat list. This is not a call stack but still very useful to see which methods were slow so you can see the “root” cause quite quickly without the need to click trough long call stacks. The last view Merged Calles is a call stacked view of the previous view. This does help a lot to understand did call each method at run time. You would get the same view with a debugger for one call invocation but here you get the full statistics (invocation count) as well. Since YourKit is also a memory profiler you can directly see which objects you have on your managed heap and which objects do hold most of your precious memory. You can in in the Object Explorer view also examine the contents of your objects (strings or whatsoever) to get a better understanding which objects where potentially allocating this stuff. YourKit is a very easy to use combined memory and performance profiler in one product. The unbeatable single license price makes it very attractive to straightly buy it. Although it is a Java UI it is very responsive and the memory consumption is considerably lower compared to dotTrace and ANTS profiler. What I do really like is to start the YourKit ui and then start the processes I want to profile as usual. There is no need to alter your own application code to be able to inject a profiler into your new started processes. For performance and memory profiling you can simply select the process you want to investigate from the list of started processes. That's the way I like to use profilers. Just get out of the way and let the application run without any special preparations. Next: Telerik JustTrace

Read the article
WMemoryProfiler is Released

- by Alois Kraus

What is it? WMemoryProfiler is a managed profiling Api to aid integration testing. This free library can get managed heap statistics and memory usage for your own process (remember testing) and other processes as well. The best thing is that it does work from .NET 2.0 up to .NET 4.5 in x86 and x64. To make it more interesting it can attach to any running .NET process. The reason why I do mention this is that commercial profilers do support this functionality only for their professional editions. An normally only since .NET 4.0 since the profiling API only since then does support attaching to a running process. This thing does differ in many aspects from “normal” profilers because while profiling yourself you can get all objects from all managed heaps back as an object array. If you ever wanted to change the state of an object which does only exist a method local in another thread you can get your hands on it now … Enough theory. Show me some code /// <summary> /// Show feature to not only get statisics out of a process but also the newly allocated /// instances since the last call to MarkCurrentObjects. /// GetNewObjects does return the newly allocated objects as object array /// </summary> static void InstanceTracking() { using (var dumper = new MemoryDumper()) // if you have problems use to see the debugger windows true,true)) { dumper.MarkCurrentObjects(); Allocate(); ILookup<Type, object> newObjects = dumper.GetNewObjects() .ToLookup( x => x.GetType() ); Console.WriteLine("New Strings:"); foreach (var newStr in newObjects[typeof(string)] ) { Console.WriteLine("Str: {0}", newStr); } } } … New Strings: Str: qqd Str: String data: Str: String data: 0 Str: String data: 1 … This is really hot stuff. Not only you can get heap statistics but you can directly examine the new objects and make queries upon them. When I do find more time I can reconstruct the object root graph from it from my own process. It this cool or what? You can also peek into the Finalization Queue to check if you did accidentally forget to dispose a whole bunch of objects … /// <summary> /// .NET 4.0 or above only. Get all finalizable objects which are ready for finalization and have no other object roots anymore. /// </summary> static void NotYetFinalizedObjects() { using (var dumper = new MemoryDumper()) { object[] finalizable = dumper.GetObjectsReadyForFinalization(); Console.WriteLine("Currently {0} objects of types {1} are ready for finalization. Consider disposing them before.", finalizable.Length, String.Join(",", finalizable.ToLookup( x=> x.GetType() ) .Select( x=> x.Key.Name)) ); } } How does it work? The W of WMemoryProfiler is a good hint. It does employ Windbg and SOS dll to do the heavy lifting and concentrates on an easy to use Api which does hide completely Windbg. If you do not want to see Windbg you will never see it. In my experience the most complex thing is actually to download Windbg from the Windows 8 Stanalone SDK. This is described in the Readme and the exception you are greeted with if it is missing in much greater detail. So I will not go into this here. What Next? Depending on the feedback I do get I can imagine some features which might be useful as well Calculate first order GC Roots from the actual object graph Identify global statics in Types in object graph Support read out of finalization queue of .NET 2.0 as well. Support Memory Dump analysis (again a feature only supported by commercial profilers in their professional editions if it is supported at all) Deserialize objects from a memory dump into a live process back (this would need some more investigation but it is doable) The last item needs some explanation. Why on earth would you want to do that? The basic idea is to store in your live process some logging/tracing data which can become quite big but since it is never written to it is very fast to generate. When your process crashes with a memory dump you could transfer this data structure back into a live viewer which can then nicely display your program state at the point it did crash. This is an advanced trouble shooting technique I have not seen anywhere yet but it could be quite useful. You can have here a look at the current feature list of WMemoryProfiler with some examples. How To Get Started? First I would download the released source package (it is tiny). And compile the complete project. Then you can compile the Example project (it has this name) and uncomment in the main method the scenario you want to check out. If you are greeted with an exception it is time to install the Windows 8 Standalone SDK which is described in great detail in the exception text. Thats it for the first round. I have seen something more limited in the Java world some years ago (now I cannot find the link anymore) but anyway. Now we have something much better.

Read the article
Different Not Automatically Implies Better

- by Alois Kraus

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/11/05/154556.aspxRecently I was digging deeper why some WCF hosted workflow application did consume quite a lot of memory although it did basically only load a xaml workflow. The first tool of choice is Process Explorer or even better Process Hacker (has more options and the best feature copy&paste does work). The three most important numbers of a process with regards to memory are Working Set, Private Working Set and Private Bytes. Working set is the currently consumed physical memory (parts can be shared between processes e.g. loaded dlls which are read only) Private Working Set is the physical memory needed by this process which is not shareable Private Bytes is the number of non shareable which is only visible in the current process (e.g. all new, malloc, VirtualAlloc calls do create private bytes) When you have a bigger workflow it can consume under 64 bit easily 500MB for a 1-2 MB xaml file. This does not look very scalable. Under 64 bit the issue is excessive private bytes consumption and not the managed heap. The picture is quite different for 32 bit which looks a bit strange but it seems that the hosted VB compiler is a lot less memory hungry under 32 bit. I did try to repro the issue with a medium sized xaml file (400KB) which does contain 1000 variables and 1000 if which can be represented by C# code like this: string Var1; string Var2; ... string Var1000; if (!String.IsNullOrEmpty(Var1) ) { Console.WriteLine(“Var1”); } if (!String.IsNullOrEmpty(Var2) ) { Console.WriteLine(“Var2”); } .... Since WF is based on VB.NET expressions you are bound to the hosted VB.NET compiler which does result in (x64) 140 MB of private bytes which is ca. 140 KB for each if clause which is quite a lot if you think about the actually present functionality. But there is hope. .NET 4.5 does allow now C# expressions for WF which is a major step forward for all C# lovers. I did create some simple patcher to “cross compile” my xaml to C# expressions. Lets look at the result: C# Expressions VB Expressions x86 x86 On my home machine I have only 32 bit which gives you quite exactly half of the memory consumption under 64 bit. C# expressions are 10 times more memory hungry than VB.NET expressions! I wanted to do more with less memory but instead it did consume a magnitude more memory. That is surprising to say the least. The workflow does initialize in about the same time under x64 and x86 where the VB code does it in 2s whereas the C# version needs 18s. Also nearly ten times slower. That is a too high price to pay for any bigger sized xaml workflow to convert from VB.NET to C# expressions. If I do reduce the number of expressions to 500 then it does need 400MB which is about half of the memory. It seems that the cost per if does rise linear with the number of total expressions in a xaml workflow. Expression Language Cost per IF Startup Time C# 1000 Ifs x64 1,5 MB 18s C# 500 Ifs x64 750 KB 9s VB 1000 Ifs x64 140 KB 2s VB 500 Ifs x64 70 KB 1s Now we can directly compare two MS implementations. It is clear that the VB.NET compiler uses the same underlying structure but it has much higher offset compared to the highly inefficient C# expression compiler. I have filed a connect bug here with a harsher wording about recent advances in memory consumption. The funniest thing is that one MS employee did give an Azure AppFabric demo around early 2011 which was so slow that he needed to investigate with xperf. He was after startup time and the call stacks with regards to VB.NET expression compilation were remarkably similar. In fact I only found this post by googling for parts of my call stacks. … “C# expressions will be coming soon to WF, and that will have different performance characteristics than VB” … What did he know Jan 2011 what I did no know until today? ;-). He knew that C# expression will come but that they will not be automatically have better footprint. It is about time to fix that. In its current state C# expressions are not usable for bigger workflows. That also explains the headline for today. You can cheat startup time by prestarting workflows so that the demo looks nice and snappy but it does hurt scalability a lot since you do need much more memory than necessary. I did find the stacks by enabling virtual allocation tracking within XPerf which is still the best tool out there. But first you need to look at your process to check where the memory is hiding: For the C# Expression compiler you do not need xperf. You can directly dump the managed heap and check with a profiler of your choice. But if the allocations are happening on the Private Data ( VirtualAlloc ) you can find it with xperf. There is a nice video on channel 9 explaining VirtualAlloc tracking it in greater detail. If your data allocations are on the Heap it does mean that the C/C++ runtime did create a heap for you where all malloc, new calls do allocate from it. You can enable heap tracing with xperf and full call stack support as well which is doable via xperf like it is shown also on channel 9. Or you can use WPRUI directly: To make “Heap Usage” it work you need to set for your executable the tracing flags (before you start it). For example devenv.exe HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\devenv.exe DWORD TracingFlags 1 Do not forget to disable it after you did complete profiling the process or it will impact the startup time quite a lot. You can with xperf attach directly to a running process and collect heap allocation information from a gone wild process. Very handy if you need to find out what a process was doing which has arrived in a funny state. “VirtualAlloc usage” does work without explicitly enabling stuff for a specific process and is always on machine wide. I had issues on my Windows 7 machines with the call stack collection and the latest Windows 8.1 Performance Toolkit. I was told that WPA from Windows 8.0 should work fine but I do not want to downgrade.

Read the article
Profiling Startup Of VS2012 – Ants Profiler

- by Alois Kraus

I just downloaded ANTS Profiler 7.4 to check how fast it is and how deep I can analyze the startup of Visual Studio 2012. The Pro version which is useful does cost 445€ which is ok. To measure a complex system I decided to simply profile VS2012 (Update 1) on my older Intel 6600 2,4GHz with 3 GB RAM and a 32 bit Windows 7. Ants Profiler is really easy to use. So lets try it out. The Ants Profiler does want to start the profiled application by its own which seems to be rather common. I did choose Method Level timing of all managed methods. In the configuration menu I did want to get all call stacks to get full details. Once this is configured you are ready to go. After that you can select the Method Grid to view Wall Clock Time in ms. I hate percentages which are on by default because I do want to look where absolute time is spent and not something else. From the Method Grid I can drill down to see where time is spent in a nice and I can look at the decompiled methods where the time is spent. This does really look nice. But did you see the size of the scroll bar in the method grid? Although I wanted all call stacks I do get only about 4 pages of methods to drill down. From the scroll bar count I would guess that the profiler does show me about 150 methods for the complete VS startup. This is nonsense. I will never find a bottleneck in VS when I am presented only a fraction of the methods that were actually executed. I have also tried in the configuration window to also profile the extremely trivial functions but there was no noticeable difference. It seems that the Ants Profiler does filter away way too many details to be useful for bigger systems. If you want to optimize a CPU bound operation inside NUnit then Ants Profiler is with its line level timings a very nice tool to work with. But for bigger stuff it is certainly not usable. I also do not like that I must start the profiled application from the profiler UI. This makes it hard to profile processes which are started by some other process. Next: JetBrains dotTrace

Read the article
Profiling Startup Of VS2012 – dotTrace Profiler

- by Alois Kraus

Jetbrains which is famous for the Resharper tool has also a profiler in its portfolio. I downloaded dotTrace 5.2 Professional (569€+VAT) to check how far I can profile the startup of VS2012. The most interesting startup option is “.NET Process”. With that you can profile the next started .NET process which is very useful if you want to profile an application which is not started by you. I did select Tracing as and Wall time to get similar options across all profilers. For some reason the attach option did not work with .NET 4.5 on my home machine. But I am sure that it did work with .NET 4.0 some time ago. Since we are profiling devenv.exe we can also select “Standalone Application” and start it from the profiler. The startup time of VS does increase about a factor 3 but that is ok. You get mainly three windows to work with. The first one shows the threads where you can drill down thread wise where most time is spent. I The next window is the call tree which does merge all threads together in a similar view. The last and most useful view in my opinion is the Plain List window which is nearly the same as the Method Grid in Ants Profiler. But this time we do get when I enable the Show system functions checkbox not a 150 but 19407 methods to choose from! I really tried with Ants Profiler to find something about out how VS does work but look how much we were missing! When I double click on a method I do get in the lower pane the called methods and their respective timings. This is something really useful and I can nicely drill down to the most important stuff. The measured time seems to be Wall Clock time which is a good thing to see where my time is really spent. You can also use Sampling as profiling method but this does give you much less information. Except for getting a first idea where to look first this profiling mode is not very useful to understand how you system does interact. The options have a good list of presets to hide by default many method and gray them out to concentrate on your code. It does not filter anything out if you enable Show system functions. By default methods from these assemblies are hidden or if the checkbox is checked grayed out. All in all JetBrains has made a nice profiler which does show great detail and it has nice drill down capabilities. The only thing is that I do not trust its measured timings. I did fall several times into the trap with this one to optimize at places which were already fast but the profiler did show high times in these methods. After measuring with Tracing I was certain that the measured times were greatly exaggerated. Especially when IO is involved it seems to have a hard time to subtract its own overhead. What I did miss most was the possibility to profile not only the next started process but to be able to select a process by name and perhaps a count to profile the next n processes of this name. Next: YourKit

Read the article
Profiling Startup Of VS2012 – JustTrace Profiler

- by Alois Kraus

JustTrace is made by Telerik which is mainly known for its collection of UI controls. The current version (2012.3.1127.0) does include a performance and memory profiler which does cost 614€ and is currently with a special offer for 306€ on sale. It does include one year of free upgrades. The uneven € numbers are calculated from the 799€ and 50% dicsount price. The UI is already in Metro style and simple to use. Multi process, attach, method recording filter are not supported. It looks like JustTrace is like Ants a Just My Code profiler. For stuff where you do not have the pdbs or you want to dig deeper into the BCL code you will not get far. After getting the profile data you get in the All Methods grid a plain list with hit count and own time. The method list for all methods is also suspiciously short which is a clear sign that you will not get far during the analysis of foreign code. But at least there is also a memory profiler included. For this I have to choose in the first window for Profiling Type “Memory Profiler” to check the memory consumption of VS. There are some interesting number to see but I do really miss from YourKit the thread stack window. How am I supposed to get a clue when much memory is allocated and the CPU consumption is high in which places I should look? The Snapshot summary gives a rough overview which is ok for a first impression. Next is Assemblies? This gives you a list of all loaded assemblies. Not terribly useful. The By Type view gives you exactly what it is supposed to do. You have to keep in mind that this list is filtered by the types you did check in the Assemblies list. The By Type instance list does only show types from assemblies which do not originate from Microsoft. By default mscorlib and System are not checked. That is the reason why for the first time my By Type window looked like The idea behind this feature is to show only your instances because you are ultimately responsible for the overall memory consumption. I am not sure if I do like this feature because by default it does hide too much. I do want to see at least how many strings and arrays are allocated. A simple namespace filter would also do it in my opinion. Now you can examine all string instances and look who in the object graph does keep a reference on them. That is nice but YourKit has the big plus that you can also look into the string contents. I am also not sure how in the graph cycles are visualized and what will happen if you have thousands of objects referencing you. That's pretty much it about JustTrace. It can help the average developer to pinpoint performance and memory issues by just looking at his own code and instances. Showing them more will not help them because the sheer amount of information will overwhelm them. And you need to have a pretty good understanding how the GC and the CLR does work. When you have a performance issue at a customer machine it is sometimes very helpful to be able a bring a profiler onto the machine (no pdbs, …) and to get a full snapshot of all processes which are in the problematic use case involved. For these more advanced use cased JustTrace is certainly the wrong tool. Next: SpeedTrace

Read the article
Getting fingerprint from Apache certificate (combined with key)

- by Alois Mahdal

I have just created a certificate for my Apache SSL host using: make-ssl-cert /usr/share/ssl-cert/ssleay.cnf /etc/ssl/private/myhost.crt Now that is the correct way to get the fingerprint out of it? (So I can keep it in other place for visual comparison---in case I need to connect and really don't trust the network?) openssl sha1 /etc/ssl/private/myhost.crt returns different SHA1 than Opera tells me about the cert. Is this because it's combined with the key? (...or am I spoofed already? :-)).

Read the article
How to make browser offer "remember password"

- by Alois Mahdal

What properties login form needs to have to trigger "remember password" for all (or most) browsers? Background: For years I have been using Opera, and the rule has been that almost anything that looks like login form triggers this feature. Now I'm exploring other browsers, and surprisingly find that one of login forms I visit most often, Roundcube login page, does not trigger this feature neither in Chromium nor Firefox (tried various versions and page setups of RC). Since it's on my VPS, I could patch it up and eventually even send that to RC development team, but I have the unnerving feeling that this must have been solved over and over, and it's just me being blind or something. So again: is there any consensus between browser developer as to when this featue should be triggered? Is there a best practice for webmasters to tell the browser "this is the login page"? Or are all browsers doing their own heuristics?

Read the article

1 2 | Next Page >