Search Results

Search found 1801 results on 73 pages for 'andrew cooper'.


  • Subterranean IL: Custom modifiers

    - by Simon Cooper
    In IL, volatile is an instruction prefix used to set a memory barrier at that instruction. However, in C#, volatile is applied to a field to indicate that all accesses on that field should be prefixed with volatile. As I mentioned in my previous post, this means that the field definition needs to store this information somehow, as such a field could be accessed from another assembly. However, IL does not have a concept of a 'volatile field'. How is this information stored? Attributes The standard way of solving this is to apply a VolatileAttribute or similar to the field; this extra metadata notifies the C# compiler that all loads and stores to that field should use the volatile prefix. However, there is a problem with this approach, namely, the .NET C++ compiler. C++ allows methods to be overloaded using properties, like volatile or const, on the parameters; this is perfectly legal C++: public ref class VolatileMethods { void Method(int *i) {} void Method(volatile int *i) {} } If volatile was specified using a custom attribute, then the VolatileMethods class wouldn't be compilable to IL, as there is nothing to differentiate the two methods from each other. This is where custom modifiers come in. Custom modifiers Custom modifiers are similar to custom attributes, but instead of being applied to an IL element separately to its declaration, they are embedded within the field or parameter's type signature itself. The VolatileMethods class would be compiled to the following IL: .class public VolatileMethods { .method public instance void Method(int32* i) {} .method public instance void Method( int32 modreq( [mscorlib]System.Runtime.CompilerServices.IsVolatile)* i) {} } The modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) is the custom modifier. This adds a TypeDef or TypeRef token to the signature of the field or parameter, and even though they are mostly ignored by the CLR when it's executing the program, this allows methods and fields to be overloaded in ways that wouldn't be allowed using attributes. Because the modifiers are part of the signature, they need to be fully specified when calling such a method in IL: call instance void Method( int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile)*) There are two ways of applying modifiers; modreq specifies required modifiers (like IsVolatile), and modopt specifies optional modifiers that can be ignored by compilers (like IsLong or IsConst). The type specified as the modifier argument are simple placeholders; if you have a look at the definitions of IsVolatile and IsLong they are completely empty. They exist solely to be referenced by a modifier. Custom modifiers are used extensively by the C++ compiler to specify concepts that aren't expressible in IL, but still need to be taken into account when calling method overloads. C++ and C# That's all very well and good, but how does this affect C#? Well, the C++ compiler uses modreq(IsVolatile) to specify volatility on both method parameters and fields, as it would be slightly odd to have the same concept represented using a modifier or attribute depending on what it was applied to. Once you've compiled your C++ project, it can then be referenced and used from C#, so the C# compiler has to recognise the modreq(IsVolatile) custom modifier applied to fields, and vice versa. 
    So, even though you can't overload fields or parameters with volatile using C#, volatile needs to be expressed using a custom modifier rather than an attribute to guarantee correct interoperability and behaviour with any C++ dlls that happen to come along. Next up: a closer look at attributes, and how certain attributes compile in unexpected ways.
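
    To see this from the C# side, here's a small sketch (not from the original post) that uses reflection to list the required modifiers the C# compiler embeds in a volatile field's signature; FieldInfo.GetRequiredCustomModifiers returns the modreq types directly:

        using System;
        using System.Reflection;

        class VolatileHolder
        {
            // The C# compiler emits this field with
            // modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile)
            public volatile int Counter;
        }

        static class ModifierDemo
        {
            static void Main()
            {
                FieldInfo field = typeof(VolatileHolder).GetField("Counter");
                foreach (Type modifier in field.GetRequiredCustomModifiers())
                {
                    // Prints: System.Runtime.CompilerServices.IsVolatile
                    Console.WriteLine(modifier.FullName);
                }
            }
        }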

    Read the article

  • Podcast Show Notes: The Big Deal About Big Data

    - by Bob Rhubart
    This week the OTN ArchBeat kicks off a three-part series that looks at Big Data: what it is, its effect on enterprise IT, and what architects need to do to stay ahead of the big data curve. My guests for this conversation are Jean-Pierre Dijks and Andrew Bond. Jean-Pierre, based at Oracle HQ in Redwood Shores, CA, is product manager for Oracle Big Data Appliance and Oracle's big data strategy. Andrew Bond is Head of Transformation Architecture for Oracle, where he specializes in Data Warehousing, Business Intelligence, and Big Data. Andrew is based in the UK, but for this conversation he dialed in from a car somewhere on the streets of Amsterdam. Listen to Part 1: What is Big Data, really, and why does it matter? Listen to Part 2 (Oct 10): What new challenges does Big Data present for Architects? What do architects need to do to prepare themselves and their environments? Listen to Part 3 (Oct 17): Who is driving the adoption of Big Data strategies in organizations, and why? Additional Resources: http://blogs.oracle.com/datawarehousing http://www.facebook.com/pages/OracleBigData https://twitter.com/#!/OracleBigData Coming Soon: A conversation about how the rapidly evolving enterprise IT landscape is transforming the roles, responsibilities, and skill requirements for architects and developers. Stay tuned: RSS

    Read the article

  • Anatomy of a .NET Assembly - PE Headers

    - by Simon Cooper
    Today, I'll be starting a look at what exactly is inside a .NET assembly - how the metadata and IL is stored, how Windows knows how to load it, and what all those bytes are actually doing. First of all, we need to understand the PE file format. PE files .NET assemblies are built on top of the PE (Portable Executable) file format that is used for all Windows executables and dlls, which itself is built on top of the MSDOS executable file format. The reason for this is that when .NET 1 was released, it wasn't a built-in part of the operating system like it is nowadays. Prior to Windows XP, .NET executables had to load like any other executable, had to execute native code to start the CLR to read & execute the rest of the file. However, starting with Windows XP, the operating system loader knows natively how to deal with .NET assemblies, rendering most of this legacy code & structure unnecessary. It still is part of the spec, and so is part of every .NET assembly. The result of this is that there are a lot of structure values in the assembly that simply aren't meaningful in a .NET assembly, as they refer to features that aren't needed. These are either set to zero or to certain pre-defined values, specified in the CLR spec. There are also several fields that specify the size of other datastructures in the file, which I will generally be glossing over in this initial post. Structure of a PE file Most of a PE file is split up into separate sections; each section stores different types of data. For instance, the .text section stores all the executable code; .rsrc stores unmanaged resources, .debug contains debugging information, and so on. Each section has a section header associated with it; this specifies whether the section is executable, read-only or read/write, whether it can be cached... When an exe or dll is loaded, each section can be mapped into a different location in memory as the OS loader sees fit. In order to reliably address a particular location within a file, most file offsets are specified using a Relative Virtual Address (RVA). This specifies the offset from the start of each section, rather than the offset within the executable file on disk, so the various sections can be moved around in memory without breaking anything. The mapping from RVA to file offset is done using the section headers, which specify the range of RVAs which are valid within that section. For example, if the .rsrc section header specifies that the base RVA is 0x4000, and the section starts at file offset 0xa00, then an RVA of 0x401d (offset 0x1d within the .rsrc section) corresponds to a file offset of 0xa1d. Because each section has its own base RVA, each valid RVA has a one-to-one mapping with a particular file offset. PE headers As I said above, most of the header information isn't relevant to .NET assemblies. To help show what's going on, I've created a diagram identifying all the various parts of the first 512 bytes of a .NET executable assembly. I've highlighted the relevant bytes that I will refer to in this post: Bear in mind that all numbers are stored in the assembly in little-endian format; the hex number 0x0123 will appear as 23 01 in the diagram. The first 64 bytes of every file is the DOS header. This starts with the magic number 'MZ' (0x4D, 0x5A in hex), identifying this file as an executable file of some sort (an .exe or .dll). Most of the rest of this header is zeroed out. The important part of this header is at offset 0x3C - this contains the file offset of the PE signature (0x80). 
Between the DOS header & PE signature is the DOS stub - this is a stub program that simply prints out 'This program cannot be run in DOS mode.\r\n' to the console. I will be having a closer look at this stub later on. The PE signature starts at offset 0x80, with the magic number 'PE\0\0' (0x50, 0x45, 0x00, 0x00), identifying this file as a PE executable, followed by the PE file header (also known as the COFF header). The relevant field in this header is in the last two bytes, and it specifies whether the file is an executable or a dll; bit 0x2000 is set for a dll. Next up is the PE standard fields, which start with a magic number of 0x010b for x86 and AnyCPU assemblies, and 0x20b for x64 assemblies. Most of the rest of the fields are to do with the CLR loader stub, which I will be covering in a later post. After the PE standard fields comes the NT-specific fields; again, most of these are not relevant for .NET assemblies. The one that is is the highlighted Subsystem field, and specifies if this is a GUI or console app - 0x20 for a GUI app, 0x30 for a console app. Data directories & section headers After the PE and COFF headers come the data directories; each directory specifies the RVA (first 4 bytes) and size (next 4 bytes) of various important parts of the executable. The only relevant ones are the 2nd (Import table), 13th (Import Address table), and 15th (CLI header). The Import and Import Address table are only used by the startup stub, so we will look at those later on. The 15th points to the CLI header, where the CLR-specific metadata begins. After the data directories comes the section headers; one for each section in the file. Each header starts with the section's ASCII name, null-padded to 8 bytes. Again, most of each header is irrelevant, but I've highlighted the base RVA and file offset in each header. In the diagram, you can see the following sections: .text: base RVA 0x2000, file offset 0x200 .rsrc: base RVA 0x4000, file offset 0xa00 .reloc: base RVA 0x6000, file offset 0x1000 The .text section contains all the CLR metadata and code, and so is by far the largest in .NET assemblies. The .rsrc section contains the data you see in the Details page in the right-click file properties page, but is otherwise unused. The .reloc section contains address relocations, which we will look at when we study the CLR startup stub. What about the CLR? As you can see, most of the first 512 bytes of an assembly are largely irrelevant to the CLR, and only a few bytes specify needed things like the bitness (AnyCPU/x86 or x64), whether this is an exe or dll, and the type of app this is. There are some bytes that I haven't covered that affect the layout of the file (eg. the file alignment, which determines where in a file each section can start). These values are pretty much constant in most .NET assemblies, and don't affect the CLR data directly. Conclusion To summarize, the important data in the first 512 bytes of a file is: DOS header. This contains a pointer to the PE signature. DOS stub, which we'll be looking at in a later post. PE signature PE file header (aka COFF header). This specifies whether the file is an exe or a dll. PE standard fields. This specifies whether the file is AnyCPU/32bit or 64bit. PE NT-specific fields. This specifies what type of app this is, if it is an app. Data directories. The 15th entry (at offset 0x168) contains the RVA and size of the CLI header inside the .text section. Section headers. These are used to map between RVA and file offset. 
The important one is .text, which is where all the CLR data is stored. In my next post, we'll start looking at the metadata used by the CLR directly, which is all inside the .text section.
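
    As a rough illustration of the offsets described above (a minimal sketch, assuming you point it at a valid .NET assembly; the file name is a placeholder), the following C# reads the DOS header to find the PE signature and reports the PE magic number:

        using System;
        using System.IO;

        static class PeSniffer
        {
            // Reads just enough of the headers described above to locate the
            // PE signature and report whether this is a PE32 or PE32+ image.
            static void Main()
            {
                byte[] bytes = File.ReadAllBytes("MyAssembly.exe");

                // DOS header: 'MZ' magic, then the PE signature offset at 0x3C
                if (bytes[0] != 0x4D || bytes[1] != 0x5A)
                    throw new InvalidDataException("Not an executable file");
                int peOffset = BitConverter.ToInt32(bytes, 0x3C);

                // PE signature: 'PE\0\0'
                if (BitConverter.ToUInt32(bytes, peOffset) != 0x00004550)
                    throw new InvalidDataException("PE signature not found");

                // COFF header is 20 bytes; the standard fields start straight after it
                ushort magic = BitConverter.ToUInt16(bytes, peOffset + 4 + 20);
                Console.WriteLine(magic == 0x10b ? "PE32 (x86/AnyCPU)" : "PE32+ (x64)");
            }
        }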

    Read the article

  • Some non-generic collections

    - by Simon Cooper
    Although the collections classes introduced in .NET 2, 3.5 and 4 cover most scenarios, there are still some .NET 1 collections that don't have generic counterparts. In this post, I'll be examining what they do, why you might use them, and some things you'll need to bear in mind when doing so. BitArray System.Collections.BitArray is conceptually the same as a List<bool>, but whereas List<bool> stores each boolean in a single byte (as that's what the backing bool[] does), BitArray uses a single bit to store each value, and uses various bitmasks to access each bit individually. This means that BitArray is eight times smaller than a List<bool>. Furthermore, BitArray has some useful functions for bitmasks, like And, Xor and Not, and it's not limited to 32 or 64 bits; a BitArray can hold as many bits as you need. However, it's not all roses and kittens. There are some fundamental limitations you have to bear in mind when using BitArray: It's a non-generic collection. The enumerator returns object (a boxed boolean), rather than an unboxed bool. This means that if you do this: foreach (bool b in bitArray) { ... } Every single boolean value will be boxed, then unboxed. And if you do this: foreach (var b in bitArray) { ... } you'll have to manually unbox b on every iteration, as it'll come out of the enumerator an object. Instead, you should manually iterate over the collection using a for loop: for (int i=0; i<bitArray.Length; i++) { bool b = bitArray[i]; ... } Following on from that, if you want to use BitArray in the context of an IEnumerable<bool>, ICollection<bool> or IList<bool>, you'll need to write a wrapper class, or use the Enumerable.Cast<bool> extension method (although Cast would box and unbox every value you get out of it). There is no Add or Remove method. You specify the number of bits you need in the constructor, and that's what you get. You can change the length yourself using the Length property setter though. It doesn't implement IList. Although not really important if you're writing a generic wrapper around it, it is something to bear in mind if you're using it with pre-generic code. However, if you use BitArray carefully, it can provide significant gains over a List<bool> for functionality and efficiency of space. OrderedDictionary System.Collections.Specialized.OrderedDictionary does exactly what you would expect - it's an IDictionary that maintains items in the order they are added. It does this by storing key/value pairs in a Hashtable (to get O(1) key lookup) and an ArrayList (to maintain the order). You can access values by key or index, and insert or remove items at a particular index. The enumerator returns items in index order. However, the Keys and Values properties return ICollection, not IList, as you might expect; CopyTo doesn't maintain the same ordering, as it copies from the backing Hashtable, not ArrayList; and any operations that insert or remove items from the middle of the collection are O(n), just like a normal list. In short; don't use this class. If you need some sort of ordered dictionary, it would be better to write your own generic dictionary combining a Dictionary<TKey, TValue> and List<KeyValuePair<TKey, TValue>> or List<TKey> for your specific situation. ListDictionary and HybridDictionary To look at why you might want to use ListDictionary or HybridDictionary, we need to examine the performance of these dictionaries compared to Hashtable and Dictionary<object, object>. 
For this test, I added n items to each collection, then randomly accessed n/2 items: So, what's going on here? Well, ListDictionary is implemented as a linked list of key/value pairs; all operations on the dictionary require an O(n) search through the list. However, for small n, the constant factor that big-o notation doesn't measure is much lower than the hashing overhead of Hashtable or Dictionary. HybridDictionary combines a Hashtable and ListDictionary; for small n, it uses a backing ListDictionary, but switches to a Hashtable when it gets to 9 items (you can see the point it switches from a ListDictionary to Hashtable in the graph). Apart from that, it's got very similar performance to Hashtable. So why would you want to use either of these? In short, you wouldn't. Any gain in performance by using ListDictionary over Dictionary<TKey, TValue> would be offset by the generic dictionary not having to cast or box the items you store, something the graphs above don't measure. Only if the performance of the dictionary is vital, the dictionary will hold less than 30 items, and you don't need type safety, would you use ListDictionary over the generic Dictionary. And even then, there's probably more useful performance gains you can make elsewhere.
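
    For the BitArray case, the wrapper class mentioned above might look something like this minimal sketch (the class name is illustrative); it exposes the bits as an IEnumerable<bool> without boxing each value:

        using System.Collections;
        using System.Collections.Generic;

        // A minimal read-only wrapper exposing a BitArray as an IEnumerable<bool>
        // without boxing each value.
        sealed class BitArrayEnumerable : IEnumerable<bool>
        {
            private readonly BitArray m_Bits;

            public BitArrayEnumerable(BitArray bits) { m_Bits = bits; }

            public IEnumerator<bool> GetEnumerator()
            {
                // The indexer returns an unboxed bool, unlike the non-generic enumerator
                for (int i = 0; i < m_Bits.Length; i++)
                    yield return m_Bits[i];
            }

            IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
        }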

    Read the article

  • Why enumerator structs are a really bad idea

    - by Simon Cooper
    If you've ever poked around the .NET class libraries in Reflector, I'm sure you would have noticed that the generic collection classes all have implementations of their IEnumerator as a struct rather than a class. As you will see, this design decision has some rather unfortunate side effects... As is generally known in the .NET world, mutable structs are a Very Bad Idea; and there are several other blogs around explaining this (Eric Lippert's blog post explains the problem quite well). In the BCL, the generic collection enumerators are all mutable structs, as they need to keep track of where they are in the collection. This bit me quite hard when I was coding a wrapper around a LinkedList<int>.Enumerator. It boils down to this code: sealed class EnumeratorWrapper : IEnumerator<int> { private readonly LinkedList<int>.Enumerator m_Enumerator; public EnumeratorWrapper(LinkedList<int> linkedList) { m_Enumerator = linkedList.GetEnumerator(); } public int Current { get { return m_Enumerator.Current; } } object System.Collections.IEnumerator.Current { get { return Current; } } public bool MoveNext() { return m_Enumerator.MoveNext(); } public void Reset() { ((System.Collections.IEnumerator)m_Enumerator).Reset(); } public void Dispose() { m_Enumerator.Dispose(); } } The key line here is the MoveNext method. When I initially coded this, I thought that the call to m_Enumerator.MoveNext() would alter the enumerator state in the m_Enumerator class variable and so the enumeration would proceed in an orderly fashion through the collection. However, when I ran this code it went into an infinite loop - the m_Enumerator.MoveNext() call wasn't actually changing the state in the m_Enumerator variable at all, and my code was looping forever on the first collection element. It was only after disassembling that method that I found out what was going on The MoveNext method above results in the following IL: .method public hidebysig newslot virtual final instance bool MoveNext() cil managed { .maxstack 1 .locals init ( [0] bool CS$1$0000, [1] valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator CS$0$0001) L_0000: nop L_0001: ldarg.0 L_0002: ldfld valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator EnumeratorWrapper::m_Enumerator L_0007: stloc.1 L_0008: ldloca.s CS$0$0001 L_000a: call instance bool [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext() L_000f: stloc.0 L_0010: br.s L_0012 L_0012: ldloc.0 L_0013: ret } Here, the important line is 0002 - m_Enumerator is accessed using the ldfld operator, which does the following: Finds the value of a field in the object whose reference is currently on the evaluation stack. So, what the MoveNext method is doing is the following: public bool MoveNext() { LinkedList<int>.Enumerator CS$0$0001 = this.m_Enumerator; bool CS$1$0000 = CS$0$0001.MoveNext(); return CS$1$0000; } The enumerator instance being modified by the call to MoveNext is the one stored in the CS$0$0001 variable on the stack, and not the one in the EnumeratorWrapper class instance. Hence why the state of m_Enumerator wasn't getting updated. Hmm, ok. Well, why is it doing this? If you have a read of Eric Lippert's blog post about this issue, you'll notice he quotes a few sections of the C# spec. In particular, 7.5.4: ...if the field is readonly and the reference occurs outside an instance constructor of the class in which the field is declared, then the result is a value, namely the value of the field I in the object referenced by E. 
And my m_Enumerator field is readonly! Indeed, if I remove the readonly from the class variable then the problem goes away, and the code works as expected. The IL confirms this: .method public hidebysig newslot virtual final instance bool MoveNext() cil managed { .maxstack 1 .locals init ( [0] bool CS$1$0000) L_0000: nop L_0001: ldarg.0 L_0002: ldflda valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator EnumeratorWrapper::m_Enumerator L_0007: call instance bool [System]System.Collections.Generic.LinkedList`1/Enumerator::MoveNext() L_000c: stloc.0 L_000d: br.s L_000f L_000f: ldloc.0 L_0010: ret } Notice on line 0002, instead of the ldfld we had before, we've got a ldflda, which does this: Finds the address of a field in the object whose reference is currently on the evaluation stack. Instead of loading the value, we're loading the address of the m_Enumerator field. So now the call to MoveNext modifies the enumerator stored in the class rather than on the stack, and everything works as expected. Previously, I had thought enumerator structs were an odd but interesting feature of the BCL that I had used in the past to do linked list slices. However, effects like this only underline how dangerous mutable structs are, and I'm at a loss to explain why the enumerators were implemented as structs in the first place. (interestingly, the SortedList<TKey, TValue> enumerator is a struct but is private, which makes it even more odd - the only way it can be accessed is as a boxed IEnumerator!). I would love to hear people's theories as to why the enumerators are implemented in such a fashion. And bonus points if you can explain why LinkedList<int>.Enumerator.Reset is an explicit implementation but Dispose is implicit... Note to self: never ever ever code a mutable struct.
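
    The same effect can be reproduced in a few lines of C# with List<int> instead of LinkedList<int> - a minimal sketch, not code from the original post:

        using System;
        using System.Collections.Generic;

        sealed class ReadonlyEnumeratorBug
        {
            // Because this field is readonly, every access outside the constructor
            // operates on a copy of the struct, not on the field itself.
            private readonly List<int>.Enumerator m_Enumerator;

            public ReadonlyEnumeratorBug(List<int> list) { m_Enumerator = list.GetEnumerator(); }

            public bool MoveNext() { return m_Enumerator.MoveNext(); }

            static void Main()
            {
                var wrapper = new ReadonlyEnumeratorBug(new List<int> { 1, 2, 3 });
                // Prints 'True' every time - the field's state never advances
                for (int i = 0; i < 5; i++)
                    Console.WriteLine(wrapper.MoveNext());
            }
        }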

    Read the article

  • Inside Red Gate - Project teams

    - by Simon Cooper
    Within each division in Red Gate, development effort is structured around one or more project teams; currently, each division contains 2-3 separate teams. These are self-contained units responsible for a particular development project. Project team structure The typical size of a development team varies, but is normally around 4-7 people - one project manager, two developers, one or two testers, a technical author (who is responsible for the text within the application, website content, and help documentation) and a user experience designer (who designs and prototypes the UIs). However, team sizes can vary from 3 up to 12, depending on the division and project. As a rule, the whole team sits together in the same area of the office. (Again, this is my experience of what happens. I haven't worked in the DBA division, and SQL Tools might have changed completely since I moved to .NET. As I mentioned in my previous post, each division is free to structure itself as it sees fit.) Depending on the project, and the other needs in the division, the tech author and UX designer may be shared between several projects. Generally, developers and testers work on one project at a time. If the project is a simple point release, then it might not need a UX designer at all. However, if it's a brand new product, then a UX designer and tech author will be involved right from the start. Developers, testers, and the project manager will normally stay together in the same team as they work on different projects, unless there's a good reason to split or merge teams for a particular project. Technical authors and UX designers will normally go wherever they are needed in the division, depending on what each project needs at the time. In my case, I was working with more or less the same people for over 2 years, all the way through SQL Compare 7, 8, and Schema Compare for Oracle. This helped to build a great sense of camaraderie within the team, and helped to form and maintain a team identity. This, in turn, meant we worked very well together, and so the final result was that much better (as well as making the work more fun). How is a project started and run? The product manager within each division collates user feedback and ideas, does lots of research, throws in a few ideas from people within the company, and then comes up with a list of what the division should work on in the next few years. This is split up into projects, and after each project is greenlit (I'll be discussing this later on) it is then assigned to a project team, as and when they become available (I'm sure there are lots of discussions and meetings at this point that I'm not aware of!). From that point, it's entirely up to the project team. Just as divisions are autonomous, project teams are also given a high degree of autonomy. All the teams in Red Gate use some sort of vaguely agile methodology; most use some variations on SCRUM, some have experimented with Kanban. Some store the project progress on a whiteboard, some use our bug tracker, others use different methods. It all depends on what the team members think will work best for them to get the best result at the end. From that point, the project proceeds as you would expect; code gets written, tests pass and fail, discussions about how to resolve various problems are had and decided upon, and out pops a new product, new point release, new internal tool, or whatever the project's goal was. 
The project manager ensures that everyone works together without too much bloodshed and that thrown missiles are constrained to Nerf bullets, the developers write the code, the testers ensure it actually works, and the tech author and UX designer ensure that people will be able to use the final product to solve their problem (after all, developers make lousy UI designers and technical authors). Projects in Red Gate last a relatively short amount of time; most projects are less than 6 months. The longest was 18 months. This has evolved as the company has grown, and I suspect is a side effect of the type of software Red Gate produces. As an ISV, we sell packaged software; we only get revenue when customers purchase the ready-made tools. As a result, we only get a sellable piece of software right at the end of a project. Therefore, the longer the project lasts, the more time and money has to be invested by the company before we get any revenue from it, and the riskier the project becomes. This drives the average project time down. Small project teams are the core of how Red Gate produces software, and are what the whole development effort of the company is built around. In my next post, I'll be looking at the office itself, and how all 200 of us manage to fit on two floors of a small office building.

    Read the article

  • .NET Security Part 3

    - by Simon Cooper
    You write a security-related application that allows addins to be used. These addins (as dlls) can be downloaded from anywhere, and, if allowed to run full-trust, could open a security hole in your application. So you want to restrict what the addin dlls can do, using a sandboxed appdomain, as explained in my previous posts. But there needs to be an interaction between the code running in the sandbox and the code that created the sandbox, so the sandboxed code can control or react to things that happen in the controlling application. Sandboxed code needs to be able to call code outside the sandbox. Now, there are various methods of allowing cross-appdomain calls, the two main ones being .NET Remoting with MarshalByRefObject, and WCF named pipes. I’m not going to cover the details of setting up such mechanisms here, or which you should choose for your specific situation; there are plenty of blogs and tutorials covering such issues elsewhere. What I’m going to concentrate on here is the more general problem of running fully-trusted code within a sandbox, which is required in most methods of app-domain communication and control. Defining assemblies as fully-trusted In my last post, I mentioned that when you create a sandboxed appdomain, you can pass in a list of assembly strongnames that run as full-trust within the appdomain: // get the Assembly object for the assembly Assembly assemblyWithApi = ... // get the StrongName from the assembly's collection of evidence StrongName apiStrongName = assemblyWithApi.Evidence.GetHostEvidence<StrongName>(); // create the sandbox AppDomain sandbox = AppDomain.CreateDomain( "Sandbox", null, appDomainSetup, restrictedPerms, apiStrongName); Any assembly that is loaded into the sandbox with a strong name the same as one in the list of full-trust strong names is unconditionally given full-trust permissions within the sandbox, regardless of permissions and sandbox setup. This is very powerful! You should only use this for assemblies that you trust as much as the code creating the sandbox. So now you have a class that you want the sandboxed code to call: // within assemblyWithApi public class MyApi { public static void MethodToDoThings() { ... } } // within the sandboxed dll public class UntrustedSandboxedClass { public void DodgyMethod() { ... MyApi.MethodToDoThings(); ... } } However, if you try to do this, you get quite an ugly exception: MethodAccessException: Attempt by security transparent method ‘UntrustedSandboxedClass.DodgyMethod()’ to access security critical method ‘MyApi.MethodToDoThings()’ failed. Security transparency, which I covered in my first post in the series, has entered the picture. Partially-trusted code runs at the Transparent security level, fully-trusted code runs at the Critical security level, and Transparent code cannot under any circumstances call Critical code. Security transparency and AllowPartiallyTrustedCallersAttribute So the solution is easy, right? Make MethodToDoThings SafeCritical, then the transparent code running in the sandbox can call the API: [SecuritySafeCritical] public static void MethodToDoThings() { ... } However, this doesn’t solve the problem. When you try again, exactly the same exception is thrown; MethodToDoThings is still running as Critical code. What’s going on? By default, a fully-trusted assembly always runs Critical code, regardless of any security attributes on its types and methods. 
    This is because it may not have been designed in a secure way when called from transparent code – as we’ll see in the next post, it is easy to open a security hole despite all the security protections .NET 4 offers. When exposing an assembly to be called from partially-trusted code, the entire assembly needs a security audit to decide what should be transparent, safe critical, or critical, and close any potential security holes. This is where AllowPartiallyTrustedCallersAttribute (APTCA) comes in. Without this attribute, fully-trusted assemblies run Critical code, and partially-trusted assemblies run Transparent code. When this attribute is applied to an assembly, it confirms that the assembly has had a full security audit, and it is safe to be called from untrusted code. All code in that assembly runs as Transparent, but SecurityCriticalAttribute and SecuritySafeCriticalAttribute can be applied to individual types and methods to make those run at the Critical or SafeCritical levels, with all the restrictions that entails. So, to allow the sandboxed assembly to call the full-trust API assembly, simply add APTCA to the API assembly: [assembly: AllowPartiallyTrustedCallers] and everything works as you expect. The sandboxed dll can call your API dll, and from there communicate with the rest of the application. Conclusion That’s the basics of running a full-trust assembly in a sandboxed appdomain, and allowing a sandboxed assembly to access it. The key is AllowPartiallyTrustedCallersAttribute, which is what lets partially-trusted code call a fully-trusted assembly. However, applying APTCA to an assembly is an assertion that you have run a full security audit of every type and member in the assembly. If you haven’t, then you could inadvertently open a security hole. I’ll be looking at ways this can happen in my next post.
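
    Putting the pieces together, the API assembly might end up looking something like this sketch (the namespace and message are illustrative; a real assembly still needs the full security audit described above before APTCA is applied):

        // API assembly (strong-named, listed as full-trust when the sandbox is created).
        using System;
        using System.Security;

        [assembly: AllowPartiallyTrustedCallers]

        namespace MyTrustedApi
        {
            public static class MyApi
            {
                // Runs as SafeCritical: callable from the transparent, sandboxed addin
                // code, but able to perform privileged operations itself.
                [SecuritySafeCritical]
                public static void MethodToDoThings()
                {
                    Console.WriteLine("Called from the sandbox");
                }
            }
        }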

    Read the article

  • Sprites are sometimes blurry in Flash

    - by Tim Cooper
    I am playing around with drawing an SVG sprite (imported through [Embed]). Depending on the coordinates of the image, it sometimes appears crisper than at others. The following image shows how it is rendered differently at different locations: (Image link - You may have to download and zoom in with an image editor to see it) You'll notice that the middle sprite is more blurry than the ones on the sides. Does anyone know why this is? Any help would be appreciated.

    Read the article

  • Inside Red Gate - Be Reasonable!

    - by Simon Cooper
    As I discussed in my previous posts, divisions and project teams within Red Gate are allowed a lot of autonomy to manage themselves. It's not just the teams, though; there's an awful lot of freedom given to individual employees within the company as well. Reasonableness How Red Gate treats its employees is embodied in the phrase 'You will be reasonable with us, and we will be reasonable with you'. As an employee, you are trusted to do your job to the best of your ability. There's no one looking over your shoulder, no one clocking you in and out each day. Everyone is working at the company because they want to, and one of the core ideas of Red Gate is that the company exists to 'let people do the best work of their lives'. Everything is geared towards that. Office Services and the IT department are there to help you do your job. If you need something to help you work better (a third or fourth monitor, footrests, or a new keyboard) then ask people in Information Systems (IS) or Office Services and you will be given it, no questions asked. Everyone has administrator access to their own machines, and you can install whatever you want on them. If there's a particular bit of software you need, then ask IS and they will buy it. As an example, last year I wanted to replace my main hard drive with an SSD; I had a summer job at school working in a computer repair shop, so knew what to do. I went to IS and asked for 'an SSD, a SATA cable, and a screwdriver'. And I got it there and then, even the screwdriver. Awesome. I screwed it in myself, copied all my main drive files across, and I was good to go. Of course, if you're not happy doing that yourself, then IS will sort it all out for you, no problems. If you need something that the company doesn't have (say, a book off Amazon, or you need some specifications printing off & bound), then everyone has an expense limit of £100 that you can use without any sign-off needed from your managers. If you need a company credit card for whatever reason, then you can get it. This freedom extends to working hours and holiday; you're expected to be in the office 11am-3pm each day, but outside those times you can work whenever you want. If you need a half-day holiday on a day's notice, or even the same day, then you'll get it, unless there's a good reason you're needed that day. If you need to work from home for a day or so for whatever reason, then you can. If it's reasonable, then it's allowed. Trust issues? A lot of trust, and a lot of leeway, is given to all the people in Red Gate. Everyone is expected to work hard, do their jobs to the best of their ability, and there will be a minimum of bureaucratic obstacles that stop you doing your work. What happens if you abuse this trust? Well, an example is company trip expenses. You're free to expense what you like - food, drink, transport, etc. - but if your expenses are not reasonable, then you will never travel with the company again. Simple as that. Everyone knows when they're abusing the system, so simply don't do it. Along with reasonableness, another phrase used is 'Don't be a ***'. If you act like a ***, and abuse any of the trust placed in you, even if you're the best tester, salesperson, dev, or manager in the company, then you won't be a part of the company any more. From what I know about other companies, employee trust is highly variable between companies, all the way up to CCTV trained on employees' monitors. As a dev, I want to produce well-written & useful code that solves people's problems. 
Being able to get whatever I need - install whatever tools I need, get time off when I need to, obtain reference books within a day - all let me do my job, and so let Red Gate help other people do their own jobs through the tools we produce. Plus, I don't think I would like working for a company that doesn't allow admin access to your own machine and blocks Facebook!

    Read the article

  • Flash framerate reliability

    - by Tim Cooper
    I am working in Flash and a few things have been brought to my attention. Below is some code I have some questions on: addEventListener(Event.ENTER_FRAME, function(e:Event):void { if (KEY_RIGHT) { // Move character right } // Etc. }); stage.addEventListener(KeyboardEvent.KEY_DOWN, function(e:KeyboardEvent):void { // Report key which is down }); stage.addEventListener(KeyboardEvent.KEY_UP, function(e:KeyboardEvent):void { // Report key which is up }); I have the project configured so that it has a framerate of 60 FPS. The two questions I have on this are: What happens when it is unable to call that function every 1/60 of a second? Is this a way of processing events that need to be limited by time (ex: a ball which needs to travel to the right of the screen from the left in X seconds)? Or should it be done a different way?

    Read the article

  • Subterranean IL: Volatile

    - by Simon Cooper
    This time, we'll be having a look at the volatile. prefix instruction, and one of the differences between volatile in IL and C#. The volatile. prefix volatile is a tricky one, as there are varying levels of documentation on it. From what I can see, it has two effects: It prevents caching of the load or store value; rather than reading or writing to a cached version of the memory location (say, the processor register or cache), it forces the value to be loaded or stored at the 'actual' memory location, so it is then immediately visible to other threads. It forces a memory barrier at the prefixed instruction. This ensures instructions don't get re-ordered around the volatile instruction. This is slightly more complicated than it first seems, and only seems to matter on certain architectures. For more details, Joe Duffy has a blog post going into the details. For this post, I'll be concentrating on the first aspect of volatile. Caching field accesses To demonstrate this, I created a simple multithreaded IL program. It boils down to the following code: .class public Holder { .field public static class Holder holder .field public bool stop .method public static specialname void .cctor() { newobj instance void Holder::.ctor() stsfld class Holder Holder::holder ret } } .method private static void Main() { .entrypoint // Thread t = new Thread(new ThreadStart(DoWork)) // t.Start() // Thread.Sleep(2000) // Console.WriteLine("Stopping thread...") ldsfld class Holder Holder::holder ldc.i4.1 stfld bool Holder::stop call instance void [mscorlib]System.Threading.Thread::Join() ret } .method private static void DoWork() { ldsfld class Holder Holder::holder // while (!Holder.holder.stop) {} DoWork: dup ldfld bool Holder::stop brfalse DoWork pop ret } If you compile and run this code, you'll find that the call to Thread.Join() never returns - the DoWork spinlock is reading a cached version of Holder.stop, which is never being updated with the new value set by the Main method. Adding volatile to the ldfld fixes this: dup volatile. ldfld bool Holder::stop brfalse DoWork The volatile ldfld forces the field access to read directly from heap memory, which is then updated by the main thread, rather than using a cached copy. volatile in C# This highlights one of the differences between IL and C#. In IL, volatile only applies to the prefixed instruction, whereas in C#, volatile is specified on a field to indicate that all accesses to that field should be volatile (interestingly, there's no mention of the 'no caching' aspect of volatile in the C# spec; it only focuses on the memory barrier aspect). Furthermore, this information needs to be stored within the assembly somehow, as such a field might be accessed directly from outside the assembly, but there's no concept of a 'volatile field' in IL! How this information is stored with the field will be the subject of my next post.
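
    For reference, here's roughly what the IL above corresponds to in C#, with the volatile field keyword standing in for the volatile. prefix on every access (a sketch reconstructed from the comments in the IL, not the author's original source):

        using System;
        using System.Threading;

        class Holder
        {
            public static Holder holder = new Holder();

            // Without 'volatile' here the JIT is free to cache the field in a register,
            // and the DoWork loop below may never see the update made by Main.
            public volatile bool stop;
        }

        static class Program
        {
            static void Main()
            {
                var t = new Thread(DoWork);
                t.Start();
                Thread.Sleep(2000);
                Console.WriteLine("Stopping thread...");
                Holder.holder.stop = true;
                t.Join();
            }

            static void DoWork()
            {
                while (!Holder.holder.stop) { }
            }
        }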

    Read the article

  • Anatomy of a .NET Assembly - CLR metadata 1

    - by Simon Cooper
    Before we look at the bytes comprising the CLR-specific data inside an assembly, we first need to understand the logical format of the metadata (for this post I'll only be looking at simple pure-IL assemblies; mixed-mode assemblies & other things complicate things quite a bit). Metadata streams Most of the CLR-specific data inside an assembly is inside one of 5 streams, which are analogous to the sections in a PE file. The name of each section in a PE file starts with a ., and the name of each stream in the CLR metadata starts with a #. All but one of the streams are heaps, which store unstructured binary data. The predefined streams are: #~ Also called the metadata stream, this stream stores all the information on the types, methods, fields, properties and events in the assembly. Unlike the other streams, the metadata stream has predefined contents & structure. #Strings This heap is where all the namespace, type & member names are stored. It is referenced extensively from the #~ stream, as we'll be looking at later. #US Also known as the user string heap, this stream stores all the strings used in code directly. All the strings you embed in your source code end up in here. This stream is only referenced from method bodies. #GUID This heap exclusively stores GUIDs used throughout the assembly. #Blob This heap is for storing pure binary data - method signatures, generic instantiations, that sort of thing. Items inside the heaps (#Strings, #US, #GUID and #Blob) are indexed using a simple binary offset from the start of the heap. At that offset is a coded integer giving the length of that item, then the item's bytes immediately follow. The #GUID stream is slightly different, in that GUIDs are all 16 bytes long, so a length isn't required. Metadata tables The #~ stream contains all the assembly metadata. The metadata is organised into 45 tables, which are binary arrays of predefined structures containing information on various aspects of the metadata. Each entry in a table is called a row, and the rows are simply concatenated together in the file on disk. For example, each row in the TypeRef table contains: A reference to where the type is defined (most of the time, a row in the AssemblyRef table). An offset into the #Strings heap with the name of the type. An offset into the #Strings heap with the namespace of the type, in that order. The important tables are (with their table number in hex): 0x2: TypeDef 0x4: FieldDef 0x6: MethodDef 0x14: EventDef 0x17: PropertyDef Contains basic information on all the types, fields, methods, events and properties defined in the assembly. 0x1: TypeRef The details of all the referenced types defined in other assemblies. 0xa: MemberRef The details of all the referenced members of types defined in other assemblies. 0x9: InterfaceImpl Links the types defined in the assembly with the interfaces that type implements. 0xc: CustomAttribute Contains information on all the attributes applied to elements in this assembly, from method parameters to the assembly itself. 0x18: MethodSemantics Links properties and events with the methods that comprise the get/set or add/remove methods of the property or event. 0x1b: TypeSpec 0x2b: MethodSpec These tables provide instantiations of generic types and methods for each usage within the assembly. There are several ways to reference a single row within a table. The simplest is to specify the 1-based row index (RID). The indexes are 1-based so a value of 0 can represent 'null'. 
    In this case, which table the row index refers to is inferred from the context. If the table can't be determined from the context, then a particular row is specified using a token. This is a 4-byte value with the most significant byte specifying the table, and the other 3 specifying the 1-based RID within that table. This is generally how a metadata table row is referenced from the instruction stream in method bodies. The third way is to use a coded token, which we will look at in the next post. So, back to the bytes Now that we've got a rough idea of how the metadata is logically arranged, we can look at the bytes comprising the start of the CLR data within an assembly: The first 8 bytes of the .text section are used by the CLR loader stub. After that, the CLR-specific data starts with the CLI header. I've highlighted the important bytes in the diagram. In order, they are: The size of the header. As the header is a fixed size, this is always 0x48. The CLR major version. This is always 2, even for .NET 4 assemblies. The CLR minor version. This is always 5, even for .NET 4 assemblies, and seems to be ignored by the runtime. The RVA and size of the metadata header. In the diagram, the RVA 0x20e4 corresponds to the file offset 0x2e4. Various flags specifying if this assembly is pure-IL, whether it is strong name signed, and whether it should be run as 32-bit (this is how the CLR differentiates between x86 and AnyCPU assemblies). A token pointing to the entrypoint of the assembly. In this case, 06 (the last byte) refers to the MethodDef table, and 01 00 00 refers to the first row in that table. (after a gap) RVA of the strong name signature hash, which comes straight after the CLI header. The RVA 0x2050 corresponds to file offset 0x250. The rest of the CLI header is mainly used in mixed-mode assemblies, and so is zeroed in this pure-IL assembly. After the CLI header comes the strong name hash, which is a SHA-1 hash of the assembly using the strong name key. After that come the bodies of all the methods in the assembly concatenated together. Each method body starts off with a header, which I'll be looking at later. As you can see, this is a very small assembly with only 2 methods (an instance constructor and a Main method). After that, near the end of the .text section, comes the metadata, containing a metadata header and the 5 streams discussed above. We'll be looking at this in the next post. Conclusion The CLI header data doesn't have much to it, but we've covered some concepts that will be important in later posts - the logical structure of the CLR metadata and the overall layout of CLR data within the .text section. Next, I'll have a look at the contents of the #~ stream, and how the table data is arranged on disk.
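
    The token format is easy to pick apart in code; here's a minimal sketch decomposing the entrypoint token from the CLI header above:

        using System;

        static class TokenDemo
        {
            static void Main()
            {
                // The entrypoint token from the CLI header above
                // (value 0x06000001, stored little-endian on disk as 01 00 00 06)
                uint token = 0x06000001;

                uint table = token >> 24;        // 0x06 = MethodDef table
                uint rid   = token & 0x00FFFFFF; // 1-based row index within that table

                Console.WriteLine("Table 0x{0:x2}, RID {1}", table, rid);
            }
        }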

    Read the article

  • Inside Red Gate - Experimenting In Public

    - by Simon Cooper
    Over the next few weeks, we'll be performing experiments on SmartAssembly to confirm or refute various hypotheses we have about how people use the product, what is stopping them from using it to its full extent, and what we can change to make it more useful and easier to use. Some of these experiments can be done within the team, some within Red Gate, and some need to be done on external users. External testing Some external testing can be done by standard usability tests and surveys, however, there are some hypotheses that can only be tested by building a version of SmartAssembly with some things in the UI or implementation changed. We'll then be able to look at how the experimental build is used compared to the 'mainline' build, which forms our baseline or control group, and use this data to confirm or refute the relevant hypotheses. However, there are several issues we need to consider before running experiments using separate builds: Ideally, the user wouldn't know they're running an experimental SmartAssembly. We don't want users to use the experimental build like it's an experimental build, we want them to use it like it's the real mainline build. Only then will we get valid, useful, and informative data concerning our hypotheses. There's no point running the experiments if we can't find out what happens after the download. To confirm or refute some of our hypotheses, we need to find out how the tool is used once it is installed. Fortunately, we've applied feature usage reporting to the SmartAssembly codebase itself to provide us with that information. Of course, this then makes the experimental data conditional on the user agreeing to send that data back to us in the first place. Unfortunately, even though this does limit the amount of useful data we'll be getting back, and possibly skew the data, there's not much we can do about this; we don't collect feature usage data without the user's consent. Looks like we'll simply have to live with this. What if the user tries to buy the experiment? This is something that isn't really covered by the Lean Startup book; how do you support users who give you money for an experiment? If the experiment is a new feature, and the user buys a license for SmartAssembly based on that feature, then what do we do if we later decide to pivot & scrap that feature? We've either got to spend time and money bringing that feature up to production quality and into the mainline anyway, or we've got disgruntled customers. Either way is bad. Again, there's not really any good solution to this. Similarly, what if we've removed some features for an experiment and a potential new user downloads the experimental build? (As I said above, there's no indication the build is an experimental build, as we want to see what users really do with it). The crucial feature they need is missing, causing a bad trial experience, a lost potential customer, and a lost chance to help the customer with their problem. Again, this is something not really covered by the Lean Startup book, and something that doesn't have a good solution. So, some tricky issues there, not all of them with nice easy answers. Turns out the practicalities of running Lean Startup experiments are more complicated than they first seem!

    Read the article

  • Subterranean IL: Exception handler semantics

    - by Simon Cooper
    In my blog posts on fault and filter exception handlers, I said that the same behaviour could be replicated using normal catch blocks. Well, that isn't entirely true... Changing the handler semantics Consider the following: .try { .try { .try { newobj instance void [mscorlib]System.Exception::.ctor() // IL for: // e.Data.Add("DictKey", true) throw } fault { ldstr "1: Fault handler" call void [mscorlib]System.Console::WriteLine(string) endfault } } filter { ldstr "2a: Filter logic" call void [mscorlib]System.Console::WriteLine(string) // IL for: // (bool)((Exception)e).Data["DictKey"] endfilter }{ ldstr "2b: Filter handler" call void [mscorlib]System.Console::WriteLine(string) leave.s Return } } catch object { ldstr "3: Catch handler" call void [mscorlib]System.Console::WriteLine(string) leave.s Return } Return: // rest of method If the filter handler is engaged (true is inserted into the exception dictionary) then the filter handler gets engaged, and the following gets printed to the console: 2a: Filter logic 1: Fault handler 2b: Filter handler and if the filter handler isn't engaged, then the following is printed: 2a:Filter logic 1: Fault handler 3: Catch handler Filter handler execution The filter handler is executed first. Hmm, ok. Well, what happens if we replaced the fault block with the C# equivalent (with the exception dictionary value set to false)? .try { // throw exception } catch object { ldstr "1: Fault handler" call void [mscorlib]System.Console::WriteLine(string) rethrow } we get this: 1: Fault handler 2a: Filter logic 3: Catch handler The fault handler is executed first, instead of the filter block. Eh? This change in behaviour is due to the way the CLR searches for exception handlers. When an exception is thrown, the CLR stops execution of the thread, and searches up the stack for an exception handler that can handle the exception and stop it propagating further - catch or filter handlers. It checks the type clause of catch clauses, and executes the code in filter blocks to see if the filter can handle the exception. When the CLR finds a valid handler, it saves the handler's location, then goes back to where the exception was thrown and executes fault and finally blocks between there and the handler location, discarding stack frames in the process, until it reaches the handler. So? By replacing a fault with a catch, we have changed the semantics of when the filter code is executed; by using a rethrow instruction, we've split up the exception handler search into two - one search to find the first catch, then a second when the rethrow instruction is encountered. This is only really obvious when mixing C# exception handlers with fault or filter handlers, so this doesn't affect code written only in C#. However it could cause some subtle and hard-to-debug effects with object initialization and ordering when using and calling code written in a language that can compile fault and filter handlers.
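
    The post predates exception filters in C#, but current C# (6 and later) exposes the same two-pass behaviour through the when keyword; this sketch (not from the original post) prints the filter line before the inner finally block runs, just as the filter logic above runs before the fault handler:

        using System;

        static class FilterOrderDemo
        {
            static bool Log(string message)
            {
                Console.WriteLine(message);
                return true; // engage the handler
            }

            static void Inner()
            {
                try
                {
                    throw new Exception();
                }
                finally
                {
                    Console.WriteLine("2: Inner finally");
                }
            }

            static void Main()
            {
                try
                {
                    Inner();
                }
                catch (Exception) when (Log("1: Filter logic"))
                {
                    Console.WriteLine("3: Catch handler");
                }
                // Prints: 1: Filter logic, 2: Inner finally, 3: Catch handler
            }
        }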

    Read the article

  • Developing Schema Compare for Oracle (Part 2): Dependencies

    - by Simon Cooper
    In developing Schema Compare for Oracle, one of the issues we came across was the size of the databases. As detailed in my last blog post, we had to allow schema pre-filtering due to the number of objects in a standard Oracle database. Unfortunately, this leads to some quite tricky situations regarding object dependencies. This post explains how we deal with these dependencies. 1. Cross-schema dependencies Say, in the following database, you're populating SchemaA, and synchronizing SchemaA.Table1: SOURCE   TARGET CREATE TABLE SchemaA.Table1 ( Col1 NUMBER REFERENCES SchemaB.Table1(Col1));   CREATE TABLE SchemaA.Table1 ( Col1 VARCHAR2(100) REFERENCES SchemaB.Table1(Col1)); CREATE TABLE SchemaB.Table1 ( Col1 NUMBER PRIMARY KEY);   CREATE TABLE SchemaB.Table1 ( Col1 VARCHAR2(100) PRIMARY KEY); We need to do a rebuild of SchemaA.Table1 to change Col1 from a VARCHAR2(100) to a NUMBER. This consists of: Creating a table with the new schema Inserting data from the old table to the new table, with appropriate conversion functions (in this case, TO_NUMBER) Dropping the old table Rename new table to same name as old table Unfortunately, in this situation, the rebuild will fail at step 1, as we're trying to create a NUMBER column with a foreign key reference to a VARCHAR2(100) column. As we're only populating SchemaA, the naive implementation of the object population prefiltering (sticking a WHERE owner = 'SCHEMAA' on all the data dictionary queries) will generate an incorrect sync script. What we actually have to do is: Drop foreign key constraint on SchemaA.Table1 Rebuild SchemaB.Table1 Rebuild SchemaA.Table1, adding the foreign key constraint to the new table This means that in order to generate a correct synchronization script for SchemaA.Table1 we have to know what SchemaB.Table1 is, and that it also needs to be rebuilt to successfully rebuild SchemaA.Table1. SchemaB isn't the schema that the user wants to synchronize, but we still have to load the table and column information for SchemaB.Table1 the same way as any table in SchemaA. Fortunately, Oracle provides (mostly) complete dependency information in the dictionary views. Before we actually read the information on all the tables and columns in the database, we can get dependency information on all the objects that are either pointed at by objects in the schemas we’re populating, or point to objects in the schemas we’re populating (think about what would happen if SchemaB was being explicitly populated instead), with a suitable query on all_constraints (for foreign key relationships) and all_dependencies (for most other types of dependencies eg a function using another function). The extra objects found can then be included in the actual object population, and the sync wizard then has enough information to figure out the right thing to do when we get to actually synchronize the objects. Unfortunately, this isn’t enough. 2. Dependency chains The solution above will only get the immediate dependencies of objects in populated schemas. What if there’s a chain of dependencies? A.tbl1 -> B.tbl1 -> C.tbl1 -> D.tbl1 If we’re only populating SchemaA, the implementation above will only include B.tbl1 in the dependent objects list, whereas we might need to know about C.tbl1 and D.tbl1 as well, in order to ensure a modification on A.tbl1 can succeed. What we actually need is a graph traversal on the dependency graph that all_dependencies represents. 
Fortunately, we don’t have to read all the database dependency information from the server and run the graph traversal on the client computer, as Oracle provides a method of doing this in SQL – CONNECT BY. So, we can put all the dependencies we want to include together in big bag with UNION ALL, then run a SELECT ... CONNECT BY on it, starting with objects in the schema we’re populating. We should end up with all the objects that might be affected by modifications in the initial schema we’re populating. Good solution? Well, no. For one thing, it’s sloooooow. all_dependencies, on my test databases, has got over 110,000 rows in it, and the entire query, for which Oracle was creating a temporary table to hold the big bag of graph edges, was often taking upwards of two minutes. This is too long, and would only get worse for large databases. But it had some more fundamental problems than just performance. 3. Comparison dependencies Consider the following schema: SOURCE   TARGET CREATE TABLE SchemaA.Table1 ( Col1 NUMBER REFERENCES SchemaB.Table1(col1));   CREATE TABLE SchemaA.Table1 ( Col1 VARCHAR2(100)); CREATE TABLE SchemaB.Table1 ( Col1 NUMBER PRIMARY KEY);   CREATE TABLE SchemaB.Table1 ( Col1 VARCHAR2(100)); What will happen if we used the dependency algorithm above on the source & target database? Well, SchemaA.Table1 has a foreign key reference to SchemaB.Table1, so that will be included in the source database population. On the target, SchemaA.Table1 has no such reference. Therefore SchemaB.Table1 will not be included in the target database population. In the resulting comparison of the two objects models, what you will end up with is: SOURCE  TARGET SchemaA.Table1 -> SchemaA.Table1 SchemaB.Table1 -> (no object exists) When this comparison is synchronized, we will see that SchemaB.Table1 does not exist, so we will try the following sequence of actions: Create SchemaB.Table1 Rebuild SchemaA.Table1, with foreign key to SchemaB.Table1 Oops. Because the dependencies are only followed within a single database, we’ve tried to create an object that already exists. To fix this we can include any objects found as dependencies in the source or target databases in the object population of both databases. SchemaB.Table1 will then be included in the target database population, and we won’t try and create objects that already exist. All good? Well, consider the following schema (again, only explicitly populating SchemaA, and synchronizing SchemaA.Table1): SOURCE   TARGET CREATE TABLE SchemaA.Table1 ( Col1 NUMBER REFERENCES SchemaB.Table1(col1));   CREATE TABLE SchemaA.Table1 ( Col1 VARCHAR2(100)); CREATE TABLE SchemaB.Table1 ( Col1 NUMBER PRIMARY KEY);   CREATE TABLE SchemaB.Table1 ( Col1 VARCHAR2(100) PRIMARY KEY); CREATE TABLE SchemaC.Table1 ( Col1 NUMBER);   CREATE TABLE SchemaC.Table1 ( Col1 VARCHAR2(100) REFERENCES SchemaB.Table1); Although we’re now including SchemaB.Table1 on both sides of the comparison, there’s a third table (SchemaC.Table1) that we don’t know about that will cause the rebuild of SchemaB.Table1 to fail if we try and synchronize SchemaA.Table1. That’s because we’re only running the dependency query on the schemas we’re explicitly populating; to solve this issue, we would have to run the dependency query again, but this time starting the graph traversal from the objects found in the other database. 
Furthermore, this dependency chain could be arbitrarily extended.This leads us to the following algorithm for finding all the dependencies of a comparison: Find initial dependencies of schemas the user has selected to compare on the source and target Include these objects in both the source and target object populations Run the dependency query on the source, starting with the objects found as dependents on the target, and vice versa Repeat 2 & 3 until no more objects are found For the schema above, this will result in the following sequence of actions: Find initial dependenciesSchemaA.Table1 -> SchemaB.Table1 found on sourceNo objects found on target Include objects in both source and targetSchemaB.Table1 included in source and target Run dependency query, starting with found objectsNo objects to start with on sourceSchemaB.Table1 -> SchemaC.Table1 found on target Include objects in both source and targetSchemaC.Table1 included in source and target Run dependency query on found objectsNo objects found in sourceNo objects to start with in target Stop This will ensure that we include all the necessary objects to make any synchronization work. However, there is still the issue of query performance; the CONNECT BY on the entire database dependency graph is still too slow. After much sitting down and drawing complicated diagrams, we decided to move the graph traversal algorithm from the server onto the client (which turned out to run much faster on the client than on the server); and to ensure we don’t read the entire dependency graph onto the client we also pull the graph across in bits – we start off with dependency edges involving schemas selected for explicit population, and whenever the graph traversal comes across a dependency reference to a schema we don’t yet know about a thunk is hit that pulls in the dependency information for that schema from the database. We continue passing more dependent objects back and forth between the source and target until no more dependency references are found. This gives us the list of all the extra objects to populate in the source and target, and object population can then proceed. 4. Object blacklists and fast dependencies When we tested this solution, we were puzzled in that in some of our databases most of the system schemas (WMSYS, ORDSYS, EXFSYS, XDB, etc) were being pulled in, and this was increasing the database registration and comparison time quite significantly. After debugging, we discovered that the culprits were database tables that used one of the Oracle PL/SQL types (eg the SDO_GEOMETRY spatial type). These were creating a dependency chain from the database tables we were populating to the system schemas, and hence pulling in most of the system objects in that schema. To solve this we introduced blacklists of objects we wouldn’t follow any dependency chain through. As well as the Oracle-supplied PL/SQL types (MDSYS.SDO_GEOMETRY, ORDSYS.SI_COLOR, among others) we also decided to blacklist the entire PUBLIC and SYS schemas, as any references to those would likely lead to a blow up in the dependency graph that would massively increase the database registration time, and could result in the client running out of memory. Even with these improvements, each dependency query was taking upwards of a minute. We discovered from Oracle execution plans that there were some columns, with dependency information we required, that were querying system tables with no indexes on them! 
To cut a long story short, running the following query: SELECT * FROM all_tab_cols WHERE data_type_owner = 'XDB'; results in a full table scan of the SYS.COL$ system table! This single clause was responsible for over half the execution time of the dependency query. Hence, the ‘Ignore slow dependencies’ option was born – not querying this and a couple of similar clauses to drastically speed up the dependency query execution time, at the expense of producing incorrect sync scripts in rare edge cases. Needless to say, along with the sync script action ordering, the dependency code in the database registration is one of the most complicated and most rewritten parts of the Schema Compare for Oracle engine. The beta of Schema Compare for Oracle is out now; if you find a bug in it, please do tell us so we can get it fixed!
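For readers who like code, here is a minimal C# sketch of the back-and-forth dependency expansion described above. The delegate, names and types are illustrative assumptions rather than the actual Schema Compare engine code; the real implementation streams dependency edges from the data dictionary and applies the blacklisting and thunking described in section 4.

using System.Collections.Generic;
using System.Linq;

static class DependencyExpansion
{
    // Stands in for the real dependency query against all_constraints/all_dependencies:
    // given a set of objects, return the objects they are connected to by a dependency.
    public delegate IEnumerable<string> FindDependencies(IEnumerable<string> startingObjects);

    public static ISet<string> Expand(
        IEnumerable<string> initialObjects,
        FindDependencies querySource,
        FindDependencies queryTarget,
        ISet<string> blacklist)
    {
        var known = new HashSet<string>(initialObjects);

        // Step 1: initial dependencies of the explicitly-selected schemas on each side
        var newOnSource = querySource(known).Except(known).Except(blacklist).ToList();
        var newOnTarget = queryTarget(known).Except(known).Except(blacklist).ToList();

        while (newOnSource.Count > 0 || newOnTarget.Count > 0)
        {
            // Step 2: include the newly-found objects in both object populations
            known.UnionWith(newOnSource);
            known.UnionWith(newOnTarget);

            // Step 3: re-run the dependency query on each database, starting from the
            // objects that were found on the *other* database
            var foundOnSource = querySource(newOnTarget).Except(known).Except(blacklist).ToList();
            var foundOnTarget = queryTarget(newOnSource).Except(known).Except(blacklist).ToList();

            // Step 4: repeat until no more objects are found
            newOnSource = foundOnSource;
            newOnTarget = foundOnTarget;
        }
        return known;
    }
}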

    Read the article

  • Developing Schema Compare for Oracle (Part 1)

    - by Simon Cooper
    SQL Compare is one of Red Gate's most successful SQL Server tools; it allows developers and DBAs to compare and synchronize the contents of their databases. Although similar tools exist for Oracle, they are quite noticeably lacking in the usability and stability that SQL Compare is known for in the SQL Server world. We could see a real need for a usable schema comparison tools for Oracle, and so the Schema Compare for Oracle project was born. Over the next few weeks, as we come up to release of v1, I'll be doing a series of posts on the development of Schema Compare for Oracle. For the first post, I thought I would start with the main pitfalls that we stumbled across when developing the product, especially from a SQL Server background. 1. Schemas and Databases The most obvious difference is that the concept of a 'database' is quite different between Oracle and SQL Server. On SQL Server, one server instance has multiple databases, each with separate schemas. There is typically little communication between separate databases, and most databases are no more than about 1000-2000 objects. This means SQL Compare can register an entire database in a reasonable amount of time, and cross-database dependencies probably won't be an issue. It is a quite different scene under Oracle, however. The terms 'database' and 'instance' are used interchangeably, (although technically 'database' refers to the datafiles on disk, and 'instance' the running Oracle process that reads & writes to the database), and a database is a single conceptual entity. This immediately presents problems, as it is infeasible to register an entire database as we do in SQL Compare; in my Oracle install, using the standard recommended options, there are 63975 system objects. If we tried to register all those, not only would it take hours, but the client would probably run out of memory before we finished. As a result, we had to allow people to specify what schemas they wanted to register. This decision had quite a few knock-on effects for the design, which I will cover in a future post. 2. Connecting to Oracle The next obvious difference is in actually connecting to Oracle – in SQL Server, you can specify a server and database, and off you go. On Oracle things are slightly more complicated. SIDs, Service Names, and TNS A database (the files on disk) must have a unique identifier for the databases on the system, called the SID. It also has a global database name, which consists of a name (which doesn't have to match the SID) and a domain. Alternatively, you can identify a database using a service name, which normally has a 1-to-1 relationship with instances, but may not if, for example, using RAC (Real Application Clusters) for redundancy and failover. You specify the computer and instance you want to connect to using TNS (Transparent Network Substrate). The user-visible parts are a config file (tnsnames.ora) on the client machine that specifies how to connect to an instance. For example, the entry for one of my test instances is: SC_11GDB1 = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = simonctest)(PORT = 1521)) ) (CONNECT_DATA = (SID = 11gR1db1) ) ) This gives the hostname, port, and SID of the instance I want to connect to, and associates it with a name (SC_11GDB1). The tnsnames syntax also allows you to specify failover, multiple descriptions and address lists, and client load balancing. You can then specify this TNS identifier as the data source in a connection string. 
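As an illustration (the credentials below are placeholders, and any ADO.NET Oracle provider follows the same pattern), the SC_11GDB1 alias above could be used like this:

// Hypothetical example: connecting via the SC_11GDB1 TNS alias defined above.
// Uses ODP.NET's Oracle.DataAccess.Client; the user name and password are placeholders.
using Oracle.DataAccess.Client;

class ConnectExample
{
    static void Main()
    {
        string connectionString = "Data Source=SC_11GDB1;User Id=scott;Password=tiger";
        using (OracleConnection connection = new OracleConnection(connectionString))
        {
            connection.Open();
            // ... run data dictionary queries here
        }
    }
}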
Although using ODP.NET (the .NET dlls provided by Oracle) was fine for internal prototype builds, once we released the EAP we discovered that this simply wasn't an acceptable solution for installs on other people's machines. Due to .NET assembly strong naming, users had to have installed on their machines the exact same version of the ODP.NET dlls as we had on our build server. We couldn't ship the ODP.NET dlls with our installer as the Oracle license agreement prohibited this, and we didn't want to force users to install another Oracle client just so they can run our program. To be able to list the TNS entries in the connection dialog, we also had to locate and parse the tnsnames.ora file, which was complicated by users with several Oracle client installs and intricate TNS entries. After much swearing at our computers, we eventually decided to use a third party Oracle connection library from Devart that we could ship with our program; this could use whatever client version was installed, parse the TNS entries for us, and also had the nice feature of being able to connect to an Oracle server without having any client installed at all. Unfortunately, their current license agreement prevents us from shipping an Oracle SDK, but that's a bridge we'll cross when we get to it. 3. Running synchronization scripts The most important difference is that in Oracle, DDL is non-transactional; you cannot rollback DDL statements like you can on SQL Server. Although we considered various solutions to this, including using the flashback archive or recycle bin, or generating an undo script, no reliable method of completely undoing a half-executed sync script has yet been found; so in this case we simply have to trust that the DBA or developer will check and verify the script before running it. However, before we got to that stage, we had to get the scripts to run in the first place... To run a synchronization script from SQL Compare we essentially pass the script over to the SqlCommand.ExecuteNonQuery method. However, when we tried to do the same for an OracleConnection we got a very strange error – 'ORA-00911: invalid character', even when running the most basic CREATE TABLE command. After much hair-pulling and Googling, we discovered that Oracle has got some very strange behaviour with semicolons at the end of statements. To understand what's going on, we need to take a quick foray into SQL and PL/SQL. PL/SQL is not T-SQL In SQL Server, T-SQL is the language used to interface with the database. It has DDL, DML, control flow, and many other nice features (like Turing-completeness) that you can mix and match in the same script. In Oracle, DDL SQL and PL/SQL are two completely separate languages, with different syntax, different datatypes and different execution engines within the instance. Oracle SQL is much more like 'pure' ANSI SQL, with no state, no control flow, and only the basic DML commands. PL/SQL is the Turing-complete language, but can only do DML and DCL (i.e. BEGIN TRANSATION commands). Any DDL or SQL commands that aren't recognised by the PL/SQL engine have to be passed back to the SQL engine via an EXECUTE IMMEDIATE command. In PL/SQL, a semicolons is a valid token used to delimit the end of a statement. In SQL, a semicolon is not a valid token (even though the Oracle documentation gives them at the end of the syntax diagrams) . 
When you execute the command CREATE TABLE table1 (COL1 NUMBER); in SQL*Plus the semicolon on the end is a command to SQL*Plus to execute the preceding statement on the server; it strips off the semicolon before passing it on. SQL Developer does a similar thing. When executing a PL/SQL block, however, the syntax is like so: BEGIN INSERT INTO table1 VALUES (1); INSERT INTO table1 VALUES (2); END; / In this case, the semicolon is accepted by the PL/SQL engine as a statement delimiter, and instead the / is the command to SQL*Plus to execute the current block. This explains the ORA-00911 error we got when trying to run the CREATE TABLE command – the server is complaining about the semicolon on the end. This also means that there is no SQL syntax to execute more than one DDL command in the same OracleCommand. Therefore, we would have to do a round-trip to the server for every command we want to execute. Obviously, this would cause lots of network traffic and be very slow on slow or congested networks. Our first attempt at a solution was to wrap every SQL statement (without semicolon) inside an EXECUTE IMMEDIATE command in a PL/SQL block and pass that to the server to execute. One downside of this solution is that we get no feedback as to how the script execution is going; we're currently evaluating better solutions to this thorny issue. Next up: Dependencies; how we solved the problem of being unable to register the entire database, and the knock-on effects to the whole product.
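As a rough illustration of that first attempt (a minimal sketch assuming a generic ADO.NET provider, not the actual engine code), wrapping a single DDL statement in an anonymous PL/SQL block looks something like this:

using System.Data;

static class OracleDdlRunner
{
    // Executes one DDL statement by wrapping it in an anonymous PL/SQL block.
    // The statement is passed as a bind variable so we don't have to escape quotes,
    // and the trailing semicolon is stripped since the SQL engine rejects it.
    public static void ExecuteDdl(IDbConnection connection, string ddl)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "BEGIN EXECUTE IMMEDIATE :ddl; END;";

            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = "ddl";
            parameter.Value = ddl.TrimEnd().TrimEnd(';');
            command.Parameters.Add(parameter);

            command.ExecuteNonQuery();
        }
    }
}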

    Read the article

  • Subterranean IL: Fault exception handlers

    - by Simon Cooper
Fault handlers are one of the two handler types that aren't available in C#. A fault handler behaves exactly like a finally handler, except it only runs if control flow exits the block due to an exception being thrown. As an example, take the following method: .method public static void FaultExample(bool throwException) { .try { ldstr "Entering try block" call void [mscorlib]System.Console::WriteLine(string) ldarg.0 brfalse.s NormalReturn ThrowException: ldstr "Throwing exception" call void [mscorlib]System.Console::WriteLine(string) newobj instance void [mscorlib]System.Exception::.ctor() throw NormalReturn: ldstr "Leaving try block" call void [mscorlib]System.Console::WriteLine(string) leave.s Return } fault { ldstr "Fault handler" call void [mscorlib]System.Console::WriteLine(string) endfault } Return: ldstr "Returning from method" call void [mscorlib]System.Console::WriteLine(string) ret } If we pass true to this method the following gets printed: Entering try block Throwing exception Fault handler and the exception gets passed up the call stack. So, the exception gets thrown, the fault handler gets run, and the exception propagates up the stack afterwards in the normal way. If we pass false, we get the following: Entering try block Leaving try block Returning from method Because we are leaving the .try using a leave.s instruction, and not throwing an exception, the fault handler does not get called. Fault handlers and C# So why were these not included in C#? It seems a pretty simple feature; one extra keyword that compiles in exactly the same way, and with the same semantics, as a finally handler. If you think about it, the same behaviour can be replicated using a normal catch block: try { throw new Exception(); } catch { // fault code goes here throw; } The catch block only gets run if an exception is thrown, and the exception gets rethrown and propagates up the call stack afterwards; exactly like a fault block. The only complication that occurs is when you want to add a fault handler to a try block with existing catch handlers. Then, you either have to wrap the try in another try: try { try { // ... } catch (DirectoryNotFoundException) { // ... // leave.s as normal... } catch (IOException) { // ... throw; } } catch { // fault logic throw; } or separate out the fault logic into another method and call that from the appropriate handlers: try { // ... } catch (DirectoryNotFoundException) { // ... } catch (IOException ioe) { // ... HandleFaultLogic(); throw; } catch (Exception e) { HandleFaultLogic(); throw; } To be fair, the number of times that I would have found a fault handler useful is minimal. Still, it's quite annoying knowing such functionality exists, but you're not able to access it from C#. Fortunately, there are some easy workarounds one can use instead. Next time: filter handlers.
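As an aside, here is the catch-and-rethrow workaround wrapped up as a small reusable helper (my own sketch, not anything from the BCL):

using System;

static class Fault
{
    // Runs body; if an exception escapes it, runs faultHandler and rethrows,
    // which gives the same semantics as an IL fault handler.
    public static void Try(Action body, Action faultHandler)
    {
        try
        {
            body();
        }
        catch
        {
            faultHandler();   // only runs when an exception is thrown
            throw;            // rethrow so the exception propagates as normal
        }
    }
}

// usage: Fault.Try(() => DoWork(), () => Console.WriteLine("Fault handler"));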

    Read the article

  • .NET vs Windows 8: Rematch!

    - by Simon Cooper
So, although you will be able to use your existing .NET skills to develop Metro apps, it turns out Microsoft are limiting Visual Studio 11 Express to Metro-only. From the Express website: Visual Studio 11 Express for Windows 8 provides tools for Metro style app development. To create desktop apps, you need to use Visual Studio 11 Professional, or higher. Oh dear. To develop any sort of non-Metro application, you will need to pay for at least VS Professional. I suspect Microsoft (or at least, certain groups within Microsoft) have a very explicit strategy in mind. By making VS Express Metro-only, developers who don't want to pay for Professional will be forced to make their simple one-shot or open-source applications in Metro. This increases the number of applications available for Windows 8 and Windows mobile devices, which in turn makes those platforms more attractive for consumers. When you use the free VS 11 Express, instead of paying Microsoft, you provide them with a service by making applications for Metro, which in turn makes Microsoft's mobile offering more attractive to consumers, increasing their market share. Of course, it remains to be seen whether developers forced to jump onto the Metro bandwagon will simply jump ship to Android or iOS instead. At least, that's what I think is going on. With Microsoft, who really knows?

    Read the article

  • Subterranean IL: Generics and array covariance

    - by Simon Cooper
    Arrays in .NET are curious beasts. They are the only built-in collection types in the CLR, and SZ-arrays (single dimension, zero-indexed) have their own commands and IL syntax. One of their stranger properties is they have a kind of built-in covariance long before generic variance was added in .NET 4. However, this causes a subtle but important problem with generics. First of all, we need to briefly recap on array covariance. SZ-array covariance To demonstrate, I'll tweak the classes I introduced in my previous posts: public class IncrementableClass { public int Value; public virtual void Increment(int incrementBy) { Value += incrementBy; } } public class IncrementableClassx2 : IncrementableClass { public override void Increment(int incrementBy) { base.Increment(incrementBy); base.Increment(incrementBy); } } In the CLR, SZ-arrays of reference types are implicitly convertible to arrays of the element's supertypes, all the way up to object (note that this does not apply to value types). That is, an instance of IncrementableClassx2[] can be used wherever a IncrementableClass[] or object[] is required. When an SZ-array could be used in this fashion, a run-time type check is performed when you try to insert an object into the array to make sure you're not trying to insert an instance of IncrementableClass into an IncrementableClassx2[]. This check means that the following code will compile fine but will fail at run-time: IncrementableClass[] array = new IncrementableClassx2[1]; array[0] = new IncrementableClass(); // throws ArrayTypeMismatchException These checks are enforced by the various stelem* and ldelem* il instructions in such a way as to ensure you can't insert a IncrementableClass into a IncrementableClassx2[]. For the rest of this post, however, I'm going to concentrate on the ldelema instruction. ldelema This instruction pops the array index (int32) and array reference (O) off the stack, and pushes a pointer (&) to the corresponding array element. However, unlike the ldelem instruction, the instruction's type argument must match the run-time array type exactly. This is because, once you've got a managed pointer, you can use that pointer to both load and store values in that array element using the ldind* and stind* (load/store indirect) instructions. As the same pointer can be used for both input and output to the array, the type argument to ldelema must be invariant. At the time, this was a perfectly reasonable restriction, and maintained array type-safety within managed code. However, along came generics, and with it the constrained callvirt instruction. So, what happens when we combine array covariance and constrained callvirt? .method public static void CallIncrementArrayValue() { // IncrementableClassx2[] arr = new IncrementableClassx2[1] ldc.i4.1 newarr IncrementableClassx2 // arr[0] = new IncrementableClassx2(); dup newobj instance void IncrementableClassx2::.ctor() ldc.i4.0 stelem.ref // IncrementArrayValue<IncrementableClass>(arr, 0) // here, we're treating an IncrementableClassx2[] as IncrementableClass[] dup ldc.i4.0 call void IncrementArrayValue<class IncrementableClass>(!!0[],int32) // ... ret } .method public static void IncrementArrayValue<(IncrementableClass) T>( !!T[] arr, int32 index) { // arr[index].Increment(1) ldarg.0 ldarg.1 ldelema !!T ldc.i4.1 constrained. 
!!T callvirt instance void IIncrementable::Increment(int32) ret } And the result: Unhandled Exception: System.ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array. at IncrementArrayValue[T](T[] arr, Int32 index) at CallIncrementArrayValue() Hmm. We're instantiating the generic method as IncrementArrayValue<IncrementableClass>, but passing in an IncrementableClassx2[], hence the ldelema instruction is failing as it's expecting an IncrementableClass[]. On features and feature conflicts What we've got here is a conflict between existing behaviour (ldelema ensuring type safety on covariant arrays) and new behaviour (managed pointers to object references used for every constrained callvirt on generic type instances). And, although this is an edge case, there is no general workaround. The generic method could be hidden behind several layers of assemblies, wrappers and interfaces that make it a requirement to use array covariance when calling the generic method. Furthermore, this will only fail at runtime, whereas compile-time safety is what generics were designed for! The solution is the readonly. prefix instruction. This modifies the ldelema instruction to ignore the exact type check for arrays of reference types, and so it lets us take the address of array elements using a covariant type to the actual run-time type of the array: .method public static void IncrementArrayValue<(IncrementableClass) T>( !!T[] arr, int32 index) { // arr[index].Increment(1) ldarg.0 ldarg.1 readonly. ldelema !!T ldc.i4.1 constrained. !!T callvirt instance void IIncrementable::Increment(int32) ret } But what about type safety? In return for ignoring the type check, the resulting controlled mutability pointer can only be used in the following situations: As the object parameter to ldfld, ldflda, stfld, call and constrained callvirt instructions As the pointer parameter to ldobj or ldind* As the source parameter to cpobj In other words, the only operations allowed are those that read from the pointer; stind* and similar that alter the pointer itself are banned. This ensures that the array element we're pointing to won't be changed to anything untoward, and so type safety within the array is maintained. This is a typical example of the maxim that whenever you add a feature to a program, you have to consider how that feature interacts with every single one of the existing features. Although an edge case, the readonly. prefix instruction ensures that generics and array covariance work together and that compile-time type safety is maintained. Tune in next time for a look at the .ctor generic type constraint, and what it means.
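For comparison, here is roughly what the IL above corresponds to in C# (a hedged sketch reusing the classes from this post). Because the C# compiler handles the element access safely, using the readonly. prefix where it needs the address of an array element of type-parameter type, running it does not throw an ArrayTypeMismatchException, unlike the hand-written IL:

public class IncrementableClass
{
    public int Value;
    public virtual void Increment(int incrementBy) { Value += incrementBy; }
}

public class IncrementableClassx2 : IncrementableClass
{
    public override void Increment(int incrementBy)
    {
        base.Increment(incrementBy);
        base.Increment(incrementBy);
    }
}

static class CovarianceDemo
{
    static void IncrementArrayValue<T>(T[] arr, int index) where T : IncrementableClass
    {
        arr[index].Increment(1);
    }

    static void Main()
    {
        // treat an IncrementableClassx2[] covariantly as an IncrementableClass[]
        IncrementableClass[] arr = new IncrementableClassx2[1];
        arr[0] = new IncrementableClassx2();

        IncrementArrayValue(arr, 0);   // T is inferred as IncrementableClass; no exception
    }
}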

    Read the article

  • Anatomy of a .NET Assembly - CLR metadata 2

    - by Simon Cooper
    Before we look any further at the CLR metadata, we need a quick diversion to understand how the metadata is actually stored. Encoding table information As an example, we'll have a look at a row in the TypeDef table. According to the spec, each TypeDef consists of the following: Flags specifying various properties of the class, including visibility. The name of the type. The namespace of the type. What type this type extends. The field list of this type. The method list of this type. How is all this data actually represented? Offset & RID encoding Most assemblies don't need to use a 4 byte value to specify heap offsets and RIDs everywhere, however we can't hard-code every offset and RID to be 2 bytes long as there could conceivably be more than 65535 items in a heap or more than 65535 fields or types defined in an assembly. So heap offsets and RIDs are only represented in the full 4 bytes if it is required; in the header information at the top of the #~ stream are 3 bits indicating if the #Strings, #GUID, or #Blob heaps use 2 or 4 bytes (the #US stream is not accessed from metadata), and the rowcount of each table. If the rowcount for a particular table is greater than 65535 then all RIDs referencing that table throughout the metadata use 4 bytes, else only 2 bytes are used. Coded tokens Not every field in a table row references a single predefined table. For example, in the TypeDef extends field, a type can extend another TypeDef (a type in the same assembly), a TypeRef (a type in a different assembly), or a TypeSpec (an instantiation of a generic type). A token would have to be used to let us specify the table along with the RID. Tokens are always 4 bytes long; again, this is rather wasteful of space. Cutting the RID down to 2 bytes would make each token 3 bytes long, which isn't really an optimum size for computers to read from memory or disk. However, every use of a token in the metadata tables can only point to a limited subset of the metadata tables. For the extends field, we only need to be able to specify one of 3 tables, which we can do using 2 bits: 0x0: TypeDef 0x1: TypeRef 0x2: TypeSpec We could therefore compress the 4-byte token that would otherwise be needed into a coded token of type TypeDefOrRef. For each type of coded token, the least significant bits encode the table the token points to, and the rest of the bits encode the RID within that table. We can work out whether each type of coded token needs 2 or 4 bytes to represent it by working out whether the maximum RID of every table that the coded token type can point to will fit in the space available. The space available for the RID depends on the type of coded token; a TypeOrMethodDef coded token only needs 1 bit to specify the table, leaving 15 bits available for the RID before a 4-byte representation is needed, whereas a HasCustomAttribute coded token can point to one of 18 different tables, and so needs 5 bits to specify the table, only leaving 11 bits for the RID before 4 bytes are needed to represent that coded token type. For example, a 2-byte TypeDefOrRef coded token with the value 0x0321 has the following bit pattern: 0 3 2 1 0000 0011 0010 0001 The first two bits specify the table - TypeRef; the other bits specify the RID. Because we've used the first two bits, we've got to shift everything along two bits: 000000 1100 1000 This gives us a RID of 0xc8. 
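As a quick sanity check of that bit-twiddling, here is a small C# sketch (my own illustration) that decodes a 2-byte TypeDefOrRef coded token in the same way:

using System;

static class CodedTokens
{
    // Decodes a 2-byte TypeDefOrRef coded token: the bottom two bits select the
    // table (0 = TypeDef, 1 = TypeRef, 2 = TypeSpec), the remaining bits are the RID.
    public static void DecodeTypeDefOrRef(ushort codedToken)
    {
        string[] tables = { "TypeDef", "TypeRef", "TypeSpec" };
        int tag = codedToken & 0x3;
        int rid = codedToken >> 2;
        Console.WriteLine("{0}, RID 0x{1:x}", tables[tag], rid);
    }
}

// CodedTokens.DecodeTypeDefOrRef(0x0321) prints "TypeRef, RID 0xc8"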
If any one of the TypeDef, TypeRef or TypeSpec tables had more than 16383 rows (2^14 - 1), then 4 bytes would need to be used to represent all TypeDefOrRef coded tokens throughout the metadata tables. Lists The third representation we need to consider is 1-to-many references; each TypeDef refers to a list of FieldDef and MethodDef belonging to that type. If we were to specify every FieldDef and MethodDef individually then each TypeDef would be very large and a variable size, which isn't ideal. There is a way of specifying a list of references without explicitly specifying every item; if we order the MethodDef and FieldDef tables by the owning type, then the field list and method list in a TypeDef only have to be a single RID pointing at the first FieldDef or MethodDef belonging to that type; the end of the list can be inferred by the field list and method list RIDs of the next row in the TypeDef table. Going back to the TypeDef If we have a look back at the definition of a TypeDef, we end up with the following reprensentation for each row: Flags - always 4 bytes Name - a #Strings heap offset. Namespace - a #Strings heap offset. Extends - a TypeDefOrRef coded token. FieldList - a single RID to the FieldDef table. MethodList - a single RID to the MethodDef table. So, depending on the number of entries in the heaps and tables within the assembly, the rows in the TypeDef table can be as small as 14 bytes, or as large as 24 bytes. Now we've had a look at how information is encoded within the metadata tables, in the next post we can see how they are arranged on disk.

    Read the article

  • PostSharp, Obfuscation, and IL

    - by Simon Cooper
    Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. 
However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. 
That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day!
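To make the indirection a little more concrete, here is a hedged sketch of the sort of lookup involved; the resource name, format and helper are invented for illustration and are not PostSharp's actual implementation:

using System.IO;
using System.Reflection;

static class NameTable
{
    static string[] names;

    // Looks up a method or type name by its fixed index in an embedded 'name table'
    // resource (one name per line), which an obfuscator can rewrite in place.
    public static string GetNameAtIndex(int index)
    {
        if (names == null)
        {
            using (Stream stream = Assembly.GetExecutingAssembly()
                                           .GetManifestResourceStream("PostSharp.NameTable"))
            using (StreamReader reader = new StreamReader(stream))
            {
                names = reader.ReadToEnd().Split('\n');
            }
        }
        return names[index].TrimEnd('\r');
    }
}

// usage: MethodBase m = typeof(ContainingClass).GetMethod(NameTable.GetNameAtIndex(22));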

    Read the article

  • Developing Schema Compare for Oracle (Part 5): Query Snapshots

    - by Simon Cooper
If you've emailed us about a bug you've encountered with the EAP or beta versions of Schema Compare for Oracle, we probably asked you to send us a query snapshot of your databases. Here, I explain what a query snapshot is, and how it helps us fix your bug. Problem 1: Debugging users' bug reports When we started the Schema Compare project, we knew we were going to get problems with users' databases - configurations we hadn't considered, features that weren't installed, unicode issues, weird dependencies... With SQL Compare, users are generally happy to send us a database backup that we can restore using a single RESTORE DATABASE command on our test servers and immediately reproduce the problem. Oracle, on the other hand, would be a lot more tricky. As Oracle generally has a 1-to-1 mapping between instances and databases, any databases users sent would have to be restored to their own instance. Furthermore, the number of steps required to get a properly working database, and the size of most Oracle databases, made it infeasible to ask every customer who came across a bug during our beta program to send us their databases. We also knew that there would be lots of issues with data security that would make it hard to get backups. So we needed an easier way to be able to debug customers' issues and sort out what strange schema data Oracle was returning. Problem 2: Test execution time Another issue we knew we would have to solve was the execution time of the tests we would produce for the Schema Compare engine. Our initial prototype showed that querying the data dictionary for schema information was going to be slow (at least 15 seconds per database), and this is generally proportional to the size of the database. If you're running thousands of tests on the same databases, each one registering separate schemas, not only would the tests take hours and hours to run, but the test servers would be hammered senseless. The solution To solve these, we needed to be able to populate the schema of a database without actually connecting to it. Well, the IDataReader interface is the primary way we read data from an Oracle server. The data dictionary queries we use return their data in terms of simple strings and numbers, which we then process and reconstruct into an object model, and the results of these queries are identical for identical schemas. So, we can record the raw results of the queries once, and then replay these results to construct the same object model as many times as required without needing to actually connect to the original database. This is what query snapshots do. They are binary files containing the raw unprocessed data we get back from the Oracle server for all the queries we run on the data dictionary to get schema information. The core of the query snapshot generation takes the results of the IDataReader we get from running queries on Oracle, and passes the row data to a BinaryWriter that writes it straight to a file. The query snapshot can then be replayed to create the same object model; when the results of a specific query are needed by the population code, we can simply read the binary data stored in the file on disk and present it through an IDataReader wrapper. This is far faster than querying the server over the network, and allows us to run tests in a reasonable time.
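Here is a minimal sketch of that recording step (my own illustration of the idea, not Schema Compare's actual snapshot format). Replaying is then just the reverse: an IDataReader implementation that reads the same values back from a BinaryReader.

using System;
using System.Data;
using System.IO;

static class QuerySnapshotWriter
{
    // Writes the column count, then each row's values, so the rows can later be
    // replayed through an IDataReader wrapper without a live connection.
    public static void RecordQuery(IDataReader reader, BinaryWriter writer)
    {
        int fieldCount = reader.FieldCount;
        writer.Write(fieldCount);
        while (reader.Read())
        {
            writer.Write(true);                    // marker: another row follows
            for (int i = 0; i < fieldCount; i++)
            {
                if (reader.IsDBNull(i))
                {
                    writer.Write((byte)0);         // null marker
                }
                else
                {
                    // the real format would need per-type handling; strings are
                    // enough to sketch the data dictionary queries
                    writer.Write((byte)1);
                    writer.Write(Convert.ToString(reader.GetValue(i)));
                }
            }
        }
        writer.Write(false);                       // end of rows
    }
}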
They also allow us to easily debug a customers problem; using a simple snapshot generation program, users can generate a query snapshot that could be sent along with a bug report that we can immediately replay on our machines to let us debug the issue, rather than having to obtain database backups and restore databases to test systems. There are also far fewer problems with data security; query snapshots only contain schema information, which is generally less sensitive than table data. Query snapshots implementation However, actually implementing such a feature did have a couple of 'gotchas' to it. My second blog post detailed the development of the dependencies algorithm we use to ensure we get all the dependencies in the database, and that algorithm uses data from both databases to find all the needed objects - what database you're comparing to affects what objects get populated from both databases. We get information on these additional objects using an appropriate WHERE clause on all the population queries. So, in order to accurately replay the results of querying the live database, the query snapshot needs to be a snapshot of a comparison of two databases, not just populating a single database. Furthermore, although the code population queries (eg querying all_tab_cols to get column information) can simply be passed straight from the IDataReader to the BinaryWriter, we need to hook into and run the live dependencies algorithm while we're creating the snapshot to ensure we get the same WHERE clauses, and the same query results, as if we were populating straight from a live system. We also need to store the results of the dependencies queries themselves, as the resulting dependency graph is stored within the OracleDatabase object that is produced, and is later used to help order actions in synchronization scripts. This is significantly helped by the dependencies algorithm being a deterministic algorithm - given the same input, it will always return the same output. Therefore, when we're replaying a query snapshot, and processing dependency information, we simply have to return the results of the queries in the order we got them from the live database, rather than trying to calculate the contents of all_dependencies on the fly. Query snapshots are a significant feature in Schema Compare that really helps us to debug problems with the tool, as well as making our testers happier. Although not really user-visible, they are very useful to the development team to help us fix bugs in the product much faster than we otherwise would be able to.

    Read the article

  • Anatomy of a .NET Assembly - Signature encodings

    - by Simon Cooper
    If you've just joined this series, I highly recommend you read the previous posts in this series, starting here, or at least these posts, covering the CLR metadata tables. Before we look at custom attribute encoding, we first need to have a brief look at how signatures are encoded in an assembly in general. Signature types There are several types of signatures in an assembly, all of which share a common base representation, and are all stored as binary blobs in the #Blob heap, referenced by an offset from various metadata tables. The types of signatures are: Method definition and method reference signatures. Field signatures Property signatures Method local variables. These are referenced from the StandAloneSig table, which is then referenced by method body headers. Generic type specifications. These represent a particular instantiation of a generic type. Generic method specifications. Similarly, these represent a particular instantiation of a generic method. All these signatures share the same underlying mechanism to represent a type Representing a type All metadata signatures are based around the ELEMENT_TYPE structure. This assigns a number to each 'built-in' type in the framework; for example, Uint16 is 0x07, String is 0x0e, and Object is 0x1c. Byte codes are also used to indicate SzArrays, multi-dimensional arrays, custom types, and generic type and method variables. However, these require some further information. Firstly, custom types (ie not one of the built-in types). These require you to specify the 4-byte TypeDefOrRef coded token after the CLASS (0x12) or VALUETYPE (0x11) element type. This 4-byte value is stored in a compressed format before being written out to disk (for more excruciating details, you can refer to the CLI specification). SzArrays simply have the array item type after the SZARRAY byte (0x1d). Multidimensional arrays follow the ARRAY element type with a series of compressed integers indicating the number of dimensions, and the size and lower bound of each dimension. Generic variables are simply followed by the index of the generic variable they refer to. There are other additions as well, for example, a specific byte value indicates a method parameter passed by reference (BYREF), and other values indicating custom modifiers. Some examples... To demonstrate, here's a few examples and what the resulting blobs in the #Blob heap will look like. Each name in capitals corresponds to a particular byte value in the ELEMENT_TYPE or CALLCONV structure, and coded tokens to custom types are represented by the type name in curly brackets. A simple field: int intField; FIELD I4 A field of an array of a generic type parameter (assuming T is the first generic parameter of the containing type): T[] genArrayField FIELD SZARRAY VAR 0 An instance method signature (note how the number of parameters does not include the return type): instance string MyMethod(MyType, int&, bool[][]); HASTHIS DEFAULT 3 STRING CLASS {MyType} BYREF I4 SZARRAY SZARRAY BOOLEAN A generic type instantiation: MyGenericType<MyType, MyStruct> GENERICINST CLASS {MyGenericType} 2 CLASS {MyType} VALUETYPE {MyStruct} For more complicated examples, in the following C# type declaration: GenericType<T> : GenericBaseType<object[], T, GenericType<T>> { ... 
} the Extends field of the TypeDef for GenericType will point to a TypeSpec with the following blob: GENERICINST CLASS {GenericBaseType} 3 SZARRAY OBJECT VAR 0 GENERICINST CLASS {GenericType} 1 VAR 0 And a static generic method signature (generic parameters on types are referenced using VAR, generic parameters on methods using MVAR): TResult[] GenericMethod<TInput, TResult>( TInput, System.Converter<TInput, TResult>); GENERIC 2 2 SZARRAY MVAR 1 MVAR 0 GENERICINST CLASS {System.Converter} 2 MVAR 0 MVAR 1 As you can see, complicated signatures are recursively built up out of quite simple building blocks to represent all the possible variations in a .NET assembly. Now that we've looked at the basics of normal method signatures, in my next post I'll look at custom attribute application signatures, and how they are different to normal signatures.
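For reference, here is a hedged sketch of the compressed unsigned integer format mentioned above, which ECMA-335 uses for lengths, counts and the coded tokens inside signature blobs (my reading of the spec; treat the details as an assumption):

using System;

static class BlobCompression
{
    // Compresses an unsigned integer as per the ECMA-335 signature encoding:
    // 1 byte for values up to 0x7F, 2 bytes (high bit pattern 10) up to 0x3FFF,
    // 4 bytes (high bit pattern 110) up to 0x1FFFFFFF.
    public static byte[] CompressUInt(uint value)
    {
        if (value <= 0x7F)
            return new[] { (byte)value };
        if (value <= 0x3FFF)
            return new[] { (byte)(0x80u | (value >> 8)), (byte)value };
        if (value <= 0x1FFFFFFF)
            return new[]
            {
                (byte)(0xC0u | (value >> 24)), (byte)(value >> 16),
                (byte)(value >> 8), (byte)value
            };
        throw new ArgumentOutOfRangeException("value", "Too large to compress");
    }
}

// CompressUInt(0x03)   => 03
// CompressUInt(0x3FFF) => BF FF
// CompressUInt(0x4000) => C0 00 40 00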

    Read the article

  • Subterranean IL: Exception handling 1

    - by Simon Cooper
Today, I'll be starting a look at the Structured Exception Handling mechanism within the CLR. Exception handling is quite a complicated business, and, as a result, the rules governing exception handling clauses in IL are quite strict; you need to be careful when writing exception clauses in IL. Exception handlers Exception handlers are specified using a .try clause within a method definition. .try <TryStartLabel> to <TryEndLabel> <HandlerType> handler <HandlerStartLabel> to <HandlerEndLabel> As an example, a basic try/catch block would be specified like so: TryBlockStart: // ... leave.s CatchBlockEnd TryBlockEnd: CatchBlockStart: // at the start of a catch block, the exception thrown is on the stack callvirt instance string [mscorlib]System.Object::ToString() call void [mscorlib]System.Console::WriteLine(string) leave.s CatchBlockEnd CatchBlockEnd: // method code continues... .try TryBlockStart to TryBlockEnd catch [mscorlib]System.Exception handler CatchBlockStart to CatchBlockEnd There are four different types of handler that can be specified: catch <TypeToken> This is the standard exception catch clause; you specify the object type that you want to catch (for example, [mscorlib]System.ArgumentException). Any object can be thrown as an exception, although Microsoft recommend that only classes derived from System.Exception are thrown as exceptions. filter <FilterLabel> A filter block allows you to provide custom logic to determine if a handler block should be run. This functionality is exposed in VB, but not in C#. finally A finally block executes when the try block exits, regardless of whether an exception was thrown or not. fault This is similar to a finally block, but a fault block executes only if an exception was thrown. This is not exposed in VB or C#. You can specify multiple catch or filter handling blocks in each .try, but fault and finally handlers must have their own .try clause. We'll look into why this is in later posts. Scoped exception handlers The .try syntax is quite tricky to use; it requires multiple labels, and you've got to be careful to keep separate the different exception handling sections. However, starting from .NET 2, IL allows you to use scope blocks to specify exception handlers instead. Using this syntax, the example above can be written like so: .try { // ... leave.s EndSEH } catch [mscorlib]System.Exception { callvirt instance string [mscorlib]System.Object::ToString() call void [mscorlib]System.Console::WriteLine(string) leave.s EndSEH } EndSEH: // method code continues... As you can see, this is much easier to write (and read!) than a stand-alone .try clause. Next time, I'll be looking at some of the restrictions imposed by SEH on control flow, and how the C# compiler generates exception handling clauses.
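As a C# reference point (a hedged sketch; the next post covers the compiler's output in detail), a try/catch/finally like the one below has to be emitted as two nested exception regions, because the finally handler needs its own .try clause:

using System;
using System.IO;

static class ScopedHandlerExample
{
    static void ReadConfig()
    {
        try
        {
            Console.WriteLine(File.ReadAllText("config.xml"));
        }
        catch (IOException e)
        {
            // handled by an inner .try ... catch region
            Console.WriteLine(e.Message);
        }
        finally
        {
            // emitted as an outer .try ... finally region wrapping the above
            Console.WriteLine("Done");
        }
    }
}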

    Read the article

  • Why you shouldn't add methods to interfaces in APIs

    - by Simon Cooper
It is an oft-repeated maxim that you shouldn't add methods to a publicly-released interface in an API. Recently, I was hit hard when this wasn't followed. As part of the work on ApplicationMetrics, I've been implementing auto-reporting of MVC action methods; whenever an action was called on a controller, ApplicationMetrics would automatically report it without the developer needing to add manual ReportEvent calls. Fortunately, MVC provides an easy hook when a controller is created, letting me log when it happens - the IControllerFactory interface. Now, the dll we provide to instrument an MVC webapp has to be compiled against .NET 3.5 and MVC 1, as the lowest common denominator. This MVC 1 dll will still work when used in an MVC 2, 3 or 4 webapp because all MVC 2+ webapps have a binding redirect redirecting all references to previous versions of System.Web.Mvc to the correct version, and type forwards taking care of any moved types in the new assemblies. Or at least, it should. IControllerFactory In MVC 1 and 2, IControllerFactory was defined as follows: public interface IControllerFactory { IController CreateController(RequestContext requestContext, string controllerName); void ReleaseController(IController controller); } So, to implement the logging controller factory, we simply wrap the existing controller factory: internal sealed class LoggingControllerFactory : IControllerFactory { private readonly IControllerFactory m_CurrentController; public LoggingControllerFactory(IControllerFactory currentController) { m_CurrentController = currentController; } public IController CreateController( RequestContext requestContext, string controllerName) { // log the controller being used FeatureSessionData.ReportEvent("Controller used:", controllerName); return m_CurrentController.CreateController(requestContext, controllerName); } public void ReleaseController(IController controller) { m_CurrentController.ReleaseController(controller); } } Easy. This works as expected in MVC 1 and 2. However, in MVC 3 this type was throwing a TypeLoadException, saying a method wasn't implemented. It turns out that, in MVC 3, the definition of IControllerFactory was changed to this: public interface IControllerFactory { IController CreateController(RequestContext requestContext, string controllerName); SessionStateBehavior GetControllerSessionBehavior( RequestContext requestContext, string controllerName); void ReleaseController(IController controller); } There's a new method in the interface. So when our MVC 1 dll was redirected to reference System.Web.Mvc v3, LoggingControllerFactory tried to implement version 3 of IControllerFactory, was missing the GetControllerSessionBehavior method, and so couldn't be loaded by the CLR. Implementing the new method Fortunately, there was a workaround. Because interface methods are normally implemented implicitly in the CLR, if we simply declare a virtual method matching the signature of the new method in MVC 3, then it will be ignored in MVC 1 and 2 and implement the extra method in MVC 3: internal sealed class LoggingControllerFactory : IControllerFactory { ... public virtual SessionStateBehavior GetControllerSessionBehavior( RequestContext requestContext, string controllerName) { return SessionStateBehavior.Default; } ... } However, this also has problems - the SessionStateBehavior type only exists in .NET 4, and we're limited to .NET 3.5 by support for MVC 1 and 2.
This means that the only solutions to support all MVC versions are: Construct the LoggingControllerFactory type at runtime using reflection Produce entirely separate dlls for MVC 1&2 and MVC 3. Ugh. And all because of that blasted extra method! Another solution? Fortunately, in this case, there is a third option - System.Web.Mvc also provides a DefaultControllerFactory type that can provide the implementation of GetControllerSessionBehavior for us in MVC 3, while still allowing us to override CreateController and ReleaseController. However, this does mean that LoggingControllerFactory won't be able to wrap any calls to GetControllerSessionBehavior. This is an acceptable limitation, given the other options, as very few developers will be overriding GetControllerSessionBehavior in their own custom controller factory. So, if you're providing an interface as part of an API, then please please please don't add methods to it. Especially if you don't provide a 'default' implementing type. Any code compiled against the previous version that can't be updated will have some very tough decisions to make to support both versions.
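For illustration, here is a hedged sketch of that third option; the member signatures are from memory of the MVC versions involved, and FeatureSessionData.ReportEvent is the reporting call from the earlier snippet, so treat the details as assumptions:

using System.Web.Mvc;
using System.Web.Routing;

// Derives from DefaultControllerFactory instead of implementing IControllerFactory
// directly, so GetControllerSessionBehavior comes for free on MVC 3.
internal sealed class LoggingControllerFactory : DefaultControllerFactory
{
    public override IController CreateController(
        RequestContext requestContext, string controllerName)
    {
        // log the controller being used, as in the earlier snippet
        FeatureSessionData.ReportEvent("Controller used:", controllerName);
        return base.CreateController(requestContext, controllerName);
    }

    public override void ReleaseController(IController controller)
    {
        base.ReleaseController(controller);
    }
}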

    Read the article
