Monthly Archives

Articles indexed in March 2012

Page 14/239 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

No Significant Fragmentation? Look Closer…

If you are relying on using 'best-practice' percentage-based thresholds when you are creating an index maintenance plan for a SQL Server that checks the fragmentation in your pages, you may miss occasional 'edge' conditions on larger tables that will cause severe degradation in performance. It is worth being aware of patterns of data access in particular tables when judging the best threshold figure to use.

Read the article
How to Document and Configure SQL Server Instance Settings

Occasionally, when you install identical databases on two different SQL Server instances, they will behave in surprisingly different ways. Why? Most likely, it is down to different configuration settings. There are around seventy of these settings and the DBA needs to be aware of the effect that many of them have. Brad McGehee explains them all in enough detail to help with most common configuration problems, and suggests some best practices.

Read the article
TortoiseSVN and Subversion Cookbook Part 3: In, Out, and Around

Subversion doesn't have to be difficult, especially if you have Michael Sorens's guide at hand. After dealing in previous articles with checkouts and commits in Subversion, and covering the various file-manipulation operations that are required for Subversion, Michael now deals in this article with file macro-management, the operations such as putting things in, and taking things out, that deal with repositories and projects.

Read the article
Unit Testing Myths and Practices

We all understand the value of Unit Testing, but how come so few organisations maintain unit tests for their in-house applications? We can no longer pretend that unit testing is a universal panacea for ensuring less-buggy applications. Instead, we should be prepared to actively justify the use of unit tests, and be more savvy about where in the development cycle the unit test resources should be most effectively used.

Read the article
Concurrent Affairs

- by Tony Davis

I once wrote an editorial, multi-core mania, on the conundrum of ever-increasing numbers of processor cores, but without the concurrent programming techniques to get anywhere near exploiting their performance potential. I came to the.controversial.conclusion that, while the problem loomed for all procedural languages, it was not a big issue for the vast majority of programmers. Two years later, I still think most programmers don't concern themselves overly with this issue, but I do think that's a bigger problem than I originally implied. Firstly, is the performance boost from writing code that can fully exploit all available cores worth the cost of the additional programming complexity? Right now, with quad-core processors that, at best, can make our programs four times faster, the answer is still no for many applications. But what happens in a few years, as the number of cores grows to 100 or even 1000? At this point, it becomes very hard to ignore the potential gains from exploiting concurrency. Possibly, I was optimistic to assume that, by the time we have 100-core processors, and most applications really needed to exploit them, some technology would be around to allow us to do so with relative ease. The ideal solution would be one that allows programmers to forget about the problem, in much the same way that garbage collection removed the need to worry too much about memory allocation. From all I can find on the topic, though, there is only a remote likelihood that we'll ever have a compiler that takes a program written in a single-threaded style and "auto-magically" converts it into an efficient, correct, multi-threaded program. At the same time, it seems clear that what is currently the most common solution, multi-threaded programming with shared memory, is unsustainable. As soon as a piece of state can be changed by a different thread of execution, the potential number of execution paths through your program grows exponentially with the number of threads. If you have two threads, each executing n instructions, then there are 2^n possible "interleavings" of those instructions. Of course, many of those interleavings will have identical behavior, but several won't. Not only does this make understanding how a program works an order of magnitude harder, but it will also result in irreproducible, non-deterministic, bugs. And of course, the problem will be many times worse when you have a hundred or a thousand threads. So what is the answer? All of the possible alternatives require a change in the way we write programs and, currently, seem to be plagued by performance issues. Software transactional memory (STM) applies the ideas of database transactions, and optimistic concurrency control, to memory. However, working out how to break down your program into sufficiently small transactions, so as to avoid contention issues, isn't easy. Another approach is concurrency with actors, where instead of having threads share memory, each thread runs in complete isolation, and communicates with others by passing messages. It simplifies concurrent programs but still has performance issues, if the threads need to operate on the same large piece of data. There are doubtless other possible solutions that I haven't mentioned, and I would love to know to what extent you, as a developer, are considering the problem of multi-core concurrency, what solution you currently favor, and why. Cheers, Tony.

Read the article
Upcoming Webinar: Practical Performance Profiling presented by Jean-Philippe Gouigoux

- by Michaela Murray

Hot on the heels of releasing his new book, Practical Performance Profiling, I'm delighted that Jean-Philippe Gouigoux will be joining us on April 3rd to present a free webinar on optimizing .NET code performance. He gave me a sneak preview of his talk last week and there's a lot of really useful advice in there. He'll be discussing why he thinks 20% of performance problems account for 80% of lost time, before looking at some real examples of both server-side and client-side profiling, and covering a variety of best practices you can use to improve the performance of your own code. The webinar will be followed by a Q&A session where he'll be joined by Red Gate technical support engineer Chris Allen to answer any of your questions. Jean-Philippe has 10 years' experience in .NET, most recently as system architect at MGDIS, and was recently made a Microsoft MVP for his contributions to the .NET community. I'm really excited that he's found a gap between his day job and university lecturing to share his knowledge, and I hope you'll be able to join us on April 3rd - it's free but you do need to register in advance at https://www3.gotomeeting.com/register/829014934. I'll see you there!

Read the article
A Slice of Raspberry Pi

- by Phil Factor

Guest editorial for the ITPro/SysAdmin newsletter The Raspberry Pi Foundation has done a superb design job on their new $35 network-enabled Linux computer. This tiny machine, incorporating an ARM processor on a Broadcom BCM2835 multimedia chip, aims to put the fun back into learning computing. The public response has been overwhelmingly positive.Note that aim: "…to put the fun back". Education in Information Technology is in dire straits. It always has been, but seems to have deteriorated further still, even in the face of improved provision of equipment.In many countries, the government controls the curriculum. It predicted a shortage in office-based IT skills, and so geared the ICT curriculum toward mind-numbing training in word-processing and spreadsheet skills. Instead, the shortage has turned out to be in people with an engineering-mindset, who can solve problems with whatever technologies are available and learn new techniques quickly, in a rapidly-changing field.In retrospect, the assumption that specific training was required rather than an education was an idiotic response to the arrival of mainstream information technology. As a result, ICT became a disaster area, which discouraged a generation of youngsters from a career in IT, and thereby led directly to the shortage of people with the skills that are required to exploit the potential of Information Technology..Raspberry Pi aims to reverse the trend. This is a rig that is geared to fast graphics in high resolution. It is no toy. It should be a superb games machine. However, the use of Fedora, Debian, or Arch Linux ARM shows the more serious educational intent behind the Foundation's work. It looks like it will even do some office work too!So, get hold of any power supply that provides a 5VDC source at the required 700mA; an old Blackberry charger will do or, alternatively, it will run off four AA cells. You'll need a USB hub to support the mouse and keyboard, and maybe a hard drive. You'll want a DVI monitor (with audio out) or TV (sound and video). You'll also need to be able to cope with wired Ethernet 10/100, if you want networking.With this lot assembled, stick the paraphernalia on the back of the HDTV with Blu Tack, get a nice keyboard, and you have a classy Linux-based home computer. The major cost is in the T.V and the keyboard. If you're not already writing software for this platform, then maybe, at a time when some countries are talking of orders in the millions, you should consider it.

Read the article
VB Myth - Case Insensitivity is Awesome!

- by Damon

I was reading Andy Brown's article 10 Reasons Why Visual Basic is Better than C# and the first claim is that VB is superior because of case insensitivity. I think the reasons he outlines are basically as follows: Your fingers get tired finding the shift key (e.g. typing PascalCase and camelCase members) You are much more likely to make mistakes while typing names When you accidentally leave caps lock on, it really matters These three arguments culminate in the conclusion: "It doesn't matter if you disagree with everything else in this article: case-sensitivity alone is sufficient reason to ditch C#!" Righto. I've been using Visual Basic since version 5.0, I wrote a book about ASP.NET in Visual Basic, so I want everyone to know I'm definitely not a VB.NET hater. I had to converted to C# because it was the language of preference at the places I've worked, so I'm used to both languages. I love me some case sensitivity. So first, let's debunk the claims. First, your fingers do not get tired of finding the shift key unless you are writing code in notepad and compiling everything on the command line. Visual Studio pretty much takes away the need to use the shift key at all. For the most part, any programmer worth a damn doesn't have to type more than about 3-5 characters of any variable or method name before IntelliSense kicks in to help. VB or C#, if you are not using the tab key for autocomplete then you are typing too much anyway, regardless of whether the shift key is involved or not. Also, you've got to be a pretty hard-core candy ass if you're complaining at the end of the day that your little fingers are hurting from hitting the shift key. Second, I cannot logically refute the fact that if there are more stringent rules about case sensitivity it will lead to more mistakes. As such, know that you will be more prone to mistakes in C#. However, lets talk about the magnitude of the problem. If you are using IntelliSense then you have auto-correction built in so you probably won't have much of a problem in the first place. If you manage to bypass IntelliSense and type something wrong you normally are immediately presented with a red-squiggly to let you know something is amiss. Normally, a person would look at the problem, figure out what the heck went wrong, and then avoid that problem again in the future. Granted, I have met people who seem to lack this capability, but their problem is deeper than a decision between VB.NET and C#. So let's make sure that we're all on the same page about this problem. If you have two teams of developers, one that uses VB.NET and one that uses C#, do not expect to see the VB.NET team drinking beer at the end of the project in festive revelry while the C# team is crying over what the hell to do because their code is riddled with case-sensitivity problems that nobody can resolve. Lastly, if you leave your caps lock key on, turn it off. Really, what kind of ass-hat would write an entire VB.NET application ENTIRELY IN CAPS? I happen to be a fan of case sensitivity because it encourages precision and uniformity. The last thing I need is a code base that looks like it was ransacked by LeEt HacKors wHo Can uSe wHateVer cASe tHey wanT. I mean really, if you saw someone write this: PuBLIc Sub MyMethod . End Sub And upon asking them why BL was upper case, they responded "Oh, I accidentally hit the shift key there. Fortunately for me VB.NET is a case insensitive language so I saved a couple of keystrokes by leaving it in there." Or if you saw: PUBLIC SUB ANOTHERMETHOD . END SUB And the response to why it was uppercased was "Yeah, I accidentally had caps locks on, fortunately for me VB.NET doesn't care. Really dodged a bullet there, glad I wasn't using C#." Would you not think that a bit ridiculous? If you want to convince C# developers that C# sucks, go for it. But the case insensitivity argument is crap.

Read the article
Why you shouldn't add methods to interfaces in APIs

- by Simon Cooper

It is an oft-repeated maxim that you shouldn't add methods to a publically-released interface in an API. Recently, I was hit hard when this wasn't followed. As part of the work on ApplicationMetrics, I've been implementing auto-reporting of MVC action methods; whenever an action was called on a controller, ApplicationMetrics would automatically report it without the developer needing to add manual ReportEvent calls. Fortunately, MVC provides easy hook when a controller is created, letting me log when it happens - the IControllerFactory interface. Now, the dll we provide to instrument an MVC webapp has to be compiled against .NET 3.5 and MVC 1, as the lowest common denominator. This MVC 1 dll will still work when used in an MVC 2, 3 or 4 webapp because all MVC 2+ webapps have a binding redirect redirecting all references to previous versions of System.Web.Mvc to the correct version, and type forwards taking care of any moved types in the new assemblies. Or at least, it should. IControllerFactory In MVC 1 and 2, IControllerFactory was defined as follows: public interface IControllerFactory { IController CreateController(RequestContext requestContext, string controllerName); void ReleaseController(IController controller); } So, to implement the logging controller factory, we simply wrap the existing controller factory: internal sealed class LoggingControllerFactory : IControllerFactory { private readonly IControllerFactory m_CurrentController; public LoggingControllerFactory(IControllerFactory currentController) { m_CurrentController = currentController; } public IController CreateController( RequestContext requestContext, string controllerName) { // log the controller being used FeatureSessionData.ReportEvent("Controller used:", controllerName); return m_CurrentController.CreateController(requestContext, controllerName); } public void ReleaseController(IController controller) { m_CurrentController.ReleaseController(controller); } } Easy. This works as expected in MVC 1 and 2. However, in MVC 3 this type was throwing a TypeLoadException, saying a method wasn't implemented. It turns out that, in MVC 3, the definition of IControllerFactory was changed to this: public interface IControllerFactory { IController CreateController(RequestContext requestContext, string controllerName); SessionStateBehavior GetControllerSessionBehavior( RequestContext requestContext, string controllerName); void ReleaseController(IController controller); } There's a new method in the interface. So when our MVC 1 dll was redirected to reference System.Web.Mvc v3, LoggingControllerFactory tried to implement version 3 of IControllerFactory, was missing the GetControllerSessionBehaviour method, and so couldn't be loaded by the CLR. Implementing the new method Fortunately, there was a workaround. Because interface methods are normally implemented implicitly in the CLR, if we simply declare a virtual method matching the signature of the new method in MVC 3, then it will be ignored in MVC 1 and 2 and implement the extra method in MVC 3: internal sealed class LoggingControllerFactory : IControllerFactory { ... public virtual SessionStateBehaviour GetControllerSessionBehaviour( RequestContext requestContext, string controllerName) {} ... } However, this also has problems - the SessionStateBehaviour type only exists in .NET 4, and we're limited to .NET 3.5 by support for MVC 1 and 2. This means that the only solutions to support all MVC versions are: Construct the LoggingControllerFactory type at runtime using reflection Produce entirely separate dlls for MVC 1&2 and MVC 3. Ugh. And all because of that blasted extra method! Another solution? Fortunately, in this case, there is a third option - System.Web.Mvc also provides a DefaultControllerFactory type that can provide the implementation of GetControllerSessionBehaviour for us in MVC 3, while still allowing us to override CreateController and ReleaseController. However, this does mean that LoggingControllerFactory won't be able to wrap any calls to GetControllerSessionBehaviour. This is an acceptable bug, given the other options, as very few developers will be overriding GetControllerSessionBehaviour in their own custom controller factory. So, if you're providing an interface as part of an API, then please please please don't add methods to it. Especially if you don't provide a 'default' implementing type. Any code compiled against the previous version that can't be updated will have some very tough decisions to make to support both versions.

Read the article
IE HTML Debugger Causing Issues with IE Enhanced Security

- by Damon

In an effort to debug a Silverlight component on a page in SharePoint I opened the Developer Tools in Internet Explorer. After choosing the Find > Select Element by Click option my page refreshed for some reason and a small bar appeared at the top of the page reading: You may be trying to access this site from a secured browser on the server. Please enable scripts and reload this page. After a quick look around the internet, some seemed to be suggesting that you have to disable the Internet Explorer Enhanced Security Configuration (IE ESC) in Server Manager. Since this is one of the very first things I do when creating a VM, I figured the solution did not apply to me. However, I decided to go ahead and enable IE ESC and then disable it again to see if that would fix the problem, and it did. So if you see that error message in IE, the bar and you've already got IE ESC disabled, you can just enable it and disable it to get rid of the bar.

Read the article
Showing "Failed" for a SharePoint 2010 Timer Job Status

- by Damon

I have been working with a bunch of custom timer jobs for last month. Basically, I'm processing a bunch of SharePoint items from the timer job and since I don't want the job failing because of an error on one item, so I'm handing errors on an item-by-item basis and just continuing on with the next item. The net result of this, I soon found, is that my timer job actually says it ran successfully even if every single item fails. So I figured I would just set the "Failed" status on the timer job is anything went wrong so an administrator could see that not all was well. However, I quickly found that there is no way to set a timer job status. If you want the status to show up as "Failed" then the only way to do it is to throw an exception. In my case, I just used a flag to store whether or not an error had occurred, and if so the the timer job throws a an exception just before existing to let the status display correctly.

Read the article
C# async and actors

- by Alex.Davies

If you read my last post about async, you might be wondering what drove me to write such odd code in the first place. The short answer is that .NET Demon is written using NAct Actors. Actors are an old idea, which I believe deserve a renaissance under C# 5. The idea is to isolate each stateful object so that only one thread has access to its state at any point in time. That much should be familiar, it's equivalent to traditional lock-based synchronization. The different part is that actors pass "messages" to each other rather than calling a method and waiting for it to return. By doing that, each thread can only ever be holding one lock. This completely eliminates deadlocks, my least favourite concurrency problem. Most people who use actors take this quite literally, and there are plenty of frameworks which help you to create message classes and loops which can receive the messages, inspect what type of message they are, and process them accordingly. But I write C# for a reason. Do I really have to choose between using actors and everything I love about object orientation in C#? Type safety Interfaces Inheritance Generics As it turns out, no. You don't need to choose between messages and method calls. A method call makes a perfectly good message, as long as you don't wait for it to return. This is where asynchonous methods come in. I have used NAct for a while to wrap my objects in a proxy layer. As long as I followed the rule that methods must always return void, NAct queued up the call for later, and immediately released my thread. When I needed to get information out of other actors, I could use EventHandlers and callbacks (continuation passing style, for any CS geeks reading), and NAct would call me back in my isolated thread without blocking the actor that raised the event. Using callbacks looks horrible though. To remind you: m_BuildControl.FilterEnabledForBuilding( projects, enabledProjects = m_OutOfDateProjectFinder.FilterNeedsBuilding( enabledProjects, newDirtyProjects = { ....... Which is why I'm really happy that NAct now supports async methods. Now, methods are allowed to return Task rather than just void. I can await those methods, and C# 5 will turn the rest of my method into a continuation for me. NAct will run the other method in the other actor's context, but will make sure that when my method resumes, we're back in my context. Neither actor was ever blocked waiting for the other one. Apart from when they were actually busy doing something, they were responsive to concurrent messages from other sources. To be fair, you could use async methods with lock statements to achieve exactly the same thing, but it's ugly. Here's a realistic example of an object that has a queue of data that gets passed to another object to be processed: class QueueProcessor { private readonly ItemProcessor m_ItemProcessor = ... private readonly object m_Sync = new object(); private Queue<object> m_DataQueue = ... private List<object> m_Results = ... public async Task ProcessOne() { object data = null; lock (m_Sync) { data = m_DataQueue.Dequeue(); } var processedData = await m_ItemProcessor.ProcessData(data); lock (m_Sync) { m_Results.Add(processedData); } } } We needed to write two lock blocks, one to get the data to process, one to store the result. The worrying part is how easily we could have forgotten one of the locks. Compare that to the version using NAct: class QueueProcessorActor : IActor { private readonly ItemProcessor m_ItemProcessor = ... private Queue<object> m_DataQueue = ... private List<object> m_Results = ... public async Task ProcessOne() { // We are an actor, it's always thread-safe to access our private fields var data = m_DataQueue.Dequeue(); var processedData = await m_ItemProcessor.ProcessData(data); m_Results.Add(processedData); } } You don't have to explicitly lock anywhere, NAct ensures that your code will only ever run on one thread, because it's an actor. Either way, async is definitely better than traditional synchronous code. Here's a diagram of what a typical synchronous implementation might do: The left side shows what is running on the thread that has the lock required to access the QueueProcessor's data. The red section is where that lock is held, but doesn't need to be. Contrast that with the async version we wrote above: Here, the lock is released in the middle. The QueueProcessor is free to do something else. Most importantly, even if the ItemProcessor sometimes calls the QueueProcessor, they can never deadlock waiting for each other. So I thoroughly recommend you use async for all code that has to wait a while for things. And if you find yourself writing lots of lock statements, think about using actors as well. Using actors and async together really takes the misery out of concurrent programming.

Read the article
C# 5: At last, async without the pain

- by Alex.Davies

For me, the best feature in Visual Studio 11 is the async and await keywords that come with C# 5. I am a big fan of asynchronous programming: it frees up resources, in particular the thread that a piece of code needs to run in. That lets that thread run something else, while waiting for your long-running operation to complete. That's really important if that thread is the UI thread, or if it's holding a lock because it accesses some data structure. Before C# 5, I think I was about the only person in the world who really cared about asynchronous programming. The trouble was that you had to go to extreme lengths to make code asynchronous. I would forever be writing methods that, instead of returning a value, accepted an extra argument that is a "continuation". Then, when calling the method, I'd have to pass a lambda in to it, which contained all the stuff that needed to happen after the method finished. Here is a real snippet of code that is in .NET Demon: m_BuildControl.FilterEnabledForBuilding( projects, enabledProjects = m_OutOfDateProjectFinder.FilterNeedsBuilding( enabledProjects, newDirtyProjects = { // Mark any currently broken projects as dirty newDirtyProjects.UnionWith(m_BrokenProjects); // Copy what we found into the set of dirty things m_DirtyProjects = newDirtyProjects; RunSomeBuilds(); })); It's just obtuse. Who puts a lambda inside a lambda like that? Well, me obviously. But surely enabledProjects should just be the return value of FilterEnabledForBuilding? And newDirtyProjects should just be the return value of FilterNeedsBuilding? C# 5 async/await lets you write asynchronous code without it looking so stupid. Here's what I plan to change that code to, once we upgrade to VS 11: var enabledProjects = await m_BuildControl.FilterEnabledForBuilding(projects); var newDirtyProjects = await m_OutOfDateProjectFinder.FilterNeedsBuilding(enabledProjects); // Mark any currently broken projects as dirty newDirtyProjects.UnionWith(m_BrokenProjects); // Copy what we found into the set of dirty things m_DirtyProjects = newDirtyProjects; RunSomeBuilds(); Much easier to read! But how is this the same code? If we were on the UI thread, doesn't the UI thread have to block while FilterEnabledForBuilding runs? No, it doesn't, and that's the magic of the await keyword! It cuts your method up into its constituent pieces, much like I did manually with lambdas before. When you run it, only the piece up to the first await actually runs. The rest is passed to FilterEnabledForBuilding as a continuation, which will get called back whenever that method is finished. In the meantime, our thread returns, and can go back to making the UI responsive, or whatever else threads do in their spare time. This is actually a massive simplification, and if you're interested in all the gory details, and speed hacks that the await keyword actually does for you, I recommend Jon Skeet's blog posts about it.

Read the article
A Community Cure for a String Splitting Headache

- by Tony Davis

A heartwarming tale of dogged perseverance and Community collaboration to solve some SQL Server string-related headaches. Michael J Swart posted a blog this week that had me smiling in recognition and agreement, describing how an inquisitive Developer or DBA deals with a problem. It's a three-step process, starting with discomfort and anxiety; a feeling that one doesn't know as much about one's chosen specialized subject as previously thought. It progresses through a phase of intense research and learning until finally one achieves breakthrough, blessed relief and renewed optimism. In this case, the discomfort was provoked by the mystery of massively high CPU when searching Unicode strings in SQL Server. Michael explored the problem via Stack Overflow, Google and Twitter #sqlhelp, finally leading to resolution and a blog post that shared what he learned. Perfect; except that sometimes you have to be prepared to share what you've learned so far, while still mired in the phase of nagging discomfort. A good recent example of this recently can be found on our own blogs. Despite being a loud advocate of the lightning fast T-SQL-based string splitting techniques, honed to near perfection over many years by Jeff Moden and others, Phil Factor retained a dogged conviction that, in theory, shredding element-based XML using XQuery ought to be even more efficient for splitting a string to create a table. After some careful testing, he found instead that the XML way performed and scaled miserably by comparison. Somewhat subdued, and with a nagging feeling that perhaps he was still missing "something", he posted his findings. What happened next was a joy to behold; the community jumped in to suggest subtle changes in approach, using an attribute-based rather than element-based XML list, and tweaking the XQuery shredding. The result was performance and scalability that surpassed all other techniques. I asked Phil how quickly he would have arrived at the real breakthrough on his own. His candid answer was "never". Both are great examples of the power of Community learning and the latter in particular the importance of being brave enough to parade one's ignorance. Perhaps Jeff Moden will accept the string-splitting gauntlet one more time. To quote the great man: you've just got to love this community! If you've an interesting tale to tell about being helped to a significant breakthrough for a problem by the community, I'd love to hear about it. Cheers, Tony.

Read the article
Oh no! My padding's invalid!

- by Simon Cooper

Recently, I've been doing some work involving cryptography, and encountered the standard .NET CryptographicException: 'Padding is invalid and cannot be removed.' Searching on StackOverflow produces 57 questions concerning this exception; it's a very common problem encountered. So I decided to have a closer look. To test this, I created a simple project that decrypts and encrypts a byte array: // create some random data byte[] data = new byte[100]; new Random().NextBytes(data); // use the Rijndael symmetric algorithm RijndaelManaged rij = new RijndaelManaged(); byte[] encrypted; // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); encrypted = encryptedStream.ToArray(); } byte[] decrypted; // and decrypt it again using (var decryptor = rij.CreateDecryptor()) using (CryptoStream crypto = new CryptoStream( new MemoryStream(encrypted), decryptor, CryptoStreamMode.Read)) { byte[] decrypted = new byte[data.Length]; crypto.Read(decrypted, 0, decrypted.Length); } Sure enough, I got exactly the same CryptographicException when trying to decrypt the data even in this simple example. Well, I'm obviously missing something, if I can't even get this single method right! What does the exception message actually mean? What am I missing? Well, after playing around a bit, I discovered the problem was fixed by changing the encryption step to this: // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) { using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); } encrypted = encryptedStream.ToArray(); } Aaaah, so that's what the problem was. The CryptoStream wasn't flushing all it's data to the MemoryStream before it was being read, and closing the stream causes it to flush everything to the backing stream. But why does this cause an error in padding? Cryptographic padding All symmetric encryption algorithms (of which Rijndael is one) operates on fixed block sizes. For Rijndael, the default block size is 16 bytes. This means the input needs to be a multiple of 16 bytes long. If it isn't, then the input is padded to 16 bytes using one of the padding modes. This is only done to the final block of data to be encrypted. CryptoStream has a special method to flush this final block of data - FlushFinalBlock. Calling Stream.Flush() does not flush the final block, as you might expect. Only by closing the stream or explicitly calling FlushFinalBlock is the final block, with any padding, encrypted and written to the backing stream. Without this call, the encrypted data is 16 bytes shorter than it should be. If this final block wasn't written, then the decryption gets to the final 16 bytes of the encrypted data and tries to decrypt it as the final block with padding. The end bytes don't match the padding scheme it's been told to use, therefore it throws an exception stating what is wrong - what the decryptor expects to be padding actually isn't, and so can't be removed from the stream. So, as well as closing the stream before reading the result, an alternative fix to my encryption code is the following: // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); // explicitly flush the final block of data crypto.FlushFinalBlock(); encrypted = encryptedStream.ToArray(); } Conclusion So, if your padding is invalid, make sure that you close or call FlushFinalBlock on any CryptoStream performing encryption before you access the encrypted data. Flush isn't enough. Only then will the final block be present in the encrypted data, allowing it to be decrypted successfully.

Read the article
Database Migration Scripts: Getting from place A to place B

- by Phil Factor

We’ll be looking at a typical database ‘migration’ script which uses an unusual technique to migrate existing ‘de-normalised’ data into a more correct form. So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into ‘de-normalisation’ habits. What’s this? A new column with a list of tags or ‘types’ assigned to books. Because books aren’t really in just one category, someone has ‘cured’ the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for history book really has to look in the entire list of associated tags rather than the ‘Type’ field that only keeps the primary tag. We have other problems. The TypleList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting there. The reporting system can’t be persuaded to do reports based on the tags and the Database developers are complaining about the unCoddly things going on in their database. In your version of PUBS, this extra column doesn’t exist, so we’ve added it and put in 10,000 titles using SQL Data Generator. /* So how do we refactor this database? firstly, we create a table of all the tags. */IF OBJECT_ID('TagName') IS NULL OR OBJECT_ID('TagTitle') IS NULL BEGIN CREATE TABLE TagName (TagName_ID INT IDENTITY(1,1) PRIMARY KEY , Tag VARCHAR(20) NOT NULL UNIQUE) /* ...and we insert into it all the tags from the list (remembering to take out any leading spaces */ INSERT INTO TagName (Tag) SELECT DISTINCT LTRIM(x.y.value('.', 'Varchar(80)')) AS [Tag] FROM (SELECT Title_ID, CONVERT(XML, '<list>' + REPLACE(TypeList, ',', '') + '</list>') AS XMLkeywords FROM dbo.titles)g CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y ) /* we can then use this table to provide a table that relates tags to articles */ CREATE TABLE TagTitle (TagTitle_ID INT IDENTITY(1, 1), [title_id] [dbo].[tid] NOT NULL REFERENCES titles (Title_ID), TagName_ID INT NOT NULL REFERENCES TagName (Tagname_ID) CONSTRAINT [PK_TagTitle] PRIMARY KEY CLUSTERED ([title_id] ASC, TagName_ID) ON [PRIMARY]) CREATE NONCLUSTERED INDEX idxTagName_ID ON TagTitle (TagName_ID) INCLUDE (TagTitle_ID,title_id) /* ...and it is easy to fill this with the tags for each title ... */ INSERT INTO TagTitle (Title_ID, TagName_ID) SELECT DISTINCT Title_ID, TagName_ID FROM (SELECT Title_ID, CONVERT(XML, '<list>' + REPLACE(TypeList, ',', '') + '</list>') AS XMLkeywords FROM dbo.titles)g CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y ) INNER JOIN TagName ON TagName.Tag=LTRIM(x.y.value('.', 'Varchar(80)')) END /* That's all there was to it. Now we can select all titles that have the military tag, just to try things out */SELECT Title FROM titles INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID WHERE tagname.tag='Military'/* and see the top ten most popular tags for titles */SELECT Tag, COUNT(*) FROM titles INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID GROUP BY Tag ORDER BY COUNT(*) DESC/* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF( (SELECT ','+tagname.tag FROM titles thisTitle INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID WHERE ThisTitle.title_id=titles.title_ID FOR XML PATH(''), TYPE).value('.', 'varchar(max)') ,1,1,'') FROM titles ORDER BY title_ID So we’ve refactored our PUBS database without pain. We’ve even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship We’ve done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I’d do a whole lot of tests first before doing that. Yes, I’ve left out the assertion tests too, which should check the edge cases and make sure the result is what you’d expect. One thing I can’t quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary ‘type’ to be first in the list, but what if the entire order is significant? Thank goodness it isn’t in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence. You’ll see immediately that we can create a synchronisation script for deployment from a comparison tool such as SQL Compare, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of over-riding this part of the data synchronisation process to do this part of the process taking the part of the script that fills the tables, Checking that the tables have not already been filled, and executing it as part of the transaction. Of course, you might prefer the approach I’ve taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run. The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to rollback the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time. I’ve shown you how to do this with the part of the script .. /* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF( (SELECT ','+tagname.tag FROM titles thisTitle INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID WHERE ThisTitle.title_id=titles.title_ID FOR XML PATH(''), TYPE).value('.', 'varchar(max)') ,1,1,'') FROM titles ORDER BY title_ID …which would be turned into an UPDATE … FROM script. UPDATE titles SET typelist= ThisTaglistFROM (SELECT title_ID, title, STUFF( (SELECT ','+tagname.tag FROM titles thisTitle INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID WHERE ThisTitle.title_id=titles.title_ID ORDER BY CASE WHEN tagname.tag=titles.[type] THEN 1 ELSE 0 END DESC FOR XML PATH(''), TYPE).value('.', 'varchar(max)') ,1,1,'') AS ThisTagList FROM titles)fINNER JOIN Titles ON f.title_ID=Titles.title_ID You’ll notice that it isn’t quite a round trip because the tags are in a different order, though we’ve managed to make sure that the primary tag is the first one as originally. So, we’ve improved the database for the poor book distributors using PUBS. It is not a major deal but you’ve got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.

Read the article
Tuning Red Gate: #4 of Some

- by Grant Fritchey

First time connecting to these servers directly (keys to the kingdom, bwa-ha-ha-ha. oh, excuse me), so I'm going to take a look at the server properties, just to see if there are any issues there. Max memory is set, cool, first possible silly mistake clear. In fact, these look to be nicely set up. Oh, I'd like to see the ANSI Standards set by default, but it's not a big deal. The default location for database data is the F:\ drive, where I saw all the activity last time. Cool, the people maintaining the servers in our company listen, parallelism threshold is set to 35 and optimize for ad hoc is enabled. No shocks, no surprises. The basic setup is appropriate. On to the problem database. Nothing wrong in the properties. The database is in SIMPLE recovery, but I think it's a reporting system, so no worries there. Again, I'd prefer to see the ANSI settings for connections, but that's the worst thing I can see. Time to look at the queries, tables, indexes and statistics because all the information I've collected over the last several days suggests that we're not looking at a systemic problem (except possibly not enough memory), but at the traditional tuning issues. I just want to note that, I started looking at the system, not the queries. So should you when tuning your environment. I know, from the data collected through SQL Monitor, what my top poor performing queries are, and the most frequently called, etc. I'm starting with the most frequently called. I'm going to get the execution plan for this thing out of the cache (although, with the cache dumping constantly, I might not get it). And it's not there. Called 1.3 million times over the last 3 days, but it's not in cache. Wow. OK. I'll see what's in cache for this database: SELECT deqs.creation_time, deqs.execution_count, deqs.max_logical_reads, deqs.max_elapsed_time, deqs.total_logical_reads, deqs.total_elapsed_time, deqp.query_plan, SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, (deqs.statement_end_offset - deqs.statement_start_offset) / 2 + 1) AS QueryStatement FROM sys.dm_exec_query_stats AS deqs CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp WHERE dest.dbid = DB_ID('Warehouse') AND deqs.statement_end_offset > 0 AND deqs.statement_start_offset > 0 ORDER BY deqs.max_logical_reads DESC ; And looking at the most expensive operation, we have our first bad boy: Multiple table scans against very large sets of data and a sort operation. a sort operation? It's an insert. Oh, I see, the table is a heap, so it's doing an insert, then sorting the data and then inserting into the primary key. First question, why isn't this a clustered index? Let's look at some more of the queries. The next one is deceiving. Here's the query plan: You're thinking to yourself, what's the big deal? Well, what if I told you that this thing had 8036318 reads? I know, you're looking at skinny little pipes. Know why? Table variable. Estimated number of rows = 1. Actual number of rows. well, I'm betting several more than one considering it's read 8 MILLION pages off the disk in a single execution. We have a serious and real tuning candidate. Oh, and I missed this, it's loading the table variable from a user defined function. Let me check, let me check. YES! A multi-statement table valued user defined function. And another tuning opportunity. This one's a beauty, seriously. Did I also mention that they're doing a hash against all the columns in the physical table. I'm sure that won't lead to scans of a 500,000 row table, no, not at all. OK. I lied. Of course it is. At least it's on the top part of the Loop which means the scan is only executed once. I just did a cursory check on the next several poor performers. all calling the UDF. I think I found a big tuning opportunity. At this point, I'm typing up internal emails for the company. Someone just had their baby called ugly. In addition to a series of suggested changes that we need to implement, I'm also apologizing for being such an unkind monster as to question whether that third eye & those flippers belong on such an otherwise lovely child.

Read the article
Inside the Concurrent Collections: ConcurrentDictionary

- by Simon Cooper

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

Read the article
Tuning Red Gate: #3 of Lots

- by Grant Fritchey

I'm drilling down into the metrics about SQL Server itself available to me in the Analysis tab of SQL Monitor to see what's up with our two problematic servers. In the previous post I'd noticed that rg-sql01 had quite a few CPU spikes. So one of the first things I want to check there is how much CPU is getting used by SQL Server itself. It's possible we're looking at some other process using up all the CPU Nope, It's SQL Server. I compared this to the rg-sql02 server: You can see that there is a more, consistently low set of CPU counters there. I clearly need to look at rg-sql01 and capture more specific data around the queries running on it to identify which ones are causing these CPU spikes. I always like to look at the Batch Requests/sec on a server, not because it's an indication of a problem, but because it gives you some idea of the load. Just how much is this server getting hit? Here are rg-sql01 and rg-sql02: Of the two, clearly rg-sql01 has a lot of activity. Remember though, that's all this is a measure of, activity. It doesn't suggest anything other than what it says, the number of requests coming in. But it's the kind of thing you want to know in order to understand how the system is used. Are you seeing a correlation between the number of requests and the CPU usage, or a reverse correlation, the number of requests drops as the CPU spikes? See, it's useful. Some of the details you can look at are Compilations/sec, Compilations/Batch and Recompilations/sec. These give you some idea of how the cache is getting used within the system. None of these showed anything interesting on either server. One metric that I like (even though I know it can be controversial) is the Page Life Expectancy. On the average server I expect see a series of mountains as the PLE climbs then drops due to a data load or something along those lines. That's not the case here: Those spikes back in January suggest that the servers weren't really being used much. The PLE on the rg-sql01 seems to be somewhat consistent growing to 3 hours or so then dropping, but the rg-sql02 PLE looks like it might be all over the map. Instead of continuing to look at this high level gathering data view, I'm going to drill down on rg-sql02 and see what it's done for the last week: And now we begin to see where we might have an issue. Memory on this system is getting flushed every 1/2 hour or so. I'm going to check another metric, scans: Whoa! I'm going back to the system real quick to look at some disk information again for rg-sql02. Here is the average disk queue length on the server: and the transfers Right, I think I have a guess as to what's up here. We're seeing memory get flushed constantly and we're seeing lots of scans. The disks are queuing, especially that F drive, and there are lots of requests that correspond to the scans and the memory flushes. In short, we've got queries that are scanning the data, a lot, so we either have bad queries or bad indexes. I'm going back to the server overview for rg-sql02 and check the Top 10 expensive queries. I'm modifying it to show me the last 3 days and the totals, so I'm not looking at some maintenance routine that ran 10 minutes ago and is skewing the results: OK. I need to look into these queries that are getting executed this much. They're generating a lot of reads, but which queries are generating the most reads: Ow, all still going against the same database. This is where I'm going to temporarily leave SQL Monitor. What I want to do is connect up to the server, validate that the Warehouse database is using the F:\ drive (which I'll put money down it is) and then start seeing what's up with these queries. Part 1 of the Series Part 2 of the Series

Read the article
SharePoint Scenario Framework

- by Damon

I've worked with SharePoint for some time now, and I like to think that I know all there is to know about it. Deep down I know that's not true, but it's a fun delusion. However, I found out yesterday that there is a mechanism in SharePoint called the Scenario Framework that has been around for a while but that I had no idea ever existed. It is used to maintain state between multi-page web forms and helps manage the navigation between those pages. If building out multiple-page forms in SharePoint is in your plans, you can find more information about the Scenario Framework on Waldek Mastykarz blog entry on the subject.

Read the article
Red Gate's on the road in 2012 - Will you catch us?

- by RedAndTheCommunity

Annabel Bradford, our Communities and Events Manager, tells all about her experience of our 1st SQL Saturday of the year. The first stop this year was SQL Saturday #104 Colorado Springs, back in early January. I made the trip across from the UK just for this SQL Saturday event, and I'm so glad I did. I picked up Max from Red Gate's Pasadena office and we flew into Colorado Springs airport late on Friday evening to be greeted by freezing temperatures, which was quite a shock after the California sunshine. Rising before the sun, we arrived at Mr Biggs, the venue for the event, in the darkness. It was great to see so many smiling attendees so bright and early on a Saturday morning. Everyone was eager to learn more about SQL Server, and hundreds of people came and chatted with us at the table, saw demos and learnt more about Red Gate tools. The event highlights for the attendees were definitely the unlimited lazer quest, bowling and pool available during the break times. For Max, Grant Fritchey and I on the Red Gate table, the highlights have to be meeting customers and getting the opportunity to meet attendees who'd heard of, but wanted to know more about, Red Gate. We were delighted to hear lots of valuable feedback that we took back to share with the team. As a thank you for sharing insights about their work lives and how they use SQL Server and Red Gate tools, attendees are able to take away Red Gate SQL Server books. We aim to have a range of titles available when we exhibit, so that attendees can choose a book that's going to be most interesting to them, and that they can use as a reference back at the office. Every time I meet a Red Gate user or a member of the SQL community, I'm always overwhelmed by the enthusiasm they have for their industry. Everyone who gives up their time to learn more about their job should be rewarded, and at Red Gate we like to do just that. Red Gate has long supported the SQL community through sponsorship to facilitate user group meetings and community events, but it's only though face-to-face contact that we really get a chance to see the impact of our support. I hope we'll have the chance to see you on the road at some point this year. We'll be at a range of events, including free SQL Saturdays, one day free events 'the Red Gate way', two-day Rallys, and full-week conferences. Next stop is SQL Saturday #109 Silicon Valley on March 3rd where you'll meet Jeff and Arneh, two of our US-based SQL team members. Be sure to ask them any questions you've got about the Red Gate tools, as these guys will be delighted to hear your questions, show you the options, and will make a note of your feedback to send through to the development team. Until the next time. Happy learning! Annabel Grant, Max and Annabel at SQL Saturday #104 Colorado Springs

Read the article
Normalisation and 'Anima notitia copia' (Soul of the Database)

- by Phil Factor

(A Guest Editorial for Simple-Talk) The other day, I was staring at the sys.syslanguages table in SQL Server with slightly-raised eyebrows . I’d just been reading Chris Date’s interesting book ‘SQL and Relational Theory’. He’d made the point that you’re not necessarily doing relational database operations by using a SQL Database product. The same general point was recently made by Dino Esposito about ASP.NET MVC. The use of ASP.NET MVC doesn’t guarantee you a good application design: It merely makes it possible to test it. The way I’d describe the sentiment in both cases is ‘you can hit someone over the head with a frying-pan but you can’t call it cooking’. SQL enables you to create relational databases. However, even if it smells bad, it is no crime to do hideously un-relational things with a SQL Database just so long as it’s necessary and you can tell the difference; not only that but also only if you’re aware of the risks and implications. Naturally, I’ve never knowingly created a database that Codd would have frowned at, but around the edges are interfaces and data feeds I’ve written that have caused hissy fits amongst the Normalisation fundamentalists. Part of the problem for those who agonise about such things is the misinterpretation of Atomicity. An atomic value is one for which, in the strange virtual universe you are creating in your database, you don’t have any interest in any of its component parts. If you aren’t interested in the electrons, neutrinos, muons, or taus, then an atom is ..er.. atomic. In the same way, if you are passed a JSON string or XML, and required to store it in a database, then all you need to do is to ask yourself, in your role as Anima notitia copia (Soul of the database) ‘have I any interest in the contents of this item of information?’. If the answer is ‘No!’, or ‘nequequam! Then it is an atomic value, however complex it may be. After all, you would never have the urge to store the pixels of images individually, under the misguided idea that these are the atomic values would you? I would, of course, ask the ‘Anima notitia copia’ rather than the application developers, since there may be more than one application, and the applications developers may be designing the application in the absence of full domain knowledge, (‘or by the seat of the pants’ as the technical term used to be). If, on the other hand, the answer is ‘sure, and we want to index the XML column’, then we may be in for some heavy XML-shredding sessions to get to store the ‘atomic’ values and ensure future harmony as the application develops. I went back to looking at the sys.syslanguages table. It has a months column with the months in a delimited list January,February,March,April,May,June,July,August,September,October,November,December This is an ordered list. Wicked? I seem to remember that this value, like shortmonths and days, is treated as a ‘thing’. It is merely passed off to an external C++ routine in order to format a date in a particular language, and never accessed directly within the database. As far as the database is concerned, it is an atomic value. There is more to normalisation than meets the eye.

Read the article
ANTS Performance Profiler 7.0 has been released!

- by Michaela Murray

Please join me in welcoming ANTS Performance Profiler 7 to the world of .NET. ANTS Performance Profiler is a .NET code profiling tool. It lets you identify performance bottlenecks within minutes and therefore enables you to optimize your application performance. Version 7.0 includes integrated decompilation: when profiling methods and assemblies with no source code file, you can generate source code right from the profiler interface. You can then browse and navigate this automatically generated source as if it was your own. If you have an assembly's PDB file but no source, integrated decompilation even lets you view line-level timings for each method, pinpointing the exact cause of performance bottlenecks. Integrated decompilation is powered by .NET Reflector, but you don't need Reflector installed to use the functionality. Watch this video to see it in action. Also new in ANTS Performance Profiler 7.0: · Full support for SharePoint 2010 - No need to manually configure profiling for the latest version of SharePoint · Full support for IIS Express · Azure and Amazon EC2 support, enabling you to profile in the cloud Please click here, for more details about the ANTS Performance Profiler 7.0.

Read the article
Curing the Database-Application mismatch

- by Phil Factor

If an application requires access to a database, then you have to be able to deploy it so as to be version-compatible with the database, in phase. If you can deploy both together, then the application and database must normally be deployed at the same version in which they, together, passed integration and functional testing. When a single database supports more than one application, then the problem gets more interesting. I’ll need to be more precise here. It is actually the application-interface definition of the database that needs to be in a compatible ‘version’. Most databases that get into production have no separate application-interface; in other words they are ‘close-coupled’. For this vast majority, the whole database is the application-interface, and applications are free to wander through the bowels of the database scot-free. If you’ve spurned the perceived wisdom of application architects to have a defined application-interface within the database that is based on views and stored procedures, any version-mismatch will be as sensitive as a kitten. A team that creates an application that makes direct access to base tables in a database will have to put a lot of energy into keeping Database and Application in sync, to say nothing of having to tackle issues such as security and audit. It is not the obvious route to development nirvana. I’ve been in countless tense meetings with application developers who initially bridle instinctively at the apparent restrictions of being ‘banned’ from the base tables or routines of a database. There is no good technical reason for needing that sort of access that I’ve ever come across. Everything that the application wants can be delivered via a set of views and procedures, and with far less pain for all concerned: This is the application-interface. If more than zero developers are creating a database-driven application, then the project will benefit from the loose-coupling that an application interface brings. What is important here is that the database development role is separated from the application development role, even if it is the same developer performing both roles. The idea of an application-interface with a database is as old as I can remember. The big corporate or government databases generally supported several applications, and there was little option. When a new application wanted access to an existing corporate database, the developers, and myself as technical architect, would have to meet with hatchet-faced DBAs and production staff to work out an interface. Sure, they would talk up the effort involved for budgetary reasons, but it was routine work, because it decoupled the database from its supporting applications. We’d be given our own stored procedures. One of them, I still remember, had ninety-two parameters. All database access was encapsulated in one application-module. If you have a stable defined application-interface with the database (Yes, one for each application usually) you need to keep the external definitions of the components of this interface in version control, linked with the application source, and carefully track and negotiate any changes between database developers and application developers. Essentially, the application development team owns the interface definition, and the onus is on the Database developers to implement it and maintain it, in conformance. Internally, the database can then make all sorts of changes and refactoring, as long as source control is maintained. If the application interface passes all the comprehensive integration and functional tests for the particular version they were designed for, nothing is broken. Your performance-testing can ‘hang’ on the same interface, since databases are judged on the performance of the application, not an ‘internal’ database process. The database developers have responsibility for maintaining the application-interface, but not its definition, as they refactor the database. This is easily tested on a daily basis since the tests are normally automated. In this setting, the deployment can proceed if the more stable application-interface, rather than the continuously-changing database, passes all tests for the version of the application. Normally, if all goes well, a database with a well-designed application interface can evolve gracefully without changing the external appearance of the interface, and this is confirmed by integration tests that check the interface, and which hopefully don’t need to be altered at all often. If the application is rapidly changing its ‘domain model’ in the light of an increased understanding of the application domain, then it can change the interface definitions and the database developers need only implement the interface rather than refactor the underlying database. The test team will also have to redo the functional and integration tests which are, of course ‘written to’ the definition. The Database developers will find it easier if these tests are done before their re-wiring job to implement the new interface. If, at the other extreme, an application receives no further development work but survives unchanged, the database can continue to change and develop to keep pace with the requirements of the other applications it supports, and needs only to take care that the application interface is never broken. Testing is easy since your automated scripts to test the interface do not need to change. The database developers will, of course, maintain their own source control for the database, and will be likely to maintain versions for all major releases. However, this will not need to be shared with the applications that the database servers. On the other hand, the definition of the application interfaces should be within the application source. Changes in it have to be subject to change-control procedures, as they will require a chain of tests. Once you allow, instead of an application-interface, an intimate relationship between application and database, we are in the realms of impedance mismatch, over and above the obvious security problems. Part of this impedance problem is a difference in development practices. Whereas the application has to be regularly built and integrated, this isn’t necessarily the case with the database. An RDBMS is inherently multi-user and self-integrating. If the developers work together on the database, then a subsequent integration of the database on a staging server doesn’t often bring nasty surprises. A separate database-integration process is only needed if the database is deliberately built in a way that mimics the application development process, but which hampers the normal database-development techniques. This process is like demanding a official walking with a red flag in front of a motor car. In order to closely coordinate databases with applications, entire databases have to be ‘versioned’, so that an application version can be matched with a database version to produce a working build without errors. There is no natural process to ‘version’ databases. Each development project will have to define a system for maintaining the version level. A curious paradox occurs in development when there is no formal application-interface. When the strains and cracks happen, the extra meetings, bureaucracy, and activity required to maintain accurate deployments looks to IT management like work. They see activity, and it looks good. Work means progress. Management then smile on the design choices made. In IT, good design work doesn’t necessarily look good, and vice versa.

Read the article
Tuning Red Gate: #2 of Many

- by Grant Fritchey

In the last installment, I used the SQL Monitor tool to get a snapshot view of the current state of the servers at Red Gate that are giving us trouble. That snapshot suggested some areas where I should focus some time, primarily in which queries were being called most frequently or were running the longest. But, you don't want to just run off & start tuning queries. Remember, the foundation for query tuning is the server itself. So, I want to be sure I'm not looking at some major hardware or configuration issues that I need to address first. Rather than look at the current status of the server, I'm going to look at historical data. Clicking on the Analysis tab of SQL Monitor I get a whole list of counters that I can look at. More importantly, I can look at them over a period of time. Even more importantly, I can compare past periods with current periods to see if we're looking at a progressive issue or not. There are counters here that will give me an indication of load, and there are counters here that will tell me specifics about that load. First, I want to just look at the load to understand where the pain points might be. Trying to drill down before you have detailed information is just bad planning. First thing I'm going to check is the CPU, just to see what's up there. I have two servers I'm interested in, so I'll show you both: Looking at the last 30 days for both servers, well, let's just say that the first server is about what I would expect. It has an average baseline behavior with occasional, regular, peaks. This looks like a system with a fairly steady & predictable load that probably has a nightly batch process that spikes the processor. In short, normal stuff. The points there where the CPU drops radically. that might be worth investigating further because something changed the processing on this system a lot. But the first server. It's all over the place. There's no steady CPU behavior at all. It's spike high for long periods of time. It's up, it's down. I'm really going to have to spend time looking at CPU issues on this server to try to figure out what's up. It might be other processes being shared on the server, it might be something else. Either way, I'm going to have to spend time evaluating this CPU, especially those peeks about a week ago. Looking at the Pages/sec, again, just a measure of load, I see that there are some peaks on the rg-sql02 server, but over all, it looks like a fairly standard load. Plus, the peaks are only up to 550 pages/sec. Remember, this isn't a performance measure, but just a load measurement, but from this, I don't think we're looking at major memory issues, but I may want to correlate these counters with the CPU counters. Again, the other server looks like there's stuff going on. The load is not at all consistent. In fact there was a point earlier in the year that looks pretty severe. Plus the spikes here are twice the size of the other system. We've got a lot more load going on here and I will probably need to drill down on memory usage on this server. Taking a look at the disk transfers/sec the load on both systems seems to roughly correspond to the other load indicators. Notice that drop right in the middle of the graph for rg-sql02. I wonder if the office was closed over that period or a system was down for maintenance. If I saw spikes in memory or disk that corresponded to the drip in CPU, you can assume something was using those other resources and causing a drop, but when everything goes down, it just means that the system isn't gettting used. The disk on the rg-sql01 system isn't spiking exactly the same way as the memory & cpu, so there's a good chance (chance mind you) that any performance issues might not be disk related. However, notice that huge jump at the beginning of the month. Several disks were used more than they were for the rest of the month. That's the load on the server. What about the load on SQL Server itself? Next time.

Read the article

Monthly Archives

Articles indexed in March 2012

Page 14/239 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

- by Tony Davis

- by Michaela Murray

- by Phil Factor

- by Damon

- by Simon Cooper

- by Damon

- by Damon

- by Alex.Davies

- by Alex.Davies

- by Tony Davis

- by Simon Cooper

- by Phil Factor

- by Grant Fritchey

- by Simon Cooper

- by Grant Fritchey

- by Damon

- by RedAndTheCommunity

- by Phil Factor

- by Michaela Murray

- by Phil Factor

- by Grant Fritchey

< Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >