Search Results

Search found 20859 results on 835 pages for 'little bobby tables'.

Page 87/835 | < Previous Page | 83 84 85 86 87 88 89 90 91 92 93 94 | Next Page >

Save all file names in a directory to a vector

- by Bobby

I need to save all ".xml" file names in a directory to a vector. To make a long story short, I cannot use the dirent API. It seems as if C++ does not have any concept of "directories". Once I have the filenames in a vector, I can iterate through and "fopen" these files. Is there an easy way to get these filenames at runtime?

Read the article
When should I implement IDisposeable?

- by Bobby

What is the best practice for when to implement IDisposeable? Is the best rule of thumb to implement it if you have one managed object in the class, or does it depend if the object was created in the class or just passed in? Should I also do it for classes with no managed objects at all?

Read the article
Stop Entity Framework from updating edmx model with a column that isn't needed

- by TMan

I have rowguids in all my tables to help with change tracking in all my tables. I don't want/need these tables in my edmx or my entities. However, I do still need to make changes to other things sometimes so everytime i go to update model from database in the edmx it adds all the rowguids in all my tables everytime and i have to manually delete each one. Is there a way to handle this from happening? Is there a way I can maybe edit the T4 to maybe ignore that 'rowguid' column? Database first Entity framework

Read the article
SVN files edited thru FTP, How do I fix the edited files so they work thru SVN again?

- by Bobby

Basically, a 3rd party analyst we brought on to make some improvements to our site decided to edit files directly on the live server threw FTP. Any time we make changes to those pages threw SVN we have PHP parse errors. Things should be edited threw the SVN and committed. We have our working copies setup so we can edit them how ever we want while having them running under apache for testing. We commit all of our changes to a local repository, then commit from there to the live server. The code on the live server that causes the parse error has " .mine" and " .rxxxx", and my question is, how can I revert those files back to before they were edited thru FTP so I can update them threw SVN again?

Read the article
[picker dismissModalViewControllerAnimated:YES]; not working

- by shishir.bobby

I am using - (void)imagePickerController:(UIImagePickerController *)picker didFinishPickingMediaWithInfo:(NSDictionary *)info { } for imagepicker and using [picker dismissModalViewControllerAnimated:YES]; in iPad to dismiss the picker. also i tried using - (void)imagePickerControllerDidCancel:(UIImagePickerController *)picker;{ [picker dismissModalViewControllerAnimated:YES]; } to dismiss the picker once it done the work. BUt its not dimissing it. when i tap anywhere in the screen, than only it gets dismiss. What i am doing wrong to dismiss the imagepicker? Many thnkas

Read the article
Where Should Using Statements Be Located [closed]

- by Bobby Ortiz - DotNetBob

Possible Duplicate: What is the difference between these two declarations? I recently started working on a project with using statement located inside the NameSpace block. namespace MyApp.Web { using System; using System.Web.Security; using System.Web; public class MyClass { I usually put my using statements above the namespace block. using System; using System.Web.Security; using System.Web; namespace MyApp.Web { public class MyClass { I don't think it matters, but I am currious if someone else had a recommendation and could they explain why one way is better than another. Note: I always have one class per file.

Read the article
SQL SELECT from third level table

- by Spidermain50

Ok I know little about SQL so bare with me... I'm trying to see if certain values exist in a third level table and I don't know how to go about it. Here is the scenario... I have a Accident table that holds accident information. It has 3 one-to-many child tables (Units, Occupants, NonMotorists). An each of those child tables have their own many-to-many child table (Alcohol). I need to be able to have some way of seeing if a range of values exists in a field in those Alcohol tables. Here is watered down version of what my structure for the tables looks like... --tblAccident--_ PK_AccidentNumber --tblAccidentUnit-- PK_PrimaryKey FK_AccidentNumber --tblAccidentOccupant-- PK_PrimaryKey FK_AccidentNumber --tblAccidentNonMotorist-- PK_PrimaryKey FK_AccidentNumber --tblAccidentUnitAlcohol-- PK_PrimaryKey FK_ForeignKey AlcoholValue <---- THIS IS WHAT I NEED TO SEARCH --tblAccidentOccupantAlcohol-- PK_PrimaryKey FK_ForeignKey AlcoholValue <---- THIS IS WHAT I NEED TO SEARCH --tblAccidentNonMotoristAlcohol-- PK_PrimaryKey FK_ForeignKey AlcoholValue <---- THIS IS WHAT I NEED TO SEARCH I hope this makes some sense as to what i am trying to accomplish. thank you

Read the article
Very basic database theory.

- by John R

I have a set of tables to show the relationship between organziations and supporters below. Although I have done some basic mySQL querries, I know very little about database 'design'. I plan to querry the database for: -a list of contributors to a specific organization... or, -a list of organizations that a specific suporter supports. The database tables for organiations and contributors may have other columns in the future and recieve a lesser amount of querries based on that information. A | X A | Y A | Z B | X B | Y C | X C | Z How should the tables be set up? I assume that there should be a third table, but there is still redundent information in the third table. Is there a better way of setting up the tables? +----+-------+ +-------------+----------+ +----+-------+ | id | org | | org | contr | | id | contr.| +----+-------+ +-------------+----------+ +----+-------+ | 1 | A | | 1 | 1 | | 1 | X | | 2 | B | | 1 | 2 | | 2 | Y | | 3 | C | | 1 | 3 | | 3 | Z | +----+-------+ | 2 | 1 | +----+-------+ | 2 | 2 | | 3 | 1 | | 3 | 3 | +-------------+----------+

Read the article
C++ hash table w/o using STL

- by Bobby

I need to create a hash table that has a key as a string, and value as an int. I cannot use STL containers on my target. Is there a suitable hash table class for this purpose?

Read the article
jQuery: How to get the first “td” in all rows

- by bobby

I have a table, and I want get the first “td” in all rows. My jquery here: $("table.SimpleTable tr td:first-child").css('background-color','red'); and my HTML here: <table class='SimpleTable' border="1" ID="Table1"> <tr> <td>Left</td> <td>Right</td> </tr> <tr> <td>Left</td> <td>Right</td> </tr> <tr> <td>Left</td> <td>Right</td> </tr> <tr> <td>Left</td> <td> <table border="1" ID="Table2"> <tr> <td>AAA</td> <td>AAA</td> <td>AAA</td> </tr> </table> </td> </tr> <tr> <td>Left</td> <td> <table border="1" ID="Table3"> <tr> <td>BBB</td> <td>BBB</td> <td>BBB</td> </tr> </table> </td> </tr> </table> The problem here it get the first "td" in the nested table of the second "td". Please help me!

Read the article
Is this possible in sql server 2005?

- by chandru_cp

This is my queries select ClientName,ClientMobNo from Clients select DriverName,DriverMobNo from Drivers It gives me two result tables... But i want to combine both the result tables into a single table... I tried union and union all it doesn't give me what i want.... Note: There is no relationship between the two tables...... There may be 200 clients and 100 drivers...

Read the article
Compare two table and find matching columns

- by Karthick

Hi, I have two tables table1 and table2, i need to write a select query which will list me the columns that exist in both the tables.(mysql) I need to do for different tables (2 at a time) Is this possible? I tried using INFORMATION_SCHEMA.COLUMNS but am not able to get it right.

Read the article
Database Design Question

- by Soo

Ok SO, I have a user table and want to define groups of users together. The best solution I have for this is to create three database tables as follows: UserTable user_id user_name UserGroupLink group_id member_id GroupInfo group_id group_name This method keeps the member and group information separate. This is just my way of thinking. Is there a better way to do this? Also, what is a good naming convention for tables that link two other tables?

Read the article
Getting Error 91

- by user1695788

I have a general comprehension issue with classes and objects. What I'm trying to do is pretty simple but I'm getting errors. In the code example below, sometimes the line "Call tables.MethodInCTables" runs fine and sometimes it produces error 91, object not set. IN all cases, I can "see" the method in the type ahead so I know that the code recognizes the "tables" instance and "sees" MethodInCTables. But then I get the run-time error. Sub MainSub() Dim tables as New CTables Call tables.MethodInCTables End Sub ----Class Module = CTables Sub MethodInCTables() ...do something End Sub

Read the article
T-SQL Tuesday #33: Trick Shots: Undocumented, Underdocumented, and Unknown Conspiracies!

- by Most Valuable Yak (Rob Volk)

Mike Fal (b | t) is hosting this month's T-SQL Tuesday on Trick Shots. I love this choice because I've been preoccupied with sneaky/tricky/evil SQL Server stuff for a long time and have been presenting on it for the past year. Mike's directives were "Show us a cool trick or process you developed…It doesn’t have to be useful", which most of my blogging definitely fits, and "Tell us what you learned from this trick…tell us how it gave you insight in to how SQL Server works", which is definitely a new concept. I've done a lot of reading and watching on SQL Server Internals and even attended training, but sometimes I need to go explore on my own, using my own tools and techniques. It's an itch I get every few months, and, well, it sure beats workin'. I've found some people to be intimidated by SQL Server's internals, and I'll admit there are A LOT of internals to keep track of, but there are tons of excellent resources that clearly document most of them, and show how knowing even the basics of internals can dramatically improve your database's performance. It may seem like rocket science, or even brain surgery, but you don't have to be a genius to understand it. Although being an "evil genius" can help you learn some things they haven't told you about. ;) This blog post isn't a traditional "deep dive" into internals, it's more of an approach to find out how a program works. It utilizes an extremely handy tool from an even more extremely handy suite of tools, Sysinternals. I'm not the only one who finds Sysinternals useful for SQL Server: Argenis Fernandez (b | t), Microsoft employee and former T-SQL Tuesday host, has an excellent presentation on how to troubleshoot SQL Server using Sysinternals, and I highly recommend it. Argenis didn't cover the Strings.exe utility, but I'll be using it to "hack" the SQL Server executable (DLL and EXE) files. Please note that I'm not promoting software piracy or applying these techniques to attack SQL Server via internal knowledge. This is strictly educational and doesn't reveal any proprietary Microsoft information. And since Argenis works for Microsoft and demonstrated Sysinternals with SQL Server, I'll just let him take the blame for it. :P (The truth is I've used Strings.exe on SQL Server before I ever met Argenis.) Once you download and install Strings.exe you can run it from the command line. For our purposes we'll want to run this in the Binn folder of your SQL Server instance (I'm referencing SQL Server 2012 RTM): cd "C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn" C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.dll > sqldll.txt C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.exe > sqlexe.txt I've limited myself to DLLs and EXEs that have "sql" in their names. There are quite a few more but I haven't examined them in any detail. (Homework assignment for you!) If you run this yourself you'll get 2 text files, one with all the extracted strings from every SQL DLL file, and the other with the SQL EXE strings. You can open these in Notepad, but you're better off using Notepad++, EditPad, Emacs, Vim or another more powerful text editor, as these will be several megabytes in size. And when you do open it…you'll find…a TON of gibberish. (If you think that's bad, just try opening the raw DLL or EXE file in Notepad. And by the way, don't do this in production, or even on a running instance of SQL Server.) Even if you don't clean up the file, you can still use your editor's search function to find a keyword like "SELECT" or some other item you expect to be there. As dumb as this sounds, I sometimes spend my lunch break just scanning the raw text for anything interesting. I'm boring like that. Sometimes though, having these files available can lead to some incredible learning experiences. For me the most recent time was after reading Joe Sack's post on non-parallel plan reasons. He mentions a new SQL Server 2012 execution plan element called NonParallelPlanReason, and demonstrates a query that generates "MaxDOPSetToOne". Joe (formerly on the Microsoft SQL Server product team, so he knows this stuff) mentioned that this new element was not currently documented and tried a few more examples to see what other reasons could be generated. Since I'd already run Strings.exe on the SQL Server DLLs and EXE files, it was easy to run grep/find/findstr for MaxDOPSetToOne on those extracts. Once I found which files it belonged to (sqlmin.dll) I opened the text to see if the other reasons were listed. As you can see in my comment on Joe's blog, there were about 20 additional non-parallel reasons. And while it's not "documentation" of this underdocumented feature, the names are pretty self-explanatory about what can prevent parallel processing. I especially like the ones about cursors – more ammo! - and am curious about the PDW compilation and Cloud DB replication reasons. One reason completely stumped me: NoParallelHekatonPlan. What the heck is a hekaton? Google and Wikipedia were vague, and the top results were not in English. I found one reference to Greek, stating "hekaton" can be translated as "hundredfold"; with a little more Wikipedia-ing this leads to hecto, the prefix for "one hundred" as a unit of measure. I'm not sure why Microsoft chose hekaton for such a plan name, but having already learned some Greek I figured I might as well dig some more in the DLL text for hekaton. Here's what I found: hekaton_slow_param_passing Occurs when a Hekaton procedure call dispatch goes to slow parameter passing code path The reason why Hekaton parameter passing code took the slow code path hekaton_slow_param_pass_reason sp_deploy_hekaton_database sp_undeploy_hekaton_database sp_drop_hekaton_database sp_checkpoint_hekaton_database sp_restore_hekaton_database e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp Interesting! The first 4 entries (in red) mention parameters and "slow code". Could this be the foundation of the mythical DBCC RUNFASTER command? Have I been passing my parameters the slow way all this time? And what about those sp_xxxx_hekaton_database procedures (in blue)? Could THEY be the secret to a faster SQL Server? Could they promise a "hundredfold" improvement in performance? Are these special, super-undocumented DIB (databases in black)? I decided to look in the SQL Server system views for any objects with hekaton in the name, or references to them, in hopes of discovering some new code that would answer all my questions: SELECT name FROM sys.all_objects WHERE name LIKE '%hekaton%' SELECT name FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%hekaton%' Which revealed: name ------------------------ (0 row(s) affected) name ------------------------ sp_createstats sp_recompile sp_updatestats (3 row(s) affected) Hmm. Well that didn't find much. Looks like these procedures are seriously undocumented, unknown, perhaps forbidden knowledge. Maybe a part of some unspeakable evil? (No, I'm not paranoid, I just like mysteries and thought that punching this up with that kind of thing might keep you reading. I know I'd fall asleep without it.) OK, so let's check out those 3 procedures and see what they reveal when I search for "Hekaton": sp_createstats: -- filter out local temp tables, Hekaton tables, and tables for which current user has no permissions -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables OK, that's interesting, let's go looking down a little further: ((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table Wellllll, that tells us a few new things: There's such a thing as Hekaton tables (UPDATE: I'm not the only one to have found them!) They are not standard user tables and probably not in memory UPDATE: I misinterpreted this because I didn't read all the code when I wrote this blog post. The OBJECTPROPERTY function has an undocumented TableIsInMemory option Let's check out sp_recompile: -- (3) Must not be a Hekaton procedure. And once again go a little further: if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND ObjectProperty(@objid, 'IsInlineFunction') = 0 AND ObjectProperty(@objid, 'IsView') = 0 AND -- Hekaton procedure cannot be recompiled -- Make them go through schema version bumping branch, which will fail ObjectProperty(@objid, 'ExecIsCompiledProc') = 0) And now we learn that hekaton procedures also exist, they can't be recompiled, there's a "schema version bumping branch" somewhere, and OBJECTPROPERTY has another undocumented option, ExecIsCompiledProc. (If you experiment with this you'll find this option returns null, I think it only works when called from a system object.) This is neat! Sadly sp_updatestats doesn't reveal anything new, the comments about hekaton are the same as sp_createstats. But we've ALSO discovered undocumented features for the OBJECTPROPERTY function, which we can now search for: SELECT name, object_definition(OBJECT_ID) FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%OBJECTPROPERTY(%' I'll leave that to you as more homework. I should add that searching the system procedures was recommended long ago by the late, great Ken Henderson, in his Guru's Guide books, as a great way to find undocumented features. That seems to be really good advice! Now if you're a programmer/hacker, you've probably been drooling over the last 5 entries for hekaton (in green), because these are the names of source code files for SQL Server! Does this mean we can access the source code for SQL Server? As The Oracle suggested to Neo, can we return to The Source??? Actually, no. Well, maybe a little bit. While you won't get the actual source code from the compiled DLL and EXE files, you'll get references to source files, debugging symbols, variables and module names, error messages, and even the startup flags for SQL Server. And if you search for "DBCC" or "CHECKDB" you'll find a really nice section listing all the DBCC commands, including the undocumented ones. Granted those are pretty easy to find online, but you may be surprised what those web sites DIDN'T tell you! (And neither will I, go look for yourself!) And as we saw earlier, you'll also find execution plan elements, query processing rules, and who knows what else. It's also instructive to see how Microsoft organizes their source directories, how various components (storage engine, query processor, Full Text, AlwaysOn/HADR) are split into smaller modules. There are over 2000 source file references, go do some exploring! So what did we learn? We can pull strings out of executable files, search them for known items, browse them for unknown items, and use the results to examine internal code to learn even more things about SQL Server. We've even learned how to use command-line utilities! We are now 1337 h4X0rz! (Not really. I hate that leetspeak crap.) Although, I must confess I might've gone too far with the "conspiracy" part of this post. I apologize for that, it's just my overactive imagination. There's really no hidden agenda or conspiracy regarding SQL Server internals. It's not The Matrix. It's not like you'd find anything like that in there: Attach Matrix Database DM_MATRIX_COMM_PIPELINES MATRIXXACTPARTICIPANTS dm_matrix_agents Alright, enough of this paranoid ranting! Microsoft are not really evil! It's not like they're The Borg from Star Trek: ALTER FEDERATION DROP ALTER FEDERATION SPLIT DROP FEDERATION #tsql2sday

Read the article
Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article
MySQL Memory usage

- by Rob Stevenson-Leggett

Our MySQL server seems to be using a lot of memory. I've tried looking for slow queries and queries with no index and have halved the peak CPU usage and Apache memory usage but the MySQL memory stays constantly at 2.2GB (~51% of available memory on the server). Here's the graph from Plesk. Running top in the SSH window shows the same figures. Does anyone have any ideas on why the memory usage is constant like this and not peaks and troughs with usage of the app? Here's the output of the MySQL Tuning Primer script: -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.0.77-log x86_64 Uptime = 1 days 14 hrs 4 min 21 sec Avg. qps = 22 Total Questions = 3059456 Threads Connected = 13 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1 sec. You have 6 out of 3059477 that take longer than 1 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is NOT enabled. You will not be able to do point in time recovery See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html WORKER THREADS Current thread_cache_size = 0 Current threads_cached = 0 Current threads_per_sec = 2 Historic threads_per_sec = 0 Threads created per/sec are overrunning threads cached You should raise thread_cache_size MAX CONNECTIONS Current max_connections = 100 Current threads_connected = 14 Historic max_used_connections = 20 The number of used connections is 20% of the configured maximum. Your max_connections variable seems to be fine. INNODB STATUS Current InnoDB index space = 6 M Current InnoDB data space = 18 M Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 8 M Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 2.07 G Configured Max Per-thread Buffers : 274 M Configured Max Global Buffers : 2.01 G Configured Max Memory Limit : 2.28 G Physical Memory : 3.84 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 4 M Current key_buffer_size = 7 M Key cache miss rate is 1 : 40 Key buffer free ratio = 81 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is supported but not enabled Perhaps you should set the query_cache_size SORT OPERATIONS Current sort_buffer_size = 2 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 132.00 K You have had 16 queries where a join could not use an index properly You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. If you are unable to optimize your queries you may want to increase your join_buffer_size to accommodate larger joins in one pass. Note! This script will still suggest raising the join_buffer_size when ANY joins not using indexes are found. OPEN FILES LIMIT Current open_files_limit = 1024 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_cache value = 64 tables You have a total of 426 tables You have 64 open tables. Current table_cache hit rate is 1% , while 100% of your table cache is in use You should probably increase your table_cache TEMP TABLES Current max_heap_table_size = 16 M Current tmp_table_size = 32 M Of 15134 temp tables, 9% were created on disk Effective in-memory tmp_table_size is limited to max_heap_table_size. Created disk tmp tables ratio seems fine TABLE SCANS Current read_buffer_size = 128 K Current table scan ratio = 2915 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 142213 Your table locking seems to be fine The app is a facebook game with about 50-100 concurrent users. Thanks, Rob

Read the article
Basics of Join Predicate Pushdown in Oracle

- by Maria Colgan

Happy New Year to all of our readers! We hope you all had a great holiday season. We start the new year by continuing our series on Optimizer transformations. This time it is the turn of Predicate Pushdown. I would like to thank Rafi Ahmed for the content of this blog.Normally, a view cannot be joined with an index-based nested loop (i.e., index access) join, since a view, in contrast with a base table, does not have an index defined on it. A view can only be joined with other tables using three methods: hash, nested loop, and sort-merge joins. Introduction The join predicate pushdown (JPPD) transformation allows a view to be joined with index-based nested-loop join method, which may provide a more optimal alternative. In the join predicate pushdown transformation, the view remains a separate query block, but it contains the join predicate, which is pushed down from its containing query block into the view. The view thus becomes correlated and must be evaluated for each row of the outer query block. These pushed-down join predicates, once inside the view, open up new index access paths on the base tables inside the view; this allows the view to be joined with index-based nested-loop join method, thereby enabling the optimizer to select an efficient execution plan. The join predicate pushdown transformation is not always optimal. The join predicate pushed-down view becomes correlated and it must be evaluated for each outer row; if there is a large number of outer rows, the cost of evaluating the view multiple times may make the nested-loop join suboptimal, and therefore joining the view with hash or sort-merge join method may be more efficient. The decision whether to push down join predicates into a view is determined by evaluating the costs of the outer query with and without the join predicate pushdown transformation under Oracle's cost-based query transformation framework. The join predicate pushdown transformation applies to both non-mergeable views and mergeable views and to pre-defined and inline views as well as to views generated internally by the optimizer during various transformations. The following shows the types of views on which join predicate pushdown is currently supported. UNION ALL/UNION view Outer-joined view Anti-joined view Semi-joined view DISTINCT view GROUP-BY view Examples Consider query A, which has an outer-joined view V. The view cannot be merged, as it contains two tables, and the join between these two tables must be performed before the join between the view and the outer table T4. A: SELECT T4.unique1, V.unique3 FROM T_4K T4, (SELECT T10.unique3, T10.hundred, T10.ten FROM T_5K T5, T_10K T10 WHERE T5.unique3 = T10.unique3) VWHERE T4.unique3 = V.hundred(+) AND T4.ten = V.ten(+) AND T4.thousand = 5; The following shows the non-default plan for query A generated by disabling join predicate pushdown. When query A undergoes join predicate pushdown, it yields query B. Note that query B is expressed in a non-standard SQL and shows an internal representation of the query. B: SELECT T4.unique1, V.unique3 FROM T_4K T4, (SELECT T10.unique3, T10.hundred, T10.ten FROM T_5K T5, T_10K T10 WHERE T5.unique3 = T10.unique3 AND T4.unique3 = V.hundred(+) AND T4.ten = V.ten(+)) V WHERE T4.thousand = 5; The execution plan for query B is shown below. In the execution plan BX, note the keyword 'VIEW PUSHED PREDICATE' indicates that the view has undergone the join predicate pushdown transformation. The join predicates (shown here in red) have been moved into the view V; these join predicates open up index access paths thereby enabling index-based nested-loop join of the view. With join predicate pushdown, the cost of query A has come down from 62 to 32. As mentioned earlier, the join predicate pushdown transformation is cost-based, and a join predicate pushed-down plan is selected only when it reduces the overall cost. Consider another example of a query C, which contains a view with the UNION ALL set operator.C: SELECT R.unique1, V.unique3 FROM T_5K R, (SELECT T1.unique3, T2.unique1+T1.unique1 FROM T_5K T1, T_10K T2 WHERE T1.unique1 = T2.unique1 UNION ALL SELECT T1.unique3, T2.unique2 FROM G_4K T1, T_10K T2 WHERE T1.unique1 = T2.unique1) V WHERE R.unique3 = V.unique3 and R.thousand < 1; The execution plan of query C is shown below. In the above, 'VIEW UNION ALL PUSHED PREDICATE' indicates that the UNION ALL view has undergone the join predicate pushdown transformation. As can be seen, here the join predicate has been replicated and pushed inside every branch of the UNION ALL view. The join predicates (shown here in red) open up index access paths thereby enabling index-based nested loop join of the view. Consider query D as an example of join predicate pushdown into a distinct view. We have the following cardinalities of the tables involved in query D: Sales (1,016,271), Customers (50,000), and Costs (787,766). D: SELECT C.cust_last_name, C.cust_city FROM customers C, (SELECT DISTINCT S.cust_id FROM sales S, costs CT WHERE S.prod_id = CT.prod_id and CT.unit_price > 70) V WHERE C.cust_state_province = 'CA' and C.cust_id = V.cust_id; The execution plan of query D is shown below. As shown in XD, when query D undergoes join predicate pushdown transformation, the expensive DISTINCT operator is removed and the join is converted into a semi-join; this is possible, since all the SELECT list items of the view participate in an equi-join with the outer tables. Under similar conditions, when a group-by view undergoes join predicate pushdown transformation, the expensive group-by operator can also be removed. With the join predicate pushdown transformation, the elapsed time of query D came down from 63 seconds to 5 seconds. Since distinct and group-by views are mergeable views, the cost-based transformation framework also compares the cost of merging the view with that of join predicate pushdown in selecting the most optimal execution plan. Summary We have tried to illustrate the basic ideas behind join predicate pushdown on different types of views by showing example queries that are quite simple. Oracle can handle far more complex queries and other types of views not shown here in the examples. Again many thanks to Rafi Ahmed for the content of this blog post.

Read the article
T-SQL Tuesday #005: Creating SSMS Custom Reports

- by Mike C

This is my contribution to the T-SQL Tuesday blog party, started by Adam Machanic and hosted this month by Aaron Nelson . Aaron announced this month's topic is "reporting" so I figured I'd throw a blog up on a reporting topic I've been interested in for a while -- namely creating custom reports in SSMS. Creating SSMS custom reports isn't difficult, but like most technical work it's very detailed with a lot of little steps involved. So this post is a little longer than usual and includes a lot of...(read more)

Read the article
T-SQL Tuesday #005: Creating SSMS Custom Reports

- by Mike C

This is my contribution to the T-SQL Tuesday blog party, started by Adam Machanic and hosted this month by Aaron Nelson . Aaron announced this month's topic is "reporting" so I figured I'd throw a blog up on a reporting topic I've been interested in for a while -- namely creating custom reports in SSMS. Creating SSMS custom reports isn't difficult, but like most technical work it's very detailed with a lot of little steps involved. So this post is a little longer than usual and includes a lot of...(read more)

Read the article
SQL SERVER – Stored Procedure and Transactions

- by pinaldave

I just overheard the following statement – “I do not use Transactions in SQL as I use Stored Procedure“. I just realized that there are so many misconceptions about this subject. Transactions has nothing to do with Stored Procedures. Let me demonstrate that with a simple example. USE tempdb GO -- Create 3 Test Tables CREATE TABLE TABLE1 (ID INT); CREATE TABLE TABLE2 (ID INT); CREATE TABLE TABLE3 (ID INT); GO -- Create SP CREATE PROCEDURE TestSP AS INSERT INTO TABLE1 (ID) VALUES (1) INSERT INTO TABLE2 (ID) VALUES ('a') INSERT INTO TABLE3 (ID) VALUES (3) GO -- Execute SP -- SP will error out EXEC TestSP GO -- Check the Values in Table SELECT * FROM TABLE1; SELECT * FROM TABLE2; SELECT * FROM TABLE3; GO Now, the main point is: If Stored Procedure is transactional then, it should roll back complete transactions when it encounters any errors. Well, that does not happen in this case, which proves that Stored Procedure does not only provide just the transactional feature to a batch of T-SQL. Let’s see the result very quickly. It is very clear that there were entries in table1 which are not shown in the subsequent tables. If SP was transactional in terms of T-SQL Query Batches, there would be no entries in any of the tables. If you want to use Transactions with Stored Procedure, wrap the code around with BEGIN TRAN and COMMIT TRAN. The example is as following. CREATE PROCEDURE TestSPTran AS BEGIN TRAN INSERT INTO TABLE1 (ID) VALUES (11) INSERT INTO TABLE2 (ID) VALUES ('b') INSERT INTO TABLE3 (ID) VALUES (33) COMMIT GO -- Execute SP EXEC TestSPTran GO -- Check the Values in Tables SELECT * FROM TABLE1; SELECT * FROM TABLE2; SELECT * FROM TABLE3; GO -- Clean up DROP TABLE Table1 DROP TABLE Table2 DROP TABLE Table3 GO In this case, there will be no entries in any part of the table. What is your opinion about this blog post? Please leave your comments about it here. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Stored Procedure, SQL Tips and Tricks, T SQL, Technology

Read the article
ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
The penultimate audit trigger framework

- by Piotr Rodak

So, it’s time to see what I came up with after some time of playing with COLUMNS_UPDATED() and bitmasks. The first part of this miniseries describes the mechanics of the encoding which columns are updated within DML operation. The task I was faced with was to prepare an audit framework that will be fairly easy to use. The audited tables were to be the ones directly modified by user applications, not the ones heavily used by batch or ETL processes. The framework consists of several tables and procedures...(read more)

Read the article
Advanced TSQL Tuning: Why Internals Knowledge Matters

- by Paul White

There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes. Query tuning is not complete as soon as the query returns results quickly in the development or test environments. In production, your query will compete for memory, CPU, locks, I/O and other resources on the server. Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data. In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row. The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type. A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL, CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL, CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table. The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results. This seems slow, considering that the table only has 50,000 rows. Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time. Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring. The script below monitors STATISTICS IO output and the amount of tempdb used by the test query. We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB. The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row. With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first). This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts. In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted. SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed. Sorts typically require a very large number of comparisons, so this is usually a very effective optimization. One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself. SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use. The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory. Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant. The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources. More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1). Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB. The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file. Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case. In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms. If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much. Why is the data size so much smaller? The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored. By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output. You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead. SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows. The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes. The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page. There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option. By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row. SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage. You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now. This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query). SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant. No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers. Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’. Why? Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed. SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages. This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this? Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need. 245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory. Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily? That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column. That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal. I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator. Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted. We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need. The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb. The small remaining inefficiency is in reading the id column values from the clustered primary key index. As a clustered index, it contains all the in-row data at its leaf. The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead. The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure. This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only. This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data. The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings. It isn’t about the internal differences between TEXT and MAX data types either. It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load. No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance. Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows. 5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is! If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb. Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough. Tuning for logical reads and adding covering indexes is not enough. If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on. The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008. They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator. Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

Read the article
Linux Live USB Media

<b>Jamie's Random Musings:</b> "It is pretty common these days for laptops, and even desktops, to be able to boot from a USB flash memory drive. So you can save a little time and a little money by converting various Linux distributions ISO images to bootable USB devices, rather than burning them to CD/DVD."

Read the article

< Previous Page | 83 84 85 86 87 88 89 90 91 92 93 94 | Next Page >