Search Results

Search found 65851 results on 2635 pages for 'simple php blog'.

Page 1057/2635 | < Previous Page | 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064  | Next Page >

  • MySQL MyISAM table performance... painfully, painfully slow

    - by Salman A
    I've got a table structure that can be summarized as follows: pagegroup * pagegroupid * name has 3600 rows page * pageid * pagegroupid * data references pagegroup; has 10000 rows; can have anything between 1-700 rows per pagegroup; the data column is of type mediumtext and the column contains 100k - 200kbytes data per row userdata * userdataid * pageid * column1 * column2 * column9 references page; has about 300,000 rows; can have about 1-50 rows per page The above structure is pretty straight forwad, the problem is that that a join from userdata to page group is terribly, terribly slow even though I have indexed all columns that should be indexed. The time needed to run a query for such a join (userdata inner_join page inner_join pagegroup) exceeds 3 minutes. This is terribly slow considering the fact that I am not selecting the data column at all. Example of the query that takes too long: SELECT userdata.column1, pagegroup.name FROM userdata INNER JOIN page USING( pageid ) INNER JOIN pagegroup USING( pagegroupid ) Please help by explaining why does it take so long and what can i do to make it faster. Edit #1 Explain returns following gibberish: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE userdata ALL pageid 372420 1 SIMPLE page eq_ref PRIMARY,pagegroupid PRIMARY 4 topsecret.userdata.pageid 1 1 SIMPLE pagegroup eq_ref PRIMARY PRIMARY 4 topsecret.page.pagegroupid 1 Edit #2 SELECT u.field2, p.pageid FROM userdata u INNER JOIN page p ON u.pageid = p.pageid; /* 0.07 sec execution, 6.05 sec fecth */ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE u ALL pageid 372420 1 SIMPLE p eq_ref PRIMARY PRIMARY 4 topsecret.u.pageid 1 Using index SELECT p.pageid, g.pagegroupid FROM page p INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid; /* 9.37 sec execution, 60.0 sec fetch */ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE g index PRIMARY PRIMARY 4 3646 Using index 1 SIMPLE p ref pagegroupid pagegroupid 5 topsecret.g.pagegroupid 3 Using where Moral of the story Keep medium/long text columns in a separate table if you run into performance problems such as this one.

    Read the article

  • BNF – how to read syntax?

    - by Piotr Rodak
    A few days ago I read post of Jen McCown (blog) about her idea of blogging about random articles from Books Online. I think this is a great idea, even if Jen says that it’s not exciting or sexy. I noticed that many of the questions that appear on forums and other media arise from pure fact that people asking questions didn’t bother to read and understand the manual – Books Online. Jen came up with a brilliant, concise acronym that describes very well the category of posts about Books Online – RTFM365. I take liberty of tagging this post with the same acronym. I often come across questions of type – ‘Hey, i am trying to create a table, but I am getting an error’. The error often says that the syntax is invalid. 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT DEFAULT Guid_Default NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); 5 The answer is usually(1), ‘Ok, let me check it out.. Ah yes – you have to put name of the DEFAULT constraint before the type of constraint: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); Why many people stumble on syntax errors? Is the syntax poorly documented? No, the issue is, that correct syntax of the CREATE TABLE statement is documented very well in Books Online and is.. intimidating. Many people can be taken aback by the rather complex block of code that describes all intricacies of the statement. However, I don’t know better way of defining syntax of the statement or command. The notation that is used to describe syntax in Books Online is a form of Backus-Naur notatiion, called BNF for short sometimes. This is a notation that was invented around 50 years ago, and some say that even earlier, around 400 BC – would you believe? Originally it was used to define syntax of, rather ancient now, ALGOL programming language (in 1950’s, not in ancient India). If you look closer at the definition of the BNF, it turns out that the principles of this syntax are pretty simple. Here are a few bullet points: italic_text is a placeholder for your identifier <italic_text_in_angle_brackets> is a definition which is described further. [everything in square brackets] is optional {everything in curly brackets} is obligatory everything | separated | by | operator is an alternative ::= “assigns” definition to an identifier Yes, it looks like these six simple points give you the key to understand even the most complicated syntax definitions in Books Online. Books Online contain an article about syntax conventions – have you ever read it? Let’s have a look at fragment of the CREATE TABLE statement: 1 CREATE TABLE 2 [ database_name . [ schema_name ] . | schema_name . ] table_name 3 ( { <column_definition> | <computed_column_definition> 4 | <column_set_definition> } 5 [ <table_constraint> ] [ ,...n ] ) 6 [ ON { partition_scheme_name ( partition_column_name ) | filegroup 7 | "default" } ] 8 [ { TEXTIMAGE_ON { filegroup | "default" } ] 9 [ FILESTREAM_ON { partition_scheme_name | filegroup 10 | "default" } ] 11 [ WITH ( <table_option> [ ,...n ] ) ] 12 [ ; ] Let’s look at line 2 of the above snippet: This line uses rules 3 and 5 from the list. So you know that you can create table which has specified one of the following. just name – table will be created in default user schema schema name and table name – table will be created in specified schema database name, schema name and table name – table will be created in specified database, in specified schema database name, .., table name – table will be created in specified database, in default schema of the user. Note that this single line of the notation describes each of the naming schemes in deterministic way. The ‘optionality’ of the schema_name element is nested within database_name.. section. You can use either database_name and optional schema name, or just schema name – this is specified by the pipe character ‘|’. The error that user gets with execution of the first script fragment in this post is as follows: Msg 156, Level 15, State 1, Line 2 Incorrect syntax near the keyword 'DEFAULT'. Ok, let’s have a look how to find out the correct syntax. Line number 3 of the BNF fragment above contains reference to <column_definition>. Since column_definition is in angle brackets, we know that this is a reference to notion described further in the code. And indeed, the very next fragment of BNF contains syntax of the column definition. 1 <column_definition> ::= 2 column_name <data_type> 3 [ FILESTREAM ] 4 [ COLLATE collation_name ] 5 [ NULL | NOT NULL ] 6 [ 7 [ CONSTRAINT constraint_name ] DEFAULT constant_expression ] 8 | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] 9 ] 10 [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ] 11 [ SPARSE ] Look at line 7 in the above fragment. It says, that the column can have a DEFAULT constraint which, if you want to name it, has to be prepended with [CONSTRAINT constraint_name] sequence. The name of the constraint is optional, but I strongly recommend you to make the effort of coming up with some meaningful name yourself. So the correct syntax of the CREATE TABLE statement from the beginning of the article is like this: 1 CREATE TABLE dbo.Employees 2 (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, 3 Employee_Name varchar(60) 4 CONSTRAINT Guid_PK PRIMARY KEY (guid) ); That is practically everything you should know about BNF. I encourage you to study the syntax definitions for various statements and commands in Books Online, you can find really interesting things hidden there. Technorati Tags: SQL Server,t-sql,BNF,syntax   (1) No, my answer usually is a question – ‘What error message? What does it say?’. You’d be surprised to know how many people think I can go through time and space and look at their screen at the moment they received the error.

    Read the article

  • South Florida Code Camp 2010 &ndash; VI &ndash; 2010-02-27

    - by Dave Noderer
    Catching up after our sixth code camp here in the Ft Lauderdale, FL area. Website at: http://www.fladotnet.com/codecamp. For the 5th time, DeVry University hosted the event which makes everything else really easy! Statistics from 2010 South Florida Code Camp: 848 registered (we use Microsoft Group Events) ~ 600 attended (516 took name badges) 64 speakers (including speaker idol) 72 sessions 12 parallel tracks Food 400 waters 600 sodas 900 cups of coffee (it was cold!) 200 pounds of ice 200 pizza's 10 large salad trays 900 mouse pads Photos on facebook Dave Noderer: http://www.facebook.com/home.php#!/album.php?aid=190812&id=693530361 Joe Healy: http://www.facebook.com/devfish?ref=mf#!/album.php?aid=202787&id=720054950 Will Strohl:http://www.facebook.com/home.php#!/album.php?aid=2045553&id=1046966128&ref=mf Veronica Gonzalez: http://www.facebook.com/home.php#!/album.php?aid=150954&id=672439484 Florida Speaker Idol One of the sessions at code camp was the South Florida Regional speaker idol competition. After user group level competitions there are five competitors. I acted as MC and score keeper while Ed Hill, Bob O’Connell, John Dunagan and Shervin Shakibi were judges. This statewide competition is being run by Roy Lawsen in Lakeland and the winner, Jeff Truman from Naples will move on to the state finals to be held at the Orlando Code Camp on 3/27/2010: http://www.orlandocodecamp.com/. Each speaker has 10 minutes. The participants were: Alex Koval Jeff Truman Jared Nielsen Chris Catto Venkat Narayanasamy They all did a great job and I’m working with each to make sure they don’t stop there and start speaking at meetings. Thanks to everyone involved! Volunteers As always events like this don’t happen without a lot of help! The key people were: Ed Hill, Bob O’Connell – DeVry For the months leading up to the event, Ed collects all of the swag, books, etc and stores them. He holds meeting with various DeVry departments to coordinate the day, he works with the students in the days  before code camp to stuff bags, print signs, arrange tables and visit BJ’s for our supplies (I go and pay but have a small car!). And of course the day of the event he is there at 5:30 am!! We took two SUV’s to BJ’s, i was really worried that the 36 cases of water were going to break his rear axle! He also helps with the students and works very hard before and after the event. Rainer Haberman – Speakers and Volunteer of the Year Rainer has helped over the past couple of years but this time he took full control of arranging the tracks. I did some preliminary work solicitation speakers but he took over all communications after that. We have tried various organizations around speakers, chair per track, central team but having someone paying attention to the details is definitely the way to go! This was the first year I did not have to jump in at the last minute and re-arrange everything. There were lots of kudo’s from the speakers too saying they felt it was more organized than they have experienced in the past from any code camp. Thanks Rainer! Ray Alamonte – Book Swap We saw the idea of a book swap from the Alabama Code Camp and thought we would give it a try. Ray jumped in and took control. The idea was to get people to bring their old technical books to swap or for others to buy. You got a ticket for each book you brought that you could then turn in to buy another book. If you did not have a ticket you could buy a book for $1. Net proceeds were $153 which I rounded up and donated to the Red Cross. There is plenty going on in Haiti and Chile! I don’t think we really got a count of how many books came in. I many cases the books barely hit the table before being picked up again. At the end we were left with a dozen books which we donated to the DeVry library. A great success we will definitely do again! Jace Weiss / Ratchelen Hut – Coffee and Snacks Wow, this was an eye opener. In past years a few of us would struggle to give some attention to coffee, snacks, etc. But it was always tenuous and always ended up running out of coffee. In the past we have tried buying Dunkin Donuts coffee, renting urns, borrowing urns, etc. This year I actually purchased 2 – 100 cup Westbend commercial brewers plus a couple of small urns (30 and 60 cup we used for decaf). We got them both started early (although i forgot to push the on button on one!) and primed it with 10 boxes of Joe from Dunkin. then Jace and Rachelen took over.. once a batch was brewed they would refill the boxes, keep the area clean and at one point were filling cups. We never ran out of coffee and served a few hundred more than last  year. We did look but next year I’ll get a large insulated (like gatorade) dispensing container. It all went very smoothly and having help focused on that one area was a big win. Thanks Jace and Rachelen! Ken & Shirley Golding / Roberta Barbosa – Registration Ken & Shirley showed up and took over registration. This year we printed small name tags for everyone registered which was great because it is much easier to remember someone’s name when they are labeled! In any case it went the smoothest it has ever gone. All three were actively pulling people through the registration, answering questions, directing them to bags and information very quickly. I did not see that there was too big a line at any time. Thanks!! Scott Katarincic / Vishal Shukla – Website For the 3rd?? year in a row, Scott was in charge of the website starting in August or September when I start on code camp. He handles all the requests, makes changes to the site and admin. I think two years ago he wrote all the backend administration and tunes it and the website a bit but things are pretty stable. The only thing I do is put up the sponsors. It is a big pressure off of me!! Thanks Scott! Vishal jumped into the web end this year and created a new Silverlight agenda page to replace the old ajax page. We will continue to enhance this but it is definitely a good step forward! Thanks! Alex Funkhouser – T-shirts/Mouse pads/tables/sponsors Alex helps in many areas. He helps me bring in sponsors and handles all the logistics for t-shirts, sponsor tables and this year the mouse pads. He is also a key person to help promote the event as well not to mention the after after party which I did not attend and don’t want to know much about! Students There were a number of student volunteers but don’t have all of their names. But thanks to them, they stuffed bags, patrolled pizza and helped with moving things around. Sponsors We had a bunch of great sponsors which allowed us to feed people and give a way a lot of great swag. Our major sponsors of DeVry, Microsoft (both DPE and UGSS), Infragistics, Telerik, SQL Share (End to End, SQL Saturdays), and Interclick are very much appreciated. The other sponsors Applied Innovations (also supply code camp hosting), Ultimate Software (a great local SW company), Linxter (reliable cloud messaging we are lucky to have here!), Mediascend (a media startup), SoftwareFX (another local SW company we are happy to have back participating in CC), CozyRoc (if you do SSIS, check them out), Arrow Design (local DNN and Silverlight experts),Boxes and Arrows (a local SW consulting company) and Robert Half. One thing we did this year besides a t-shirt was a mouse pad. I like it because it will be around for a long time on many desks. After much investigation and years of using mouse pad’s I’ve determined that the 1/8” fabric top is the best and that is what we got!   So now I get a break for a few months before starting again!

    Read the article

  • XNA Notes 010

    - by George Clingerman
    With GDC 2011 wrapping up there were a LOT of great interviews and posts with and about XNA and XBLIG and some of our more notorious developers. Definitely worth spending many, many hours watching, listening and reading all those. Very inspiring! Also, don’t forget to get signed up for Dream Build Play! And just as an early warning reminder do NOT, I repeat do NOT wait to submit your game the last day. There are major issues submitting the last day every year and you do not want all your hard work to be hanging on whether your entry actually went through in that last day. Plan on submitting a few days if not a week before. I’m serious, you’ll thank yourself later! Now on to what’s happening in the XNA community! Time Critical XNA News: PAX East Meet Up (really wish I was going!) http://forums.create.msdn.com/forums/p/71921/439262.aspx Want to stay panicked about the countdown to Dream Build Play? Mike McLaughlin shares his DBP countdown clock http://twitter.com/#!/mikebmcl/status/44454458960252928 XNA Team: Nick Gravelyn Only needs less than 600 new users in his unique marketing plan for Pixel Man 2 http://nickgravelyn.com/pixelman2/ And hares his ad revenue numbers with his XNA WP7 games http://theoneswiththelight.com/2011/my-results-with-ad-revenue-for-wp7-games/ XNA MVPs: Andy “The ZMan” Dunn posts his 15,000th App Hub forum post and shares a few thoughts on the MVP summit http://forums.create.msdn.com/forums/t/77625.aspx Chris Williams shares his thoughts on the MVP summit http://geekswithblogs.net/cwilliams/archive/2011/03/07/144229.aspx XNA Developers: Nathan Fouts of Mommy’s Best games Wraps up GDC http://mommysbest.blogspot.com/2011/03/gdc-2011-wrapped.html And shares the wonderful screenshots from Serious Sam. (I’m so jealous people at PAX East willl be playing a demo of this game!) http://mommysbest.blogspot.com/2011/03/serious-sam-double-d.html James Silva of Ska Studios announces http://www.ska-studios.com/2011/03/09/vampire-smile-at-hotel-sierra/ http://www.ska-studios.com/2011/03/08/vengeance-begins-april-6th/ http://www.ska-studios.com/2011/03/04/good-morning-gato-52/ Michael McLaughlin writes an extremely useful set of tips for XNA WP7 developers http://geekswithblogs.net/mikebmcl/archive/2011/03/10/tips-for-xna-wp7-developers.aspx Robert Boyd “the one man XBLIG improving machine” posts his 9 tips for marketing an Xbox LIVE Indie Gam http://www.gamasutra.com/blogs/RobertBoyd/20110309/7183/9_Tips_for_XBLIG_Marketing.php http://forums.create.msdn.com/forums/p/77534/470586.aspx#470586 And shares his day by day experience at GDC this year http://www.gamasutra.com/blogs/RobertBoyd/20110301/7118/GDC_Saves_the_World__Impressions_Day_1.php http://www.gamasutra.com/blogs/RobertBoyd/20110301/7123/GDC_Saves_the_World__Impressions_Day_2.php http://www.gamasutra.com/blogs/RobertBoyd/20110303/7129/GDC_Saves_the_World__Impressions_Day_3.php http://www.gamasutra.com/blogs/RobertBoyd/20110307/7133/GDC_Saves_the_World__Impressions_Day_4.php http://www.gamasutra.com/blogs/RobertBoyd/20110307/7160/GDC_Saves_the_World__Impressions_Day_5.php Phillipe Da Silva releases new IGF Pong Sample preview http://www.vimeo.com/20904070 Xbox LIVE Indie Games (XBLIG): Gamergeddon posts XBox Indie Game Roundup for March 6th http://www.gamergeddon.com/2011/03/06/xbox-indie-game-round-up-march-6th/ Dealspwn interviews FortressCraft developer Projector Games http://www.dealspwn.com/fortresscraft-developer-interview-minecraft-clones-venting-haters-part-1/ http://www.dealspwn.com/fortresscraft-developer-interview-part-2-trials-tribulations-indie-development/ Writings of Mass Destruction continues the Xbox LIVE Indie Game a day campaign, here’s his take on FishCraft (be sure to check out his other posts!) http://writingsofmassdeduction.com/2011/03/05/day-116-fishcraft/ Tom Ogburn shares his GDC notes on the XBLIG panel jotted quickly while attending the panel http://twitter.com/#!/TOgburn/status/44454191028125696 http://www.starlitskygames.com/blogs/site_news/archive/2011/03/06/802.aspx Dave Voyles of Armless Octopus has crazy good coverage on XNA and Xbox LIVE Indie Game developers at GDC 2011. Interviews and articles all extremely well done! http://www.armlessoctopus.com/2011/03/06/gdc-2011-successful-indie-developers-share-insight-on-microsofts-self-publishing-service/ There’s honestly so many posts and interviews you should just hit his front page and scroll down through all of the latest ones. http://www.armlessoctopus.com/ GameMarx Episode 12 http://www.gamemarx.com/video/the-show/27/ep-12-march-4-2011.aspx B.U.T.T.O.N now on Steam! http://www.gamesetwatch.com/2011/03/button_party_game_now_on_steam.php German Xbox Dashboard gets review program from GamePro http://www.armlessoctopus.com/2011/03/07/gamepo-indie-review-show-debuts-on-german-xbox-dashboard/ XboxIndies.com (one of the best XNA sites out there at this point!) continues to add review sites to it’s main review feed. (And don’t forget to play with that awesome XBLIG pivot control!) http://xboxindies.com/ Kris Steele of FunInfused Games shares early footage of his game World of Chalk http://twitter.com/#!/kriswd40/status/45007114371989504 Raymond Matthews of Darkstarmatryx reviews FunInfused Games Abduction Action http://www.darkstarmatryx.com/?p=264 TheVideoGamerRob reviews Zombie Football Carnage http://videogamerrob.wordpress.com/2011/03/08/xblig-review-zombie-football-carnage/ XBLIG Square Off Making the Jump to WP7 http://www.wp7connect.com/2011/03/08/xblig-square-off-will-make-the-jump-to-windows-phone/ Mommy’s Best Games making the news round with their Serious Sam announcement http://www.joystiq.com/2011/03/09/serious-sam-gets-serious-indie-cred-with-new-indie-series/ Most quoted and linked XBLIG article of the week with the least amount of actual facts and reporting. Shared only because it makes me sad that this is the best coverage we get. (Hey reporters, there’s LOT and LOTS of XBLIG and XNA experts you can contact if you need to check up on facts or wonder why on questions like, Why can’t XBLIGs have Nazis? There’s actually a real answer for that..) http://www.joystiq.com/2011/03/06/xblig-facts-nazi-killing-a-no-no-revenue-a-yes-yes/ XNA Development: Mort8088 has been in an XNA tutorial writing frenzy releasing 4 XNA 4.0 entry level tutorials this week! http://mort8088.com/2011/03/06/xna-4-0-tutorial-0-intro/ http://mort8088.com/2011/03/06/xna-4-0-tutorial-1-fonts/ http://mort8088.com/2011/03/06/xna-4-0-tutorial-2-sprites/ http://mort8088.com/2011/03/06/xna-4-0-tutorial-3-input-from-keyboard/ Interesting discussion on what it means to be a community (you do have to sign up to be a member of the XNA UK forums to read it...) http://twitter.com/#!/XNAUK/status/44705269254594560 Slyprid continues his incredible pace on Transmute and shares screens of his new Animation Builder http://twitter.com/#!/slyprid/status/45169271847911424 http://forgottenstarstudios.com/blog/ Philippe Da Silva wants to know who is using IGF for their games. If it’s you, drop him a note letting him know! http://twitter.com/#!/philippedasilva/status/44325893719588864 New Sunburn Video Tutorials released http://www.synapsegaming.com/blogs/fivesidedbarrel/archive/2011/03/07/new-documentation-video-tutorials.aspx Loading and rendering animated collada models using XNA 4.0 http://bunkernetz.wordpress.com/2011/03/09/loading-and-rendering-animated-collada-models-using-xna-4-0/ XNA for Silverlight Developers Part 6 Accelerometer Input http://buzzgamesnews.blogspot.com/2011/03/xna-for-silverlight-developers-part-6.html

    Read the article

  • Ada and 'The Book'

    - by Phil Factor
    The long friendship between Charles Babbage and Ada Lovelace created one of the most exciting and mysterious of collaborations ever to have resulted in a technological breakthrough. The fireworks that created by the collision of two prodigious mathematical and creative talents resulted in an invention, the Analytical Engine, which went on to change society fundamentally. However, beyond that, we just don't know what the bulk of their collaborative work was about:;  it was done in strictest secrecy. Even the known outcome of their friendship, the first programmable computer, was shrouded in mystery. At the time, nobody, except close friends and family, had any idea of Ada Byron's contribution to the invention of the ‘Engine’, and how to program it. Her great insight was published in August 1843, under the initials AAL, standing for Ada Augusta Lovelace, her title then being the Countess of Lovelace. It was contained in a lengthy ‘note’ to her translation of a publication that remains the best description of Babbage's amazing Analytical Engine. The secret identity of the person behind those enigmatic initials was finally revealed by Prince de Polignac who, seventy years later, wrote to Ada's daughter to seek confirmation that her mother had, indeed, been the author of the brilliant sentences that described so accurately how Babbage's mechanical computer could be programmed with punch-cards. L.F. Menabrea's paper on the Analytical Engine first appeared in the 'Bibliotheque Universelle de Geneve' in October 1842, and Ada translated it anonymously for Taylor's 'Scientific Memoirs'. Charles Babbage was surprised that she had not written an original paper as she already knew a surprising amount about the way the machine worked. He persuaded her to at least write some explanatory notes. These notes ended up extending to four times the length of the original article and represented the first published account of how a machine could be programmed to perform any calculation. Her example of programming the Bernoulli sequence would have worked on the Analytical engine had the device’s construction been completed, and gave Ada an unassailable claim to have invented the art of programming. What was the reason for Ada's secrecy? She was the only legitimate child of Lord Byron, who was probably the best known celebrity of the age, so she was already famous. She was a senior aristocrat, with titles, a fortune in money and vast estates in the Midlands. She had political influence, and was the cousin of Lord Melbourne, who was the Prime Minister at that time. She was friendly with the young Queen Victoria. Her mathematical activities were a pastime, and not one that would be considered by others to be in keeping with her roles and responsibilities. You wouldn't dare to dream up a fictional heroine like Ada. She was dazzlingly beautiful and talented. She could speak several languages fluently, and play some musical instruments with professional skill. Contemporary accounts refer to her being 'accomplished in science, art and literature'. On top of that, she was a brilliant mathematician, a talent inherited from her mother, Annabella Milbanke. In her mother's circle of literary and scientific friends was Charles Babbage, and Ada's friendship with him dates from her teenage zest for Mathematics. She was one of the first people he'd ever met who understood what he had attempted to achieve with the 'Difference Engine', and with whom he could converse as intellectual equals. He arranged for her to have an education from the most talented academics in the country. Ada melted the heart of the cantankerous genius to the point that he became a faithful and loyal father-figure to her. She was one of the very few who could grasp the principles of the later, and very different, ‘Analytical Engine’ which was designed from the start to tackle a variety of tasks. Sadly, Ada Byron's life ended less than a decade after completing the work that assured her long-term fame, in November 1852. She was dying of cancer, her gambling habits had caused her to run up huge debts, she'd had more than one affairs, and she was being blackmailed. Her brilliant but unempathic mother was nursing her in her final illness, destroying her personal letters and records, and repaying her debts. Her husband was distraught but helpless. Charles Babbage, however, maintained his steadfast paternalistic friendship to the end. She appointed her loyal friend to be her executor. For years, she and Babbage had been working together on a secret project, known only as 'The Book'. We have a clue to what it was in a letter written by her nine years earlier, on 11th August 1843. It was a joint project by herself and Lord Lovelace, her husband, and was intended to involve Babbage's 'undivided energies'. It involved 'consulting your Engine' (it required Babbage’s computer). The letter gives no hint about the project except for the high-minded nature of its purpose, and its highly mathematical nature.  From then on, the surviving correspondence between the two gives only veiled references to 'The Book'. There isn't much, since Babbage later destroyed any letters that could have damaged her reputation within the Establishment. 'I cannot spare the book today, which I am very sorry for. At the moment I want it for constant reference, but I think you can have it tomorrow' (Oct 1844)  And 'I will send you the book directly, and you can say, when you receive it, how long you will want to keep it'. (Nov 1844)  The two of them were obviously intent on the work: She writes, four years later, 'I have an engagement for Wednesday which will prevent me from attending to your wishes about the book' (Dec 1848). This was something that they both needed to work on, but could not do in parallel: 'I will send the book on Tuesday, and it can be left with you till Friday' (11 Feb 1849). After six years work, it had been so well-handled that it was beginning to fall apart: 'Don't forget the new cover you promised for the book. The poor book is very shabby and wants one' (20 Sept 1849). So what was going on? The word 'book' was not a code-word: it was a real book, probably a 'printer's blank', plain paper, but properly bound so printers and publishers could show off how the published work might look. The hints from the correspondence are of advanced mathematics. It is obvious that the book was travelling between them, back and forth, each one working on it for less than a week before passing it back. Ada and her husband were certainly involved in gambling large sums of money on the horses, and so most biographers have concluded that the three of them were trying to calculate the mathematical odds on the horses. This theory has three large problems. Firstly, Ada's original letter proposing the project refers to its high-minded nature. Babbage was temperamentally opposed to gambling and would scarcely have given so much time to the project, even though he was devoted to Ada. Secondly, Babbage would have very soon have realized the hopelessness of trying to beat the bookies. This sort of betting never attracts his type of intellectual background. The third problem is that any work on calculating the odds on horses would not need a well-thumbed book to pass back and forth between them; they would have not had to work in series. The original project was instigated by Ada, along with her husband, William King-Noel, 1st Earl of Lovelace. Charles Babbage was invited to join the project after the couple had come up with the idea. What could William have contributed? One might assume that William was a Bertie Wooster character, addicted only to the joys of the turf, but this was far from the truth. He was a scientist, a Cambridge graduate who was later elected to be a Fellow of the Royal Society. After Eton, he went to Trinity College, Cambridge. On graduation, he entered the diplomatic service and acted as secretary under Lord Nugent, who was Lord Commissioner of the Ionian Islands. William was very friendly with Babbage too, able to discuss scientific matters on equal terms. He was a capable engineer who invented a process for bending large timbers by the application of steam heat. He delivered a paper to the Institution of Civil Engineers in 1849, and received praise from the great engineer, Isambard Kingdom Brunel. As well as being Lord Lieutenant of the County of Surrey for most of Victoria's reign, he had time for a string of scientific and engineering achievements. Whatever the project was, it is unlikely that William was a junior partner. After Ada's death, the project disappeared. Then, two years later, Babbage, through one of his occasional outbursts of temper, demonstrated that he was able to decrypt one of the most powerful of secret codes, Vigenère's autokey cipher.  All contemporary diplomatic and military messages used a variant of this cipher. Babbage had made three important discoveries, namely, the mathematical law of this cipher, the principle of the key periodicity, and the technique of the symmetry of position. The technique is now known as the Kasiski examination, also called the Kasiski test, but Babbage got there first. At one time, he listed amongst his future projects, the writing of a book 'The Philosophy of Decyphering', but it never came to anything. This discovery was going to change the course of history, since it was used to decipher the Russians’ military dispatches in the Crimean war. Babbage himself played a role during the Crimean War as a cryptographical adviser to his friend, Rear-Admiral Sir Francis Beaufort of the Admiralty. This is as much as we can be certain about in trying to make sense of the bulk of the time that Charles Babbage and Ada Lovelace worked together. Nine years of intensive work, involving the 'Engine' and a great deal of mathematics and research seems to have been lost: or has it? I've argued in the past http://www.simple-talk.com/community/blogs/philfactor/archive/2008/06/13/59614.aspx that the cracking of the Vigenère autokey cipher, was a fundamental motive behind the British Government's support and funding of the 'Difference Engine'. The Duke of Wellington, whose understanding of the military significance of being able to read enemy dispatches, was the most steadfast advocate of the project. If the three friends were actually doing the work of cracking codes by mathematical techniques that used the techniques of key periodicity, and symmetry of position (the use of a book being passed quickly to and fro is very suggestive), intending to then use the 'Engine' to do the routine cracking of each dispatch, then this is a rather different story. The project was Ada and William's idea. (William had served in the diplomatic service and would be familiar with the use of codes). This makes Ada Lovelace the initiator of a project which, by giving both Britain, and probably the USA, a diplomatic and military advantage in the second part of the Nineteenth century, changed world history. Ada would never have wanted any credit for cracking the cipher, and developing the method that rendered all contemporary military and diplomatic ciphering techniques nugatory; quite the reverse. And it is clear from the gaps in the record of the letters between the collaborators that the evidence was destroyed, probably on her request by her irascible but intensely honorable executor, Charles Babbage. Charles Babbage toyed with the idea of going public, but the Crimean war put an end to that. The British Government had a valuable secret, and intended to keep it that way. Ada and Charles had quite often discussed possible moneymaking projects that would fund the development of the Analytic Engine, the first programmable computer, but their secret work was never in the running as a potential cash cow. I suspect that the British Government was, even then, working on the concealment of a discovery whose value to the nation depended on it remaining so. The success of code-breaking in the Crimean war, and the American Civil war, led to the British and Americans  subsequently giving much more weight and funding to the science of decryption. Paradoxically, this makes Ada's contribution even closer to the creation of Colossus, the first digital computer, at Bletchley Park, specifically to crack the Nazi’s secret codes.

    Read the article

  • Inside the Concurrent Collections: ConcurrentBag

    - by Simon Cooper
    Unlike the other concurrent collections, ConcurrentBag does not really have a non-concurrent analogy. As stated in the MSDN documentation, ConcurrentBag is optimised for the situation where the same thread is both producing and consuming items from the collection. We'll see how this is the case as we take a closer look. Again, I recommend you have ConcurrentBag open in a decompiler for reference. Thread Statics ConcurrentBag makes heavy use of thread statics - static variables marked with ThreadStaticAttribute. This is a special attribute that instructs the CLR to scope any values assigned to or read from the variable to the executing thread, not globally within the AppDomain. This means that if two different threads assign two different values to the same thread static variable, one value will not overwrite the other, and each thread will see the value they assigned to the variable, separately to any other thread. This is a very useful function that allows for ConcurrentBag's concurrency properties. You can think of a thread static variable: [ThreadStatic] private static int m_Value; as doing the same as: private static Dictionary<Thread, int> m_Values; where the executing thread's identity is used to automatically set and retrieve the corresponding value in the dictionary. In .NET 4, this usage of ThreadStaticAttribute is encapsulated in the ThreadLocal class. Lists of lists ConcurrentBag, at its core, operates as a linked list of linked lists: Each outer list node is an instance of ThreadLocalList, and each inner list node is an instance of Node. Each outer ThreadLocalList is owned by a particular thread, accessible through the thread local m_locals variable: private ThreadLocal<ThreadLocalList<T>> m_locals It is important to note that, although the m_locals variable is thread-local, that only applies to accesses through that variable. The objects referenced by the thread (each instance of the ThreadLocalList object) are normal heap objects that are not specific to any thread. Thinking back to the Dictionary analogy above, if each value stored in the dictionary could be accessed by other means, then any thread could access the value belonging to other threads using that mechanism. Only reads and writes to the variable defined as thread-local are re-routed by the CLR according to the executing thread's identity. So, although m_locals is defined as thread-local, the m_headList, m_nextList and m_tailList variables aren't. This means that any thread can access all the thread local lists in the collection by doing a linear search through the outer linked list defined by these variables. Adding items So, onto the collection operations. First, adding items. This one's pretty simple. If the current thread doesn't already own an instance of ThreadLocalList, then one is created (or, if there are lists owned by threads that have stopped, it takes control of one of those). Then the item is added to the head of that thread's list. That's it. Don't worry, it'll get more complicated when we account for the other operations on the list! Taking & Peeking items This is where it gets tricky. If the current thread's list has items in it, then it peeks or removes the head item (not the tail item) from the local list and returns that. However, if the local list is empty, it has to go and steal another item from another list, belonging to a different thread. It iterates through all the thread local lists in the collection using the m_headList and m_nextList variables until it finds one that has items in it, and it steals one item from that list. Up to this point, the two threads had been operating completely independently. To steal an item from another thread's list, the stealing thread has to do it in such a way as to not step on the owning thread's toes. Recall how adding and removing items both operate on the head of the thread's linked list? That gives us an easy way out - a thread trying to steal items from another thread can pop in round the back of another thread's list using the m_tail variable, and steal an item from the back without the owning thread knowing anything about it. The owning thread can carry on completely independently, unaware that one of its items has been nicked. However, this only works when there are at least 3 items in the list, as that guarantees there will be at least one node between the owning thread performing operations on the list head and the thread stealing items from the tail - there's no chance of the two threads operating on the same node at the same time and causing a race condition. If there's less than three items in the list, then there does need to be some synchronization between the two threads. In this case, the lock on the ThreadLocalList object is used to mediate access to a thread's list when there's the possibility of contention. Thread synchronization In ConcurrentBag, this is done using several mechanisms: Operations performed by the owner thread only take out the lock when there are less than three items in the collection. With three or greater items, there won't be any conflict with a stealing thread operating on the tail of the list. If a lock isn't taken out, the owning thread sets the list's m_currentOp variable to a non-zero value for the duration of the operation. This indicates to all other threads that there is a non-locked operation currently occuring on that list. The stealing thread always takes out the lock, to prevent two threads trying to steal from the same list at the same time. After taking out the lock, the stealing thread spinwaits until m_currentOp has been set to zero before actually performing the steal. This ensures there won't be a conflict with the owning thread when the number of items in the list is on the 2-3 item borderline. If any add or remove operations are started in the meantime, and the list is below 3 items, those operations try to take out the list's lock and are blocked until the stealing thread has finished. This allows a thread to steal an item from another thread's list without corrupting it. What about synchronization in the collection as a whole? Collection synchronization Any thread that operates on the collection's global structure (accessing anything outside the thread local lists) has to take out the collection's global lock - m_globalListsLock. This single lock is sufficient when adding a new thread local list, as the items inside each thread's list are unaffected. However, what about operations (such as Count or ToArray) that need to access every item in the collection? In order to ensure a consistent view, all operations on the collection are stopped while the count or ToArray is performed. This is done by freezing the bag at the start, performing the global operation, and unfreezing at the end: The global lock is taken out, to prevent structural alterations to the collection. m_needSync is set to true. This notifies all the threads that they need to take out their list's lock irregardless of what operation they're doing. All the list locks are taken out in order. This blocks all locking operations on the lists. The freezing thread waits for all current lockless operations to finish by spinwaiting on each m_currentOp field. The global operation can then be performed while the bag is frozen, but no other operations can take place at the same time, as all other threads are blocked on a list's lock. Then, once the global operation has finished, the locks are released, m_needSync is unset, and normal concurrent operation resumes. Concurrent principles That's the essence of how ConcurrentBag operates. Each thread operates independently on its own local list, except when they have to steal items from another list. When stealing, only the stealing thread is forced to take out the lock; the owning thread only has to when there is the possibility of contention. And a global lock controls accesses to the structure of the collection outside the thread lists. Operations affecting the entire collection take out all locks in the collection to freeze the contents at a single point in time. So, what principles can we extract here? Threads operate independently Thread-static variables and ThreadLocal makes this easy. Threads operate entirely concurrently on their own structures; only when they need to grab data from another thread is there any thread contention. Minimised lock-taking Even when two threads need to operate on the same data structures (one thread stealing from another), they do so in such a way such that the probability of actually blocking on a lock is minimised; the owning thread always operates on the head of the list, and the stealing thread always operates on the tail. Management of lockless operations Any operations that don't take out a lock still have a 'hook' to force them to lock when necessary. This allows all operations on the collection to be stopped temporarily while a global snapshot is taken. Hopefully, such operations will be short-lived and infrequent. That's all the concurrent collections covered. I hope you've found it as informative and interesting as I have. Next, I'll be taking a closer look at ThreadLocal, which I came across while analyzing ConcurrentBag. As you'll see, the operation of this class deserves a much closer look.

    Read the article

  • Replication Services as ETL extraction tool

    - by jorg
    In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are  asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication   The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication…   The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next   You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button.   In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines!   On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard   The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication.   Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions   The New Subscription Wizard appears   Select the publisher on which you just created your publication and select the database and publication (SnapshotTest)   You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database   Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful.   You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here   Click Next   Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot:   Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot:   Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data:   Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

    Read the article

  • Operator of the week - Assert

    - by Fabiano Amorim
    Well my friends, I was wondering how to help you in a practical way to understand execution plans. So I think I'll talk about the Showplan Operators. Showplan Operators are used by the Query Optimizer (QO) to build the query plan in order to perform a specified operation. A query plan will consist of many physical operators. The Query Optimizer uses a simple language that represents each physical operation by an operator, and each operator is represented in the graphical execution plan by an icon. I'll try to talk about one operator every week, but so as to avoid having to continue to write about these operators for years, I'll mention only of those that are more common: The first being the Assert. The Assert is used to verify a certain condition, it validates a Constraint on every row to ensure that the condition was met. If, for example, our DDL includes a check constraint which specifies only two valid values for a column, the Assert will, for every row, validate the value passed to the column to ensure that input is consistent with the check constraint. Assert  and Check Constraints: Let's see where the SQL Server uses that information in practice. Take the following T-SQL: IF OBJECT_ID('Tab1') IS NOT NULL   DROP TABLE Tab1 GO CREATE TABLE Tab1(ID Integer, Gender CHAR(1))  GO  ALTER TABLE TAB1 ADD CONSTRAINT ck_Gender_M_F CHECK(Gender IN('M','F'))  GO INSERT INTO Tab1(ID, Gender) VALUES(1,'X') GO To the command above the SQL Server has generated the following execution plan: As we can see, the execution plan uses the Assert operator to check that the inserted value doesn't violate the Check Constraint. In this specific case, the Assert applies the rule, 'if the value is different to "F" and different to "M" than return 0 otherwise returns NULL'. The Assert operator is programmed to show an error if the returned value is not NULL; in other words, the returned value is not a "M" or "F". Assert checking Foreign Keys Now let's take a look at an example where the Assert is used to validate a foreign key constraint. Suppose we have this  query: ALTER TABLE Tab1 ADD ID_Genders INT GO  IF OBJECT_ID('Tab2') IS NOT NULL   DROP TABLE Tab2 GO CREATE TABLE Tab2(ID Integer PRIMARY KEY, Gender CHAR(1))  GO  INSERT INTO Tab2(ID, Gender) VALUES(1, 'F') INSERT INTO Tab2(ID, Gender) VALUES(2, 'M') INSERT INTO Tab2(ID, Gender) VALUES(3, 'N') GO  ALTER TABLE Tab1 ADD CONSTRAINT fk_Tab2 FOREIGN KEY (ID_Genders) REFERENCES Tab2(ID) GO  INSERT INTO Tab1(ID, ID_Genders, Gender) VALUES(1, 4, 'X') Let's look at the text execution plan to see what these Assert operators were doing. To see the text execution plan just execute SET SHOWPLAN_TEXT ON before run the insert command. |--Assert(WHERE:(CASE WHEN NOT [Pass1008] AND [Expr1007] IS NULL THEN (0) ELSE NULL END))      |--Nested Loops(Left Semi Join, PASSTHRU:([Tab1].[ID_Genders] IS NULL), OUTER REFERENCES:([Tab1].[ID_Genders]), DEFINE:([Expr1007] = [PROBE VALUE]))           |--Assert(WHERE:(CASE WHEN [Tab1].[Gender]<>'F' AND [Tab1].[Gender]<>'M' THEN (0) ELSE NULL END))           |    |--Clustered Index Insert(OBJECT:([Tab1].[PK]), SET:([Tab1].[ID] = RaiseIfNullInsert([@1]),[Tab1].[ID_Genders] = [@2],[Tab1].[Gender] = [Expr1003]), DEFINE:([Expr1003]=CONVERT_IMPLICIT(char(1),[@3],0)))           |--Clustered Index Seek(OBJECT:([Tab2].[PK]), SEEK:([Tab2].[ID]=[Tab1].[ID_Genders]) ORDERED FORWARD) Here we can see the Assert operator twice, first (looking down to up in the text plan and the right to left in the graphical plan) validating the Check Constraint. The same concept showed above is used, if the exit value is "0" than keep running the query, but if NULL is returned shows an exception. The second Assert is validating the result of the Tab1 and Tab2 join. It is interesting to see the "[Expr1007] IS NULL". To understand that you need to know what this Expr1007 is, look at the Probe Value (green text) in the text plan and you will see that it is the result of the join. If the value passed to the INSERT at the column ID_Gender exists in the table Tab2, then that probe will return the join value; otherwise it will return NULL. So the Assert is checking the value of the search at the Tab2; if the value that is passed to the INSERT is not found  then Assert will show one exception. If the value passed to the column ID_Genders is NULL than the SQL can't show a exception, in that case it returns "0" and keeps running the query. If you run the INSERT above, the SQL will show an exception because of the "X" value, but if you change the "X" to "F" and run again, it will show an exception because of the value "4". If you change the value "4" to NULL, 1, 2 or 3 the insert will be executed without any error. Assert checking a SubQuery: The Assert operator is also used to check one subquery. As we know, one scalar subquery can't validly return more than one value: Sometimes, however, a  mistake happens, and a subquery attempts to return more than one value . Here the Assert comes into play by validating the condition that a scalar subquery returns just one value. Take the following query: INSERT INTO Tab1(ID_TipoSexo, Sexo) VALUES((SELECT ID_TipoSexo FROM Tab1), 'F')    INSERT INTO Tab1(ID_TipoSexo, Sexo) VALUES((SELECT ID_TipoSexo FROM Tab1), 'F')    |--Assert(WHERE:(CASE WHEN NOT [Pass1016] AND [Expr1015] IS NULL THEN (0) ELSE NULL END))        |--Nested Loops(Left Semi Join, PASSTHRU:([tempdb].[dbo].[Tab1].[ID_TipoSexo] IS NULL), OUTER REFERENCES:([tempdb].[dbo].[Tab1].[ID_TipoSexo]), DEFINE:([Expr1015] = [PROBE VALUE]))              |--Assert(WHERE:([Expr1017]))             |    |--Compute Scalar(DEFINE:([Expr1017]=CASE WHEN [tempdb].[dbo].[Tab1].[Sexo]<>'F' AND [tempdb].[dbo].[Tab1].[Sexo]<>'M' THEN (0) ELSE NULL END))              |         |--Clustered Index Insert(OBJECT:([tempdb].[dbo].[Tab1].[PK__Tab1__3214EC277097A3C8]), SET:([tempdb].[dbo].[Tab1].[ID_TipoSexo] = [Expr1008],[tempdb].[dbo].[Tab1].[Sexo] = [Expr1009],[tempdb].[dbo].[Tab1].[ID] = [Expr1003]))              |              |--Top(TOP EXPRESSION:((1)))              |                   |--Compute Scalar(DEFINE:([Expr1008]=[Expr1014], [Expr1009]='F'))              |                        |--Nested Loops(Left Outer Join)              |                             |--Compute Scalar(DEFINE:([Expr1003]=getidentity((1856985942),(2),NULL)))              |                             |    |--Constant Scan              |                             |--Assert(WHERE:(CASE WHEN [Expr1013]>(1) THEN (0) ELSE NULL END))              |                                  |--Stream Aggregate(DEFINE:([Expr1013]=Count(*), [Expr1014]=ANY([tempdb].[dbo].[Tab1].[ID_TipoSexo])))             |                                       |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[Tab1].[PK__Tab1__3214EC277097A3C8]))              |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[Tab2].[PK__Tab2__3214EC27755C58E5]), SEEK:([tempdb].[dbo].[Tab2].[ID]=[tempdb].[dbo].[Tab1].[ID_TipoSexo]) ORDERED FORWARD)  You can see from this text showplan that SQL Server as generated a Stream Aggregate to count how many rows the SubQuery will return, This value is then passed to the Assert which then does its job by checking its validity. Is very interesting to see that  the Query Optimizer is smart enough be able to avoid using assert operators when they are not necessary. For instance: INSERT INTO Tab1(ID_TipoSexo, Sexo) VALUES((SELECT ID_TipoSexo FROM Tab1 WHERE ID = 1), 'F') INSERT INTO Tab1(ID_TipoSexo, Sexo) VALUES((SELECT TOP 1 ID_TipoSexo FROM Tab1), 'F')  For both these INSERTs, the Query Optimiser is smart enough to know that only one row will ever be returned, so there is no need to use the Assert. Well, that's all folks, I see you next week with more "Operators". Cheers, Fabiano

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • Developing a Cost Model for Cloud Applications

    - by BuckWoody
    Note - please pay attention to the date of this post. As much as I attempt to make the information below accurate, the nature of distributed computing means that components, units and pricing will change over time. The definitive costs for Microsoft Windows Azure and SQL Azure are located here, and are more accurate than anything you will see in this post: http://www.microsoft.com/windowsazure/offers/  When writing software that is run on a Platform-as-a-Service (PaaS) offering like Windows Azure / SQL Azure, one of the questions you must answer is how much the system will cost. I will not discuss the comparisons between on-premise costs (which are nigh impossible to calculate accurately) versus cloud costs, but instead focus on creating a general model for estimating costs for a given application. You should be aware that there are (at this writing) two billing mechanisms for Windows and SQL Azure: “Pay-as-you-go” or consumption, and “Subscription” or commitment. Conceptually, you can consider the former a pay-as-you-go cell phone plan, where you pay by the unit used (at a slightly higher rate) and the latter as a standard cell phone plan where you commit to a contract and thus pay lower rates. In this post I’ll stick with the pay-as-you-go mechanism for simplicity, which should be the maximum cost you would pay. From there you may be able to get a lower cost if you use the other mechanism. In any case, the model you create should hold. Developing a good cost model is essential. As a developer or architect, you’ll most certainly be asked how much something will cost, and you need to have a reliable way to estimate that. Businesses and Organizations have been used to paying for servers, software licenses, and other infrastructure as an up-front cost, and power, people to the systems and so on as an ongoing (and sometimes not factored) cost. When presented with a new paradigm like distributed computing, they may not understand the true cost/value proposition, and that’s where the architect and developer can guide the conversation to make a choice based on features of the application versus the true costs. The two big buckets of use-types for these applications are customer-based and steady-state. In the customer-based use type, each successful use of the program results in a sale or income for your organization. Perhaps you’ve written an application that provides the spot-price of foo, and your customer pays for the use of that application. In that case, once you’ve estimated your cost for a successful traversal of the application, you can build that into the price you charge the user. It’s a standard restaurant model, where the price of the meal is determined by the cost of making it, plus any profit you can make. In the second use-type, the application will be used by a more-or-less constant number of processes or users and no direct revenue is attached to the system. A typical example is a customer-tracking system used by the employees within your company. In this case, the cost model is often created “in reverse” - meaning that you pilot the application, monitor the use (and costs) and that cost is held steady. This is where the comparison with an on-premise system becomes necessary, even though it is more difficult to estimate those on-premise true costs. For instance, do you know exactly how much cost the air conditioning is because you have a team of system administrators? This may sound trivial, but that, along with the insurance for the building, the wiring, and every other part of the system is in fact a cost to the business. There are three primary methods that I’ve been successful with in estimating the cost. None are perfect, all are demand-driven. The general process is to lay out a matrix of: components units cost per unit and then multiply that times the usage of the system, based on which components you use in the program. That sounds a bit simplistic, but using those metrics in a calculation becomes more detailed. In all of the methods that follow, you need to know your application. The components for a PaaS include computing instances, storage, transactions, bandwidth and in the case of SQL Azure, database size. In most cases, architects start with the first model and progress through the other methods to gain accuracy. Simple Estimation The simplest way to calculate costs is to architect the application (even UML or on-paper, no coding involved) and then estimate which of the components you’ll use, and how much of each will be used. Microsoft provides two tools to do this - one is a simple slider-application located here: http://www.microsoft.com/windowsazure/pricing-calculator/  The other is a tool you download to create an “Return on Investment” (ROI) spreadsheet, which has the advantage of leading you through various questions to estimate what you plan to use, located here: https://roianalyst.alinean.com/msft/AutoLogin.do?d=176318219048082115  You can also just create a spreadsheet yourself with a structure like this: Program Element Azure Component Unit of Measure Cost Per Unit Estimated Use of Component Total Cost Per Component Cumulative Cost               Of course, the consideration with this model is that it is difficult to predict a system that is not running or hasn’t even been developed. Which brings us to the next model type. Measure and Project A more accurate model is to actually write the code for the application, using the Software Development Kit (SDK) which can run entirely disconnected from Azure. The code should be instrumented to estimate the use of the application components, logging to a local file on the development system. A series of unit and integration tests should be run, which will create load on the test system. You can use standard development concepts to track this usage, and even use Windows Performance Monitor counters. The best place to start with this method is to use the Windows Azure Diagnostics subsystem in your code, which you can read more about here: http://blogs.msdn.com/b/sumitm/archive/2009/11/18/introducing-windows-azure-diagnostics.aspx This set of API’s greatly simplifies tracking the application, and in fact you can use this information for more than just a cost model. After you have the tracking logs, you can plug the numbers into ay of the tools above, which should give a representative cost or in some cases a unit cost. The consideration with this model is that the SDK fabric is not a one-to-one comparison with performance on the actual Windows Azure fabric. Those differences are usually smaller, but they do need to be considered. Also, you may not be able to accurately predict the load on the system, which might lead to an architectural change, which changes the model. This leads us to the next, most accurate method for a cost model. Sample and Estimate Using standard statistical and other predictive math, once the application is deployed you will get a bill each month from Microsoft for your Azure usage. The bill is quite detailed, and you can export the data from it to do analysis, and using methods like regression and so on project out into the future what the costs will be. I normally advise that the architect also extrapolate a unit cost from those metrics as well. This is the information that should be reported back to the executives that pay the bills: the past cost, future projected costs, and unit cost “per click” or “per transaction”, as your case warrants. The challenge here is in the model itself - statistical methods are not foolproof, and the larger the sample (in this case I recommend the entire population, not a smaller sample) is key. References and Tools Articles: http://blogs.msdn.com/b/patrick_butler_monterde/archive/2010/02/10/windows-azure-billing-overview.aspx http://technet.microsoft.com/en-us/magazine/gg213848.aspx http://blog.codingoutloud.com/2011/06/05/azure-faq-how-much-will-it-cost-me-to-run-my-application-on-windows-azure/ http://blogs.msdn.com/b/johnalioto/archive/2010/08/25/10054193.aspx http://geekswithblogs.net/iupdateable/archive/2010/02/08/qampa-how-can-i-calculate-the-tco-and-roi-when.aspx   Other Tools: http://cloud-assessment.com/ http://communities.quest.com/community/cloud_tools

    Read the article

  • Disaster, or Migration?

    - by Rob Farley
    This post is in two parts – technical and personal. And I should point out that it’s prompted in part by this month’s T-SQL Tuesday, hosted by Allen Kinsel. First, the technical: I’ve had a few conversations with people recently about migration – moving a SQL Server database from one box to another (sometimes, but not primarily, involving an upgrade). One question that tends to come up is that of downtime. Obviously there will be some period of time between the old server being available and the new one. The way that most people seem to think of migration is this: Build a new server. Stop people from using the old server. Take a backup of the old server Restore it on the new server. Reconfigure the client applications (or alternatively, configure the new server to use the same address as the old) Make the new server online. There are other things involved, such as testing, of course. But this is essentially the process that people tell me they’re planning to follow. The bit that I want to look at today (as you’ve probably guessed from my title) is the “backup and restore” section. If a SQL database is using the Simple Recovery Model, then the only restore option is the last database backup. This backup could be full or differential. The transaction log never gets backed up in the Simple Recovery Model. Instead, it truncates regularly to stay small. One that’s using the Full Recovery Model (or Bulk-Logged) won’t truncate its log – the log must be backed up regularly. This provides the benefit of having a lot more option available for restores. It’s a requirement for most systems of High Availability, because if you’re making sure that a spare box is up-and-running, ready to take over, then you have to be interested in the logs that are happening on the current box, rather than truncating them all the time. A High Availability system such as Mirroring, Replication or Log Shipping will initialise the spare machine by restoring a full database backup (and maybe a differential backup if available), and then any subsequent log backups. Once the secondary copy is close, transactions can be applied to keep the two in sync. The main aspect of any High Availability system is to have a redundant system that is ready to take over. So the similarity for migration should be obvious. If you need to move a database from one box to another, then introducing a High Availability mechanism can help. By turning on the Full Recovery Model and then taking a backup (so that the now-interesting logs have some context), logs start being kept, and are therefore available for getting the new box ready (even if it’s an upgraded version). When the migration is ready to occur, a failover can be done, letting the new server take over the responsibility of the old, just as if a disaster had happened. Except that this is a planned failover, not a disaster at all. There’s a fine line between a disaster and a migration. Failovers can be useful in patching, upgrading, maintenance, and more. Hopefully, even an unexpected disaster can be seen as just another failover, and there can be an opportunity there – perhaps to get some work done on the principal server to increase robustness. And if I’ve just set up a High Availability system for even the simplest of databases, it’s not necessarily a bad thing. :) So now the personal: It’s been an interesting time recently... June has been somewhat odd. A court case with which I was involved got resolved (through mediation). I can’t go into details, but my lawyers tell me that I’m allowed to say how I feel about it. The answer is ‘lousy’. I don’t regret pursuing it as long as I did – but in the end I had to make a decision regarding the commerciality of letting it continue, and I’m going to look forward to the days when the kind of money I spent on my lawyers is small change. Mind you, if I had a similar situation with an employer, I’d do the same again, but that doesn’t really stop me feeling frustrated about it. The following day I had to fly to country Victoria to see my grandmother, who wasn’t expected to last the weekend. She’s still around a week later as I write this, but her 92-year-old body has basically given up on her. She’s been a Christian all her life, and is looking forward to eternity. We’ll all miss her though, and it’s hard to see my family grieving. Then on Tuesday, I was driving back to the airport with my family to come home, when something really bizarre happened. We were travelling down the freeway, just pulled out to go past a truck (farm-truck sized, not a semi-trailer), when a car-sized mass of metal fell off it. It was something like an industrial air-conditioner, but from where I was sitting, it was just a mass of spinning metal, like something out of a movie (one friend described it as “holidays by Michael Bay”). Somehow, and I’m really don’t know how, the part of it nearest us bounced high enough to clear the car, and there wasn’t even a scratch. We pulled over the check, and I was just thanking God that we’d changed lanes when we had, and that we remained unharmed. I had all kinds of thoughts about what could’ve happened if we’d had something that size land on the windscreen... All this has drilled home that while I feel that I haven’t provided as well for the family as I could’ve done (like by pursuing an expensive legal case), I shouldn’t even consider that I have proper control over things. I get to live life, and make decisions based on what I feel is right at the time. But I’m not going to get everything right, and there will be things that feel like disasters, some which could’ve been in my control and some which are very much beyond my control. The case feels like something I could’ve pursued differently, a disaster that could’ve been avoided in some way. Gran dying is lousy of course. An accident on the freeway would have been awful. I need to recognise that the worst disasters are ones that I can’t affect, and that I need to look at things in context – perhaps seeing everything that happens as a migration instead. Life is never the same from one day to the next. Every event has a before and an after – sometimes it’s clearly positive, sometimes it’s not. I remember good events in my life (such as my wedding), and bad (such as the loss of my father when I was ten, or the back injury I had eight years ago). I’m not suggesting that I know how to view everything from the “God works all things for good” perspective, but I am trying to look at last week as a migration of sorts. Those things are behind me now, and the future is in God’s hands. Hopefully I’ve learned things, and will be able to live accordingly. I’ve come through this time now, and even though I’ll miss Gran, I’ll see her again one day, and the future is bright.

    Read the article

  • Database Migration Scripts: Getting from place A to place B

    - by Phil Factor
    We’ll be looking at a typical database ‘migration’ script which uses an unusual technique to migrate existing ‘de-normalised’ data into a more correct form. So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into ‘de-normalisation’ habits. What’s this? A new column with a list of tags or ‘types’ assigned to books. Because books aren’t really in just one category, someone has ‘cured’ the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for history book really has to look in the entire list of associated tags rather than the ‘Type’ field that only keeps the primary tag. We have other problems. The TypleList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting there. The reporting system can’t be persuaded to do reports based on the tags and the Database developers are complaining about the unCoddly things going on in their database. In your version of PUBS, this extra column doesn’t exist, so we’ve added it and put in 10,000 titles using SQL Data Generator. /* So how do we refactor this database? firstly, we create a table of all the tags. */IF  OBJECT_ID('TagName') IS NULL OR OBJECT_ID('TagTitle') IS NULL  BEGIN  CREATE TABLE  TagName (TagName_ID INT IDENTITY(1,1) PRIMARY KEY ,     Tag VARCHAR(20) NOT NULL UNIQUE)  /* ...and we insert into it all the tags from the list (remembering to take out any leading spaces */  INSERT INTO TagName (Tag)     SELECT DISTINCT LTRIM(x.y.value('.', 'Varchar(80)')) AS [Tag]     FROM     (SELECT  Title_ID,          CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>')          AS XMLkeywords          FROM   dbo.titles)g    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )  /* we can then use this table to provide a table that relates tags to articles */  CREATE TABLE TagTitle   (TagTitle_ID INT IDENTITY(1, 1),   [title_id] [dbo].[tid] NOT NULL REFERENCES titles (Title_ID),   TagName_ID INT NOT NULL REFERENCES TagName (Tagname_ID)   CONSTRAINT [PK_TagTitle]       PRIMARY KEY CLUSTERED ([title_id] ASC, TagName_ID)       ON [PRIMARY])        CREATE NONCLUSTERED INDEX idxTagName_ID  ON  TagTitle (TagName_ID)  INCLUDE (TagTitle_ID,title_id)        /* ...and it is easy to fill this with the tags for each title ... */        INSERT INTO TagTitle (Title_ID, TagName_ID)    SELECT DISTINCT Title_ID, TagName_ID      FROM        (SELECT  Title_ID,          CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>')          AS XMLkeywords          FROM   dbo.titles)g    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )    INNER JOIN TagName ON TagName.Tag=LTRIM(x.y.value('.', 'Varchar(80)'))    END    /* That's all there was to it. Now we can select all titles that have the military tag, just to try things out */SELECT Title FROM titles  INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID  INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE tagname.tag='Military'/* and see the top ten most popular tags for titles */SELECT Tag, COUNT(*) FROM titles  INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID  INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  GROUP BY Tag ORDER BY COUNT(*) DESC/* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF(  (SELECT ','+tagname.tag FROM titles thisTitle    INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID    INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE ThisTitle.title_id=titles.title_ID  FOR XML PATH(''), TYPE).value('.', 'varchar(max)')  ,1,1,'')    FROM titles  ORDER BY title_ID So we’ve refactored our PUBS database without pain. We’ve even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship We’ve done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I’d do a whole lot of tests first before doing that. Yes, I’ve left out the assertion tests too, which should check the edge cases and make sure the result is what you’d expect. One thing I can’t quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary ‘type’ to be first in the list, but what if the entire order is significant? Thank goodness it isn’t in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence. You’ll see immediately that we can create a synchronisation script for deployment from a comparison tool such as SQL Compare, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of over-riding this part of the data synchronisation process to do this part of the process taking the part of the script that fills the tables, Checking that the tables have not already been filled, and executing it as part of the transaction. Of course, you might prefer the approach I’ve taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run. The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to rollback the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time. I’ve shown you how to do this with the part of the script .. /* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF(  (SELECT ','+tagname.tag FROM titles thisTitle    INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID    INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE ThisTitle.title_id=titles.title_ID  FOR XML PATH(''), TYPE).value('.', 'varchar(max)')  ,1,1,'')    FROM titles  ORDER BY title_ID …which would be turned into an UPDATE … FROM script. UPDATE titles SET  typelist= ThisTaglistFROM     (SELECT title_ID, title, STUFF(    (SELECT ','+tagname.tag FROM titles thisTitle      INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID      INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID    WHERE ThisTitle.title_id=titles.title_ID    ORDER BY CASE WHEN tagname.tag=titles.[type] THEN 1 ELSE 0  END DESC    FOR XML PATH(''), TYPE).value('.', 'varchar(max)')    ,1,1,'')  AS ThisTagList  FROM titles)fINNER JOIN Titles ON f.title_ID=Titles.title_ID You’ll notice that it isn’t quite a round trip because the tags are in a different order, though we’ve managed to make sure that the primary tag is the first one as originally. So, we’ve improved the database for the poor book distributors using PUBS. It is not a major deal but you’ve got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.

    Read the article

  • drop down menu will not display outside containing div in IE7..

    - by playahabana
    I am tearing my hair out over this, I have a dropdown menu using CSS and jQuery (thanks to Soh Tanaka) and it works perfectly in Firefox, Safari, Google chrome and I.E. 8, but in IE 7 it will not drop down outside the 'Banner div'. It drops below the nav div however. I have moved the nav div higher in the banner the result is the same, menu drops until it reaches the border of the banner div and then vanishes.... Below is the css. This is my first website and I have some limited understanding of what I am doing. The drop down menu includes transparent png's as links (I know, I know...but it's what the Boss wants...) please could someone take a quick scan at the below CSS and let me know what is wroong? Is this some form of the IE z-index bug? i have tried all different combinations of z-index and still I can't get a different result. . The html is below as well. Thankyou in advance #banner { position: relative; width: 62.5em; height: 12em; background-color: #46280A; background-image: url('images/includes/banner2.jpg'); background-repeat: no-repeat; background-position: center; -moz-box-shadow: -4px 6px 8px #000; -webkit-box-shadow: -4px 6px 8px #000; box-shadow: -4px 6px 8px #000; /* For IE 8 */ -ms-filter: "progid:DXImageTransform.Microsoft.Shadow(Strength=8, Direction=225, Color='#000000')"; /* For IE 5.5 - 7 */ filter: progid:DXImageTransform.Microsoft.Shadow(Strength=8, Direction=225, Color='#000000'); z-index: 1; } /*------------------------------------SCROLLER---------------------------------------------*/ #headlines{ position: absolute; top: 1.3em; right: 2.75em; overflow: hidden; height: 2.5em; width: 24em; background-color: #000000; display: block; z-index: 3; } #news{ position: relative; height: 3.1em; line-height: 2.5em; font-size: 0.8em color: #FFFF99; white-space: nowrap; overflow: hidden; font-family: Georgia,Arial; } #scrollerglass{ position: absolute; top: 0.95em; right: 2em; height: 52px; width: 410px; border: none; padding: 0.2em 0em 0em 0em; line-height: 0.7em; text-align: center; background-image: url('images/includes/scrollerglass.png'); background-color: transparent; background-repeat: no-repeat; background-position: center; opacity: 20; z-index: 10; } #scrollerglass a i { visibility: hiddn ; } /-------------------------------------NAVIGATION-----------------------------------------/ #nav { position: absolute; top: 5.8em; left: 0.2em; font-family: trebuchet, sans-serif; font-size: 1em; line-height: 3.75em; text-align: center; color: #FFFF00; z-index: 3; } ul.navlist { list-style: none; padding: 0em; margin: 1em; float: left; width: 62.5em; background: transparent; font-size: 1em; } ul.navlist li { position: relative; /*--Declare X and Y axis base for sub navigation--*/ float: left; margin: 0em 1.4em; padding: 0em 0.7em 0em 0em; z-index: 1; } ul.navlist li a{ display: block; text-decoration: none; float: left; border: 0px solid; } ul.navlist li img{ border: 0px solid; } ul.navlist li span { trigger styles--*/ width: 1.2em; height: 5.25em; float: left; background: url(images/links/downlogo.png) no-repeat center top; } ul.navlist li span.subhover { background-position: center bottom; cursor: pointer; } ul.navlist li ul.navdrop { list-style: none; position: absolute; float: left; top: 5.3em; left: -2.4em; height: 15.0em; width: 11.25em; margin: 0; padding: 0.5em 0em 0em 0em; display: none; background-position: center; background-image: url('images/includes/slider.jpg'); background-color: transparent; background-repeat: no-repeat; -moz-box-shadow: -4px 6px 8px #000; -webkit-box-shadow: -4px 6px 8px #000; box-shadow: -4px 6px 8px #000; /* For IE 8 */ -ms-filter: "progid:DXImageTransform.Microsoft.Shadow(Strength=8, Direction=225, Color='#000000')"; /* For IE 5.5 - 7 */ filter: progid:DXImageTransform.Microsoft.Shadow(Strength=8, Direction=225, Color='#000000'); z-index:1; } ul.navlist li ul.navdrop li{ margin: 0em 2.3em 0em 0em; padding: 1em 0em 0em 0em; width: 8em; clear: both; } html ul.navlist li ul.navdrop li a { border: 0px solid; width: 11.25em; } html ul.navlist li ul.navdrop li a:hover { background: transparent; } <div id="banner"> <div id="headlines"> <div id="news"> Whatever we want to promote </div> </div> <div id="scrollerglass"> <a href="vintagecigars.php"> <i>------s-c-r-o-l-l-e-r - - - -l-i-n-k-s--------<br /> <br>------s-c-r-o-l-l-e-r - - - -l-i-n-k-s------</i></a> </div> <div id="nav"> <ul class="navmenu"> <li><a href="index.php"><img src="images/links/home.png" alt="Home" ></a></li> <li><a href="ourbar.php"><img src="images/links/ourbar.png" alt="Our Bar" ></a> <ul class="navdrop"> <li ><a href="ourcocktails.php"><img src="images/links/cockteles.png" alt="Our Cocktails" ></a></li> <li ><a href="celebrate.php"><img src="images/links/celebrate.png" alt="Celebrate in Style" ></a></li> </ul> </li> <li><a href="ourcigars.php"><img src="images/links/ourcigars.png" alt="Our Cigars" ></a> <ul class="navdrop"> <li ><a href="edicioneslimitadas.php"><img src="images/links/edicioneslimitadas.png" alt="Edition Limitadas" ></a></li> <li ><a href="cigartasting.php"><img src="images/links/cigartasting.png" alt="Cigar Tastings" ></a></li> </ul> </li> <li><a href="personalroller.php"><img src="images/links/personalcigar.png" alt="Personal Cigar Roller" ></a></li> <li><a href="galleryentrance.php"><img src="images/links/photogallery.png" alt="Photo Gallery" ></a></li> <li><a href="contactus.php"><img src="images/links/contactus.png" alt="Contact Us" ></a></li> </ul></div></div><!--end banner-->

    Read the article

  • SQL Server Configuration timeouts - and a workaround [SSIS]

    - by jamiet
    Ever since I started writing SSIS packages back in 2004 I have opted to store configurations in .dtsConfig (.i.e. XML) files rather than in a SQL Server table (aka SQL Server Configurations) however recently I inherited some packages that used SQL Server Configurations and thus had to immerse myself in their murky little world. To all the people that have ever gone onto the SSIS forum and asked questions about ambiguous behaviour of SQL Server Configurations I now say this... I feel your pain! The biggest problem I have had was in dealing with the change to the order in which configurations get applied that came about in SSIS 2008. Those changes are detailed on MSDN at SSIS Package Configurations however the pertinent bits are: As the utility loads and runs the package, events occur in the following order: The dtexec utility loads the package. The utility applies the configurations that were specified in the package at design time and in the order that is specified in the package. (The one exception to this is the Parent Package Variables configurations. The utility applies these configurations only once and later in the process.) The utility then applies any options that you specified on the command line. The utility then reloads the configurations that were specified in the package at design time and in the order specified in the package. (Again, the exception to this rule is the Parent Package Variables configurations). The utility uses any command-line options that were specified to reload the configurations. Therefore, different values might be reloaded from a different location. The utility applies the Parent Package Variable configurations. The utility runs the package. To understand how these steps differ from SSIS 2005 I recommend reading Doug Laudenschlager’s blog post Understand how SSIS package configurations are applied. The very nature of SQL Server Configurations means that the Connection String for the database holding the configuration values needs to be supplied from the command-line. Typically then the call to execute your package resembles this: dtexec /FILE Package.dtsx /SET "\Package.Connections[SSISConfigurations].Properties[ConnectionString]";"\"Data Source=SomeServer;Initial Catalog=SomeDB;Integrated Security=SSPI;\"", The problem then is that, as per the steps above, the package will (1) attempt to apply all configurations using the Connection String stored in the package for the "SSISConfigurations" Connection Manager before then (2) applying the Connection String from the command-line and then (3) apply the same configurations all over again. In the packages that I inherited that first attempt to apply the configurations would timeout (not unexpected); I had 8 SQL Server Configurations in the package and thus the package was waiting for 2 minutes until all the Configurations timed out (i.e. 15seconds per Configuration) - in a package that only executes for ~8seconds when it gets to do its actual work a delay of 2minutes was simply unacceptable. We had three options in how to deal with this: Get rid of the use of SQL Server configurations and use .dtsConfig files instead Edit the packages when they get deployed Change the timeout on the "SSISConfigurations" Connection Manager #1 was my preferred choice but, for reasons I explain below*, wasn't an option in this particular instance. #2 was discounted out of hand because it negates the point of using Configurations in the first place. This left us with #3 - change the timeout on the Connection Manager. This is done by going into the properties of the Connection Manager, opening the "All" tab and changing the Connect Timeout property to some suitable value (in the screenshot below I chose 2 seconds). This change meant that the attempts to apply the SQL Server configurations timed out in 16 seconds rather than two minutes; clearly this isn't an optimum solution but its certainly better than it was. So there you have it - if you are having problems with SQL Server configuration timeouts within SSIS try changing the timeout of the Connection Manager. Better still - don't bother using SQL Server Configuration in the first place. Even better - install RC0 of SQL Server 2012 to start leveraging SSIS parameters and leave the nasty old world of configurations behind you. @Jamiet * Basically, we are leveraging a SSIS execution/logging framework in which the client had invested a lot of resources and SQL Server Configurations are an integral part of that.

    Read the article

  • Using jQuery to Insert a New Database Record

    - by Stephen Walther
    The goal of this blog entry is to explore the easiest way of inserting a new record into a database using jQuery and .NET. I’m going to explore two approaches: using Generic Handlers and using a WCF service (In a future blog entry I’ll take a look at OData and WCF Data Services). Create the ASP.NET Project I’ll start by creating a new empty ASP.NET application with Visual Studio 2010. Select the menu option File, New Project and select the ASP.NET Empty Web Application project template. Setup the Database and Data Model I’ll use my standard MoviesDB.mdf movies database. This database contains one table named Movies that looks like this: I’ll use the ADO.NET Entity Framework to represent my database data: Select the menu option Project, Add New Item and select the ADO.NET Entity Data Model project item. Name the data model MoviesDB.edmx and click the Add button. In the Choose Model Contents step, select Generate from database and click the Next button. In the Choose Your Data Connection step, leave all of the defaults and click the Next button. In the Choose Your Data Objects step, select the Movies table and click the Finish button. Unfortunately, Visual Studio 2010 cannot spell movie correctly :) You need to click on Movy and change the name of the class to Movie. In the Properties window, change the Entity Set Name to Movies. Using a Generic Handler In this section, we’ll use jQuery with an ASP.NET generic handler to insert a new record into the database. A generic handler is similar to an ASP.NET page, but it does not have any of the overhead. It consists of one method named ProcessRequest(). Select the menu option Project, Add New Item and select the Generic Handler project item. Name your new generic handler InsertMovie.ashx and click the Add button. Modify your handler so it looks like Listing 1: Listing 1 – InsertMovie.ashx using System.Web; namespace WebApplication1 { /// <summary> /// Inserts a new movie into the database /// </summary> public class InsertMovie : IHttpHandler { private MoviesDBEntities _dataContext = new MoviesDBEntities(); public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; // Extract form fields var title = context.Request["title"]; var director = context.Request["director"]; // Create movie to insert var movieToInsert = new Movie { Title = title, Director = director }; // Save new movie to DB _dataContext.AddToMovies(movieToInsert); _dataContext.SaveChanges(); // Return success context.Response.Write("success"); } public bool IsReusable { get { return true; } } } } In Listing 1, the ProcessRequest() method is used to retrieve a title and director from form parameters. Next, a new Movie is created with the form values. Finally, the new movie is saved to the database and the string “success” is returned. Using jQuery with the Generic Handler We can call the InsertMovie.ashx generic handler from jQuery by using the standard jQuery post() method. The following HTML page illustrates how you can retrieve form field values and post the values to the generic handler: Listing 2 – Default.htm <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Add Movie</title> <script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.js" type="text/javascript"></script> </head> <body> <form> <label>Title:</label> <input name="title" /> <br /> <label>Director:</label> <input name="director" /> </form> <button id="btnAdd">Add Movie</button> <script type="text/javascript"> $("#btnAdd").click(function () { $.post("InsertMovie.ashx", $("form").serialize(), insertCallback); }); function insertCallback(result) { if (result == "success") { alert("Movie added!"); } else { alert("Could not add movie!"); } } </script> </body> </html>     When you open the page in Listing 2 in a web browser, you get a simple HTML form: Notice that the page in Listing 2 includes the jQuery library. The jQuery library is included with the following SCRIPT tag: <script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.js" type="text/javascript"></script> The jQuery library is included on the Microsoft Ajax CDN so you can always easily include the jQuery library in your applications. You can learn more about the CDN at this website: http://www.asp.net/ajaxLibrary/cdn.ashx When you click the Add Movie button, the jQuery post() method is called to post the form data to the InsertMovie.ashx generic handler. Notice that the form values are serialized into a URL encoded string by calling the jQuery serialize() method. The serialize() method uses the name attribute of form fields and not the id attribute. Notes on this Approach This is a very low-level approach to interacting with .NET through jQuery – but it is simple and it works! And, you don’t need to use any JavaScript libraries in addition to the jQuery library to use this approach. The signature for the jQuery post() callback method looks like this: callback(data, textStatus, XmlHttpRequest) The second parameter, textStatus, returns the HTTP status code from the server. I tried returning different status codes from the generic handler with an eye towards implementing server validation by returning a status code such as 400 Bad Request when validation fails (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html ). I finally figured out that the callback is not invoked when the textStatus has any value other than “success”. Using a WCF Service As an alternative to posting to a generic handler, you can create a WCF service. You create a new WCF service by selecting the menu option Project, Add New Item and selecting the Ajax-enabled WCF Service project item. Name your WCF service InsertMovie.svc and click the Add button. Modify the WCF service so that it looks like Listing 3: Listing 3 – InsertMovie.svc using System.ServiceModel; using System.ServiceModel.Activation; namespace WebApplication1 { [ServiceBehavior(IncludeExceptionDetailInFaults=true)] [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class MovieService { private MoviesDBEntities _dataContext = new MoviesDBEntities(); [OperationContract] public bool Insert(string title, string director) { // Create movie to insert var movieToInsert = new Movie { Title = title, Director = director }; // Save new movie to DB _dataContext.AddToMovies(movieToInsert); _dataContext.SaveChanges(); // Return movie (with primary key) return true; } } }   The WCF service in Listing 3 uses the Entity Framework to insert a record into the Movies database table. The service always returns the value true. Notice that the service in Listing 3 includes the following attribute: [ServiceBehavior(IncludeExceptionDetailInFaults=true)] You need to include this attribute if you want to get detailed error information back to the client. When you are building an application, you should always include this attribute. When you are ready to release your application, you should remove this attribute for security reasons. Using jQuery with the WCF Service Calling a WCF service from jQuery requires a little more work than calling a generic handler from jQuery. Here are some good blog posts on some of the issues with using jQuery with WCF: http://encosia.com/2008/06/05/3-mistakes-to-avoid-when-using-jquery-with-aspnet-ajax/ http://encosia.com/2008/03/27/using-jquery-to-consume-aspnet-json-web-services/ http://weblogs.asp.net/scottgu/archive/2007/04/04/json-hijacking-and-how-asp-net-ajax-1-0-mitigates-these-attacks.aspx http://www.west-wind.com/Weblog/posts/896411.aspx http://www.west-wind.com/weblog/posts/324917.aspx http://professionalaspnet.com/archive/tags/WCF/default.aspx The primary requirement when calling WCF from jQuery is that the request use JSON: The request must include a content-type:application/json header. Any parameters included with the request must be JSON encoded. Unfortunately, jQuery does not include a method for serializing JSON (Although, oddly, jQuery does include a parseJSON() method for deserializing JSON). Therefore, we need to use an additional library to handle the JSON serialization. The page in Listing 4 illustrates how you can call a WCF service from jQuery. Listing 4 – Default2.aspx <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Add Movie</title> <script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.4.2.js" type="text/javascript"></script> <script src="Scripts/json2.js" type="text/javascript"></script> </head> <body> <form> <label>Title:</label> <input id="title" /> <br /> <label>Director:</label> <input id="director" /> </form> <button id="btnAdd">Add Movie</button> <script type="text/javascript"> $("#btnAdd").click(function () { // Convert the form into an object var data = { title: $("#title").val(), director: $("#director").val() }; // JSONify the data data = JSON.stringify(data); // Post it $.ajax({ type: "POST", contentType: "application/json; charset=utf-8", url: "MovieService.svc/Insert", data: data, dataType: "json", success: insertCallback }); }); function insertCallback(result) { // unwrap result result = result["d"]; if (result === true) { alert("Movie added!"); } else { alert("Could not add movie!"); } } </script> </body> </html> There are several things to notice about Listing 4. First, notice that the page includes both the jQuery library and Douglas Crockford’s JSON2 library: <script src="Scripts/json2.js" type="text/javascript"></script> You need to include the JSON2 library to serialize the form values into JSON. You can download the JSON2 library from the following location: http://www.json.org/js.html When you click the button to submit the form, the form data is converted into a JavaScript object: // Convert the form into an object var data = { title: $("#title").val(), director: $("#director").val() }; Next, the data is serialized into JSON using the JSON2 library: // JSONify the data var data = JSON.stringify(data); Finally, the form data is posted to the WCF service by calling the jQuery ajax() method: // Post it $.ajax({   type: "POST",   contentType: "application/json; charset=utf-8",   url: "MovieService.svc/Insert",   data: data,   dataType: "json",   success: insertCallback }); You can’t use the standard jQuery post() method because you must set the content-type of the request to be application/json. Otherwise, the WCF service will reject the request for security reasons. For details, see the Scott Guthrie blog post: http://weblogs.asp.net/scottgu/archive/2007/04/04/json-hijacking-and-how-asp-net-ajax-1-0-mitigates-these-attacks.aspx The insertCallback() method is called when the WCF service returns a response. This method looks like this: function insertCallback(result) {   // unwrap result   result = result["d"];   if (result === true) {       alert("Movie added!");   } else {     alert("Could not add movie!");   } } When we called the jQuery ajax() method, we set the dataType to JSON. That causes the jQuery ajax() method to deserialize the response from the WCF service from JSON into a JavaScript object automatically. The following value is passed to the insertCallback method: {"d":true} For security reasons, a WCF service always returns a response with a “d” wrapper. The following line of code removes the “d” wrapper: // unwrap result result = result["d"]; To learn more about the “d” wrapper, I recommend that you read the following blog posts: http://encosia.com/2009/02/10/a-breaking-change-between-versions-of-aspnet-ajax/ http://encosia.com/2009/06/29/never-worry-about-asp-net-ajaxs-d-again/ Summary In this blog entry, I explored two methods of inserting a database record using jQuery and .NET. First, we created a generic handler and called the handler from jQuery. This is a very low-level approach. However, it is a simple approach that works. Next, we looked at how you can call a WCF service using jQuery. This approach required a little more work because you need to serialize objects into JSON. We used the JSON2 library to perform the serialization. In the next blog post, I want to explore how you can use jQuery with OData and WCF Data Services.

    Read the article

  • Guide to reduce TFS database growth using the Test Attachment Cleaner

    - by terje
    Recently there has been several reports on TFS databases growing too fast and growing too big.  Notable this has been observed when one has started to use more features of the Testing system.  Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process.  Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it.  She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them.   The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run.  Some of this data can grow rather large, like IntelliTrace logs and video recordings.   Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data.   In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean   Analyze the data Are your database( s) growing ?  Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some  more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them.  But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed.  I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order.  use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work   Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent.   Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case:   use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones.  Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this:   Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later.    Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized.  To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later).  This CU fixes a potential problem of hanging “ghost” files.  This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around:  If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process.  Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again.  (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1.  The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data.  The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool.  This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile.  The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project.  When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC   1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job.  We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below.  In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about  the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items.  Grant’s settings file is an example of the last one.  A settings file to remove only large attachments could look like this: <!-- Scenario : Remove large files --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section.  Without that section, all extensions will be deleted. <!-- Scenario : Remove large files of type dll's and pdb's --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches.  It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs.  Doing this every night ensures that only small amounts of data is deleted. <!-- Scenario : Remove old items except if they have active or resolved bugs --> <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done.  It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about.  Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data. <!-- Scenario : Remove specific files early --> <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete   Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted.  In this case you SHOULD do it, to free up all that space.  But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records.  In fact, it is regarded as a bad practice to shrink the database regularly.  So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards.  I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

    Read the article

  • Raspberry Pi entrance signed backed by Umbraco - Part 1

    - by Chris Houston
    Being experts on all things Umbraco, we jumped at the chance to help our client, QV Offices, with their pressing signage predicament. They needed to display a sign in the entrance to their building and approached us for our advice. Of course it had to be electronic: displaying multiple names of their serviced office clients, meeting room bookings and on-the-pulse promotions. But with a winding Victorian staircase and minimal storage space how could the monitor be run, updated and managed? That’s where we came in…Raspberry PiUmbraco CMSAutomatic updatesAutomated monitor of the signPower saving when the screen is not in useMounting the screenThe screen that has been used is a standard LED low energy Full HD screen and has been mounted on the wall using it's VESA mounting points, as the wall is a stud wall we were able to add an access panel behind the screen to feed through the mains, HDMI and sensor cables.The Raspberry Pi is then tucked away out of sight in the main electrical cupboard which just happens to be next to the sign, we had an electrician add a power point inside this cupboard to allow us to power the screen and the Raspberry Pi.Designing the interface and editing the contentAlthough a room sign was the initial requirement from QV Offices, their medium term goal has always been to add online meeting booking to their website and hence we suggested adding information about the current and next day's meetings to the sign that would be pulled directly from their online booking system.We produced the design and built the web page to fit exactly on a 1920 x 1080 screen (Full HD in Portrait)As you would expect all the information can be edited via an Umbraco CMS, they are able to add floors, rooms, clients and virtual clients as well as add meeting bookings to their meeting diary.How we configured the Raspberry PiAfter receiving a new Raspberry Pi we downloaded the latest release of Raspbian operating system and followed the official guide which shows how to copy the OS onto an SD card from a Mac, we then followed the majority of steps on this useful guide: 10 Things to Do After Buying a Raspberry Pi.Installing ChromiumWe chose to use the Chromium web browser which for those who do not know is the open sourced version of Google Chrome. You can install this from the terminal with the following command:sudo apt-get install chromium-browserInstalling UnclutterWe found this little application which automatically hides the mouse pointer, it is used in the script below and is installed using the following command:sudo apt-get install unclutterAuto start Chromium and disabling the screen saver, power saving and mouseWhen the Raspberry Pi has been installed it will not have a keyboard or mouse and hence if their was a power cut we needed it to always boot and re-loaded Chromium with the correct URL.Our preferred command line text editor is Nano and I have assumed you know how to use this editor or will be able to work it out pretty quickly.So using the following command:sudo nano /etc/xdg/lxsession/LXDE/autostartWe then changed the autostart file content to:@lxpanel --profile LXDE@pcmanfm --desktop --profile LXDE@xscreensaver -no-splash@xset s off@xset -dpms@xset s noblank@chromium --kiosk --incognito http://www.qvoffices.com/someURL@unclutter -idle 0The first few commands turn off the screen saver and power saving, we then open Cromium in Kiosk Mode (full screen with no menu etc) and pass in the URL to use (I have changed the URL in this example) We found a useful blog post with the Cromium command line switches.Finally we also open an application called Unclutter which auto hides the mouse after 0 seconds, so you will never see a mouse on the sign.We also had to edit the following file:sudo nano /etc/lightdm/lightdm.confAnd added the following line under the [SeatDefault] section:xserver-command=X -s 0 dpmsRefreshing the screenWe decided to try and add a scheduled task that would trigger Chromium to reload the page, at some point in the future we might well change this to using Javascript to update the content, but for now this works fine.First we installed the XDOTool which enables you to script Keyboard commands:sudo apt-get install xdotoolWe used the Refreshing Chromium Browser by Shell Script post as a reference and created the following shell script (which we called refreshing.sh):export DISPLAY=":0"WID=$(xdotool search --onlyvisible --class chromium|head -1)xdotool windowactivate ${WID}xdotool key ctrl+F5This selects the correct display and then sends a CTRL + F5 to refresh Chromium.You will need to give this file execute permissions:chmod a=rwx refreshing.shNow we have the script file setup we just need to schedule it to call this script periodically which is done by using Crontab, to edit this you use the following command:crontab -eAnd we added the following:*/5 * * * * DISPLAY=":.0" /home/pi/scripts/refreshing.sh >/home/pi/cronlog.log 2>&1This calls our script every 5 minutes to refresh the display and it logs any errors to the cronlog.log file.SummaryQV Offices now have a richer and more manageable booking system than they did before we started, and a great new sign to boot.How could we make sure that the sign was running smoothly downstairs in a busy office centre? A second post will follow outlining exactly how Vizioz enabled QV Offices to monitor their sign simply and remotely, from the comfort of their desks.

    Read the article

  • Creating PDF documents dynamically using Umbraco and XSL-FO part 2

    - by Vizioz Limited
    Since my last post I have made a few modifications to the PDF generation, the main one being that the files are now dynamically renamed so that they reflect the name of the case study instead of all being called PDF.PDF which was not a very helpful filename, I just wanted to get something live last week, so decided that something was better than nothing :)The issue with the filenames comes down to the way that the PDF's are being generated by using an alternative template in Umbraco, this means that all you need to do is add " /pdf " to the end of a case study URL and it will create a PDF version of the case study. The down side is that your browser will merrily download the file and save it as PDF.PDF because that is the name of the last part of the URL.What you need to do is set the content-disposition header to be equal to the name you would like the file use, Darren Ferguson mentioned this on the Change the name of the PDF forum post.We have used the same technique for downloading dynamically generated excel files, so I thought it would be useful to create a small macro to set both this header and also to set the caching headers to prevent any caching issues, I think in the past we have experienced all possible issues, including various issues where IE behaves differently to other browsers when you are using SSL and so the below code should work in all situations!The template for the PDF alternative template is very simple:<%@ Master Language="C#" MasterPageFile="~/umbraco/masterpages/default.master" AutoEventWireup="true" %><asp:Content ID="Content1" ContentPlaceHolderID="ContentPlaceHolderDefault" runat="server"> <umbraco:Macro Alias="PDFHeaders" runat="server"></umbraco:Macro> <umbraco:Macro xsl="FO-CaseStudy.xslt" Alias="PDFXSLFO" runat="server"></umbraco:Macro></asp:Content>The following code snippet is the XSLT macro that simply creates our file name and then passes the file name into the helper function:<xsl:template match="/"> <xsl:variable name="fileName"> <xsl:text>Vizioz_</xsl:text> <xsl:value-of select="$currentPage/@nodeName" /> <xsl:text>_case_study.pdf</xsl:text> </xsl:variable> <xsl:value-of select="Vizioz.Helper:AddDocumentDownloadHeaders('application/pdf', $fileName)"/> </xsl:template>And the following code is the helper function that clears the current response and adds all the appropriate headers:public static void AddDocumentDownloadHeaders(string contentType, string fileName){ HttpResponse response = HttpContext.Current.Response; HttpRequest request = HttpContext.Current.Request; response.Clear(); response.ClearHeaders(); if (request.IsSecureConnection & request.Browser.Browser == "IE") { // Don't use the caching headers if the browser is IE and it's a secure connection // see: http://support.microsoft.com/kb/323308 } else { // force not using the cache response.AppendHeader("Cache-Control", "no-cache"); response.AppendHeader("Cache-Control", "private"); response.AppendHeader("Cache-Control", "no-store"); response.AppendHeader("Cache-Control", "must-revalidate"); response.AppendHeader("Cache-Control", "max-stale=0"); response.AppendHeader("Cache-Control", "post-check=0"); response.AppendHeader("Cache-Control", "pre-check=0"); response.AppendHeader("Pragma", "no-cache"); response.Cache.SetCacheability(HttpCacheability.NoCache); response.Cache.SetNoStore(); response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1)); } response.AppendHeader("Expires", DateTime.Now.AddMinutes(-1).ToLongDateString()); response.AppendHeader("Keep-Alive", "timeout=3, max=993"); response.AddHeader("content-disposition", "attachment; filename=\"" + fileName + "\""); response.ContentType = contentType;}I will write another blog soon with some more details about XSL-FO and how to create the PDF's dynamically.Please do re-tweet if you find this interest :)

    Read the article

  • Want a headless build server for SSDT without installing Visual Studio? You’re out of luck!

    - by jamiet
    An issue that regularly seems to rear its head on my travels is that of headless build servers for SSDT. What does that mean exactly? Let me give you my interpretation of it. A SQL Server Data Tools (SSDT) project incorporates a build process that will basically parse all of the files within the project and spit out a .dacpac file. Where an organisation employs a Continuous Integration process they will likely want to automate the building of that dacpac whenever someone commits a change to the source control repository. In order to do that the organisation will use a build server (e.g. TFS, TeamCity, Jenkins) and hence that build server requires all the pre-requisite software that understands how to build an SSDT project. The simplest way to install all of those pre-requisites is to install SSDT itself however a lot of folks don’t like that approach because it installs a lot unnecessary components on there, not least Visual Studio itself. Those folks (of which i am one) are of the opinion that it should be unnecessary to install a heavyweight GUI in order to simply get a few software components required to do something that inherently doesn’t even need a GUI. The phrase “headless build server” is often used to describe a build server that doesn’t contain any heavyweight GUI tools such as Visual Studio and is a desirable state for a build server. In his blog post Headless MSBuild Support for SSDT (*.sqlproj) Projects Gert Drapers outlines the steps necessary to obtain a headless build server for SSDT: This article describes how to install the required components to build and publish SQL Server Data Tools projects (*.sqlproj) using MSBuild without installing the full SQL Server Data Tool hosted inside the Visual Studio IDE. http://sqlproj.com/index.php/2012/03/headless-msbuild-support-for-ssdt-sqlproj-projects/ Frankly however going through these steps is a royal PITA and folks like myself have longed for Microsoft to support headless build support for SSDT by providing a distributable installer that installs only the pre-requisites for building SSDT projects. Yesterday in MSDN forum thread Building a VS2013 headless build server - it's sooo hard Mike Hingley complained about this very thing and it prompted a response from Kevin Cunnane from the SSDT product team: The official recommendation from the TFS / Visual Studio team is to install the version of Visual Studio you use on the build machine. I, like many others, would rather not have to install full blown Visual Studio and so I asked: Is there any chance you'll ever support any of these scenarios: Installation of all build/deploy pre-requisites without installing the VS shell? TFS shipping with all of the pre-requisites for doing SSDT project build/deploys 3rd party build servers (e.g. TeamCity) shipping with all of the requisites for doing SSDT project build/deploys I have to say that the lack of a single installer containing all the pre-requisites for SSDT build/deploy puzzles me. Surely the DacFX installer would be a perfect vehicle for that? Kevin replied again: The answer is no for all 3 scenarios. We looked into this issue, discussed it with the Visual Studio / TFS team, and in the end agreed to go with their latest guidance which is to install Visual Studio (e.g. VS2013 Express for Web) on the build machine. This is how Visual Studio Online is doing it and it's the approach recommended for customers setting up their own TFS build servers. I would hope this is compatible with 3rd party build servers but have not verified whether this works with TeamCity etc. Note that DacFx MSI isn't a suitable release vehicle for this as we don't want to include Visual Studio/MSBuild dependencies in that package. It's meant to just include the core DacFx DLLs used by SSMS, SqlPackage.exe on the command line, etc. What this means is we won't be providing a separate MSI installer or nuget package with just the necessary build DLLs you need to run your build and tests. If someone wanted to create a script that generated a nuget package based on our DLLs and targets files, then release that somewhere on the web for easier integration with 3rd party build servers we've no problem with that. Again, here’s the link to the thread and its worth reading in its entirety if this is something that interests you. So there you have it. Microsoft will not be be providing support for headless build servers for SSDT but if someone in the community wants to go ahead and roll their own, go right ahead. @Jamiet

    Read the article

  • Inequality joins, Asynchronous transformations and Lookups : SSIS

    - by jamiet
    It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equivalent to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate] but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to lookup [Customer].[CustomerId]. The logic will be: Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate] The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long. View on Vimeo  For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took 36 seconds whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip Comments are welcome as always. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • July, the 31 Days of SQL Server DMO’s – Day 22 (sys.dm_db_index_physical_stats)

    - by Tamarick Hill
    The sys.dm_db_index_physical_stats Dynamic Management Function is used to return information about the fragmentation levels, page counts, depth, number of levels, record counts, etc. about the indexes on your database instance. One row is returned for each level in a given index, which we will discuss more later. The function takes a total of 5 input parameters which are (1) database_id, (2) object_id, (3) index_id, (4) partition_number, and (5) the mode of the scan level that you would like to run. Let’s use this function with our AdventureWorks2012 database to better illustrate the information it provides. SELECT * FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks2012'), NULL, NULL, NULL, NULL) As you can see from the result set, there is a lot of beneficial information returned from this DMF. The first couple of columns in the result set (database_id, object_id, index_id, partition_number, index_type_desc, alloc_unit_type_desc) are either self-explanatory or have been explained in our previous blog sessions so I will not go into detail about these at this time. The next column in the result set is the index_depth which represents how deep the index goes. For example, If we have a large index that contains 1 root page, 3 intermediate levels, and 1 leaf level, our index depth would be 5. The next column is the index_level which refers to what level (of the depth) a particular row is referring to. Next is probably one of the most beneficial columns in this result set, which is the avg_fragmentation_in_percent. This column shows you how fragmented a particular level of an index may be. Many people use this column within their index maintenance jobs to dynamically determine whether they should do REORG’s or full REBUILD’s of a given index. The fragment count represents the number of fragments in a leaf level while the avg_fragment_size_in_pages represents the number of pages in a fragment. The page_count column tells you how many pages are in a particular index level. From my result set above, you see the the remaining columns all have NULL values. This is because I did not specify a ‘mode’ in my query and as a result it used the ‘LIMITED’ mode by default. The LIMITED mode is meant to be lightweight so it does collect information for every column in the result set. I will re-run my query again using the ‘DETAILED’ mode and you will see we now have results for these rows. SELECT * FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks2012'), NULL, NULL, NULL, ‘DETAILED’)   From the remaining columns, you see we get even more detailed information such as how many records are in a particular index level (record_count). We have a column for ghost_record_count which represents the number of records that have been marked for deletion, but have not physically been removed by the background ghost cleanup process. We later see information on the MIN, MAX, and AVG record size in bytes. The forwarded_record_count column refers to records that have been updated and now no longer fit within the row on the page anymore and thus have to be moved. A forwarded record is left in the original location with a pointer to the new location. The last column in the result set is the compressed_page_count column which tells you how many pages in your index have been compressed. This is a very powerful DMF that returns good information about the current indexes in your system. However, based on the mode you select, it could be a very resource intensive function so be careful with how you use it. For more information on this Dynamic Management Function, please see the below Books Online link: http://msdn.microsoft.com/en-us/library/ms188917.aspx Follow me on Twitter @PrimeTimeDBA

    Read the article

  • Archiving SQLHelp tweets

    - by jamiet
    #SQLHelp is a Twitter hashtag that can be used by any Twitter user to get help from the SQL Server community. I think its fair to say that in its first year of being it has proved to be a very useful resource however Kendra Little (@kendra_little) made a very salient point yesterday when she tweeted: Is there a way to search the archives of #sqlhelp Trying to remember answer to a question I know I saw a couple months ago http://twitter.com/#!/Kendra_Little/status/15538234184441856 This highlights an inherent problem with Twitter’s search capability – it simply does not reach far enough back in time. I have made steps to remedy that situation by putting into place two initiatives to archive Tweets that contain the #sqlhelp hashtag. The Archivist http://archivist.visitmix.com/ is a free service that, quite simply, archives a history of tweets that contain a given search term by periodically polling Twitter’s search service with that search term and subsequently displaying a dashboard providing an aggregate view of those tweets for things like tweet volume over time, top users and top words (Archivist FAQ). I have set up an archive on The Archivist for “sqlhelp” which you can view at http://archivist.visitmix.com/jamiet/7. Here is a screenshot of the SQLHelp dashboard 36 minutes after I set it up: There is lots of good information in there, including the fact that Jonathan Kehayias (@SQLSarg) is the most active SQLHelp tweeter (I suspect as an answerer rather than a questioner ) and that SSIS has proven to be a rather (ahem) popular subject!! Datasift The Archivist has its uses though for our purposes it has a couple of downsides. For starters you cannot search through an archive (which is what Kendra was after) and nor can you export the contents of the archive for offline analysis. For those functions we need something a bit more heavyweight and for that I present to you Datasift. Datasift is a tool (currently an alpha release) that allows you to search for tweets and provide them through an object called a Datasift stream. That sounds very similar to normal Twitter search though it has one distinct advantage that other Twitter search tools do not – Datasift has access to Twitter’s Streaming API (aka the Twitter Firehose). In addition it has access to a lot of other rather nice features: It provides the Datasift API that allows you to consume the output of a Datasift stream in your tool of choice (bring on my favourite ultimate mashup tool J ) It has a query language (called Filtered Stream Definition Language – FSDL for short) A Datasift stream can consume (and filter) other Datasift streams Datasift can (and does) consume services other than Twitter If I refer to Datasift as “ETL for tweets” then you may get some sort of idea what it is all about. Just as I did with The Archivist I have set up a publicly available Datasift stream for “sqlhelp” at http://datasift.net/stream/1581/sqlhelp. Here is the FSDL query that provides the data: twitter.text contains "sqlhelp" Pretty simple eh? At the current time it provides little more than a rudimentary dashboard but as Datasift is currently an alpha release I think this may be worth keeping an eye on. The real value though is the ability to consume the output of a stream via Datasift’s RESTful API, observe: http://api.datasift.net/stream.xml?stream_identifier=c7015255f07e982afdeebdf1ae6e3c0d&username=jamiet&api_key=XXXXXXX (Note that an api_key is required during the alpha period so, given that I’m not supplying my api_key, this URI will not work for you) Just to prove that a Datasift stream can indeed consume data from another stream I have set up a second stream that further filters the first one for tweets containing “SSIS”. That one is at http://datasift.net/stream/1586/ssis-sqlhelp and here is the FSDL query: rule "414c9845685ff8d2548999cf3162e897" and (interaction.content contains "ssis") When Datasift moves beyond alpha I’ll re-assess how useful this is going to be and post a follow-up blog. @Jamiet

    Read the article

  • Some thoughts on email hosting for one’s own domain

    - by jamiet
    I have used the same email providers for my own domains for a few years now however I am considering moving over to a new provider. In this email I’ll share my current thoughts and hopefully I’ll get some feedback that might help me to decide on what to do next. What I use today I have three email addresses that I use primarily (I have changed the domains in this blog post as I don’t want to give them away to spammers): [email protected] – My personal account that I give out to family and friends and which I use to register on websites [email protected]  - An account that I use to catch email from the numerous mailing lists that I am on [email protected] – I am a self-employed consultant so this is an account that I hand out to my clients, my accountant, and other work-related organisations Those two domains (jtpersonaldomain.com & jtworkdomain.com) are both managed at http://domains.live.com which is a fantastic service provided by Microsoft that for some perplexing reason they never bother telling anyone about. It offers multiple accounts (I have seven at jtpersonaldomain.com though as already stated I only use two of them) which are accessed via Outlook.com (formerly Hotmail.com) along with usage reporting plus a few other odds and sods that I never use. Best of all though, its totally free. In addition, given that I have got both domains hosted using http://domains.live.com I can link my various accounts together and switch between them at Outlook.com without having to login and logout: N.B. You’ll notice that there are two other accounts listed there in addition to the three I already mentioned. One is my mum’s account which helps me provide IT support/spam filtering services to her and the other is the donation account for AdventureWorks on Azure. I find that linking feature to be very handy indeed. Finally, http://domains.live.com is the epitome of “it just works”. I set up jtworkdomain.com at http://domains.live.com over three years ago and I am pretty certain I haven’t been back there even once to administer it. Proposed changes OK, so if I like http://domains.live.com so much why am I considering changing? Well, I earn my corn in the Microsoft ecosystem and if I’m reading the tea-leaves correctly its looking increasingly likely that the services that I’m going to have to be familiar with in the future are all going to be running on top of and alongside Windows Azure Active Directory and Office 365 respectively. Its clear to me that Microsoft’s are pushing their customers toward cloud services and, like it or lump it, data integration developers like me may have to come along for the ride. I don’t think the day is too far off when we can log into Windows Azure SQL Database (aka SQL Azure), Team Foundation Service, Dynamics etc… using the same credentials that are currently used for Office 365 and over time I would expect those things to get integrated together a lot better – that integration will be based upon a Windows Azure Active Directory identity. This should not come as a surprise, in my opinion Microsoft’s whole enterprise play over the past 15 or 20 years can be neatly surmised as “get people onto Windows Server and Active Directory then upsell from there” – in the not-too-distant-future the only difference is that they’re trying to do it in the cloud. I want to get familiar with these services and hence I am considering moving jtworkdomain.com onto Office 365. I’ll lose the convenience of easily being able to switch to that account at Outlook.com and moreover I’ll have to start paying for it (I think it’ll be about fifty quid a year – not a massive amount but its quite a bit more than free) but increasingly this is beginning to look like a move I have to make. So that’s where my head is at right now. Anyone have any relevant thoughts or experiences to share? Please let me know in the comments below. @Jamiet

    Read the article

  • Bitmask data insertions in SSDT Post-Deployment scripts

    - by jamiet
    On my current project we are using SQL Server Data Tools (SSDT) to manage our database schema and one of the tasks we need to do often is insert data into that schema once deployed; the typical method employed to do this is to leverage Post-Deployment scripts and that is exactly what we are doing. Our requirement is a little different though, our data is split up into various buckets that we need to selectively deploy on a case-by-case basis. I was going to use a SQLCMD variable for each bucket (defaulted to some value other than “Yes”) to define whether it should be deployed or not so we could use something like this in our Post-Deployment script: IF ($(DeployBucket1Flag) = 'Yes')BEGIN   :r .\Bucket1.data.sqlENDIF ($(DeployBucket2Flag) = 'Yes')BEGIN   :r .\Bucket2.data.sqlENDIF ($(DeployBucket3Flag) = 'Yes')BEGIN   :r .\Bucket3.data.sqlEND That works fine and is, I’m sure, a very common technique for doing this. It is however slightly ugly because we have to litter our deployment with various SQLCMD variables. My colleague James Rowland-Jones (whom I’m sure many of you know) suggested another technique – bitmasks. I won’t go into detail about how this works (James has already done that at Using a Bitmask - a practical example) but I’ll summarise by saying that you can deploy different combinations of the buckets simply by supplying a different numerical value for a single SQLCMD variable. Each bit of that value’s binary representation signifies whether a particular bucket should be deployed or not. This is better demonstrated using the following simple script (which can be easily leveraged inside your Post-Deployment scripts): /* $(DeployData) is a SQLCMD variable that would, if you were using this in SSDT, be declared in the SQLCMD variables section of your project file. It should contain a numerical value, defaulted to 0. In this example I have declared it using a :setvar statement. Test the affect of different values by changing the :setvar statement accordingly. Examples: :setvar DeployData 1 will deploy bucket 1 :setvar DeployData 2 will deploy bucket 2 :setvar DeployData 3   will deploy buckets 1 & 2 :setvar DeployData 6   will deploy buckets 2 & 3 :setvar DeployData 31  will deploy buckets 1, 2, 3, 4 & 5 */ :setvar DeployData 0 DECLARE  @bitmask VARBINARY(MAX) = CONVERT(VARBINARY,$(DeployData)); IF (@bitmask & 1 = 1) BEGIN     PRINT 'Bucket 1 insertions'; END IF (@bitmask & 2 = 2) BEGIN     PRINT 'Bucket 2 insertions'; END IF (@bitmask & 4 = 4) BEGIN     PRINT 'Bucket 3 insertions'; END IF (@bitmask & 8 = 8) BEGIN     PRINT 'Bucket 4 insertions'; END IF (@bitmask & 16 = 16) BEGIN     PRINT 'Bucket 5 insertions'; END An example of running this using DeployData=6 The binary representation of 6 is 110. The second and third significant bits of that binary number are set to 1 and hence buckets 2 and 3 are “activated”. Hope that makes sense and is useful to some of you! @Jamiet P.S. I used the awesome HTML Copy feature of Visual Studio’s Productivity Power Tools in order to format the T-SQL code above for this blog post.

    Read the article

< Previous Page | 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064  | Next Page >