Search Results

Search found 114 results on 5 pages for 'jamiet'.

Page 5/5 | < Previous Page | 1 2 3 4 5

Subscribable World Cup 2010 Calendar

- by jamiet

I bang on quite a lot on this blog about ways in which data can get published over the web and one of the most interesting ways, in my opinion, of publishing data in a structured manner that is well understood is to use the iCalendar specification. There isn’t much information in the world that doesn’t have some concept of “when” so iCalendar is a great way of distributing that information. You have probably used iCalendar at some point without even knowing about it. All files with a .ics suffix are iCalendar format files and that is why you can happily import them into Outlook, Hotmail Calendar, Google Calendar etc… where they can be parsed and have the semantic data (when, where and who) extracted from them. Importing of iCalendar format data is really only half the trick though; in my opinion the real value of iCalendar-formatted calendar is the ability to subscribe to them. Subscribing has a simple benefit over importing but that single benefit is of massive importance: a subscriber to an iCalendar calendar can periodically check to see if any updates have been made and, if they have, automatically update the local copy. The real benefit to the user is the productivity gain – a single update to an iCalendar means that all subscribers are automatically made aware of the change and there is zero effort on the part of the subscriber; as my former colleague Howard van Rooijen is fond of saying, “work smarter not harder” – nowhere is this edict more ably demonstrated than subscribing versus importing of calendars. If you want to read some more thoughts about iCalendar then go and read my past blog post Calendar syndication - My big hope for 2009's breakthrough technology or better still go and seek out Jon Udell who speaks very authoritatively on the issue of iCalendar. With this subject of iCalendar on my mind I was interested to discover (via Steve Clayton’s blog post Download the world cup fixtures) that the BBC had made a .ics file available containing all of the matches in the upcoming World Cup. As you can probably guess this was a file that was made available so that it could be imported into your calendar of choice. It had one obvious downside though, right now nobody knows who is going to be playing in the knock-out stages so the calendar looks like this: with no teams being named after 25th June. How much more useful would this calendar have been if the BBC had made it possible to subscribe to the calendar instead, thus the calendar could be updated with the teams for the knock out stages when they are known and every subscriber would have a permanently up-to-date record of all the fixtures in their calendar. Better still, the calendar could be updated with match results as well or perhaps even post a match report from the BBC sport pages; when calendars are made subscribable a sea of opportunity opens up for distribution of information. So with that in mind I have decided to go one better than the BBC. I have imported their .ics into a brand new Hotmail calendar and made it publicly available at the following URLs: HTML http://cid-dc1ed121af0476be.calendar.live.com/calendar/World+Cup+2010/index.html iCalendar webcal://cid-dc1ed121af0476be.calendar.live.com/calendar/World+Cup+2010/calendar.ics The link you’re really interested in is the second one - click on that and it should open up in your calendar software of choice. Or, if you want to view it in an online calendar such as Hotmail Calendar or Google Calendar, copy and paste that URL into the appropriate place. Some people have told me they’re having trouble with the iCalendar link in which case hit the HTML link and then click “View ICS” at the resultant web page: I shall endeavour to keep the calendar updated throughout the World Cup and even if I don’t you’re no worse off than if you had imported the BBC’s .ics file so why not give it a try? If I do keep it up to date then you will have a permanent record of the 2010 World Cup available in your calendar. Forever. If you have your calendar synced to your smartphone then you’ll be carrying match reports around with you without you having to do a single thing. Surely that’s worth a quick click isn’t it? If you have any thoughts let me have them in the comments below. Thanks for reading. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Introducing sp_ssiscatalog (v1.0.0.0)

- by jamiet

Regular readers of my blog may know that over the last year I have made available a suite of SQL Server Reporting Services (SSRS) reports that provide visualisations of the data in the SQL Server Integration Services (SSIS) 2012 Catalog. Those reports are available at http://ssisreportingpack.codeplex.com. As I have built these reports and used them myself on a real life project a couple of things have dawned on me: As soon as your SSIS Catalog gets a significant amount of data in it the performance of the reports degrades rapidly. This is hampered by the fact that there are limitations as to the SQL statements that I can embed within a SSRS report. SSIS professionals are data guys at heart and those types of people feel more comfortable in a query environment rather than having to go through the rigmarole of standing up a reporting server (well, I know I do anyway) Hence I have decided to take a different tack with the reporting pack. Taking my lead from Adam Machanic’s sp_whoisactive and Brent Ozar’s sp_blitz I have produced sp_ssiscatalog, a stored procedure that makes it easy to get at the crucial data in the SSIS Catalog. I will spend the rest of this blog explaining exactly what sp_ssiscatalog does and how to use it but if you would rather just download the bits yourself and start to play you can download v1.0.0.0 from DB v1.0.0.0. Usage Scenarios Most Recent Execution I find that the most frequent information that one needs to get from the SSIS Catalog is information pertaining to the most recent execution. Hence if you execute sp_ssiscatalog with no parameters, that is exactly what you will get. EXEC [dbo].[sp_ssiscatalog] This will return up to 5 resultsets: EXECUTION - Summary information about the execution including status, start time & end time EVENTS - All events that occurred during the execution OnError,OnTaskFailed - All events where event_name is either OnError or OnTaskFailed OnWarning - All events where event_name is OnWarning EXECUTABLE_STATS - Duration and execution result of every executable in the execution All 5 resultsets will be displayed if there is any data satisfying that resultset. In other words, if there are no (for example) OnWarning events then the OnWarning resultset will not be displayed. The display of these 5 resultsets can be toggled respectively by these 5 optional parameters (all of which are of type BIT): @exec_execution @exec_events @exec_errors @exec_warnings @exec_executable_stats Any Execution As just explained the default behaviour is to supply data for the most recent execution. If you wish to specify which execution the data should return data for simply supply the execution_id as a parameter: EXEC [dbo].[sp_ssiscatalog] 6 All Executions sp_ssiscatalog can also return information about all executions: EXEC [dbo].[sp_ssiscatalog] @operation_type='execs' The most recent execution will appear at the top. sp_ssiscatalog provides a number of parameters that enable you to filter the resultset: @execs_folder_name @execs_project_name @execs_package_name @execs_executed_as_name @execs_status_desc Some typical usages might be: //Return all failed executions EXEC [dbo].[sp_ssiscatalog] @operation_type='execs',@execs_status_desc='failed' //Return all executions for a specified folder EXEC [dbo].[sp_ssiscatalog] @operation_type='execs',@execs_folder_name='My folder' //Return all executions of a specified package in a specified project EXEC [dbo].[sp_ssiscatalog] @operation_type='execs',@execs_project_name='My project', @execs_package_name='Pkg.dtsx' Installing sp_ssicatalog Under the covers sp_ssiscatalog actually calls many other stored procedures and functions hence creating it on your server is not simply a case of running a CREATE PROCEDURE script. I maintain the code in an SQL Server Data Tools (SSDT) database project which means that you have two ways of obtaining it. Download the source code You can download the latest (at the time of writing) source code from http://ssisreportingpack.codeplex.com/SourceControl/changeset/view/70192. Hit the download button to download all the source code in a zip file. The contents of that zip file will include an SSDT database project which you can open up in SSDT and publish just like any other SSDT database project. You can publish to a new database or any existing database, even [SSISDB] if you prefer. Download a dacpac Maintaining the code in an SSDT database project means that it can all get packaged up into a dacpac that you can then publish to your SQL Server. That dacpac is available from DB v1.0.0.0: Ordinarily a dacpac can be deployed to a SQL Server from SSMS using the Deploy Dacpac wizard however in this case there is a limitation. Due to sp_ssiscatalog referring to objects in the SSIS Catalog (which it has to do of course) the dacpac contains a SqlCmd variable to store the name of the database that underpins the SSIS Catalog; unfortunately the Deploy Dacpac wizard in SSMS has a rather gaping limitation in that it cannot deploy dacpacs containing SqlCmd variables. Hence, we can use the command-line tool, sqlpackage.exe, instead. Don’t worry if reverting to the command-line sounds a little daunting, I assure you it is not. Simply open a Visual Studio command-prompt and cd to the folder containing the downloaded dacpac: Type: "%PROGRAMFILES(x86)%\Microsoft SQL Server\110\DAC\bin\sqlpackage.exe" /action:Publish /TargetDatabaseName:SsisReportingPack /SourceFile:SSISReportingPack.dacpac /Variables:SSISDB=SSISDB /TargetServerName:(local) or the shortened form: "%PROGRAMFILES(x86)%\Microsoft SQL Server\110\DAC\bin\sqlpackage.exe" /a:Publish /tdn:SsisReportingPack /sf:SSISReportingPack.dacpac /v:SSISDB=SSISDB /tsn:(local) remembering to set your server name appropriately (here mine is set to “(local)” ). If everything works successfully you will see this: And you’re done! You’ll have a new database called [SsisReportingPack] which contains sp_ssiscatalog: Good luck with sp_ssiscatalog. I have been using it extensively on my own projects recently and it has proved to be very useful indeed. Rest-assured however, I will be adding many new capabilities in the future. Feedback is welcome. @Jamiet

Read the article
SSIS Reporting Pack v0.4 – Execution Report updated

- by jamiet

SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS ( SELECT [new_executable].[execution_path],[new_executable].[parent_execution_path] FROM ( SELECT [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1) , [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square]) FROM ( SELECT [execution_path] , [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1) , [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1) , [length_exec_path] = LEN([execution_path]) FROM [exec_stats] es WHERE execution_path LIKE '%\[%]%' ESCAPE '\' )AS [loop_iteration] ) AS [new_executable] GROUP BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip: If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences. The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead! The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178 Feedback on all of the above changes would be very much appreciated. @Jamiet

Read the article
SSIS Reporting Pack v0.4 – Execution Report updated

- by jamiet

SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS ( SELECT [new_executable].[execution_path],[new_executable].[parent_execution_path] FROM ( SELECT [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1) , [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square]) FROM ( SELECT [execution_path] , [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1) , [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1) , [length_exec_path] = LEN([execution_path]) FROM [exec_stats] es WHERE execution_path LIKE '%\[%]%' ESCAPE '\' )AS [loop_iteration] ) AS [new_executable] GROUP BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip: If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences. The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead! The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178 Feedback on all of the above changes would be very much appreciated. @Jamiet

Read the article
Thinking differently about BI delivery

- by jamiet

My day job involves implementing Business Intelligence (BI) solutions which, as I have said before, is simply about giving people the information they need to do their jobs. I’m always interested in learning about new ways of achieving that aim and that is my motivation for writing blog entries that are not concerned with SQL or SQL Server per se. Implementing BI systems usually involves hacking together a bunch third party products with some in-house “glue” and delivering information using some shiny, expensive web-based front-end tool; the list of vendors that supply such tools is big and ever-growing. No doubt these tools have their place and of late I have started to wonder whether they can be supplemented with different ways of delivering information. The problem I have with these separate web-based tools is exactly that – they are separate web-based tools. What’s the problem with that you might ask? I’ll explain! They force the information worker to go somewhere unfamiliar in order to get the information they need to do their jobs. Would it not be better if we could deliver information into the tools that those information workers are already using and not force them to go somewhere else? I look at the rise of blogging over recent years and I realise that what made them popular is that people can subscribe to RSS feeds and have information pushed to them in their tool of choice rather than them having to go and find the information for themselves in a tool that has been foisted upon them. Would it not be a good idea to adopt the principle of subscription for the benefit of delivering BI information as well? I think it would and in the rest of this blog entry I’ll outline such a scenario where the power of subscription could be used to enhance the delivery of information to information workers. Typical questions that information workers ask might be: What are my year-on-year sales figures? What was my footfall yesterday? How many widgets have I sold so far today? Each of those questions includes a time element and that shouldn’t surprise us, any BI system that I have worked on includes the dimension of time. Now, what do people use to view and organise their time-oriented information? Its not a trick question, they use a calendar and in the enterprise space more often than not that calendar is managed using Outlook. Given then that information workers are already looking at their calendar in Outlook anyway would it not make sense then to deliver information into that same calendar? Of course it would. Calendars are a great way of visualising information such as sales figures. Observe: Just in this single screenshot I have managed to convey a multitude of information. The information worker can see, at a glance, information about hourly/daily/weekly/monthly sales and, moreover, he/she is viewing that information right inside the tool that they use every day. There is no effort on the part of him/her, the information just appears hour after hour, day after day. Taking the idea further, each one of those calendar items could be a mini-dashboard in its own right. Double-clicking on an item could show a plethora of other information about that time slot such as breaking the sales down per region or year-over-year comparisons. Perhaps the title could employ a sparkline? Loads of possibilities. The point is that calendars are a completely natural way to visualise information; we should make more use of them! The real beauty of delivering information using calendars for us BI developers is that it should be so easy. In the case of Outlook we don’t need to write complicated VBA code that can go and manipulate a person’s calendar, simply publishing data in a format that Outlook can understand is sufficient and happily such formats already exist; iCalendar is the accepted format and the even more flexible xCalendar is hopefully on its way as well. I’d like to make one last point and this one is with my SQL Server hat on. Reporting Services 2008 R2 introduced the ability to publish data as subscribable Atom feeds so it seems logical that it could also be a vehicle for delivering calendar feeds too. If you think this would be a good idea go and vote for it at Publish data as iCalendar feeds and please please please add some comments (especially if you vote it down). Work smarter, not harder! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Reading the tea leaves from Windows Azure support

- by jamiet

A few idle thoughts… Three months ago I had an issue regarding Windows Azure where I was unable to login to the management portal. At the time I contacted Azure support, the issue was soon resolved and I thought no more about it. Until today that is when I received an email from Azure support providing a detailed analysis of the root cause, the fix and moreover precise details about when and where things occurred. The email itself is interesting and I have included the entirety of it below. A few things were interesting to me: The level of detail and the diligence in investigating and reporting the issue I found really rather impressive. They even outline the number of users that were affected (127 in case you can’t be bothered reading). Compare this to the quite pathetic support that another division within Microsoft, Skype, provided to Greg Low recently: Skype support and dead parrot sketches This line: “Windows Azure performed a planned change from using the Microsoft account service (formerly Windows Live ID) to the Azure Active Directory (AAD) as its primary authentication mechanism on August 24th. This change was made to enable future innovation in the area of authentication – particularly for organizationally owned identities, identity federation, stronger authentication methods and compliance certification. ” I also found to be particularly interesting. I have long thought that one of the reasons Microsoft has proved to be such a money-making machine in the enterprise is because they provide the infrastructure and then upsell on top of that – and nothing is more infrastructural than Active Directory. It has struck me of late that they are trying to make the same play of late in the cloud by tying all their services into Azure Active Directory and here we see a clear indication of that by making AAD the authentication mechanism for anyone using Windows Azure. I get the feeling that we’re going to hear much much more about AAD in the future; isn’t it about time we could log on to SQL Azure Windows Azure SQL Database without resorting to SQL authentication, for example? And why do Microsoft have two identity providers – Microsoft Account (aka Windows Live ID) and AAD – isn’t it about time those things were combined? As I said, just some idle thoughts. Below is the transcript of the email if you are interested. @Jamiet This is regarding the support request <redacted> where in you were not able to login into the windows azure management portal with live id. We are providing you with the summary, root cause analysis and information about permanent fix: Incident Title: You were unable to access Windows Azure Portal after Microsoft Account to Azure Active Directory account Migration. Service Impacted: Management Portal Incident Start Date and Time: 8/24/2012 4:30:00 PM Date and Time Service was Restored: 10/17/2012 12:00:00 AM Summary: Windows Azure performed a planned change from using the Microsoft account service (formerly Windows Live ID) to the Azure Active Directory (AAD) as its primary authentication mechanism on August 24th. This change was made to enable future innovation in the area of authentication – particularly for organizationally owned identities, identity federation, stronger authentication methods and compliance certification. While this migration was largely transparent to Windows Azure users, a small number of users whose sign-in names were part of a Windows Live Custom Domain were unable to login. This incompatibility was not discovered during the Quality Assurance testing phase prior to the migration. Customer Impact: Customers whose sign-in names were part of a Windows Live Custom Domain were unable to sign-in the Management Portal after ~4:00 p.m. PST on August 24th, 2012. We determined that the issue did impact at least 127 users in 98 of these Windows Live Custom Domains and had a maximum potential impact of 1,110 users in total. Root Cause: The root cause of the issue was an incompatibility in the AAD authentication service to handle logins from Microsoft accounts whose sign-in names were part of a Windows Live Custom Domains. This issue was not discovered during the Quality Assurance testing phase prior to the migration from Microsoft Account (MSA) to AAD. Mitigations: The issue was mitigated for the majority of affected users by 8:20 a.m. PST on August 25th, 2012 by running some internal scripts to correct many known Windows Live Custom Domains. The remaining affected domains fell into two categories: Windows Live Custom Domains that were not corrected by 8/25/2012. An additional 48 Windows Live Custom Domains were fixed in the weeks following the incident within 2 business days after the AAD team received an escalation from product support regarding those accounts. Windows Live Custom domains that were also provisioned in Office365. Some of the affected Windows Live Custom Domains had already been provisioned in AAD because their owners signed up for Office365 which is a service that also uses AAD. In these cases the Azure customers had to work around the issue by renaming their Microsoft Account or using a different Microsoft Account to administer their Azure subscription. Permanent Fix: The Azure Active Directory team permanently fixed the issue for all customers on 10/17/2012 in an upgraded release of the AAD service.

Read the article
Interesting things – Twitter annotations and your phone as a web server

- by jamiet

I overheard/read a couple of things today that really made me, data junkie that I am, take a step back and think, “Hmmm, yeah, that could be really interesting” and I wanted to make a note of them here so that (a) I could bring them to the attention of anyone that happens to read this and (b) I can maybe come back here in a few years and see if either of these have come to fruition. Your phone as a web server While listening to Jon Udell’s (twitter) “Interviews with Innovators Podcast” today in which he interviewed Herbert Van de Sompel (twitter) about his Momento project. During the interview Jon and Herbert made the following remarks: Jon: [some people] really had this vision of a web of servers, the notion that every node on the internet, every connected entity, is potentially a server and a client…we can see where we’re getting to a point where these endpoint devices we have in our pockets are going to be massively capable and it may be in the not too distant future that significant chunks of the web archive will be cached all over the place including on your own machine… Herbert: wasn’t it Opera who at one point turned your browser into a server? That really got my brain ticking. We all carry a mobile phone with us and therefore we all potentially carry a mobile web server with us as well and to my mind the only thing really stopping that from happening is the capabilities of the phone hardware, the capabilities of the network infrastructure and the will to just bloody do it. Certainly all the standards required for addressing a web server on a phone already exist (to this uninitiated observer DNS and IPv6 seem to solve that problem) so why not? I tweeted about the idea and Rory Street answered back with “why would you want a phone to be a web server?”: Its a fair question and one that I would like to try and answer. Mobile phones are increasingly becoming our window onto the world as we use them to upload messages to Twitter, record our location on FourSquare or interact with our friends on Facebook but in each of these cases some other service is acting as our intermediary; to see what I’m thinking you have to go via Twitter, to see where I am you have to go to FourSquare (I’m using ‘I’ liberally, I don’t actually use FourSquare before you ask). Why should this have to be the case? Why can’t that data be decentralised? Why can’t we be masters of our own data universe? If my phone acted as a web server then I could expose all of that information without needing those intermediary services. I see a time when we can pass around URLs such as the following: http://jamiesphone.net/location/current - Where is Jamie right now? http://jamiesphone.net/location/2010-04-21 – Where was Jamie on 21st April 2010? http://jamiesphone.net/thoughts/current – What’s on Jamie’s mind right now? http://jamiesphone.net/blog – What documents is Jamie sharing with me? http://jamiesphone.net/calendar/next7days – Where is Jamie planning to be over the next 7 days? and those URLs get served off of the phone in our pockets. If we govern that data then we can control who has access to it and (crucially) how long its available for. Want to wipe yourself off the face of the web? its pretty easy if you’re in control of all the data – just turn your phone off. None of this exists today but I look forward to a time when it does. Opera really were onto something last June when they announced Opera Unite (admittedly Unite only works because Opera provide an intermediary DNS-alike system – it isn’t totally decentralised). Opening up Twitter annotations Last week Twitter held their first developer conference called Chirp where they announced an upcoming new feature called ‘Twitter Annotations’; in short this will allow us to attach metadata to a Tweet thus enhancing the tweet itself. Think of it as a richer version of hashtags. To think of it another way Twitter are turning their data into a humongous Entity-Attribute-Value or triple-tuple store. That alone has huge implications both for the web and Twitter as a whole – the ability to enrich that 140 characters data and thus make it more useful is indeed compelling however today I stumbled upon a blog post from Eugene Mandel entitled Tweet Annotations – a Way to a Metadata Marketplace? where he proposed the idea of allowing tweets to have metadata added by people other than the person who tweeted the original tweet. This idea really fascinated me especially when I read some of the potential uses that Eugene and his commenters suggested. They included: Amazon could attach an ISBN to a tweet that mentions a book. Specialist clients apps for book lovers could be built up around this metadata. Advertisers could pay to place adverts in metadata. The revenue generated from those adverts could be shared with the tweeter or people who add the metadata. Granted, allowing anyone to add metadata to a tweet has the potential to create a spam problem the like of which we haven’t even envisaged but spam hasn’t halted the growth of the web and neither should it halt the growth of data annotations either. The original tweeter should of course be able to determine who can add metadata and whether it should be moderated. As Eugene says himself: Opening publishing tweet annotations to anyone will open the way to a marketplace of metadata where client developers, data mining companies and advertisers can add new meaning to Twitter and build innovative businesses. What Eugene and his followers did not mention is what I think is potentially the most fascinating use of opening up annotations. Google’s success today is built on their page rank algorithm that measures the validity of a web page by the number of incoming links to it and the page rank of the sites containing those links – its a system built on reputation. Twitter annotations could open up a new paradigm however – let’s call it People rank- where reputation can be measured by the metadata that people choose to apply to links and the websites containing those links. Its not hard to see why Google and Microsoft have paid big bucks to get access to the Twitter firehose! Neither of these features, phones as a web server or the ability to add annotations to other people’s tweets, exist today but I strongly believe that they could dramatically enhance the web as we know it today. I hope to look back on this blog post in a few years in the knowledge that these ideas have been put into place. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SSIS Lookup component tuning tips

- by jamiet

Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Blogging tips for SQL Server professionals

- by jamiet

For some time now I have been intending to put some material together relating my blogging experiences since I began blogging in 2004 and that led to me submitting a session for SQLBits recently where I intended to do just that. That didn’t get enough votes to allow me to present however so instead I resolved to write a blog post about it and Simon Sabin’s recent post Blogging – how do you do it? has prompted me to get around to completing it. So, here I present a compendium of tips that I’ve picked up from authoring a fair few blog posts over the past 6 years. Feedburner Feedburner.com is a service that can consume your blog’s default RSS feed and provide another, replacement, feed that has exactly the same content. You can then supply that replacement feed on your blog site for other people to consume in their RSS readers. Why would you want to do this? Well, two reasons actually: It makes your blog portable. If you ever want to move your blog to a different URL you don’t have to tell your subscribers to move to a different feed. The feedburner feed is a pointer to your blog content rather than being a copy of it. Feedburner will collect stats telling you how many people are subscribed to your feed, which RSS readers they use, stuff like that. Here’s a sample screenshot for http://sqlblog.com/blogs/jamie_thomson/: It also tells you what your most viewed posts are: Web stats like these are notoriously inaccurate but then again the method of measurement here is not important, what IS important is that it gives you a trustworthy ranking of your blog posts and (in my opinion) knowing which are your most popular posts is more important than knowing exactly how many views each post has had. This is just the tip of the iceberg of what Feedburner provides and I recommend every new blogger to try it! Monitor subscribers using Google Reader If for some reason Feedburner is not to your taste or (more likely) you already have an established RSS feed that you do not want to change then Google provide another way in which you can monitor your readership in the shape of their online RSS reader, Google Reader. It provides, for every RSS feed, a collection of stats including the number of Google Reader users that have subscribed to that RSS feed. This is really valuable information and in fact I have been recording this statistic for mine and a number of other blogs for a few years now and as such I can produce the following chart that indicates how readership is trending for those blogs over time: [Good news for my fellow SQLBlog bloggers.] As Stephen Few readily points out, its not the numbers that are important but the trend. Search Engine Optimisation (SEO) SEO (or “How do I get my blog to show up in Google”) is a massive area of expertise which I don’t want (and am unable) to cover in much detail here but there are some simple rules of thumb that will help: Tags – If your blog engine offers the ability to add tags to your blog post, use them. Invariably those tags go into the meta section of the page HTML and search engines lap that stuff up. For example, from my recent post Microsoft publish Visual Studio 2010 Database Project Guidance: Title – Search engines take notice of web page titles as well so make them specific and descriptive (e.g. “Configuring dtsConfig connection strings”) rather than esoteric and meaningless in a vain attempt to be humorous (e.g. “Last night a DJ saved my ETL batch”)! Title(2) – Make your title even more search engine friendly by mentioning high level subject areas, not dissimilar to Twitter hashtags. For example, if you look at all of my posts related to SSIS you will notice that nearly all contain the word “SSIS” in the title even if I had to shoehorn it in there by putting it in square brackets or similar. Another tip, if you ARE putting words into your titles in this artificial manner then put them at the end so that they’re not that prominent in search engine results; they’re there for the search engines to consume, not for human beings. Images – Always add titles and alternate text (ALT attribute) to images in your blog post. If you use Windows 7 or Windows Vista then you can use Live Writer (which Simon recommended) makes this easy for you. Headings – If you want to highlight section headings use heading tags (e.g. <H1>, <H2>, <H3> etc…) rather than just formatting the text appropriately – again, Live makes this easy. These tags give your blog posts structure that is understood by search engines and RSS readers alike. (I believe it makes them more amenable to CSS as well – though that’s not something I know too much about). If you check the HTML source for the blog post you’re reading right now you’ll be able to scan through and see where I have used heading tags. Microsoft provide a free tool called the SEO Toolkit that will analyse your blog site (for free) and tell you what things you should change to improve SEO. Go read more and download for free at Search Engine Optimization Toolkit. Did I mention that it was free? Miscellaneous Tips If you are including code in your blog post then ensure it is formatted correctly. Use SQL Server Central’s T-SQL prettifier for formatting T-SQL code. Use images and videos. Personally speaking there’s nothing I like less when reading a blog than paragraph after paragraph of text. Images make your blog more appealing which means people are more likely to read what you have written. Be original. Don’t plagiarise other people’s content and don’t simply rewrite the contents of Books Online. Every time you publish a blog post tweet a link to it. Include hashtags in your tweet that are more likely to grab people’s attention. That’s probably enough for now - I hope this blog post proves useful to someone out there. If you would appreciate a related session at a forthcoming SQLBits conference then please let me know. This will likely be my last blog post for 2010 so I would like to take this opportunity to thank everyone that has commented on, linked to or read any of my blog posts in that time. 2011 is shaping up to be a very interesting for SQL Server observers with the impending release of SQL Server code-named Denali and I promise I’ll have lots more content on that as the year progresses. Happy New Year. @Jamiet

Read the article
Using Find/Replace with regular expressions inside a SSIS package

- by jamiet

Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"> <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="Some other description" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}" DTS:ObjectName="SomeOtherObjectName"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">SomeOtherValue</DTS:Property> </DTS:PackageParameter> </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

Read the article
Exploring the Excel Services REST API

- by jamiet

Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible. Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in. The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet

Read the article
Exploring the Excel Services REST API

- by jamiet

Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible. Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in. The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet

Read the article
Service Broker, not ETL

- by jamiet

I have been very quiet on this blog of late and one reason for that is I have been very busy on a client project that I would like to talk about a little here. The client that I have been working for has a website that runs on a distributed architecture utilising a messaging infrastructure for communication between different endpoints. My brief was to build a system that could consume these messages and produce analytical information in near-real-time. More specifically I basically had to deliver a data warehouse however it was the real-time aspect of the project that really intrigued me. This real-time requirement meant that using an Extract transformation, Load (ETL) tool was out of the question and so I had no choice but to write T-SQL code (i.e. stored-procedures) to process the incoming messages and load the data into the data warehouse. This concerned me though – I had no way to control the rate at which data would arrive into the system yet we were going to have end-users querying the system at the same time that those messages were arriving; the potential for contention in such a scenario was pretty high and and was something I wanted to minimise as much as possible. Moreover I did not want the processing of data inside the data warehouse to have any impact on the customer-facing website. As you have probably guessed from the title of this blog post this is where Service Broker stepped in! For those that have not heard of it Service Broker is a queuing technology that has been built into SQL Server since SQL Server 2005. It provides a number of features however the one that was of interest to me was the fact that it facilitates asynchronous data processing which, in layman’s terms, means the ability to process some data without requiring the system that supplied the data having to wait for the response. That was a crucial feature because on this project the customer-facing website (in effect an OLTP system) would be calling one of our stored procedures with each message – we did not want to cause the OLTP system to wait on us every time we processed one of those messages. This asynchronous nature also helps to alleviate the contention problem because the asynchronous processing activity is handled just like any other task in the database engine and hence can wait on another task (such as an end-user query). Service Broker it was then! The stored procedure called by the OLTP system would simply put the message onto a queue and we would use a feature called activation to pick each message off the queue in turn and process it into the warehouse. At the time of writing the system is not yet up to full capacity but so far everything seems to be working OK (touch wood) and crucially our users are seeing data in near-real-time. By near-real-time I am talking about latencies of a few minutes at most and to someone like me who is used to building systems that have overnight latencies that is a huge step forward! So then, am I advocating that you all go out and dump your ETL tools? Of course not, no! What this project has taught me though is that in certain scenarios there may be better ways to implement a data warehouse system then the traditional “load data in overnight” approach that we are all used to. Moreover I have really enjoyed getting to grips with a new technology and even if you don’t want to use Service Broker you might want to consider asynchronous messaging architectures for your BI/data warehousing solutions in the future. This has been a very high level overview of my use of Service Broker and I have deliberately left out much of the minutiae of what has been a very challenging implementation. Nonetheless I hope I have caused you to reflect upon your own approaches to BI and question whether other approaches may be more tenable. All comments and questions gratefully received! Lastly, if you have never used Service Broker before and want to kick the tyres I have provided below a very simple “Service Broker Hello World” script that will create all of the objects required to facilitate Service Broker communications and then send the message “Hello World” from one place to anther! This doesn’t represent a “proper” implementation per se because it doesn’t close down down conversation objects (which you should always do in a real-world scenario) but its enough to demonstrate the capabilities! @Jamiet ----------------------------------------------------------------------------------------------- /*This is a basic Service Broker Hello World app. Have fun! -Jamie */ USE MASTER GO CREATE DATABASE SBTest GO --Turn Service Broker on! ALTER DATABASE SBTest SET ENABLE_BROKER GO USE SBTest GO -- 1) we need to create a message type. Note that our message type is -- very simple and allowed any type of content CREATE MESSAGE TYPE HelloMessage VALIDATION = NONE GO -- 2) Once the message type has been created, we need to create a contract -- that specifies who can send what types of messages CREATE CONTRACT HelloContract (HelloMessage SENT BY INITIATOR) GO --We can query the metadata of the objects we just created SELECT * FROM sys.service_message_types WHERE name = 'HelloMessage'; SELECT * FROM sys.service_contracts WHERE name = 'HelloContract'; SELECT * FROM sys.service_contract_message_usages WHERE service_contract_id IN (SELECT service_contract_id FROM sys.service_contracts WHERE name = 'HelloContract') AND message_type_id IN (SELECT message_type_id FROM sys.service_message_types WHERE name = 'HelloMessage'); -- 3) The communication is between two endpoints. Thus, we need two queues to -- hold messages CREATE QUEUE SenderQueue CREATE QUEUE ReceiverQueue GO --more querying metatda SELECT * FROM sys.service_queues WHERE name IN ('SenderQueue','ReceiverQueue'); --we can also select from the queues as if they were tables SELECT * FROM SenderQueue SELECT * FROM ReceiverQueue -- 4) Create the required services and bind them to be above created queues CREATE SERVICE Sender ON QUEUE SenderQueue CREATE SERVICE Receiver ON QUEUE ReceiverQueue (HelloContract) GO --more querying metadata SELECT * FROM sys.services WHERE name IN ('Receiver','Sender'); -- 5) At this point, we can begin the conversation between the two services by -- sending messages DECLARE @conversationHandle UNIQUEIDENTIFIER DECLARE @message NVARCHAR(100) BEGIN BEGIN TRANSACTION; BEGIN DIALOG @conversationHandle FROM SERVICE Sender TO SERVICE 'Receiver' ON CONTRACT HelloContract WITH ENCRYPTION=OFF -- Send a message on the conversation SET @message = N'Hello, World'; SEND ON CONVERSATION @conversationHandle MESSAGE TYPE HelloMessage (@message) COMMIT TRANSACTION END GO --check contents of queues SELECT * FROM SenderQueue SELECT * FROM ReceiverQueue GO -- Receive a message from the queue RECEIVE CONVERT(NVARCHAR(MAX), message_body) AS MESSAGE FROM ReceiverQueue GO --If no messages were received and/or you can't see anything on the queues you may wish to check the following for clues: SELECT * FROM sys.transmission_queue -- Cleanup DROP SERVICE Sender DROP SERVICE Receiver DROP QUEUE SenderQueue DROP QUEUE ReceiverQueue DROP CONTRACT HelloContract DROP MESSAGE TYPE HelloMessage GO USE MASTER GO DROP DATABASE SBTest GO

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article

Search Results

Search found 114 results on 5 pages for 'jamiet'.

Page 5/5 | < Previous Page | 1 2 3 4 5

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

< Previous Page | 1 2 3 4 5