Search Results

Search found 10711 results on 429 pages for 'ghost blog'.

Page 113/429 | < Previous Page | 109 110 111 112 113 114 115 116 117 118 119 120 | Next Page >

Common request: export #Tabular model and data to #PowerPivot

- by Marco Russo (SQLBI)

I received this request in many courses, messages and also forum discussions: having an Analysis Services Tabular model, it would be nice being able to extract a correspondent PowerPivot data model. In order of priority, here are the specific feature people (including me) would like to see: Create an empty PowerPivot workbook with the same data model of a Tabular model Change the connections of the tables in the PowerPivot workbook extracting data from the Tabular data model Every table should have an EVALUATE ‘TableName’ query in DAX Apply a filter to data extracted from every table For example, you might want to extract all data for a single country or year or customer group Using the same technique of applying filter used for role based security would be nice Expose an API to automate the process of creating a PowerPivot workbook Use case: prepare one workbook for every employee containing only its data, that he can use offline Common request for salespeople who want a mini-BI tool to use in front of the customer/lead/supplier, regardless of a connection available This feature would increase the adoption of PowerPivot and Tabular (and, therefore, Business Intelligence licenses instead of Standard), and would probably raise the sales of Office 2013 / Office 365 driven by ISV, who are the companies who requests this feature more. If Microsoft would do this, it would be acceptable it only works on Office 2013. But if a third-party will do that, it will make sense (for their revenues) to cover both Excel 2010 and Excel 2013. Another important reason for this feature is that the “Offline cube” feature that you have in Excel is not available when your PivotTable is connected to a Tabular model, but it can only be used when you connect to Analysis Services Multidimensional. If you think this is an important features, you can vote this Connect item.

Read the article
SolidQ Journal - free SQL goodness for February

- by Greg Low

The SolidQ Journal for February just made it out by the end of February 28th. But again, it's great to see the content appearing. I've included the second part of the article on controlling the execution context of stored procedures. The first part was in December. Also this month, along with Fernando Guerrero's editorial, Analysis Services guru Craig Utley has written about aggregations, Herbert Albert and Gianluca Holz have continued their double-act and described how to automate database migrations,...(read more)

Read the article
Hyper-V for Developers - presentation from NxtGenUG Oxford (including link to more info on Dynamic M

- by Liam Westley

Many thanks to Richard Hopton and the NxtGenUG guys in Oxford for inviting me to talk on Hyper-V for Developers last night, and for Research Machines for providing the venue. It was great to have developers not yet using Hyper-V who were really interested in some of the finer points to help them with specific requirements. For those wanting to follow up on the topics I covered, you can download the presentation deck as either PDF (with speaker notes included) or as the original PowerPoint slidedeck, http://www.tigernews.co.uk/blog-twickers/nxtgenugoxford/HyperV4Devs.pdf http://www.tigernews.co.uk/blog-twickers/nxtgenugoxford/HyperV4Devs.zip I also mentioned the new feature, Dynamic Memory, coming in Service Pack 1, had been presented in a session at TechEd 2010 by Ben Armstrong, and you can download his presentation from here, http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/06/08/talking-about-dynamic-memory.aspx

Read the article
Oracle Product Leader Named a Leader in Gartner MQ for MDM of Product Data Solutions

- by Mala Narasimharajan

Gartner recently Oracle as a leader in the MQ report for MDM of Product Data Solutions. They named Oracle as a leader with the following key points: Strong MDM portfolio covering multiple data domains, industries and use cases Oracle PDH can be a good fit for Oracle EBS customers and can form part of a multidomain solution: Deep MDM of product data functionality Evolving support for information stewardship For more information on the report visit Oracle's Analyst Relations blog at http://blog.us.oracle.com/dimdmar/. To learn more about Oracle's product information solutions for master data management click here.

Read the article
OT: Thank You, Microsoft

- by andyleonard

cross-posted from AndyLeonard.me … Each April 1st for the past five years, I have been honored to receive an email from Microsoft informing me I have been recognized as a SQL Server MVP. Tomorrow will be different. Back in January – when I wrote this – I requested Microsoft not consider me for renewal. I have enjoyed serving as a Microsoft MVP. I only got to see what it is like to be a SQL Server MVP, and I think we are part of a special community that makes being an MVP even more special. I have...(read more)

Read the article
Speaking at SQLSaturday #44 in Huntington Beach, CA (Los Angeles Area)

- by Ben Nevarez

I'll be presenting a session at SQLSaturday #44 in Huntington Beach, the first SQLSaturday on Southern California. The event takes place on Saturday, April 24 at the Golden West College on 15744 Goldenwest St, Huntington Beach, CA 92647.. For more information visit the following link http://sqlsaturday.com/44/eventhome.aspx My session is “How the Query Optimizer Works”. I hope to see you there. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
PASS Business Intelligence Virtual Chapter Upcoming Sessions (November 2013)

- by Sergio Govoni

Let me point out the upcoming live events, dedicated to Business Intelligence with SQL Server, that PASS Business Intelligence Virtual Chapter has scheduled for November 2013. The "Accidental Business Intelligence Project Manager"Date: Thursday 7th November - 8:00 PM GMT / 3:00 PM EST / Noon PSTSpeaker: Jen StirrupURL: https://attendee.gotowebinar.com/register/5018337449405969666 You've watched the Apprentice with Donald Trump and Lord Alan Sugar. You know that the Project Manager is usually the one gets firedYou've heard that Business Intelligence projects are prone to failureYou know that a quick Bing search for "why do Business Intelligence projects fail?" produces a search result of 25 million hits!Despite all this… you're now Business Intelligence Project Manager – now what do you do?In this session, Jen will provide a "sparks from the anvil" series of steps and working practices in Business Intelligence Project Management. What about waterfall vs agile? What is a Gantt chart anyway? Is Microsoft Project your friend or a problematic aspect of being a BI PM? Jen will give you some ideas and insights that will help you set your BI project right: assess priorities, avoid conflict, empower the BI team and generally deliver the Business Intelligence project successfully! Dimensional Modelling Design Patterns: Beyond BasicsDate: Tuesday 12th November - Noon AEDT / 1:00 AM GMT / Monday 11th November 5:00 PM PSTSpeaker: Jason Horner, Josh Fennessy and friendsURL: https://attendee.gotowebinar.com/register/852881628115426561 This session will provide a deeper dive into the art of dimensional modeling. We will look at the different types of fact tables and dimension tables, how and when to use them. We will also some approaches to creating rich hierarchies that make reporting a snap. This session promises to be very interactive and engaging, bring your toughest Dimensional Modeling quandaries. Data Vault Data Warehouse ArchitectureDate: Tuesday 19th November - 4:00 PM PST / 7 PM EST / Wednesday 20th November 11:00 PM AEDTSpeaker: Jeff Renz and Leslie WeedURL: https://attendee.gotowebinar.com/register/1571569707028142849 Data vault is a compelling architecture for an enterprise data warehouse using SQL Server 2012. A well designed data vault data warehouse facilitates fast, efficient and maintainable data integration across business systems. In this session Leslie and I will review the basics about enterprise data warehouse design, introduce you to the data vault architecture and discuss how you can leverage new features of SQL Server 2012 help make your data warehouse solution provide maximum value to your users.

Read the article
Could the rel="author" just be a username?

- by Gkhan14

I want to use rel="author", however the type of blog I run is about a game, and doesn't relate to my real identity. I'm more known for my screen name, so would this still be okay to use for the rel="author" tag? For example, if my Google+ account is for my user, and not for myself, could I still use it within the rel="author" tag? I don't want to get penalized in any sort of way. My main reason to do this, is to improve click through rate, and just make my blog post sections look better in the searches.

Read the article
T-SQL User-Defined Functions: the good, the bad, and the ugly (part 1)

- by Hugo Kornelis

So you thought that encapsulating code in user-defined functions for easy reuse is a good idea? Think again! SQL Server supports three types of user-defined functions. Only one of them qualifies as good. The other two – well, the title says it all, doesn’t it? The bad: scalar functions A scalar user-defined function (UDF) is very much like a stored procedure, except that it always returns a single value of a predefined data type – and because of that property, it isn’t invoked with an EXECUTE statement,...(read more)

Read the article
PowerShell - grabbing values out of the registry and running them

- by Rob Farley

So I closed an application that runs when Windows starts up, but it doesn’t have a Start Menu entry, and I was trying to find it. Ok, I could’ve run regedit.exe, navigated through the tree and found the list of things that run when Windows starts up, but I thought I’d use PowerShell instead. PowerShell presents the registry as if it’s a volume on a disk, and you can navigate around it using commands like cd and dir. It wasn’t hard to find the folder I knew I was after – tab completion (starting the word and then hitting the Tab key) was a friend here. But unfortunately dir doesn’t list values, only subkeys (which look like folders). PS C:\Windows\system32> dir HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run PS C:\Windows\system32> Instead, I needed to use Get-Item to fetch the ‘Run’ key, and use its Property property. This listed the values in there for me, as an array of strings (I could work this out using Get-Member). PS C:\Windows\system32> (Get-Item HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run).Property QuickSet SynTPEnh Zune Launcher PS C:\Windows\system32> Ok, so the thing I wanted wasn’t in there (an app called PureText, whicih lets me Paste As Text using Windows+V). That’s ok – a simple change to use HKCU instead of HKLM (Current User instead of Local Machine), and I found it. Now to fetch the details of the application itself, using the RegistryKey method GetValue PS C:\Windows\system32> (Get-Item HKCU:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run).GetValue('PureText') "C:\Users\Rob\Utilities\PureText.exe" PS C:\Windows\system32> And finally, surrounding it in a bit of code to execute that command. That needs an ampersand and the Invoke-Expression cmdlet. PS C:\Windows\system32> '& ' + (Get-Item HKCU:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run).GetValue('PureText') | Invoke-Expression A simple bit of exploring PowerShell which will makes for a much easier way of finding and running those apps which start with Windows.

Read the article
More Tables or More Databases?

- by BuckWoody

I got an e-mail from someone that has an interesting situation. He has 15,000 customers, and he asks if he should have a database for their data per customer. Without a LOT more data it’s impossible to say, of course, but there are some general concepts to keep in mind. Whenever you’re segmenting data, it’s all about boundary choices. You have not only boundaries around how big the data will get, but things like how many objects (tables, stored procedures and so on) that will be involved, if there are any cross-sections of data (do they share location or product information) and – very important – what are the security requirements? From the answer to these types of questions, you now have the choice of making multiple tables in a single database, or using multiple databases. A database carries some overhead – it needs a certain amount of memory for locking and so on. But it has a very clean boundary – everything from objects to security can be kept apart. Having multiple users in the same database is possible as well, using things like a Schema. But keeping 15,000 schemas can be challenging as well. My recommendation in complex situations like this is similar to a post on decisions that I did earlier – I lay out the choices on a spreadsheet in rows, and then my requirements at the top in the columns. I give each choice a number based on how well it meets each requirement. At the end, the highest number wins. And many times it’s a mix – perhaps this person could segment customers into larger regions or districts or products, in a database. Within that database might be multiple schemas for the customers. Of course, he needs to query across all customers, that becomes another requirement. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Real tortoises keep it slow and steady. How about the backups?

- by Maria Zakourdaev

… Four tortoises were playing in the backyard when they decided they needed hibiscus flower snacks. They pooled their money and sent the smallest tortoise out to fetch the snacks. Two days passed and there was no sign of the tortoise. "You know, she is taking a lot of time", said one of the tortoises. A little voice from just out side the fence said, "If you are going to talk that way about me I won't go." Is it too much to request from the quite expensive 3rd party backup tool to be a way faster than the SQL server native backup? Or at least save a respectable amount of storage by producing a really smaller backup files? By saying “really smaller”, I mean at least getting a file in half size. After Googling the internet in an attempt to understand what other “sql people” are using for database backups, I see that most people are using one of three tools which are the main players in SQL backup area: LiteSpeed by Quest SQL Backup by Red Gate SQL Safe by Idera The feedbacks about those tools are truly emotional and happy. However, while reading the forums and blogs I have wondered, is it possible that many are accustomed to using the above tools since SQL 2000 and 2005. This can easily be understood due to the fact that a 300GB database backup for instance, using regular a SQL 2005 backup statement would have run for about 3 hours and have produced ~150GB file (depending on the content, of course). Then you take a 3rd party tool which performs the same backup in 30 minutes resulting in a 30GB file leaving you speechless, you run to management persuading them to buy it due to the fact that it is definitely worth the price. In addition to the increased speed and disk space savings you would also get backup file encryption and virtual restore - features that are still missing from the SQL server. But in case you, as well as me, don’t need these additional features and only want a tool that performs a full backup MUCH faster AND produces a far smaller backup file (like the gain you observed back in SQL 2005 days) you will be quite disappointed. SQL Server backup compression feature has totally changed the market picture. Medium size database. Take a look at the table below, check out how my SQL server 2008 R2 compares to other tools when backing up a 300GB database. It appears that when talking about the backup speed, SQL 2008 R2 compresses and performs backup in similar overall times as all three other tools. 3rd party tools maximum compression level takes twice longer. Backup file gain is not that impressive, except the highest compression levels but the price that you pay is very high cpu load and much longer time. Only SQL Safe by Idera was quite fast with it’s maximum compression level but most of the run time have used 95% cpu on the server. Note that I have used two types of destination storage, SATA 11 disks and FC 53 disks and, obviously, on faster storage have got my backup ready in half time. Looking at the above results, should we spend money, bother with another layer of complexity and software middle-man for the medium sized databases? I’m definitely not going to do so. Very large database As a next phase of this benchmark, I have moved to a 6 terabyte database which was actually my main backup target. Note, how multiple files usage enables the SQL Server backup operation to use parallel I/O and remarkably increases it’s speed, especially when the backup device is heavily striped. SQL Server supports a maximum of 64 backup devices for a single backup operation but the most speed is gained when using one file per CPU, in the case above 8 files for a 2 Quad CPU server. The impact of additional files is minimal. However, SQLsafe doesn’t show any speed improvement between 4 files and 8 files. Of course, with such huge databases every half percent of the compression transforms into the noticeable numbers. Saving almost 470GB of space may turn the backup tool into quite valuable purchase. Still, the backup speed and high CPU are the variables that should be taken into the consideration. As for us, the backup speed is more critical than the storage and we cannot allow a production server to sustain 95% cpu for such a long time. Bottomline, 3rd party backup tool developers, we are waiting for some breakthrough release. There are a few unanswered questions, like the restore speed comparison between different tools and the impact of multiple backup files on restore operation. Stay tuned for the next benchmarks. Benchmark server: SQL Server 2008 R2 sp1 2 Quad CPU Database location: NetApp FC 15K Aggregate 53 discs Backup statements: No matter how good that UI is, we need to run the backup tasks from inside of SQL Server Agent to make sure they are covered by our monitoring systems. I have used extended stored procedures (command line execution also is an option, I haven’t noticed any impact on the backup performance). SQL backup LiteSpeed SQL Backup SQL safe backup database <DBNAME> to disk= '\\<networkpath>\par1.bak' , disk= '\\<networkpath>\par2.bak', disk= '\\<networkpath>\par3.bak' with format, compression EXECUTE master.dbo.xp_backup_database @database = N'<DBName>', @backupname= N'<DBName> full backup', @desc = N'Test', @compressionlevel=8, @filename= N'\\<networkpath>\par1.bak', @filename= N'\\<networkpath>\par2.bak', @filename= N'\\<networkpath>\par3.bak', @init = 1 EXECUTE master.dbo.sqlbackup '-SQL "BACKUP DATABASE <DBNAME> TO DISK= ''\\<networkpath>\par1.sqb'', DISK= ''\\<networkpath>\par2.sqb'', DISK= ''\\<networkpath>\par3.sqb'' WITH DISKRETRYINTERVAL = 30, DISKRETRYCOUNT = 10, COMPRESSION = 4, INIT"' EXECUTE master.dbo.xp_ss_backup @database = 'UCMSDB', @filename = '\\<networkpath>\par1.bak', @backuptype = 'Full', @compressionlevel = 4, @backupfile = '\\<networkpath>\par2.bak', @backupfile = '\\<networkpath>\par3.bak' If you still insist on using 3rd party tools for the backups in your production environment with maximum compression level, you will definitely need to consider limiting cpu usage which will increase the backup operation time even more: RedGate : use THREADPRIORITY option ( values 0 – 6 ) LiteSpeed : use @throttle ( percentage, like 70%) SQL safe : the only thing I have found was @Threads option. Yours, Maria

Read the article
Data Education: Great Classes Coming to a City Near You

- by Adam Machanic

In case you haven't noticed, Data Education (the training company I started a couple of years ago) has expanded beyond the US northeast; we're currently offering courses with top trainers in both St. Louis and Chicago , as well as the Boston area. The courses are starting to fill up fast—not surprising when you consider we’re talking about experienced instructors like Kalen Delaney , Rob Farley , and Allan Hirt —but we have still have some room. We’re very excited about bringing the highest quality...(read more)

Read the article
C++ Accelerated Massive Parallelism

- by Daniel Moth

At AMD's Fusion conference Herb Sutter announced in his keynote session a technology that our team has been working on that we call C++ Accelerated Massive Parallelism (C++ AMP) and during the keynote I showed a brief demo of an app built with our technology. After the keynote, I go deeper into the technology in my breakout session. If you read both those abstracts, you'll get some information about what C++ AMP is, without being too explicit since we published the abstracts before the technology was announced. You can find the official online announcement at Soma's blog post. Here, I just wanted to capture the key points about C++ AMP that can serve as an introduction and an FAQ. So, in no particular order… C++ AMP lowers the barrier to entry for heterogeneous hardware programmability and brings performance to the mainstream, without sacrificing developer productivity or solution portability. is designed not only to help you address today's massively parallel hardware (i.e. GPUs and APUs), but it also future proofs your code investments with a forward looking design. is part of Visual C++. You don't need to use a different compiler or learn different syntax. is modern C++. Not C or some other derivative. is integrated and supported fully in Visual Studio vNext. Editing, building, debugging, profiling and all the other goodness of Visual Studio work well with C++ AMP. provides an STL-like library as part of the existing concurrency namespace and delivered in the new amp.h header file. makes it extremely easy to work with large multi-dimensional data on heterogeneous hardware; in a manner that exposes parallelization. introduces only one core C++ language extension. builds on DirectX (and DirectCompute in particular) which offers a great hardware abstraction layer that is ubiquitous and reliable. The architecture is such, that this point can be thought of as an implementation detail that does not surface to the API layer. Stay tuned on my blog for more over the coming months where I will switch from just talking about C++ AMP to showing you how to use the API with code examples… Comments about this post welcome at the original blog.

Read the article
Reproducing a Conversion Deadlock

- by Alexander Kuznetsov

Even if two processes compete on only one resource, they still can embrace in a deadlock. The following scripts reproduce such a scenario. In one tab, run this: CREATE TABLE dbo.Test ( i INT ) ; GO INSERT INTO dbo.Test ( i ) VALUES ( 1 ) ; GO SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ; BEGIN TRAN SELECT i FROM dbo.Test ; --UPDATE dbo.Test SET i=2 ; After this script has completed, we have an outstanding transaction holding a shared lock. In another tab, let us have that another connection have...(read more)

Read the article
Security-related database settings are not restored when a DB is restored

- by Greg Low

A question came up today about whether it was a bug that the TRUSTWORTHY database setting isn't restored to its previous value when a database is restored. TRUSTWORTHY is a very powerful setting for a database. By design, it's not restored when a database is. We actually documented this behavior when writing the Upgrade Technical Reference for 2008: http://www.microsoft.com/downloads/en/confirmation.aspx?familyId=66d3e6f5-6902-4fdd-af75-9975aea5bea7&displayLang=en The other settings that are...(read more)

Read the article
Export all SSIS packages from msdb using Powershell

- by jamiet

Have you ever wanted to dump all the SSIS packages stored in msdb out to files? Of course you have, who wouldn’t? Right? Well, at least one person does because this was the subject of a thread (save all ssis packages to file) on the SSIS forum earlier today. Some of you may have already figured out a way of doing this but for those that haven’t here is a nifty little script that will do it for you and it uses our favourite jack-of-all tools … Powershell!! Imagine I have the following package folder structure on my Integration Services server (i.e. in [msdb]): There are two packages in there called “20110111 Chaining Expression components” & “Package”, I want to export those two packages into a folder structure that mirrors that in [msdb]. Here is the Powershell script that will do that: Param($SQLInstance = "localhost") #####Add all the SQL goodies (including Invoke-Sqlcmd)##### add-pssnapin sqlserverprovidersnapin100 -ErrorAction SilentlyContinue add-pssnapin sqlservercmdletsnapin100 -ErrorAction SilentlyContinue cls $Packages = Invoke-Sqlcmd -MaxCharLength 10000000 -ServerInstance $SQLInstance -Query "WITH cte AS ( SELECT cast(foldername as varchar(max)) as folderpath, folderid FROM msdb..sysssispackagefolders WHERE parentfolderid = '00000000-0000-0000-0000-000000000000' UNION ALL SELECT cast(c.folderpath + '\' + f.foldername as varchar(max)), f.folderid FROM msdb..sysssispackagefolders f INNER JOIN cte c ON c.folderid = f.parentfolderid ) SELECT c.folderpath,p.name,CAST(CAST(packagedata AS VARBINARY(MAX)) AS VARCHAR(MAX)) as pkg FROM cte c INNER JOIN msdb..sysssispackages p ON c.folderid = p.folderid WHERE c.folderpath NOT LIKE 'Data Collector%'" Foreach ($pkg in $Packages) { $pkgName = $Pkg.name $folderPath = $Pkg.folderpath $fullfolderPath = "c:\temp\$folderPath\" if(!(test-path -path $fullfolderPath)) { mkdir $fullfolderPath | Out-Null } $pkg.pkg | Out-File -Force -encoding ascii -FilePath "$fullfolderPath\$pkgName.dtsx" } To run it simply change the “localhost” parameter of the server you want to connect to either by editing the script or passing it in when the script is executed. It will create the folder structure in C:\Temp (which you can also easily change if you so wish – just edit the script accordingly). Here’s the folder structure that it created for me: Notice how it is a mirror of the folder structure in [msdb]. Hope this is useful! @Jamiet

Read the article
Write DAX queries in Report Builder #ssrs #dax #ssas #tabular

- by Marco Russo (SQLBI)

If you use Report Builder with Reporting Services, you can use DAX queries even if the editor for Analysis Services provider does not support DAX syntax. In fact, the DMX editor that you can use in Visual Studio editor of Reporting Services (see a previous post on that), is not available in Report Builder. However, as Sagar Salvi commented in this Microsoft Connect entry, you can use the DAX query text in the query of a Dataset by using the OLE DB provider instead of the Analysis Services one. I think it’s a good idea to show the steps required. First, create a DataSet using the OLE DB connection type, and provide the connection string the provider (Provider), the server name (Data Source) and the database name (Initial Catalog), such as: Provider=MSOLAP;Data Source=SERVERNAME\\TABULAR;Initial Catalog=AdventureWorks Tabular Model SQL 2012 Then, create a Dataset using the data source previously defined, select the Text query type, and write the DAX code in the Query pane: You can also use the Query Designer window, that doesn’t provide any particular help in writing the DAX query, but at least can show a preview of the result of the query execution. I hope DAX will get better editors in the future… in the meantime, remember you can use DAX Studio to write and test your DAX queries, and DAX Formatter to improve their readability!If you want to learn the DAX Query Language, I suggest you watching my video Data Analysis Expressions as a Query Language on Project Botticelli!

Read the article
NomCom Time

- by RickHeiges

Well, it is official... there is a race for the community seats on the PASS NomCom. I am very pleased to see that we have 12 people who decided to put their names forward for this task. This is largely a thankless job that takes a great deal of time, judgement, and consideration. I have put my name forward as one of those people who would like to take on this task and serve PASS (and the greater SQL Community) in this effort. You can find out more about me and the other candidates for the NomCom...(read more)

Read the article
Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709; -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

Read the article
24 Hours of PASS - Today!

- by andyleonard

24 Hours of PASS (2010) starts this morning! Tune in! :{> Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!...(read more)

Read the article
Presenting Database Design for Developers for PASS AppDev - Today!

- by andyleonard

I am honored to present Database Design for Developers to the PASS AppDev Virtual Chapter today at noon EDT via LiveMeeting! Details here . :{> Andy Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!...(read more)

Read the article
My favourite feature of SQL 2008 R2

- by Rob Farley

Interestingly, my favourite new feature of SQL Server 2008 R2 isn’t any of the obvious things. You may have read my recent posts about how much I like some of the new Reporting Services features, such as the map control. Or you may have seen my presentation at SQLBits V on StreamInsight, which I think has great potential to change the way many applications handle data (by allowing easier querying of data before it even reaches the database). Next week the Adelaide SQL Server User Group has...(read more)

Read the article
Use Those Schemas, People!

- by BuckWoody

Database Schemas are just containers – they aren’t users or anything else – think of a sub-directory on the hard drive. In early versions of SQL Server we “hid” schemas, placing all objects under “dbo”, which gave the erroneous perception that Schemas are users. In SQL Server 2005, we “un-hid” or re-introduced schemas within the database. Users can have a default schema (a place where their new objects go), you can add new schemas and transfer objects between them, and they have many other benefits. But I still see a lot of applications, developed by shops I know as well as vendors, that don’t make use of a Schema. Everything is piled under dbo. I completely understand this – since permissions can be granted to a schema, they feel a lot like a user, so it’s just easier not to worry about both users and schemas when you create a database. But if you’ll use them properly you can make your application more understandable and portable. You should at least take a few minutes and read more about them – you owe it to your users: http://msdn.microsoft.com/en-us/library/ms190387.aspx Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SQL Server 2014 – delayed transaction durability

- by Michael Zilberstein

As I’m downloading SQL Server 2014 CTP2 at this very moment, I’ve noticed new fascinating feature that hadn’t been announced in CTP1 : delayed transaction durability . It means that if your system is heavy on writes and on another hand you can tolerate data loss on some rare occasions – you can consider declaring transaction as DELAYED_DURABILITY = ON . In this case transaction would be committed when log is written to some buffer in memory – not to disk as usual. This way transactions can become...(read more)

Read the article

< Previous Page | 109 110 111 112 113 114 115 116 117 118 119 120 | Next Page >