Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 4/2351 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Big Data – Beginning Big Data Series Next Month in 21 Parts

- by Pinal Dave

Big Data is the next big thing. There was a time when we used to talk in terms of MB and GB of the data. However, the industry is changing and we are now moving to a conversation where we discuss about data in Petabyte, Exabyte and Zettabyte. It seems that the world is now talking about increased Volume of the data. In simple world we all think that Big Data is nothing but plenty of volume. In reality Big Data is much more than just a huge volume of the data. When talking about the data we need to understand about variety and volume along with volume. Though Big data look like a simple concept, it is extremely complex subject when we attempt to start learning the same. My Journey I have recently presented on Big Data in quite a few organizations and I have received quite a few questions during this roadshow event. I have collected all the questions which I have received and decided to post about them on the blog. In the month of October 2013, on every weekday we will be learning something new about Big Data. Every day I will share a concept/question and in the same blog post we will learn the answer of the same. Big Data – Plenty of Questions I received quite a few questions during my road trip. Here are few of the questions. I want to learn Big Data – where should I start? Do I need to know SQL to learn Big Data? What is Hadoop? There are so many organizations talking about Big Data, and every one has a different approach. How to start with big Data? Do I need to know Java to learn about Big Data? What is different between various NoSQL languages. I will attempt to answer most of the questions during the month long series in the next month. Big Data – Big Subject Big Data is a very big subject and I no way claim that I will be covering every single big data concept in this series. However, I promise that I will be indeed sharing lots of basic concepts which are revolving around Big Data. We will discuss from fundamentals about Big Data and continue further learning about it. I will attempt to cover the concept so simple that many of you might have wondered about it but afraid to ask. Your Role! During this series next month, I need your one help. Please keep on posting questions you might have related to big data as blog post comments and on Facebook Page. I will monitor them closely and will try to answer them as well during this series. Now make sure that you do not miss any single blog post in this series as every blog post will be linked to each other. You can subscribe to my feed or like my Facebook page or subscribe via email (by entering email in the blog post). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Big Data, PostADay, SQL, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SSIS Reporting Pack v0.4 – Execution Report updated

- by jamiet

SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS ( SELECT [new_executable].[execution_path],[new_executable].[parent_execution_path] FROM ( SELECT [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1) , [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square]) FROM ( SELECT [execution_path] , [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1) , [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1) , [length_exec_path] = LEN([execution_path]) FROM [exec_stats] es WHERE execution_path LIKE '%\[%]%' ESCAPE '\' )AS [loop_iteration] ) AS [new_executable] GROUP BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip: If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences. The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead! The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178 Feedback on all of the above changes would be very much appreciated. @Jamiet

Read the article
SSIS Reporting Pack v0.4 – Execution Report updated

- by jamiet

SSIS Reporting Pack is a suite of reports that I maintain at http://ssisreportingpack.codeplex.com/ that provide visualisation over the SSIS Catalog in SQL Server 2012 and attempt to add value over the reports that ship in the box. Work on the reports has stalled (my last SSIS Reporting Pack blog post was on 4th September 2011) as I’ve had rather more important things going on my life of late however I have recently checked-in a fix that couldn’t really be delayed. I discovered a problem with the Execution report that was causing the report to effectively hang, it was caused by this bit of SQL hidden away in the report definition: [generated_executables] AS ( SELECT [new_executable].[execution_path],[new_executable].[parent_execution_path] FROM ( SELECT [execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_close_square] + 1) , [parent_execution_path] = SUBSTRING([loop_iteration].[execution_path] ,1, [loop_iteration].length_exec_path - [loop_iteration].[char_index_open_square]) FROM ( SELECT [execution_path] , [char_index_open_square] = CHARINDEX('[',REVERSE([execution_path]),1) , [char_index_close_square] = CHARINDEX(']',REVERSE([execution_path]),1) , [length_exec_path] = LEN([execution_path]) FROM [exec_stats] es WHERE execution_path LIKE '%\[%]%' ESCAPE '\' )AS [loop_iteration] ) AS [new_executable] GROUP BY [new_executable].[execution_path],[new_executable].[parent_execution_path]) It was there because SSIS does not currently treat a loop iteration as an executable yet I figured there was still value in being able to view it as such – this SQL essentially “invents” new executables for those loop iterations; its what enabled the following visualisation: where each of the three iterations of a For Each Loop called “FEL Loop over top performing regions” appear in the report. Unfortunately, as I alluded, this could under certain circumstances (most likely when there were many loop iterations) cause the report to hang as it waited for the results to be constructed and returned. The change that I have made eradicates this generation of “fake” executables and thus produces this visualisation instead: Notice that the three “children” of the For Each Loop are no longer the three iterations but actually the task (“EPT Call Data Export Package”) contained within that For Each Loop. The problem here is of course that there is no longer a visual distinction between those three iterations; I have instead made the full execution path viewable via a tooltip: If you preferred the “old” way of presenting this information and are happy to put up with the performance degradation then I have kept the old version of the report hanging around in the reporting pack as “execution loop with iterations” however none of the other reports link to it so you will have to browse to it manually if you want to use it. Please let me know if you ARE using it – I would be very interested to hear about your experiences. The last change to make you aware of in the execution report is that by default I no longer show OnPreValidate or OnPostValidate messages as I consider them to be superfluous and only serve to clutter up the results. If you want to put them back, well, its open source so go right ahead! The latest release of SSIS Reporting Pack that contains all of these changes is v0.4 and can be downloaded from http://ssisreportingpack.codeplex.com/releases/view/88178 Feedback on all of the above changes would be very much appreciated. @Jamiet

Read the article
Working with Decimal fields in SSIS

- by CoffeeAddict

I'm using SQL Server 2008 w/SP2. I've got an incoming decimal(9,2) field incoming through my OLE DB transformation to my recordset destination transformation. It's like it's reading it as something other than a decimal? I don't know..I'm not an SSIS guru. So continuing on...the problem I have starts here with me trying to stuff the value into a variable for this decimal field. In a foreach loop, I have a variable to represent this decimal field so I can work with it. The first problem that I believe is pretty well known is SSIS variables do not have a decimal type. And from my own testing and what I've read out there, people are using type object for the variable to make SSIS "happy" with decimal values? It makes mine happy. But, then in my foreach loop, I have a for loop. And inside that I'm using an E*xecute SQL Task transformation*. In it, I need to create a parameter mapping to my variable so I can work with that decimal field in my T-SQL call in here. So now I see a type decimal for the parameter and use it and set that to point to my variable. When I run SSIS and it hits my SQL call, I get this in my output window.: The type is not supported.DBTYPE_DECIMAL So I am hitting a wall here. All I wanna do is work with a decimal!!!

Read the article
Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service. Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia: Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
ODBC in SSIS 2012

- by jamiet

In August 2011 the SQL Server client team published a blog post entitled Microsoft is Aligning with ODBC for Native Relational Data Access in which they basically said "OLE DB is the past, ODBC is the future. Deal with it.". From that blog post:We encourage you to adopt ODBC in the development of your new and future versions of your application. You don’t need to change your existing applications using OLE DB, as they will continue to be supported on Denali throughout its lifecycle. While this gives you a large window of opportunity for changing your applications before the deprecation goes into effect, you may want to consider migrating those applications to ODBC as a part of your future roadmap.I recently undertook a project using SSIS2012 and heeded that advice by opting to use ODBC Connection Managers rather than OLE DB Connection Managers. Unfortunately my finding was that the ODBC Connection Manager is not yet ready for primetime use in SSIS 2012. The main issue I found was that you can't populate an Object variable with a recordset when using an Execute SQL Task connecting to an ODBC data source; any attempt to do so will result in an error:"Disconnected recordsets are not available from ODBC connections." I have filed a bug on Connect at ODBC Connection Manager does not have same funcitonality as OLE DB. For this reason I strongly recommend that you don't make the move to ODBC Connection Managers in SSIS just yet - best to wait for the next version of SSIS before doing that.I found another couple of issues with the ODBC Connection Manager that are worth keeping in mind:It doesn't recognise System Data Source Names (DSNs), only User DSNs (bug filed at ODBC System DSNs are not available in the ODBC Connection Manager) UPDATE: According to a comment on that Connect item this may only be a problem on 64bit.In the OLE DB Connection Manager parameter ordinals are 0-based, in the ODBC Connection Manager they are 1-based (oh I just can't wait for the upgrade mess that ensues from this one!!!)You have been warned!@jamiet

Read the article
SSIS Dashboard v0.4

- by Davide Mauri

Following the post on SSISDB script on Gist, I’ve been working on a HTML5 SSIS Dashboard, in order to have a nice looking, user friendly and, most of all, useful, SSIS Dashboard. Since this is a “spare-time” project, I’ve decided to develop it using Python since it’s THE data language (R aside), it’s a beautiful & powerful, well established and well documented and with a rich ecosystem around. Plus it has full support in Visual Studio, through the amazing Python Tools For Visual Studio plugin, I decided also to use Flask, a very good micro-framework to create websites, and use the SB Admin 2.0 Bootstrap admin template, since I’m all but a Web Designer. The result is here: https://github.com/yorek/ssis-dashboard and I can say I’m pretty satisfied with the work done so far (I’ve worked on it for probably less than 24 hours). Though there’s some features I’d like to add in t future (historical execution time, some charts, connection with AzureML to do prediction on expected execution times) it’s already usable. Of course I’ve tested it only on my development machine, so check twice before putting it in production but, give the fact that, virtually, there is not installation needed (you only need to install Python), and that all queries are based on standard SSISDB objects, I expect no big problems. If you want to test, contribute and/or give feedback please fell free to do it…I would really love to see this little project become a community project! Enjoy!

Read the article
The one feature that would make me invest in SSIS 2012

- by Peter Larsson

This week I was invited my Microsoft to give two presentations in Slovenia. My presentations went well and I had good energy and the audience was interacting with me. When I had some time over from networking and partying, I attended a few other presentations. At least the ones who where held in English. One of these was "SQL Server Integration Services 2012 - All the News, and More", given by Davide Mauri, a fellow co-worker from SolidQ. We started to talk and soon came into the details of the new things in SSIS 2012. All of the official things Davide talked about are good stuff, but for me, the best thing is one he didn't cover in his presentation. In earlier versions of SSIS than 2012, it is possible to have a stored procedure to act as a data source, as long as it doesn't have a temp table in it. In that case, you will get an error message from SSIS that "Metadata could not be found". This is still true with SSIS 2012, so the thing I am talking about is not really a SSIS feature, it's a SQL Server 2012 feature. And this is the EXECUTE WITH RESULTSETS feature! With this, you can have a stored procedure with a temp table to deliver the resultset to SSIS, if you execute the stored procedure from SSIS and add the "WITH RESULTSETS" option. If you do this, SSIS is able to take the metadata from the code you write in SSIS and not from the stored procedure! And it's very fast too. Let's say you have a stored procedure in earlier versions and when referencing that stored procedure in SSIS forced SSIS to call the stored procedure (which can take hours), to retrieve the metadata. Now, with RESULTSETS, SSIS 2012 can continue in milliseconds! This is because you provide the metadata in the RESULTSETS clause, and if the data from the stored procedure doesn't match this RESULTSETS, you will get an error anyway, so it makes sense Microsoft has provided this optimization for us.

Read the article
ODBC in SSIS 2012

- by jamiet

In August 2011 the SQL Server client team published a blog post entitled Microsoft is Aligning with ODBC for Native Relational Data Access in which they basically said "OLE DB is the past, ODBC is the future. Deal with it.". From that blog post:We encourage you to adopt ODBC in the development of your new and future versions of your application. You don’t need to change your existing applications using OLE DB, as they will continue to be supported on Denali throughout its lifecycle. While this gives you a large window of opportunity for changing your applications before the deprecation goes into effect, you may want to consider migrating those applications to ODBC as a part of your future roadmap.I recently undertook a project using SSIS2012 and heeded that advice by opting to use ODBC Connection Managers rather than OLE DB Connection Managers. Unfortunately my finding was that the ODBC Connection Manager is not yet ready for primetime use in SSIS 2012. The main issue I found was that you can't populate an Object variable with a recordset when using an Execute SQL Task connecting to an ODBC data source; any attempt to do so will result in an error:"Disconnected recordsets are not available from ODBC connections." I have filed a bug on Connect at ODBC Connection Manager does not have same funcitonality as OLE DB. For this reason I strongly recommend that you don't make the move to ODBC Connection Managers in SSIS just yet - best to wait for the next version of SSIS before doing that.I found another couple of issues with the ODBC Connection Manager that are worth keeping in mind:It doesn't recognise System Data Source Names (DSNs), only User DSNs (bug filed at ODBC System DSNs are not available in the ODBC Connection Manager) UPDATE: According to a comment on that Connect item this may only be a problem on 64bit.In the OLE DB Connection Manager parameter ordinals are 0-based, in the ODBC Connection Manager they are 1-based (oh I just can't wait for the upgrade mess that ensues from this one!!!)You have been warned!@jamiet

Read the article
Merge Join component sorted outputs [SSIS]

- by jamiet

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

Read the article
Merge Join component sorted outputs [SSIS]

- by jamiet

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

Read the article
REPLACENULL in SSIS 2012

- by Davide Mauri

While preparing my slides e demos for the forthcoming SQL Server Conference 2012 in Italy, I’ve come across a nice addition to DTS Expression language which I never noticed before and that seems unknown also to the blogosphere: REPLACENULL. REPLACENULL is the same of ISNULL in T-SQL. It’s *very* useful especially when loading a fact table of your BI solution when you need to replace unexisting reference to dimension with dummy values. Here’s an example of how it can be used (please notice that in this example I’m NOT loading a fact table): I’ve noticed that the feature was requested by fellow MVP John Welch http://connect.microsoft.com/SQLServer/feedback/details/636057/ssis-add-a-replacenull-function-to-the-expression-language So: Thanks John and Thanks SSIS Team ! Ah, btw, the Help online is here http://msdn.microsoft.com/en-us/library/hh479601(v=sql.110).aspx Enjoy!

Read the article
SSIS: Building SQL databases on-the-fly using concatenated SQL scripts

- by DrJohn

Over the years I have developed many techniques which help automate the whole SQL Server build process. In my current process, where I need to build entire OLAP data marts on-the-fly, I make regular use of a simple but very effective mechanism to concatenate all the SQL Scripts together from my SSMS (SQL Server Management Studio) projects. This proves invaluable because in two clicks I can redeploy an entire SQL Server database with all tables, views, stored procedures etc. Indeed, I can also use the concatenated SQL scripts with SSIS to build SQL Server databases on-the-fly. You may be surprised to learn that I often redeploy the database several times per day, or even several times per hour, during the development process. This is because the deployment errors are logged and you can quickly see where SQL Scripts have object dependency errors. For example, after changing a table structure you may have forgotten to change any related views. The deployment log immediately points out all the objects which failed to build so you can fix and redeploy the database very quickly. The alternative approach (i.e. doing changes in the database directly using the SSMS UI) would require you to check all dependent objects before making changes. The chances are that you will miss something and wonder why your app returns the wrong data – a common problem caused by changing a table without re-creating dependent views. Using SQL Projects in SSMS A great many developers fail to make use of SQL Projects in SSMS (SQL Server Management Studio). To me they are invaluable way of organizing your SQL Scripts. The screenshot below shows a typical SSMS solution made up of several projects – one project for tables, another for views etc. The key point is that the projects naturally fall into the right order in file system because of the project name. The number in the folder or file name ensures that the projects the SQL scripts are concatenated together in the order that they need to be executed. Hence the script filenames start with 100, 110 etc. Concatenating SQL Scripts To concatenate the SQL Scripts together into one file, I use notepad.exe to create a simple batch file (see example screenshot) which uses the TYPE command to write the content of the SQL Script files into a combined file. As the SQL Scripts are in several folders, I simply use several TYPE command multiple times and append the output together. If you are unfamiliar with batch files, you may not know that the angled bracket (>) means write output of the program into a file. Two angled brackets (>>) means append output of this program into a file. So the command-line DIR > filelist.txt would write the content of the DIR command into a file called filelist.txt. In the example shown above, the concatenated file is called SB_DDS.sql If, like me you place the concatenated file under source code control, then the source code control system will change the file's attribute to "read-only" which in turn would cause the TYPE command to fail. The ATTRIB command can be used to remove the read-only flag. Using SQLCmd to execute the concatenated file Now that the SQL Scripts are all in one big file, we can execute the script against a database using SQLCmd using another batch file as shown below: SQLCmd has numerous options, but the script shown above simply executes the SS_DDS.sql file against the SB_DDS_DB database on the local machine and logs the errors to a file called SB_DDS.log. So after executing the batch file you can simply check the error log to see if your database built without a hitch. If you have errors, then simply fix the source files, re-create the concatenated file and re-run the SQLCmd to rebuild the database. This two click operation allows you to quickly identify and fix errors in your entire database definition.Using SSIS to execute the concatenated file To execute the concatenated SQL script using SSIS, you simply drop an Execute SQL task into your package and set the database connection as normal and then select File Connection as the SQLSourceType (as shown below). Create a file connection to your concatenated SQL script and you are ready to go. Tips and TricksAdd a new-line at end of every fileThe most common problem encountered with this approach is that the GO statement on the last line of one file is placed on the same line as the comment at the top of the next file by the TYPE command. The easy fix to this is to ensure all your files have a new-line at the end.Remove all USE database statementsThe SQLCmd identifies which database the script should be run against. So you should remove all USE database commands from your scripts - otherwise you may get unintentional side effects!!Do the Create Database separatelyIf you are using SSIS to create the database as well as create the objects and populate the database, then invoke the CREATE DATABASE command against the master database using a separate package before calling the package that executes the concatenated SQL script.

Read the article
Export all SSIS packages from msdb using Powershell

- by jamiet

Have you ever wanted to dump all the SSIS packages stored in msdb out to files? Of course you have, who wouldn’t? Right? Well, at least one person does because this was the subject of a thread (save all ssis packages to file) on the SSIS forum earlier today. Some of you may have already figured out a way of doing this but for those that haven’t here is a nifty little script that will do it for you and it uses our favourite jack-of-all tools … Powershell!! Imagine I have the following package folder structure on my Integration Services server (i.e. in [msdb]): There are two packages in there called “20110111 Chaining Expression components” & “Package”, I want to export those two packages into a folder structure that mirrors that in [msdb]. Here is the Powershell script that will do that: Param($SQLInstance = "localhost") #####Add all the SQL goodies (including Invoke-Sqlcmd)##### add-pssnapin sqlserverprovidersnapin100 -ErrorAction SilentlyContinue add-pssnapin sqlservercmdletsnapin100 -ErrorAction SilentlyContinue cls $Packages = Invoke-Sqlcmd -MaxCharLength 10000000 -ServerInstance $SQLInstance -Query "WITH cte AS ( SELECT cast(foldername as varchar(max)) as folderpath, folderid FROM msdb..sysssispackagefolders WHERE parentfolderid = '00000000-0000-0000-0000-000000000000' UNION ALL SELECT cast(c.folderpath + '\' + f.foldername as varchar(max)), f.folderid FROM msdb..sysssispackagefolders f INNER JOIN cte c ON c.folderid = f.parentfolderid ) SELECT c.folderpath,p.name,CAST(CAST(packagedata AS VARBINARY(MAX)) AS VARCHAR(MAX)) as pkg FROM cte c INNER JOIN msdb..sysssispackages p ON c.folderid = p.folderid WHERE c.folderpath NOT LIKE 'Data Collector%'" Foreach ($pkg in $Packages) { $pkgName = $Pkg.name $folderPath = $Pkg.folderpath $fullfolderPath = "c:\temp\$folderPath\" if(!(test-path -path $fullfolderPath)) { mkdir $fullfolderPath | Out-Null } $pkg.pkg | Out-File -Force -encoding ascii -FilePath "$fullfolderPath\$pkgName.dtsx" } To run it simply change the “localhost” parameter of the server you want to connect to either by editing the script or passing it in when the script is executed. It will create the folder structure in C:\Temp (which you can also easily change if you so wish – just edit the script accordingly). Here’s the folder structure that it created for me: Notice how it is a mirror of the folder structure in [msdb]. Hope this is useful! @Jamiet

Read the article
SSIS Denali CTP1 Source Assistant

- by andyleonard

I like the new Data Flow Source Assistant in SSIS Denali. The default view is shown above, with the "Show installed only" checkbox checked. When not checked, the list of Source types changes: In previous versions of SSIS, I rarely created connections in the Connection Managers pane - I usually hit a New button in either a Source or Destination Adapter, or in a task. It was just easier letting the task or adapter pick the proper Connection Manager editor. This is handy and a time-saver. :{>...(read more)

Read the article
Inability to detect the Output from inside a SSIS script component

- by Danaja

In the script of the script components the Output buffer is not being detected as an existing component. I am trying to use the following piece of code Output0Buffer.AddRow(); within the public override void Input0_ProcessInputRow(Input0Buffer Row) method. I know it should be available within this method because at the moment I am copying and using a component from a previous project that has this code and it works. but when I create a new component and put the same code in it doesn't Can any one explain why this is happening?

Read the article
Big Data – Buzz Words: Importance of Relational Database in Big Data World – Day 9 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is HDFS. In this article we will take a quick look at the importance of the Relational Database in Big Data world. A Big Question? Here are a few questions I often received since the beginning of the Big Data Series - Does the relational database have no space in the story of the Big Data? Does relational database is no longer relevant as Big Data is evolving? Is relational database not capable to handle Big Data? Is it true that one no longer has to learn about relational data if Big Data is the final destination? Well, every single time when I hear that one person wants to learn about Big Data and is no longer interested in learning about relational database, I find it as a bit far stretched. I am not here to give ambiguous answers of It Depends. I am personally very clear that one who is aspiring to become Big Data Scientist or Big Data Expert they should learn about relational database. NoSQL Movement The reason for the NoSQL Movement in recent time was because of the two important advantages of the NoSQL databases. Performance Flexible Schema In personal experience I have found that when I use NoSQL I have found both of the above listed advantages when I use NoSQL database. There are instances when I found relational database too much restrictive when my data is unstructured as well as they have in the datatype which my Relational Database does not support. It is the same case when I have found that NoSQL solution performing much better than relational databases. I must say that I am a big fan of NoSQL solutions in the recent times but I have also seen occasions and situations where relational database is still perfect fit even though the database is growing increasingly as well have all the symptoms of the big data. Situations in Relational Database Outperforms Adhoc reporting is the one of the most common scenarios where NoSQL is does not have optimal solution. For example reporting queries often needs to aggregate based on the columns which are not indexed as well are built while the report is running, in this kind of scenario NoSQL databases (document database stores, distributed key value stores) database often does not perform well. In the case of the ad-hoc reporting I have often found it is much easier to work with relational databases. SQL is the most popular computer language of all the time. I have been using it for almost over 10 years and I feel that I will be using it for a long time in future. There are plenty of the tools, connectors and awareness of the SQL language in the industry. Pretty much every programming language has a written drivers for the SQL language and most of the developers have learned this language during their school/college time. In many cases, writing query based on SQL is much easier than writing queries in NoSQL supported languages. I believe this is the current situation but in the future this situation can reverse when No SQL query languages are equally popular. ACID (Atomicity Consistency Isolation Durability) – Not all the NoSQL solutions offers ACID compliant language. There are always situations (for example banking transactions, eCommerce shopping carts etc.) where if there is no ACID the operations can be invalid as well database integrity can be at risk. Even though the data volume indeed qualify as a Big Data there are always operations in the application which absolutely needs ACID compliance matured language. The Mixed Bag I have often heard argument that all the big social media sites now a days have moved away from Relational Database. Actually this is not entirely true. While researching about Big Data and Relational Database, I have found that many of the popular social media sites uses Big Data solutions along with Relational Database. Many are using relational databases to deliver the results to end user on the run time and many still uses a relational database as their major backbone. Here are a few examples: Facebook uses MySQL to display the timeline. (Reference Link) Twitter uses MySQL. (Reference Link) Tumblr uses Sharded MySQL (Reference Link) Wikipedia uses MySQL for data storage. (Reference Link) There are many for prominent organizations which are running large scale applications uses relational database along with various Big Data frameworks to satisfy their various business needs. Summary I believe that RDBMS is like a vanilla ice cream. Everybody loves it and everybody has it. NoSQL and other solutions are like chocolate ice cream or custom ice cream – there is a huge base which loves them and wants them but not every ice cream maker can make it just right for everyone’s taste. No matter how fancy an ice cream store is there is always plain vanilla ice cream available there. Just like the same, there are always cases and situations in the Big Data’s story where traditional relational database is the part of the whole story. In the real world scenarios there will be always the case when there will be need of the relational database concepts and its ideology. It is extremely important to accept relational database as one of the key components of the Big Data instead of treating it as a substandard technology. Ray of Hope – NewSQL In this module we discussed that there are places where we need ACID compliance from our Big Data application and NoSQL will not support that out of box. There is a new termed coined for the application/tool which supports most of the properties of the traditional RDBMS and supports Big Data infrastructure – NewSQL. Tomorrow In tomorrow’s blog post we will discuss about NewSQL. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
To sample or not to sample...

- by [email protected]

Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data. In data mining, we focus on the use of data, that is data that has already been collected. In some cases, we may have all the data (all purchases made by all customers), in others the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size. Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing. Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets. In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.

Read the article
Importing data from text file to specific columns using BULK INSERT

- by Dinesh Asanka

Bulk insert is much faster than using other techniques such as SSIS. However, when you are using bulk insert you can’t insert to specific columns. If, for example, there are five columns in a table you should have five values for each record in the text file you are importing from. This is an issue when you are expecting default values to be inserted into tables. Let us say you have table as below: In this table, you are expecting ID, Status and CreatedDate to be updated automatically, so your text file may only have FirstName LastName values as below: Dinesh,Asanka Saman,Liyanage Ruwan,Silva Susantha,Bathige Jude,Peires Sanjeewa,Jayawickrama If you use bulk insert to this table like follows, You will be returned an error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (ID). To avoid this you will need to create a view with the columns you are expecting to fill and use bulk insert against it. If you check the table now, you will see table with values in the text file and the default values.

Read the article
Kipróbálható az ingyenes új Oracle Data Miner 11gR2 grafikus workflow-val

- by Fekete Zoltán

Oracle Data Mining technológiai információs oldal. Oracle Data Miner 11g Release 2 - Early Adopter oldal. Megjelent, letöltheto és kipróbálható az Oracle Data Mining, az Oracle adatbányászat új grafikus felülete, az Oracle Data Miner 11gR2. Az Oracle Data Minerhez egyszeruen az SQL Developer-t kell letöltenünk, mivel az adatbányászati felület abból indítható. Az Oracle Data Mining az Oracle adatbáziskezelobe ágyazott adatbányászati motor, ami az Oracle Database Enterprise Edition opciója. Az adatbányászat az adattárházak elemzésének kifinomult eszköze és folyamata. Az Oracle Data Mining in-database-mining elonyeit felvonultatja: - nincs felesleges adatmozgatás, a teljes adatbányászati folyamatban az adatbázisban maradnak az adatok - az adatbányászati modellek is az Oracle adatbázisban vannak - az adatbányászati eredmények, cluster adatok, döntések, valószínuségek, stb. szintén az adatbázisban keletkeznek, és ott közvetlenül elemezhetoek Az új ingyenes Data Miner felület "hatalmas gazdagodáson" ment keresztül az elozo verzióhoz képest. - grafikus adatbányászati workflow szerkesztés és futtatás jelent meg! - továbbra is ingyenes - kibovült a felület - új elemzési lehetoségekkel bovült - az SQL Developer 3.0 felületrol indítható, ez megkönnyíti az adatbányászati funkciók meghívását az adatbázisból, ha épp nem a grafikus felületetet szeretnénk erre használni Az ingyenes Data Miner felület az Oracle SQL Developer kiterjesztéseként érheto el, így az elemzok közvetlenül dolgozhatnak az adatokkal az adatbázisban és a Data Miner grafikus felülettel is, építhetnek és kiértékelhetnek, futtathatnak modelleket, predikciókat tehetnek és elemezhetnek, támogatást kapva az adatbányászati módszertan megvalósítására. A korábbi Oracle Data Miner felület a Data Miner Classic néven fut és továbbra is letöltheto az OTN-rol. Az új Data Miner GUI-ból egy képernyokép: Milyen feladatokra ad megoldási lehetoséget az Oracle Data Mining: - ügyfél viselkedés megjövendölése, prediktálása - a "legjobb" ügyfelek eredményes megcélzása - ügyfél megtartás, elvándorlás kezelés (churn) - ügyfél szegmensek, klaszterek, profilok keresése és vizsgálata - anomáliák, visszaélések felderítése - stb.

Read the article
The SSIS tuning tip that everyone misses

- by Rob Farley

I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

Read the article
Is SSIS able to query flat files from another Windows Server?

- by atricapilla

I pretty new SQL Server Integration Server (SSIS) user. Is SSIS able to query data from text files located in another Windows Server? I mean that when SSIS is installed on Windwos Server A, is SSIS able to query data from e.g. one folder containing text files in Windows Server B (under same domain)? I have used only SAP BO Data Integrator ETL tool and it cannot query flat files from another Server: during execution, all files must be located on the Job Server machine that executes the job.

Read the article
SSIS Script Component Testing Strategy

- by Paul Kohler

This question is in respect to the script component specifically. I am aware of ssisUnit etc… With simple SSIS Scripts Components, it’s sufficient to let basic testing flesh out issues, however I am working with a script that has grown in complexity over time. To better test the functionality I am considering abstracting the script logic into a DLL that gets deployed with the package, and then use the custom component in the script. The advantage is that the function will be more testable etc but it’s one more deployment artefact that needs to be managed. My question is, does anyone know of a better way to test such an SSIS script in a more isolated manner than to run the whole package and examine the output?

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 4/2351 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Pinal Dave

- by jamiet

- by jamiet

- by CoffeeAddict

- by Pinal Dave

- by jamiet

- by Davide Mauri

- by Peter Larsson

- by jamiet

- by jamiet

- by jamiet

- by Davide Mauri

- by DrJohn

- by jamiet

- by andyleonard

- by Danaja

- by Pinal Dave

- by [email protected]

- by Dinesh Asanka

- by Fekete Zoltán

- by Rob Farley

- by atricapilla

- by Paul Kohler

- by jamiet

- by jamiet

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >