Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 5/2351 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • SSIS – Delete all files except for the most recent one

    - by jorg
    Quite often one or more sources for a data warehouse consist of flat files. Most of the times these files are delivered as a zip file with a date in the file name, for example FinanceDataExport_20100528.zip Currently I work at a project that does a full load into the data warehouse every night. A zip file with some flat files in it is dropped in a directory on a daily basis. Sometimes there are multiple zip files in the directory, this can happen because the ETL failed or somebody puts a new zip file in the directory manually. Because the ETL isn’t incremental only the most recent file needs to be loaded. To implement this I used the simple code below; it checks which file is the most recent and deletes all other files. Note: In a previous blog post I wrote about unzipping zip files within SSIS, you might also find this useful: SSIS – Unpack a ZIP file with the Script Task Public Sub Main() 'Use this piece of code to loop through a set of files in a directory 'and delete all files except for the most recent one based on a date in the filename. 'File name example: 'DataExport_20100413.zip Dim rootDirectory As New DirectoryInfo(Dts.Variables("DirectoryFromSsisVariable").Value.ToString) Dim mostRecentFile As String = "" Dim currentFileDate As Integer Dim mostRecentFileDate As Integer = 0 'Check which file is the most recent For Each fi As FileInfo In rootDirectory.GetFiles("*.zip") currentFileDate = CInt(Left(Right(fi.Name, 12), 8)) 'Get date from current filename (based on a file that ends with: YYYYMMDD.zip) If currentFileDate > mostRecentFileDate Then mostRecentFileDate = currentFileDate mostRecentFile = fi.Name End If Next 'Delete all files except the most recent one For Each fi As FileInfo In rootDirectory.GetFiles("*.zip") If fi.Name <> mostRecentFile Then File.Delete(rootDirectory.ToString + "\" + fi.Name) End If Next Dts.TaskResult = ScriptResults.Success End Sub Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • SSIS Prehistory video

    - by jamiet
    I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • SSIS Prehistory video

    - by jamiet
    I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Tale of an Encrypted SSIS Package in msdb and a Lost Password

    - by Argenis
      Yesterday a Developer at work asked for a copy of an SSIS package in Production so he could work on it (please, dear Reader – withhold judgment on Source Control – I know!). I logged on to the SSIS instance, and when I went to export the package… Oops. I didn’t have that password. The DBA who uploaded the package to Production is long gone; my fellow DBA had no idea either - and the Devs returned a cricket sound when queried. So I posed the obligatory question on #SQLHelp and a bunch of folks jumped in – some to help and some to make fun of me (thanks, @SQLSoldier @crummel4 @maryarcia and @sqljoe). I tried their suggestions to no avail…even ran some queries to see if I could figure out how to extract the package XML from the system tables in msdb:   SELECT CAST(CAST(p.packagedata AS varbinary(max)) AS varchar(max)) FROM msdb.dbo.sysssispackages p WHERE p.name = 'LePackage'   This just returned a bunch of XML with encrypted data on it:  I knew there was a job in SQL Agent scheduled to execute the package, and when I tried to look at details on the job step I got the following: Not very helpful. The password had to be saved somewhere, but where?? All of a sudden I remembered that there was a system table I hadn’t queried yet: SELECT sjs.command FROM msdb.dbo.sysjobs sj JOIN msdb.dbo.sysjobsteps sjs ON sj.job_id = sjs.job_id WHERE sj.name = 'Run LePackage' The result: “Well, that’s really secure”, I thought to myself. Cheers, -Argenis

    Read the article

  • Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words -  Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • timetable in a jTable

    - by chandra
    I want to create a timetable in a jTable. For the top row it will display from monday to sunday and the left colume will display the time of the day with 2h interval e.g 1st colume (0000 - 0200), 2nd colume (0200 - 0400) .... And if i click a button the timing will change from 2h interval to 1h interval. I do not want to hardcode it because i need to do for 2h, 1h, 30min , 15min, 1min, 30sec and 1 sec interval and it will take too long for me to hardcode. Can anyone show me an example or help me create an example for the 2h to 1h interval so that i know what to do? The data array is for me to store data and are there any other easier or shortcuts for me to store them because if it is in 1 sec interval i got thousands of array i need to type it out. private void oneHour() //1 interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0100", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0100 - 0200", data[2][0], data[2][1], data[2][2], data[2][3], data[2][4], data[2][5], data[2][6]}, {"0200 - 0300", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0300 - 0400", data[6][0], data[6][1], data[6][2], data[6][3], data[6][4], data[6][5], data[6][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0700", data[10][0], data[4][1], data[10][2], data[10][3], data[10][4], data[10][5], data[10][6]}, {"0700 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 0900", data[14][0], data[14][1], data[14][2], data[14][3], data[14][4], data[14][5], data[14][6]}, {"0900 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1100", data[18][0], data[18][1], data[18][2], data[18][3], data[18][4], data[18][5], data[18][6]}, {"1100 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1300", data[22][0], data[22][1], data[22][2], data[22][3], data[22][4], data[22][5], data[22][6]}, {"1300 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1500", data[26][0], data[26][1], data[26][2], data[26][3], data[26][4], data[26][5], data[26][6]}, {"1500 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1700", data[30][0], data[30][1], data[30][2], data[30][3], data[30][4], data[30][5], data[30][6]}, {"1700 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 1900", data[34][0], data[34][1], data[34][2], data[34][3], data[34][4], data[34][5], data[34][6]}, {"1900 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2100", data[38][0], data[38][1], data[38][2], data[38][3], data[38][4], data[38][5], data[38][6]}, {"2100 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2300", data[42][0], data[42][1], data[42][2], data[42][3], data[42][4], data[42][5], data[42][6]}, {"2300 - 2400", data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]}, {"2400 - 0000", data[46][0], data[46][1], data[46][2], data[46][3], data[46][4], data[46][5], data[46][6]}, }, new String [] { "Time/Day", "(Mon)", "(Tue)", "(Wed)", "(Thurs)", "(Fri)", "(Sat)", "(Sun)" } )); } private void twoHour() //2 hour interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0200", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0200 - 0400", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2400",data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]} },

    Read the article

  • Running SSIS packages from C#

    - by Piotr Rodak
    Most of the developers and DBAs know about two ways of deploying packages: You can deploy them to database server and run them using SQL Server Agent job or you can deploy the packages to file system and run them using dtexec.exe utility. Both approaches have their pros and cons. However I would like to show you that there is a third way (sort of) that is often overlooked, and it can give you capabilities the ‘traditional’ approaches can’t. I have been working for a few years with applications that run packages from host applications that are implemented in .NET. As you know, SSIS provides programming model that you can use to implement more flexible solutions. SSIS applications are usually thought to be batch oriented, with fairly rigid architecture and processing model, with fixed timeframes when the packages are executed to process data. It doesn’t to be the case, you don’t have to limit yourself to batch oriented architecture. I have very good experiences with service oriented architectures processing large amounts of data. These applications are more complex than what I would like to show here, but the principle stays the same: you can execute packages as a service, on ad-hoc basis. You can also implement and schedule various signals, HTTP calls, file drops, time schedules, Tibco messages and other to run the packages. You can implement event handler that will trigger execution of SSIS when a certain event occurs in StreamInsight stream. This post is just a small example of how you can use the API and other features to create a service that can run SSIS packages on demand. I thought it might be a good idea to implement a restful service that would listen to requests and execute appropriate actions. As it turns out, it is trivial in C#. The application is implemented as console application for the ease of debugging and running. In reality, you might want to implement the application as Windows service. To begin, you have to reference namespace System.ServiceModel.Web and then add a few lines of code: Uri baseAddress = new Uri("http://localhost:8011/");               WebServiceHost svcHost = new WebServiceHost(typeof(PackRunner), baseAddress);                           try             {                 svcHost.Open();                   Console.WriteLine("Service is running");                 Console.WriteLine("Press enter to stop the service.");                 Console.ReadLine();                   svcHost.Close();             }             catch (CommunicationException cex)             {                 Console.WriteLine("An exception occurred: {0}", cex.Message);                 svcHost.Abort();             } The interesting lines are 3, 7 and 13. In line 3 you create a WebServiceHost object. In line 7 you start listening on the defined URL and then in line 13 you shut down the service. As you have noticed, the WebServiceHost constructor is accepting type of an object (here: PackRunner) that will be instantiated as singleton and subsequently used to process the requests. This is the class where you put your logic, but to tell WebServiceHost how to use it, the class must implement an interface which declares methods to be used by the host. The interface itself must be ornamented with attribute ServiceContract. [ServiceContract]     public interface IPackRunner     {         [OperationContract]         [WebGet(UriTemplate = "runpack?package={name}")]         string RunPackage1(string name);           [OperationContract]         [WebGet(UriTemplate = "runpackwithparams?package={name}&rows={rows}")]         string RunPackage2(string name, int rows);     } Each method that is going to be used by WebServiceHost has to have attribute OperationContract, as well as WebGet or WebInvoke attribute. The detailed discussion of the available options is outside of scope of this post. I also recommend using more descriptive names to methods . Then, you have to provide the implementation of the interface: public class PackRunner : IPackRunner     {         ... There are two methods defined in this class. I think that since the full code is attached to the post, I will show only the more interesting method, the RunPackage2.   /// <summary> /// Runs package and sets some of its variables. /// </summary> /// <param name="name">Name of the package</param> /// <param name="rows">Number of rows to export</param> /// <returns></returns> public string RunPackage2(string name, int rows) {     try     {         string pkgLocation = ConfigurationManager.AppSettings["PackagePath"];           pkgLocation = Path.Combine(pkgLocation, name.Replace("\"", ""));           Console.WriteLine();         Console.WriteLine("Calling package {0} with parameter {1}.", name, rows);                  Application app = new Application();         Package pkg = app.LoadPackage(pkgLocation, null);           pkg.Variables["User::ExportRows"].Value = rows;         DTSExecResult pkgResults = pkg.Execute();         Console.WriteLine();         Console.WriteLine(pkgResults.ToString());         if (pkgResults == DTSExecResult.Failure)         {             Console.WriteLine();             Console.WriteLine("Errors occured during execution of the package:");             foreach (DtsError er in pkg.Errors)                 Console.WriteLine("{0}: {1}", er.ErrorCode, er.Description);             Console.WriteLine();             return "Errors occured during execution. Contact your support.";         }                  Console.WriteLine();         Console.WriteLine();         return "OK";     }     catch (Exception ex)     {         Console.WriteLine(ex);         return ex.ToString();     } }   The method accepts package name and number of rows to export. The packages are deployed to the file system. The path to the packages is configured in the application configuration file. This way, you can implement multiple services on the same machine, provided you also configure the URL for each instance appropriately. To run a package, you have to reference Microsoft.SqlServer.Dts.Runtime namespace. This namespace is implemented in Microsoft.SQLServer.ManagedDTS.dll which in my case was installed in the folder “C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies”. Once you have done it, you can create an instance of Microsoft.SqlServer.Dts.Runtime.Application as in line 18 in the above snippet. It may be a good idea to create the Application object in the constructor of the PackRunner class, to avoid necessity of recreating it each time the service is invoked. Then, in line 19 you see that an instance of Microsoft.SqlServer.Dts.Runtime.Package is created. The method LoadPackage in its simplest form just takes package file name as the first parameter. Before you run the package, you can set its variables to certain values. This is a great way of configuring your packages without all the hassle with dtsConfig files. In the above code sample, variable “User:ExportRows” is set to value of the parameter “rows” of the method. Eventually, you execute the package. The method doesn’t throw exceptions, you have to test the result of execution yourself. If the execution wasn’t successful, you can examine collection of errors exposed by the package. These are the familiar errors you often see during development and debugging of the package. I you run the package from the code, you have opportunity to persist them or log them using your favourite logging framework. The package itself is very simple; it connects to my AdventureWorks database and saves number of rows specified in variable “User::ExportRows” to a file. You should know that before you run the package, you can change its connection strings, logging, events and many more. I attach solution with the test service, as well as a project with two test packages. To test the service, you have to run it and wait for the message saying that the host is started. Then, just type (or copy and paste) the below command to your browser. http://localhost:8011/runpackwithparams?package=%22ExportEmployees.dtsx%22&rows=12 When everything works fine, and you modified the package to point to your AdventureWorks database, you should see "OK” wrapped in xml: I stopped the database service to simulate invalid connection string situation. The output of the request is different now: And the service console window shows more information: As you see, implementing service oriented ETL framework is not a very difficult task. You have ability to configure the packages before you run them, you can implement logging that is consistent with the rest of your system. In application I have worked with we also have resource monitoring and execution control. We don’t allow to run more than certain number of packages to run simultaneously. This ensures we don’t strain the server and we use memory and CPUs efficiently. The attached zip file contains two projects. One is the package runner. It has to be executed with administrative privileges as it registers HTTP namespace. The other project contains two simple packages. This is really a cool thing, you should check it out!

    Read the article

  • Oracle Data Mining a Star Schema: Telco Churn Case Study

    - by charlie.berger
    There is a complete and detailed Telco Churn case study "How to" Blog Series just posted by Ari Mozes, ODM Dev. Manager.  In it, Ari provides detailed guidance in how to leverage various strengths of Oracle Data Mining including the ability to: mine Star Schemas and join tables and views together to obtain a complete 360 degree view of a customer combine transactional data e.g. call record detail (CDR) data, etc. define complex data transformation, model build and model deploy analytical methodologies inside the Database  His blog is posted in a multi-part series.  Below are some opening excerpts for the first 3 blog entries.  This is an excellent resource for any novice to skilled data miner who wants to gain competitive advantage by mining their data inside the Oracle Database.  Many thanks Ari! Mining a Star Schema: Telco Churn Case Study (1 of 3) One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort.  Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns.  While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns.  Both of these aspects - demographics and usage patterns - can provide insight into behavior.Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base.  One case study1 describes a telecommunications scenario involving understanding, and identification of, churn, where the underlying data is present in a star schema.  That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts...... Mining a Star Schema: Telco Churn Case Study (2 of 3) This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data.  Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.1) Handling missing values for call data recordsThe CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff).  For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer.  This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values.  Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem.  In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero.  In the other situations, it is not appropriate to assume zero, but rather derive some representative value to replace the missing entries.  The referenced case study takes the latter approach.  The data is segmented by customer and call type, and within a given customer-call type combination, an average number of minutes is computed and used as a replacement value.In SQL, we need to generate additional rows for the missing entries and populate those rows with appropriate values.  To generate the missing rows, Oracle's partition outer join feature is a perfect fit.  select cust_id, cdre.tariff, cdre.month, minsfrom cdr_t cdr partition by (cust_id) right outer join     (select distinct tariff, month from cdr_t) cdre     on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);   ....... Mining a Star Schema: Telco Churn Case Study (3 of 3) Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups).  To reduce the data to a single segment, a filter can be applied: create or replace view churn_data_high asselect * from churn_prep where value_band = 'HIGH'; It is simple to take a quick look at the predictive aspects of the data on a univariate basis.  While this does not capture the more complex multi-variate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps.  Oracle Data Mining includes a predictive analytics package which enables quick analysis. begin  dbms_predictive_analytics.explain(   'churn_data_high','churn_m6','expl_churn_tab'); end; /select * from expl_churn_tab where rank <= 5 order by rank; ATTRIBUTE_NAME       ATTRIBUTE_SUBNAME EXPLANATORY_VALUE RANK-------------------- ----------------- ----------------- ----------LOS_BAND                                      .069167052          1MINS_PER_TARIFF_MON  PEAK-5                   .034881648          2REV_PER_MON          REV-5                    .034527798          3DROPPED_CALLS                                 .028110322          4MINS_PER_TARIFF_MON  PEAK-4                   .024698149          5From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0).  The strongest uni-variate predictor of churn appears to be the customer's (binned) length of service.  The second strongest churn indicator appears to be the number of peak minutes used in the most recent month.  The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post.  By using the object relational approach, many related predictors are included within a single top-level column. .....   NOTE:  These are just EXCERPTS.  Click here to start reading the Oracle Data Mining a Star Schema: Telco Churn Case Study from the beginning.    

    Read the article

  • SSIS Design Pattern: Loading Variable-Length Rows

    - by andyleonard
    Introduction I encounter flat file sources with variable-length rows on occassion. Here, I supply one SSIS Design Pattern for loading them. What's a Variable-Length Row Flat File? Great question - let's start with a definition. A variable-length row flat file is a text source of some flavor - comma-separated values (CSV), tab-delimited file (TDF), or even fixed-length, positional-, or ordinal-based (where the location of the data on the row defines its field). The major difference between a "normal"...(read more)

    Read the article

  • Getting Dynamic in SSIS Queries

    - by ejohnson2010
    When you start working with SQL Server and SSIS, it isn’t long before you find yourself wishing you could change bits of SQL queries dynamically. Most commonly, I see people that want to change the date portion of a query so that you can limit your query to the last 30 days, for example. This can be done using a combination of expressions and variables. I will do this in two parts, first I will build a variable that will always contain the 1 st day of the previous month and then I will dynamically...(read more)

    Read the article

  • March 2012 - SSIS Training in London!

    - by andyleonard
    I am honored to announce I will be delivering From Zero To SSIS! in London, England 5-9 Mar 2012. This course is delivered in cooperation with my friends at TechniTrain who provide awesome training by talented technologists like Chris Webb ( Blog ), Gavin Payne ( Blog ), and Christian Bolton ( Blog ). This opportunity grew out of conversations at SQLBits 9 in Liverpool in September 2011. I had an awesome time at SQLBits and encourage everyone to attend the conference if you have the opportunity to...(read more)

    Read the article

  • SSIS and Parallelism: The Unseen Minions

    Sometimes, a procedural database process cannot easily be reduced to a set-based algorithm in order to reduce the time it takes. Then, you have to find other ways to parallelise it. Other ways? Josef shows how to use SSIS to drastically reduce the time that such a process takes.

    Read the article

  • SSIS Dashboard 0.5.2 and Live Demo Website

    - by Davide Mauri
    In the last days I’ve worked again on the SQL Server Integration Service Dashboard and I did some updates: Beta Added support for "*" wildcard in project names. Now you can filter a specific project name using an url like: http://<yourserver>/project/MyPro* Added initial support for Package Execution History. Just click on a package name and you'll see its latest 15 executions and I’ve also created a live demo website for all those who want to give it a try before downloading and using it: http://ssis-dashboard.azurewebsites.net/

    Read the article

  • SSIS Basics: Using the Merge Join Transformation

    SSIS is able to take sorted data from more than one OLE DB data source and merge them into one table which can then be sent to an OLE DB destination. This 'Merge Join' transformation works in a similar way to a SQL join by specifying a 'join key' relationship. this transformation can save a great deal of processing on the destination. Annette Allen, as usual, gives clear guidance on how to do it.

    Read the article

  • SQL Rally Pre-Con: Data Warehouse Modeling – Making the Right Choices

    - by Davide Mauri
    As you may have already learned from my old post or Adam’s or Kalen’s posts, there will be two SQL Rally in North Europe. In the Stockholm SQL Rally, with my friend Thomas Kejser, I’ll be delivering a pre-con on Data Warehouse Modeling: Data warehouses play a central role in any BI solution. It's the back end upon which everything in years to come will be created. For this reason, it must be rock solid and yet flexible at the same time. To develop such a data warehouse, you must have a clear idea of its architecture, a thorough understanding of the concepts of Measures and Dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. In this workshop, Thomas Kejser and Davide Mauri will share all the information they learned since they started working with data warehouses, giving you the guidance and tips you need to start your BI project in the best way possible?avoiding errors, making implementation effective and efficient, paving the way for a winning Agile approach, and helping you define how your team should work so that your BI solution will stand the test of time. You'll learn: Data warehouse architecture and justification Agile methodology Dimensional modeling, including Kimball vs. Inmon, SCD1/SCD2/SCD3, Junk and Degenerate Dimensions, and Huge Dimensions Best practices, naming conventions, and lessons learned Loading the data warehouse, including loading Dimensions, loading Facts (Full Load, Incremental Load, Partitioned Load) Data warehouses and Big Data (Hadoop) Unit testing Tracking historical changes and managing large sizes With all the Self-Service BI hype, Data Warehouse is become more and more central every day, since if everyone will be able to analyze data using self-service tools, it’s better for him/her to rely on correct, uniform and coherent data. Already 50 people registered from the workshop and seats are limited so don’t miss this unique opportunity to attend to this workshop that is really a unique combination of years and years of experience! http://www.sqlpass.org/sqlrally/2013/nordic/Agenda/PreconferenceSeminars.aspx See you there!

    Read the article

  • Enforce SSIS naming conventions using BI-xPress

    - by jamiet
    A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records, here is a sample of some of those suggestions: If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assess them according to some rules that you specify. In addition Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006 You can download this collection however I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff. @Jamiet

    Read the article

  • Oracle Financial Analytics for SAP Certified with Oracle Data Integrator EE

    - by denis.gray
    Two days ago Oracle announced the release of Oracle Financial Analytics for SAP.  With the amount of press this has garnered in the past two days, there's a key detail that can't be missed.  This release is certified with Oracle Data Integrator EE - now making the combination of Data Integration and Business Intelligence a force to contend with.  Within the Oracle Press Release there were two important bullets: ·         Oracle Financial Analytics for SAP includes a pre-packaged ABAP code compliant adapter and is certified with Oracle Data Integrator Enterprise Edition to integrate SAP Financial Accounting data directly with the analytic application.  ·         Helping to integrate SAP financial data and disparate third-party data sources is Oracle Data Integrator Enterprise Edition which delivers fast, efficient loading and transformation of timely data into a data warehouse environment through its high-performance Extract Load and Transform (E-LT) technology. This is very exciting news, demonstrating Oracle's overall commitment to Oracle Data Integrator EE.   This is a great way to start off the new year and we look forward to building on this momentum throughout 2011.   The following links contain additional information and media responses about the Oracle Financial Analytics for SAP release. IDG News Service (Also appeared in PC World, Computer World, CIO: "Oracle is moving further into rival SAP's turf with Oracle Financial Analytics for SAP, a new BI (business intelligence) application that can crunch ERP (enterprise resource planning) system financial data for insights." Information Week: "Oracle talks a good game about the appeal of an optimized, all-Oracle stack. But the company also recognizes that we live in a predominantly heterogeneous IT world" CRN: "While some businesses with SAP Financial Accounting already use Oracle BI, those integrations had to be custom developed. The new offering provides pre-built integration capabilities." ECRM Guide:  "Among other features, Oracle Financial Analytics for SAP helps front-line managers improve financial performance and decision-making with what the company says is comprehensive, timely and role-based information on their departments' expenses and revenue contributions."   SAP Getting Started Guide for ODI on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/learnmore/index.html For more information on the ODI and its SAP connectivity please review the Oracle® Fusion Middleware Application Adapters Guide for Oracle Data Integrator11g Release 1 (11.1.1)

    Read the article

  • SSIS - The expression cannot be parsed because it contains invalid elements at the location specifie

    - by simonsabin
    If you get the following error when trying to write an expression there is an easy solution Attempt to parse the expression "@[User::FilePath] + "\" + @[User::FileName] + ".raw"" failed.  The token "." at line number "1", character number "<some position>" was not recognized. The expression cannot be parsed because it contains invalid elements at the location specified. The SSIS expression language is a C based language and the \ is a token, this means you have to escape it with another one. i.e "\" becomes "\\", unlike C# you can't prefix the string with a @, you have to use the escaping route. In summary when ever you want to use \ you need to use two \\

    Read the article

  • How hard it will be for the programmer to learn MS SSRS adn SSIS [closed]

    - by user75380
    I have a programming background in php/python/java for 5 years and I know MySQL and PostgreSQL. Currently in our company the MSQL Business Intelligence person is leaving his job in 4 months. I am thinking of trying to go to his place, at least try as I want to move in Business Intelligence field in SSRS and SSIS. I just want to know that is it possible for me to get my head around those things in 4 months because I have no idea how they work and how hard it will be for me to pick up those things. Can I do that? I just want to know from experienced people if I can move towards that field? At least how should I start? In my area there are shortages of person, so once I know the stuff I can get into junior jobs easily but I want to know from experienced people.

    Read the article

  • SSIS is Case-Sensitive

    - by andyleonard
    Introduction SSIS is case-sensitive even if the database is case-insensitive. Imagine... ... you work in an ETL shop where someone who believes in natural keys won the Battle of the Joins. Imagine one of your natural keys is a string. (I know it's a stretch... play along!). Let's build some tables to sketch it out. If you do not have a TestDB database, why not? Build one! You'll use it often. Use TestDB go Create Table SSIS1 ( StrID char ( 5 ) , Name varchar ( 15 ) , Value int ) Insert Into SSIS1...(read more)

    Read the article

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • SSIS - Range lookups

    - by Repieter
      When developing an ETL solution in SSIS we sometimes need to do range lookups in SSIS. Several solutions for this can be found on the internet, but now we have built another solution which I would like to share, since it's pretty easy to implement and the performance is fast.   You can download the sample package to see how it works. Make sure you have the AdventureWorks2008R2 and AdventureWorksDW2008R2 databases installed. (Apologies for the layout of this blog, I don't do this too often :))   To give a little bit more information about the example, this is basically what is does: we load a facttable and do an SCD type 2 lookup operation of the Product dimension. This is done with a script component.   First we query the Data warehouse to create the lookup dataset. The query that is used for that is:   SELECT     [ProductKey]     ,[ProductAlternateKey]     ,[StartDate]     ,ISNULL([EndDate], '9999-01-01') AS EndDate FROM [DimProduct]     The output of this query is stored in a DataTable:     string lookupQuery = @"                         SELECT                             [ProductKey]                             ,[ProductAlternateKey]                             ,[StartDate]                             ,ISNULL([EndDate], '9999-01-01') AS EndDate                         FROM [DimProduct]";           OleDbCommand oleDbCommand = new OleDbCommand(lookupQuery, _oleDbConnection);         OleDbDataAdapter adapter = new OleDbDataAdapter(oleDbCommand);           _dataTable = new DataTable();         adapter.Fill(_dataTable);     Now that the dimension data is stored in the DataTable we use the following method to do the actual lookup:   public int RangeLookup(string businessKey, DateTime lookupDate)     {         // set default return value (Unknown)         int result = -1;           DataRow[] filteredRows;         filteredRows = _dataTable.Select(string.Format("ProductAlternateKey = '{0}'", businessKey));           for (int i = 0; i < filteredRows.Length; i++)         {             // check if the lookupdate is found between the startdate and enddate of any of the records             if (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])             {                 result = (filteredRows[i][0] == null) ? -1 : (int)filteredRows[i][0];                 break;             }         }           filteredRows = null;           return result;     }       This method is executed for every row that passes the script component. This is implemented in the ProcessInputRow method   public override void Input0_ProcessInputRow(Input0Buffer Row)     {         // Perform the lookup operation on the current row and put the value in the Surrogate Key Attribute         Row.ProductKey = RangeLookup(Row.ProductNumber, Row.OrderDate);     }   Now what actually happens?!   1. Every record passes the business key and the orderdate to the RangeLookup method. 2. The DataTable is then filtered on the business key of the current record. The output is stored in a DataRow [] object. 3. We loop over the DataRow[] object to see where the orderdate meets the following expression: (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3]) 4. When the expression returns true (so where the data is between the Startdate and the EndDate), the surrogate key of the dimension record is returned   We have done some testing with this solution and it works great for us. Hope others can use this example to do their range lookups.

    Read the article

  • Inequality joins, Asynchronous transformations and Lookups : SSIS

    - by jamiet
    It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equivalent to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate] but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to lookup [Customer].[CustomerId]. The logic will be: Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate] The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long. View on Vimeo  For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took 36 seconds whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip Comments are welcome as always. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • What is the definition of "Big Data"?

    - by Ben
    Is there one? All the definitions I can find describe the size, complexity / variety or velocity of the data. Wikipedia's definition is the only one I've found with an actual number Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big. IBM despite saying that: Big data is more simply than a matter of size. have emphasised size in their definition. O'Reilly has stressed "volume, velocity and variety" as well. Though explained well, and in more depth, the definition seems to be a re-hash of the others - or vice-versa of course. I think that a Computer Weekly article title sums up a number of articles fairly well "What is big data and how can it be used to gain competitive advantage". But ZDNet wins with the following from 2012: “Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market... If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that. Basically "big data" is "big" in some way shape or form. What is "big"? Is it quantifiable at the current time? If "big" is unquantifiable is there a definition that does not rely solely on generalities?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >