Search Results

Search found 3720 results on 149 pages for 'sam se'.

Page 6/149 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

Mac OS X behind OpenLDAP and Samba

- by Sam Hammamy

I have been battling for a week now to get my Mac (Mountain Lion) to authenticate on my home network's OpenLDAP and Samba. From several sources, like the Ubuntu community docs, and other blogs, and after a hell of a lot of trial and error and piecing things together, I have created a samba.ldif that will pass the smbldap-populate when combined with apple.ldif and I have a fully functional OpenLDAP server and a Samba PDC that uses LDAP to authenticate the OS X Machine. The problem is that when I login, the home directory is not created or pulled from the server. I get the following in system.log Sep 21 06:09:15 Sams-MacBook-Pro.local SecurityAgent[265]: User info context values set for sam Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Got user: sam Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Got ruser: (null) Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Got service: authorization Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in od_principal_for_user(): no authauth availale for user. Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in od_principal_for_user(): failed: 7 Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Failed to determine Kerberos principal name. Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Done cleanup3 Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): Kerberos 5 refuses you Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_authenticate(): pam_sm_authenticate: ntlm Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_acct_mgmt(): OpenDirectory - Membership cache TTL set to 1800. Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in od_record_check_pwpolicy(): retval: 0 Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_setcred(): Establishing credentials Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_setcred(): Got user: sam Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_setcred(): Context initialised Sep 21 06:09:15 Sams-MacBook-Pro.local authorizationhost[270]: in pam_sm_setcred(): pam_sm_setcred: ntlm user sam doesn't have auth authority All that's great and good and I authenticate. Then I get CFPreferences: user home directory for user kCFPreferencesCurrentUser at /Network/Servers/172.17.148.186/home/sam is unavailable. User domains will be volatile. Failed looking up user domain root; url='file://localhost/Network/Servers/172.17.148.186/home/sam/' path=/Network/Servers/172.17.148.186/home/sam/ err=-43 uid=9000 euid=9000 If you're wondering where /Network/Servers/IP/home/sam comes from, it's from a couple of blogs that said the OpenLDAP attribute apple-user-homeDirectory should have that value and the NFSHomeDirectory on the mac should point to apple-user-homeDirectory I also set the attr apple-user-homeurl to <home_dir><url>smb://172.17.148.186/sam/</url><path></path></home_dir> which I found on this forum. Any help is appreciated, because I'm banging my head against the wall at this point. By the way, I intend to create a blog on my vps just for this, and create an install script in python that people can download so no one has to go through what I've had to go through this week :) After some sleep I am going to try to login from a windows machine and report back here. Thanks Sam

Read the article
Swing object: first setText() gets "stuck" when using Mac Java SE 6

- by Tim

Hi there, I am a Java newbie trying to maintain an application that works fine under J2SE 5.0 (32- and 64-bit) but has a very specific problem when run under Java SE 6 64-bit: [Tims-MPB:~] tlynch% java -version java version "1.6.0_15" Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226) Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode) The application is cross-platform and reportedly works correctly on Java SE 6 under Windows, though I haven't been able to verify that myself. The program uses a JTextField for some text entry and a JLabel to indicate the text to be entered. The first time the showDialog() method is called to set the label text and display the dialog, it works correctly, but subsequent calls all result in the display of the label from the initial invocation rather than the one most recently specified via setText(). public void showDialog(String msgText) { System.out.println("set ChatDialog: " + msgText); jLabel1.setText(msgText); jLabel1.repaint(); // I added this; it didn't help System.out.println("get ChatDialog: " + jLabel1.getText()); super.setVisible(true); } [the full text of the class is provided below] The added printlns validate that expected text is passed to the label's setText() method and is confirmed by retrieving it using getText(), but what shows up on the screen/GUI is always the text from the very first time the method was called for the object. A similar issue is observed with a JTextArea used to label another dialog box. These problem are consistent across multiple Mac systems running Java SE 6 under OS 10.5.x and 10.6.x, but they are never observed when one reverts to J2SE 5.0. If there is some background information pertinent to this problem that I have omitted, please let me know. Any insights or advice appreciated. package gui; import java.awt.*; import java.awt.event.KeyEvent; import javax.swing.*; // Referenced classes of package gui: // MyJPanel, ChatDialog_jTextField1_keyAdapter, WarWindow public class ChatDialog extends JDialog { public ChatDialog(JFrame parent, WarWindow w) { super(parent, true); text = ""; borderLayout1 = new BorderLayout(); jPanel1 = new MyJPanel(); borderLayout2 = new BorderLayout(); jPanel2 = new MyJPanel(); jPanel3 = new MyJPanel(); jLabel1 = new JLabel(); jTextField1 = new JTextField(); warWindow = w; try { jbInit(); } catch(Exception exception) { System.out.println("Problem with ChatDialog init"); exception.printStackTrace(); } return; } public String getText() { return text; } void jTextField1_keyPressed(KeyEvent e) { int id = e.getKeyCode(); switch(id) { case 10: // '\n' text = jTextField1.getText(); setVisible(false); break; } } private void jbInit() throws Exception { setLocation(232, 450); setSize(560, 60); setModal(true); setResizable(false); setUndecorated(true); getContentPane().setLayout(borderLayout1); jPanel1.setLayout(borderLayout2); jPanel2.setMinimumSize(new Dimension(10, 20)); jPanel2.setPreferredSize(new Dimension(10, 20)); jLabel1.setPreferredSize(new Dimension(380, 15)); jLabel1.setHorizontalAlignment(0); jLabel1.setText("Chat Message"); jTextField1.setPreferredSize(new Dimension(520, 21)); jTextField1.setRequestFocusEnabled(false); jTextField1.addKeyListener(new ChatDialog_jTextField1_keyAdapter(this)); getContentPane().add(jPanel1, "Center"); jPanel1.add(jPanel2, "North"); jPanel2.add(jLabel1, null); jPanel1.add(jPanel3, "Center"); jPanel3.add(jTextField1, null); } public void setVisible(boolean b) { jTextField1.setText(""); super.setVisible(b); } public void showDialog(String msgText) { System.out.println("set ChatDialog: " + msgText); jLabel1.setText(msgText); jLabel1.repaint(); // I added this; it didn't help System.out.println("get ChatDialog: " + jLabel1.getText()); super.setVisible(true); } void this_keyPressed(KeyEvent e) { int id = e.getKeyCode(); switch(id) { case 10: // '\n' System.exit(88); break; } } BorderLayout borderLayout1; BorderLayout borderLayout2; JLabel jLabel1; JPanel jPanel1; JPanel jPanel2; JPanel jPanel3; JTextField jTextField1; String text; WarWindow warWindow; }

Read the article
SQL Server Configuration timeouts - and a workaround [SSIS]

- by jamiet

Ever since I started writing SSIS packages back in 2004 I have opted to store configurations in .dtsConfig (.i.e. XML) files rather than in a SQL Server table (aka SQL Server Configurations) however recently I inherited some packages that used SQL Server Configurations and thus had to immerse myself in their murky little world. To all the people that have ever gone onto the SSIS forum and asked questions about ambiguous behaviour of SQL Server Configurations I now say this... I feel your pain! The biggest problem I have had was in dealing with the change to the order in which configurations get applied that came about in SSIS 2008. Those changes are detailed on MSDN at SSIS Package Configurations however the pertinent bits are: As the utility loads and runs the package, events occur in the following order: The dtexec utility loads the package. The utility applies the configurations that were specified in the package at design time and in the order that is specified in the package. (The one exception to this is the Parent Package Variables configurations. The utility applies these configurations only once and later in the process.) The utility then applies any options that you specified on the command line. The utility then reloads the configurations that were specified in the package at design time and in the order specified in the package. (Again, the exception to this rule is the Parent Package Variables configurations). The utility uses any command-line options that were specified to reload the configurations. Therefore, different values might be reloaded from a different location. The utility applies the Parent Package Variable configurations. The utility runs the package. To understand how these steps differ from SSIS 2005 I recommend reading Doug Laudenschlager’s blog post Understand how SSIS package configurations are applied. The very nature of SQL Server Configurations means that the Connection String for the database holding the configuration values needs to be supplied from the command-line. Typically then the call to execute your package resembles this: dtexec /FILE Package.dtsx /SET "\Package.Connections[SSISConfigurations].Properties[ConnectionString]";"\"Data Source=SomeServer;Initial Catalog=SomeDB;Integrated Security=SSPI;\"", The problem then is that, as per the steps above, the package will (1) attempt to apply all configurations using the Connection String stored in the package for the "SSISConfigurations" Connection Manager before then (2) applying the Connection String from the command-line and then (3) apply the same configurations all over again. In the packages that I inherited that first attempt to apply the configurations would timeout (not unexpected); I had 8 SQL Server Configurations in the package and thus the package was waiting for 2 minutes until all the Configurations timed out (i.e. 15seconds per Configuration) - in a package that only executes for ~8seconds when it gets to do its actual work a delay of 2minutes was simply unacceptable. We had three options in how to deal with this: Get rid of the use of SQL Server configurations and use .dtsConfig files instead Edit the packages when they get deployed Change the timeout on the "SSISConfigurations" Connection Manager #1 was my preferred choice but, for reasons I explain below*, wasn't an option in this particular instance. #2 was discounted out of hand because it negates the point of using Configurations in the first place. This left us with #3 - change the timeout on the Connection Manager. This is done by going into the properties of the Connection Manager, opening the "All" tab and changing the Connect Timeout property to some suitable value (in the screenshot below I chose 2 seconds). This change meant that the attempts to apply the SQL Server configurations timed out in 16 seconds rather than two minutes; clearly this isn't an optimum solution but its certainly better than it was. So there you have it - if you are having problems with SQL Server configuration timeouts within SSIS try changing the timeout of the Connection Manager. Better still - don't bother using SQL Server Configuration in the first place. Even better - install RC0 of SQL Server 2012 to start leveraging SSIS parameters and leave the nasty old world of configurations behind you. @Jamiet * Basically, we are leveraging a SSIS execution/logging framework in which the client had invested a lot of resources and SQL Server Configurations are an integral part of that.

Read the article
ODBC in SSIS 2012

- by jamiet

In August 2011 the SQL Server client team published a blog post entitled Microsoft is Aligning with ODBC for Native Relational Data Access in which they basically said "OLE DB is the past, ODBC is the future. Deal with it.". From that blog post:We encourage you to adopt ODBC in the development of your new and future versions of your application. You don’t need to change your existing applications using OLE DB, as they will continue to be supported on Denali throughout its lifecycle. While this gives you a large window of opportunity for changing your applications before the deprecation goes into effect, you may want to consider migrating those applications to ODBC as a part of your future roadmap.I recently undertook a project using SSIS2012 and heeded that advice by opting to use ODBC Connection Managers rather than OLE DB Connection Managers. Unfortunately my finding was that the ODBC Connection Manager is not yet ready for primetime use in SSIS 2012. The main issue I found was that you can't populate an Object variable with a recordset when using an Execute SQL Task connecting to an ODBC data source; any attempt to do so will result in an error:"Disconnected recordsets are not available from ODBC connections." I have filed a bug on Connect at ODBC Connection Manager does not have same funcitonality as OLE DB. For this reason I strongly recommend that you don't make the move to ODBC Connection Managers in SSIS just yet - best to wait for the next version of SSIS before doing that.I found another couple of issues with the ODBC Connection Manager that are worth keeping in mind:It doesn't recognise System Data Source Names (DSNs), only User DSNs (bug filed at ODBC System DSNs are not available in the ODBC Connection Manager) UPDATE: According to a comment on that Connect item this may only be a problem on 64bit.In the OLE DB Connection Manager parameter ordinals are 0-based, in the ODBC Connection Manager they are 1-based (oh I just can't wait for the upgrade mess that ensues from this one!!!)You have been warned!@jamiet

Read the article
Microsoft 2010 Product Tour

- by dmccollough

Randy Walker, Co-Founder of the Northwest Arkansas .Net User Group and Microsoft MVP has arranged for a couple of Microsoft experts, Sarika Calla Team Lead on the IDE Team and Kevin Halverson to give presentations on newly released Visual Studio 2010. June 1 – Bentonville, Arkansas Wal-Mart .Net User Group June 1 – Rogers, Arkansas Northwest Arkansas SQL Server User Group (lunch meeting) June 1 – Springdale, Arkansas Tyson devLoop June 1 – Fayetteville, Arkansas Northwest Arkansas .Net User Group June 2 – Fort Smith, Arkansas Datatronics June 2 – Little Rock, Arkansas Little Rock .Net User Group June 3 – Fort Worth, Texas Fort Worth .Net User Group Please contact Randy Walker with questions at [email protected].

Read the article
SSIS code smell – Unused columns in the dataflow

- by jamiet

A code smell is defined on Wikipedia as being a “symptom in the source code of a program that possibly indicates a deeper problem”. It’s a term commonly used by our code-writing brethren to describe sub-optimal code but I think the term can be applied equally well to SSIS packages too as I shall now explain One of my pet hates about SSIS development is packages that throw warnings of the form: The output column "ColumnName" (1358) on output "OLE DB Source Output" (1289) and component "OLE_SRC Name" (1279) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance. The warning is fairly self-explanatory – any column that appears in the data flow but doesn’t get used will throw this warning when the data flow is executed. Its not the negligible performance degradation that they cause that bothers me though, it’s the clutter that they cause in your log file/table. Take a look at the following screenshot if you don’t believe me: There are 231409 such warnings in the system that I took this screenshot from, that is 231409 log records that should not be there. The most infuriating thing about this warning is that it is so easily avoidable; eliminating such columns is a very quick and easy thing to do in the SSIS Designer. The only problem I see is that the warnings don’t occur until you execute the package – it would be preferable for the designer to have an unobtrusive way of informing you of them as well. Anyway, I digress… I consider such warnings to be a code smell because, to me, they’re symptomatic of a lack of due care and attention; a lack of developer discipline if you will. What other code smells can you think of when building SSIS packages? If I get a good list in the comments maybe I’ll compile them into a later blog post. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Parsing flat files using SSIS : SSIS Nugget

- by jamiet

Often when using SQL Server Integration Services (SSIS) you will find there is more than one way of accomplishing a task and that the most obvious method of doing so might not be the optimal one. In the video below I demonstrate this by way of an experiment using SSIS’s Flat File Source component; I show different ways that you can pull data from a flat file into the SSIS dataflow and also how the nature of the data itself can influence your choice as to how this task should be accomplished. If you are having trouble viewing the video in your blog reader then head to http://sqlblog.com/blogs/jamie_thomson/archive/2010/03/25/parsing-flat-files-using-ssis-ssis-nugget.aspx to see it as it is hosted on my blog! The main point I want to get across from this video is that a little bit of creative thinking when building your dataflows can sometimes be very beneficial for performance; quite often building a solution that isn’t the most obvious might actually turn out to be the best one. You’ll notice, if you have watched the video, that my editing skills weren’t quite up to snuff and I cut off the final few words however all I was saying was that if you have any feedback on this video then I would love to hear it either via email or preferably the comments section below. I hope this turns out to be useful to some of you. @Jamiet P.S. Incidentally the parsing that we do using SSIS expressions in the video would be much easier if we had a TOKENISE function in SSIS’s expression language and I have asked for the introduction of such a function on Connect at [SSIS] TOKEN(string, tokeniser_string, occurence) function. Feel free to go and vote that up if you think this feature would be useful! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Introducing SSIS Reporting Pack for SQL Server code-named Denali

- by jamiet

In recent blog posts I have introduced the new SSIS Catalog that is forthcoming in SQL Server Code-named Denali: What's new in SSIS in Denali Introduction to SSIS Projects in Denali Parameters in SSIS In Denali SSIS Server, Catalogs, Environments and Environment Variables in SSIS in Denali The SSIS Catalog is responsible for executing SSIS packages and also for capturing the metadata from those executions. However, at the time of writing there is no mechanism provided to view analyse and drill into that metadata and that is the reason that I am, in this blog post, introducing a suite of SSIS Catalog reports called the SSIS Reporting Pack which you can download from my SkyDrive at http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip. In this first release the SSIS Reporting Pack includes five reports: Catalog – A high-level summary of all activity in the Catalog Folders – A summary of activity in each Catalog Folder Folder – Project-level activity per single Folder Executions – A visualisation of all executions per Folder/Project/Package/Environment or subset thereof Execution – Information about an individual execution Here is a screenshot of the Executions report: Notice that the SSIS Reporting Pack provides a visual overview of all executions in the Catalog. Each execution is represented as a bar on the bar chart, the success or otherwise of each execution is indicated by the colour of the bar and the execution time is indicated by the bar height. I have recorded a video that gives an overview of the SSIS Reporting which I have embedded below. If you are having any trouble viewing the video go see it at http://vimeo.com/17617974 I must stress that this is a very early version of the SSIS Reporting Pack and I am expecting it to change a lot over the coming year. I am very keen to get some feedback about this, specifically: let me know if anything does not work as you expect give me your feature requests The easiest way to get hold of of me for now is within the comments section of this blog post. That’s all for now. I hope the SSIS Reporting Pack proves useful and I look forward to hearing your feedback. Lastly, that download link again: http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip. @jamiet

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Dynamic Unpivot : SSIS Nugget

- by jamiet

A question on the SSIS forum earlier today asked: I need to dynamically unpivot some set of columns in my source file. Every month there is one new column and its set of Values. I want to unpivot it without editing my SSIS packages that is deployed Let’s be clear about what we mean by Unpivot. It is a normalisation technique that basically converts columns into rows. By way of example it converts something like this: AccountCode Jan Feb Mar AC1 100.00 150.00 125.00 AC2 45.00 75.50 90.00 into something like this: AccountCode Month Amount AC1 Jan 100.00 AC1 Feb 150.00 AC1 Mar 125.00 AC2 Jan 45.00 AC2 Feb 75.50 AC2 Mar 90.00 The Unpivot transformation in SSIS is perfectly capable of carrying out the operation defined in this example however in the case outlined in the aforementioned forum thread the problem was a little bit different. I interpreted it to mean that the number of columns could change and in that scenario the Unpivot transformation (and indeed the SSIS dataflow in general) is rendered useless because it expects that the number of columns will not change from what is specified at design-time. There is a workaround however. Assuming all of the columns that CAN exist will appear at the end of the rows, we can (1) import all of the columns in the file as just a single column, (2) use a script component to loop over all the values in that “column” and (3) output each one as a column all of its own. Let’s go over that in a bit more detail. I’ve prepared a data file that shows some data that we want to unpivot which shows some customers and their mythical shopping lists (it has column names in the first row): We use a Flat File Connection Manager to specify the format of our data file to SSIS: and a Flat File Source Adapter to put it into the dataflow (no need a for a screenshot of that one – its very basic). Notice that the values that we want to unpivot all exist in a column called [Groceries]. Now onto the script component where the real work goes on, although the code is pretty simple: Here I show a screenshot of this executing along with some data viewers. As you can see we have successfully pulled out all of the values into a row all of their own thus accomplishing the Dynamic Unpivot that the forum poster was after. If you want to run the demo for yourself then I have uploaded the demo package and source file up to my SkyDrive: http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100529/Dynamic%20Unpivot.zip Simply extract the two files into a folder, make sure the Connection Manager is pointing to the file, and execute! Hope this is useful. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SSIS Lookup component tuning tips

- by jamiet

Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Inequality joins, Asynchronous transformations and Lookups : SSIS

- by jamiet

It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equivalent to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate] but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to lookup [Customer].[CustomerId]. The logic will be: Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate] The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long. View on Vimeo For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took 36 seconds whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip Comments are welcome as always. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Enforce SSIS naming conventions using BI-xPress

- by jamiet

A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records, here is a sample of some of those suggestions: If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assess them according to some rules that you specify. In addition Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006 You can download this collection however I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff. @Jamiet

Read the article
Introducing AutoVue Document Print Service

- by celine.beck

We recently announced the availability of our new AutoVue Document Print Service products. For more information, please read the article entitled Print Any Document Type with AutoVue Document Print Services that was posted on our blog. The AutoVue Document Print Service products help address a trivial, yet very common challenge: printing and batch printing documents. The AutoVue Document Print Service is a Web-Services based interface, which allows developers to complement their print server solutions by leveraging AutoVue's printing capabilities within broader enterprise applications like Asset Lifecycle Management, Product Lifecycle Management, Enterprise Content Management solutions, etc. This means that you can leverage the AutoVue Document Print Service products as part of your printing solution to automate the printing of virtually any document type required in any business process. Clients that consume AutoVue's Document Print Service can be written in any language (for example Java or .NET) as long as they understand Web Services Description Language (WSDL) and communicate using Simple Object Access Protocol (SOAP). The print solution consists of three main components, as described in the diagram below: a print server (not included in the AutoVue Document Print Service offering) that will interact with your application to identify the files that need to be printed, the printer to send each file, as well as the print options needed for each file (paper size, page orientation, etc), and collate the print job requests. The print server will also take care of calling the AutoVue Document Print Service to perform the actual printing. The AutoVue Document Print Services send files to a printer for printing. The AutoVue Document Print Service products leverage AutoVue's format- and platform agnostic technology to let you print/batch virtually any type of files, without requiring the authoring application installed on your machine. and Printers As shown above, you can trigger printing from your application either programmatically through automated business processes or manually through human interaction. If documents that need to be printed from your application are stored inside a content repository/Document Management System (DMS) such as Oracle Universal Content Management System (UCM), then the Print Server will need to identify the list of documents and pass the ID of each document to the AutoVue DPS to print. In this case, AutoVue DPS leverages the AutoVue VueLink integration (note: AutoVue VueLink integrations are pre-packaged AutoVue integrations with most common enterprise systems. Check our Website for more information on the subject) to fetch documents out of the document management system for printing. In lieu of the AutoVue VueLink integration, you can also leverage the AutoVue Integration Software Development Kit (iSDK) to build your own connector. If the documents you need to print from your application are not stored in a content management system, the Print Server will need to ensure that files are made available to the AutoVue Document Print Service. The Print Server could for example fetch the files out of your application or an extension to the application could be developed to fetch the files and make them available to the AutoVue DPS. More information on methods to pass on file information to the AutoVue Document Print Service products can be found in the AutoVue Document Print Service Overview documentation available on the Oracle Technology Network. Related article: Any Document Type with AutoVue Document Print Services

Read the article
Querying the SSIS Catalog? Here’s a handy query!

- by jamiet

I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions: SELECT event_message_id,MESSAGE,package_name,event_name,message_source_name,package_path,execution_path,message_type,message_source_typeFROM ( SELECT em.* FROM SSISDB.catalog.event_messages em WHERE em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions) AND event_name NOT LIKE '%Validate%' )q/* Put in whatever WHERE predicates you might like*/--WHERE event_name = 'OnError'--WHERE package_name = 'Package.dtsx'--WHERE execution_path LIKE '%<some executable>%'ORDER BY message_time DESC Know it. Learn it. Love it. @jamiet

Read the article
Java Spotlight Episode 77: Donald Smith on the OpenJDK and Java

- by Roger Brinkley

Tweet An interview with Donald Smith about Java and OpenJDK. Joining us this week on the Java All Star Developer Panel are Dalibor Topic, Java Free and Open Source Software Ambassador and Arun Gupta, Java EE Guy. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes. Show Notes News Jersey 2.0 Milestone 2 available Oracle distribution of Eclipse (OEPE) now supports GlassFish 3.1.2 Oracle Linux 6 is now part of the certification matrix for 3.1.2 3rd part of Spring -> Java EE 6 article series published Joe Darcy - Repeating annotations in the works JEP 152: Crypto Operations with Network HSMs JEP 153: Launch JavaFX Applications OpenJDK bug database: Status update OpenJDK Governing Board 2012 Election: Results jtreg update March 2012 Take Two: Comparing JVMs on ARM/Linux The OpenJDK group at Oracle is growing App bundler project now open Events April 4-5, JavaOne Japan, Tokyo, Japan April 11, Cleveland JUG, Cleveland, OH April 12, GreenJUG, Greenville, SC April 17-18, JavaOne Russia, Moscow Russia April 18–20, Devoxx France, Paris, France April 17-20, GIDS, Bangalore April 21, Java Summit, Chennai April 26, Mix-IT, Lyon, France, May 3-4, JavaOne India, Hyderabad, India May 5, Bangalore, Pune, ?? - JUG outreach May 7, OTN Developer Day, Mumbai May 8, OTN Developer Day, Delhi Feature InterviewDonald Smith, MBA, MSc, is Director of Product Management for Oracle. He brings worldwide enterprise software experience, ranging from small "dot-com" through Fortune 500 companies. Donald speaks regularly about Java, open source, community development, business models, business integration and software development politics at conferences and events worldwide including Java One, Oracle World, Sun Tech Days, Evans Developer Relations Conference, OOPSLA, JAOO, Server Side Symposium, Colorado Software Summit and others. Prior to returning to Oracle, Donald was Director of Ecosystem Development for the Eclipse Foundation, an independent not-for-profit foundation supporting the Eclipse open source community. Mail Bag What’s Cool OpenJDK 7 port to Haiku JEP 154: Remove Serialization Goto for the Java Programming Language

Read the article
SSIS Prehistory video

- by jamiet

I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Java Spotlight Episode 86: Tony Printezis on Garbage Collection First

- by Roger Brinkley

Interview with Tony Printezis on Garbage Collection First (GC1). Joining us this week on the Java All Star Developer Panel is Arun Gupta, Java EE Guy. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes. Show Notes News JSR 358: A major revison of the Java Community Process - JCP 3.Next JAX-RS 2.0 Early Draft- Third Edition Events June 11-14, Cloud Computing Expo, New York City June 12, Boulder JUG June 13, Denver JUG June 13, Eclipse Juno DemoCamp, Redwoood Shore June 13, JUG Münster June 14, Java Klassentreffen, Vienna, Austria June 18-20, QCon, New York City June 19, CJUG, Chicago June 20, 1871, Chicago June 26-28, Jazoon, Zurich, Switzerland Jun 27, Houston JUG ?? July 5, Java Forum, Stuttgart, Germany Jul 13-14, IndicThreads, Delhi July 30-August 1, JVM Language Summit, Santa Clara Feature InterviewTony Printezis is a Principal Member of Technical Staff at Oracle, based in Burlington, MA. He has been contributing to the Java HotSpot Virtual Machine since 2006. He spends most of his time working on dynamic memory management for the Java platform, concentrating on performance, scalability, responsiveness, parallelism, and visualization of garbage collectors. He obtained a Ph.D. in 2000 and a BSc (Hons) in 1995, both from the University of Glasgow in Scotland. In addition, he is a JavaOne Rock Star, a title awarded for his highly rated JavaOne session on GC. Mail Bag What’s Cool JavaOne content selection is complete. Notifications done.

Read the article
Using Find/Replace with regular expressions inside a SSIS package

- by jamiet

Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"> <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="Some other description" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}" DTS:ObjectName="SomeOtherObjectName"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">SomeOtherValue</DTS:Property> </DTS:PackageParameter> </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

Read the article
SSIS Catalog, Windows updates and deployment failures due to System.Core mismatch

- by jamiet

This is a heads-up for anyone doing development on SSIS. On my current project where we are implementing a SQL Server Integration Services (SSIS) 2012 solution we recently encountered a situation where we were unable to deploy any of our projects even though we had successfully deployed in the past. Any attempt to use the deployment wizard resulted in this error dialog: The text of the error (for all you search engine crawlers out there) was: A .NET Framework error occurred during execution of user-defined routine or aggregate "create_key_information": System.IO.FileLoadException: Could not load file or assembly 'System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) ---> System.IO.FileLoadException: The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) System.IO.FileLoadException: System.IO.FileLoadException: at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateSymmetricKey(String algorithm) at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateKeyInformation(SqlString algorithmName, SqlBytes& key, SqlBytes& IV) . (Microsoft SQL Server, Error: 6522) After some investigation and a bit of back and forth with some very helpful members of the SSIS product team (hey Matt, Wee Hyong) it transpired that this was due to a .Net Framework fix that had been delivered via Windows Update. I took a look at the server update history and indeed there have been some recently applied .Net Framework updates: This fix had (in the words of Matt Masson) “somehow caused a mismatch on System.Core for SQLCLR” and, as you may know, SQLCLR is used heavily within the SSIS Catalog. The fix was pretty simple – restart SQL Server. This causes the assemblies to be upgraded automatically. If you are using Data Quality Services (DQS) you may have experienced similar problems which are documented at Upgrade SQLCLR Assemblies After .NET Framework Update. I am hoping the SSIS team will follow-up with a more thorough explanation on their blog soon. You DBAs out there may be questioning why Windows Update is set to automatically apply updates on our production servers. We’re checking that out with our hosting provider right now You have been warned! @Jamiet

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
SSIS Prehistory video

- by jamiet

I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Merge Join component sorted outputs [SSIS]

- by jamiet

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

Read the article
The Work Order Printing Challenge

- by celine.beck

One of the biggest concerns we've heard from maintenance practitioners is the ability to print and batch print work order details along with its accompanying attachments. Indeed, maintenance workers traditionally rely on work order packets to complete their job. A standard work order packet can include a variety of information like equipment documentation, operating instructions, checklists, end-of-task feedback forms and the likes. Now, the problem is that most Asset Lifecycle Management applications do not provide a simple and efficient solution for process printing with document attachments. Work order forms can be easily printed but attachments are usually left out of the printing process. This sounds like a minor problem, but when you are processing high volume of work orders on a regular basis, this inconvenience can result in important inefficiencies. In order to print work order and its related attachments, maintenance personnel need to print the work order details and then go back to the work order and open each individual attachment using the proper authoring application to view and print each document. The printed output is collated into a work order packet. The AutoVue Document Print Service products that were just released in April 2010 aim at helping organizations address the work order printing challenge. Customers and partners can leverage the AutoVue Document Print Services to build a complete printing solution that complements their existing print server solution with AutoVue's document- and platform-agnostic document print services. The idea is to leverage AutoVue's printing services to invoke printing either programmatically or manually directly from within the work order management application, and efficiently process the printing of complete work order packets, including all types of attachments, from office files to more advanced engineering documents like 2D CAD drawings. Oracle partners like MIPRO Consulting, specialists in PeopleSoft implementations, have already expressed interest in the AutoVue Document Print Service products for their ability to offer print services to the PeopleSoft ALM suite, so that customers are able to print packages of documents for maintenance personnel. For more information on the subject, please consult MIPRO Consulting's article entitled Unsung Value: Primavera and AutoVue Integration into PeopleSoft posted on their blog. The blog post entitled Introducing AutoVue Document Print Service provides additional information on how the solution works. We would also love to hear what your thoughts are on the topic, so please do not hesitate to post your comments/feedback on our blog. Related Articles: Introducing AutoVue Document Print Service Print Any Document Type with AutoVue Document Print Services

Read the article

Search Results

Search found 3720 results on 149 pages for 'sam se'.

Page 6/149 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by Sam Hammamy

- by Tim

- by jamiet

- by jamiet

- by dmccollough

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by celine.beck

- by jamiet

- by Roger Brinkley

- by jamiet

- by Roger Brinkley

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by celine.beck

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >