Search Results

Search found 266 results on 11 pages for 'etl'.

Page 4/11 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11  | Next Page >

  • extract transform load

    - by mitch
    Wikipedia defines a 'typical' ETL cycle as : Cycle initiation Build reference data Extract (from sources) Validate Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates) Stage (load into staging tables, if used) Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair) Publish (to target tables) Archive Clean up ..What is meant by 'Build reference data'?

    Read the article

  • How to extract data from Google Analytics and build a data warehouse (webhouse) from it?

    - by nkaur301
    I have click stream data such as referring URL, top landing pages, top exit pages and metrics such as page views, number of visits, bounces all in Google Analytics. I am required to build a data warehouse from scratch(which I believe is known as web-house) from this data. My questions are:- 1)Is it possible? Every day data increases (some in terms of metrics or measures such as visits and some in terms of new referring sites), how would the process of loading the warehouse go about? 2)What ETL tool would help me to achieve this? Pentaho I believe has a way to pull out data from Google Analytics, has anyone used it? How does that process go? Any references, links would be appreciated besides answers.

    Read the article

  • Empty data problem - data layer or DAL?

    - by luckyluke
    I designing the new App now and giving the following question a lot of thought. I consume a lot of data from the warehouse, and the entities have a lot of dictionary based values (currency, country, tax-whatever data) - dimensions. I cannot be assured though that there won't be nulls. So I am thinking: create an empty value in each of teh dictionaries with special keyID - ie. -1 do the ETL (ssis) do the correct stuff and insert -1 where it needs to let the DAL know that -1 is special (Static const whatever thing) don't care in the code to check for nullness of dictionary entries because THEY will always have a value But maybe I should be thinking: import data AS IS let the DAL do the thinking using empty record Pattern still don't care in the code because business layer will have what it needs from DAL. I think is more of a approach thing but maybe i am missing something important here... What do You think? Am i clear? Please don't confuse it with empty record problem. I do use emptyCustomer think all the time and other defaults too.

    Read the article

  • Handling primary key duplicates in a data warehouse load

    - by Meff
    I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID. I've encountered a problem with transactions being reversed - In the transactional database this is done via a status, which I pick up and I can work out if the transaction is being done, or rolled back so I can load a reversal row in the warehouse. However, the reversal row will have the same transaction ID and so I get a primary key violation. I've solved this for now by negating the primary key, so transaction ID 1 would be a payment, and transaction ID -1 (In the warehouse only) would be the reversal. I have considered an alternative of generating a BIT column, where 0 is normal and 1 is reversal, then making the PK the transaction ID and the BIT column. My question is, is this a good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values will not be modified, so there will only ever be transactions and reversals.

    Read the article

  • Best way to produce automated exports in tab-delimited form from Teradata?

    - by Cade Roux
    I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl), in text form (default conversions to text for dates, numbers, etc are fine), tab-delimited, with NULLs being converted to empty field (i.e. a NULL colum would have no space between tab characters, with appropriate line termination (CRLF or Windows), preferably also with column headings. This is the same export I can get in SQL Assistant 12.0, but choosing the export option, using tab delimiter, setting my NULL value to '' and including column headings. I have been unable to find the right combination of options - the closest I have gotten is by building a single column with CAST and '09'XC, but the rows still have a leading 2-byte length indicator in most settings I have tried. I would prefer not to have to build large strings for the various different tables.

    Read the article

  • Best Practise to populate Fact and Dimension Tables from Transactional Flat DB

    - by alex25
    Hi! I want to populate a star schema / cube in SSIS / SSAS. I prepared all my dimension tables and my fact table, primary keys etc. The source is a 'flat' (item level) table and my problem is now how to split it up and get it from one into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?! Thanks, alexl

    Read the article

  • Sybase: how can I remove non-printable characters from CHAR or VARCHAR fields with SQL?

    - by Kenny Drobnack
    I'm working with a Sybase database that seems to have non-printable characters in some of the string fields and this is throwing off some of our processing code. At first glance, it seemed to only be newlines and carriage returns, but we also have an ASCII code 27 in there - an ESC character, some accented characters, and some other oddities in there. I have no direct access to change the database, so changing the bad data isn't an option, yet. For now I have to make do with just filtering it out. We're trying to export the table data from one database and load it into a database used by another application in a nightly batch process. Ideally, I'd like to have a function that I can pass a list of characters and just have Sybase return the data with those characters removed. I'd like to keep it something we could do in plain SQL if possible. Something like this to remove characters that are ASCII 0 - 31. select str_replace(FIELD1, (0-31), NULL) as FIELD1, str_replace(FIELD2, (0-31), NULL) as FIELD2 from TABLE So far, str_replace is the nearest I can find, but it only allows replacing one string with another. No support for character ranges and won't let me do the above. We're running on Sybase ASE 12.5 on Unix servers.

    Read the article

  • SSIS 2005 - How to Import a Fixed Width Flat File?

    - by Greg
    I have a flat file that looks something like this: junk I don't care about \n \n columns names\n val1 val2 val3\n val1 val2 val3\n columns names \n val1 val2 val3\n I only care the lines with values. These value lines are all fixed width format and have the same line length. The other junk lines and column names can have any line width. When I try the flat file fixed width option or the ragged right option the preview looks all wrong. Any ideas what the easiest way to get this into SSIS is?

    Read the article

  • transforming binary data using ssis and sql server 2008

    - by Rick
    Hello All - I have a task to import/transform and extract zipped binary files that contain both text data as well as embeded binary data. Within the data is data that is relational in nature and needs to be processed into a defined database structure. Currently I have a C# single threaded app that essentially grabs all the files from the directory (currently there is 13K files of varying sizes) and extracts the data on a single thread line by line inserts to the database. As you could imagine this is a very slow process and unacceptable. There are several different parsing routines used depending on the header record in the file. There are potentially upto a million rows per file when all the data is extracted to the row level of detail. Follow on task is to parse those rows into their appropriate tables based on is content. i.e. the textual content has to be parsed further into "buckets" of like data in the database. That about sums up the big picture. Now for the problem task list. How do i iterate through a packet of data using SSIS? In the app the file is decompressed and then is parsed using streams data type and byte arrays and is routed to the required parsing routine based on the header data of each packet. There is bit swapping involved as well. Should i wrap up the app code into a script task(s) and let it do the custom processing? The data is seperated by year and the sql server tables is partitioned by year as well. I need to be able to "catch" bad file data as well and process by hand most likely. Should i simply load the zipped file to sql as a blob and parse the file with T-SQL? Would that be multi threaded if done that way? Not sure how to do the parsing in tsql that is involved here. Which do you think would be faster? Potentially the data that is currently processed via files could come to us via a socket. Can SSIS collect that data in real time? How would i go about setting that up? Processing these new files from the directorys will become a daily task. I can manage the data once i get it to sql server. Getting it there in a timely fashion seems to be the long pole in the tent for me. I would appreciate any comments or suggestions from the group. Rick

    Read the article

  • Common Lisp condition system for transfer of control

    - by Ken
    I'll admit right up front that the following is a pretty terrible description of what I want to do. Apologies in advance. Please ask questions to help me explain. :-) I've written ETLs in other languages that consist of individual operations that look something like: // in class CountOperation IEnumerable<Row> Execute(IEnumerable<Row> rows) { var count = 0; foreach (var row in rows) { row["record number"] = count++; yield return row; } } Then you string a number of these operations together, and call The Dispatcher, which is responsible for calling Operations and pushing data between them. I'm trying to do something similar in Common Lisp, and I want to use the same basic structure, i.e., each operation is defined like a normal function that inputs a list and outputs a list, but lazily. I can define-condition a condition (have-value) to use for yield-like behavior, and I can run it in a single loop, and it works great. I'm defining the operations the same way, looping through the inputs: (defun count-records (rows) (loop for count from 0 for row in rows do (signal 'have-value :value `(:count ,count @,row)))) The trouble is if I want to string together several operations, and run them. My first attempt at writing a dispatcher for these looks something like: (let ((next-op ...)) ;; pick an op from the set of all ops (loop (handler-bind ((have-value (...))) ;; records output from operation (setq next-op ...) ;; pick a new next-op (call next-op))) But restarts have only dynamic extent: each operation will have the same restart names. The restart isn't a Lisp object I can store, to store the state of a function: it's something you call by name (symbol) inside the handler block, not a continuation you can store for later use. Is it possible to do something like I want here? Or am I better off just making each operation function explicitly look at its input queue, and explicitly place values on the output queue?

    Read the article

  • C# Importing Large Volume of Data from CSV to Database

    - by guazz
    What's the most efficient method to load large volumes of data from CSV (3 million + rows) to a database. The data needs to be formatted(e.g. name column needs to be split into first name and last name, etc.) I need to do this in a efficiently as possible i.e. time constraints I am siding with the option of reading, transforming and loading the data using a C# application row-by-row? Is this ideal, if not, what are my options? Should I use multithreading?

    Read the article

  • Import de-normalized relational data from Excel into SQL Server

    - by roryf
    I need to import data from an Excel spreadsheet into SQL Server, but the data isn't in a relational/normalized format so the import wizard isn't going to cut it (as far as I know). The data is in this format: Category SubCategory Name Description Category#1 SubCategory#1 Product#1 Description#1 Category#1 SubCategory#1 Product#2 Description#2 Category#1 SubCategory#2 Product#3 Description#3 Category#1 SubCategory#2 Product#4 Description#4 Category#2 SubCategory#3 Product#5 Description#5 (apologies I'm lacking the inventiveness to come up with 'real' data at this time in the morning...) Each row contains a unique product, but the cateogry structure is duplicated. I want to import this data into three tables: Category SubCategory Product (I know SubCategory should really be contained within Category, DB was not my design) I need a way to import unique rows based on the Category and then SubCategory columns, and then when importing the other columns into Product, obtain a reference to the SubCategory based on name. Short of scripting this, is there any way to do it using the import wizard or some other tool?

    Read the article

  • discovering files in the FileSystem, through SSIS

    - by cometbill
    I have a folder where files are going to be dropped for importing into my data warehouse. \\server\share\loading_area I have the following (inherited) code that uses xp_cmdshell shivers to call out to the command shell to run the DIR command and insert the resulting filenames into a table in SQL Server. I would like to 'go native' and reproduce this functionality in SSIS. Thanks in advance guys and girls. Here's the code USE MyDatabase GO declare @CMD varchar(500) declare @EXTRACT_PATH varchar(255) set @EXTRACT_PATH = '\\server\share\folder\' create table tmp_FILELIST([FILENUM] int identity(1,1), [FNAME] varchar(100), [FILE_STATUS] varchar(20) NULL CONSTRAINT [DF_FILELIST_FILE_STATUS] DEFAULT ('PENDING')) set @CMD = 'dir ' + @EXTRACT_PATH + '*.* /b /on' insert tmp_FILELIST([FNAME]) exec master..xp_cmdshell @CMD --remove the DOS reply when the folder is empty delete tmp_FILELIST where [FNAME] is null or [FNAME] = 'File Not Found' --Remove my administrative and default/common, files not for importing, such as readme.txt delete tmp_FILELIST where [FNAME] is null or [FNAME] = 'readme.txt'

    Read the article

  • Import data from an SSRS report via SSIS package

    - by Chris
    First, I ask that you not ask 'why.' In the famous words of Tennyson "Ours is not to reason why. Ours is but to do and die." It's one of those, "This is what you have, deal with it." situations. The source data comes from SSRS report. The goal is to load the data into a database via SSIS. The hopeful goal is to avoid human intervention in having to download the SSRS report into Excel or CSV. There will be complex SSIS processing from there on out. Any suggestions are humbly appreciated.

    Read the article

  • How to prevent CAST errors on SSIS ?

    - by manitra
    Hello, The question Is it possible to ask SSIS to cast a value and return NULL in case the cast is not allowed instead of throwing an error ? My environment I'm using Visual Studio 2005 and Sql Server 2005 on Windows Server 2003. The general context Just in case you're curious, here is my use case. I have to store data coming from somewhere in a generic table (key/value structure with history) witch contains some sort of value that can be strings, numbers or dates. The structure is something like this : table Values { Id int, Date datetime, -- for history Key nvarchar(50) not null, Value nvarchar(50), DateValue datetime, NumberValue numeric(19,9) } I want to put the raw value in the Value column and try to put the same value in the DateValue column when i'm able to cast it to Datetime in the NumberValue column when i'm able to cast it to a number Those two typed columns would make all sort of aggregation and manipulation much easier and faster later. That's it, now you know why i'm asking this strange question. ============ Thanks in advance for your help.

    Read the article

  • How do I transform a single colmn list to item matrix in R?

    - by Indy
    I currently have data that is in the following format (note, this is 1 column, 4 row matrix): aa|bb bb|cc|ee|ee cc cc|ee and I want it displayed so that the column names are: aa, bb, cc, dd, and ee. And I want there to be 4 row such that each row counts the number of times each string was present in the matching row above. ie) aa bb cc dd ee 1 1 0 0 0 0 1 1 0 2 0 0 1 0 0 0 0 1 0 1 Does anyone know how to do this in R? I would post my attempt, but it is just getting ugly and complicated. Any help would be much appreciated. Thanks in advance.

    Read the article

  • SQL SERVER – List of Article on Expressor Data Integration Platform

    - by pinaldave
    The ability to transform data into meaningful and actionable information is the most important information in current business world. In this fast growing and changing business needs effective data integration is single most important thing in making proper decision making. I have been following expressor software since November 2010, when I met expressor team in Seattle. Here are my posts on their innovative data integration platform and expressor Studio, a free desktop ETL tool: 4 Tips for ETL Software IDE Developers Introduction to Adaptive ETL Tool – How adaptive is your ETL? Sharing your ETL Resources Across Applications with Ease expressor Studio Includes Powerful Scripting Capabilities expressor 3.2 Release Review 5 Tips for Improving Your Data with expressor Studio As I had mentioned in some of my blog posts on them, I encourage you to download and test-drive their Studio product – it’s free. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Documentation, SQL Download, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SSIS

    Read the article

  • Resources to learn about engineering aspects of data analytics (OLAP, warehousing, ETL, etc.)

    - by JT
    I'm a math/stats guy, interested in learning more about the engineering aspects of "data analytics" (this may be an overly broad term, this is a case of "I don't know what I don't know", so I'm not sure how to be more specific). I'm fine with manipulating and analyzing the data once it's already stored somewhere and I can access it, and I'm fine with writing scripts and SQL queries (and have a general knowledge of things like normalization). What I don't know is the whole engineering process of capturing and storing the data. For example, terms I've heard thrown about that I only vaguely understand the meaning of include: - OLAP, OLTP - Data warehousing - ETL - ??? What's a good book (or any other resource) to learn about these kinds of things? What are things I should know about database design (normalization seems kinda "obvious" to me, something I would have done even before I knew the term -- is there anything else?)? In other words, for jobs falling under the umbrella term of "analytics engineer", what kinds of things should I know?

    Read the article

  • Change Data Capture Webinar

    I am going to be doing a webinar with our friends at Attunity on Change Data Capture.  Attunity have a good story around this technology and you can use it in your SSIS loads to great effect. Join Attunity and Konesans/SQLIS for a Webinar on 17 September Space is limited. Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/693735512 Want increased efficiency and real-time speed when conducting ETL loads? Need lower implementation costs while minimizing system impact? Learn how change data capture (CDC) technologies can reduce ETL load times. Allan Mitchell, Principal Consultant at Konesans and SQLServer MVP specialising in ETL, will explain CDC concepts and benefits and how CDC can dramatically reduce ETL load times. Ian Archibald, Pre-Sales Director EMEA for Attunity, will present and demonstrate Attunity's award-winning Oracle-CDC for SSIS, a fully-integrated SSIS solution for designing, deploying and managing Oracle CDC processes. Title: Change Data Capture - Reducing ETL Load Times Date: Thursday, September 17, 2009 Time: 10:00 AM - 11:00 AM BST ABOUT THE SPEAKERS: Allan Mitchell is the joint owner of Konesans Ltd, a UK based consultancy specializing in SQL Server, and most importantly SQL Server Integration Services. Having been working with SQL Server from 6.5 onwards, he has extensive experience in many aspects of SQL Server, but now focuses on the BI suite of tools. He is a SQL Server MVP, a frequent poster on the MS SSIS/DTS newsgroups, and runs the sqldts.com and sqlis.com resource sites. Ian Archibald, Attunity Pre-Sales Director EMEA, has worked in Attunity’s UK Office for 17 years. An expert in Attunity solutions, Ian has extensive knowledge of Attunity’s products and data integration & CDC technologies. After registering you will receive a confirmation email containing information about joining the Webinar. System Requirements PC-based attendees Required: Windows® 2000, XP Home, XP Pro, 2003 Server, Vista Macintosh®-based attendees Required: Mac OS® X 10.4 (Tiger®) or newer

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11  | Next Page >