belvoir - Developer IT

Search Results

Search found 2 results on 1 pages for 'belvoir'.

Page 1/1 | 1

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

- by belvoir

Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (some-one is bound to ask what semi real-time means and the answer is as frequently as I reasonably can but I will be pragmatic, as a benchmark lets say we are hoping for every 15min) and feed it into a data-warehouse. How much data? At peak times we are talking approx 80-100k rows per min hitting the OLTP side, off-peak this will drop significantly to 15-20k. The most frequently updated rows are ~64 bytes each but there are various tables etc so the data is quite diverse and can range up to 4000 bytes per row. The OLTP is active 24x5.5. Best Solution? From what I can piece together the most practical solution is as follows: Create a TRIGGER to write all DML activity to a rotating CSV log file Perform whatever transformations are required Use the native DW data pump tool to efficiently pump the transformed CSV into the DW Why this approach? TRIGGERS allow selective tables to be targeted rather than being system wide + output is configurable (i.e. into a CSV) and are relatively easy to write and deploy. SLONY uses similar approach and overhead is acceptable CSV easy and fast to transform Easy to pump CSV into the DW Alternatives considered .... Using native logging (http://www.postgresql.org/docs/8.3/static/runtime-config-logging.html). Problem with this is it looked very verbose relative to what I needed and was a little trickier to parse and transform. However it could be faster as I presume there is less overhead compared to a TRIGGER. Certainly it would make the admin easier as it is system wide but again, I don't need some of the tables (some are used for persistent storage of JMS messages which I do not want to log) Querying the data directly via an ETL tool such as Talend and pumping it into the DW ... problem is the OLTP schema would need tweaked to support this and that has many negative side-effects Using a tweaked/hacked SLONY - SLONY does a good job of logging and migrating changes to a slave so the conceptual framework is there but the proposed solution just seems easier and cleaner Using the WAL Has anyone done this before? Want to share your thoughts?

Read the article

Python MySQLdb LOAD LOCAL INFILE problems

- by belvoir

The problem is a simple one. When I execute the following I get different results depending on whether I run it from the MySQL console and from inside a Python Script using MySQLdb: LOAD DATA LOCAL INFILE '/tmp/source.csv' INTO TABLE test FIELDS TERMINATED BY '|' IGNORE 1 LINES; Console gives the following results: Records: 35002 Deleted: 0 Skipped: 0 Warnings: 0 Python (via .info()) returns the following: Records: 34977 Deleted: 0 Skipped: 0 Warnings: 8 So in summary, same source file, same SQL request, different results. From the console I can 'SHOW WARNINGS' an get a better handle on which records are causing the problems and why but from Python I can't idenitify how to do this or more importantly what the cause of the problem could be. Any suggestions? MySQL Server '5.1.41-3ubuntu12.1' Python '2.6.5' Tables are MyISAM