Skip navigation
15551 Views 10 Replies Latest reply: Oct 30, 2010 12:08 PM by Matt RSS
Evetsrm New Enzee 11 posts since
Sep 1, 2010
Currently Being Moderated

Sep 1, 2010 3:40 PM

SQL Server to Netezza - best approach question

I'm a new kid on the Enzee block - and can't wait to sink my teeth in this project.

 

I could use some collective wisdom to push me in the right direction.  I've been tasked with coming up with a general approach to the following problem.

 

We have a collection of desperate systems (say it isn't so!!) for which DTS (yes, I said DTS) packages update our DW database in SQL Server.  In the past, this database has been exposed to Siebel (now Oracle BI) and we have made the decision to move this data to Netezza for the interfact with Oracle BI.

 

The approach I have to come up with is the general direction of how to:

a) import data and table changes made from DTS packages overnight once they are complete

b) import intra-day data and table changes made from a variety of sources - again from desperate systems across the corporation

c) import intra-day data and table changes made by developers and system analysts for ad-hoc requests made by end users

 

I understand that to add a column, I'm going to have to build up a new one, then replace the old one with the newly built one because Netezza doesn't support Alter Table Add Column.

 

And the solution that seems to carry weight with me so far is an SSIS package for the overnight updates, and perhaps a scheduled sql job doing a compare to another SQL Database, then pushing differences over to Netezza in a batch configuration that can rollback with a log entry with any error at any transfer point.

 

If you've read this much, I am already in your debt for sticking with me, and if you can provide any wisdom or point me in the direction of what you think the best practice would be, I would be further in your debt.

 

Thanks you fellow Enzees for the wave of feedback I've heard for which you are famous.

  • David Birmingham Active Enzee 429 posts since
    Sep 24, 2007
    Currently Being Moderated
    1. Sep 1, 2010 4:04 PM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    There is a best way to pull this off, but it's too lengthy to explain here. So I wrote a book Netezza Underground on Amazon.com (at the risk of shameless promotion)

     

    So the way to handle the Netezza machine like this, and many people do, is to build it out as an asset rather than a one-off, and also to treat it like a chugging processor (input/process/output) when you want to do data processing. Migrate all of your functionality out of the SSIS SQL and into Netezza SQL, and restructure how you solve the problems. You can create intermediate tables on the fly with the create-table-as-select, but I would wave you off of manufacturing new columns on the fly in the final target tables as part of the data processing. Your report users, if they are using a rules-based system, will not like the outcome.

  • Mark Aukeman New Enzee 26 posts since
    Oct 14, 2008
    Currently Being Moderated
    2. Sep 2, 2010 8:26 AM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    If you are able to capture the data changes from the source systems, then you are in a good position to refresh Netezza using "ELT" (extract/load/transform) patterns:  source data changes >> flat files >> NZLOAD to staging database >> source to target transforms using NZSQL script or view >> insert change transactions into a staging table that is formatted like the target, and that classifies transactions as target INSERT or UPDATE >> refresh target in two steps -- INSERTs and UPDATEs.  This sequence assumes that any DELETEs are handled "soft deletes" which are handled as updates.  There are variations on this basic ELT sequence depending on the volume of of updates, and whether the target is a base fact, aggregate fact, SCD1 dimension, or SCD2 dimension table.

     

    The part I am not clear about is why you would be making table DDL changes based on daily intake stream.  Normally, schema changes would come through a change control process.

  • Shawn Fox Enzee Exraordinaire 1,482 posts since
    Aug 15, 2006
    Currently Being Moderated
    5. Sep 2, 2010 10:35 AM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    Netezza 6.0 supports using "alter table" to add/drop columns.  Latest info I've heard says 6.0 should be available late this month (September 2010), but I've not seen an official release date annoucement so that is subject to change without notice.

     

    For near real time data movement from SQL Server (or Oracle, DB2, Sybase, Informix) to Netezza, you should take a look at DBMoto.  They offer a 30 day trial license on their website: http://www.hitsw.com/products_services/dbmoto/dbmoto.html

     

    WisdomForce also has a product called DBSync that does near real time replication from an Oracle source to a Netezza target.

  • Shawn Fox Enzee Exraordinaire 1,482 posts since
    Aug 15, 2006
    Currently Being Moderated
    7. Sep 10, 2010 11:26 AM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    You can certainly accomplish data synchronization without using any tools at all, but you should always take the cost of development into account as well... the cost of the time for the developer to write the code and the support costs over time (which can really eat you alive if you do not have a very standard way of doing things).  In a simple case you may save money by doing it yourself, but if you have 100s of tables you will certainly save money using a tool designed for the purpose.

     

    The simplest way to do it is to put a trigger on each table in SQL Server that you want to monitor and insert the primary key value of each record that changes into a staging table.  You can then grab all of the distinct PK values from the staging table and join that back with the source table to get all of the data (rows that no longer exist were deleted... have to use a left join).  Save the data to a file, and use nzload to load it into Netezza.  You could also use triggers plus an ETL tool such as SSIS, Informatica, DataStage, etc.

     

    For small tables the best method is just to copy the entire table from the source to Netezza if you maintain the same table structures on both systems (small is relative, but it takes very little time to copy 100,000 rows for example).  For large tables you would want to use the trigger based mechanism although this can have a substantial effect on the performance of the programs that load data into the database.  This is another reason why a tool such as DBMoto is valuable... it adds very little to the workload on the source database since it is just parsing the transaction logs to get the changes instead of having to use a trigger to capture them in the database itself.

     

    Another option is to add a last update timestamp to each row in the source table which is probably updated using a trigger.  You can then just grab all the records that have been modified since the last time you checked the table.  This method, however, does not capture deletes.

  • David Birmingham Active Enzee 429 posts since
    Sep 24, 2007
    Currently Being Moderated
    9. Sep 10, 2010 10:36 PM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    Just to add a tuppins to Shawn's comments. SQLServer provides for a rowversion - it's used for transactional concurrency and will automatically change (upward) whenever the row is changed for any reason. This allows you to track the given table by the rowversion that you last saw as the max, and the rowversion that you now see as the max, and then take the delta. Of course, if the model has not implemented rowversions, then you can't use them. But these are standard fare for most online transactional systems. Deleted records, as Shawn noted, you're on your own.

  • Matt New Enzee 11 posts since
    Jul 26, 2010
    Currently Being Moderated
    10. Oct 30, 2010 12:08 PM (in response to Evetsrm)
    Re: SQL Server to Netezza - best approach question

    Regarding how to identify DML changes made to the SQL Server tables, if you are using SQL 2008, take a look at Change Tracking and Change Data Capture.  These features are built in to the SQL engine and should take less resources than a trigger based solution, as well as less development effort.

More Like This

  • Retrieving data ...

Bookmarked By (1)

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points