Skip navigation
15336 Views 3 Replies Latest reply: Jan 19, 2010 5:04 PM by rmenon RSS
rmenon New Enzee 59 posts since
Apr 23, 2009
Currently Being Moderated

Jan 8, 2010 5:22 PM

update versus deletes

Hi folks

I am being told that in Netezza updates are in general slower than deletes and that for big tables they are not scalable.

 

I know they are two different statements but in a solution where you have two alternatives - one involving insert/delete and the other updates, is there a good engineering reason for us to choose deletes over updates. Is there any documentation about this anywhere?

 

Thank you!

  • Shawn Fox Enzee Exraordinaire 1,333 posts since
    Aug 15, 2006
    Currently Being Moderated
    1. Jan 8, 2010 6:14 PM (in response to rmenon)
    Re: update versus deletes

    One good reason to do delete/insert is that you cannot update a distribution column.  Depending on your data, that can be a problem.

     

    In testing I have done on 10000 series hardware, update/insert is faster than a delete/insert combination, but not by a substantial margin

     

    My general practice is to load all the data into a staging table, then perform delete + insert against the target table.  There are certainly exceptions, but you'll always do well with the delete + insert model.

     

    In all cases, if you aren't dealing with big tables it probably isn't going to matter a great deal.  Still it is best to follow exactly the same model with every process, a system is easier to support when every process follows exactly the same standard.  Your main goal should be to make it easy to audit the changes to the data each day.

     

    On Netezza performance of your ETL is rarely going to be an issue.  The main thing to focus on is making sure that when there is a problem you can determine what went wrong and you can easily recover from it.  For dimension type tables, that means keeping a full copy of the way the data looked prior to your ETL process running.  When possible, you should keep copies from the last several days.  For everything but your largest tables, storage is very cheap compared to the time spent debugging problems.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    2. Jan 8, 2010 6:20 PM (in response to rmenon)
    Re: update versus deletes

    You also have the option, used mostly in "continuous" models, to add a companion table that contains all of the "deleted" records simply by using a unique identifier and the same distribution key. By joining to this table using a "where not exists" you effectively filter the deleted records from every query. Painless even on large tables, but again finds its practicality more in continuous models where there is no batch window for maintenance.

     

    As Shawn noted - be consistent with whatever you do.

More Like This

  • Retrieving data ...

Bookmarked By (0)