Skip navigation
19117 Views 5 Replies Latest reply: Nov 5, 2009 10:25 AM by pat.prashant RSS
Lawrence New Enzee 2 posts since
Sep 14, 2009
Currently Being Moderated

Oct 30, 2009 5:43 AM

Performance Issue in Netezza Database

We are using two NZ Databases. One is for QA server and the other is PRODUCTION server that have same setup.

 

The following query which is executed in QA consumes only 15-20 secs, rather it consumes 5-10 MINS in PRODUCTION server.

 

 

 

 

 

 

 

 

UPDATE

TMPETLSTGFINALCREDITMEDIAEVENTMINUTEBYDEVICE f

SET

f.isIntabHouseholdFlagID =

 

 

CASE

 

 

     WHEN i.isIntabFlag = 'N'

 

 

          THEN 0

 

 

     WHEN i.isIntabFlag = 'Y'

 

 

          THEN 1

 

 



END

 

FROM



 



 

ETLSTGINTABSTATUS i

WHERE

f.sourceInstalledHouseholdNumber = i.sourceInstalledHouseholdNumber

 

 

AND f.finalCreditViewedStartEST BETWEEN i.startDateTimeEST AND

i.endDateTimeEST ;

 

 



1) The volume of data (approx 4 million rows) is same (in the table ETLSTGINTABSTATUS) in QA and PRODUCTION server.
2) QA Server and PRODUCTION have the same setup.

 

 

It would be great, if any one guide in resolving this issue.



  • Chris Rodgers Active Enzee 172 posts since
    Sep 19, 2006
    Currently Being Moderated
    1. Oct 30, 2009 8:27 AM (in response to Lawrence)
    Re: Performance Issue in Netezza Database

    If you have the exact same query that performs differently on 2 identical/similar databases, the best thing to offer for assistance is an explain plan of *both* queries (explain verbose in one database, explain verbose in the second database).  Upload those here as attachments and folks will be able to help pinpoint the differences.

     

    What version of NPS are each of the servers running?

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    2. Nov 3, 2009 3:51 PM (in response to Lawrence)
    Re: Performance Issue in Netezza Database

    when you note, "same setup" what does this mean?

     

    Update performance is largely a function of the distribution key. You will see radical swings in performance like this if you are attempting to perform an update without the distribution key in the where-clause. The distribution key allows the machine to hash the key and then distribute the actual content of the update only to its designated SPU. Without it, you might see it dispatch the entire contents of the update to all the SPUs, which is why the radical swing.

     

    Are both tables distributed on "sourceInstalledHouseholdNumber"? And if not, is the target table at least distributed on the key mentioned in the join?

  • pat.prashant New Enzee 7 posts since
    Oct 30, 2009
    Currently Being Moderated
    3. Nov 5, 2009 4:44 AM (in response to David Birmingham)
    Re: Performance Issue in Netezza Database

    Thanks for your input David !

     

    All of our persistent tables are distributed on random due to a lack of a proper distribution key. But your question made us realise that the temp table involved in the update statement could atleast be distributed on "sourceInstalledHouseholdNumber", which then seems to have improved performance.

     

    Just for the record, QA system is on version 4.6.2 while the PROD is on 4.5.2. Could the different versions of the optimizer be also the reason for the radical difference in performance?

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    4. Nov 5, 2009 7:24 AM (in response to pat.prashant)
    Re: Performance Issue in Netezza Database

    Performance is found in the configuration of the data. How the data is layed out in the SPUs is the master arbiter. Tweaking and tuning the machines is the wrong answer.

     

    While the differences in the versions can have an effect, your real problem is in attempting to update a table that is distributed on random.

     

    Update is a key-based operation.

     

    Think about how you would perform this update in the real world. If you have forty houses on the block, and you have ten packages to send to the

    residents of the block, but the residents are scattered randomly across the houses, you would have to either send all ten packages to each house, or

    you would have to call each house to determine who lived there, then send the package to the right place.

     

    But if the houses have families by name, and you have a list of names in-hand and their home addresses, you could send each package directly to the home

    address without any further work.

     

    The Netezza machine applies power at the physics layer. Physics is driven by distribution-key coordination. If you have no distribution, you invoke a broadcast. Not

    all broadcasting is bad for outbound queries, but when you want to do an update, broadcasting is bad.

     

    I would also suggest that you put pen-to-paper and derive an insert-only model for your database. This only requires incrementally more creativity and work, but eliminates much of the data management problems you see following updates and deletes.

     

    for example, with updates and deletes, you must eventually reclaim the space created by the operation. It will not reclaim it for you. An update operation is a full record delete/insert combination. So you take the penalty of the delete and the insert together, rather than just the insert alone. What is the penalty? A hole in your table that you will have to reclaim, because when Netezza deletes it does not remove the record, it only marks it for deletion and therefore excludes it from all queries But it still occupies space. I've seen people over time say they have only a million records in a table but the table is taking up the space of forty million records - all because forty million deletes/updates have occurred over time and the space was never reclaimed.

  • pat.prashant New Enzee 7 posts since
    Oct 30, 2009
    Currently Being Moderated
    5. Nov 5, 2009 10:25 AM (in response to David Birmingham)
    Re: Performance Issue in Netezza Database

    Enlightening indeed, David. Thanks a ton!

More Like This

  • Retrieving data ...

Bookmarked By (0)

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points