Skip navigation
23200 Views 15 Replies Latest reply: Aug 4, 2011 5:59 PM by David Birmingham RSS 1 2 Previous Next
Neeraj1101 New Enzee 2 posts since
Sep 2, 2010
Currently Being Moderated

Sep 8, 2010 4:23 PM

No Difference in Performance

Hi,

 

First of all, I apologize for duplicate thread.

 

I am doing a testing to check the performance.

 

First Test, In my table a and table b, we have a common columns having char(1) as datatype. Table has approx. 900,000 records and table B has 5 records.

 

We are checking the count(*)  from table A by joining table a and b with the cmmon column having char(1) as datatype.

 

Second test , we performed by joining both the tables by adding a common column havign an Integer as data type and executed select count(*) from table A by joining table a and b with the cmmon column having Ineteger(1) as datatype.

 

We are not seeing any different as far as the execution time is concerned.

 

As per my understanding, the execution time should be less when we are using joing on Integer data types.

 

Please let me know what is the reason of not finiding any difference in the performance

 

Regards

Neeraj

  • Shawn Fox Enzee Exraordinaire 1,482 posts since
    Aug 15, 2006
    Currently Being Moderated
    1. Sep 8, 2010 4:33 PM (in response to Neeraj1101)
    Re: No Difference in Performance

    900,000 rows is a very small table.  I doubt you have zone maps on it at all unless you are running on a skimmer system perhaps.  The minimum table size to get zone maps is around 15MB * dataslice count.  So even on a TF3 that is 24*15 = 360MB.  With that minimal size you won't see much of a difference between using zone maps or not.  If you want to test out the difference you need to have a table that is in the 10s of millions of records at least and possiblity quite a bit larger if you are on system like a TF12 or one of the really big systems like a TF96 or some such.

     

    Also a char(1) isn't really a valid test since that is a fixed width character and it is only 1 byte anyway.  The times when you'll run into issues are with something like a varchar(40) vs. an integer column (or when you can make use of zone maps).

     

    Personally I prefer to use char(1) for values like true/false, male/female, status columns, etc... I find them much easier to read.  You won't get zone maps on them though and that can be a problem in some situations.  With Netezza 6.0 you do get zone maps on character fields if you use them as one of the columns your table organization clause, but if you use multiple columns make sure their cardinality is fairly low otherwise you won't see much if any benefit from them.

  • Shawn Fox Enzee Exraordinaire 1,482 posts since
    Aug 15, 2006
    Currently Being Moderated
    3. Sep 8, 2010 7:19 PM (in response to Neeraj1101)
    Re: No Difference in Performance

    It has to be 15MB per dataslice (at least).  Use "select count(*) from _v_dslice" to determine the number of dataslices on your system.

     

    Also you wouldn't see any performance gain unless your source table is physically ordered on the column you are joining on (for example, create table foo as select * from bar order by x) and of course you would have to have a filter of some sort.  I assume that the table you are joining to has only some of the values that are in the target table?

     

    The easiest way to see if you have zone maps for your table is to download Aginity Workbench:

    http://www.aginity.com/Workbench/tabid/237/Default.aspx

     

    In the list of tables when you select a table you'll have a "Zone Maps" node below the table.  If you double click on "Zone Maps" it will run a query to determine the zone maps on the table.  If it doesn't find any then you don't have any good zone maps (or your table is too small).

  • Jordan Chernev Rookie 183 posts since
    Jun 24, 2011
    Currently Being Moderated
    4. Jul 25, 2011 11:49 AM (in response to Shawn Fox)
    Re: No Difference in Performance

    Apologies for reviving an old thread, but I wanted to double-check something.

     

    At my company, we're running NPS 6.0.2 running on TwinFin 3 and I have a table that has the following stats:

     

      • 6,023 rows
      • Bytes used: 5.00 MB (via nzAdmin)
      • Bytes Allocated: 120 MB

     

    So, according to the min MB requirement per data slice (15), the 22 active dataslices would require that 330 MB of space would be necessary for the system to generate zone maps. Yet, according to Aginity, this table has 2 active zone maps at 100% at that. Can someone please shed some light on this?

  • nickg Active Enzee 248 posts since
    Mar 16, 2011
    Currently Being Moderated
    5. Jul 25, 2011 1:45 PM (in response to Jordan Chernev)
    Re: No Difference in Performance

    Just a guess, but I would suspect Netezza creates zone maps regardless of the size of the table.  It doesn't make any sense that it would wait until a table reached a certain threshold because it would significantly complicate matters for the database for no real gain.

     

    In your particular case, there is probably only one data block on each SPU, which would certainly make the zone maps 100% effective in localizing the correct data blocks.

  • Jordan Chernev Rookie 183 posts since
    Jun 24, 2011
    Currently Being Moderated
    6. Jul 25, 2011 2:46 PM (in response to nickg)
    Re: No Difference in Performance

    Hey Nick,

     

    Thank you for your response. You're correct in saying that I have data on each data slice as the distribution is RANDOM. I've noticed something odd - if you specify RANDOM distribution and do NOT specify an ORDER BY clause in the CTAS, Netezza creates zone maps for multiple columns. Am I wrong in my observations or am I doing something wrong?

     

    Jordan

  • nickg Active Enzee 248 posts since
    Mar 16, 2011
    Currently Being Moderated
    7. Jul 25, 2011 6:06 PM (in response to Jordan Chernev)
    Re: No Difference in Performance

    By default Netezza creates zone maps for most numeric and date/time columns (the manual has a definitive list).  It does this regardless of how the table is created.

  • Shawn Fox Enzee Exraordinaire 1,482 posts since
    Aug 15, 2006
    Currently Being Moderated
    8. Jul 26, 2011 12:55 PM (in response to nickg)
    Re: No Difference in Performance

    Netezza always creates zone maps but does not always use them.  The rule used to be that zone maps were not used unless the table was at least 5 extents (15MB) on a dataslice... I'm not sure if that still holds, but it probably does.  All of the zone map information is stored in a single table and there is a cost to look the information up, so unless the potential gain is significant it is not worth paying the guaranteed cost of checking the zone maps to reduce the work.

     

    Clearly in the case of a 6000 row table it is completely pointless to check the zone maps, it is vastly cheaper just to look at every row in the table.

  • David Birmingham Active Enzee 429 posts since
    Sep 24, 2007
    Currently Being Moderated
    9. Aug 4, 2011 1:59 PM (in response to Neeraj1101)
    Re: No Difference in Performance

    You say that you are joining to this column but not actually filtering on it (e.g. where value-1) - joining on the column is different from using it as a zone map - sometimes they complement each other and sometimes not. But just because you are joining does not mean the zone map is being leveraged for the scan. In fact, joining alone without filtering on the given column, would guarantee a full table scan if all the values are in both tables. The only benefit of doing it this way (joining) is if you can place a subset of the data in one table and use it as filter to another. If both tables have the same data and you simply join, you will initiate a table scan.

     

    However, if both tables contain the same data and you join but also use where column = 1 for one or both of them, this would potentially leverage the zone maps where the filter was applied.

  • Jordan Chernev Rookie 183 posts since
    Jun 24, 2011
    Currently Being Moderated
    10. Aug 4, 2011 2:08 PM (in response to David Birmingham)
    Re: No Difference in Performance

    So, is it wrong to conclude that ZM will only trigger when you start filtering data?

  • David Birmingham Active Enzee 429 posts since
    Sep 24, 2007
    Currently Being Moderated
    11. Aug 4, 2011 2:13 PM (in response to Jordan Chernev)
    Re: No Difference in Performance

    Bingo - A zone map assists in filtering data. Just as a distribution key assists in co-located joins. But if you don't use the distribution key in the join, the co-location will never happen, just like if you don't actually mention the zone map column in the where-clause, the zone map is not leveraged for the query. Zone maps support filtration. If you aren't filtering, what do you need a zone map for?

  • Jordan Chernev Rookie 183 posts since
    Jun 24, 2011
    Currently Being Moderated
    12. Aug 4, 2011 2:30 PM (in response to David Birmingham)
    Re: No Difference in Performance

    So final question David... would zone maps get triggered once a result dataset resides in memory, e.g. it's a product from a join of two or more other tables? My guess would be no as ZM get triggered by the FPGAs, not memory..?

  • nickg Active Enzee 248 posts since
    Mar 16, 2011
    Currently Being Moderated
    13. Aug 4, 2011 5:04 PM (in response to Jordan Chernev)
    Re: No Difference in Performance

    Zone maps are only created for tables, not in-memory cache.  It is a technique to speed up disk access.

  • Jordan Chernev Rookie 183 posts since
    Jun 24, 2011
    Currently Being Moderated
    14. Aug 4, 2011 5:21 PM (in response to nickg)
    Re: No Difference in Performance

    Sorry if keep beating a dead horse here...

     

    So, if I have two tables that are UNION ALL'd in a non-materialized view and then JOINed with other tables, am I better off with implementing a totally new table on disk, containing the result sets for each source table in the aforementioned view, in terms of performance?

1 2 Previous Next

More Like This

  • Retrieving data ...

Bookmarked By (0)

Legend

  • Correct Answers - 4 points
  • Helpful Answers - 2 points