Sep 8, 2010 4:23 PM
No Difference in Performance
-
Like (0)
Hi,
First of all, I apologize for duplicate thread.
I am doing a testing to check the performance.
First Test, In my table a and table b, we have a common columns having char(1) as datatype. Table has approx. 900,000 records and table B has 5 records.
We are checking the count(*) from table A by joining table a and b with the cmmon column having char(1) as datatype.
Second test , we performed by joining both the tables by adding a common column havign an Integer as data type and executed select count(*) from table A by joining table a and b with the cmmon column having Ineteger(1) as datatype.
We are not seeing any different as far as the execution time is concerned.
As per my understanding, the execution time should be less when we are using joing on Integer data types.
Please let me know what is the reason of not finiding any difference in the performance
Regards
Neeraj
900,000 rows is a very small table. I doubt you have zone maps on it at all unless you are running on a skimmer system perhaps. The minimum table size to get zone maps is around 15MB * dataslice count. So even on a TF3 that is 24*15 = 360MB. With that minimal size you won't see much of a difference between using zone maps or not. If you want to test out the difference you need to have a table that is in the 10s of millions of records at least and possiblity quite a bit larger if you are on system like a TF12 or one of the really big systems like a TF96 or some such.
Also a char(1) isn't really a valid test since that is a fixed width character and it is only 1 byte anyway. The times when you'll run into issues are with something like a varchar(40) vs. an integer column (or when you can make use of zone maps).
Personally I prefer to use char(1) for values like true/false, male/female, status columns, etc... I find them much easier to read. You won't get zone maps on them though and that can be a problem in some situations. With Netezza 6.0 you do get zone maps on character fields if you use them as one of the columns your table organization clause, but if you use multiple columns make sure their cardinality is fairly low otherwise you won't see much if any benefit from them.
Hi,
Thanks for the reply. I am new to netezza.
I performed a test with 16 MB(approx.) of data now. Still the same results. How can I check Zone map is enabled or not?
Regards
Neeraj
It has to be 15MB per dataslice (at least). Use "select count(*) from _v_dslice" to determine the number of dataslices on your system.
Also you wouldn't see any performance gain unless your source table is physically ordered on the column you are joining on (for example, create table foo as select * from bar order by x) and of course you would have to have a filter of some sort. I assume that the table you are joining to has only some of the values that are in the target table?
The easiest way to see if you have zone maps for your table is to download Aginity Workbench:
http://www.aginity.com/Workbench/tabid/237/Default.aspx
In the list of tables when you select a table you'll have a "Zone Maps" node below the table. If you double click on "Zone Maps" it will run a query to determine the zone maps on the table. If it doesn't find any then you don't have any good zone maps (or your table is too small).
Apologies for reviving an old thread, but I wanted to double-check something.
At my company, we're running NPS 6.0.2 running on TwinFin 3 and I have a table that has the following stats:
So, according to the min MB requirement per data slice (15), the 22 active dataslices would require that 330 MB of space would be necessary for the system to generate zone maps. Yet, according to Aginity, this table has 2 active zone maps at 100% at that. Can someone please shed some light on this?
Just a guess, but I would suspect Netezza creates zone maps regardless of the size of the table. It doesn't make any sense that it would wait until a table reached a certain threshold because it would significantly complicate matters for the database for no real gain.
In your particular case, there is probably only one data block on each SPU, which would certainly make the zone maps 100% effective in localizing the correct data blocks.
Hey Nick,
Thank you for your response. You're correct in saying that I have data on each data slice as the distribution is RANDOM. I've noticed something odd - if you specify RANDOM distribution and do NOT specify an ORDER BY clause in the CTAS, Netezza creates zone maps for multiple columns. Am I wrong in my observations or am I doing something wrong?
Jordan
By default Netezza creates zone maps for most numeric and date/time columns (the manual has a definitive list). It does this regardless of how the table is created.
Netezza always creates zone maps but does not always use them. The rule used to be that zone maps were not used unless the table was at least 5 extents (15MB) on a dataslice... I'm not sure if that still holds, but it probably does. All of the zone map information is stored in a single table and there is a cost to look the information up, so unless the potential gain is significant it is not worth paying the guaranteed cost of checking the zone maps to reduce the work.
Clearly in the case of a 6000 row table it is completely pointless to check the zone maps, it is vastly cheaper just to look at every row in the table.
You say that you are joining to this column but not actually filtering on it (e.g. where value-1) - joining on the column is different from using it as a zone map - sometimes they complement each other and sometimes not. But just because you are joining does not mean the zone map is being leveraged for the scan. In fact, joining alone without filtering on the given column, would guarantee a full table scan if all the values are in both tables. The only benefit of doing it this way (joining) is if you can place a subset of the data in one table and use it as filter to another. If both tables have the same data and you simply join, you will initiate a table scan.
However, if both tables contain the same data and you join but also use where column = 1 for one or both of them, this would potentially leverage the zone maps where the filter was applied.
So, is it wrong to conclude that ZM will only trigger when you start filtering data?
Bingo - A zone map assists in filtering data. Just as a distribution key assists in co-located joins. But if you don't use the distribution key in the join, the co-location will never happen, just like if you don't actually mention the zone map column in the where-clause, the zone map is not leveraged for the query. Zone maps support filtration. If you aren't filtering, what do you need a zone map for?
So final question David... would zone maps get triggered once a result dataset resides in memory, e.g. it's a product from a join of two or more other tables? My guess would be no as ZM get triggered by the FPGAs, not memory..?
Zone maps are only created for tables, not in-memory cache. It is a technique to speed up disk access.
Sorry if keep beating a dead horse here...
So, if I have two tables that are UNION ALL'd in a non-materialized view and then JOINed with other tables, am I better off with implementing a totally new table on disk, containing the result sets for each source table in the aforementioned view, in terms of performance?

