Feb 24, 2010 12:04 PM
Exact Numeric Data Types - what precision to use?
-
Like (0)
We have a billion row standard fact table (integer/bigint surrogate keys, fact columns are all in numeric(p,s) format).
The NZ document recommends:
Always use the smallest integer or fixed-point numeric whenever possible. When converting source data to the NPS system,
you may need to analyze the data to determine the smallest data type that you can use.
However, some of our fact columns may contain currency, e.g. Japanese yen, US $ etc.
To make the design more generic, we would like to use the largest precision for fixed point numeric whenever possible.
I did some experiments with Numeric(9,2), Numeric(18,2), Numeric(38,2) on Twinfin. It turns out the disk space is really
determined by the underlying data that are stored in those columns. Changing data types from 4 to 8 to 16 bytes fixed point numeric
almost has no impact on the actual data storage (compression is on by default).
Any recommendations on how to pick the "generic" precision for fixed-point numeric while preserving the disk consumption (potentially positive
impact on query performance)?
Well, not to split hairs or anything, but if you can do a smaller integer it will alway save the disk space over a larger one, and each additional byte in a billion-row table is a gigabyte of storage.
The necessary precision should not be driven by Netezza's constraints but by your business rules. So if you need the additional widht or precision, just do it. The compression will cover you either way.
Another aspect is that float does not compress. So stay in the numeric if at all possible (these are just scaled integers anyhow).
Another aspect of your data-size testing - you need to use a very large sample to test with because Netezza has a default-minimum database size that will be allocated no matter what the size of the table, so you could get a false positive (that they all appear the same) because your sample is too small.
I would stick with NUMERIC(18,X) if possible. A NUMERIC(18,X) uses 8 bytes whereas anything bigger uses 16 bytes. The net result is that NUMERIC(18) uses significantly less CPU / FPGA resources since it is half the size of a NUMERIC(38,X). Also you should see lower disk usage for NUMERIC(18,X) vs. NUMERIC(38,X). Compression will remove most of the difference but not all.
In the past I have usually used NUMERIC(18,4) to store monetary values but that was due to converting all money to a single currency (and storing the original conversion factor along with the fact data so that I could undo the conversion if needed).
Unless you are processing Zimbabwe currency or the US national debt a NUMERIC(18,X) should be plenty.
The only thing you would need to test, past all this - is to make sure that your BI math doesn't spontaneously combust because of overflow. Don't jump to convert them to float until you read my blog entry A Tastier Float here on the Netezza community (Gather 'Round the Grill)
<duplicate>

