In one of our primary tables, we'll call it a fact table, it contained a number of columns that had arrived through some pretty hairy ELT-based math algorithms. In all the crunching, we would see spontaneous overflow errors, so we converted some of them to float. More explosions occurred, and we converted more to float. After several more iterations, we converted them all to float. Then we discovered that the reporting layer also had to perform some hairy on-demand calculations, so it was a good thing we had float values to give them. Now everyone was safe.
However, as this table grew, and they always do, the floats became "bloats". Netezza does not compress a float data type. One day we looked up and the table was approaching 20 TB in size, with no end in sight. The theory was, that we could reduce these float values to numeric data types, we could save half the storage right away, and even more so with Netezza's compression, but it would put the reporting layer in danger of a spontaneous overflow explosion.
Once we performed the conversion of the table (as a test case) and saw it reduce in size to about 7 TB, we were hooked on the possibilities of compression but vexed as to the impact this would have on the consumers of the data.
We had experimented with surgically casting the data from numeric to float on-the-fly, but this would create a lot of headaches for the users if they always had to wrap every field with a casting notation. It did however, prove out one thing, that the time to cast the numeric-to-float is inconsequential when compared to the amount of I/O required to pull a float value from the SPU's disk "as is". In essence, we traded the time we saved in compression, and converted it to time used in casting.
So the next step would be, put a special view on top of the fact table, such that it would automatically cast every numeric column into a floating point value. Thus, whenever a reporting layer query required data, it would automatically and transparently leverage the view, pull less data from the disk, covert it to float in the CPU and then leverage it as float in memory. We effectively eliminated the cycles spent in I/O to rip the float value from the disk drive. We spent a little of it in the cast of the data to a float. We made the operation transparent to the reporting layer.
old way:
FLOAT ->>>>>>>>>>>>>>>>>>>>>>>CPU -> QUERY MATH
(16 bytes, no compression
new way:
NUMERIC ->>>>>>>>>>>>>>>>>>>>FLOAT -> CPU -> QUERY MATH
(8 bytes or less, with compression)
All of the CPU-level math then becomes inconsequential when we move to the Twinfin, since it has its own floating-point processor and can handily deal with the float type. But we can continue to mitigate the I/O hit for the data by storing it in a compressible numeric format, and coverting this on-demand to a float at the CPU level.