<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Gather 'round the Grill</title>
    <link>http://www.enzeecommunity.com/blogs/grill</link>
    <description />
    <pubDate>Mon, 28 Jun 2010 17:20:22 GMT</pubDate>
    <generator>Clearspace 2.5.3 (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2010-06-28T17:20:22Z</dc:date>
    <item>
      <title>Enzee Universe Aftermath</title>
      <link>http://www.enzeecommunity.com/blogs/grill/2010/06/28/enzee-universe-aftermath</link>
      <description>&lt;!-- [DocumentBodyStart:a19b0b76-dd39-453f-9667-5ad3e7d9f660] --&gt;&lt;div class='jive-rendered-content'&gt;&lt;p&gt;Whew! The Enzee Universe this year was quite an experience.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I would like to offer my sincerest thanks to all of you who attended the Best Practices session held in marathon-form on Monday before the keynote. Over 300 people signed up, and many of you arrived the evening before, and at the end of the session, you were still there!&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Afterwards we took a checkpoint and then added more material to the powerpoint presentations that we used during the sessions, and these will be added to the content of the Enzee Universe downloads for those who attended.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;Some of you also asked me about the music selections we played at the intro and during the breaks. These were selected in terms of "Your Theme Songs", because some of them were from superhero movies, and some from action-adventure flicks. 
Here they are, in no particular order, the tune, origin and reason for selection:&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Theme from "The Incredibles" - because we come from a family (The Enzee Universe) of multi-talented superstars&lt;/p&gt;&lt;p&gt;Theme from "Batman/The Dark Knight" - because sometimes we have to work in a thankless role (however personally rewarding, with high-tech toys)&lt;/p&gt;&lt;p&gt;Theme from "Superman" - because to competitors, Netezza is like Kryptonite, and to the rest of us, it solves World Problems and makes us look good without having to wear our underwear on the outside&lt;/p&gt;&lt;p&gt;Theme from "Spirit, Stallion of the Cimarron" - surging music for those who entered the frontier on a Mustang&lt;/p&gt;&lt;p&gt;Theme from "Surf's Up" - probably enough said, here&lt;/p&gt;&lt;p&gt;Theme from "Mission Impossible" - a congenially offered rebuttal to those naysayers who say it can't be done&lt;/p&gt;&lt;p&gt;Theme from "James Bond" - because he, like many of you, is an MTBA - That's Multi-Talented Bad A**&lt;/p&gt;&lt;p&gt;Herbie Hancock - &lt;em&gt;&lt;strong&gt;Rockit&lt;/strong&gt;&lt;/em&gt; - because that's what you do&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I would also like to thank Netezza for the opportunity to share these ideas, many of which I have gathered over the course of my Netezza derring-do from people just like you, so some of the information is what-is-practiced in the field, and some of it is idea-works that we have re-synthesized into practices that seem to work well as a sort of "adaptive composite". The objective of course is to share with you what others are doing, to enrich your base of ideas, but these are certainly not hard-and-fast rules. The Netezza appliance is one that unlocks creativity, harnessing it for the Good of All Mankind. 
So guidelines and practices give us more critical mass to solve problems.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Likewise I would like to thank Netezza for the Enzee Community Voice Award presented to me on Tuesday night at the Gala, in recognition of being such a vocal supporter. My words then apply just as well now: "The people and the product create a synergy that's like electric current. I love interacting with the Enzee Community, and being a part of it".&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I also noted that a larger number of independent consultants/contractors were present on this go-round. In the best practices sessions, there is a wide range of professionals, from those who are Netezza customers, to consultants working for firms, independent consultants, analysts and the like. The Enzee Universe has various video screens constantly running slow-motion surfing videos in keeping with the TwinFin Theme. One day you might be lookin' at that guy on the wave, thinkin' about the TwinFin and wondering if you're on the wave, or just watching it pass.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;As with all professions, you might be in the zone where your project is just ending, or is about to, and you're wondering about the next great thing. And as you know, I'm always on the hunt for bright architects and engineers, especially in this economy, so if any of you independent types are looking for an opportunity, give me a shout. I also extend the invitation to anyone else who is reading, with the qualified apology that I am not lurking at the doorways to steal away your company's valuable resources. But I have seen in the past that some bright folks find themselves tapping a pencil on their desk, coming down from the exhilaration of a Netezza "experience" and wishing for more. 
I can say - the work is out there. I'm often in contact with people who need someone just like you - hooked on the technology. Hey, who &lt;strong&gt;&lt;em&gt;isn't?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Finally I offer a simple salutation to everyone who "gathered round the grill" this past week, sampled the wares, wishes and whatcha-ma-callits of the various vendors, trainers and speakers, and came away enriched and enabled to dream a little stronger, solve a little simpler, and crush those waves with the shredding confidence of parallel power. So I'll either see you in your natural habitat, interact with you here, or catch up with you in person when the Enzee Universe cranks up another adventure.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:a19b0b76-dd39-453f-9667-5ad3e7d9f660] --&gt;</description>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">practice</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">consultants</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">thanks</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">universe</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">independent</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">voice</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">award</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">enzee</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">twinfin</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">job</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">opportunity</category>
      <pubDate>Mon, 28 Jun 2010 18:18:05 GMT</pubDate>
      <author>dbirmingham</author>
      <guid>http://www.enzeecommunity.com/blogs/grill/2010/06/28/enzee-universe-aftermath</guid>
      <dc:date>2010-06-28T18:18:05Z</dc:date>
      <clearspace:dateToText>2 months, 6 days ago</clearspace:dateToText>
      <wfw:comment>http://www.enzeecommunity.com/blogs/grill/comment/enzee-universe-aftermath</wfw:comment>
      <wfw:commentRss>http://www.enzeecommunity.com/blogs/grill/feeds/comments?blogPost=1153</wfw:commentRss>
    </item>
    <item>
      <title>Rewiring our thinking processes - Traditional RDBMS to Netezza</title>
      <link>http://www.enzeecommunity.com/blogs/grill/2010/03/30/rewiring-our-thinking-processes---traditional-rdbms-to-netezza</link>
      <description>&lt;!-- [DocumentBodyStart:b2bacf86-9c6d-40ce-92ce-1404862ed17c] --&gt;&lt;div class='jive-rendered-content'&gt;&lt;p&gt;One of the questions oft-asked in best-practices sessions and in general consulting: How do we get a "newbie" on-boarded quickly? Some concern usually arises when the new Enzee approaches the Netezza machine with the same thinking processes as with a traditional RDBMS. While there are "gross" similarities, it is the &lt;em&gt;&lt;strong&gt;differences&lt;/strong&gt;&lt;/em&gt; we want to leverage, and these are not either/or questions. There is a better way to implement things in Netezza, and a better way in the traditional RDBMS. Mixing the two is not optimum and can be detrimental.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The primary discussion fulcrum is simple: One is a transactional database and one is not. Moving away from "transactional thinking" is the key. How to accomplish this?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;One of the best ways is to discuss and actually demonstrate the primary differences between bulk and transactional processing. As this is largely the crux of misunderstanding, or even the necessary "paradigm shift" our newbie needs to embrace, a significant hurdle it seems, is the newbie's belief that the core engine functionality of their favorite RDBMS is somehow being indicted or set aside as useless. After all, the transactional RDBMS is just that - &lt;em&gt;&lt;strong&gt;transactional&lt;/strong&gt;&lt;/em&gt; - and this is what we want the newbie to move away from. What? All that hard-won and industry-hardened capability - and we're just setting it aside? 
Really?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In a word - Yes.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;It's not that the transactional capabilities are useless. They simply aren't use&lt;em&gt;&lt;strong&gt;ful&lt;/strong&gt;&lt;/em&gt; in a data warehouse. More importantly, they &lt;em&gt;&lt;strong&gt;don't even exist&lt;/strong&gt;&lt;/em&gt; in a Netezza machine. So attempting to shoe-horn transactional thinking into this machine is a huge disconnect - no different from using a lawnmower as a hedge-trimmer. Netezza is purpose-built. Transactions are missing by design.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Now at least one person is bristling because they know, administratively, that transactional support is handy for logging, managing metadata, troubleshooting hooks and other administrative support. I don't disagree with that, but it's not the activity of bulk data processing. It is far easier to set up a smaller database machine alongside the Netezza machine to perform these administrative transactional tasks. Each machine then has an objective role and purpose, and off we go.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;What are some of the demonstrable ways that we can introduce the new Enzee to this issue, in a manner that really drives the point home? Well, I can't seem to count how many times I've had (sometimes rather contentious) discussions with "outsiders" (or perhaps "purists") on the subject of transactional exception handling. Inserting a record into a transactional database, with its glorious constraints turned on, will guarantee that it will push back on us with an exception. Said exception requiring the dutiful compliance of an exception &lt;em&gt;handler&lt;/em&gt;. 
You know the drill.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;But in data warehousing, such transactional exceptions are &lt;em&gt;&lt;strong&gt;in the way&lt;/strong&gt;&lt;/em&gt; of our bulk load. We don't want the database to examine each and every record as it arrives, potentially formulating an exception (and its attendant overhead) for each record, or passing each one through after its constraint-based integrity check. We just finished taking all that data through a detailed sieve of business rules in the ETL layer, didn't we? The database needn't trouble itself, &lt;em&gt;just load the data, thank you.&lt;/em&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Now at least one more "outsider" is bristling. How dare you say that we should set aside the constraint-based exception handling? What possible justification could there be for such a gross trampling of RDBMS functionality? Explain yourself!&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In a word - &lt;em&gt;performance&lt;/em&gt;.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Storytime: Just after 9/11, the airports over-compensated with all kinds of rigorous shakedown protocols. Travelers had to show a boarding pass and ID at a checkpoint, then keep them handy for just after the checkpoint. And then also for presentation at the gate prior to boarding, along with random bag searching. If you were the first one to board, or made eye-contact with the bag-search team, it was guaranteed that you would be taken aside and your luggage &lt;em&gt;rummaged&lt;/em&gt;. A friend of mine told me that the rummagers liked to carry on a conversation to make you feel more comfortable about their pulling your private things out into the open air for all to see. 
One of them held up a nose hair trimmer to one of his cohorts and said &lt;em&gt;What the heck is this?&lt;/em&gt; Makes one wonder what other kinds of personal appliances we could "salt" the bag with just to embarrass the daylights out of them, hmmm?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;My friend told me that he was pulled aside a lot, and started experimenting with "stated professions" that the rummager would not care to talk about. At one point he blurted "I'm a professional bodyguard" to which the rummager alerted like a trained narcotics dog and said "So you would know how to use weapons?" to which my friend simply said "Or &lt;em&gt;not&lt;/em&gt;."  This of course made the rummager gulp and go quiet, but it still wasn't good enough. My friend didn't want them to talk &lt;em&gt;at all,&lt;/em&gt; so they wouldn't waste any time in their rummaging and just get-it-over-with. So at one point he said "I own a funeral home." Which of course, stopped the chatter completely. Nobody really knows how to continue a casual conversation about such a subject.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The point being, he'd already had his bags electronically scanned at the checkpoint. Do we really need to check it again? And unlike a constraint-based exception handler, the rummager had the option of only picking out random hapless travelers. The exception handler rummages the bags of &lt;em&gt;&lt;strong&gt;every&lt;/strong&gt;&lt;/em&gt; traveler in the line. We can see how utterly inefficient this is. Nowadays, they screen the bags and then don't even check ID again at the gate. Except for random gates on occasion because nefarious people sometimes swap tickets when they get behind the checkpoint. In any case, if we've already exhaustively checked the bags to get the traveler where he is, more checking is a waste of everyone's time. 
Just like the exception handler. If we just ran the entire set of data through rigorous validation rules, we have no need whatsoever of the transactional exception handling in the database. It will waste processing time.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;And wasting time, we don't have the luxury to do.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The transactionally-constrained bulk-load of data will be, on average, five to ten times slower in operation than its non-constrained equivalent. If our objective is to achieve a fast load - and trust me, it really is - we don't want constraints turned on. We're talking about loading millions if not billions of records. Even in an RDBMS, we cannot afford to convert what could be a thirty-minute operation into a two-plus-hour operation. The window of time simply does not exist. In some locations, if this kind of window &lt;em&gt;&lt;strong&gt;ever&lt;/strong&gt;&lt;/em&gt; existed, it is rapidly vanishing as their businesses go-global and need to process data as-the-world-turns.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;On the flip side, think about the main reason for a transactional exception - it is to keep a transactional &lt;em&gt;&lt;strong&gt;application&lt;/strong&gt;&lt;/em&gt; honest. If the data does not comply, the user fixes and re-submits. It's interactive, and it deals with a single entity at a time, not millions of entities at a time.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The "outsider" will now brace on this assertion as well, because they think that having thousands of users interacting with the system constitutes this many-entities-at-a-time, but it simply doesn't. And here's why: RDBMS systems are meant to assimilate data in small chunks with high frequency. 
They are not designed to deal with large chunks at low frequency (e.g. a batch load once a night). They will &lt;em&gt;&lt;strong&gt;accommodate&lt;/strong&gt;&lt;/em&gt; such activities, but not do them &lt;em&gt;&lt;strong&gt;well&lt;/strong&gt;&lt;/em&gt;. In this case, "well" means loading a million rows a second. The RDBMS cannot approach this.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;And this is the reasoning behind Rule #10, which is - when loading bulk data, never involve the database in row-level activities. This means, without exception, turn off the exception handling. Because the database will just protract the duration of the flow, checking each and every record and slowing down all of them as a whole, in order to find the few exceptions. It is the equivalent of &lt;em&gt;&lt;strong&gt;making the entire flow suffer for the sake of a few records&lt;/strong&gt;&lt;/em&gt;. This is a bad tradeoff. And once again - didn't we just validate and scrub all these exceptions from the flow, in the ETL/data processing environment? Why are we asking the database to validate them again?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;And the worst part is that the potential exceptions are all &lt;em&gt;&lt;strong&gt;anticipated&lt;/strong&gt;&lt;/em&gt; and &lt;em&gt;&lt;strong&gt;known&lt;/strong&gt;&lt;/em&gt;. What does this mean? As back-end programmers building the data flow, we have direct and objective access to each and every failure point that will stop the data from loading. Why would we delegate this to the database, since it is so inefficient in performing it? Note - not so lacking in functionality, because the RDBMS has lots of functionality to perform it. 
It is simply too &lt;em&gt;&lt;strong&gt;inefficient&lt;/strong&gt;&lt;/em&gt; with bulk loading to be a viable resource.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So what are the anticipated exceptions? Let's go for popularity:&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Bad or null data&lt;/li&gt;&lt;li&gt;Unique key violations&lt;/li&gt;&lt;li&gt;Primary/foreign key violations&lt;/li&gt;&lt;/ol&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In fact, the above constitute the primary reasons for the load to fail. So let's walk through the basic process we would need to follow if we delegate this to the RDBMS.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In transactional mode, the RDBMS data load will kick out an exception for each of these it finds. Even if it completes with no error, someone in the room will say - &lt;em&gt;&lt;strong&gt;It took too long&lt;/strong&gt;&lt;/em&gt;. Even if it didn't find any exceptions. &lt;em&gt;&lt;strong&gt;Fix it&lt;/strong&gt;&lt;/em&gt;. &lt;em&gt;&lt;strong&gt;Make it faster!&lt;/strong&gt;&lt;/em&gt; The hard-core transactional &lt;em&gt;ingenue&lt;/em&gt; will attempt to optimize it without turning off exceptions, and find that it cannot be done. If the load of one record requires 1 second, it will take 1 million seconds to load 1 million records.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;We just don't have 1 million seconds.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;(Incidentally, in Netezza, the load of 1 record requires 1 second. The load of 1 million records requires 1 second. 
Use your second wisely!)&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So our new Enzee will grouse a bit and then look around at the data warehousing sites for answers, and all of them will say, turn the RDBMS exceptions off, load the data, and then turn them back on. The newbie will object - &lt;em&gt;&lt;strong&gt;but wait - when I try to turn them back on, the database yelps and says there are constraint violations. I will have to back the records out and try again&lt;/strong&gt;&lt;/em&gt;. Oh yes, we now have a mess on our hands. In the time it took to load the RDBMS data - say thirty minutes - we have now accumulated errors that might take hours to back out, fix and then retry. And we'll have to do it while the batch-window clock is ticking, not in a pre-process where we had more breathing room. We don't have this kind of time window.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;em&gt;We will never have this kind of time window.&lt;/em&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So the next fallback is to fix the exceptions in the data processing realm (ETL tool), prior to loading the data. But isn't this what the data processing realm is for? Really? This means we do all null checking and constraint checking prior to loading. How? We download the primary and foreign keys into the local data processing environment and perform a localized join-filter to remove the exceptions. This is a data warehousing 101 best practice.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The newbie will now brace on the idea of downloading all the key values. &lt;em&gt;All of them? That could take, well, it could take a long time!&lt;/em&gt;  It will take mere minutes to pull down all the key values. 
And those mere minutes are nothing compared to the duration of the recovery mess we will endure if we don't take this step.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Pay a little time now, or a lot of time later. Use your time wisely.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So here is the tradeoff (again in traditional RDBMS space)&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Turn off constraints, load the data, and then deal with the mess after the fact. Plan to spend hours backing out the mess and then running the load from scratch.&lt;/li&gt;&lt;li&gt;Download keys, join/filter the key exceptions, turn off constraints, load the data with the expectation that no mess will arise. (because it won't)&lt;/li&gt;&lt;/ol&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In short, downloading the keys for constraint checking is a necessary evil. Our only "next best" fallback is to load the data into a pre-target staging table and do the gross comparison there. Then we copy the good records into the target table. But wait - now we've incurred the penalty of the load &lt;em&gt;&lt;strong&gt;twice&lt;/strong&gt;&lt;/em&gt; (&lt;strong&gt;one&lt;/strong&gt; for the ETL to staging and &lt;strong&gt;one&lt;/strong&gt; for staging to target). Isn't it cheaper to pull down the keys once than it is to load all the data twice? Not to mention the fact that the average RDBMS engine does not efficiently copy tables either. 
So even if we decide to go with loading a staging table, the copy of the staging-to-target will take longer than we are willing to wait.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Think about this: When the data exception arises in (1) above, where will we &lt;em&gt;&lt;strong&gt;fix the problem?&lt;/strong&gt;&lt;/em&gt; In the database, or in the data processing realm? The database can only report the issue, not fix it. If we must fix it in the data processing zone anyhow, why wouldn't we fix it &lt;em&gt;&lt;strong&gt;proactively&lt;/strong&gt;&lt;/em&gt; rather than &lt;em&gt;&lt;strong&gt;reactively&lt;/strong&gt;&lt;/em&gt;?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So this approach means something even more valuable - if we find the exceptions in the data processing realm prior to loading, we will have found them &lt;em&gt;&lt;strong&gt;proactively and administratively&lt;/strong&gt;&lt;/em&gt;, not &lt;em&gt;&lt;strong&gt;reactively and operationally&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This makes a huge difference in the reconciliation of data exceptions when we're dealing with millions or billions of entities.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;And yet another issue our newbie is pleasantly unaware of - data processing on this scale has to be beholden to the constraints of the lights-out operation, administration, and logistical capabilities of the physical plant around the machine. If the operators have to get involved in the data recovery, with data processing on this scale, it needs to be for incidental reasons, not mass data recovery.&lt;br/&gt;In essence, delegating this activity to the RDBMS is setting up our operators to fail. 
We will find them entirely intolerant of this approach. &lt;em&gt;&lt;strong&gt;Fix it&lt;/strong&gt;&lt;/em&gt;, they will say. If our answer to them is - &lt;em&gt;&lt;strong&gt;hey, me architect, you operator, so gird up thy loins and get thee to work&lt;/strong&gt;&lt;/em&gt; - we have punted (and dangerously so) something we should take complete responsibility for. Because make no mistake, we will be held completely accountable for it as well. They &lt;em&gt;&lt;strong&gt;will&lt;/strong&gt;&lt;/em&gt; call us in the middle of the night. They will only help us incidentally. &lt;em&gt;It's your mess, you clean it up!&lt;/em&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The primary issue here is that the traditional RDBMS load has to be not only load-ready, but &lt;em&gt;&lt;strong&gt;consumption-ready&lt;/strong&gt;&lt;/em&gt;. When we load the data, we have to be completely and thoroughly finished with all data processing before it hits the target table. From there, the user should be able to consume it right away. Load-ready and consumption-ready is the name of the game, and it's accomplished for the RDBMS in the ETL environment, because it cannot be efficiently accomplished inside the RDBMS. The RDBMS is simply too slow and inefficient for any form of bulk operation. And again I say, if the only place to actually fix the data is the data processing realm, it only makes sense to do it proactively, not reactively.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Now let's flip over to the Netezza side of things.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In the Netezza machine, we can stage the data "dirty" if we want to, and we often do. 
The data can essentially be copied as-is from its external dirty location directly into the machine with an &lt;em&gt;&lt;strong&gt;nzload&lt;/strong&gt;&lt;/em&gt; to a staging database. From there, we have it in massively parallel form and can use a series of CTAS operations (ELT-style) to cleanse and shape the data. Once we're ready, we can then do a massively parallel join from the incoming table to the target, validating primary and foreign key values &lt;em&gt;&lt;strong&gt;in bulk&lt;/strong&gt;&lt;/em&gt;. Then we just copy the good data and we're done. When using Netezza, it is always faster to let the machine do the data cleanup and integration in a massively parallel, set-based operation (even a series of them) than it is to pull the data out, process it in an ETL tool, and put it back. ETL tools, on average, cannot compete with the massively parallel power of Netezza's engine.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Let's look at what we accomplished with little effort: (1) We cleansed the data of dirt. (2) In a single, massively parallel join we validated unique constraints. (3) In a single, massively parallel join we validated foreign keys (one join per key). The total time to accomplish the latter two tasks is fractional, often a matter of minutes even on billion-row tables and billion-row loads. The time for the first task is shrunken too, since we can apply our row-level data scrubbing rules in-bulk with sweeping operations rather than row-level operations.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Case Study Short: Working with a SQL Server-based model, the client was loading 15 million records into the database with the bulk loader and the largest machines available. Total time to load - over 2 hours. Tried it again on an Oracle platform, with a top-line 16-core machine with plenty of high-end disk space. 
This operation took 30 minutes. This was attempted on a Netezza platform, same data, same volume, and it took 15 seconds. There is a contrast, but not a comparison. Nothing adequately compares to a 15-second data load.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The important takeaway is this: If I can load the data in 15 seconds, I have a luxury of time to perform internal ELT, data scrubbing and integration, key checking and the like in a matter of minutes, still ensconcing the data into the final target table before the other two databases even get started. More importantly, I did it without standing up a formal external ETL tool. All of it happened "under the air" of the Netezza machine.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Now an interesting exercise for the new Enzee would be to actually walk through the processes noted above. In a problem-solving series of exercises, they should get some data that has embedded constraint violations, then attempt to load the data to an RDBMS with transactional constraints turned on, then turn on the creative juices to see how it can be done more efficiently. I would not suggest loading millions of rows to an RDBMS for this exercise, since they are so inefficient at this. Try it with a smaller row-count and then extrapolate the necessary time-to-load. What they will discover is that they will find themselves slowly backing out their precious transactional exception handling to fix the problem another way. 
The faster they get, the more the chosen path will start looking very lean on RDBMS capabilities.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The final form of their solution, they will find, is supported de facto and in massively parallel form inside the Netezza appliance at no additional charge. In the end, they will see why Enzees have run, not walked, to a Netezza platform for just this kind of capability. We know they have made the transition when we can hear them having a conversation with another newbie about transactional versus bulk processing, and they are coaching the newbie away from the transactional model.  Ahh, a beautiful thing, indeed.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;This is why Netezza is in no way, no how a transactional machine, and why it doesn't enforce primary and foreign key constraints. These can be installed as metadata, but the expectation is that they will be used by an external, intelligent operation that will leverage them for administrative key validation - &lt;em&gt;&lt;strong&gt;in bulk&lt;/strong&gt;&lt;/em&gt;. After all, I can read the key metadata from the Netezza catalog and formulate a series of validation operations that will work for any table, any key, any time. Install it as a stored proc and invoke it when necessary. This allows me to set up the load operations and prepare the final copy to the target (which is often the accumulation of dozens of operations to integrate the data into a common pre-target table). Then validate the data just before it is finally copied to the target. 
This keeps me from having to do it a record-at-a-time, or to have an exception processor accidentally execute the operation before I am completely finished formulating the data for the load.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Row-level exception handling is a beautiful thing - &lt;em&gt;transactionally&lt;/em&gt;. If the domain where the exception must be fixed is already the data processing zone, we need to proactively embrace this responsibility and just do it. In the end, row-level exception handling has to be completely removed from our thinking processes. We need to invoke sweeping operations that capture the exceptions in-bulk, not a row-at-a-time. Fix and integrate them in bulk, not a row-at-a-time. Bulk is the name of the game, and always has been.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:b2bacf86-9c6d-40ce-92ce-1404862ed17c] --&gt;</description>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">practice</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">handling</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">paradigm</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">etl</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">transactional</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">handle</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">oracle</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">enzee</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">performance</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">newbie</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">handler</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">transaction</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">staging</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">exception</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">thinking</category>
      <pubDate>Tue, 30 Mar 2010 14:23:06 GMT</pubDate>
      <author>dbirmingham</author>
      <guid>http://www.enzeecommunity.com/blogs/grill/2010/03/30/rewiring-our-thinking-processes---traditional-rdbms-to-netezza</guid>
      <dc:date>2010-03-30T14:23:06Z</dc:date>
      <clearspace:dateToText>5 months, 1 week ago</clearspace:dateToText>
      <wfw:comment>http://www.enzeecommunity.com/blogs/grill/comment/rewiring-our-thinking-processes---traditional-rdbms-to-netezza</wfw:comment>
      <wfw:commentRss>http://www.enzeecommunity.com/blogs/grill/feeds/comments?blogPost=1130</wfw:commentRss>
    </item>
    <item>
      <title>Decompression... the World Tour Closes, Alas</title>
      <link>http://www.enzeecommunity.com/blogs/grill/2009/09/30/decompression-the-world-tour-closes-alas</link>
      <description>&lt;!-- [DocumentBodyStart:52b2783f-19a9-4970-9c62-598af2121528] --&gt;&lt;div class='jive-rendered-content'&gt;&lt;p&gt;A number of months ago I wrote about how the World Tour Awaits, and all the buzz in the air about the new TwinFin. I was honored to moderate the best practices forums in North America and London, and many thanks for the rather effervescent participation by the panelists. Kudos go out to David from Brightlight, David from Edge Associates, and Jeff from Quantisense, each of whom has the kind of over-the-top personality that turns the session into an "experience" more than just a discussion.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;But all in all, the sessions flew like lightning. If any of you have additional questions or insights, may I invite you to post them here on the Netezza community. The discussion never ends, you know.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;It is interesting to note that many of the questions coming from Enzees in every venue struck a common chord and followed a common thread: Enzees are unique, have a rarefied problem and solution domain, and are able to approach it with the confidence of Spartacus in the arena, or Jackie Chan on the streets of New York. Comments often began with, "I have a table with &amp;lt;seventy, eighty, ninety, your number here&amp;gt; billion records and I want to..."  I mean, seriously, those on the outside lookin' in will look askance at such an opening statement, and marvel at the ensuing, rather casual discussion about it. Nothing is casual about these data sizes in the outside world.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;It goes like this: Bring it on, baby. 
Because the question of whether it &lt;em&gt;can&lt;/em&gt; be done is behind me, now I just want to know how to do it &lt;em&gt;well&lt;/em&gt;. The audacity!&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Kudos also for the Enzee crowd members who injected their insights and wisdom into the discussion, freely sharing their technical and political battleground knowledge for the betterment of all. This was not the same as "iron sharpening iron", because at this scale of data processing, iron crumbles. No, this was a lot like titanium sharpening titanium, and was exciting to participate in, to say the least.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Many thanks also to Netezza for inviting me to the tour. It was a whirlwind to be sure, but well worth the ride. Tim, Olga, Courtney and Karina made it easy for me (actually all of us) to participate. Thanks to all for your hard work and a World Tour Well Done!&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:52b2783f-19a9-4970-9c62-598af2121528] --&gt;</description>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">world</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">best</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">enzee</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">universe</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">tour</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">practice</category>
      <pubDate>Wed, 30 Sep 2009 19:11:06 GMT</pubDate>
      <author>dbirmingham</author>
      <guid>http://www.enzeecommunity.com/blogs/grill/2009/09/30/decompression-the-world-tour-closes-alas</guid>
      <dc:date>2009-09-30T19:11:06Z</dc:date>
      <clearspace:dateToText>11 months, 1 week ago</clearspace:dateToText>
      <wfw:comment>http://www.enzeecommunity.com/blogs/grill/comment/decompression-the-world-tour-closes-alas</wfw:comment>
      <wfw:commentRss>http://www.enzeecommunity.com/blogs/grill/feeds/comments?blogPost=1111</wfw:commentRss>
    </item>
    <item>
      <title>Summer 'tis upon us - The World Tour Awaits</title>
      <link>http://www.enzeecommunity.com/blogs/grill/2009/06/22/summer-tis-upon-us---the-world-tour-awaits</link>
      <description>&lt;!-- [DocumentBodyStart:85d13af6-1e96-4017-89a2-235448f3ac7e] --&gt;&lt;div class='jive-rendered-content'&gt;&lt;p&gt;What's heating up about as fast as Summer here in Texas is the excitement over the upcoming EnZee World Tour.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I am especially excited this year because I've been tapped to host/emcee the Best Practices sessions in each of the cities, which means that I'll get a front-row seat to hear how the masters of the technology ply their trade and make the Netezza machine sing.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;After all, my fellow Enzees - &lt;strong&gt;&lt;em&gt;you&lt;/em&gt;&lt;/strong&gt; are the ones gathered 'round the grill and the ones who make-it-happen. Others of us are often in awe of the rather inspired means and outcomes you so deftly deploy with the technology, and of how you integrate it with the technologies around you.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Of all the questions I hear at a customer site on the basic workin's of the machine, there's nothing like sharing war stories with people who pull all those things together and instantiate an operational environment. Especially when you do it by utterly eclipsing the performance of Netezza's displaced predecessor. And here's where we really want to hear the down-low on how-things-used-to-be versus how-things-are.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In many cases, I hear that you had an easy time of bringing in the box and making it go. 
But making the technology go wasn't nearly as difficult as bringing-in-the-box - especially if you have to wheel it past the sneering eyes of doubters or political players who want to see it fail, or at least see it be not-so-wildly successful as the current expectations might dictate.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;But Netezza really does meet those lofty expectations, doesn't it? And one of the stories we all love to hear is that type of victory - the dark horse so to speak - championing the cause amidst the pressure of anything-but-technology. The odd thing about new, better technologies is that they are so &lt;em&gt;&lt;strong&gt;much&lt;/strong&gt;&lt;/em&gt; better than old technologies that the older technolog&lt;em&gt;&lt;strong&gt;ists&lt;/strong&gt;&lt;/em&gt; cannot believe their own ears. Orders-of-magnitude more power you say? Tish tosh, you must be mad.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So when we get into best practice sessions, we speak of things like scanning a terabyte, or 2 or 10, and complain that our query can't seem to cross the X-number-of-seconds boundary. &lt;em&gt;&lt;strong&gt;Seconds&lt;/strong&gt;&lt;/em&gt;, mind you. And people hear this and wonder what the complaint really is - after all we can't be working with real data because terabyte-sized table queries always take hours to run, or hadn't you heard this?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I recall sitting in on a session with a bunch of people who honestly had money-to-burn. One of them complained that they could not get up to New York often enough, and every time they went their favorite restaurant/play/whatever seemed to be oversold. 
One of them complained about a broken drawer in his private jet, while another complained about the drafty interior of &lt;em&gt;&lt;strong&gt;one&lt;/strong&gt;&lt;/em&gt; of his summer homes. Still another said that he had spent 150k on custom teak wood in his 140-foot sailboat, and had it all ripped out and replaced because it "didn't look right". Ahh, money to burn. People with a completely different list of priorities than the average Joe like me.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I say this for contrast, because the things we speak of as Enzees, with the power available at our fingertips in the machine, are utterly foreign to people who have never experienced the power themselves. And it's interesting in best-practice space when we talk about squeezing 9-hour processes into 9 minutes, and then hear our business counterparts wonder if we could squeeze out just a few more. A best-practice balancing act is getting to the solution without over-engineering, and some of you consider this an art form.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So Enzees, Artists and those who would kick-the-tires, gather 'round the grill and let's fire up those steaks, veggies and what-have-you - then the only thing hotter than Summer will be the ideas coming off the cooker -&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:85d13af6-1e96-4017-89a2-235448f3ac7e] --&gt;</description>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">world</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">enzee</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">practice</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">tour</category>
      <category domain="http://www.enzeecommunity.com/blogs/grill/tags">best</category>
      <pubDate>Mon, 22 Jun 2009 18:02:28 GMT</pubDate>
      <author>dbirmingham</author>
      <guid>http://www.enzeecommunity.com/blogs/grill/2009/06/22/summer-tis-upon-us---the-world-tour-awaits</guid>
      <dc:date>2009-06-22T18:02:28Z</dc:date>
      <clearspace:dateToText>1 year, 2 months ago</clearspace:dateToText>
      <wfw:comment>http://www.enzeecommunity.com/blogs/grill/comment/summer-tis-upon-us---the-world-tour-awaits</wfw:comment>
      <wfw:commentRss>http://www.enzeecommunity.com/blogs/grill/feeds/comments?blogPost=1090</wfw:commentRss>
    </item>
  </channel>
</rss>

