[ www.netezza.com ]

Gather 'round the Grill


Over the past several projects, the issue of using views has consistently arisen: when to use them, when not to use them, and what to expect. Views are one of those mainstay workhorses that we love to hate and sometimes hate to love, but used correctly, they can save a world of hurt and lost development time.

 

So we would ask the question - why use a view at all? Isn't the table definition good enough? And what of a synonym? Isn't this just as good?

 

Well, synonyms are handy for configuration management and invaluable for testing. For BI, however, they don't pass through the metadata of their underpinning relation for consumption by the BI tool, so this can be problematic. We also cannot refresh a synonym easily. It has to be dropped and then created in two operations, whereas a view gives us the concurrency-protected operation of "create or replace view", which is muy bien. More on synonyms in another blog entry.

 

Views can easily reach across databases, giving us the ability to stand up a consumption-point that contains one-part tables and one-part views without having to push data around (very handy for, say, a reference database that we want as an on-demand resource of fresh information). I'm a big fan of setting up consumption-point databases so that a user comes to a pre-designated place, not the master repository, to fulfill their information needs. This decouples the user from the master repository and gives us enormous freedom in the ongoing enhancement of their user experience. Views are the vehicle towards this goal.

 

Views also let us do on-demand case/when conversions and typecasting that can be completely encapsulated from the consuming process.
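As a rough illustration of that encapsulation, here is a minimal sketch in the same scripted-DDL style used later in this entry. The table and column names (sales_raw, status_cd, tran_amt_txt) are hypothetical, invented only for the example. The function just prints the DDL so it can be reviewed before anything is deployed.

```shell
#!/usr/bin/env bash
# Sketch only: sales_raw, status_cd and tran_amt_txt are hypothetical names.

# Emit the DDL for a simple (single-table) view that hides a code-to-label
# conversion and a typecast from the consuming process.
build_sales_view_ddl() {
cat <<'SQL'
create or replace view sales as
select
    tran_id,
    case status_cd
        when 'A' then 'Active'
        when 'C' then 'Closed'
        else 'Unknown'
    end as status_label,
    cast(tran_amt_txt as numeric(18,2)) as tran_amt
from sales_raw;
SQL
}

# Print the statement for review.
build_sales_view_ddl
```

The consumer only ever sees status_label and tran_amt. Deployment would be something along the lines of piping the output into nzsql against the target database, but treat the exact invocation as an assumption to verify for your own installation.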


And of course a really cool part about Netezza views is that we can include as many columns as we want in the view's "select" clause, and the view will not fetch them all, only the ones mentioned in the select that consumes the view. This is a win-win, because otherwise it would fetch all of the columns and then drop the majority on the floor to deliver a few.

 

Views have the lightweight nature of a single SQL statement that can be easily installed, where a stored proc often contains multiple SQL statements. Both of these mechanisms serve to hide the logic from the BI tool. But think about this - would we use the stored proc as part of another join? Or would we expect to just select from the stored proc and consume an answer? The more complex the operation, the more we need to just select-and-consume, and take the burden off the BI tool to know more than it has to.

 

A pernicious part of integrating BI tools is just that - expecting that the tool will know all it needs to know to interact with the Netezza MPP. This is - as you may painfully discover - a false expectation. Case in point, we might have a very creative intersection table between two large fact tables, and we can formulate a query that will browse the information-we-want in mere seconds. Then we plug in our BI tool and ask it to manufacture a query to do the same thing, but it struggles. Now we have to make a call - do we deploy the BI tool in the hopes that later releases will resolve this, or do we install a view or stored procedure that adapts the BI tool to our data model, and then wait for the BI tool to get better in a later release? You see, we can always toss the adaptation when our BI tool gets better. But we cannot allow our user-experience to languish on the same terms. More on this in another essay.


So before I jump into a lot of other things we like about views, I'll address some of the above in their more malignant form.

 

I'll loosely divide views into two buckets - simple and complex. The simple view consumes a single table and may have columnar transforms on it. A complex view, simply put, has more than one table in the join logic.

 

A simple view cannot be easily misused, but a complex view can be misused so easily it will make the head spin on your best troubleshooter. For example, I cannot count the times I've seen a master query joining on a view, which in turn joined on a view, which in turn joined on a view, etc. How deep can you go? This is not the issue at all. The issue is in treating the view as though it is a reusable, inheritable object rather than a standalone select-and-consume capability. So where do we draw the line?

 

Transactional thinking - that is - the notion that we can install nested (inherited) views because they handle transaction-at-a-time anyhow and any given instance of them will have a negligible performance problem - is completely washed away when dealing with multi-billion-row scales on a Netezza platform. It's not a transactional platform, so each view potentially initiates a full table scan. Multiply these nested upon nested views and we have nested table scans - sometimes several separate scans on the same table. Which is more efficient, to look at a multi-billion row table once, or multiple times?

 

One customer had a query that started running very slowly one day. We went through a process of discovery to find out what had changed. Seems that a new version of an existing view had been installed, and the bad query was consuming this view deep under the covers. The bad query and view were both accessing one of the largest tables in the database, so the bad query was now scanning the big table twice, taking a double-hit on the master query itself. Even worse, the changed view did not leverage the big table's zone maps or its distribution key. So a change in one place dramatically affected unchanged functionality of a master query.

 

Because we are embracing economies of extraordinary scale, dynamic objects have a propensity to lose performance integrity over time. What worked yesterday may not work today, so we have to tune it. Netezza is so efficient that this tuning necessity may not arise for years after the implementation (in one case, four years afterward). By that time, the knowledge of the system's dependencies is not fresh on everyone's mind, so it is easy to make a spot-fix on the view and deploy it. In so doing, we may create a cascading effect for all the other places that consume the view and do so with the expectation of original behavior. In short, the latent nested view architecture is a minefield. We should not implement it because it creates trouble from day one, even though nobody has stepped on a mine just yet.

 

At one customer site we had to sift through six levels of view logic to find the performance problem. The customer wanted to know what they should do to fix the problem, but "the problem" was in the overall implementation and the nested views, not the one bad view, or for that matter, the recent performance symptoms of a minefield implementation.

 

Views can behave as traditional objects if they are single-table views or they leverage additional tables that are small and inconsequential to performance. Don't ever include a big-fat table in a view as part of a performance boosting strategy unless you can designate that the view is in fact a standalone entry point and not something that can arbitrarily participate in the JOIN clause of another master query. Why is this? Invariably we will forget the complexity of the view and then attempt to join it in another operation. For a BI tool, this could be highly problematic as well, because a view that was once simple could spontaneously go complex, and if it affects performance, we'll be pulling our hair out to find the problem through what reduces to a scavenger hunt, or worse, a submarine hunt.

 

Many BI tools simply choke on automatically forming a "complex" Netezza query because there is an implicit assumption of indexes via primary keys, and if these don't exist, the BI tool does the best it can, which in many cases is the least-common-denominator of a query structure. This doesn't play well on the SPUs for large-scale queries. I cannot count how many times I've seen a convoluted query that we just de-engineered and simplified, and that then ran an order-of-magnitude faster than the one conjured by the BI tool, yet nothing the tool folks could do seemed to make the BI tool form it the same way. To the rescue: a view that did the right thing - and that was that.

 

What's that? Putting together a view diminishes the flexibility of the query? Only marginally, and since we're dealing with billions of rows, we don't have much runway for "ultimate" flexibility anyhow. The larger the datasets, the more we need to make sure the queries are as efficiently formed as possible. And since this means as simply formed as possible, we're not talking about BI Tool query engineering, but query de-engineering.


To avoid pain and injury, don't treat all views the same. If we have a complex view, we should tune and designate it as standalone. No matter how much we like its results, it is better not to just arbitrarily include it in another join, the primary reason being that most views are not set up to regard a distribution. So when we include it with our other join, the resolution of distribution might take the form of the least-performing, lowest common denominator. We don't want that.

 

One alternative, oddly, is to CTAS - execute such a view in context and insert its data into a temporary table, then use the temporary table in the master join. This affords us the option to (a) leverage the view's normally small output, (b) preserve the distribution or (c) align distribution to the next operation, and (d) simplify the implementation. Of course, your BI tool may not support this, or may support it in an inefficient fashion. Most of the major BI tools will accommodate advanced scenarios, so get your product support rep on the wire and have a heart-to-heart.
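A minimal sketch of the CTAS approach follows, again as a script that only prints the SQL. All object names (v_big_summary, tmp_big_summary, fact_orders, customer_id, order_id) are hypothetical, and the choice of distribution key is an assumption that has to be made per case, usually to match the join key of the next operation.

```shell
#!/usr/bin/env bash
# Sketch only: every table, view and column name passed in or used below
# is hypothetical.

# Emit a two-step script: materialize the complex view into a temp table
# distributed on the key of the next join, then join against the temp table.
build_ctas_sql() {
    local view_name="$1"
    local temp_name="$2"
    local dist_key="$3"
    cat <<SQL
create temp table ${temp_name} as
select * from ${view_name}
distribute on (${dist_key});

select f.order_id, t.*
from fact_orders f
join ${temp_name} t on f.${dist_key} = t.${dist_key};
SQL
}

# Print the script for review.
build_ctas_sql v_big_summary tmp_big_summary customer_id
```

Both statements have to run in the same database session, since the temporary table disappears when the session ends. That is one reason a BI tool that opens a fresh connection per statement may struggle with this pattern.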

 

Yet another alternative is to use the view like an in-line view, except in the where-clause as a correlated sub-query. This can often take the form of a where-not-exists clause or the like and can also be very efficient.
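One way this might look is sketched below, using the same print-the-SQL style. The view name v_excluded_ids and the id column are hypothetical, and mytable is borrowed from the example later in this entry. Whether it actually outperforms a join depends on the data, so treat it as a shape to test rather than a recipe.

```shell
#!/usr/bin/env bash
# Sketch only: the view name passed in and the id/col1 columns are
# hypothetical.

# Emit a query that consults the complex view only inside a correlated
# where-not-exists clause, instead of naming it in the join clause.
build_not_exists_sql() {
    local view_name="$1"
    cat <<SQL
select a.id, a.col1
from mytable a
where not exists (
    select 1
    from ${view_name} v
    where v.id = a.id
);
SQL
}

# Print the query for review.
build_not_exists_sql v_excluded_ids
```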

 

Another alternative is to break apart the view's logic and assimilate it into the larger view so that all logic is preserved. But you'll be maintaining that logic in two places, right? Not necessarily. We have a lot of view DDL executables that do not directly spawn from a modeling tool, several of them in BASH script, which provides for parameterization of logic. If we put the logic into a parameter, then produce the views by including the parameterized logic, we will maintain the core logic in one place (script) but actually deploy two views that leverage it. This is essentially what happens under the covers with many object-oriented environments anyhow. Multiple objects will consume another class and deploy an instance that includes that class, so this approach embraces that inheritance pattern. Not in the dynamic run-time of the view, but in the view's initial DDL-level deployment.

 

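# Shared filter logic, maintained in one place. It references only alias "a"
# so that it is valid in every view that includes it.
MYLOGIC=$( cat <<!
a.limit1 between 50 and 60 and
a.limit2 between 1000 and 50000 and
a.tran_amt < 10000
!
)

# view1 reads a single table, so it uses the shared logic as-is.
view1="create view view1 as select a.col1, a.col2, a.col3 from mytable a where $MYLOGIC ;"

# view2 adds the join to yourtable (alias "b") and its own b-specific filter.
view2="create view view2 as select a.col1, a.col2, a.col4 from mytable a join yourtable b on a.id = b.id where a.col1 = b.col1 and b.employee_id <> 9999 and $MYLOGIC ;"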

 

If our modeling tool supports this capability as part of its functionality, we should leverage it before simply bolting a view into a join. If our modeling tool does not support it, this scripted DDL scenario is easy enough to formulate and leverage without a lot of overhead. The objective: two views that both behave as optimized joins, rather than one view that behaves as a join-with-a-view.

 

Either way, there is a theme here, that simply including the complex view as part of another join's  logic - as though it was a table - is risky and can, even at the outset, offer up such bad performance as to be a non-starter. So a plain-vanilla practice should be to make the complex view behave in a standalone query-and-consume fashion by default. Make no assumptions that it is okay to arbitrarily include it in a larger query's join clause.

 

The further downside is that a de-facto join-with-a-view can work really well at the outset, but the scale of the data can catch up to even the most robust of implementations, and wiring up the complex view dependencies creates a problem that will not scale, but will only become obvious over time (a minefield).

 

One group invoked a standard for view naming conventions. The simple views would have no prefix at all, so they would look like tables to the casual user. Fair game and all that. The complex views were labeled as v_<viewname> as a cue to a user or report builder: don't use it in the join of a larger query. You'd think that if there was an implicit rule to avoid using anything with a "v_" prefix that people would play nicely. But not so, since your reporting users may have come from an RDBMS background where it's perfectly okay to mix views into the master query. Awareness of the standard is one thing, but actually embracing it is another. We cannot protect our systems from people who either don't know the rules, don't understand them, or cannot map their experiences from an RDBMS to an MPP.

 

So a suggestion here would be to name the view in a manner that is a departure from common view nomenclature. Calling it an sp_(NAME) might draw the ire of your admins who want stored procs named for what they are, and not obfuscate their names. But if our views are not really common views, and have caveats on their usage, we need a safer naming convention, one that aligns with the goal we are trying to achieve - that of adapting the BI tool to the MPP. One group used a naming convention of "bi_", while another used "rpt_", and still another used the common acronym for their given BI tool. The point is to adopt a convention that is somewhat unconventional, so that those with conventional thinking are able to transform their thinking without finding themselves in a minefield.

 

Nothing is worse than overlooking a minefield - it's a scary view - a view to a kill.


In an ongoing campaign to take his environment to the next level, John (name changed to protect the innocent) started holding orientation sessions for his DBAs, programmers, architects and other technical resources as a means to get-the-word-out or at least get some education into his people - so that they would be more self-contained in the field, as it were, building out their environments.

 

While they weren't using Netezza (yet), they were still using set-based operations in a high-powered ETL environment, so the basic principles were the same. That is, perform operations on whole groups of records at a time, not several operations on one record at a time.

 

Well, as it turns out the primary mental resistance had little to do with understanding the technology, but more in wanting to be "the guy" to break the mold. What mold is that? The mold that makes things feel so "mainframey", with flat files and these "archaic" approaches to data processing that technology "left behind" so many moons ago. Why are we going back to the old days? they lament. Surely someone, somewhere missed the memo that we just don't use flat files! C'mon people! Get with the program!

 

And the more John used his Jedi mind powers to change their minds, the more they dug into this whole notion that they were being dragged back to the dark ages of programming. It might seem a bit dramatic, but many of us have actually seen the techies roll their eyes at the mere mention of flat files, as though we have made a statement more expected from a shaman medicine man or medieval witch. What's next? Put leeches on the machine to cure its bugs? C'mon people! Get with the program!

 

Some fifteen years ago I worked with a group that had some very complex integration issues between their custom application and the Oracle Financials application. They wanted to transport invoicing and other billing items from their system into OF and have a transparent interaction. OF wasn't as mature back then, so this approach had some challenges, and since the primary interface between the two applications was Oracle's Pro-C, the technical team naturally chose C-language (not C++) to make the interfaces work. A debate ensued on this project as to whether we should "code everything in Pro-C" or use C language at all. The confusion? Pro-C isn't a control language. It's a database I/O specification for C-language. We could always interact with the database through Pro-C, but even the simplest decision-tree operation would still require a formal procedural language, and Pro-C didn't qualify. The Pro-C proponents felt like they had lost the debate, but something more was lost - in all this the technical team was required to use Pro-C for all interactions with the database, including the batch-uploads of the invoices and other instruments. This operation was so egregiously slow that we made a mid-stream decision to deal with the problem more transactionally. That is, mini-batch operations more often than once an evening (some recognize this as "continuous" operation).

 

This only forestalled the inevitable. The volume of data quickly grew so large that the back-end was running continuously and unable to keep up with the front-end. Even worse, the OF functionality was not being used in real-time, even though we'd set it up to behave that way. It all came crashing down when we had to install the system where the volume would crush it. The answer? Flat files.

 

Now you can imagine the outcry from the Pro-C people. Not only had we pushed back on them to use C-language for control, but we were now moving even further away from Pro-C into - gulp - flat files. The gauntlet was thrown down that the entire back-end architecture was flawed and needed complete rework. By this time I had moved on to warmer climes and could watch this battle from a distance, but one of the new engineers called me up one evening to get the down-low on the principals in the environment. He claimed that the whole place was crazy and this myopic fixation on Pro-C would be their undoing. Why? Pro-C is a transactional protocol, not a batch protocol. It invokes the database at the API level rather than performing the leaner bulk-insert operations. Over a year later, the problem remained unsolved so they abandoned the OF interface altogether and started using a third-party provider for their financial reporting.

 

The irony - the third-party provider required them to ship over their transactions nightly. In flat files! (the horror!)

 

Something I continuously but gently point out is that in the data warehousing realm the flat file is a mainstay. It's not going away and should not have to. The denial that the flat file is a permanent player is the path to mayhem.

 

Not too very long ago, I had the opportunity to assist an online brokerage in how they assimilated transactions from their various member firms. The member firms could ship their data via a web service, over the internet, manually enter it on the brokerage web site (ideal for small shops) or ship it via flat files. Enormous resources had been leveraged to program, maintain and enhance the automated interactions for all these pathways - except for flat files. They had been treated like a necessary evil. Something to be tolerated until they could be replaced with one of the "more mainstream" methods. Even today, they have not moved away from flat files. And they represent over 30 percent of the brokerage's total transactional volume. Not something to be relegated or trivialized.

 

It all came to a head one day when a clerk called up a leader at one of the upstream member firms claiming that they had not transmitted their transaction file on time. (Note - in brokerage terms this leads to an SEC action, so it's serious business). This claim quickly escalated to the member firm's top echelon and within the hour the CEO of the firm was on a conference call with the CEO of the brokerage firm, evidence-in-hand that they had in fact transmitted the file and would not stand for this. The CEO of the brokerage took a deep dive into the responsibility chain only to discover that the member firm really had transmitted things via flat file, and the brokerage was so lacking in attention to this medium that they didn't even have auditing capabilities or anything to tell them definitively whether the file had arrived or not. The file had arrived but somewhere in the night, its loading process had aborted for reasons other than data. The poor clerk could only review a morning report, with no visibility to the actual problem. This state of affairs was intolerable.

 

The brokerage CEO mandated audit-style processing and flat-file receipt for everything, not because it's a good idea, but because it's the law (SEC-wise), after all. And the outcry from this could not have been more vehement. The techies were being told in no uncertain terms - formalize and institutionalize flat file handling. This was taken no differently than telling the network group that they would have to support every prior version of Windows, including 3.0, just-in-case. Woe is us.

 

But the techies missed the point, in that the flat file is a modern-day marvel in its resilience and capability. As many other types of storage mechanisms have come and gone, the flat file continues, impervious to the changes in technology all around. Flat files underpin every major database storage mechanism, are ultimately the storage form for more recent formats like XML and its derivatives, are scalable on any platform, and have generally stood the test of time. Flat files are like a gallon of gasoline. They are always predictable, reliable, never break, easily scale and can be used practically anywhere - and usually are.

 

So why the consistent resistance to flat files? It's because they don't seem exotic or challenging. After all, it's a flat file. A caveman could do it. Somehow, suggesting a flat file solution makes a person feel like they are not a contributor. Yeah, anybody can whip up a flat file - where's the technical prowess in that? In my humble opinion, prowess in effectively using and embracing flat files - especially where they are supposed to be used - is rapidly becoming a lost art form and part of a lost world. Just as we lost the architectures and methods of the wonders of the ancient world, we risk losing the knowledge of effective flat-file usage, which is invaluable in enterprise computing. Formalizing and institutionalizing it has extraordinary value, especially when it comes to tracking critical enterprise assets.

 

But in the Netezza space, where do they play?

 

How about data intake? In the Netezza platform we can load data at extraordinary speeds. About the only technology that can possibly feed Netezza at its maximum rate of intake is the raw physics of a file system. Consider that should we need to extract something from a database, the request will ultimately go through the database engine's software layer down into its bowels to arrive at - flat files - and then pull the data, rise back into the engine's CPUs and be delivered, largely via software processes, to the extraction point of the information. Reading from this extraction point will always be slower than reading from a flat file. This is why products like WisdomForce can extract Oracle data so much faster than an interface-level extract. Their Fastreader goes for the data on the file-system level, and has embraced the audacious notion that performance is found close-to-the-physics. Where have we heard that before?

 

If we were to perform a simple test - let's say we pull data from SQLPLUS into a pipe (to eliminate the write-drag of the file system) and then perform a simultaneous nzload from that same pipe, the flow will move only as fast as the extraction. In one particular case, 3 million records (even with a parallel extract) took over fifteen minutes to pull from the database. Netezza's nzload waited patiently, perhaps scraping its virtual nails on its internal chalkboards waiting for the maddeningly slow load to finally finish. In the second version of the test, we extracted the data to a flat file and once completed, performed the nzload. The extraction still took fifteen minutes. The nzload took a few seconds.

 

But think about this - in the first test, the fifteen minutes that the Netezza machine was tied up with an nzload, it could have been doing other things. After all, there are a finite number of load threads we can invoke for work, and this long, slow stream tied up one of them for much longer than it should have. Pushing to a flat file and then performing an nzload, the load thread is only occupied for a short window and is then free for more work. This is a more efficient use of Netezza's interface. Of course, if the box has nothing else whatsoever to do in this fifteen minutes, then go for it. A while back, we installed and burned in a sqlplus-to-pipe-to-netezza intake framework that worked just fine, and its window of operation was during a quiet time. On the flip side, other interfaces had detailed data feeds, most of them integrated to arrive at the same time, and they were all being pushed as flat files. Netezza simply inhaled them - in seconds - and moved on.

 

In yet another venue, the upstream systems were pulling and pushing small data files from their various internet-based sources. All of the data files carried a small part of the same information stream, so we could essentially "cat" these files together into one. The problem was that they were all being written to a common file server, and then other processes kicked off to load these snippets of data individually. Since each one was a tiny file, this created an enormous burden on Netezza. Every load has a finite cost. (Recall, we can load 1 record in 1 second, or 1 million records in 1 second - either way it costs us 1 second). So by formalizing a "collector" mechanism for these files, we could effectively have one nzload take in as many of them as were available on the file system as one large stream of data. In this regard, we could "cat" hundreds of files into a pipe and nzload from the pipe. This is a good use of "cat", since it is writing to memory and not back out to the file system, essentially reading and stuffing data into a pipe for our consumption. This alone stabilized the intake protocol and - where the existing implementation had saturated the Netezza machine's interface - the collector freed it up and allowed them to take on even more capacity without additional overhead. Think about the mechanics for a moment. If we have 1500 files, collectively containing 150 million records, we can choose to load 1 file at a time, requiring at least 1500 seconds (25 minutes) at 1 second per load, or we can cat the files into a single nzload for a load that takes about two minutes. If the issue is load-time and performance, then we need to formalize and harness the way we intend to load all those flat files. Make a design decision, embrace it and institutionalize it for maximum firepower.
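The collector idea can be sketched in a few lines of shell. This is not the framework from that site, just a minimal illustration under stated assumptions: the inbox directory, the .dat suffix and the function name collect_and_load are all hypothetical, and the loader is passed in as a command so the same function can be dry-run with something harmless like "wc -l" before it is pointed at a real load.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout, the .dat suffix and the function name
# are hypothetical.

# Stream every data file in the inbox through ONE pipe into ONE loader
# process, so we pay the fixed per-load cost once rather than once per file.
# Usage: collect_and_load <inbox-dir> <loader-command> [loader-args...]
collect_and_load() {
    local inbox="$1"
    local f
    shift
    for f in "$inbox"/*.dat; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        cat "$f"
    done | "$@"
}
```

A dry run such as "collect_and_load /data/inbox wc -l" counts the records that would be sent. For the real thing, the loader command would be an nzload invocation naming the database, table and delimiter; whether nzload takes the stream from standard input or needs to be pointed at a named pipe is something to confirm for your release. The function returns the loader's exit status, so the caller can decide whether to archive or retry the collected files. Note that with an empty inbox the loader is still started once with an empty stream.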

 

The Netezza Underground offers up a number of rules (the first ten of which are early in the book) that are specific and non-optional for systems of scale. One of the rules is to use the most scalable mechanisms and assets available, and embrace them as a regular part of everyday data warehousing life. But mention "flat file" to someone steeped in transactional or visual technologies flowing from Redmond or Silicon Valley, and they first roll their eyes. When they see we are serious, they push back as though fighting-on-principle. When the verdict is in, flat files are here to stay, and they see they cannot win, they update their resume and leave. Their reasoning: they don't want to go back to the "dark ages". You think I'm kidding.

 

We again come full circle to the whole idea that bulk processing is not transactional processing. And with Netezza, the bulk processing is on a sometimes mind-numbing scale. Things that we used to do as neat-clean operations in transactional space, suddenly have no viability whatsoever with data sizes of this magnitude. Which is, of course, the primary reason that people will reduce to flat file operation when the performance starts to lag. Flat files are scalable. Software, not so much. Transactional, for bulk, never scales. Stop now.


One of the questions oft-asked in best-practices sessions and in general consulting: How do we get a "newbie" on-boarded quickly? Some concern usually arises when the new Enzee approaches the Netezza machine with the same thinking processes as with a traditional RDBMS. While there are "gross" similarities, it is the differences we want to leverage, and these are not either/or questions. There is a better way to implement things in Netezza, and a better way in the traditional RDBMS. Mixing the two is not optimum and can be detrimental.

 

The primary discussion fulcrum is simple: One is a transactional database and one is not. Moving away from "transactional thinking" is the key. How to accomplish this?

 

One of the best ways is to discuss and actually demonstrate the primary differences between bulk and transactional processing. This is largely the crux of the misunderstanding, or even the necessary "paradigm shift" our newbie needs to embrace. A significant hurdle, it seems, is the newbie's belief that the core engine functionality of their favorite RDBMS is somehow being indicted or set aside as useless. After all, the transactional RDBMS is just that - transactional - and this is what we want the newbie to move away from. What? All that hard-won and industry-hardened capability - and we're just setting it aside? Really?

 

In a word - Yes.

 

It's not that the transactional capabilities are useless. They simply aren't useful in a data warehouse. More importantly, they don't even exist in a Netezza machine. So attempting to shoe-horn transactional thinking into this machine is a huge disconnect - no differently than using a lawnmower as a hedge-trimmer. Netezza is purpose-built. Transactions are missing by design.

 

Now at least one person is bristling because they know, administratively, that transactional support is handy for logging, managing metadata, troubleshooting hooks and other administrative support. I don't disagree with that, but it's not the activity of bulk data processing. It is far easier to set up a smaller database machine alongside the Netezza machine to perform these administrative transactional tasks. Each machine then has an objective role and purpose, and off we go.

 

What are some of the demonstrable ways that we can introduce the new Enzee to this issue, in a manner that really drives the point home? Well, I can't seem to count how many times I've had (sometimes rather contentious) discussions with "outsiders" (or perhaps "purists") on the subject of transactional exception handling. Inserting a record into a transactional database, with its glorious constraints turned on, will guarantee that it will push back on us with an exception. Said exception requiring the dutiful compliance of an exception handler. You know the drill.

 

But in data warehousing, such transactional exceptions are in the way of our bulk load. We don't want the database to examine each and every record as it arrives, potentially formulating an exception (and its attendant overhead) for each record, or passing each one through after its constraint-based integrity check. We just finished taking all that data through a detailed sieve of business rules in the ETL layer, didn't we? The database needn't trouble itself, just load the data, thank you.

 

Now at least one more "outsider" is bristling. How dare you say that we should set aside the constraint-based exception handling? What possible justification could there be for such a gross trampling of RDBMS functionality? Explain yourself!

 

In a word - performance.

 

Storytime: Just after 9/11, the airports over-compensated with all kinds of rigorous shakedown protocols. Travelers had to show a boarding pass and ID at a checkpoint, then keep them handy for just after the checkpoint. And then also for presentation at the gate prior to boarding, along with random bag searching. If you were the first one to board, or made eye-contact with the bag-search team, it was guaranteed that you would be taken aside and your luggage rummaged. A friend of mine told me that the rummagers liked to carry on a conversation to make you feel more comfortable about their pulling your private things out into the open air for all to see. One of them held up a nose hair trimmer to one of his cohorts and said "What the heck is this?" Makes one wonder what other kinds of personal appliances we could "salt" the bag with just to embarrass the daylights out of them, hmmm?

 

My friend told me that he was pulled aside a lot, and started experimenting with "stated professions" that the rummager would not care to talk about. At one point he blurted "I'm a professional bodyguard" to which the rummager alerted like a trained narcotics dog and said "So you would know how to use weapons?" to which my friend simply said "Or not."  This of course made the rummager gulp and go quiet, but it still wasn't good enough. My friend didn't want them to talk at all, so they wouldn't waste any time in their rummaging and just get-it-over-with. So at one point he said "I own a funeral home." Which of course, stopped the chatter completely. Nobody really knows how to continue a casual conversation about such a subject.

 

The point being, he'd already had his bags electronically scanned at the checkpoint. Do we really need to check them again? And unlike a constraint-based exception handler, the rummager had the option of only picking out random hapless travelers. The exception handler rummages the bags of every traveler in the line. We can see how utterly inefficient this is. Nowadays, they screen the bags and then don't even check ID again at the gate. Except for random gates on occasion because nefarious people sometimes swap tickets when they get behind the checkpoint. In any case, if we've already exhaustively checked the bags to get the traveler where he is, more checking is a waste of everyone's time. Just like the exception handler. If we just ran the entire set of data through rigorous validation rules, we have no need whatsoever of the transactional exception handling in the database. It will waste processing time.

 

And wasting time, we don't have the luxury to do.

 

The transactionally-constrained bulk-load of data will be, on average, five to ten times slower in operation than its non-constrained equivalent. If our objective is to achieve a fast load - and trust me - it really is - we don't want constraints turned on. We're talking about loading millions if not billions of records. Even in an RDBMS, we cannot afford to convert what could be a thirty-minute operation into a two-plus-hour operation. The window of time simply does not exist. In some locations, if this kind of window ever existed, it is rapidly vanishing as their businesses go-global and need to process data as-the-world-turns.

 

On the flip side, think about the main reason for a transactional exception - it is to keep a transactional application honest. If the data does not comply, the user fixes and re-submits. It's interactive, and it deals with a single entity at a time, not millions of entities at a time.

 

The "outsider" will now brace on this assertion as well, because they think that having thousands of users interacting with the system constitutes this many-entities-at-a-time, but it simply doesn't. And here's why: RDBMS systems are meant to assimilate data in small chunks with high frequency. They are not designed to deal with large chunks at low frequency (e.g. a batch load once a night). They will accommodate such activities, but not do them well. In this case, "well" means loading a million rows a second. The RDBMS cannot approach this.

 

And this is the reasoning behind Rule #10, which is - when loading bulk data, never involve the database in row-level activities. This means, without exception, turn off the exception handling. Because the database will just protract the duration of the flow, checking each and every record and slowing down all of them as a whole, in order to find the few exceptions. It is the equivalent of making the entire flow suffer for the sake of a few records. This is a bad tradeoff. And once again - didn't we just validate and scrub all these exceptions from the flow, in the ETL/data processing environment? Why are we asking the database to validate them again?

 

And the worst part is that the potential exceptions are all anticipated and known. What does this mean? As back-end programmers building the data flow, we have direct and objective access to each and every failure point that will stop the data from loading. Why would we delegate this to the database, since it is so inefficient in performing it? Note - not so lacking in functionality, because the RDBMS has lots of functionality to perform it. It is simply too inefficient with bulk loading to be a viable resource.

 

So what are the anticipated exceptions? Let's go for popularity:

 

  1. Bad or null data
  2. Unique key violations
  3. Primary/foreign key violations

 

 

In fact, the above constitute the primary reasons for the load to fail. So let's walk through the basic process we would need to follow if we delegate this to the RDBMS database.

 

In transactional mode, the RDBMS data load will kick out an exception for each of these it finds. Even if it completes with no error, someone in the room will say - It took too long. Even if it didn't find any exceptions. Fix it. Make it faster! The hard-core transactional ingenue will attempt to optimize it without turning off exceptions, and find that it cannot be done. If the load of one record requires 1 second, it will take 1 million seconds to load 1 million records.

 

We just don't have 1 million seconds.
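
To put that number in perspective, here is a quick back-of-the-envelope conversion. It is plain arithmetic and uses the same hypothetical one-second-per-record cost as the paragraph above; no real load was measured.

```python
# Convert the row-at-a-time load estimate into days.
SECONDS_PER_RECORD = 1      # hypothetical per-row cost from the text above
RECORDS = 1_000_000

total_seconds = SECONDS_PER_RECORD * RECORDS
total_days = total_seconds / (60 * 60 * 24)

print(f"{total_seconds:,} seconds is about {total_days:.1f} days")
# prints: 1,000,000 seconds is about 11.6 days
```

No batch window anywhere is eleven days wide.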

 

(Incidentally, in Netezza, the load of 1 record requires 1 second. The load of 1 million records requires 1 second. Use your second wisely!)

 

So our new Enzee will grouse a bit and then look around at the data warehousing sites for answers, and all of them will say, turn the RDBMS exceptions off, load the data, and then turn them back on. The newbie will object - but wait - when I try to turn them back on, the database yelps and says there are constraint violations. I will have to back the records out and try again. Oh yes, we now have a mess on our hands. In the time it took to load the RDBMS data - say thirty minutes - we have now accumulated errors that might take hours to back out, fix and then retry. And we'll have to do it while the batch-window clock is ticking, not in a pre-process where we had more breathing room. We don't have this kind of time window.

 

We will never have this kind of time window.

 

So the next fallback is to fix the exceptions in the data processing realm (ETL tool), prior to loading the data. But isn't this what the data processing realm is for? Really? This means we do all null checking and constraint checking prior to loading. How? We download the primary and foreign keys into the local data processing environment and perform a localized join-filter to remove the exceptions. This is a data warehousing 101 best practice.
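
As a rough illustration of that localized join-filter, here is a minimal sketch in Python. The row layout, the column names (`order_id`, `customer_id`) and the key sets are all made up for the example; a real ETL tool would do the same thing with its own lookup or join components.

```python
# Sketch of a pre-load "join-filter" in the data processing layer.
# All names (rows, columns, key sets) are hypothetical examples.

def filter_exceptions(rows, existing_order_ids, valid_customer_ids):
    """Split rows into (good, rejected) before the bulk load.

    rows: iterable of dicts with 'order_id' and 'customer_id'.
    existing_order_ids: keys already in the target (downloaded once).
    valid_customer_ids: parent keys the foreign key must match.
    """
    good, rejected = [], []
    seen = set(existing_order_ids)  # copy; also tracks keys seen in this batch
    for row in rows:
        order_id = row.get("order_id")
        customer_id = row.get("customer_id")
        if order_id is None or customer_id is None:
            rejected.append((row, "null key"))
        elif order_id in seen:
            rejected.append((row, "duplicate key"))
        elif customer_id not in valid_customer_ids:
            rejected.append((row, "unknown foreign key"))
        else:
            seen.add(order_id)
            good.append(row)
    return good, rejected


incoming = [
    {"order_id": 1, "customer_id": 10},     # key already in the target
    {"order_id": 2, "customer_id": 10},     # good
    {"order_id": 3, "customer_id": 99},     # no such customer
    {"order_id": None, "customer_id": 10},  # null key
    {"order_id": 2, "customer_id": 11},     # duplicate within the batch
]
good, rejected = filter_exceptions(incoming, {1}, {10, 11})
print(len(good), "row(s) ready to load;", len(rejected), "rejected")
```

The rejected rows carry a reason with them, so they can be fixed in the data processing realm before the load ever starts.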

 

The newbie will now brace on the idea of downloading all the key values. All of them? That could take, well, it could take a long time!  It will take mere minutes to pull down all the key values. And those mere minutes are nothing compared to the duration of the recovery mess we will endure if we don't take this step.

 

Pay a little time now, or a lot of time later. Use your time wisely.

 

So here is the tradeoff (again in traditional RDBMS space)

  1. Turn off constraints, load the data, and then deal with the mess after the fact. Plan to spend hours backing out the mess and then running the load from scratch.
  2. Download keys, join/filter the key exceptions, turn off constraints, load the data with the expectation that no mess will arise. (because it won't)

 

In short, downloading the keys for constraint checking is a necessary evil. Our only "next best" fallback is to load the data into a pre-target staging table and do the gross comparison there. Then we copy the good records into the target table. But wait - now we've incurred the penalty of the load twice (one for the ETL to staging and one for staging to target). Isn't it cheaper to pull down the keys once than it is to load all the data twice? Not to mention the fact that the average RDBMS engine does not efficiently copy tables either. So even if we decide to go with loading a staging table, the copy of the staging-to-target will take longer than we are willing to wait.

 

Think about this: When the data exception arises in (1) above, where will we fix the problem? In the database, or in the data processing realm? The database can only report the issue, not fix it. If we must fix it in the data processing zone anyhow, why wouldn't we fix it proactively rather than reactively?

 

So this approach means something even more valuable - if we find the exceptions in the data processing realm prior to loading, we will have found them proactively and administratively, not reactively and operationally.

 

This makes a huge difference in the reconciliation of data exceptions when we're dealing with millions or billions of entities.

 

And yet another issue our newbie is pleasantly unaware of - data processing on this scale has to be beholden to the constraints of the lights-out operation, administration, and logistical capabilities of the physical plant around the machine. If the operators have to get involved in the data recovery, with data processing on this scale, it needs to be for incidental reasons, not mass data recovery.
In essence, delegating this activity to the RDBMS is setting up our operators to fail. We will find them entirely intolerant of this approach. Fix it, they will say. If our answer to them is - hey, me architect, you operator, so gird up thy loins and get thee to work - we have punted (and dangerously so) something we should take complete responsibility for. Because make no mistake, we will be held completely accountable for it as well. They will call us in the middle of the night. They will only help us incidentally. It's your mess, you clean it up!

 

The primary issue here is that the traditional RDBMS load has to be not only load ready, but consumption-ready. When we load the data, we have to be completely and thoroughly finished with all data processing before it hits the target table. From there, the user should be able to consume it right away. Load-ready and consumption-ready is the name of the game, and it's accomplished for the RDBMS in the ETL environment, because it cannot be efficiently accomplished inside the RDBMS. The RDBMS is simply too slow and inefficient for any form of bulk operation. And again I say, if the only place to actually fix the data is the data processing realm, it only makes sense to do it proactively, not reactively.

 

Now let's flip over to the Netezza side of things.

 

In the Netezza machine, we can stage the data "dirty" if we want to, and we often do. The data can essentially be copied as-is from its external dirty location directly into the machine with an nzload to a staging database. From there, we have it in massively parallel form and can use a series of CTAS operations (ELT-style) to cleanse and shape the data. Once we're ready, we can then do a massively parallel join from the incoming table to the target, validating primary and foreign key values in bulk. Then we just copy the good data and we're done. When using Netezza, it is always faster to let the machine do the data cleanup and integration in a massively parallel, set-based operation (even a series of them) than it is to pull the data out, process it in an ETL tool, and put it back. ETL tools, on average, cannot compete with the massively parallel power of Netezza's engine.

 

Let's look at what we accomplished with little effort: (1) We cleansed the data of dirt. (2) In a single, massively parallel join we validated unique constraints. (3) In a single, massively parallel join we validated foreign keys (one join per key). The total time to accomplish the second two tasks is fractional, often a matter of minutes even on billion-row tables and billion-row loads. The time for the first task is shrunken too, since we can apply our row-level data scrubbing rules in-bulk with sweeping operations rather than row-level operations.
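
The shape of those set-based checks can be sketched in a few lines. To be clear about the assumptions: SQLite (via Python's standard `sqlite3` module) is only a stand-in so the example runs anywhere, it is not Netezza and says nothing about Netezza's speed, and the table names and the "drop every duplicated key" policy are invented for illustration.

```python
import sqlite3

# SQLite is only a stand-in so the sketch runs anywhere; it is not Netezza.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id INTEGER);
    CREATE TABLE stage_orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customer VALUES (10), (11);
    INSERT INTO stage_orders VALUES (1, 10), (2, 11), (2, 10), (3, 99);
""")

# Unique-key check: one GROUP BY over the whole staged set.
duplicate_keys = con.execute("""
    SELECT order_id FROM stage_orders
    GROUP BY order_id HAVING COUNT(*) > 1
""").fetchall()

# Foreign-key check: one anti-join (LEFT JOIN ... IS NULL) per key.
orphans = con.execute("""
    SELECT s.order_id FROM stage_orders s
    LEFT JOIN customer c ON c.customer_id = s.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

# Copy only the good rows, CTAS-style, in a single pass.
# Assumed policy for this example: any duplicated key is dropped entirely.
con.execute("""
    CREATE TABLE clean_orders AS
    SELECT s.order_id, s.customer_id FROM stage_orders s
    JOIN customer c ON c.customer_id = s.customer_id
    WHERE s.order_id NOT IN (
        SELECT order_id FROM stage_orders
        GROUP BY order_id HAVING COUNT(*) > 1)
""")
clean = con.execute(
    "SELECT order_id, customer_id FROM clean_orders ORDER BY order_id"
).fetchall()
print(duplicate_keys, orphans, clean)
# prints: [(2,)] [(3,)] [(1, 10)]
```

Notice that nothing here touches a row at a time. Each check is one sweeping statement over the whole staged set, which is the style of operation the paragraph above is describing.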

 

Case Study Short: Working with a SQL-server based model, the client was loading 15 million records into the database with the bulk loader and the largest machines available. Total time to load - over 2 hours. Tried it again on an Oracle platform, with a top-line 16-core machine with plenty of high-end disk space. This operation took 30 minutes. This was attempted on a Netezza platform, same data, same volume, and it took 15 seconds. There is a contrast, but not a comparison. Nothing adequately compares to a 15-second data load.

 

The important takeaway is this: If I can load the data in 15 seconds, I have a luxury of time to perform internal ELT, data scrubbing and integration, key checking and the like in a matter of minutes, still ensconcing the data into the final target table before the other two databases even get started. More importantly, I did it without standing up a formal external ETL tool. All of it happened "under the air" of the Netezza machine.

 

Now an interesting exercise for the new Enzee would be to actually walk through the processes noted above. In a problem-solving series of exercises, they should get some data that has embedded constraint violations, then attempt to load the data to an RDBMS with transactional constraints turned on, then turn on the creative juices to see how it can be done more efficiently. I would not suggest loading millions of rows to an RDBMS for this exercise, since they are so inefficient at this. Try it with a smaller row-count and then extrapolate the necessary time-to-load. What they will discover is that they will find themselves slowly backing out their precious transactional exception handling to fix the problem another way. The faster they get, the more the chosen path will start looking very lean on RDBMS capabilities.

 

The final form of their solution, they will find, is supported de-facto and in massively parallel form inside the Netezza appliance at no additional charge. In the end, they will see why Enzees have run, not walked, to a Netezza platform for just this kind of capability. We know they have made the transition when we can hear them having a conversation with another newbie about transactional versus bulk processing, and they are coaching the newbie away from the transactional model.  Ahh, a beautiful thing, indeed.

 

This is why Netezza is in no way, no how a transactional machine, and why it doesn't enforce primary and foreign key constraints. These can be installed as metadata, but the expectation is that they will be used by an external, intelligent operation that will leverage them for administrative key validation - in bulk. After all, I can read the key metadata from the Netezza catalog, formulate a series of validation operations that will work for any table, any key, any time. Install it as a stored proc and invoke it when necessary. This allows me to set up the load operations and prepare the final copy to the target (which is often the accumulation of dozens of operations to integrate the data into a common pre-target table). Then validate the data just before it is finally copied to the target. This keeps me from having to do it a record-at-a-time, or to have an exception processor accidentally execute the operation before I am completely finished formulating the data for the load.
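
Here is one way the "any table, any key" idea might look, sketched in Python rather than as a stored proc. The metadata layout (a list of dictionaries) is hypothetical; in practice the key definitions would be read from the catalog, and the catalog views themselves are not shown here. The functions only generate the validation SQL as text.

```python
# Sketch: build bulk key-validation queries from key metadata.
# The metadata layout below is hypothetical, standing in for whatever
# the catalog reports about declared (but unenforced) keys.

def unique_check_sql(table, key_cols):
    """Return SQL listing key values that occur more than once."""
    cols = ", ".join(key_cols)
    return (f"SELECT {cols}, COUNT(*) AS dup_count FROM {table} "
            f"GROUP BY {cols} HAVING COUNT(*) > 1")


def foreign_key_check_sql(child, child_cols, parent, parent_cols):
    """Return SQL listing child rows with no matching parent row.

    Child rows whose key columns are null also show up, since a null
    never matches anything in the join.
    """
    on = " AND ".join(f"c.{cc} = p.{pc}"
                      for cc, pc in zip(child_cols, parent_cols))
    return (f"SELECT c.* FROM {child} c LEFT JOIN {parent} p ON {on} "
            f"WHERE p.{parent_cols[0]} IS NULL")


def validation_queries(keys):
    """Turn a list of key definitions into one validation query each."""
    queries = []
    for key in keys:
        if key["type"] == "unique":
            queries.append(unique_check_sql(key["table"], key["columns"]))
        elif key["type"] == "foreign":
            queries.append(foreign_key_check_sql(
                key["table"], key["columns"],
                key["ref_table"], key["ref_columns"]))
    return queries


keys = [
    {"type": "unique", "table": "pre_target_orders",
     "columns": ["order_id"]},
    {"type": "foreign", "table": "pre_target_orders",
     "columns": ["customer_id"],
     "ref_table": "customer", "ref_columns": ["customer_id"]},
]
queries = validation_queries(keys)
for query in queries:
    print(query)
```

Each generated statement is a single set-based sweep over the pre-target table, so it can be run once, just before the final copy to the target.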

 

Row-level exception handling is a beautiful thing - transactionally. If the domain where the exception must be fixed is already the data processing zone, we need to proactively embrace this responsibility and just do it. In the end, row-level exception handling has to be completely removed from our thinking processes. We need to invoke sweeping operations that capture the exceptions in-bulk, not a row-at-a-time. Fix and integrate them in bulk, not a row-at-a-time. Bulk is the name of the game, and always has been.


Many years ago someone impressed upon me the need for simplicity in matters of scale. The problem of course, is that simplicity is impossible without power. This is why we see secondhand RDBMS environments proliferate their complexity into a functionally catatonic state, ultimately calling for its wholesale replacement. And upon the call, others swoop in to save the day, saying that they can "replicate your functionality" in another stronger environment that is geared for high complexity, without even once looking askance at the complexity's necessity. By that I mean, the complexity arrived as a function of an underpowered environment, with sweat-labor from engineers working diligently to prop up a fading machine. It means - all that additional power-propping is artificial and we need to regard it as a necessary evil. If we'd had the power way back when, the complexity never would have arisen in the first place. So the complexity is a symptom, not an attribute.

 

If we have the power, it gives us the capability to implement functionally sophisticated solutions with ease of maintenance and operation. Functional sophistication is key, because we don't want to dumb-down the functionality just because we don't have enough hardware. Yet the people responsible for buying us more hardware look at us warily, wondering "You know, I signed off on the last hardware purchase thinking it would be my last hardware purchase. Yet now we are already out of gas and it seems like we didn't get the return-on-investment for the last purchase."  Ahh, but wait, says the engineer, the systems are voracious and so are the users. Adding more hardware is the only way to stay ahead of them. To which the purchaser objects "How do I know that you are making the most efficient use of the hardware you already have? Can you tell me that you haven't tried to optimize the environment?"  To which the engineer skulks away, formulates a plan and spends the next six months wrapping the solution in an engineered cocoon of complexity that offers marginal boost, but a boost nevertheless. The purchaser feels vindicated. "I'll stand my ground next time, and they'll have to go through another optimization! Aha! The key is to get these deadbeat engineers to do their job, and engineer!"

 

And so at the end of this bitter and tumultuous cycle, we have an over-engineered, underpowered machine that nobody is happy with except for the purchaser, who held out until the very end. Often the purchaser is overridden by a super-purchaser, like the CTO or CEO, who finally releases the funds, offers the directives, and the purchaser dutifully though reluctantly complies, certain in his heart that the engineers have at least one more optimization cycle left in them. If you've never been the recipient or the participant in such a repeatable and pervasive cycle, what a blessing to be in the food service or housekeeping industries in these perilous times!

 

In one case, I told a senior leader point-blank that the reason his secondhand RDBMS system was running out of gas, was the proliferation of cursor-based stored procedures draining the lifeblood from the box. He was stunned at this assertion, because he'd been assured by his implementers that this was the right thing to do. The implementers in the room objected by asking if I was "actually suggesting" that they pull the data into a third-party tool, process it, and put it back. Such phrases as "are you telling me", "seriously"  and "everybody knows that" were the common prefixes of all their objections. Oookay-fine. This doesn't detract one iota from the simple fact that the secondhand RDBMS doesn't process bulk data well, and it doesn't really support third-party environments that require it to (you know, bulk loading and extract). The whole bulk-loading and extract domain is something that secondhand RDBMS systems provide their own utility for, and dogpile one caveat after another against its regular use for operational, lights-out data processing. This is because, as Rule #10 tells us, secondhand RDBMS systems are lousy at row-by-row processing. This has always been true.

 

And think from the perspective of a newbie. Imagine being introduced now to a Netezza machine where it has been specifically purpose-built to support all the things that the secondhand RDBMS's treat as - well - secondhand? Our newbies come to their cubicle with their hard-won tales of derring-do and the many dragons they have captured and thrown down. They drag their canvas bags filled with dragon-fighting gear, empty it on the floor and we marvel at all the tools we once used to be effective. Ahh, the aroma of the dragon's blood wafts from the antiquated instruments of war, reminding us of days gone by, and all that bygone stuff, too.  It's hard to immediately schedule their pickup for the Warehousing Museum on the 7th floor, where all of our stuff is on display and gathering dust. Instead we scratch the Museum's number on the back of a business card and hand it to the warrior for later. Still, all of those weapons and their attendant experience are very valuable, but not in the way one might think.

 

What's a seasoned warrior to do when the machine can devour a data warehouse dragon like a tree shredder, digest it, repackage it and dungeon-ize it for all eternity, and do it as an appliance? Almost like pressing the button on a toaster, but not quite. Our hero rolls up his sleeves and starts slinging row-by-row, cursor-based stuff so prevalent in secondhand implementations. Upon first execution, it runs worse than a dog. It runs like a wet dog. And doesn't smell any better either. The hero scratches his head and tries again. Everything the hero knows about data processing just went "counter-intuitive". The hero mutters "I don't get it". Ahh, but the hero will get it, because the hero is emotionally engaged.

 

This emotional engagement is something that we, as seasoned Netezza folk, should leverage to help our newbies make the necessary shift in their approach. They really need to take ownership of the knowledge, but sometimes (I've been told) it feels a lot like pushing a string (for the seasoned people) and a lot like playing some kind of strange board-game for the newbie. To avoid mental pain and injury, and heavy-lifting on the part of the seasoned people (and the attendant frustration from helping a newbie blossom) the seasoned pro need only remember the power of the emotional engagement. The newbie will want to conquer the machine. Harness it for the good of all mankind. They will brandish their blades with the familiar shhhhiiiiiinnnngg as it leaves the sheath. But there's only one hero in the room. The big black box. The warrior's emotions are drawing him/her inexorably closer to the final conclusion - the machine works for us. Really works for us. Unlike the secondhand RDBMS machines that once enslaved us. Now we are the master. The new hero does our bidding, and does it well.

 

So our warrior, after many days of working with the machine, finds himself no longer using his old gear. No longer using his old ways. In fact, now stumbling over the gear and getting his feet tangled in the pull-strings on the canvas bag. One day he will pull out the number we gave him for the Data Warehouse Museum on the 7th floor and call for a pickup. On that day, take the warrior to lunch. The transition is complete.

 

What has the warrior embraced - that complexity is no longer the key to winning. We don't have to "do it the old way" and we don't have to figure out a way to shoehorn our former implementations into the new domain. Those implementations were necessary evils, using technology that wasn't geared to support them, rigged for an outcome on an underpowered and overwhelmed environment. When moving to the new environment (whether it's a newbie learning the ropes or a large-scale migration of a secondhand RDBMS into the big black box) we don't take the prior implementation with us. We don't take the prior table structures, cursor-based processes, or anything else about the original implementations. If artificial complexity means that even part of the implementation is suspect, then all of the implementation is suspect. We have a new way of doing things, a new machine to do them on, and the outcome will be so much better if we now do things the way the machine best performs, rather than doing things the way some other secondhand machine performs poorly.

 

And this is the difference between complexity and simplicity. In an underpowered secondhand RDBMS, the complexity props up the weakness of the technology and is a symptom of weakness in every way imaginable. Simplicity on the other hand, is the mark of strength of the technology and its ease of implementation and maintenance.

 

For a (human) hero, complexity is the reason for existence and simplicity is for the simple-minded peasant. After all, the peasants do the (back labor) work of the field and save the thought labor for the feudal lord and his lackeys. But one can see the parallel immediately - the secondhand technology is actually a feudal lord, its high-priced product engineers are its lackeys, and we, my friend are its indentured servants. Quite the converse for Netezza folk - we are the feudal lord, requiring no lackeys, and the machine is our indentured servant. The machine works for us.

 

I cannot count how many times I've been approached (even in the hallway of a Netezza conference!) by people asking me about re-engineering their existing systems for Netezza. No, my friend, we don't re-engineer, we de-engineer. There's a huge difference.

 

What does simplification buy us? We can now move away from the raw complexity required to prop up a lack of performance, and move toward the sophistication required of a competitive solution. The sophistication is key, because anyone can build a simple system for simple business reasons. But when the complexity of the environment hinders us from the next level of functional sophistication, the complexity has now enslaved our business model, and effectively the business itself. Simplified implementations are stable and adaptable, so they can scale to breathtaking heights, giving us the necessary edge for competitive, sophisticated offerings that aren't even possible in the secondhand technologies.


Manhattan Skylines

Posted by David Birmingham Mar 4, 2010

Marcus Gray watched in consternation as the viral program cranked up. He knew that in moments the band of hackers would once again take over the Manhattan power grid. For now, they were doing it as a prank. But he also realized it could be a test run for something even bigger. Like a grid-by-grid shutdown of the entire system, opening the door for untold mayhem on the darkened streets.

 

Moments later, messages from the hacker gang started appearing on all their terminals. Taunting barbs letting everyone know that they were in complete control and nobody could stop them. Gray shook his head and closed his eyes, hoping that this would pass quickly. Losing power even in one part of the grid could spell pandemonium and place lives and fortunes at risk. The weight on his shoulders was crushing.

 

"I think I can help," said a voice from behind. Lane McBride from the Federal Counter-Terrorism Unit based in Manhattan, leaned over to regard Gray's terminal.

 

Gray turned to the voice, recognizing it with hope in his eyes, and said, "They're at it again."

 

"I saw the precursors," McBride noted, "That they were entering the system."

 

"Yeah, but it doesn't matter if we can't find exactly where they are," Gray sighed, shaking his head, "They're in a hundred different buildings, including the Empire State. You guys have agents standing by at all of them, but they have to search the buildings floor-by-floor to find them. The problem is, we have to shut down communications for the building so that they can't warn each other. So even if we could catch a few, do you have any idea how long a floor-to-floor search takes in the Empire State? We can't keep that building offline from communication for that long."

 

"Not to worry," McBride grinned, "I have an algorithm that will directly pinpoint their floors. All we have to do is send our officers up to the floor, and I bet we can round them up in minutes."

 

"Wow," Gray whistled, "I'd like to see that."

 

McBride whipped out a flash stick, plugged it in and let the program do its work. Within seconds, it had pinpointed each hacker, the building their signal was coming from and the floor of the building. "Here we go."

 

"I like it," Gray grinned.

 

McBride touched several buttons on his phone and dispatched the information, and monitored as each of the officers acknowledged the information and the plan. "We'll know soon enough."

 

Gray noted, "The problem has always been that they could hear us coming and could shift floors anytime they wanted."

 

"Not this time," McBride smirked, "At least, not if we do it right."

 

The first officer to report back was from the Empire State. Two of the hackers had been stationed there on separate floors. Both were now in custody and unable to warn their cohorts in the other buildings. Gray listened in awe as one by one, the officers reported in, having captured their respective quarries with minimal effort.

 

"That was brilliant," Gray stared at the screen as the weight seemed to lift from his shoulders, "How did you come up with the algorithm?"

 

"Simple process of elimination. I just looked at the problem from a very-large-scale search. The most important information is where the perps aren't - not where they are. The algorithm zones in on the candidate floor by understanding which floors are not candidates. Process of elimination leads the way. So we can search the Empire State and Chrysler buildings just as quickly as a single-story, capture the floor number and we're done."


---------------------

Some of you already see the parallels. It's how a zone map works. But how does it apply?
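
For anyone who hasn't met one yet, the process-of-elimination idea can be sketched in a few lines of Python. This is a toy model only: the block size, the data and the function names are made up, and it illustrates the general min/max technique rather than Netezza's actual implementation.

```python
# Minimal zone-map sketch: keep a (min, max) pair per block of values and
# skip every block whose range cannot contain the value we are looking for.
# Block size and data are made up for illustration.

def build_zone_map(values, block_size):
    """Cut values into blocks and record each block's min and max."""
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def search(zone_map, target):
    """Return (matching values, number of blocks actually scanned)."""
    hits, blocks_scanned = [], 0
    for low, high, block in zone_map:
        if target < low or target > high:
            continue  # eliminated without reading the block
        blocks_scanned += 1
        hits.extend(v for v in block if v == target)
    return hits, blocks_scanned


values = list(range(1000))  # naturally ordered data, like a date or sequence
zone_map = build_zone_map(values, block_size=100)
hits, blocks_scanned = search(zone_map, 742)
print(hits, "found after scanning", blocks_scanned, "of", len(zone_map), "blocks")
# prints: [742] found after scanning 1 of 10 blocks
```

Like McBride's algorithm, the win comes from knowing where the value isn't. Nine of the ten "floors" are never visited.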

 

When we take a look at the Record Distribution option in the Netezza Admin GUI, we're often happy with a "ragged edge" for all the SPUs. And a "flat top" is the ticket. But what about the case of a "Manhattan Skyline", where we have high peaks and low valleys? This is higher than normal skew (something we're supposed to avoid, right?) People see those and shun them. However, these are often the natural result of an intermediate table produced by an ELT operation, and often a result of multi-pass queries in a BI tool. These usually leverage the mainstay workhorse CTAS (Create-Table-As-Select), so in many cases, people are tempted to turn on "random" for all CTAS operations. Or just maybe - one of our regular static supporting tables is deliberately distributed as a Manhattan Skyline just because we want to regularly perform co-located joins with it on a larger master table using the same distribution key.

 

In any case, a primary reason we would get this kind of Manhattan Skyline distribution is if we are trying to preserve an existing distribution in order to perform a follow-on operation with tables on the same distribution. Whew! And why would we allow this to continue? Isn't a random distribution better than a Manhattan Skyline? Our problem remains: if the table has such a Manhattan Skyline distribution, we have higher than normal skew. Any full-scan on the table will cause the query to perform as slow as the "tallest bar"  (the SPU with too much of the table's data). As the table grows in size, the problem worsens. It is not a scalable distribution in its latent form, so don't embrace one without a plan.
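
A toy model makes the "tallest bar" effect easy to see. In the sketch below, a simple modulo stands in for the machine's real hash function (an assumption for illustration only), the SPU count of 8 is arbitrary, and the key values are invented.

```python
from collections import Counter

# Toy model of distribution skew. The modulo is only a stand-in for the
# machine's real hashing, and the data is invented for illustration.

def rows_per_spu(keys, spu_count):
    """Count how many rows land on each SPU for a given distribution key."""
    counts = Counter(key % spu_count for key in keys)
    return [counts.get(spu, 0) for spu in range(spu_count)]


def skew_ratio(per_spu):
    """Tallest bar divided by the average bar; 1.0 is a perfect flat top."""
    return max(per_spu) / (sum(per_spu) / len(per_spu))


SPUS = 8
even_keys = list(range(8000))                # many distinct key values
lumpy_keys = [0] * 4000 + list(range(4000))  # one key value dominates

flat_top = rows_per_spu(even_keys, SPUS)
skyline = rows_per_spu(lumpy_keys, SPUS)
print(flat_top, skew_ratio(flat_top))
print(skyline, skew_ratio(skyline))
```

In this made-up case the flat top has a ratio of 1.0 and the skyline a ratio of 4.5. Since every SPU has to finish before a full scan can return, the skyline scan behaves roughly as if the table were four and a half times larger than it is.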

 

Well, random distribution has a risk too, especially at the BI level, of negatively affecting concurrency performance. Even if our individual queries are not hindered by the data-broadcast incurred by the random distribution, they could just be a one-hit-wonder, because running many of these operations side-by-side can sometimes saturate the inter-SPU fabric, affecting concurrency. If we can keep the processing on the SPUs, we can avoid this problem entirely. So the issue is one of user scalability, something that all of us care about and that the other vendors (sometimes) turn a blind eye to. Netezza has it covered, and as usual, it's so simple a cave man could do it (now I'll get mail!)

 

So now we have two options, neither of which seems good - (a) keep the Manhattan Skyline distribution or (b) use a random one. Let me say that random is not always bad, but it poses a potential danger for concurrency. Likewise the Manhattan Skyline can often be a latent result of an intermediate CTAS, so it is unavoidable anyhow. And why would we want to preserve an existing distribution on a CTAS? The answer - because it will be a co-located write (blazingly fast). But wait! Don't we get a co-located write by default?

 

Maybe.

 

I have noted in prior posts how the default distribution for a CTAS might not be what we want or expected, so here's a quick recap:

 

(a) For simple single-table CTAS, it will preserve the source distribution key - (co-located write)

(b) For simple multi-table-join CTAS, it will leverage the first column result in the "select" clause (maybe a co-located write)

(c) For CTAS using summaries/group functions in the select, it will leverage the columns in the "group-by" clause (rarely a co-located write)

 

If any of the above are not the original distribution of the source(s), we could inadvertently sacrifice our co-located write. But we can preserve it if we specifically use "distribute on" with the CTAS execution. With co-located writes, this means the data never leaves the SPUs. If we distribute the CTAS on anything else, the data must leave its current SPU and find its way to another one. This initiates a data broadcast (and can negatively affect concurrency). Preserving the distribution, we get the benefit of a co-located write (avoiding the broadcast to make the table) and set up the next operation for a co-located read (also avoiding the broadcast to leverage the table). Short answer: preserving the distribution preserves concurrency performance. Now the SPUs are working for us at physics-speed.
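The difference between a co-located write and one that sends data off its SPU can be sketched in a few lines of Python. This is a toy model only: `home_spu` uses a plain modulo as a stand-in for the appliance's internal hashing, and the `customer_id` and `store_id` columns are hypothetical.

```python
N_SPUS = 4


def home_spu(key):
    # Stand-in for the appliance's internal hash (an assumption for illustration).
    return key % N_SPUS


def rows_moved(rows, source_key, target_key):
    """Count rows that must leave their current SPU when a table distributed
    on source_key is written out distributed on target_key."""
    return sum(
        1 for row in rows
        if home_spu(row[source_key]) != home_spu(row[target_key])
    )


# Hypothetical source table, currently distributed on customer_id.
rows = [{"customer_id": c, "store_id": c // 3} for c in range(12)]

print(rows_moved(rows, "customer_id", "customer_id"))  # distribution preserved
print(rows_moved(rows, "customer_id", "store_id"))     # distribution changed
```

Keeping the key moves nothing. Switching the key in this small example sends eight of the twelve rows to a different SPU, and every one of those rows has to travel the fabric to get there.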

 

Rather than just live with the latent effects, let's embrace and harness them for the good of all mankind. Well - er - at least for our user base.

 

What we really want is threefold -

 

(1) preserve the distribution with a co-located write (preserve concurrency, potential Manhattan Skyline as latent artifact)
(2) leverage the result with a co-located read (preserve concurrency, potential penalty from Manhattan Skyline)
(3) mitigate the Manhattan Skyline with a zone map (ahh, best of all worlds)

 

So to get the first two, we can simply preserve the distribution with a "distribute on (key)" clause and make sure the distribution key is part of the "where/join" operations. This is the simple part.

 

To get the third, we need to either (a) sort the data as it is created, or (b) make a materialized view after-the-fact to get the zone map effect for selected columns. The first one (sorting) is often easier than it sounds, and with strongly filtered intermediate tables is also very scalable. The second one (materialized view) has some caveats but is very fast to create. What does the zone map actually do? It effectively stripes each SPU's portion of the table so that only the section in the zone is actually addressed. Like McBride's algorithm, it's as though the rest of the data isn't even there, because the zone map has guided the optimizer to completely ignore it. So whether the SPU's data has a tall bar or a short bar, the performance is the same. We need all three of the above and the zone map mitigates the potential problem of unexpectedly high skew from an intermediate distribution - or an outlier table that we need to distribute on a common key. Even if (1) and (2) above give us a good distribution today, it could always "go Manhattan" in the future.
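Here is a simplified Python sketch of why sorting matters to a zone map. It assumes a zone map is nothing more than a (min, max) pair per fixed-size extent of one SPU's data; the extent size and the day numbers are made up, and the real on-disk structure is more involved than this.

```python
def build_zone_map(values, extent_size):
    """Record (min, max) for each fixed-size extent of one SPU's data."""
    extents = [values[i:i + extent_size]
               for i in range(0, len(values), extent_size)]
    return [(min(e), max(e)) for e in extents]


def extents_to_scan(zone_map, wanted):
    """Count the extents whose range could contain the wanted value."""
    return sum(1 for low, high in zone_map if low <= wanted <= high)


# Hypothetical day numbers stored on one SPU, three values per extent.
unsorted_days = [7, 1, 12, 4, 9, 2, 11, 5, 8, 3, 10, 6]
sorted_days = sorted(unsorted_days)
tall_bar = list(range(1, 121))  # ten times the data, already sorted

print(extents_to_scan(build_zone_map(unsorted_days, 3), 8))
print(extents_to_scan(build_zone_map(sorted_days, 3), 8))
print(extents_to_scan(build_zone_map(tall_bar, 3), 8))
```

With the data unsorted, every extent's range straddles the value 8, so all four must be read. Sorted, only one qualifies. The "tall bar" SPU with ten times the data also reads exactly one extent, which is the whole point of step (3).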

 

Another obvious question is "If this is an intermediate result, why bother? Just filter out the stuff I don't want and then there's no issue, right?" Well, technically yes, for a single operation, but I know of at least a dozen cases where the intermediate table is used for a lot of downstream activity, not just a one-off throwaway. So our stewardship rule is: make the data better. For the next downstream process or the ultimate data consumer, the data should get better every time we touch it.

 

Rather than rewrite or re-design a carefully tested and detailed process, adding a simple "order by" or MV is easy and preserves the existing logic, and data model, with little impact and high return. This is especially true of a static supporting table, because we can install what we need on the table's creation. The consuming processes all benefit from it with no more than regular query execution (materialized views are transparent).

 

In the end, we can still leverage the plain-vanilla parts of the Netezza performance model (zone maps, co-location) without having to over-engineer the data using indexes, intersection tables or summaries. This preserves something more - the ongoing resilience and adaptability of the model itself.

 

Recap:

 

  • Apply the "distribute on" clause of the CTAS to avoid the latent effect of default distribution.
  • Preserve co-location for reads and writes in intermediate tables.
  • If a potential Manhattan Skyline distribution is the CTAS result, rather than go random, sort the CTAS result by a selected column or use a materialized view.
  • As always, apply strong filters to the CTAS creation so that it's not simply copying one table's contents to another (carve the data out).
  • Experiment for the best fit, but remember that Netezza is an appliance.
  • We don't need to engineer the queries, only apply simple performance model alignments in the data itself, to leverage the machine's physics.

---------------------

Famous words, or some such like, uttered by Orson Welles as he launched into a scary parody of alien terror on national radio. Really scary for some. And proffered on Halloween night in 1938, so dare I say, 'tis the season (almost).

 

Ahh, not to fear, this purports to be a painless foray. But I do have a story to tell.

 

Several projects ago (I always start this way, so you won't think I'm talking about you!) - I worked with some really sharp data engineers on boiling out a solution for retail operational reporting. The data arrived every five minutes or more, or less, and sometimes in parallel loads, with 24x7 regularity. More and more Netezza implementations are going this way, and you too, should look into processing data at the speed of thought. In any case, the reporting users wanted to plumb the depths of this data store, to the tune of eighty billion records and growing. (Okay, small I know (for some of you) but humor me).

 

Well and good, except rather late in the game, the reporting users spontaneously expressed a desire to review the detail through a metadata-based "lens", that is, set up some drilling levels and other metadata-based entry points, such that the entire operational model would be seen through this reporting "lens" and it would provide all the context for the consumers.

 

Now, such a model as described, would require such enormous power from a standard SMP/RDBMS-styled system, that we might well cause structural damage on the raised floor for sheer physical weight of said system. That is, if we really expected a report to return within a day or two of the request. Ahem! as I facetiously clear my literary throat.

 

But the worst-case for any given query for the above was around 8 minutes, and over 99 percent of the thousands of queries submitted, returned in less than 30 seconds. Oh, yeah, it was smokin' hot. In most queries using zone maps and the like, we saw returns in mere multiple seconds. Pshaw! Says the tick-tock-man, chocolate and vanilla, don't waste my time.

 

However (and there's always a catch) many of the larger reports were actually conglomerations of these smaller queries, and their aggregate time would occasionally exceed ten minutes or more. And even though this was a far cry from the "days away" we would expect from an SMP/RDBMS system, it was still 'too slow' for the users. Now, this is true adrenalin-junkie stuff, sort of like the old Far-Side cartoon of a young man standing with a fork in front of a waffle iron, captioned "Wendell Zurkowitz, slave to the waffle light". I recall how one man noted that many years ago we would wait hour(s) for a traditional oven to finish cooking, and now get impatient when the microwave instructions are greater than five minutes.

 

Perspective.

 

And rather than punt to the users and say, "Hey guys, this is just unrealistic" and degenerate into "expectation management" - the challenge was to actually achieve faster turnaround times on the reports. And here, I'm talking about getting these ten-minute reports into the 30-second zone. Would we have to embrace some extreme engineering for this feat? Methinks not - but the form of the process to get there was quite instructive.

 

Now recall I noted that the above model had operational tables, which were to be the detailed source, and a retail reporting hierarchy that was largely metadata-based. This reporting hierarchy had some significant size as well, perhaps a fourth the size of the eighty-billion-record fact table it had to link into. Yet both of these were on separate distribution keys. Querying one meant broadcasting another.

 

And now, for broadcasting.

 

Whenever two tables are distributed on different keys, a join between them cannot be initially co-located. To support the co-location, Netezza will broadcast the salient information from one table's context to the other. This means the physical data has to move from its home SPU, out onto the inter-SPU network fabric, and find its way to the target SPU where it will be further examined. Broadcasting for small tables is inconsequential and barely a blink on the radar. For larger tables it can have strange effects. For example, we saw one query return consistently in ten seconds. Yet when running side-by-side with itself (multiple users) it could take several times longer.
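A rough Python sketch of why this matters so little for small tables and so much for large ones. It is a deliberately crude model and not how the optimizer actually plans: it assumes a broadcast copies every row to every other SPU, that concurrent queries simply add their traffic together, and the row counts are hypothetical. The 108 SPUs match the 10100 mentioned later in the story.

```python
def fabric_rows(table_rows, n_spus, co_located, concurrent_queries=1):
    """Rows placed on the inter-SPU fabric under a simple broadcast model."""
    if co_located:
        # The join runs where the data already lives; nothing travels.
        return 0
    # Each row is copied to every SPU other than the one that already holds it.
    return table_rows * (n_spus - 1) * concurrent_queries


N_SPUS = 108  # SPU count of a 10100, per the story below

print(fabric_rows(1_000, N_SPUS, co_located=False))                         # small table
print(fabric_rows(1_000_000, N_SPUS, co_located=False))                     # larger table
print(fabric_rows(1_000_000, N_SPUS, co_located=False, concurrent_queries=5))
print(fabric_rows(1_000_000, N_SPUS, co_located=True, concurrent_queries=5))
```

In this model the traffic grows with the table size and again with the number of users running side by side, while the co-located case stays at zero no matter how many users pile on.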


The reason is that both queries were competing for bandwidth on the inter-SPU fabric, among other things. The simplest solution, of course, is to get our metadata table distributed on the same key as the operational tables. The problem was simply in the complexity of this metadata table and how it mapped to the core information. "Blowing it out" into a materialized form of information would require significant planning and design, because a misstep could easily make the reports turn out wrong, and this was unthinkable. In all this, the maintainability had to be considered, because if our initial complexity is too high, the maintainability is in jeopardy - by design.

 

Of course, we would spend most of our time in testing this scenario. Coding and implementation in most BI shops is a nit compared to the testing we have to execute to validate the outcome. Netezza is no different, except we can close the testing loop sooner if we have more power. And of course, for something of this magnitude, to test the change from minutes to seconds, we would need a powerful machine to measure the difference. Whenever we ran the new solution on a smaller machine, the difference couldn't even be measured. No, the power of the machine makes the testable difference visible and measurable.

 

As I noted, the form of this exercise was the most instructive part. Rather than form a means to align these two tables for co-located joins, the first effort was in attempting to tune the queries. You know, "query engineering", which is the mainstay of performance engineering on an SMP/RDBMS platform, and old habits are hard to break. The data engineers were somehow in denial that they would receive extraordinary power from configuring the data. Rather they trusted their instincts and chose to attack the queries.

 

Now, in any platform, regardless of shape, size or vendor, power is always and forever the domain of hardware. Software cannot manufacture more CPUs or network speed. If the physical plant is not ready, the software can only use what it has at its disposal. The software itself is largely a cost center, because it can only drain the machine's energy through inefficiency. In an SMP/RDBMS machine, the only option we have is to engineer the queries, because the physical plant is configured to be general purpose.

 

In a purpose-built machine, however, the query is simply a controlling mechanism for Netezza's resources. The host will chop it apart into snippets and dispatch these to the component that they will serve. Extreme query engineering, on the other hand, assumes that jockeying around with the query can actually affect our fate. (Contrast: a poorly written query is different from directly engineering a well-written query.) And besides, do we really want to spend our time carefully engineering the query to the point of functional brittleness? In an SMP/RDBMS machine we will see queries that extend for tens of pages in a very daunting complexity. Maintaining these is a full-time job for our consultants. They swarm on the machine, and carefully tune their handiwork to avoid breakage.

 

Yet, we purchased a Netezza machine to get away from this complexity. To reduce, clarify and simplify our administration and consumption of the data. So as I watched these engineers bat themselves against the problem, no differently than a fly batting against a window, I watched them pull out their hair in generous tufts when little they did offered the significant gains they expected. This outcome was entirely counter-intuitive to their training. They were accustomed to using and tuning software to make things work faster.


Sweeping the hair from the floor one evening, I mentioned (for the x-teenth time) that the broadcast effect was killing them. Once our engineers grasped the broadcasting problem, I thought we would make headway, but things actually got worse. They started trying to control the broadcast as the root cause rather than the symptom. In one test, I saw one of the largest tables leap into a broadcast and we just killed the query outright (it would probably still be running, even today). The engineers lamented: How do we make sure the larger table doesn't broadcast? How do we control the broadcasting to our benefit? Answers exist to all of these, but it's like talking to a drug addict, one who is addicted to the drug of SMP/RDBMS and claims he can 'quit anytime'.

 

And then the truth came out, "David, if we can make this 10100 machine process data like a 10400 machine, we'll look like heroes!" To which I ask "How?" to which the response is: "We can save them all that money they would have spent on the hardware..." Well, not really. You've just chosen something else to spend the money on, namely performance engineering, the cost of time-to-market, the cost of a marginal implementation and the cost of human labor (the most expensive asset you have, by the way). But since the only way to get a 10100 to perform like a 10400 is to actually be a 10400, well, you see the futility. 432 SPUs versus 108 SPUs? And they really, truly thought they could - I mean - seriously. Let's keep in mind that the opposite is true. If we can't make the 10100 process data like a 10400, perhaps our approach is flawed? Heroes or goats. Take your pick. In my estimation, there's only one hero in the room. The big black box.
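The arithmetic behind that futility is short. A back-of-the-envelope Python sketch, assuming the eighty-billion-record fact table from this story is spread perfectly evenly across the SPUs (real tables rarely are):

```python
FACT_ROWS = 80_000_000_000  # the eighty-billion-record fact table from the story


def rows_per_spu(total_rows, n_spus):
    """Rows each SPU holds, assuming a perfectly even distribution."""
    return total_rows // n_spus


small = rows_per_spu(FACT_ROWS, 108)  # 10100
large = rows_per_spu(FACT_ROWS, 432)  # 10400

print(small, large, small // large)
```

Each SPU on the smaller machine carries roughly 740 million rows against roughly 185 million on the larger one, four times the work per SPU before a single query has been tuned.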

 

So the broadcast is the symptom, not the root cause. How about, we quit broadcasting, cold turkey? Take the data model through a detox program and the engineers through a series of deprogramming seminars to - well - it's not that bad. Typically the average engineer only has to see it operate in an adverse manner to become a believer. But a believer they must be, or they will not take action to correct the problem, correctly.

 

So one of them finally decided to produce a map table, one that would map the metadata into the operational tables such that all core joins would become co-located, with a common distribution. And lo, the first test of this blew their minds. Even the complex reports were now coming back in single-digit times, and the reports that had been running ten minutes or longer were now under a minute, even with multiple users. In fact, they saw the performance and scalability practically handed to them - simply because they configured the data correctly. It had little to do with query engineering.

 

Now one may ask the obvious question, and please do so now: Why don't you just build out some user-facing tables and forget leveraging the operational tables? After all, we don't build our non-Netezza reporting systems on top of operational data, do we? We build-out dimensional models and other handy structures to positively affect the user experience and simplify the flow (and the maintenance). This functional decoupling is a mainstay of reporting environments. (Okay, the next entry will focus on this). But in this case, suffice it to say that the owner of the machine had placed down a hard-mandate on disk utilization. At no time could we foray into replicated detail, or even summary of detail without a plan to access the operational detail on a drill-down and the like. Interestingly, the required reporting tables would have only cost mere fractions of the cost (on disk) of the time/labor and effort put into making the operational tables viable. This is why it deserves its own treatment in a separate rant - er - essay. Stay tuned, and don't touch that radio dial.


Back to the drama - A telltale symptom that we're doing something wrong, is when we start down the engineering path. It's an appliance. We don't engineer toasters, blenders or laundry machines. But the difference here seems to be subtle. It's not. In this case, the culprit was the broadcast, something to be eliminated rather than managed. And no amount of creative query hoop-jumping would overcome this. Get the joins onto the SPUs. It seems obvious to those who have been around the machine for a bit. But for those who have not, the learning curve is upon them. Be patient with them for as long as it takes to get it right. Once we have a believer, we'll never have the conversation again. As long as we stay in a theoretical zone, however, expect them to stay in the spin cycle. This is like many things scientific. Seeing is believing.

 

Whenever I (and others like me) observe a ritual of performance engineering, each participant holding out the hope that "just one thing" will offer a stratospheric boost so they can all wipe their foreheads and go home - this is the surest sign of one of two things: Either the data is poorly configured and is causing the queries to be inefficient, or the data is properly configured and the machine does not have enough physics to achieve the goal. If the focus is on query engineering, they are wasting time. If the focus is on data engineering, at some point it will reach a "diminishing return". Either the machine has the power or it doesn't. Time to switch to Netezza, or if using Netezza, time to add some physics (a frame or two) to make it happen.


Moral of the story: Performance is found in the physics, not the carefully engineered queries. If we find ourselves "engineering" our queries for performance reasons - we should take a step back, take a deep breath - click our heels together and say softly: "There's no power like SPU power. There's no power like SPU power." Repeat as necessary.

 

And pay no attention to the man behind the curtain. I'll bet he and Orson Welles never even met.

---------------------


As the sunrise peeked over the horizon, it cast long shadows over the four cars awaiting the break of dawn. Stretching before them, the expanse of the salt flat beckoned, nay taunted them, to accelerate across its ancient surface. Not caring for the winner or loser, it merely provided a level playing field for them to test their wares and technology. But yawned at the futility of the race itself. The salt flat had always been, and always would be. Come one, come all, it invited daily, almost mockingly.

 

The leader for team-Exa sat in his racer's driver seat, eyes closed. When he felt the warmth of the morning touch his face, he raised an eyelid to examine the time. Now thirty minutes from flag-down, the sun would still be at his back when he won the race. And he would win the race.

 

The lead for team-Terra pushed back into her driver's chair to stretch her legs as her eyes fluttered open. She glanced toward her left to the Exa racer, gleaming in the morning sun, and then to her right at the NZ racer. Its plain black lines and nondescript exterior, she knew, hid the power under its frame, and it was nothing to be trifled with.

 

The fourth car on the end, entered in the eleventh hour, was a plain vanilla Volkswagen Beetle with a rocket engine attached to its backside. No frills, no nonsense and nothing hidden. Five men from Redmond had delivered it last evening. They hadn't even had time to take a test run on the flat.

 

Minutes later all four drivers and their lackeys met in front of the four cars, partly to wish each other luck and partly to offer last minute trash-talk. Dominic Toretto, the driver of the NZ machine, ran his hands over his bald scalp and rubbed it vigorously, as if massaging the sleep from his head, then yawned and said, "Okay gentlemen. We're fifteen minutes from flag-down. Anyone want to back out? I swear we won't hold it against you."

 

"Dude," laughed Excel, the driver for the Redmond machine, "In your dreams. I have investors watching."

 

"As do I," smiled Tara, the only female driver, who would command the blue-streamlined Terra racer, named for its ability to master the earth and its elements. "We're all in this for keeps." She batted her eyes and tilted her head flirtatiously, "You want to see under my hood?"

 

"Out here in the open?" Toretto laughed, drawing chuckles from the others, "Sure, let's see what you have."

 

She ignored the innuendo and pointed her keytag toward the Terra racer and pressed a button, causing both side doors to slide away and the hood to pop open. Toretto strolled over to examine the engine. He'd seen these before.

 

"Lot of power under that hood," he quipped.

 

"Yeah," she said, expecting a bit more enthusiasm for her machine. She wouldn't find it among any of these drivers, though. They lived and breathed adrenalin, and knew as much about her machine as she did. And weren't in denial about its weaknesses, either.

 

"Looks plain," said Jeff, driver for the Exa-car, "And as you can see, not enough control."

 

"So let's look at yours," Toretto said, a twinkle in his eye.

 

As they sauntered to the next car, Jeff's lackey whispered in Toretto's ear, "We've radar-mapped the entire flat between here and the finish line. Every bump is programmed into the machine. You think that's a competitive advantage?" He slapped Toretto on the back and laughed loudly.

 

"Bumps don't matter," Toretto muttered, with the strength and experience of someone who would know.

 

Jeff spun to face him, "What was that?" he laughed, "Bumps don't matter. Did you hear that?" he looked around him to the others, with his lackey already laughing, "He says bumps don't matter." He crossed his arms, "Would it matter to you if I said that ignoring bumps at these speeds is like a death wish?"

 

"No."

 

"No, what? No it won't matter what I say, or bumps still don't matter?"

 

"Either way," Toretto said with a wry grin, "Bumps don't matter."

 

Jeff threw up his hands in frustration as Toretto poked his head into the Exa-racer's driver side window. Jeff asked, "What do you think, huh?"

 

Toretto examined the interior, laid out like a Boeing 757 cockpit. Three LCD screens loaded with controls and meters, flashing lights all around the dashboard and dozens of knobs and gears. "Got a lot of moving parts," Toretto sighed, "Think you'll need all that?"

 

"No more, no less," Jeff said, "Our investors are very demanding. All the tires and wheels are measured for pressure and impact, the dual-redundant monitors compensate for any detected differences, and the pre-mapped radar anticipates every bump and turn."

 

"It's a salt flat," Toretto grinned, patting him on the side of his shoulder, "There are no turns. And bumps don't matter."

 

Jeff nearly bit his tongue, but instead smiled and shook his head while Toretto continued his examination.

 

"Looks to me like," Toretto finally said, "You decked out the car just for this ride."

 

"Yeah. So?"

 

"Well, it might work for a salt flat under controlled conditions, but it's not streetworthy."

 

"We're not testing on a street," Jeff fired back, "All that matters is who makes it to the other side."

 

"Really?" Toretto raised an eyebrow, "You think people will be knocking on your door to buy a few of these to come out here to run on salt flats?" He laughed, "Your investors will expect to see the performance you show here," he pointed toward the West, "Out there. Or they can't make any money. Optimizing your car, just for this test, doesn't mean anything."

 

"We'll see," Jeff snapped.

 

"I'd like an assessment of my car, if you don't mind," said Less, the driver for the Redmond car.

 

Toretto simply said, "Not much different from the Exa. Except you don't make any bones about the fact that you've strapped a jet engine to an underpowered car. You think those wheels and frame can handle the stress of the race? We'll see how you do on the flats. That's all I can say."

 

"Gentlemen," intoned a voice all around them, coming from well-placed speakers, "We're five minutes from flags-down so anything you need for warm-up, do it now."

 

Jeff punched a button on his keytag to remotely initiate his computers into a final pre-race system check. Toretto slowly strolled back to his car, opened the door and flopped into the driver's seat. His lackey Mark, younger than he but the sharpest of his crew, brushed back a long black lock of hair and positioned it over his ear, then silently joined Toretto in the passenger seat. After Toretto punched several buttons to initiate the engine, Mark could no longer hold it in.

 

"Don't you think we're about to get smoked here?" Mark said, glancing to the Exa car, "I mean, radar mapping, all those controls and - I mean - "

 

"I know what you mean," Toretto said casually, engaging the first gear, "Just trust the machine."

 

"I know what your philosophy is," Mark sighed, shaking his head, "Put it all under the hood, make it self contained, but what if you need to get creative in the middle of the race?"

 

"Would one of our customers have the option to get creative?" Toretto asked, allowing the car to roll ahead to the starting line. "Do we let them add stuff to the machine? Do we require them to know a lot about what's under the hood?"

 

"No, but -"

 

"But what?"

 

"I don't know what! It just seems like they have more, you know, more -"

 

"More what?"

 

"I don't know what! It just seems like more."

 

"More to break. More to maintain and watch - when the real mission is to go fast on the flats. And everywhere else."

 

"You think we'll win?"

 

"Trust the machine."

 

Presently a racing judge appeared with a flag in each hand, and took his place between the two middle cars. Watching the clock count down, he raised the flags high, then started counting down loudly.

 

"Hold on to your chair," Toretto mumbled, "It's a little rough out of the gate."

 

"I'm ready," Mark said, holding tightly to the chair, pushing against the floorboard to press his back into the chair's leather. He'd made the mistake of eating a meal just prior to the first test runs the week before, and had spent an hour cleaning his half-digested meal from the dashboard and interior windshield. This time, he'd fasted for twenty four hours. Nothing remained in his stomach, he was sure of it.

 

Over in the Exa-racer, Jeff had strapped himself into his seat, and his onboard systems had just finished its run-through only seconds before the flags would fall. The carefully tuned machine would master the flats today. The machine, and his name, would soon be synonymous with extreme speed and power. He would win this race. He was sure of it.

 

Each driver sat in breathless anticipation as the judge counted down to zero, and watched almost in slow motion as the flags went down. But that's when anything "slow motion" utterly ended. Each of the machines engaged their own forms of acceleration. The Redmond machine driver simply turned a valve and flooded the rocket engine with fuel. Its ignition was like an explosion of TNT and it blasted from the line like, well, like a rocket.

 

"They're getting ahead of us," Mark complained as the NZ car's acceleration pulled him deeper into the leather.

 

"It's just a side effect of packaging," Toretto said, his pulse rate not having changed one beat faster, "Just be patient."

 

Without warning, the Redmond machine sputtered and fishtailed its wheels as they passed it. Mark spun his head as the Redmond machine flew past them and they left it in a wall of salty dust. He then looked back at the Exa racer, and to Jeff's eyes riveted forward, set like flint against the Western sky.

 

"How did you -" Mark began.

 

"Know it would run out of power?" Toretto lifted one side of his mouth, "Get real."

 

"We're still ahead of the others," Mark noted pensively, glancing around toward Tara, who seemed oblivious to everything around her.

 

"It will stay that way," Toretto said simply.

 

"So that's it," said Mark, "We stay in these race positions until the end?"

 

"No, they will think the race is over soon, and make their move."

 

Suddenly Tara's car started gaining ground, like something pushing it from behind. Mark saw her pulling up behind them fast, and faster still, "She's coming. She's coming really fast."

 

"Naah, she's just changed her fuel mix. Thinks going from 55/50 to 25/50 will actually matter."

 

Mark spun toward the Exa racer, now closing the distance, "He's coming too. Are we slowing down, or are they -"

 

"Making their move," Toretto said quietly.

 

"Aren't you going to do something? They're gaining!"

 

"Let them burn out," Toretto chided as the two competitor machines passed them and gained their respective leads, "And besides, the race is won in the architecture, not the gadgets."

 

"What difference does it make if we're behind?"

 

Toretto watched as the odometer slowly ticked over. And over again. "We're almost there, are you strapped in?"

 

"Yes, I'm strapped in, but almost where? Where is there?"

 

"There," Toretto pointed to a tinted stain in the salt flat, and watched the odometer tick over to the prescribed reading. "Here we go. Hold on."

 

"What are you doing?"

 

Toretto ignored him and pressed a switch on the dashboard. They could hear a whining mechanical noise coming from the rear as two gleaming foils slowly rose from the tail of their accelerating vehicle.

 

"What are those?"

 

"What did the Exa driver say?" Toretto reminded, "That at these speeds, bumps count. Actually, at these speeds, what counts is stabilization."

 

"How will those make us more stable? It looks like they're slowing us down!"

 

"Brace yourself," Toretto said, and punched the second button. "Accelerators engaged."

 

In that instant, the air inside the car seemed to grow thin, and the air around them seemed to radically change, buffeting the racer with increasing intensity. Then Mark felt it, the pulling g-force of acceleration as it pressed him deep into the leather of his chair and caused the blood to run from his face and into the back of his head. With a whoosh-whoosh, they passed the other two cars as though they were standing still.

 

Jeff watched helplessly as the NZ racer flew past them. Glancing down and across the controls, he saw that all of their gauges were standing at the max, pinned almost into the red line. Even if he could make it go faster, they would incur irreversible structural stress, and possibly crack apart on the flats, spinning into a million pieces. Jeff furiously spun dials and adjusted controls, attempting to squeeze just a bit more power from the machine. If he couldn't come in first, second place would have to do. Jeff now cursed his own racer as it entered the NZ racer's dust trail. His investors would be livid.


Tara furiously slammed her palm into the steering wheel, repeatedly cursing as the NZ car disappeared into the distance. Switching her fuel mixture from 55/50 to 25/50 had made her car lighter and more agile, but had not offered the additional speed. At least, not that kind of speed.


Then something rushed toward both their cars. As the NZ racer crossed the sound barrier, a shockwave ripped up the surface of the salt flat and met them head-on. The Terra car was more stable, so the wave simply bounced its wheels. The Exa car was not so lucky. When the shockwave hit, the passengers heard the sonic boom before they felt it lift the racer's front end and flip it backwards, spinning it in a barrel-roll as it tried to find its footing again. Its back wheels landed first, then the front, causing the back wheels to lift off again, then the front, rocking violently back and forth like this at least five times before the right front tire blew out, sending the vehicle into a wild spin.

 

Jeff could hear and feel the car's structure releasing and popping from the stress. At this speed and rate of rotation, the Exa-racer's uncontrolled spin would rapidly develop enough centrifugal force to turn human brains to scrambled eggs. Jeff felt the red-out coming as an automatic release triggered and both their ejection seats activated, separately catapulting them hundreds of feet into the air. Their parachutes deployed when they reached apex, and Jeff witnessed his car disintegrate on the salt flat.

 

Jeff lifted his gaze into the West, watching the NZ car disappear like a speck in the wake of its own shockwave, churning up the ground behind it. It would likely reach the finish line before his parachute even set him on the ground.

 

Toretto casually glanced to his rear-view mirror, watching the salt flat behind him, practically corrugating the ground in his wake. "Hmmm," he finally said, "Maybe bumps do count. Just not for us. And I don't mind giving them a bumpy ride." He settled into his seat, "No sir." And with that, he fully understood the frustrated rage building in the minds of his competitors, and soon their investors.

 

And more fully understanding the difference between being fast, and being furious.


Rick Deckard wiped the sweat from his brow as he holstered his high-powered weapon. Lifting the communicator from his belt, he muttered several codes and closed the transceiver.

 

"Skin jobs," he said to himself, surveying the replicant sprawled on the floor, and amazed at the technology's ability to mimic the most complex entities on earth. He softly kicked the replicant's front panel, observing large hole his weapon had created in the technology's logo. The half-remaining "T" and the "ata" telling him he'd scored big. Another wannabee down for the count.

 

His communicator buzzed for attention. He lifted it, beeped-in and said "Deckard" like he really didn't want to be bothered, but knew such sentiments were useless. Apparently more replicants were on the prowl, having stolen their way into enterprises with myopic POCs, NDAs and a variety of other three-letter-acronyms. He so longed to go Solo.

 

"We've spotted another one," said the dispatcher on the other end, "People are dying."

 

"Dying?" Deckard raised an eyebrow. "That's new."

 

"Dying to get their jobs back after a misfired deployment with a replicant," said the dispatcher, "Get with the program Deckard. You were called from retirement, but you can't be this rusty. Not with this much at stake."

 

"You wanna come out here and be my backup?" Deckard shot back, irritated, "It's easy to criticize from behind a desk."

 

"Keep on talkin'," laughed the dispatcher, "But the day's slippin' by - and so will your replicant if you don't get on the stick."

 

"Yeah, yeah, whatever," Deckard beeped out, sighed and replaced the communicator. The steam rising from the replicant's body reminded him of why his work was important. Stolen money. Stolen dreams.

 

Less than fifteen minutes later, Deckard found himself crouching behind a stack of crates, one eye on the replicant and one eye on his pistol as he wrested it from its holster. Time was, he could draw, shoot and replace it before a replicant could take one mechanical breath. Now, countless CPU clocks dishonored his rustiness, and he needed a new weapon if he ever intended to win.

 

Too late he realized that he'd spent too much time fiddling with the pistol, and upon looking up, found the replicant nowhere in sight. In that moment, he felt the replicant's mechanical breath on the back of his neck, and he whirled to confront it.

 

"Deckard!" shouted the replicant as he delivered a hard backfist, reeling Deckard over the crates to fall hard on the other side. "You should never have returned! You know I can't be beaten in toe-to-toe comparison!" He then split the crates apart and tossed them to each side.

 

Deckard had already reached for his pistol, but it had been just loose enough to fall from the holster when the replicant had ambushed him. Glancing around feverishly, the fear rose in his throat as the replicant took one step forward, grabbed him by the shirt and shook him once. He pulled his fist back and Deckard could hear it hitch, meaning that some special spring had latched in preparation for release, and if the replicant's fist now threw a punch, the impact would take his head clean off his shoulders.

 

"Sleep tight," said the replicant wickedly.

 

But the punch never came. Instead the replicant's eyes widened, his breath shortened and his strength seemed to instantly leave his body. He dropped Deckard like a sack of potatoes, and Deckard wasted no time in scrambling clear. The replicant fell to his knees with a bone-crunching impact, his eyes vacuous, and fell forward with a whump.

 

Deckard glanced around for his weapon, only to be met face to face with another, much younger Blade Runner, holding a smoking weapon, clearly more advanced than his own.

 

"I'm TwinFin," said the Blade Runner meekly, pointing to the twitching mass that was the replicant  "I see you've just run across a more advanced model than you're accustomed to."

 

"Stronger than before," Deckard rasped, wiping the sweat from his face with both hands, "It's been awhile."

 

"Yes," he said, "This one's name is A-Data. He is the most advanced of his kind. A front-loader and high-volume storage capability. Also fast response. Almost as fast as yours, even with age."

 

"Thanks," Deckard responded flatly, unamused, "A-Data, eh?" he smirked, tapping the replicant's leg with his foot, "Well, now he's just an ex A-Data."

 

"True," smiled TwinFin, "But you'll need more power if you want to stay ahead of them," he held out his weapon, a POC-killer if ever Deckard had seen one. On the weapon's barrel, in old-Gothic script, he read the weapon's name "The Closer."

 

"Nice," Deckard quipped.

 

TwinFin suddenly produced an auto-ject unit with the "enzee" logo emblazoned on it, snatched Deckard's hand, and before Deckard could object, injected the enzee accelerant into Deckard's bloodstream.

 

"What the?" Deckard now snatched his hand back, but suddenly felt the chemical's surge of power, "What's in that stuff?"

 

"Secret sauce," TwinFin smiled, "You'll be five-X or more faster response than they are. Your next replicant will go down for the count before the count even begins."

 

"Tight."

 

"You have no idea," he smiled, "And by the way, I'll be right behind you."

 

"I hear some of them are looking for their makers," Deckard posited.

 

"Wouldn't you?" TwinFin said, "I'd sure wonder why I was made that way. Changed from one purpose to another in the middle of my cycle."

 

"I wonder if anyone has noticed, that the replicants are always trying to be like us?"

 

"It's because we're the only standard they know, by which they are measured."


"I also wonder," mused Deckard, "If these replicants dream of electric customers."


"Blade?" Hannibal King touched the sleeping warrior gently on the shoulder, "Wake up, dude."

 

Blade raised one eyebrow, then slowly opened his left eye. Unafraid of the day or night, the warrior moved his hand ever so slightly to verify the presence of his sword. King could see the tautness of Blade's shoulder sinews as he slowly shifted his weight on the pallet.

 

"This has better be good," Blade rasped, "I was in the middle of a dream. Kickin' bloodsucker tail," he wiped his hand over his face as though it would wipe away the sleep from his eyes, or the fatigue in his body, but it did neither.

 

"We have some news," King said with a low voice, "The upgrades have arrived."

 

Blade's other eye slowly opened, "Oh?"

 

"Yeah," King laughed, "You're gonna like it."

 

"I'll be there in five," Blade said, half of him wanting to roll over and sleep, and half of him curious about the upgrades. Blade always had a half-and-half approach to life. The bloodsuckers hated him for it.

 

A number of minutes later, the warrior strolled slowly into the main atrium of his personal lair, only to find it strewn with boxes, styrofoam and bubble wrap, "What's all this mess?" he rasped.

 

King appeared from behind one of the largest boxes, a vertical package over eight feet tall, holding a swatch of bubble wrap, "Don't you just love this stuff?" he quipped, violently popping several dozen bubbles with vigorous manipulation.

 

"Stop that!" Blade commanded, ever-despising King's cheeky nature, "Tell me what all this is."

 

"All this," King pointed to a far wall where the apparatus had been installed, "is just for you. At your service."

 

"Blade servers, eh?" Blade took two short steps toward the machines, "What does it do?"

 

"Only slices, dices and makes Julie-Anne cry!" King cackled.

 

Blade was not amused.

 

"Okay, seriously," King began, "Recall some of our - er clients - had some run-ins with the bloodsuckers? Their problems were really that they were working with too little information. Or that it was inaccurate, or not arriving in time. The BI bloodsuckers swoop in to save the day."

 

"I hate bloodsuckers," Blade seethed.

 

"Oookay, so they fell prey to the wiles of the bloodsuckers, promising a better mousetrap and all that."

 

"They always promise."

 

"Moving right along, they promise but don't deliver. Here's where we come in, and help them get on the right track."

 

"How do these machines do that?"

 

"The Blade servers include a special sauce - "

 

"Special sauce. Is it red?"

 

"Uhh, no. But it's all painted in your favorite color. The better part is that you can use this machinery during the day to find opportunities, and still let it work at night, you know, when you're - uh - out."

 

"Hunting bloodsuckers."

 

"Uhh, yeah, so let's focus here. The new server has a special acclerator that basically lights up the night."

 

"Is it ultra-violet light?"

 

"No, but it's ultra-clear light. The kind of light we need to shine on business priorities, SLAs and how to leverage the machine at the enterprise level. You know, best practices."

 

"I don't need any practice. When the sun goes down - "

 

"Okay, look," King interrupted, "The accelerator sits on the blade and does all the analytic streaming work. The server then allows for cache RAM to sit between the disk drives and the processor, so we can keep stuff in memory longer."

 

"I have a long memory for bloodsuckers."

 

"And some clients," King rolled his eyes, "May need long memory for lookup tables, oft-used dimensions and the like."

 

"Are you starting all that other-dimension talk again? I thought I'd made a deal with Stan that we would never introduce - "

 

"No, not alternative dimensions in spacetime," King smirked, "But multidimensional analysis."

 

"I don't follow."

 

"Data analysis."

 

"To what purpose? What are we looking for?"

 

King thought about the question for a moment, realizing that the answer could capture Blade's attention or lose him forever. He finally said "Bloodsuckers."

 

Blade's eyes flashed, "If this will help us find the bloodsuckers, why do we only have one? Why not more?"

 

"Now, now, we should start small and grow tall - "

 

"Platitudes," Blade huffed, "Time is short. Will it find the bloodsuckers or not?"

 

King knew that when he said bloodsuckers, he'd meant the broken processes and data that drain the lifeblood from a company, "Yes, it can help us find them."

 

"Good," Blade finally said, slowly strolling toward the machines. He stared at them for a long moment and finally said. "You work for me, now."

 

"Uhh, Blade," King said, "They can't hear you, they're machines."

 

Blade didn't say anything.

 

"Oh, and I have this," King produced a small metal plate and held it out to Blade.

 

The warrior turned and stared at the object, curious as to its nature. "And this?"

 

"Is a Final Interrogation Node," King said, "For use when you are about to dispatch a bloodsucker."

 

"How does it work?"

 

"You wrap the wrist-strap here," he applied the strap to his own wrist, holding the plate in his hand, then flicked his wrist. The plate flew to nearest stone column, remaining connected to King's wrist with a tether made of high-tensile filament. The plate sank into the stone with a dull rrrriiiiinggg. . King then flicked his wrist again and the plate dismounted, the tension in the tether returning it immediately to his open palm.

 

"That was fun, but what does it do, really?"

 

"When you're done asking questions that anyone can get answers for, the FIN takes it to the next level. And if you have one in each hand - "

 

"Twin Fins, very funny."

 

"You'll still get the answers you're looking for."

 

"I'll always get the answer I want eventually."

 

"Uhh, well, isn't that what the bloodsuckers say? Anyone can give the right answer slow. But these," he held up the FINs," Get the right answers faster than anything."

 

"Even faster than me?"

 

"Faster than Blade alone," King smiled, "Yep, even faster than a blade and all its servers. You still need the FIN's and special sauce. Bloodsuckers don't have those."

 

"Competitive advantage," Blade said in a low whisper, "I like it."
