
Gather 'round the Grill

When introducing the Netezza platform to a new environment, or even trying to leverage existing technologies to support it, very often the infrastructure admins will have a lot of questions, especially concerning backups and disaster recovery. Not the least of these are "how much," "how often" and the like. More often than not, every one of our responses will be met with a common pattern, a sentence starting with the same two words:
What the....?
Case in point, when we had a casual conversation with some overseers of backup technology as a precursor to "the big meeting" we almost - quite accidentally - shut down the conversation entirely. Just the mention of "billions" of rows, or speaking of the database in "Carl Sagan" scaled terms, caused them to want to scramble for budget and market surveys of technologies that were more scalable than their paltry nightly tape-backup routines. In this particular conversation, we were talking about nightly backups that were larger than the monthly backups of all of their other systems combined. Clearly we were about to pop the seams of their systems and they wanted a little runway to head off the problem in the right way.
But what is the "right way" to perform a backup where one of the tables is over 20 terabytes in size, the entire database is over 40 terabytes in size and the backup systems require two or even three weeks to extract and store this information for just one backup cycle?  Quipped one admin "It takes so long to back up the system that we need to start the cycle again before the first cycle is even partially done." and another "Forget the backup. What about the restore? Will it take us three weeks to restore the data too? This seems unreasonable."
Yes it does seem unreasonable precisely because it is - quite unreasonable. As many of you may have already discovered, the Netezza platform is a change-agent, and will either transform or crush the environment where it is installed, so voracious are its processing needs and so mighty is its power to store mind-numbing quantities of information.
The aforementioned admins simply plugged their backup system into the Netezza machine, closed their eyes and flipped the switch, then helplessly watched the malaise unfold. It doesn't have to be so painful. These are very-large-scale systems that we are attempting to interface with smaller-scale systems. We might think that the backup system is the largest scaled system in our entire enclosure, but put a Netezza machine next to it and watch it scream like a little girl.
So here's the deal: No environment of this size should be handled in a manner that is logistically unworkable for the infrastructure hosting it. We can say all day that these lower-scaled technologies should work better or that Netezza should pony-up some stuff to bridge the difference, but we all know that it's not that easy. Netezza has simplified a lot of things, but simplification of things outside the Netezza machine - aren't we asking a bit much of one vendor?
To avoid pain and injury, think about the things that we need to accomplish that are daunting us, and solve the problem. The problem is not in the technology but in the sheer size of the information. We would have the same problem on our home computers if we had a terabyte of data to back up onto a common 50-gig tape drive. We would need twenty tapes to store the data. The backup/restore technology works perfectly fine and reasonably well for a variety of large-scale purposes. We simply need to be creative about adapting it to the Netezza machine. Don't plug it in and hope for the best. Don't do monolithic backups. The data did not arrive in the machine in a monolithic manner so why are we trying to preserve it that way? Leave large-scale storage and retrieval to the Netezza machine and don't crush the supporting technologies with a mission they were never designed for.
Several equally viable schools of thought are in play here. What we are looking for is the most reliable one. Which one will instill the highest confidence with the least complexity? The more complex a backup/restore solution becomes, the less operational confidence we have in it. If it cannot back up and restore in a reasonable time frame, we exist in a rather anxious frontier, wondering when the time will come that the restoration may be required. We put our faith in the notion that it either won't, or that when it does, all of the other collateral operational issues will eclipse the importance of the restoration. In other words, future circumstance will get us off the hook. There is a better way: a deterministic and testable means to truly back up and restore the system with high reliability and confidence.
On deck is the simplest form of the solution - another Netezza machine. Many of you already have a Disaster-Recovery machine in play. Trust me when I tell you that this should be fleshed out as a fully functional capability (discussed next) and then the need for a commodity backup/restore technology evaporates. Using another Netezza machine, especially when leveraging the Netezza-compressed information form, allows us to replicate terabytes of information in a matter of minutes. I don't have to point out that none of our secondary technologies can compete with this.
A second strategy requires a bit more thought, but it actually does leverage our current backup/restore technology in a manner that won't choke it. It won't change the fact that the restoration, while reliable, may be slow simply because moving many terabytes in and out of one of these secondary environments is inherently slow already.
A third strategy is a hybrid of the second, in that enormous SAN/NAS resources are deployed as the active storage mechanism for the data that is not on (or no longer on) the Netezza machine.  This can be a very expensive proposition on its own. We know of sites that keep the data on SAN in load-ready form, and then load data on-demand to the Netezza machine, query the just-loaded data, return the result to the user and then drop the table. You may not have on-demand needs of this scale, but this shows that Netezza is ready to scale into it.
A fourth strategy is a hybrid of the first. We still use a Netezza machine to back up our other Netezza machine, but we use the more cost-effective Netezza High-Capacity server, which is less expensive than the common TwinFin (fewer CPUs, more disk drives) but otherwise behaves in every way identically to its more high-powered brethren. And honestly, if we were to make an apples-to-apples comparison between the cost of a big SAN plant to store these archives and the High Capacity Server, the server wins hands-down. It's cheaper, simpler to operate, doesn't require any special adaptation and we can replicate data in terms guided by catalog metadata rather than adapting one technology to another.
So let's take these from the least viable to the most viable and compare them in context and contrast, and let the computer chips fall where they may.
Commodity backup/restore technology
If we want to leverage this, we need to understand that it cannot be used to perform monolithic operations. These are unmanageable for a lot of reasons:
  • The time it takes to perform a backup and restore is many times longer than a common operational cycle - that is, the data is already on-the-move and being processed over the many days required to back up the data. By the time we are done, it does not represent a true backup
  • If the backup fails, we've lost all that time and have to restart from scratch

  • The time it takes to restore the data is too long to be viable. We cannot wait for days or weeks to get the data back under control and also lose days or weeks of processing time
  • The prospect of data loss, over many days if not weeks, is a real threat
  • The backup/restore technology is oblivious to the data content, but the content itself provides the necessary boundaries for proper backup operation
  • The vast majority of the content is static already, meaning that the vast majority of the backup is redundant and has no value in secondary form
To mitigate the above, we need to adapt the large-scale database to the backup technology by decoupling and downsizing the operation into manageable chunks. This is a direct application of the themes surrounding protracted data movement in any environment. The larger the data set, the more the need for checkpointed operations so that the overall event is an aggregation of smaller, manageable events. If any single event fails, it is independently restartable without having to start over. Case in point, if I have 100 steps to complete a task and they are all dependent upon one another, and the series should die on step 71, I still have 29 steps remaining that may have completed without incident, but I cannot run them without first completing step 71. This is what a monolithic backup buys us - an all-or-nothing dependency that is not manageable and I would argue is entirely artificial.
To continue this analogy, let's say that each of these 100 steps takes only one minute. I arrive at 6am to find that step 71 died, and now I have to restart from step 1, which will cost me another 100 minutes. Even if I could restart at step 71, I am still 30 minutes away from completion.
Contrast the above to a checkpointed, independent model. If we have 100 independent steps and the step 71 should die, the remaining 29 steps will still continue. We arrive at 6am to find that only one of the 100 steps died and we are only 1 minute away from full completion. The difference in the two models is very dramatic:
Monolithic means we are operationally reactive when failure occurs. The clock is ticking and we have to get things back on track and keep moving. Checkpointed means we are administratively responsive when failure occurs. We don't have to scramble to keep things going. In fact, in the above example, if step 71 should fail and the operator is notified, doesn't the operator have at least half an hour to initiate and close step 71 independently of the remaining 29 steps? Operators need breathing room, not an anxious existence.
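The arithmetic of the 100-step analogy can be sketched in a few lines. This is only an illustration of the numbers above, assuming one-minute steps that run one at a time; the function names are made up for the example.

```python
def monolithic_remaining(total_steps, failed_step, minutes_per_step=1,
                         restartable=False):
    """Minutes of work left after a dependent chain dies at failed_step."""
    if restartable:
        # Resume at the failed step: it and every step after it must still run.
        remaining_steps = total_steps - failed_step + 1
    else:
        # All-or-nothing: start over from step 1.
        remaining_steps = total_steps
    return remaining_steps * minutes_per_step


def checkpointed_remaining(failed_steps, minutes_per_step=1):
    """Minutes left when steps are independent: only the failures rerun."""
    return len(failed_steps) * minutes_per_step


if __name__ == "__main__":
    print(monolithic_remaining(100, 71))                    # 100
    print(monolithic_remaining(100, 71, restartable=True))  # 30
    print(checkpointed_remaining([71]))                     # 1
```

The gap between 100 (or 30) minutes and 1 minute is the breathing room the operator gets from the checkpointed model.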
Monolithic methods are supported de facto by the backup/restore technology. If we want to perform a checkpointed operation, we have to adapt the backup/restore process to the physical or logical content of the information. We don't want to mate the backup technology directly to the database, so we need to adapt it.
Logical Content Boundaries
This means we have to define logical content boundaries in the data. What's that? You don't have any logical content boundaries, not even for operational purposes? Well, per my constant harping on enhancing our solution data structures with operational content, such as job/audit ID and other quantities, perhaps we need to take a step back and underscore the value of these things because they exist for a variety of reasons. One of them is now upon us - the operational support of content-bounded backups. It is required for scalability and adaptability and is not particularly hard to apply or maintain.
A more important aspect of content boundary is the ability to identify old versus new data. If the data is carved out in manageable chunks, some will always be least-recent and some more-recent. Invariably the least-recent chunks will be identical in content no matter how many times we extract them for backup. This means we can extract/backup these only once and then focus our attention on the most active data. In a monolithic model, there is no distinction between least or most active, least or most recent. In large-scale databases, the least-recent data is the majority, so the monolithic backup is painfully redundant when it need not be.  
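As a rough sketch of "extract the old chunks once, then focus on the active data," the selection logic might look like the following. It assumes the content boundary is a sortable value (a month key here, but a job/audit ID would work the same way) and that we keep a ledger of boundaries already captured. The names `chunks_to_extract` and `active_window` are hypothetical, not part of any Netezza utility.

```python
def chunks_to_extract(all_boundaries, already_backed_up, active_window=2):
    """Return the boundary values that need an extract in this cycle.

    all_boundaries    -- sortable boundary values present in the table
    already_backed_up -- boundary values captured in an earlier cycle
    active_window     -- how many of the most recent boundaries are assumed
                         to still be changing and are always re-extracted
    """
    ordered = sorted(set(all_boundaries))
    # Guard the zero case: ordered[-0:] would select the whole list.
    active = set(ordered[-active_window:]) if active_window > 0 else set()
    return [b for b in ordered if b in active or b not in already_backed_up]


if __name__ == "__main__":
    months = ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05", "2010-06"]
    ledger = {"2010-01", "2010-02", "2010-03", "2010-04", "2010-05"}
    print(chunks_to_extract(months, ledger))  # ['2010-05', '2010-06']
```

Four of the six chunks are skipped because they were captured before and are assumed static. In a table with ten years of history, the skipped portion would be the overwhelming majority.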
Do we absolutely need content-bounded backups for all of our tables to function correctly? Of course not. But by applying this as a universal theme it allows us to treat all tables as part of a backup framework where all of them behave the same way. So part of this is in the capture of the data but a larger part is the operational consistency of the solution.
Many reference tables, such as lookups, will never grow larger and we know this. In fact, their data may remain static for many years. The tables that are tied to the application and grow or change every day we will call solution tables. They are typically fed by an upstream source and are modified on a regular basis, and any of them can grow out of control. The reference data, by contrast, represents a very low operational risk. Why then would we not simply fold the reference data into the larger body and treat all the tables the same? There is no operational penalty for it, but enormous benefit from being able to treat all tables the same way inside a common solution.
At this point, the backup/restore framework will address all of the tables the same way, but now we have the ability to leverage rules and conditions within the framework so that special handling is available if necessary. This is a common theme in large-scale processing: Handle everything as though it will grow, but accommodate exceptions with configuration rules. I'll forego this aspect for now and let's take a look at what we need in basic terms:
  • An intermediate landing zone: Some off-machine storage location that can hold the data while in transit - e.g. a NAS or SAN volume. This will not be the same size as the database but some fraction of it. It is a workspace for intermediate files, not permanent storage
  • Content boundaries: A means to define and manage content boundaries - these are tags or quantities in the data itself. They need to be consistent for all tables so that the backups and restores operate the same way. Of course, if we don't have these, we need to apply them.
  • A restoration database: a separate database that will be used for the restoration process, because we won't be restoring the data directly into the table from whence it came. Why not? There is an increasing likelihood that the data structures have changed since the last backup. If the structure has changed, we cannot reload the data into it. We need a way to shepherd the data back into the machine.
  • Processes: that capture / backup / restore the data using content boundaries and common utilities. These are typically control scripts that are easily configured, deployed and maintained
  • Archival Database (next): Physical content replicas of the masters that hold all of the original information in a form useful for backup and restore. Whether they keep the data active/online is immaterial. Their role is to interface with the backup/restore mechanisms so that any data once made offline can easily be made online through this database.
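To illustrate why the restoration database matters when structures drift, here is a minimal sketch that builds the copy statement from the columns recorded at backup time and the columns the target has today. It assumes the statement runs while connected to the target database and that the restoration database is reachable with `database..table` notation. The database name `RESTORE_DB` and the function name are placeholders.

```python
def build_restore_sql(table, backup_columns, current_columns,
                      restore_db="RESTORE_DB"):
    """Build an INSERT..SELECT that copies only the columns both versions share.

    Columns added to the target since the backup are left to their defaults;
    columns dropped from the target since the backup are ignored.
    """
    current = set(current_columns)
    shared = [c for c in backup_columns if c in current]
    if not shared:
        raise ValueError(f"{table}: no columns in common, cannot restore")
    cols = ", ".join(shared)
    # Assumed to run while connected to the database that owns the target table.
    return (f"INSERT INTO {table} ({cols}) "
            f"SELECT {cols} FROM {restore_db}..{table};")


if __name__ == "__main__":
    print(build_restore_sql(
        "SALES_FACT",
        backup_columns=["id", "amount", "job_id"],
        current_columns=["id", "amount", "currency", "job_id"],
    ))
```

A real framework would also have to deal with renamed columns and changed data types, which is where the configuration rules mentioned earlier would come in.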
Archival Databases
Set up formal archival databases. Whether these reside on the same server, or on a DR / High Capacity server (below) is immaterial. The point is that the data in the master tables will be actively rolled into these tables, which will form the backbone for all backup and restoration operations. We therefore replicate the masters (with streaming replication) into the archival stores and then at some interval perform the backup of the archives, not the master.
 
Foreshorten the Master Tables
Now that we have a means to define content boundaries (you did apply those boundaries, right?), we can look at the database holistically for optimizations based on active data.
At one site we have ninety billion records across three different fact tables spanning over ten years of information. However, the end-users and principals claimed (and we verified) that the most active data on any given day is the most recent six months. Anything prior to that, they would query perhaps once or twice a year for investigative purposes, but had not tied any reports to it.
So now we have an opportunity to get agreement from the users to shorten the master tables. This is especially necessary for those fact tables. In the end, these fact tables (above) were shortened to less than four billion rows each and these are kept trimmed on a regular basis. The original long-term data is held in another archival database that is the foundation for the backups and restores.
 
Protocol
  • On a per-table basis, use the content boundaries to manufacture/compile a set of checkpointed instructions for each table to support extraction and restoration

  • Execute the extractions to create flat files

  • Each extraction will pull data to a Netezza-compressed flat file and also supply another file with the metadata instructions necessary to restore the file's contents. On a per-file basis, these instructions will include:
    • Original table definition, used to construct a template of the table in the restoration database

    • File load instructions used to get the content back into the machine

    • Transformation command, used to move data from the restoration database into its original home

  • The files will be extracted (using content boundaries) onto the storage landing zone
  • The smaller extractions can be performed in parallel, are independent and form their own checkpointed process
  • The backup technology will begin its backup process once the second (metadata companion) file arrives for the set. The metadata companion file behaves as a trigger.
  • Once the backup for the file-set is complete, the backup/restore deletes the file and its companion
  • This decouples the backup/restore processing from the database (completely) so that it can focus solely on file-based backup and restore
  • For restoration, both the content file(s) and companion metadata file(s) will be retrieved and placed on the storage landing zone
  • The metadata will be used to reconstruct an empty intermediate version of the original table in the Restoration Database
  • Load the table using nzload
  • Then the metadata will be used (even a preformulated SQL statement) to copy the data into the original target
  • Throw away the intermediate load table, delete the data file and its companion metadata file
  • The restoration process can be run as many ways-parallel as supported by nzload
  • The restoration process can also be surgical, with the more agile ability to restore data in smaller segments
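The companion-file handshake in the protocol above can be sketched with plain files. This is not the actual framework, just a minimal illustration: the `.dat` and `.meta.json` naming, the JSON layout and the `backup_fn` callback (a stand-in for whatever backup product is in use) are all assumptions made for the example, and the extraction itself is not shown.

```python
import json
from pathlib import Path


def write_companion(landing_zone, table, boundary, table_ddl, restore_sql):
    """Write the metadata companion for one extract; its arrival is the trigger."""
    stem = f"{table}.{boundary}"
    meta = {
        "table": table,
        "boundary": boundary,
        "data_file": f"{stem}.dat",
        "table_definition": table_ddl,  # rebuilds the template table
        "load_target": table,           # where the loader should put the rows
        "restore_sql": restore_sql,     # moves rows back to their original home
    }
    path = Path(landing_zone) / f"{stem}.meta.json"
    # Write under a temporary name, then rename, so a watcher never sees a
    # half-written companion.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(meta, indent=2))
    tmp.rename(path)
    return path


def ready_file_sets(landing_zone):
    """Yield (data_file, companion) pairs whose companion has arrived."""
    zone = Path(landing_zone)
    for companion in sorted(zone.glob("*.meta.json")):
        meta = json.loads(companion.read_text())
        data_file = zone / meta["data_file"]
        if data_file.exists():
            yield data_file, companion


def backup_and_clean(landing_zone, backup_fn):
    """Hand each ready pair to backup_fn, then delete it from the landing zone."""
    done = []
    for data_file, companion in list(ready_file_sets(landing_zone)):
        backup_fn(data_file, companion)  # stand-in for the real backup product
        data_file.unlink()
        companion.unlink()
        done.append(data_file.name)
    return done
```

Because the companion is written last, a data file that is still being extracted is never picked up, and each file-set succeeds or fails on its own. Restoration would run the same handshake in reverse: retrieve the pair, rebuild the template table from `table_definition`, load it, then run `restore_sql`.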
The above protocol, while likely easy to pull off and administer, still has a number of moving parts that the Netezza-based equivalent (later) will not have.

SAN-based hybrid

The only difference between the above protocol and one using a SAN-based storage mechanism, is the absence of a formal backup/restore technology. Rather the SAN is the long-term storage location and we perform the incremental extractions onto it. Rather than delete the files, we keep them.
This has significant implications for the cost of the SAN. After all, if we intend to interface this to the Netezza machine, we would not want common NAS storage because it is too slow and the vendors actively disclaim their technology from being viable for data warehousing. The primary reason is that the network and CPUs are set up for load-balancing, like a transactional database, but not bulk onload/offload of the data.
Not only will we need enough SAN to back up the environment, but also to carry fathers and grandfathers if need be (this is a policy decision). With checkpointed extracts, the father/grandfather issue is largely moot. This is because once a checkpointed extract of older data is pulled and stored, it won't be changing and capturing another one just like it has no value.
Netezza-based Hybrid
In this approach we leverage another Netezza machine, like a DR server, as our backup, archival and restore foundation. It can easily hold the information quantity. The difference here is in price, of course, since a fully-functional TwinFin is more expensive than most common SAN installations. However, the High Capacity Server (below) mitigates this pricing problem while delivering a consistent data experience.
One primary benefit of performing backups into the DR server is that it can automatically serve the role of a hot-swap server in case of failure in the primary server.
For this scenario to work however, we would want streaming replication between the active databases and the DR server so that the data is being reconciled while being processed. This allows us to have a fully functional hot-swap if the primary crashes, and we can continue uninterrupted while the primary is serviced. Word to the wise on this kind of scenario however, bringing back the primary means that it is out of sync, since the secondary took over for a span of time. So we would need to be able to reverse the streaming replication to make it whole.
Scenarios like this often embrace the practice of operationally swapping the two machines on defined boundaries, like once a month or once a quarter, where they actually switch roles each time. This allows the operations staff to gain confidence in the two machines as redundant to each other in every way. I have seen cases like this where the primary machine went down, the secondary machine kicked in seamlessly and all was well. I have also seen cases where the principals kept the DR server up to date but when it came time to operationally switch, some important piece (usually in the infrastructure between the devices) was missing, causing the failover itself to fail. It is good to have a plan in place, but better to have tested the plan and to know that it actually works.
Protocol 
  • On a per-table basis, use the content boundaries to manufacture/compile a set of checkpointed instructions for each table to support extraction and restoration

  • Using a variation of the nz_migrate utility, perform the table-level extractions

  • Extract data in Netezza-compressed form to a flat file

  • Load the flat file into a restoration database in the second machine

  • Perform the transform-execution to copy data from the restoration database into the target table

  • The extractions, restorations and transformations can all leverage simple scripts and catalog metadata, not become dependent on deeply hand-crafted code

  • Intermediate (SAN) landing zone requires less space because the data is being transferred in Netezza-compressed format and it is cleaned-up on-the-fly

  • Transfer is quicker because Netezza-compressed data is written to the data-slices directly instead of coming through the host.

A word on Netezza-compressed transfer. I wrote about this in Netezza Transformation but it is important to highlight here. We performed an experiment moving half a terabyte scattered across a hundred or more tables. This data was moved from its original home to a database in another machine. The first method used simple SQL-extract into an nzload component. This process took over an hour. The second method used transient external tables with compression, coupled with an nzload in compressed mode. The entire transfer took less than six minutes. This was because the compressed form of the data was already 14x compressed.
In another experiment using over 20x compression for the data, we were able to transfer ten terabytes in less than an hour. This kind of data transfer speaks well for the streaming replication necessary for DR server operation (above) but underscores the fact that even when transferring between Netezza machines, it's as though we haven't left the machine at all.
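A back-of-the-envelope check on those figures, treating "over an hour" and "less than six minutes" as exactly 60 and 6 minutes and assuming decimal terabytes (the experiments were reported only as rough bounds, so these are approximations and not measurements):

```python
TB = 10**12  # decimal terabyte, an assumption; the write-up does not say


def rates(logical_bytes, seconds, compression_ratio=1.0):
    """Return (effective MB/s of logical data, MB/s of bytes actually moved)."""
    effective = logical_bytes / seconds / 10**6
    moved = effective / compression_ratio
    return effective, moved


experiments = {
    "SQL extract into nzload, 0.5 TB in 60 min": rates(0.5 * TB, 60 * 60),
    "compressed transfer, 0.5 TB in 6 min, 14x": rates(0.5 * TB, 6 * 60, 14),
    "compressed transfer, 10 TB in 60 min, 20x": rates(10 * TB, 60 * 60, 20),
}

if __name__ == "__main__":
    for label, (effective, moved) in experiments.items():
        print(f"{label}: {effective:,.0f} MB/s effective, {moved:,.0f} MB/s moved")
```

Under these assumptions the effective rates come out near 139, 1,389 and 2,778 MB/s, while the bytes physically moved stay in the neighborhood of 100 to 140 MB/s in all three cases. That suggests the speedup comes almost entirely from moving fewer bytes, which fits the explanation that the data was already compressed.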
Netezza-based High-Capacity Server
This option is simply a form of the Netezza-based hybrid (above) but on a dedicated server designed to support backup and recovery.
The better part about this server is that it has more disk drives and fewer CPUs, making it far more cost effective for storage than common SAN devices. Couple this with the minimal overhead required for transferring data between machines, and the ability to surgically control the content with the content-boundaries and catalog metadata, and we get the best of all worlds with this device.
Not only that, but it is also scalable to support storage of all other Netezza devices in the shop as well as any non-Netezza device where we simply want to capture structured information for archival purposes. The High Capacity server is queryable also, meaning that even the ad-hoc folks will find some value in keeping the data online and available.
Lastly, in Spring 2010, as part of the safe-harbor presentation one of the principals at IBM Netezza announced plans for a replication server. I can only imagine that this device will deliver us from any additional hiccups associated with streaming replication that we might now be doing in script or other utility control language.
At Brightlight, our data integration framework (nzDIF) has the nz_migrate techniques built directly into the flow substrate of the processing controller, as well as the enforcement and maintenance of the aforementioned content boundaries. We are actively acquiring and applying best, most scalable and simplified approaches as a solution framework firmly lashed to a purpose-built machine. I am a big proponent of encouraging Enzees to take on these things themselves, or at least let us coach you on how to make it happen. The solutions are simple because the Netezza platform itself is simplified in its operation. Stand on the shoulders of genius - the air is good up here.

Host: Ladies and Gentlemen, I'm your host here with Hairball Plotter-
Hairy: Call me Hairy
Host: (Pause) Oookay, I'm here with Hairy Plotter and he's going to tell us about some of his - er -

Hairy: Innovations

Host: Innovations? I thought they were just the opposite -

Hairy: Well, eccentric innovations anyhow. You know, the kind of stuff that gets the job done, but with a lot
more moving parts!

Host: Please explain

Hairy: Well, you see, it's not incumbent upon me as a consultant to work myself out of a job. I need a way
to stay employed and ensconced, and complexity is just the ticket

Host: How so?

Hairy: The more complex the environment, the better job security I have

Host: We have a caller from Ontario on the line, a question for Mr. Plotter?

Caller: Yes, I'm an IT manager and I've found that when people try to embed themselves as you describe, they create risk
for me, and I don't like that.

Hairy: Thanks for that input. I'll file that.

Caller: Wait a second, are you blowing me off?

Hairy: What difference does it make? If you like it or you don't, you're still stuck with me.

Caller: Unless I find something better.

Hairy: I'll just milk your budget so there's nothing to spend on something better. Works for me.

Caller: But that doesn't work for me.

Hairy: Noted.

Caller: But -
(click)

Host: We have another caller from a financial services firm in New York. Caller you're on the air.

Caller: Thanks, I agree with Hairy on this. Contractors are punted around and treated like fodder. We need a way to
keep our jobs and pay our bills. Working ourselves out of a job doesn't fit that mold. We need more ways to
make work for ourselves, even if it's artificial.

Host: Artificial?

Hairy: Well said!

Host: Wait a second, artificial?

Hairy: Well, of course it's artificial. All of us are smart enough to do it better, faster, smarter. Heck, I could deploy a
high-reuse/low maintenance implementation that basically lets you run things lights-out.

Host: Good for you!

Hairy: Oh no, so not good for me. Once it's deployed I get a final paycheck and have to ride into the sunset. Seems romantic
but the sunset doesn't pay the bills!

Host: So you find a way to make yourself useful.

Hairy: No, I find a way to make myself necessary. Usefulness is for suckers.

Host: Now wait a second -

Hairy: Now listen. Everyone does it. At one of my favorite clients, all of the contractors are on a perpetual time-and-materials contract. When we
had some folks show up on a fixed-price gig, they practically pulled their eyeballs out when they couldn't get any urgency out of anyone.

Host: Why not?

Hairy: Because all those time-and-materials folks had a vested interest in protracting the work until the next day or the next week. Practically every time we said we needed something right away because our clock was ticking, they would offer it up for delivery "next Friday" or some such. But you know, if they ever looked like they were wrapping up, it could spell curtains for their cushy little tushy.

Host: (laughs) I see what you mean.

Hairy: It's just how the game is played.

Host: So IT staffing is just a game?

Hairy: Musical chairs. In so many ways.

Host: So what are some of the ways that you - well - set this up?

Hairy: Invest in a lot of wool.

Host: Sorry?

Hairy: We'll be pulling it over the manager's eyes.

Host: Oh, I see, that's kind of funny.

Hairy: Thought you would like that. But seriously, we take the simplest, most direct way to implement something, so that we
know exactly what it would look like, then do the opposite.

Host: Seriously?

Hairy: Well of course. If we do the simplest approach, there's no room for a hero to step in and save the day. Lots of little
virtual terrorists running around in a complex system. Think about how the 9/11 terrorists were able to evade
electronic surveillance - they stayed off the grid and used cash, and were able to fool the other systems into thinking
they were out of the country.

Host: So you use the same technique?

Hairy: Well, the same forms of deception of course. If we built over-complex systems accidentally people would say that we're
clever, but if we do it deliberately, we're clearly diabolical. I would never deliberately paint myself to look bad, so we have to be a little
deceptive, right?

Host: Hmm, so this is a little disturbing. I want to know - oh we have a caller from Orlando.

Caller: Hey man, I want to come work with you.

Hairy: Shoot me a resume.

Host: Stay on the line and we'll take your contact information. There's another caller from Dallas Texas. Dallas, you're on the air.

Caller: Hey, this whole deliberate-snowstorm thing is really a different way to look at things.

Hairy: Snowstorm, I like that. When one of my first mentors realized this technique, he said in a gruff voice - "You're a blizzard, Hairy."
At least I'm not slithering - around you know like those snakepits where the idea-mongers hang out. Ideas are good for a spell, then they wear off.
Almost like-   Like snake-charming, or snake-whispering.

Host: But I mean, doing it deliberately seems, well -

Hairy: I know what you're about to say. But think about it. Whether we do it deliberately or accidentally, the outcome is
the same. They are interested in the system, but we're interested in self-preservation. If I have to choose, I say, don't
do self-preservation by accident. Do it deliberately. This means set yourself up to be necessary.

Host: Explain necessary?

Hairy: Of course. The more complex the system the more they need you. If it's simple, what do they need you for?

Host: But if they figure out what you're doing, you're cooked, right?

Hairy: But if I work myself out of a job, the outcome is the same. No paycheck.

Host: So you're saying that most data warehousing is just accidental brilliance?

Hairy: Accidental or deliberate, the outcome is the same.

Host: Oh, please, don't go fatalist on me.

Hairy: Come on, man. Most folks can't pull this off whether accidental or deliberate. The fact is, we still stand up a functional,
operating environment. Consumers are being fed, users are getting what they want. Forget the fact that the
environment is stood up on pallets and serviced by swarming people on rollerskates.

Host: I- suppose.

Hairy: Again, if someone stood up the working environment and had all this stuff in it, you'd call it clever. Only when
you realize that the complexity is artificial would you take exception to it. If you never realize this, I will always
have a paycheck and you will always have an operational system. You'll never fire me, because I'm your most visible hero.

Host: Ever heard of Munchausen-by-proxy syndrome?

Hairy: Not familiar with it, no.

Host: Where a care-giver deliberately harms their child in order to be seen as the savior that delivers them from harm.

Hairy: I see a pattern, sure.

Host: You said it yourself, diabolical.

Hairy: Only if they realize it. Perception is the key.

Host: But you're admitting to it here. On the air.

Hairy: Nobody will remember it. For one good reason: It's so outrageous that it simply cannot be true.

Host: Unbelievable.

Hairy: ExACTLY!

Host: But what about scalability? When volumes increase, invariably the complex systems are crushed.

Hairy: Oh please, only about five percent of all implementations have that issue, so if I stay away from those, I'm gold.

Host: But what if one of your implementations grows into this? Seems like you'd have some explaining to do.

Hairy: What are the odds? I can easily place myself in the lower-scale zone and make a good living at it.

Host: But you would agree that the larger the scale, the more the need for simplicity?

Hairy: I don't do scale, so why would I care? I blow smoke into a manager's face, nose or other orifices and take a paycheck. What
does scale mean to me?

Host: You have a lot of disparaging things to say about these decision-makers. Aren't they your customers?

Hairy: Look, you have people like me, who work the magic, and people who sit in the office, smug in their confidence that
they know exactly what to do and how to do it. My objective is to make sure they stay on the outside looking in,
snuggled next to their branded-coffee cup and blissfully unaware of my agenda. We call them the Smuggles.

Host: Smuggles?

Hairy: Yeah, Smug people who snuggle with coffee. Their users are the Smuggees. Smug people who know data but don't know beans
about how to make it operational. Always offering opinions. Who cares what they think?

Host: Clearly not you.

Hairy: Well, I care what they think to an extent. As long as their reports are running to spec, it doesn't matter how we pulled
it off, only that they get the data they want.

Host: The "how" and "what" question. I've heard that before.

Hairy: And what I've heard before is the endless droning of users dictating to us how we should deploy the systems.

Host: And what do you do with that?

Hairy: If their suggestion will make the environment more complex and difficult to manage, we're all over it. If it makes the
environment easier to operate, we push back. We call it the "you asked for it" policy. It's a theme, you see.

Host: Yes, I can see that.

Hairy: Oh, I love 'em. Nothing is better than deliberately choosing a platform that can't cut the mustard. I mean, they're not really
hard to pick, you know. Imagine getting almost to the end of the project and hitting that hard
wall - flying right into it like a blind witch on a runaway broom. It's hysterical to watch all these
folks running around like headless chickens. You can't pay for that kind of entertainment. But of course,
they do pay for it, and handsomely.

Host: Until they realize that you're in the center of it?

Hairy: Eye of the storm you mean, where the seas are calm. I never lose my cool, so they always think things
are under control until someone pulls the single thread that unravels it all. I have plausible deniability. Keeps
me working and the paychecks coming. Sweet.

Host: Don't you think this is a bit - you know - underhanded?

Hairy: The mass layoffs of the turn of the century were underhanded. They created a 1099-Culture that basically means all of us are
mercenaries. Soldiers of fortune. We go to the highest bidder no matter what. There's no conscience in that existence, especially when we could
be working for one company today and their competitor tomorrow. Those companies treat the 1099s like batteries - plug 'em in, burn 'em out and toss 'em.

Host: Seems a bit cynical.

Hairy: One of the better parts about this kind of consulting is that I can propose the solution without
actually producing anything. Then I can flit from flower to flower, pollinating these ideas. They pay
me for the ideas, not the actual work, you see.

Host: So you propose, but you don't actually execute?

Hairy: Execution is someone else's problem. Why should I stick around to see if the proposals actually work? If
they don't, there's another feather in my hairball cap, or rather another notch in my mayhem gun. But
if the solution works, good grief, all that work for nothing? May as well stay home and play video
games.

Host: What would you like to leave our audience with?

Hairy: Oh, I suppose, don't worry, be happy and all that. I have a new book coming out called Managing Expectations.

Host: So tell us about that.

Hairy: It's all about how to give the user a false sense of security while we do whatever we want. I'm all about what's expedient
for me, but most of the time when the client sees what I'm doing, they love it because it makes things expedient for them as well. Soon
everyone is following the beat of the same drummer.

Host: Which would be?

Hairy: Get it done no matter what the cost. That way, I can charge whatever I want. Cost is no limit.

Host: I see. I think.

Hairy: When you think about it, most IT folk want to do things the expedient way anyhow. Over-thinking the problem seems so stodgy to them.
In fact, most of the time when we're going through the analysis phase, I can see it written in their expression and practically popping from
their eyeballs.

Host: What's that?

Hairy: The desire to start coding! Heck, practically every conversation is punctuated with how they intend to do it - long before the analysis
is complete. That's because IT folk just don't have the patience for analysis. They want to get coding. I just set the expectation that we'll
code first and analyze later.

Host: So when does the analysis come into play?

Hairy: What analysis? If we start coding, the analysis never happens. There's never enough time. After all, the herald cry of
expedience is - "If you don't have time to do it right, when will you have time to do it over?" What do you get when you start coding
without any analysis? Hairball city. That's my stomping ground.

Host: Thank you for your time, Mr. Plotter. This has been very - uh - enlightening.

Hairy: My pleasure.

(pause)

Host: That was Hairy Plotter with his half-baked stints. Be with us next time when Jason Statham joins us for
"Data Transporter 2", where he kicks a bunch of back-ends with ETL tools. Until then, happy computing and keep that data flowing!
