[ www.netezza.com ]

Thinking Inside the Box


News broke on Tuesday that EMC plans to acquire Greenplum to focus on data warehousing and analytics on “big data”. The idea is that by doing so, EMC is officially throwing its hat into the competitive ring for the ‘Data Warehouse Appliance’ (DWA) market – something of a defensive mechanism now that virtually all of the major data warehouse vendors sell their own versions of a DWA, greatly reducing sales pull-through of EMC storage for data warehouse deployments.

Some referred to the merger as “a good fit for a storage vendor with appliance-y ideas”, while others hailed it as follows: “the market has shifted as of late moving toward integrated appliances and this move gives EMC a very important arrow in its quiver”, and labeled Greenplum as a purveyor of “very high performance database systems”.

One can also reasonably assume that this acquisition is intended not only to shore up a product offering weakness, but that it is also destined for affiliation with EMC’s other major initiative announced earlier this year – the Acadia Virtual Computing Environment (VCE) Joint Venture with Cisco Systems, headed up by Michael Capellas. The Acadia JV includes EMC’s storage and its VMware virtualization software as well as Cisco Systems’ compute nodes and networking. VCE is built on the concept of modular building blocks, called vblocks, that marry computing horsepower to storage capacity. All that’s missing from that story is a data warehouse DBMS to make it a full-on data warehouse appliance, right?

There are two big problems with these assumptions…


Performance: For all the discussion about “scale” and  “big data” in the EMC announcement, there is no mention of how either party can address the real issues that mainstream enterprises face every single day with their data warehouse systems – how to get maximum performance out of a complex, highly concurrent operational environment where hundreds if not thousands of users are banging away on the system, night and day.

  • The fact is that the actual Greenplum target market has clearly NOT been one that focused on high-performance analytics over the past several years. Instead, the few wins publicly announced by the company have been for very high capacity, limited compute platforms – applications more commonly referred to as “queryable archive”.
  • Curt Monash today again mentioned Greenplum’s lack of support for the “high-concurrency” requirements of a mainstream data warehouse.
  • This looks much more like adding a very basic set of storage-centric data warehousing capabilities in a move to find a broader channel for EMC’s traditional storage products rather than any strategic move into the world of high performance data analytics. Further to this point, neither company has done much of anything to address a very strong trend in the mainstream data warehouse market – the marriage of advanced, predictive analytics into the busy data warehouse systems.
  • David Vellante confirmed that to be successful the EMC/Greenplum marriage will need to yield, “optimized sytems[sic]; smokin’ fast performance; reference architectures; scale;” and “federation capabilities; not just big honking systems.” We couldn’t agree more but one can’t help but notice that neither Greenplum nor EMC have brought any of those characteristics to market for data warehousing to date.


Appliances: Since the acquisition is fairly transparent in its defense against moves by the likes of Oracle, Teradata and IBM (as well as Netezza seven years ago) to the appliance model, it’s hard to see how either EMC or Greenplum are effectively equipped now to do battle against those established players.

  • EMC have never really “sold” data warehousing to anyone previously and Greenplum have nearly prided themselves in going after “Greenfield” high capacity applications rather than head-to-head competition vs. established players. And one need look no further than the limited market penetration of H-P’s NeoView to understand that it takes more than simply deep pockets to succeed in the data warehousing market.
  • Greenplum is not a purveyor of “integrated appliances” and at best, they can hope to infuse in EMC the ability to make their joint product offering a little more of an “appliance-y idea” (hat tip to Dr. Monash for coining the term) to the market. Instead, Greenplum have fashioned themselves over the past several years as a software only solution.
  • Assume that the Acadia VCE and “vblock” application is a big piece of this strategy. Neither Cisco nor EMC would claim that their servers, networking or storage arrays offer the lowest price-per-bit or price-per-performance alternative in the market. So one needs to think about what that means in terms of the price-performance competitiveness of this new “appliance-y” joint product.


In short, Greenplum joins the pantheon of “interesting” acquisitions for EMC as it will certainly stir some news cycles and drive some analysts and bloggers to create “fresh, new” content; but it’s not really something that I think will register on the Richter scale of customer market share.


It may have been the result of a misunderstanding or a comment heard out of context. But whatever the background for the commentary, let me simply state that Netezza is completely committed to the success of the Netezza Migrator and all the other Netezza products and functionality launched at Enzee Universe 2010 this past week. Migrator eliminates a potential barrier to TwinFin™ adoption (i.e., migration costs) and logically should lead to easier acceptance and broader system sales for Netezza. Furthermore, our partnership with EnterpriseDB at both the corporate and technical levels has been and remains extremely solid and strong.

As I stated in the announcement of the product, “The Netezza Migrator product allows organizations to make data warehouse migration decisions independent of proprietary software lock-in. Organizations using data integration and BI applications with embedded Oracle-proprietary database constructs, interfaces and utilities can now more easily manage their migration from Oracle to a TwinFin appliance. The Netezza Migrator will allow our customers to achieve the performance, scale and cost advantages of their TwinFin systems while maintaining their prior investment in proprietary software.” The Netezza Migrator is specifically designed to reduce the time, complexity and costs required of our customers to move their IT applications to the Netezza TwinFin platform.

With Migrator, Netezza’s customers will be able to extract themselves from the dreaded “Oracle lock-in” of functions and procedures written using Oracle-proprietary techniques and they can decide which of their applications to migrate directly to Netezza and just when, at their own pace. Its capabilities go well beyond the extremely limited capabilities provided by Oracle’s own ‘Database Gateway for ODBC’. Migrator provides an Oracle compatible wrapper around Netezza that is optimized in ways that Oracle could never hope, nor deign, to provide with its "Heterogeneous Services" functionality: including support for Netezza syntax pushdown, high speed API, and Netezza user defined functions.

Migrator makes it even easier for Netezza’s customers to move all of their data warehouse from Oracle to Netezza. In short, this is something we feel is extremely valuable for Netezza and particularly “liberating” for our customers.


A loyal customer alerted us to an Oracle blog by Jean-Pierre Dijcks earlier today that showed the Oracle FUD machine is fully revved-up and ready to go. I'd like to offer a rebuttal; however, in the interest of not intruding on Jean-Pierre's entry with an overly-long comment, I've just put a short response on his blog post with a pointer to this one.


Misconceptions and Misunderstandings, or Errors and Plain-old FUD?

I’m writing to correct *just a few* of the misconceptions about what is really important in high-performance, scalable data warehouse systems, along with the errors and plain-old “competitive FUD” points in Jean-Pierre's posting earlier today. We certainly have posted some information recently about the TwinFin product and Curt Monash’s postings late Thursday provided more info. If his readers are interested in learning more, or even signing up for a “Test Drive”, they should visit www.netezza.com.

First off, I think this is a “banner day” for Netezza. We believe that TwinFin (and the other products in the new product family) extend both our performance and price-performance advantage over our competitors. We stand by our marketing statements that we regularly demonstrate 10-100X performance advantages over our competitors, particularly competitive offerings of the major incumbent DW system vendors (“Just who are those incumbents?” Jean-Pierre's readers may ask. Well, let’s just say that we see Oracle as the incumbent system and/or a challenger system in over 50% of our deal flow.).

Regarding his claims about DBM being “faster than Netezza” (and I can only assume he meant at “real” data warehouse tasks) - we’re ready whenever Oracle feels up to actually taking one of their Database Machines onsite to a customer for a fair, open customer benchmark. So far, Oracle have been, shall we say, “a little reticent” to do on-site benchmark testing against Netezza.

Next, given the large number of incorrect points in the original posting, I think perhaps that just a few of them will be useful enough for readers to get the gist of just how far afield some of the ‘facts’ are:

  • “It all comes down to data scan rates per rack”: If only all of data warehousing boiled down to full-stream data scans (as if the entire world of analytics relied on “select count(*) from lineitem” types of queries), we could all measure “goodness” on how many GB/sec of data could be burst-scanned in our systems. But that’s not the case. So we build Netezza’s data and analytic appliances to deliver the best possible overall performance at the best price and power requirements. As a consequence, and following from those same numbers as-posted, a single rack of TwinFin can process (not just scan) about 400 million rows of data per second. That’s process, as in: “scan, decompress, project, restrict, AND join, etc.”. Need more processing firepower? Netezza’s system performance scales linearly with the addition of more S-Blades: at the low-end, the TwinFin 3 can deliver as much as 100M rows/second of processing horsepower, while the TwinFin 120 can provide you with 4 billion rows/second. Does a system that still relies on using SMP-based servers running “plain old” Oracle 11g RAC scale similarly for data warehousing?


  • “Non-open Linux running on FPGAs”: I’m really not sure what (if anything) was meant by this, but saying that Netezza’s FPGAs “are apparently running non-open Linux” is oxymoronic on at least two different levels (FPGAs don’t typically “run” an OS and, “non-open Linux” - really?).


  • “User data & compression”: I also enjoyed the accounting of all that “user data” available to DBM users in the Oracle table and the various comments about compression. When Netezza quotes user data capacities in our systems, the numbers reflect real raw user data space, not space that will be further reduced because of required indexes in an attempt to boost performance. Furthermore, Netezza’s compression & decompression techniques allow us to extract “pure performance” from their use. Rather than relying on CPU cycles to decompress the data before it can be processed any further, the FPGA engines decompress the data, on-the-fly, as fast as it streams off the disk drives. Can Oracle make either of those claims?


  • “Tolerating node failures without downtime”: In perhaps the most bald-faced inaccuracy, the Oracle blog claimed that Netezza “continues to lack the ability to tolerate node failures without downtime”. This I can only chalk up to pure competitive “FUD-ism” as our capabilities in this area have been quite strong throughout the four generations of Netezza appliances and are further strengthened in TwinFin. Netezza is a fully-redundant system with no single point of failure, even in our smallest systems. Failover in the presence of failures of the disk drives, S-Blades, internal networking or host processors (in short, everything) is automatic and done in-service, with hot-swappable replacement throughout.


  • “Appliance simplicity”: One thing Jean-Pierre didn’t address, and on which it might have been humorous to see his take, is the notion of “appliance simplicity” - basically the ability to build, support and maintain large to very large-sized data warehouses, with heavy workloads, with no or minimal tuning, partitioning, indexing or other “performance duct tape” required. Routinely, this capability in the Netezza systems is what delights our customers most and we have customers managing systems with several hundreds of terabytes of user data (not indexes + data, mind you - real data) with fractions of an FTE (full-time employee) devoted to them.
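
For readers who want to sanity-check the linear-scaling figures quoted in the first bullet, here is a back-of-the-envelope sketch in Python. It is not Netezza sizing guidance. It assumes, purely for illustration, that a TwinFin model number equals its S-Blade count and that a single rack holds 12 S-Blades; neither assumption is stated in this post, and the function and variable names are made up for the example.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# ASSUMPTION (not stated in the post): a TwinFin model number equals its
# S-Blade count, and a single rack holds 12 S-Blades.

# Model number -> quoted processing rate in rows/second.
QUOTED_ROWS_PER_SEC = {3: 100e6, 120: 4e9}

ASSUMED_BLADES_PER_RACK = 12  # hypothetical, for illustration only


def rows_per_blade(model: int, rows_per_sec: float) -> float:
    """Rate per S-Blade, assuming `model` is the number of S-Blades."""
    return rows_per_sec / model


def projected_rate(blades: int, per_blade_rate: float) -> float:
    """Linear scaling: total rate grows in proportion to the blade count."""
    return blades * per_blade_rate


# Per-blade rate implied by each quoted endpoint.
per_blade = {m: rows_per_blade(m, r) for m, r in QUOTED_ROWS_PER_SEC.items()}

# Projected rate for one rack under the assumed blade count.
single_rack = projected_rate(ASSUMED_BLADES_PER_RACK, per_blade[3])

for model, rate in per_blade.items():
    print(f"TwinFin {model}: {rate / 1e6:.1f}M rows/s per assumed S-Blade")
print(f"Projected single rack: {single_rack / 1e6:.0f}M rows/s")
```

Under those assumptions, both quoted endpoints work out to roughly 33 million rows/second per unit, and twelve such units land at about the 400 million rows/second quoted for a single rack, which is what linear scaling would predict.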


I hope that clears up some of the misconceptions. If any of Jean-Pierre's readers or Oracle customers would like to see or hear more about TwinFin for themselves, we definitely would invite them to come stop by our booth (#207) at TDWI or come to one of our regional Enzee Universe events coming to a location near you.


 


"You stay classy, San Diego." -- Ron Burgundy (Will Ferrell) in "Anchorman" (2004)


This morning a few others from the Netezza Marketing and Product Management teams and I are ensconced by the Marina in sunny San Diego, CA for the TDWI World Conference and for a news announcement or two. And who better to bring us "Breaking News!" than the Number 1 newsman in all of San Diego, Ron Burgundy. [For those of you who might have been "hoping for more" from Ron in a quote about San Diego, you can check out the IMDB database for some great ones, including Ron's own historical (and hysterical) etymology for the city's name.]



 

Though it’s not exactly a state-secret at this point, today we’re launching the 4th generation of Netezza data warehouse and analytic appliances and the first of four initial product lines in it: TwinFin™.

 


Some of the core characteristics of the TwinFin and the overall platform are:

  • Resetting Netezza’s price-performance leadership position in the market and extending Netezza’s performance lead;
  • Disrupting the competitive data warehouse market among the incumbents, just as we did with our initial systems in 2003/’04;
  • Moving to a commercially-available, blade-based server and storage platform; and
  • Opening Netezza’s aperture on the broader market with a multi-product platform design to match customers’ data warehouse and analytics needs across their enterprise


After the market disruption Netezza caused with the introduction of the NPS® in 2003 and since, we have seen the entry of dozens of new startups in our wake and virtually every major incumbent data warehouse vendor has retooled its portfolio to include a “response” to the Data Warehouse Appliance (DWA) in a suddenly reenergized market. Several of them, to their credit, have advanced their value propositions and improved their competitive position.


Now it is Netezza’s time once again. With the introduction of TwinFin and the other members of the new family of products, Netezza is once again changing the game; widening the applicability of our systems to more types of customers, applications and partners in the market.

As stated in my response to Curt Monash last week, we think of this 4th generation of the Netezza appliance as using “the same architecture with a new physical implementation”. Starting with TwinFin, we moved to a commodity blade-server based system framework, but one that still uses Netezza’s “secret sauce” to deliver as much as a 5X increase in performance over the previous generation of Netezza systems, namely:

  • our balanced design and streaming architecture;
  • the use of Field Programmable Gate Array (FPGA) technology as a query processing “turbocharger”; and
  • our advanced MPP management and optimization software.

 

And there are more innovations and performance gains on the way! TwinFin, quite simply, will serve as a platform for expanding Netezza’s performance and price-performance advantage in the industry and as the basis for advancing the state-of-the-art for in-database, analytically intensive data processing; all without sacrificing any of the appliance simplicity with which our company is synonymous.

As a couple of us said last week, Netezza has served as “the benchmark” for high-performance DWA pricing in the industry and we are now leading “the market in pivoting to a new competitive price-performance level”. With these new systems, we have embraced a trend that has been happening around the industry – the movement of marginal cost of a bit of disk storage toward $0 – with system-sizing, pricing and even system numbering focused on the performance delivered by a given platform.

 

We think the net effect of the new, simplified pricing structure for TwinFin and the other members of the Netezza product family will be a major disruption in the market. With starting (US-based) prices that equate to under $20,000 per terabyte, TwinFin’s list price is a fraction of other competitors’ performance-system pricing (after they’re all done playing price-obfuscation games around mirror, swap and index storage).

 

TwinFin and the other new Netezza data and analytic appliance products give us the opportunity to continue to lead the market and provide our customers with the best value and performance possible for all of their data warehouse and analytic processing needs. Netezza TwinFin - because two fins are faster than one.


Change, but no Change

Posted by Phil Francisco Jul 31, 2009

Just trying to clarify. Curt Monash's informative blog on the coming Netezza system and family of products includes the following:

 

<snip>

 

Beyond the switcheroo in components, Netezza is making substantial changes to its hardware architecture. In current Netezza products, the FPGA plays the role of a disk controller on steroids — it receives data, does some SQL or other analytic operations on it, and then throws it over the wall to the CPU for the rest of the processing. The new Netezza product family, however, adds an actual disk controller. More important, it adds fast interconnects between the FPGAs, the disk controller, and RAM — specifically, as Phil Francisco put it in an email,

using multiple parallel channels of PCIe with much faster interconnection rates and lower contention between the blade server and the “DB accelerator card” with the FPGAs.

DMA (Direct Memory Access) technology also fits into the picture somehow.

 

<snip>

 

...which seems to beg further clarification.

 

While Curt suggests big changes are afoot in Netezza's “architecture” - I think a more appropriate viewpoint would be that it's “the same architecture with a new physical implementation”. That is, the concept of data streaming from disk through the system is just as important now as it ever was.

 

S-Blade Diagram.jpg

 

True, we did move the "disk controller" function to a pair of HBA (Host Bus Adapter) cards that interface with the disk enclosures using multiple, redundant SAS (Serial-Attached SCSI) links, providing more than ample bandwidth to stream all the drives per rack continuously to the blades. For those who click through on Curt's blog, this function is embedded in the device labeled “SAS Expander Module” (one on each of the blade server and the "DB accelerator") in the 3rd chart of the PDF file (and also shown above) and allows data to stream from disk through to memory and then on to the FPGA without delay.

 

SP Data Flow.jpg

 

To move data between the blade server and the DB accelerator cards, we use IBM's expansion card (formerly known as "sidecar") technology. It provides multiple parallel high-speed PCIe (Peripheral Component Interconnect Express) channels that deliver the data streams from the disk drives to the memory on each blade server and provide a very high-speed interconnect between the FPGA devices and that same memory, using DMA (direct memory access) to reach that memory at high speed without encumbering the CPU.

 

FPGA Engines.jpg

 

With all this high-speed interconnectivity, Netezza has been able to alter the data flow so that data streams to the memory first and then to the various FAST engines (see above diagram and/or refer to Issue 16: The Latest Addition to Netezza's FAST Engines Framework) in the FPGA. Those engines act as a "turbocharger" for query processing, implementing data decompression, restricting, projecting and applying the appropriate visibility rules in a pipelined process; typically filtering out well over 95% of the data scanned. From the FPGA, the resulting reduced data set is passed on to the CPU memory for additional processing to complete the process.

 

So, the logical streaming model of data from disk to FPGA to CPU is retained, with significantly higher throughput as a result. But there's an added benefit: the fact that the originally-scanned data can remain in memory, still in compressed & unfiltered form, to be used as a cache, avoiding disk scan activity where possible and helping boost system performance even more. In short, "Change, but no Change."
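
To make the data flow described above a little more concrete, here is a toy Python sketch of a pipelined scan: compressed blocks sit in memory, and each row streams through decompress, restrict, visibility and project stages before a much smaller result reaches the "CPU". Everything in it is an illustrative assumption rather than Netezza's implementation: the zlib-plus-JSON block format, the `created_txn`/`deleted_txn` visibility rule, the column names and all function names are invented for the example, and the sketch applies visibility before projection simply so the transaction columns are still available when it runs.

```python
import json
import zlib
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def compress_blocks(rows: List[Row], block_size: int = 100) -> List[bytes]:
    """Stand-in for compressed extents; zlib + JSON is purely illustrative."""
    return [
        zlib.compress(json.dumps(rows[i:i + block_size]).encode("utf-8"))
        for i in range(0, len(rows), block_size)
    ]


def decompress(blocks: Iterable[bytes]) -> Iterator[Row]:
    """Decompress one block at a time and stream its rows onward."""
    for block in blocks:
        yield from json.loads(zlib.decompress(block).decode("utf-8"))


def restrict(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Keep only the rows that satisfy the query predicate."""
    return (row for row in rows if predicate(row))


def apply_visibility(rows: Iterable[Row], snapshot_txn: int) -> Iterator[Row]:
    """Hypothetical rule: created at or before the snapshot, not yet deleted."""
    for row in rows:
        deleted = row["deleted_txn"]
        if row["created_txn"] <= snapshot_txn and (deleted is None or deleted > snapshot_txn):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the requested columns."""
    return ({c: row[c] for c in columns} for row in rows)


def scan(blocks, predicate, columns, snapshot_txn) -> Iterator[Row]:
    """Chain the stages lazily so each row flows through the whole pipeline."""
    stream = decompress(blocks)
    stream = restrict(stream, predicate)
    stream = apply_visibility(stream, snapshot_txn)
    return project(stream, columns)


# Toy table: 1,000 rows, 5% in region "east", some of those already deleted.
table = [
    {
        "id": i,
        "region": "east" if i % 20 == 0 else "west",
        "amount": i * 2,
        "created_txn": 1,
        "deleted_txn": 5 if i % 40 == 0 else None,
    }
    for i in range(1000)
]

# The compressed, unfiltered blocks stay in memory and act as the cache.
cached_blocks = compress_blocks(table)

result = list(scan(cached_blocks, lambda r: r["region"] == "east", ["id", "amount"], 10))
filtered_fraction = 1 - len(result) / len(table)

# A second query reuses the same in-memory blocks, with no new "disk" read.
west_count = sum(1 for _ in scan(cached_blocks, lambda r: r["region"] == "west", ["id"], 10))

print(f"{len(result)} rows passed on; {filtered_fraction:.1%} filtered out")
```

With this made-up table, 25 of the 1,000 rows reach the final stage, so 97.5% of the scanned data is filtered out before the "CPU" sees it. Because the stages are generators, no intermediate copy of the table is built, and the second query shows the caching idea: it runs against the same compressed, unfiltered blocks already held in memory.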

 

I hope that helps - with Curt's architecture viewpoint as well as with questions about our use of PCIe interconnects to raise performance.


 

"Don't be afraid to try the greatest sport around

(catch a wave, catch a wave)
Everybody tries it once
Those who don't just have to put it down
You paddle out turn around and raise
And baby that's all there is to the coastline craze
You gotta catch a wave and you're sittin' on top of the world"
– from "Catch a Wave" by The Beach Boys (1963)

Surf's up! Summer seems to finally have arrived in the Boston area and a number of vendors in the data warehousing and analytics space are hoping to catch a wave riding on a flurry of industry announcements. A few trends continue to build in the news:

 

  1. Data sizes continue to grow alongside the pressure to increase performance & shrink data latencies;
  2. Workload complexity and user counts continue to grow;
  3. More and more, customers are seeing the value of running advanced analytical processing directly in their primary data repository (see item #1 for reasons why); and
  4. Industry prices for data warehousing and analytics have begun another shift downward.


Today I'd like to address this last point. According to more than one industry analyst, over the last several years, Netezza has served as "the benchmark" for DWA pricing in the industry. Several of our competitors have sought to match and/or undercut Netezza pricing in the market. Some of the incumbent players have tried to, with very limited success, hinge their pricing off Netezza prices, match the performance of the Netezza Performance Server® system, or inoculate their pricey "flagship" products by adding less-expensive, feature-deficient products to their portfolio. But Netezza has continued to succeed in the marketplace, becoming a profitable, publicly-traded company with nearly 300 customers and 400 employees worldwide and one that is listed among the "Leaders" in the Gartner Magic Quadrant.

 

When we disrupted the data warehousing market with our first generation product in 2003 and 2004, Netezza was one of very few startups in an otherwise moribund industry. Now, with established "street cred" and hundreds of loyal customers, we intend to once again upset our competitors and lead the market in pivoting to a new competitive price-performance level. We're about to launch the fourth generation platform of our data warehouse and analytic appliances, which will advance Netezza's performance leadership and once again establish a new price-performance benchmark.

 

Admittedly, we won't be the first vendor offering high-performance data warehouse systems to move to a lower pricing plateau. That task is usually done by early-stage start-ups looking to find a way to differentiate themselves. True to form, Dataupia can probably claim to have established a lower price point first, and recently another multiyear "start-up" has also started lower. But those are offerings from very modestly-sized startups with no established market "track record". Netezza will be the first company with proven product maturity, customer base and financial viability to do so.

 

Just how and what are we doing to cause this disruption? Well, let's just say things around the "briefing table" have been quite hectic, and that I and others will have more news about that to follow shortly.

 

[As you might imagine, it's been getting more and more difficult to keep things under wraps – in recent weeks we've even had to fight people off from getting early "sneak peeks". ]

 

Until then, hey, it's summertime! So here's what I'd recommend –

 

"So take a lesson from a top-notch surfer boy

(catch a wave, catch a wave)
Get yourself a big board
But don't you treat it like a toy
Just get away from the shady turf
And baby go catch some rays on the sunny surf
And when you catch a wave you'll be sittin' on top of the world


Catch a wave and you'll be sittin' on top of the world"

 

 

Twin Fin: A short board (usually 5'8" - 6'8") with a wide tail for maneuverability and a fin near each rail for stability in radical turns.

 

Purpose: A wider tail area provides more planing area and lift, which creates more speed by efficiently utilizing wave energy. Milking speed and energy from smart surf with extremely sensitive and responsive turning ability are this design's strong points.
