[ www.netezza.com ]

Thinking Inside the Box


Today Netezza is launching a new eBook titled “Oracle Exadata and Netezza TwinFin™ Compared”. As the name implies, this eBook provides a comparison of the Netezza TwinFin data warehouse appliance and Oracle’s “appliance-like” database machine offering.

 

Certainly Netezza is not the first company to compare and contrast its flagship system with Oracle’s most recent entry. Richard Burns, a consultant at Teradata, did a laudable job of exposing the technical shortcomings of the Exadata v2 machine for data warehousing in a May 2010 whitepaper. And several recent pieces have been written about Oracle’s apparent success, although the publicly named customer list has struck some as a bit underwhelming.

 

Netezza regularly competes (and wins) against Oracle in the marketplace, including against the Exadata v2 product, so we felt it was high time to put our own comparison story together with today’s eBook and this little blog posting. Let me know what you think.

 

So where to begin? Let’s start with the fact that the Netezza TwinFin is built to excel at a specific purpose – as the best price/performance platform for Data Warehousing and Analytics in the market. Conversely, Oracle has tried to “kill two birds with one stone” in the Exadata v2 – aiming it primarily at the On-Line Transaction Processing applications space, but also making bold claims to performance as a Data Warehouse with its Sun-based Oracle Database Machine (DBM) and Exadata Storage Server, version 2 (Exadata).

 

So why does it matter that Oracle is aiming to do both OLTP and DW in the same system – apart, that is, from at least two decades of people trying-and-failing to do exactly that with the likes of Oracle in previous software and hardware instantiations? Let’s start with the workload requirements of the two application areas:

  • OLTP systems execute many short transactions, typically of extremely small scope (touching only a handful of records) and in extremely predictable, well-understood access and query patterns. They need to excel at handling these small transactions in very high volume, combined with equally small writes to the database in the form of updates, insertions and deletions. This limited scope, high throughput and “regularity” of the access patterns make OLTP systems great candidates for intelligent caching and (multiple) secondary data structures, such as indices, to speed their processing.

 

  • Conversely, DW systems are typically asked to perform “read-heavy” queries and operations against the current and deep historical data sets. Rather than analyzing just a few records, a DW query might look at millions, even billions, of rows from a single table, combined with join logic against multiple other tables. Data warehouse systems are used by company analysts and managers to find the “needle in the haystack” in guiding enterprise decision-making in a more comprehensive and often ad-hoc manner – frequently limiting the ability to use “tricks of the trade” such as results caching and/or indices.
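A toy sketch can make the contrast concrete. The Python below is purely illustrative (the table, column names and row counts are hypothetical, not taken from either product): an OLTP-style lookup goes through a hash index and touches one row, while a DW-style ad-hoc aggregate has to read every row, so the index built for the lookup does nothing for it.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    region: str
    amount: float

REGIONS = ["east", "west", "north", "south"]

# Hypothetical toy table: 100,000 orders spread evenly over four regions.
orders = [Order(i, REGIONS[i % 4], float(i % 100)) for i in range(100_000)]

# OLTP pattern: a secondary index on order_id makes single-row access cheap.
index_by_id = {o.order_id: o for o in orders}

def oltp_lookup(order_id):
    """Touches exactly one row, found through the index."""
    return index_by_id.get(order_id)

def dw_revenue_by_region(rows):
    """Ad-hoc aggregate: every row is read, and the order_id index cannot help."""
    totals = defaultdict(float)
    rows_read = 0
    for o in rows:
        totals[o.region] += o.amount
        rows_read += 1
    return dict(totals), rows_read

order = oltp_lookup(42)                           # one row touched
totals, rows_read = dw_revenue_by_region(orders)  # all 100,000 rows touched
```

Caching the lookup result helps the OLTP side because the same few rows are hit again and again; an ad-hoc aggregate over a different column or date range next time gets little from either the cache or the index.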

 

So the two applications tend to lead to very different system/platform implications. No special “news” there – as I said earlier, people have been trying-and-failing to use a single system for both applications for years.

 

Without stealing any more of the thunder of our electronic publication today, let me just lay out what I believe are the fundamental differences between Netezza’s TwinFin and the Oracle Database Machine/Exadata as simply and plainly as I can:

 

Netezza TwinFin | Oracle Database Machine / Exadata v2
True MPP | Hybrid "SMP-plus" Approach
Data Streaming with a Hardware Assist | CPU-intensive Processing for Basic DB Operations
Deep Analytics Processing | Central Cluster-based Approach
No-Tuning-Required Simplicity | Complex Array of Knobs and Levers

 

In my view, these are "big deal" differences. They're not the result of a simple feature gap to be closed in an upcoming point-release, but rather go directly to limitations at the heart of the Oracle DBM/Exadata system architecture and/or business culture. To address them would require a major rearchitecting, or at least refactoring, of Oracle's decades-old DBMS code base. They also happen to be highly visible to customers and prospects, which makes for some interesting comparisons in head-to-head on-site Proofs of Concept (POCs).

 

1) True MPP vs. a Hybrid "SMP-plus" Approach

Netezza’s TwinFin uses a full MPP approach to data warehousing, pushing all of the processing down as close as possible to where the data is stored and maximizing the processing horsepower of MPP for scalability, throughput and performance – for even the most complex workloads. Using the MPP method of dividing the workload and attacking query problems in parallel, Netezza has been able to demonstrate market-leading data warehouse price-performance across four generations of data warehouse appliances.
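The divide-and-conquer idea behind MPP can be sketched in a few lines. This is a generic illustration, not Netezza's implementation: rows are hash-distributed across a hypothetical four slices by customer id, each slice computes a partial GROUP BY sum on only its own rows, and a host merges the small partial results.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SLICES = 4  # hypothetical number of data slices / processing units

def slice_for(key, num_slices=NUM_SLICES):
    """Hash-distribute a row by key. crc32 is used because, unlike hash(),
    it gives the same answer in every process."""
    return zlib.crc32(str(key).encode()) % num_slices

def distribute(rows):
    """Spread (customer_id, region, amount) rows across slices by customer_id."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        slices[slice_for(row[0])].append(row)
    return slices

def local_aggregate(slice_rows):
    """Runs next to the data: sums amount per region over one slice only."""
    partial = Counter()
    for _customer_id, region, amount in slice_rows:
        partial[region] += amount
    return partial

def mpp_sum_by_region(rows):
    slices = distribute(rows)
    # Threads stand in for independent processing units; only the small
    # per-slice partial results come back for the final merge.
    with ThreadPoolExecutor(max_workers=NUM_SLICES) as pool:
        partials = list(pool.map(local_aggregate, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the counts together
    return total, [len(s) for s in slices]

rows = [(c, "east" if c % 2 == 0 else "west", c % 10) for c in range(1_000)]
totals, slice_sizes = mpp_sum_by_region(rows)
```

The threads only model the structure; in CPython they will not speed up CPU-bound work, whereas real MPP nodes each bring their own processors and disks to their own slice.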

 

Oracle’s DBM/Exadata takes a hybrid approach, adding Exadata Storage nodes largely to handle data decompression and predicate filtering tasks, but still relying primarily on the SMP cluster of Oracle RAC to handle most of the data warehouse tasks, including complex joins. In addition, the SMP cluster must also act as the central distribution point for any data that needs to be redistributed between and across Exadata nodes. To try to minimize this, Oracle and Sun’s solution was to “throw hardware at the problem” (quoting Teradata’s Mr. Burns), over-engineering the interconnects, processor rates and other elements made necessary by all of this data movement, rather than refactoring and solving a fundamental software architecture issue.
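A toy traffic count shows why routing redistribution through a central point matters. The model below simply encodes the claim in the paragraph above (every moved row goes storage node to hub to storage node) against direct node-to-node exchange; the node count, row layout and hash are hypothetical, and it is not a measurement of either system.

```python
import zlib

def target_node(key, num_nodes):
    """Node that should hold a row after redistributing on its join key."""
    return zlib.crc32(str(key).encode()) % num_nodes

def redistribution_traffic(placement, num_nodes, via_hub):
    """Toy traffic count for redistributing rows on a join key.

    placement: (current_node, join_key) pairs.
    via_hub=True routes every moved row storage -> hub -> storage;
    via_hub=False sends it straight from source node to target node.
    Returns (total row transfers, transfers seen by the busiest participant).
    """
    per_node = [0] * num_nodes  # rows each storage node sends or receives
    hub = 0                     # rows passing through the central hub
    transfers = 0
    for src, key in placement:
        dst = target_node(key, num_nodes)
        if src == dst:
            continue            # already on the right node; nothing moves
        per_node[src] += 1
        per_node[dst] += 1
        if via_hub:
            hub += 2            # once into the hub, once back out
            transfers += 2
        else:
            transfers += 1
    return transfers, max(per_node + [hub])

# Hypothetical layout: 10,000 rows round-robined over 8 storage nodes.
placement = [(i % 8, i) for i in range(10_000)]
direct = redistribution_traffic(placement, 8, via_hub=False)
hubbed = redistribution_traffic(placement, 8, via_hub=True)
```

In this model the hub carries every moved row twice and is always the busiest participant, while with direct exchange no single node handles more than the rows it sends or receives itself.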

 

The difference between the two is akin to an 8-lane continuous streaming superhighway in the TwinFin instance versus multiple freeways converging on and necking down to a two-lane country road via a “traffic roundabout”. I live in Massachusetts and can attest to the negative impact of taking multiple highways down to a single road – it happens every weekend at the gateway to and from Route 6 on Cape Cod.

 

2) Data Streaming with a Hardware Assist vs. CPU-intensive Work for Basic DB Operations

In addition to the advantages of the MPP architecture for data warehousing, the TwinFin system uses hardware acceleration for increased query and analytics performance. This acceleration comes in the form of the "DB Accelerator" that is part of each S-Blade in the TwinFin system architecture: each DB Accelerator carries four dual-core Field-Programmable Gate Arrays (FPGAs) that take care of fundamental processing steps such as decompression, predicate filtering and ACID-compliant data visibility at the full scan rate of the data from disk. Because this device sits so close to the disks it serves, the TwinFin system gains much more performance leverage: data can be filtered, processed and value-added before undergoing any unnecessary CPU processing or being transported across an expensive network.
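The ordering principle (decompress, check visibility and filter at the point of the scan, before anything else touches the data) can be sketched as a chain of Python generators. This is a software analogy only, not FPGA logic: the block format, the simplified MVCC-style visibility rule and the column names are all hypothetical.

```python
import json
import zlib

def make_block(rows):
    """Hypothetical storage block: a zlib-compressed JSON array of row dicts."""
    return zlib.compress(json.dumps(rows).encode())

def decompress(blocks):
    for block in blocks:
        yield from json.loads(zlib.decompress(block))

def visible(rows, snapshot_txn):
    """Simplified MVCC-style visibility: created at or before the snapshot
    and not deleted at or before it."""
    for r in rows:
        deleted = r["deleted_txn"]
        if r["created_txn"] <= snapshot_txn and (deleted is None or deleted > snapshot_txn):
            yield r

def restrict(rows, predicate):
    for r in rows:
        if predicate(r):
            yield r

def project(rows, columns):
    for r in rows:
        yield {c: r[c] for c in columns}

# Hypothetical table: 1,000 rows stored as 10 compressed blocks.
table = [{"id": i, "created_txn": i % 10,
          "deleted_txn": 5 if i % 100 == 0 else None, "amount": i}
         for i in range(1_000)]
blocks = [make_block(table[i:i + 100]) for i in range(0, 1_000, 100)]

# Each stage pulls rows lazily from the one before, so only rows that
# survive every stage reach the consumer.
result = list(project(restrict(visible(decompress(blocks), snapshot_txn=7),
                               lambda r: r["amount"] >= 900),
                      ["id", "amount"]))
```

Of the 1,000 stored rows, only 79 reach the consumer, each carrying two columns; everything else is discarded inside the streaming pipeline.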

 

And because it is a field-programmable device, Netezza can use it to introduce additional features and performance through a simple upgrade to our NPS software/firmware. Netezza has done exactly that with two phases of hybrid column/row-level compression technology, first introduced in 2005 (with Release 6.0 scaling as high as 32:1 compression, depending on data patterns), and with our high-performance implementation of row-level security. Because it's performed in the FPGA in TwinFin, "Compression = Performance": if a customer's data is compressed by a 4:1 factor, the effective data streaming rate for processing queries is increased four-fold.
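The "Compression = Performance" arithmetic is simple enough to write down. The sketch assumes, as the paragraph above states, that decompression keeps pace with the disk; the disk rate and table size are hypothetical round numbers.

```python
def effective_scan_rate(disk_mb_per_s, compression_ratio):
    """Uncompressed data delivered per second, assuming decompression keeps
    pace with the disk so every MB read expands to compression_ratio MB."""
    if compression_ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    return disk_mb_per_s * compression_ratio

def scan_seconds(uncompressed_mb, disk_mb_per_s, compression_ratio):
    """Time to stream a table of the given uncompressed size."""
    return uncompressed_mb / effective_scan_rate(disk_mb_per_s, compression_ratio)

# Hypothetical numbers: a 4,000 MB table read from a 100 MB/s disk.
uncompressed = scan_seconds(4_000, 100, 1)   # 40.0 seconds
compressed = scan_seconds(4_000, 100, 4)     # 10.0 seconds at 4:1
```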

 

Conversely, the DBM/Exadata system relies entirely on CPU processing. In fact, the great majority of the functionality provided by the Exadata nodes in the DBM/Exadata system replicates the functionality included in each FPGA core of the TwinFin: data decompression and predicate filtering. Because decompressing data is CPU-intensive in the DBM/Exadata system, Oracle "strongly suggests" lesser compression when data is needed for high-performance data warehousing rather than for "cooler" queryable-archive purposes. Again, the heavy lifting for query processing and analytics is left to the central SMP cluster nodes rather than the parallel Exadata nodes, forcing Oracle to "throw hardware at the problem".
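When decompression runs on general-purpose CPUs, a toy bottleneck model shows why extra compression can stop paying off: throughput is the smaller of what the disk can feed and what the CPUs can decompress. The rates below are hypothetical and not measurements of any system.

```python
def cpu_bound_scan_rate(disk_mb_per_s, compression_ratio, decompress_mb_per_s):
    """Toy model: the disk can feed disk_mb_per_s * compression_ratio of
    uncompressed data, but CPUs can only decompress decompress_mb_per_s,
    so the slower of the two sets the pace."""
    return min(disk_mb_per_s * compression_ratio, decompress_mb_per_s)

# Hypothetical numbers: 100 MB/s of disk, CPUs that decompress 250 MB/s.
for ratio in (1, 2, 4, 10):
    print(ratio, cpu_bound_scan_rate(100, ratio, 250))
```

Past 2.5:1 in this model, more compression saves disk space but buys no extra scan speed, since the rate is capped at 250 MB/s. Heavier compression settings also tend to decompress more slowly, which would lower that cap further.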

 

3) Deep Analytics Processing vs. Central Cluster Analytics

Netezza brings analytics to the data, doing the processing as close as possible to where the data is stored – not just to decompress it and do predicate filtering, but to complete as much of the complex analytics as possible, in parallel. That’s as true of the “traditional” OLAP analytics of SQL-based data warehousing as it is of the advanced and predictive analytics enabled by the new capabilities of i-Class in the “Second Wave of TwinFin”.
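Many statistics can be finished in parallel rather than merely pre-filtered, as long as each slice returns a small partial result that can be merged. As a generic example (the standard Welford update with Chan et al.'s pairwise merge, not a description of Netezza's internals), each hypothetical slice computes its own count, mean and sum of squared deviations, and the host combines them into the exact overall variance.

```python
import math
import statistics
from dataclasses import dataclass
from functools import reduce

@dataclass
class Moments:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        """Welford's online update, run locally on one slice."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Chan et al. pairwise combination of two partial results."""
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def sample_variance(self):
        return self.m2 / (self.n - 1)

def slice_moments(values):
    m = Moments()
    for v in values:
        m.add(v)
    return m

# Hypothetical data spread round-robin over 4 slices.
data = [float(x % 97) for x in range(10_000)]
partials = [slice_moments(data[i::4]) for i in range(4)]  # one per slice
combined = reduce(Moments.merge, partials, Moments())     # host-side merge
matches = math.isclose(combined.sample_variance(), statistics.variance(data))
```

Only three numbers per slice travel to the host, however many rows each slice holds.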

 

With i-Class, Netezza introduces a comprehensive, scalable and high-performance approach to advanced analytics for both our customers and partners, spanning Linear Algebra/Matrix manipulation and engines for R and Hadoop, along with several programming languages including C, C++, Java, Python and even Fortran. The i-Class functionality also offers plug-ins and packages for the Eclipse IDE and R GUI, and pre-built analytic functions engineered to deliver performance at scale, spanning data preparation, mining, predictive analytics and spatial functions, together with access to analytics functions from the GNU Scientific Library and the R CRAN repository. Extended by the i-Class embedded analytics capabilities, TwinFin allows our partners and customers to push down applications, functions and algorithms that go well beyond standard set-based SQL, at scale and with high performance, freeing them of the latency and sampling requirements demanded by off-board processing platforms for advanced analytics.

 

The Oracle DBM/Exadata performs the majority of the OLAP analytics in the central cluster (RAC) nodes, after traversing the "traffic roundabout". And apart from basic scoring functionality, virtually ALL of the advanced analytics are performed in the cluster nodes as well. Placing the predominance of processing in the central SMP cluster means that both the functionality and scale of the analytics are limited by the capacity and performance that the SMP cluster can provide - typically limited to the elements included in Oracle's own "Data Mining" package.

 

The DBM/Exadata’s requirement to ship data from the storage arrays to the central cluster for analytics is akin to backhauling full truckloads of material from a mining site to pick out the gold at headquarters, rather than sifting out the most valuable nuggets in parallel at the site and sending only those back, as TwinFin does.
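The mining analogy maps onto a simple row count. In the hypothetical sketch below, both strategies return the same answer, but one ships every row to a central node before filtering while the other filters on each slice and ships only the qualifying rows; the data layout and selectivity are made up for illustration.

```python
def backhaul_then_filter(slices, is_nugget):
    """Ship every row to the central node, then filter there."""
    shipped = [row for s in slices for row in s]
    return [row for row in shipped if is_nugget(row)], len(shipped)

def filter_at_source(slices, is_nugget):
    """Filter on each slice; only qualifying rows cross the network."""
    shipped = [row for s in slices for row in s if is_nugget(row)]
    return shipped, len(shipped)

def is_nugget(row):
    # Hypothetical predicate: 1 row in 1,000 qualifies.
    return row % 1_000 == 0

# Hypothetical data: 100,000 rows spread over 8 slices.
slices = [list(range(i, 100_000, 8)) for i in range(8)]

central_result, central_shipped = backhaul_then_filter(slices, is_nugget)
local_result, local_shipped = filter_at_source(slices, is_nugget)
```

Here both approaches find the same 100 rows, but backhauling moves 100,000 rows across the network versus 100 when filtering at the source.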

 

4) No-Tuning-Required Simplicity vs. a Complex Array of Knobs and Levers

For a long time, the simplicity of the Netezza data warehouse appliance has shone through most strongly in the extremely limited tuning requirements it imposes on administrators of the system, particularly as compared to Oracle-based systems. Simplifying system management is core to Netezza’s “appliantization” of the data warehouse and analytics platform. Rather than managing a “coordinated collection” of technology assets, the system and database administrators of TwinFin interact with a single appliance and use the redundant Linux-based SMP host nodes as the interaction point for all activities. Everything from database configuration, data distribution, data mirroring, monitoring, software upgrades and day-to-day management is simplified (in the words of one TwinFin customer, “It’s Netezza-easy – it just works.”).

 

No indexing is necessary (or even supported) in TwinFin to achieve high performance. Just about the only requisite “tuning” of the system is the definition of the distribution key for spreading data across all the S-Blades – typically the primary keys of the tables. Even in the internal management structure of TwinFin, our system management has been configured to get the maximum performance from the commodity subsystems (blades, chassis, disk arrays and network) by connecting them in novel ways and then managing them at a system level, rather than at the subsystem or rack-level.
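Why the choice of distribution key matters can be shown with a small skew check. In a parallel scan the slowest, fullest slice roughly sets the pace, so a key that spreads rows evenly is what counts. The sketch below hash-distributes hypothetical data across a hypothetical 24 slices, once on a unique id and once on a three-valued status column, and reports the largest slice relative to the average.

```python
import hashlib
from collections import Counter

def slice_of(key, num_slices):
    """Hash a distribution-key value to a slice (blake2b for an even spread)."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_slices

def slice_sizes(key_values, num_slices):
    counts = Counter(slice_of(k, num_slices) for k in key_values)
    return [counts.get(i, 0) for i in range(num_slices)]

def skew(sizes):
    """Largest slice divided by the average slice; 1.0 is perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))

NUM_SLICES = 24  # hypothetical slice count
customer_ids = range(100_000)                                   # unique key
statuses = [("active", "closed", "pending")[i % 3] for i in range(100_000)]

by_id = slice_sizes(customer_ids, NUM_SLICES)
by_status = slice_sizes(statuses, NUM_SLICES)
```

Distributing on the unique id keeps every slice close to the average, while the status column can fill at most three of the 24 slices, leaving the largest at least eight times the average.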

 

While it is true that Oracle has simplified some of the tuning knobs and levers in the DBM/Exadata, prospective customers should ask whether Oracle has really moved into the domain of requiring only a small handful of tuning knobs & settings, or whether it still requires, or more colloquially “strongly suggests”, the use of dozens or even hundreds of settings (depending upon the number of objects being maintained and optimized). How many dozens of IP addresses are needed to configure and manage the DBM/Exadata (TwinFin requires only two)? Oracle even has a special service to help DBM/Exadata customers migrate and tune their systems and databases for performance, and some of its leading Performance Architects even talk about the requirement of using functions like the Oracle SQL Tuning Advisor as an inevitable fait accompli.

 

By Oracle’s own admission, customers can expect to spend only 26% less time managing and tuning the DBM/Exadata system under Oracle 11g R2 than under Oracle 11g. Contrast that with installation after installation of Netezza appliances where hundreds of terabytes of data under management in one or more data warehouses are maintained by two FTEs or even fewer than one, rather than by a team of Oracle specialists. It all depends on one’s perspective and philosophy in building a real appliance for the data warehouse market. Where others may see the need to tune, partition, index and sub-index data sets for performance purposes as an inevitability, Netezza sees that same need as a reason to enhance TwinFin’s capabilities in order to obviate it.

 

All of this really adds up quickly to a significant price-performance advantage for customers of TwinFin – and with our limited tuning and simplified operations, also translates into much more rapid time-to-value for Netezza’s customers, too. So that’s it – four simple fundamental differences that really set the TwinFin appliance apart from the DBM/Exadata. Agree? Disagree? Let me know what you’re thinking. And now, go over and have a look at today’s eBook release for the rest of the story.


News broke on Tuesday that EMC plans to acquire Greenplum to focus on data warehousing and analytics on “big data”. The idea is that by doing so, EMC is officially throwing its hat into the competitive ring for the ‘Data Warehouse Appliance’ (DWA) market – something of a defensive move now that virtually all of the major data warehouse vendors are selling their own versions of a DWA, which has greatly reduced sales pull-through of EMC storage for data warehouse deployments.

Some referred to the merger as “a good fit for a storage vendor with appliance-y ideas”, while others hailed it, saying “the market has shifted as of late moving toward integrated appliances and this move gives EMC a very important arrow in its quiver”, and labeled Greenplum a purveyor of “very high performance database systems”.

One can also reasonably assume that this acquisition is intended not only to shore up a product offering weakness, but also for affiliation with EMC’s other major initiative announced earlier this year – the Acadia Virtual Computing Environment (VCE) joint venture with Cisco Systems, headed up by Michael Capellas. The Acadia JV includes EMC’s storage and its VMWare virtualization software as well as Cisco Systems’ compute nodes and networking. VCE is built on the concept of modular building blocks, called vblocks, that marry computing horsepower to storage capacity. All that’s missing from that story is a data warehouse DBMS to make it a full-on data warehouse appliance, right?

There are two big problems with these assumptions…


Performance: For all the discussion about “scale” and “big data” in the EMC announcement, there is no mention of how either party can address the real issues that mainstream enterprises face every single day with their data warehouse systems – how to get maximum performance out of a complex, highly concurrent operational environment where hundreds if not thousands of users are banging away on the system, night and day.

  • The fact is that the actual Greenplum target market has clearly NOT been one that focused on high-performance analytics over the past several years. Instead, the few wins publicly announced by the company have been for very high capacity, limited compute platforms – applications more commonly referred to as “queryable archive”.
  • Curt Monash today again mentioned Greenplum’s lack of support for the “high-concurrency” requirements of a mainstream data warehouse.
  • This looks much more like adding a very basic set of storage-centric data warehousing capabilities in a move to find a broader channel for EMC’s traditional storage products rather than any strategic move into the world of high performance data analytics. Further to this point, neither company has done much of anything to address a very strong trend in the mainstream data warehouse market – the marriage of advanced, predictive analytics into the busy data warehouse systems.
  • David Vellante confirmed that to be successful the EMC/Greenplum marriage will need to yield “optimized sytems[sic]; smokin’ fast performance; reference architectures; scale;” and “federation capabilities; not just big honking systems.” We couldn’t agree more, but one can’t help but notice that neither Greenplum nor EMC has brought any of those characteristics to market for data warehousing to date.


Appliances: Since the acquisition is fairly transparently a defense against moves by the likes of Oracle, Teradata and IBM (as well as Netezza seven years ago) to the appliance model, it’s hard to see how either EMC or Greenplum is effectively equipped now to do battle against those established players.

  • EMC have never really “sold” data warehousing to anyone previously, and Greenplum have all but prided themselves on going after “Greenfield” high-capacity applications rather than head-to-head competition against established players. And one need look no further than the limited market penetration of H-P’s NeoView to understand that it takes more than simply deep pockets to succeed in the data warehousing market.
  • Greenplum is not a purveyor of “integrated appliances” and at best, they can hope to infuse in EMC the ability to make their joint product offering a little more of an “appliance-y idea” (hat tip to Dr. Monash for coining the term) to the market. Instead, Greenplum have fashioned themselves over the past several years as a software only solution.
  • Assume that the Acadia VCE and “vblock” application is a big piece of this strategy. Neither Cisco nor EMC would claim that their servers, networking or storage arrays offer the lowest price-per-bit or price-per-performance alternative in the market. So one needs to think about what that means in terms of the price-performance competitiveness of this new “appliance-y” joint product.


In short, Greenplum joins the pantheon of “interesting” acquisitions for EMC as it will certainly stir some news cycles and drive some analysts and bloggers to create “fresh, new” content; but it’s not really something that I think will register on the Richter scale of customer market share.


 


"You stay classy, San Diego." -- Ron Burgundy (Will Ferrell) in "Anchorman" (2004)


This morning a few others from the Netezza Marketing and Product Management teams and I are ensconced by the Marina in sunny San Diego, CA, for the TDWI World Conference and a news announcement or two. And who better to bring us "Breaking News!" than the Number 1 newsman in all of San Diego, Ron Burgundy? [For those of you who might have been "hoping for more" from Ron in a quote about San Diego, you can check out IMDb for some great ones, including Ron's own historical (and hysterical) etymology for the city's name.]



 

Though it’s not exactly a state-secret at this point, today we’re launching the 4th generation of Netezza data warehouse and analytic appliances and the first of four initial product lines in it: TwinFin™.

 


Some of the core characteristics of the TwinFin and the overall platform are:

  • Resetting Netezza’s price-performance leadership position in the market and extending Netezza’s performance lead;
  • Disrupting the competitive data warehouse market among the incumbents, just as we did with our initial systems in 2003/’04;
  • Moving to a commercially-available, blade-based server and storage platform; and
  • Opening Netezza’s aperture on the broader market with a multi-product platform design to match customers’ data warehouse and analytics needs across their enterprise.


Since Netezza disrupted the market with the introduction of the NPS® in 2003, dozens of new startups have entered in our wake, and virtually every major incumbent data warehouse vendor has retooled its portfolio to include a “response” to the Data Warehouse Appliance (DWA) in a suddenly reenergized market. Several of them, to their credit, have advanced their value propositions and improved their competitive position.


Now it is Netezza’s time once again. With the introduction of TwinFin and the other members of the new family of products, Netezza is once again changing the game; widening the applicability of our systems to more types of customers, applications and partners in the market.

As stated in my response to Curt Monash last week, we think of this 4th generation of the Netezza appliance as using “the same architecture with a new physical implementation”. Starting with TwinFin, we moved to a commodity blade-server-based system framework, but one that still uses Netezza’s “secret sauce” to deliver as much as a 5X increase in performance over the previous generation of Netezza systems, namely:

  • our balanced design and streaming architecture;
  • the use of Field Programmable Gate Array (FPGA) technology as a query processing “turbocharger”; and
  • our advanced MPP management and optimization software.

 

And there are more innovations and performance gains on the way! TwinFin, quite simply, will serve as a platform for expanding Netezza’s performance and price-performance advantage in the industry and as the basis for advancing the state-of-the-art for in-database, analytically intensive data processing; all without sacrificing any of the appliance simplicity with which our company is synonymous.

As a couple of us said last week, Netezza has served as “the benchmark” for high-performance DWA pricing in the industry, and we are now leading “the market in pivoting to a new competitive price-performance level”. With these new systems, we have embraced a trend that has been under way across the industry – the marginal cost of a bit of disk storage moving toward $0 – with system sizing, pricing and even system numbering focused on the performance delivered by a given platform.

 

We think the new, simplified pricing structure for TwinFin and the other members of the Netezza product family will create a major disruption in the market. With starting (US-based) prices that equate to under $20,000 per terabyte, TwinFin’s list price is a fraction of competitors’ performance-system pricing (once they’re all done playing price-obfuscation games around mirror, swap and index storage).
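Per-terabyte prices only compare fairly when they count the same kind of terabyte. The hypothetical helper below divides a system price by the capacity left for user data after mirror, temp/swap and index space are set aside; every figure in the example is purely illustrative, not any vendor's actual configuration or price, and compression is ignored.

```python
def price_per_user_tb(system_price, raw_tb, mirror_tb=0, swap_tb=0, index_tb=0):
    """Price per TB of capacity left for user data once mirror, temp/swap
    and index space are subtracted from raw capacity (all inputs hypothetical)."""
    usable_tb = raw_tb - mirror_tb - swap_tb - index_tb
    if usable_tb <= 0:
        raise ValueError("overheads consume all raw capacity")
    return system_price / usable_tb

# Purely illustrative figures, not any vendor's actual configuration or price.
headline = price_per_user_tb(1_000_000, 100)                       # 10000.0
effective = price_per_user_tb(1_000_000, 100, mirror_tb=50,
                              swap_tb=10, index_tb=15)              # 40000.0
```

In this made-up case, a headline of $10,000 per raw terabyte becomes $40,000 per terabyte of user data once the overheads are counted.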

 

TwinFin and the other new Netezza data and analytic appliance products give us the opportunity to continue to lead the market and provide our customers with the best value and performance possible for all of their data warehouse and analytic processing needs. Netezza TwinFin - because two fins are faster than one.
