
Thinking Inside the Box


News broke on Tuesday that EMC plans to acquire Greenplum in order to focus on data warehousing and analytics on “big data”. The idea is that, by doing so, EMC is officially throwing its hat into the competitive ring for the ‘Data Warehouse Appliance’ (DWA) market. It is something of a defensive move, now that virtually all of the major data warehouse vendors sell their own versions of a DWA and have consequently greatly reduced the sales pull-through of EMC storage for data warehouse deployments.

Some referred to the merger as “a good fit for a storage vendor with appliance-y ideas”, while others hailed it, saying “the market has shifted as of late moving toward integrated appliances and this move gives EMC a very important arrow in its quiver”, and labeled Greenplum a purveyor of “very high performance database systems”.

One can also reasonably assume that this acquisition is intended not only to shore up a weakness in EMC’s product offering but also to tie in with EMC’s other major initiative announced earlier this year: the Acadia Virtual Computing Environment (VCE) joint venture with Cisco Systems, headed up by Michael Capellas. The Acadia JV combines EMC’s storage and its VMware virtualization software with Cisco Systems’ compute nodes and networking. VCE is built on the concept of modular building blocks, called vblocks, that marry computing horsepower to storage capacity. All that’s missing from that story is a data warehouse DBMS to make it a full-on data warehouse appliance, right?

There are two big problems with these assumptions…


Performance: For all the discussion about “scale” and “big data” in the EMC announcement, there is no mention of how either party will address the real issue that mainstream enterprises face every single day with their data warehouse systems: how to get maximum performance out of a complex, highly concurrent operational environment where hundreds, if not thousands, of users are banging away on the system night and day.

  • The fact is that the actual Greenplum target market has clearly NOT been one that focused on high-performance analytics over the past several years. Instead, the few wins publicly announced by the company have been for very high capacity, limited compute platforms – applications more commonly referred to as “queryable archive”.
  • Curt Monash today again mentioned Greenplum’s lack of support for the “high-concurrency” requirements of a mainstream data warehouse.
  • This looks much more like adding a very basic set of storage-centric data warehousing capabilities in a move to find a broader channel for EMC’s traditional storage products rather than any strategic move into the world of high performance data analytics. Further to this point, neither company has done much of anything to address a very strong trend in the mainstream data warehouse market – the marriage of advanced, predictive analytics into the busy data warehouse systems.
  • David Vellante confirmed that, to be successful, the EMC/Greenplum marriage will need to yield “optimized sytems[sic]; smokin’ fast performance; reference architectures; scale;” and “federation capabilities; not just big honking systems.” We couldn’t agree more, but one can’t help but notice that neither Greenplum nor EMC has brought any of those characteristics to market for data warehousing to date.


Appliances: Since the acquisition is a fairly transparent defense against the moves to the appliance model by the likes of Oracle, Teradata and IBM (as well as Netezza, seven years ago), it’s hard to see how either EMC or Greenplum is now equipped to do battle effectively against those established players.

  • EMC have never really “sold” data warehousing to anyone before, and Greenplum have practically prided themselves on going after “Greenfield” high-capacity applications rather than competing head-to-head against established players. And one need look no further than the limited market penetration of HP’s Neoview to understand that it takes more than simply deep pockets to succeed in the data warehousing market.
  • Greenplum is not a purveyor of “integrated appliances”; at best, it can hope to help EMC make their joint product offering a little more of an “appliance-y idea” (hat tip to Dr. Monash for coining the term). Over the past several years, Greenplum has instead fashioned itself as a software-only solution.
  • Assume that the Acadia VCE and “vblock” application is a big piece of this strategy. Neither Cisco nor EMC would claim that their servers, networking or storage arrays offer the lowest price-per-bit or price-per-performance alternative in the market. So one needs to think about what that means in terms of the price-performance competitiveness of this new “appliance-y” joint product.


In short, Greenplum joins the pantheon of “interesting” acquisitions for EMC as it will certainly stir some news cycles and drive some analysts and bloggers to create “fresh, new” content; but it’s not really something that I think will register on the Richter scale of customer market share.



It may have been the result of a misunderstanding or a comment heard out of context. But whatever the background for the commentary, let me simply state that Netezza is completely committed to the success of the Netezza Migrator and all the other Netezza products and functionality launched at Enzee Universe 2010 this past week. Migrator eliminates a potential barrier to TwinFin™ adoption (i.e., migration costs) and logically should lead to easier acceptance and broader system sales for Netezza. Furthermore, our partnership with EnterpriseDB at both the corporate and technical levels has been and remains extremely solid and strong.

As I stated in the announcement of the product, “The Netezza Migrator product allows organizations to make data warehouse migration decisions independent of proprietary software lock-in. Organizations using data integration and BI applications with embedded Oracle-proprietary database constructs, interfaces and utilities can now more easily manage their migration from Oracle to a TwinFin appliance. The Netezza Migrator will allow our customers to achieve the performance, scale and cost advantages of their TwinFin systems while maintaining their prior investment in proprietary software.” The Netezza Migrator is specifically designed to reduce the time, complexity and costs our customers face in moving their IT applications to the Netezza TwinFin platform.

With Migrator, Netezza’s customers will be able to extract themselves from the dreaded “Oracle lock-in” of functions and procedures written using Oracle-proprietary techniques, and they can decide which of their applications to migrate directly to Netezza, and when, at their own pace. Its capabilities go well beyond the extremely limited capabilities provided by Oracle’s own ‘Database Gateway for ODBC’. Migrator provides an Oracle-compatible wrapper around Netezza that is optimized in ways that Oracle could never hope, nor deign, to match with its “Heterogeneous Services” functionality, including support for Netezza syntax pushdown, a high-speed API, and Netezza user-defined functions.

Migrator makes it even easier for Netezza’s customers to move their entire data warehouse from Oracle to Netezza. In short, this is something we feel is extremely valuable for Netezza and particularly “liberating” for our customers.


A loyal customer alerted us to an Oracle blog post by Jean-Pierre Dijcks earlier today that showed the Oracle FUD machine is fully revved up and ready to go. I'd like to offer a rebuttal; however, in the interest of not intruding on Jean-Pierre's entry with an overly long comment, I've just put a short response on his blog post with a pointer to this one.


Misconceptions and Misunderstandings, or Errors and Plain-old FUD?

I’m writing to correct *just a few* of the misconceptions about what is really important in high-performance, scalable data warehouse systems, along with the errors and plain-old “competitive FUD” in Jean-Pierre's posting earlier today. We have certainly posted some information recently about the TwinFin product, and Curt Monash’s postings late Thursday provided more. If Jean-Pierre's readers are interested in learning more, or even in signing up for a “Test Drive”, they should visit www.netezza.com.

First off, I think this is a “banner day” for Netezza. We believe that TwinFin (and the other products in the new product family) extends both our performance and price-performance advantages over our competitors. We stand by our marketing statements that we regularly demonstrate 10-100X performance advantages over our competitors, particularly the offerings of the major incumbent DW system vendors. (“Just who are those incumbents?” Jean-Pierre's readers may ask. Well, let’s just say that we see Oracle as the incumbent and/or a challenger system in over 50% of our deal flow.)

Regarding his claims about Oracle’s Database Machine (DBM) being “faster than Netezza” (and I can only assume he meant at “real” data warehouse tasks), we’re ready whenever Oracle feels up to actually taking one of their Database Machines onsite to a customer for a fair, open benchmark. So far, Oracle have been, shall we say, “a little reticent” about on-site benchmark testing against Netezza.

Next, given the large number of incorrect points in the original posting, I think addressing just a few of them will be enough for readers to get the gist of just how far afield some of the ‘facts’ are:

  • “It all comes down to data scan rates per rack”: If only all of data warehousing boiled down to full-stream data scans (as if the entire world of analytics relied on “select count(*) from lineitem” types of queries), we could all measure “goodness” by how many GB/sec of data could be burst-scanned in our systems. But that’s not the case. So we build Netezza’s data and analytic appliances to deliver the best possible overall performance at the best price and power requirements. As a consequence, and following from those same numbers as posted, a single rack of TwinFin can process (not just scan) about 400 million rows of data per second. That’s process, as in: “scan, decompress, project, restrict, AND join, etc.” Need more processing firepower? Netezza’s system performance scales linearly with the addition of more S-Blades: at the low end, the TwinFin 3 can deliver as much as 100M rows/second of processing horsepower, while the TwinFin 120 can provide you with 4 billion rows/second. Does a system that still relies on SMP-based servers running “plain old” Oracle 11g RAC scale similarly for data warehousing?


  • “Non-open Linux running on FPGAs”: I’m really not sure what (if anything) was meant by this, but saying that Netezza’s FPGAs “are apparently running non-open Linux” is oxymoronic on at least two levels (FPGAs don’t typically “run” an OS, and “non-open Linux”, really?)


  • “User data & compression”: I also enjoyed the accounting of all that “user data” available to DBM users in the Oracle table and the various comments about compression. When Netezza quotes user data capacities for our systems, the numbers reflect real raw user data space, not space that will be further reduced by the indexes required in an attempt to boost performance. Furthermore, Netezza’s compression and decompression techniques let us extract “pure performance” from their use. Rather than spending CPU cycles to decompress the data before it can be processed any further, our FPGA engines decompress the data on the fly, as fast as it streams off the disk drives. Can Oracle make either of those claims?


  • “Tolerating node failures without downtime”: In perhaps the most bald-faced inaccuracy, the Oracle blog claimed that Netezza “continues to lack the ability to tolerate node failures without downtime”. I can only chalk this up to pure competitive “FUD-ism”, as our capabilities in this area have been strong throughout the four generations of Netezza appliances and are further strengthened in TwinFin. Netezza is a fully redundant system with no single point of failure, even in our smallest systems. Failover after a failure of the disk drives, S-Blades, internal networking or host processors (in short, everything) is automatic and done in service, with hot-swappable replacement throughout.


  • “Appliance simplicity”: One thing Jean-Pierre didn’t address, and on which his take might have been amusing, is the notion of “appliance simplicity”: the ability to build, support and maintain large to very large data warehouses, with heavy workloads, with little or no tuning, partitioning, indexing or other “performance duct tape” required. This capability is routinely what delights Netezza customers most, and we have customers managing systems with several hundred terabytes of user data (not indexes + data, mind you, but real data) with a fraction of an FTE (full-time employee) devoted to them.
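The throughput figures quoted in the first bullet (100M rows/second for a TwinFin 3, about 400M rows/second for a single rack, 4 billion rows/second for a TwinFin 120) can be checked against the linear-scaling claim with a little arithmetic. The Python sketch below is a back-of-the-envelope illustration only, and it rests on our own assumption, not on a published specification: that each TwinFin model number equals its S-Blade count and that the single-rack figure corresponds to 12 S-Blades. The helper names are hypothetical.

```python
import math

# ASSUMPTION (not from a Netezza spec): model number == S-Blade count,
# and the quoted "single rack" figure corresponds to 12 S-Blades.
QUOTED_ROWS_PER_SEC = {
    3: 100e6,    # TwinFin 3: "as much as 100M rows/second"
    12: 400e6,   # single rack: "about 400 million rows of data per second"
    120: 4e9,    # TwinFin 120: "4 billion rows/second"
}


def per_blade_rate(blades: int, rows_per_sec: float) -> float:
    """Rows per second contributed by each S-Blade (hypothetical helper)."""
    return rows_per_sec / blades


def projected_rate(blades: int, rate_per_blade: float) -> float:
    """Linear scaling model: total throughput = blades x per-blade rate."""
    return blades * rate_per_blade


def scales_linearly(quoted: dict, rel_tol: float = 0.01) -> bool:
    """True if every configuration implies (nearly) the same per-blade rate."""
    rates = [per_blade_rate(n, r) for n, r in quoted.items()]
    return all(math.isclose(r, rates[0], rel_tol=rel_tol) for r in rates)


if __name__ == "__main__":
    for n, rows in QUOTED_ROWS_PER_SEC.items():
        print(f"TwinFin {n:>3}: {per_blade_rate(n, rows) / 1e6:.1f}M rows/s per S-Blade")
    print("Consistent with linear scaling:", scales_linearly(QUOTED_ROWS_PER_SEC))
```

Under that assumption, each configuration works out to roughly 33.3M rows/second per S-Blade, which is what linear scaling predicts; if the blade counts differ from our guess, the per-blade figures change accordingly.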


I hope that clears up some of the misconceptions. If any of Jean-Pierre's readers or Oracle customers would like to see or hear more about TwinFin for themselves, we invite them to stop by our booth (#207) at TDWI or to come to one of our regional Enzee Universe events coming to a location near you.
