
Thinking Inside the Box

9 Posts tagged with the netezza tag

      

In a recent blog, Greg Rahn of Oracle responded to Phil’s “Oracle Exadata and Netezza TwinFin Compared” eBook; before commenting on an Oracle engineer’s views, I’ll restate the eBook’s larger themes.

 

Exadata connects Oracle’s RAC database, its architecture designed for online transaction processing (OLTP), via a fast network to a massively parallel processing storage tier. As an OLTP database paired with a specialized storage subsystem, tuning Exadata to function as a data warehouse is complicated and demands skilled, highly trained, experienced technical staff. Mitigating the shortcoming of an OLTP database pressed into service as an analytic database with expensive network and storage makes Exadata costly: to acquire; to design, tune and maintain as an optimally-configured data warehouse; to run in the data center.

 

Netezza TwinFin, designed as an analytic database, brings the power of massively parallel processing to manage and exploit data at terabyte-to-petabyte scale. TwinFin is an appliance–easy to install, easy to operate and easy to manage. TwinFin offers value: fast performance for advanced analytics at an affordable price.

 

Now I’ll discuss the detail of Greg’s blog and respond from a Netezza perspective.

 

Claim: Exadata Smart Scan does not work with index-organized tables or clustered tables.

 

Greg responds that “IOTs and clustered tables are both structures optimized for fast primary key access, like the type of access in OLTP workloads, not data warehousing” and suggests our intent was to mislead by quoting from an old Oracle datasheet. It wasn’t. Oracle 11g Release 2 documentation reads “Index-organized tables are suitable for modeling application-specific index structures. For example, content-based information retrieval applications containing text, image and audio data require inverted indexes that can be effectively modeled using index-organized tables.” Elsewhere the documentation states “Index-organized tables are useful when related pieces of data must be stored together or data must be physically stored in a specific order. This type of table is often used for information retrieval, spatial and OLAP applications.” In the eBook Phil discusses first and second generation data warehouses; many of the applications described by Oracle as candidates for IOTs are typical of those our customers run on TwinFin – these are second generation data warehouse applications. Greg believes Exadata Smart Scan not working with index-organized tables has zero impact on Exadata customers. Is it reasonable to conclude that Exadata is not being used for second generation data warehousing?

 

Claim: Exadata Smart Scan does not work with the TIMESTAMP datatype.

 

Since we published the first edition of the eBook Christian Antognini, the original source of this information, goes to the heart of the matter in his blog: “The essential thing to understand is that this limitation is due to bug 9682721. The fix is expected to be part of 11.2.0.2. According to my test cases (that Greg Rahn was so kind to execute against an early release of 11.2.0.2), offloading works correctly for all datetime functions but for the following three predicates.

 

  • months_between(d,sysdate) = 0
  • months_between(d,current_date) = 0
  • months_between(d,to_date('01-01-2010','DD-MM-YYYY')) = 0”


Note that the MONTHS_BETWEEN function can basically be offloaded. The problem in these cases is that the offloading does not work when, for example, SYSDATE is used as a parameter.

While happy to let this one pass, I have a question. Do organizations accrue value or cost from a technology requiring its administrators understand all combinations of functions, their predicates and their parameters before they are capable of designing queries to be processed in parallel?

 

Claim: When transactions (insert, update, delete) are operating against the data warehouse concurrent with query activity, smart scans are disabled. Dirty buffers turn off smart scan.

 

In my opening comments I compared TwinFin’s simplicity to the complexity of Exadata. All queries submitted to TwinFin are processed in its massively parallel grid; no tuning, no special database design. This is appliance simplicity. In Exadata whether a query benefits from smart scans (massively parallel processing) can depend on the state of the data being read. Exadata requires developers to understand at great depth the physical path a query takes to access data. This is complexity.

 

While Greg concedes Exadata’s MPP processing is disabled for those blocks containing an active transaction he is confident that “Not having Smart Scan for small number of blocks will have a negligible impact on performance”. My experience with Netezza’s customers and their applications prompts me to take a more circumspect view. I’ll explain why in the next section.

 

Claim: Using [a shared-disk] architecture for a data warehouse platform raises concern that contention for the shared resource imposes limits on the amount of data the database can process and the number of queries it can run concurrently.

 

Greg argues contention for shared disk is not a problem for Exadata and cites Daniel Abadi’s blog in his defense. Let’s take a look at what Daniel says on this subject: “If you are going to make an argument that shared-disk causes scalability problems, you have to make the argument that contention for the one shared resource in a shared-disk system is high enough to cause a performance bottleneck in the system - namely, you have to argue that the network connection between the servers and the shared-disk is a bottleneck.” This is the argument Phil makes in our eBook. Consider a query analyzing correlations between equity trades in a sector of a stock market. The algorithm calculates Spearman’s rank correlation coefficient (Spearman’s rho), measuring statistical dependence between two variables by assessing how well the relationship between them can be described by a monotonic function. This analysis creates valuable insight into whether specific equities influence the behavior of other equities in the same market sector within a window of one to ten minutes.

 

The customer loads a massive volume of trading data into TwinFin and constantly trickle feeds data from live markets into the warehouse. The query is run and re-run constantly to assess behavior of different equities in dynamic markets. Each time, TwinFin completes a Cartesian join between all the equities in the sector while at the same time calculating a Volume-Weighted Average Price and a Return From Previous Close value for the equity under investigation. The results pass to Spearman’s rank correlation coefficient function to calculate the Population Covariance and the standard deviation of every equity combination for the time period. Netezza executes every step of the query in parallel utilizing all TwinFin’s hardware and software resources. Netezza’s intelligent storage selects only the rows needed for that market sector and projects only the columns needed for assessment. The join result is directly streamed to the code implementing the statistical analysis which TwinFin downloads to every processor in its MPP grid, running the complex calculations in parallel. Results from each node in the MPP grid are returned via the network to the host for final assembly and rendering back to the requesting application. TwinFin completes the analysis in a few minutes, and then runs it again, and again for as long as the market is open.
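To make the calculation concrete, here is a minimal single-machine Python sketch of the arithmetic described above: a volume-weighted average price, and Spearman's rho computed as the population covariance of the ranks divided by the product of their standard deviations. It is an illustration only, not the customer's code or anything that runs on TwinFin; the function names, ticker symbols and price series are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def vwap(trades):
    """Volume-weighted average price for a list of (price, volume) trades."""
    total_volume = sum(volume for _, volume in trades)
    return sum(price * volume for price, volume in trades) / total_volume


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Population covariance of the ranks over the product of their std devs."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    covariance = mean((a - mx) * (b - my) for a, b in zip(rx, ry))
    return covariance / (pstdev(rx) * pstdev(ry))


# Hypothetical per-minute VWAP series for three made-up tickers.
series = {
    "AAA": [10.0, 10.2, 10.1, 10.4, 10.6],
    "BBB": [20.1, 20.5, 20.3, 20.9, 21.4],
    "CCC": [5.5, 5.3, 5.4, 5.1, 5.0],
}

# Every pairing of equities in the sector, as in the join described above.
pairwise = {
    (a, b): spearman_rho(series[a], series[b])
    for a, b in combinations(sorted(series), 2)
}
```

In this toy data AAA and BBB move in the same rank order while CCC moves in the opposite order. The number of pairs grows roughly with the square of the number of equities, which is why the join dominates the cost at market scale.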

 

After several hours Oracle 10g was still attempting to complete its first round of analysis. What difference will a new version of the Oracle database paired with an MPP storage system and a fast network make? Exadata’s MPP storage grid is unable to process Cartesian joins, the first step in this analytic process, meaning it brings no performance gain but must put all records on the network and send them across to Oracle RAC. Even if it were able to process the join, Exadata cannot push down user-defined functions, used to implement the calculations, to MPP - in Oracle, functions always execute on the RAC servers. In processing the algorithms Oracle must create and manage temporary data sets and write these out of memory for storage. Exadata’s flash cache may play some role here, but the size of the data sets and the complexity of the algorithms will force database processes to write to disk. This flow from Oracle RAC goes back across a network still clogged with data coming from the MPP storage tier, queued and unprocessed, waiting for attention from a fully-consumed Oracle RAC. I contend that Exadata’s network connection between the servers and the shared-disk is a bottleneck, and not Exadata’s only bottleneck. TwinFin demonstrates how a true MPP architecture excels in calculating Spearman’s rank correlation coefficient - a real workload on a real dataset. Oracle’s OLTP database, simply not designed to process large-scale analytics, is overwhelmed. Exadata suffers contention on its network and in its database system’s shared disk architecture.

 

Back to the previous point about Exadata’s MPP processing being disabled for blocks containing an active transaction – the customer is constantly loading new market data and analyzing it in comparison with a massive volume of historic data. While entirely appropriate for transaction processing, Exadata’s architecture of disabling an entire block from parallel processing when a single record in the block is being updated can only hinder and never help in the data warehouse. The very point of a data warehouse is that all data should be available to the business as quickly as extract-transform-load processing allows. By pressing an OLTP database into service as an analytical database Oracle unnecessarily burdens customers with creating database designs to work around this complexity and developing a thorough understanding of how each query accesses the data model. While not having Smart Scan for a small number of blocks may or may not impact performance, as an unnecessary complexity demanding the attention of database specialists, it costs customers real money.

 

Claim: Analytical queries, such as “find all shopping baskets sold last month in Washington State, Oregon and California containing product X with product Y and with a total value more than $35” must retrieve much larger data sets, all of which must be moved from storage to database.

 

Greg shows some nice SQL to demonstrate how Exadata processes the beer and pizza query. Give the business an answer and they always come back with a new question: “Greg, what was the total value of ‘Brand #42 beer’ sold in each basket?” Greg can now update his SQL with the clause:

 

sum(case when p.product_description in ('Brand #42 beer') then td.sales_dollar_amt else 0 end) sum_productX,

 

and re-run the query. Business users love IT when we give them a fast performing system but are less forgiving when a query, that yesterday ran blazingly fast, today slows to a snail’s pace. Exadata cannot push down the newly introduced sum for parallel processing by its storage nodes as the join must be processed first, and the storage nodes cannot process joins. Any function or calculation that uses columns from two or more tables must be evaluated on the RAC database servers. The query performance is going to degrade significantly sending the database expert back to the Oracle documentation in an attempt to find a new way to resolve the amended query so it completes at a time acceptable to the business.
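A small, self-contained Python sketch using the standard-library sqlite3 module shows why the new clause depends on the join: the CASE expression reads product_description from one table and sales_dollar_amt from another, so it can only be evaluated on joined rows. SQLite stands in here purely for illustration; the table layouts, the sample rows and the basket_total column are assumptions, and only the sum_productX clause comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema; the column names follow the clause quoted above.
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_description TEXT
    );
    CREATE TABLE transaction_detail (
        basket_id INTEGER,
        product_id INTEGER,
        sales_dollar_amt REAL
    );
    INSERT INTO product VALUES
        (1, 'Brand #42 beer'), (2, 'Frozen pizza'), (3, 'Chips');
    INSERT INTO transaction_detail VALUES
        (100, 1, 12.0), (100, 2, 30.0),
        (101, 1, 12.0), (101, 1, 12.0), (101, 3, 4.0),
        (102, 2, 30.0);
""")

# The conditional sum mixes columns from both tables, so the join must be
# resolved before the aggregate can be computed.
query = """
    SELECT td.basket_id,
           sum(case when p.product_description in ('Brand #42 beer')
                    then td.sales_dollar_amt else 0 end) AS sum_productX,
           sum(td.sales_dollar_amt) AS basket_total
    FROM transaction_detail td
    JOIN product p ON p.product_id = td.product_id
    GROUP BY td.basket_id
    ORDER BY td.basket_id
"""
rows = conn.execute(query).fetchall()
for basket_id, sum_product_x, basket_total in rows:
    print(basket_id, sum_product_x, basket_total)
```

Where that join runs, in the storage tier or on the database servers, is exactly the architectural question at issue.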

 

Claim: To evenly distribute data across Exadata’s grid of storage servers requires administrators trained and experienced in designing, managing and maintaining complex partitions, files, tablespaces, indices, tables and block/extent sizes.

 

While we concede that Oracle Automatic Storage Management automates the task of striping partitions across all available disks, the ASM administration team must still create partitions, configure and manage disk groups for shared storage across instances, choose and implement either 2-way mirroring or 3-way mirroring, and configure Allocation Unit sizes. Additionally, Exadata configuration requires administrators to create and manage tablespaces, index spaces, temp spaces, logs and extents.

 

In conclusion, Netezza entered the data warehouse market convinced the products offered by the dominant vendors, in particular Oracle, were ill-suited to meet the challenges of Big Data and of such complexity as to make them exorbitantly expensive to acquire and use. Exadata only increases the complexity and expense of an Oracle warehouse. Greg draws his readers’ attention to the excellent blog at http://dbmsmusings.blogspot.com/ where Daniel Abadi muses “Both Oracle and Teradata are too expensive for large parts of the analytical database market.”

 

Greg’s blog reveals one path available to organizations wishing to generate greater value from their data. CIOs willing to build, train, and permanently assign a team of technical experts can keep that team continuously employed choosing just the right combination from a myriad of settings, coercing a database designed for OLTP to function as a data warehouse. I’ll close this blog with a manager’s perspective, from someone who focuses an organization’s limited resources on its highest priorities. Peter Drucker, who introduced us to the concept of the knowledge worker, gave us a pragmatic measure to evaluate our own and our team members’ activity - am I merely efficient (doing things right) or truly effective (doing the right thing)? All the workarounds and clever tuning demanded by Exadata simply don’t exist in TwinFin; Netezza has proven them unnecessary.


Netezza Director of Product Marketing Razi Raziuddin is blogging today.


     

I’ve been at the 2010 TDWI World Conference in San Diego this week, where the theme is “agile BI that delivers data (I would use the term ‘insights’) at the speed of thought.” Timing is everything when it comes to making decisions – and influencing others to make decisions we’d like to see.

 

We’ve all experienced Red Car Syndrome at some point or another. You test drive a red car. You like it. Suddenly, you start noticing red cars everywhere – not because the number of red cars has increased, but because the experience of driving a red car is now personalized. Online advertisers use Red Car Syndrome to connect consumers with the products they genuinely want, as I was reminded first-hand recently. While searching for kitchen fixtures online, I noticed that many of the ads featured a pair of pricey fixtures that initially caught our eye, but that we had rejected as exceeding our budget. But the ads seemed to know our tastes better than we did, and ultimately we succumbed and made the purchase.

 


 

The experience brought home the power of right-time analytics. Speed is critical in making analytics actionable and delivering real value to the business. The trifecta of huge data volumes, complex analytics and query performance is an increasingly common thread in the BI and data warehousing world. It is true not just for online marketers, but cuts across industry lines. Whether it is an insurance provider trying to prevent fraud, a telco determining the cheapest and best path to route a call or a government agency unearthing criminal activity, time to insight from big data makes the difference in every case.

 

Doug Henschen recently wrote a good article on this topic for InformationWeek in which he calls out success in the Big Data era as the ability to get faster insights from huge data sets. The article highlights Catalina Marketing’s petascale data warehouse environment and the fast insights they derive from a huge database of 195 million consumers.

 

Although not every enterprise has a data warehouse environment quite that large, the need to perform complex analytics and derive insight in the shortest time possible is common in every environment, big or small. While scalable MPP architectures address the big data problem quite well, the big math problem associated with complex and advanced analytics is what many customers still wrestle with. There’s general agreement that in-database processing, especially in scalable MPP systems, is the right solution to the big math problem. Doug’s article again highlights Catalina’s use of in-database analytics to radically streamline their analytic modeling environment and gain efficiencies of 10X as a result.

 

However, not every data warehouse platform is geared up for the challenges of performing in-database analytics at scale. The first and obvious challenge is the additional processing overhead required to run advanced analytic algorithms alongside the traditional data warehouse workload. You need a system architecture that is not overwhelmed by the data volumes typical of data warehouses in the Big Data era. Then there is the question of what analytics you want to perform. The majority of commonly available analytic libraries are written for in-memory processing in SMP systems and need to be parallelized in order to take advantage of MPP architectures. The analytic system should not only offer parallelized versions of the analytics you desire, but also provide primitives to easily parallelize advanced analytic algorithms while hiding the complexity of parallel programming from developers.

 

Finally, the dearth of universally accepted standards in the advanced analytics world poses yet another challenge. A typical analytic environment may consist of a mish-mash of commercially available tools such as SAS and SPSS, open source ones such as R and Hadoop (which are gaining popularity), and tons of application code written in various languages such as Java and Python. The underlying system must offer tremendous flexibility in integrating with a wide array of analytic tools and support for a variety of frameworks and languages.

 

In subsequent posts, I’ll talk about Netezza’s advanced analytic capabilities to enable big math on big data. In the meantime, as you plan your analytic infrastructures for the Big Data era, tell us what challenges you are coming up against.


Two things before I begin:

  • I’ll begin this posting with a call for inputs. Below I will list a few of the most common Hadoop/Netezza co-existence deployment patterns we have seen to date. But I would like to hear from others. As you see the continuing deployment of Hadoop in the enterprise and as the Second Wave of TwinFin™ comes on with the advanced analytics capabilities of i-Class, how do you see the evolving deployment patterns happening in your environment?

  • A special hat-tip to Krishnan Parasuraman, Netezza’s Chief Architect for our Digital Media group, for his excellent help in aiding and abetting this post! I have used his guidance gratefully and (with his permission) stolen freely from some of his inputs.

 

You may have noticed a partnership announcement made by Cloudera and Netezza late last week. Together with Cloudera, Netezza will open up data movement and transformation between Cloudera’s Distribution for Hadoop and the Netezza family of appliances, enabling applications and data flows that integrate the two systems. We expect that our partnership with Cloudera, together with the Hadoop support in Netezza’s i-Class™ set of advanced analytics capabilities that are included as part of the upcoming 6.0 software release, will lead to some very innovative and expansive applications for our customers and for both companies.

 

Even today, Netezza customers are doing some very interesting things with deployment of Hadoop and our TwinFin data warehouse appliance. Far from being the “Hadoop v. SQL” battle that some people might like to make the current market out to be, we have instead noticed a growing number of “co-existence” deployment strategies and design patterns already at work with our customers – particularly among customers in the “Digital Media” vertical market.

 

These types of strategies can play to the strengths of both technologies and roughly break down into two categories: 1) the use of a Hadoop Cluster for data ingestion, which I’ll write about in further detail today; and 2) using a Hadoop Cluster for long-term data retention, or as a “queryable archive,” for which I’ll go into further detail in a post later this week.

 

Using a Hadoop Cluster for Raw Data Ingestion

The use of a Hadoop Cluster as the engine for data ingestion is the most common “co-existence” pattern we see in our customers’ mutual deployments of Hadoop and Netezza. The deployment pattern typically arises when the customer has hit specific performance and processing throughput scalability limitations with their existing Data Integration or ETL implementation.

 

Raw weblog data is the primary data source for most Digital Media analytics and reporting requirements. Weblogs are data rich (e.g., page views, impressions, click-throughs and demographics collected from application servers). They are typically semi-structured and are collected and stored in flat files.

 

There are some critical facts about weblogs that present real performance challenges in processing them:

  • sheer volume: millions of rows of weblog data collected throughout the day and loaded daily into the data warehouse;
  • complex query processing: parsing and decoding encoded character strings requires text processing, pattern matching and tokenizing capabilities within the ETL process;
  • non-conformed dimensions: page view or impression data is defined and represented differently by various systems, so fitting it into conformed dimensions is another very common data ingestion and processing challenge.

 

There are two common variants of this pattern – dealing with semi-structured (e.g., weblogs) and unstructured (e.g., text) data and often customers will have versions of both variants in operation simultaneously.

 


Semi-structured data ingest via Hadoop

 

Semi-structured data is parsed (and possibly aggregated as well) in the Hadoop Cluster and then loaded into a TwinFin where the performance and workload scaling of the appliance is important for deeper analysis, higher throughput and faster reporting.
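As a rough illustration of the parse-then-aggregate step, the following Python sketch mimics a map phase (parse and decode each weblog line) and a reduce phase (count page views per day and page) on a single machine. It is not Hadoop code and not any customer's pipeline; the log layout (a Common Log Format style line), the function names and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Assumed layout: ip, two ignored fields, [timestamp], "METHOD url protocol",
# status code, response size.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)


def map_line(line):
    """Map step: parse one raw line into a (day, page) key, or None to skip it."""
    match = LOG_PATTERN.match(line.strip())
    if match is None or match.group("status") != "200":
        return None
    # Drop the query string, then decode percent-encoded characters.
    page = unquote(match.group("url").split("?", 1)[0])
    day = match.group("ts").split(":", 1)[0]
    return (day, page)


def reduce_counts(keys):
    """Reduce step: count page views per (day, page) key."""
    return Counter(key for key in keys if key is not None)


def to_load_rows(counts):
    """Flatten the counts into sorted rows ready for a bulk load."""
    return sorted((day, page, views) for (day, page), views in counts.items())


sample_lines = [
    '10.0.0.1 - - [11/May/2010:10:15:32 +0000] "GET /products%2Fbeer?ref=ad HTTP/1.1" 200 512',
    '10.0.0.2 - - [11/May/2010:10:16:01 +0000] "GET /products/beer HTTP/1.1" 200 498',
    '10.0.0.3 - - [11/May/2010:10:17:45 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a weblog line at all',
    '10.0.0.1 - - [12/May/2010:08:00:00 +0000] "GET /home HTTP/1.1" 200 1024',
]

rows = to_load_rows(reduce_counts(map_line(line) for line in sample_lines))
for row in rows:
    print(row)
```

The summarized rows are what would then be loaded into the warehouse; keeping the atomic events instead would simply mean loading the output of the map step.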

 

 


Unstructured data ingest via Hadoop

 

Unstructured data in this pattern is contextualized (classified, mined, keyworded and indexed) in Hadoop and then moved into a Netezza TwinFin appliance for the low-latency, high-performance analytics used to drive business decisions.

 

 

A Hadoop Cluster provides a scalable ingestion mechanism that is well suited for addressing the challenges described above. The Cluster can be incrementally scaled to handle ingesting the massive volumes of weblog data and it can support text processing and complex data processing through programming languages such as Java or Python. [Note that with the coming i-Class set of analytics functionality, the programmability and some of the complex data processing may also be possible on the TwinFin, depending on a customer’s application needs or preference.]

 

Following the data ingest steps, processed weblog information is brought into TwinFin as atomic event information or as summarized tables, depending on the size of the appliance and analytic maturity & scale of the organization where it is deployed. A typical deployment might look like the following diagram:

Hadoop-NZ Arch 1.jpg

 

 

An alternate, far less common, deployment design of the above co-existence pattern is used by some of our customers. That is the use of an external elastic MapReduce cloud (such as the Amazon Cloud) for the data ingestion purposes.

 

In cases where the customer has its application servers in Amazon’s EC2 cluster, it may also choose to use Amazon’s S3 web services for retaining weblog data. In that case, Amazon would provide the elastic MapReduce infrastructure for the data ingest process into the TwinFin appliance. This alternative deployment scenario would look something like the following:

Hadoop-NZ Arch 2.jpg

 

 

The bottom line is that the different strengths of TwinFin and Hadoop lend themselves to complementary deployments – and some of our customers have already discovered innovative ways to leverage them together to maximize the value of both their investments.

 

In my next post, I’ll discuss the second pattern we’re noticing: one in which Netezza customers are using the Hadoop Cluster for long-term data retention.

"I cannot imagine life without Netezza." from a tweet by "noogle" (Twitter, 11 May 2010)

 

800px-Shibuya_night.jpg

[Photo credit: 2006 photo of “the scramble” intersection in the Shibuya district of Tokyo, courtesy of "Bantosh" and Wikipedia]

 

Late in May, members of my team and I were in Tokyo's ultra-bustling Shibuya district for a few days on our "worldwide whirlwind training tour" with the global field sales teams regarding the details of the TwinFin i-Class product offering. The late-night scene, hairstyles and outfits there border on the outrageously-hip. There are high-def billboards, electronic gadgets, and of course the bright lights of retailers, bars, clubs and restaurants all through Shibuya. Advances in high technology are virtually 2nd nature to the people there. So with that as the backdrop, imagine the surprise of hearing over a beer or two (see earlier reference to bars & clubs in Shibuya) that a customer "could not live without Netezza".

 

We're proud of our highly referenceable customer base at Netezza and our "easy to do business with" relationships with our customers and partners. In my six years with the company I've met a pretty fair number of really enthusiastic customers including people who held "welcome" parties for their Netezza systems, use "Netezza" as a verb ("Did you Netezza that data?") or even an adverb ("It's Netezza easy."). But I can't recall any customer who said that they, "could not imagine living without Netezza".

 

Simple self-promotion is not the real point of this post though. The real point is the dilemma that noogle presents us with in his 40+ word tweet. It's something that business managers and analysts face on a daily basis: what is more important –

  • being able to look for strategic and/or tactical competitive nuggets by performing SQL OLAP analytics on their full, atomic-level dataset; or
  • looking for that guidance by using advanced analytical toolsets on subsets or aggregations of their data that are extracted from the data warehouse?

 

Here's the whole tweet by noogle in its original form:

データが莫大になると分析が不可能になる。少ないデータを複雑なアルゴリズムで分析するよりも、莫大なデータを単純なアルゴリズムで分析する方が有益。統計学とは逆。アホかという量のデータ分析の手助けするのがNetezza。もうあたしはNetezzaの無い世界では生きていけない。

 

And here's a translation of it into English [parenthetical comments and emphasis are mine]:

When data is huge, complex analytics are impossible. It’s far more beneficial analyzing massive data with simple [SQL] logic, rather than analyzing small data with complicated analysis. This is opposite of statistics [based on sampling techniques]. Analyzing data which is “crazy massive” is Netezza. I cannot imagine life without Netezza.

 

It turns out that noogle is a long-time user of advanced analytics and predictive techniques. He knows their value, but his tweet exposes a weakness of today's typical analytical environment. Because advanced analysis cannot be performed inside the database, most of that work (if performed at all) is done in external servers based on data sets that are extracted (filtered, sampled and/or aggregated) from the data warehouse.

 

That adds latency to do the extraction and limits the "currency" of the data. Depending on whom you ask, it also limits the accuracy of the results. For instance, looking at aggregations or samples may give you a sense of the "big picture" but not necessarily uncover the needle in the haystack (e.g., fraud detection) or the impact of a long tail that can be exploited in a particular business.
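The needle-in-the-haystack point can be shown with a toy Python sketch. The data is entirely synthetic and the numbers are assumptions chosen for illustration: 100,000 transactions, three of them fraudulent, and a roughly 1% systematic sample (every 101st row). The sample reproduces the average transaction amount closely, yet it contains none of the fraudulent rows.

```python
def make_transactions(n=100_000, fraud_ids=(12_345, 54_321, 99_999)):
    """Build synthetic transactions; amounts cycle through 20.0 to 29.0."""
    fraud = set(fraud_ids)
    return [
        {"id": i, "amount": 20.0 + (i % 10), "is_fraud": i in fraud}
        for i in range(n)
    ]


def systematic_sample(rows, every=101):
    """Keep every `every`-th row, starting from the first (about 1% here)."""
    return rows[::every]


def count_fraud(rows):
    """Number of rows flagged as fraudulent."""
    return sum(1 for row in rows if row["is_fraud"])


def mean_amount(rows):
    """Average transaction amount over the given rows."""
    return sum(row["amount"] for row in rows) / len(rows)


full = make_transactions()
sample = systematic_sample(full)

print("rows:       ", len(full), "vs", len(sample))
print("mean amount:", mean_amount(full), "vs", round(mean_amount(sample), 4))
print("fraud found:", count_fraud(full), "vs", count_fraud(sample))
```

A random sample would behave similarly on average: with three rare events in 100,000 rows, a 1% sample will usually contain none of them, even though its aggregates look fine.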

 

So noogle's choice is to use the analytic horsepower of TwinFin over the sampling techniques. But if you are limited to the set-based logic of SQL, perhaps aided by user-defined functions, you are again limited in the predictive visibility that those tools can provide. Faced with the dilemma, this customer chose being able to analyze all the data over statistically sampling and performing advanced analytics on the sample set. Having an answer to that dilemma is precisely what has driven the advent of the i-Class functionality for TwinFin.

 

We're excited about TwinFin i-Class, but I'm interested in what others may have to say about this. Does your company employ advanced analytical techniques and how have you reconciled the "sampling" versus "full data set" questions in your business? And what are the prospects and pitfalls of doing "crazy complicated analytics on crazy massive data" all in one simple, high-performance data warehouse appliance from your perspective?


“The best vision is insight.” -- Malcolm Forbes (1919-1990), publisher of Forbes magazine, New Jersey state senator and adventure hobbyist.

A couple of big announcements from our friends at SAS today. For the industry at large, SAS’ commitment to in-database analytic processing is a confirmation of trends that we have been discussing for over two years: more and more, the “data warehouse” is becoming the hub of all analytics processing for the enterprise. While that announcement covers multiple database vendors, today’s other announcement from Cary, NC on the availability of the “SAS Scoring Accelerator for Netezza” means that we and SAS are immediately putting this recommitted strategy into action.

Of primary importance to Netezza’s customers is the fact that with SAS’ intensification of In-Database functionality, SAS and Netezza will continue working together to deliver ever more advanced analytic capabilities inside the Netezza appliance. And the first step on that path is an excellent one: the availability of the SAS Scoring Accelerator for Netezza means that Netezza’s customers are able to execute SAS scoring models directly within the Netezza appliance and in-line with other SQL query processing on their data. The SAS Scoring Accelerator for Netezza will be Generally Available in early 2010, and Netezza and SAS are already working with a small number of early adopter customers such as Catalina Marketing, as they begin to benefit from this powerful functionality.

These scoring models are used in virtually every vertical market in which Netezza sells our products, for fraud detection, credit and risk analysis and market segmentation. By embedding them in the Netezza appliance, customers will get the same 10-100X market-leading performance on scoring their data as they do on query processing. By running in-database, customers can score all their data and not be reliant on only using samples or aggregates for expediency. And the in-database scoring also means that the inherent delays, or latency, in getting at the data to score it have been eliminated. The best way to deal with the large amounts of data being loaded in today’s data warehouse systems is not to move it unless necessary, so Netezza’s AMPP architecture and method of moving the data processing as close as possible to where data is stored delivers huge performance gains for in-database analytics.
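The idea of scoring in-line with SQL can be sketched in a few lines of Python with the standard-library sqlite3 module, which lets a scoring function be registered and then called from inside a query. This is only an analogy for in-database scoring: it is not the SAS Scoring Accelerator or any Netezza interface, and the table, the function name fraud_score and the logistic-regression coefficients are all hypothetical.

```python
import math
import sqlite3

# Hypothetical coefficients of a small logistic-regression scoring model.
INTERCEPT, W_AMOUNT, W_FOREIGN = -4.0, 0.002, 1.5


def fraud_score(amount, is_foreign):
    """Return a probability-like score between 0 and 1 for one transaction."""
    z = INTERCEPT + W_AMOUNT * amount + W_FOREIGN * is_foreign
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")

# Register the model so SQL can call it where the rows live, with no extract step.
conn.create_function("fraud_score", 2, fraud_score)

conn.executescript("""
    CREATE TABLE card_txn (
        txn_id INTEGER PRIMARY KEY,
        amount REAL,
        is_foreign INTEGER
    );
    INSERT INTO card_txn VALUES (1, 40.0, 0), (2, 2500.0, 1), (3, 900.0, 0);
""")

# Score every row and filter on the score in the same query.
flagged = conn.execute("""
    SELECT txn_id, fraud_score(amount, is_foreign) AS score
    FROM card_txn
    WHERE fraud_score(amount, is_foreign) > 0.5
    ORDER BY txn_id
""").fetchall()

for txn_id, score in flagged:
    print(txn_id, round(score, 3))
```

The design point is that the model travels to the data: every row is scored by the engine that stores it, so nothing is sampled or exported first.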

 


The ongoing partnering work with SAS, and specifically the Scoring Accelerator, is part of the conversation with customers, partners and the market in general that Netezza began with our Enzee Universe world tour in September regarding our vision for the industry and for Netezza. It’s known as “Netezza Insight” and CEO Jim Baum used his keynote addresses in seven cities around the world to begin the dialogue of taking Netezza and the concept of data warehousing “deeper”, “higher” and “wider” in a “unified” enterprise-wide platform approach together with other partners in the community. In smaller settings with customers, partners and analysts since then, we’ve continued that dialogue and generated real excitement as they come to understand the full breadth of what Netezza is enabling in the market.


In coming days, we’ll be writing more about Netezza Insight and how it is manifest in product platforms, features and applications. But for today, let’s just say that SAS and Netezza customers are already able to do more, faster, with our combined products than ever before and that this is just a step toward even more powerful capabilities.

As Rick (Humphrey Bogart) said in the closing scenes of Casablanca, “I think this is the beginning of a beautiful friendship.”

 

 

 

[UPDATE: Rather than just reading what I have to say, you can watch SAS Executive Vice President and CTO Keith Collins describe his take on the value of in-database processing and the Scoring Accelerator for Netezza in the following video from the Enzee Universe 2009 show in Boston.]

 




 


"You stay classy, San Diego." -- Ron Burgundy (Will Ferrell) in "Anchorman" (2004)


This morning a few others from the Netezza Marketing and Product Management teams and I are ensconced by the Marina in sunny San Diego, CA for the TDWI World Conference and for a news announcement or two. And who better to bring us "Breaking News!" than the Number 1 newsman in all of San Diego, Ron Burgundy. [For those of you who might have been "hoping for more" from Ron in a quote about San Diego, you can check out IMDb for some great ones, including Ron's own historical (and hysterical) etymology for the city's name.]



 

Though it’s not exactly a state secret at this point, today we’re launching the 4th generation of Netezza data warehouse and analytic appliances and the first of its four initial product lines: TwinFin™.

 


Some of the core characteristics of the TwinFin and the overall platform are:

  • Resetting Netezza’s price-performance leadership position in the market and extending Netezza’s performance lead;
  • Disrupting the competitive data warehouse market among the incumbents, just as we did with our initial systems in 2003/’04;
  • Moving to a commercially-available, blade-based server and storage platform; and
  • Opening Netezza’s aperture on the broader market with a multi-product platform design to match customers’ data warehouse and analytics needs across their enterprise.


After the market disruption Netezza caused with the introduction of the NPS® in 2003 and since, we have seen the entry of dozens of new startups in our wake and virtually every major incumbent data warehouse vendor has retooled its portfolio to include a “response” to the Data Warehouse Appliance (DWA) in a suddenly reenergized market. Several of them, to their credit, have advanced their value propositions and improved their competitive position.


Now it is Netezza’s time once again. With the introduction of TwinFin and the other members of the new family of products, Netezza is once again changing the game, widening the applicability of our systems to more types of customers, applications and partners in the market.

As stated in my response to Curt Monash last week, we think of this 4th generation of the Netezza appliance as using “the same architecture with a new physical implementation”. Starting with TwinFin, we moved to a commodity blade-server based system framework, but one that still uses Netezza’s “secret sauce” to deliver as much as a 5X increase in performance over the previous generation of Netezza systems, namely:

  • our balanced design and streaming architecture;
  • the use of Field Programmable Gate Array (FPGA) technology as a query processing “turbocharger”; and
  • our advanced MPP management and optimization software.

 

And there are more innovations and performance gains on the way! TwinFin, quite simply, will serve as a platform for expanding Netezza’s performance and price-performance advantage in the industry and as the basis for advancing the state-of-the-art for in-database, analytically intensive data processing; all without sacrificing any of the appliance simplicity with which our company is synonymous.

As a couple of us said last week, Netezza has served as “the benchmark” for high-performance DWA pricing in the industry and we are now leading “the market in pivoting to a new competitive price-performance level”. With these new systems, we have embraced a trend that has been happening around the industry – the movement of marginal cost of a bit of disk storage toward $0 – with system-sizing, pricing and even system numbering focused on the performance delivered by a given platform.

 

We think the new, simplified pricing structure for TwinFin and the other members of the Netezza product family will create a major disruption in the market. With starting (US-based) prices that equate to under $20,000 per terabyte, TwinFin’s list price is a fraction of competitors’ performance-system pricing (after they’re all done playing price-obfuscation games around mirror, swap and index storage).

 

TwinFin and the other new Netezza data and analytic appliance products give us the opportunity to continue to lead the market and provide our customers with the best value and performance possible for all of their data warehouse and analytic processing needs. Netezza TwinFin - because two fins are faster than one.


 

"Don't be afraid to try the greatest sport around

(catch a wave, catch a wave)
Everybody tries it once
Those who don't just have to put it down
You paddle out turn around and raise
And baby that's all there is to the coastline craze
You gotta catch a wave and you're sittin' on top of the world"
– from "Catch a Wave" by The Beach Boys (1963)

Surf's up! Summer seems to finally have arrived in the Boston area and a number of vendors in the data warehousing and analytics space are hoping to catch a wave riding on a flurry of industry announcements. A few trends continue to build in the news:

 

  1. Data sizes continue to grow alongside the pressure to increase performance & shrink data latencies;
  2. Workload complexity and user counts continue to grow;
  3. More and more, customers are seeing the value of running advanced analytical processing directly in their primary data repository (see item #1 for reasons why); and
  4. Industry prices for data warehousing and analytics have begun another shift downward.


Today I'd like to address this last point. According to more than one industry analyst, over the last several years, Netezza has served as "the benchmark" for DWA pricing in the industry. Several of our competitors have sought to match and/or undercut Netezza pricing in the market. Some of the incumbent players have tried to, with very limited success, hinge their pricing off Netezza prices, match the performance of the Netezza Performance Server® system, or inoculate their pricey "flagship" products by adding less-expensive, feature-deficient products to their portfolio. But Netezza has continued to succeed in the marketplace, becoming a profitable, publicly-traded company with nearly 300 customers and 400 employees worldwide and one that is listed among the "Leaders" in the Gartner Magic Quadrant.

 

When we disrupted the data warehousing market with our first generation product in 2003 and 2004, Netezza was one of very few startups in an otherwise moribund industry. Now, with established "street cred" and hundreds of loyal customers, we intend to once again upset our competitors and lead the market in pivoting to a new competitive price-performance level. We're about to launch the fourth generation platform of our data warehouse and analytic appliances, which will advance Netezza's performance leadership and once again establish a new price-performance benchmark.

 

Admittedly, we won't be the first vendor offering high-performance data warehouse systems to move to a lower pricing plateau. That task is usually done by early-stage start-ups looking to find a way to differentiate themselves. True to form, Dataupia can probably claim to have established a lower price point first, and recently another multiyear "start-up" has also started lower. But those are offerings from very modestly-sized startups with no established market "track record". Netezza will be the first company with proven product maturity, customer base and financial viability to do so.

 

Just how and what are we doing to cause this disruption? Well, let's just say things around the "briefing table" have been quite hectic, and that I and others will have more news about that to follow shortly.

 

[As you might imagine, it's been getting more and more difficult to keep things under wraps – in recent weeks we've even had to fight people off from getting early "sneak peeks".]

 

Until then, hey, it's summertime! So here's what I'd recommend –

 

"So take a lesson from a top-notch surfer boy

(catch a wave, catch a wave)
Get yourself a big board
But don't you treat it like a toy
Just get away from the shady turf
And baby go catch some rays on the sunny surf
And when you catch a wave you'll be sittin' on top of the world


Catch a wave and you'll be sittin' on top of the world"

 

 

Twin Fin: A short board (usually 5'8" - 6'8") with a wide tail for maneuverability and a fin near each rail for stability in radical turns.

 

Purpose: A wider tail area provides more planing area and lift, which creates more speed by efficiently utilizing wave energy. Milking speed and energy from smart surf with extremely sensitive and responsive turning ability are this design's strong points.


 

We had quite a surprise the other day when it came to our attention that Netezza and the NPS data warehouse appliance are now the subjects of a new book: Netezza Underground: The unauthorized tales of derring-do and adventures in resilient data warehousing solutions, by David Birmingham (ISBN: 1-4392-0743-7 and now available in paperback version for $31.54 at Amazon.com).

 

 

This is not the first instance of the NPS system being the subject of a book sold by Amazon (e.g., SAS/ACCESS(R) 9.1.3 Supplement for Netezza), but this particular publication certainly brought with it a sense of both fun and of reaching into the mainstream, from its very clever cover art (above) to David's clever turns of phrase and real-life examples.

 

 

As the title suggests, it was written without any Netezza authorization or coordination. So of course we bought a copy and read/skimmed through it as quickly as we could. I will say this: David's self-publication skills are great - he keeps what could easily have been a boring, heavy technical tome both engaging and fun to read while still imparting lots of great information about the NPS system, its performance and its ease of operation. And the book's publication is incredibly current - with references to Netezza Developer Network and "BI Appliance" announcements made as recently as the Enzee Universe user conference in September.

 

 

While I certainly could quibble with a point made here or there about the system, in general I thought it was an excellent book and even put up the following recommendation for it on the Amazon site:

 

I commend David Birmingham on a book that is at once as lightly entertaining and interesting to read as it is chock full of details about just the kind of performance and operational simplicity that is possible with the Netezza Performance Server (NPS) system. Straightaway from the opening pages, Birmingham's effusive, engaging style and excitement about Netezza's system is apparent, "It inhales, crunches and publishes Libraries-of-Congress-at-a-time - and fast."

He also captures the essence of the NPS appliance in an ultra-succinct two-sentence paragraph explaining just why his "Administration Stuff" chapter is so short, "It's an appliance. Put it in the corner and let it work." I couldn't have said it better myself!

This book is comprehensive and current - even reflecting some of the more recent announcements from Netezza regarding OnStream programmability, the Netezza Developer Network and analytic appliances.

As the guy who is responsible for projecting the Netezza products and our technology direction forward, I want to recommend David Birmingham's book to current and prospective customers and partners alike, or as David himself says on the book's Dedication page, "to Enzees everywhere".

--Phil Francisco, VP Product Management & Marketing, Netezza Corporation

So "to Enzees everywhere", have a read of David's book and welcome to the "Netezza Underground".


It was an odd email exchange. Only 30 minutes earlier, at approximately 3:04 pm US-PDT, Oracle CEO Larry Ellison, head of one of the most powerful database technology companies on Earth, had publicly launched Oracle's entrée into the Data Warehouse Appliance marketplace: "the HP Oracle Database Machine and the Oracle Exadata Data Storage Server" - while simultaneously "sporting a curiously Romanesque hair style".

 

 

Larry Ellison & Julius Caesar - separated at birth? (Wikipedia: Julius Caesar)

 

 

Perhaps we should have been cowed by such a goliathan announcement? Perhaps we should have quivered? Well, that's when the email showed up. You see, Netezza had a booth (or "stand" - as I'm writing this from London tonight) in the exposition area of Oracle's big OpenWorld show in San Francisco. Within minutes of Larry's presentation, in which Netezza figured prominently albeit with substantially erroneous information across Mr. Ellison's charts, the Netezza stand was completely deluged with people saying things like, "I had never talked to your company about data warehousing before, but if Larry is going to spend 10 minutes talking about you, I need to know more." And the Netezza product brochures started flowing - not in a trickle like a leaky pipe, but like water through a burst dam.

 

 

Larry hadn't just brought up Netezza but had spent some "quality time" extolling the strengths of the Netezza architecture - moving query processing horsepower as close as possible to the storage elements of the system - and his commentary had marked Netezza as the leader in the Data Warehouse Appliance (DWA) approach. Within the hour, our team's supply had run out. Undeterred by the lack of product brochures, the team had moved on to distributing our glossy fold-out "BI Emergency Survival Guide".

 

 

But what this anecdote from the floor of a 50,000-person trade show really meant was that a sea change had happened in the industry. No less than Larry Ellison had put his imprimatur on the DWA industry segment and in so doing had also summarily marked Netezza as the segment's leading vendor.

 

 

Since then, phones have rung off the hook and email exchanges have approached the immediacy of Instant Messaging, with in-bound requests for more information about the Netezza Performance Server®. Whatever doubt existed in the market that DWAs were a force was eradicated yesterday... at approximately 3:04 pm US-PDT.

 

 

"Please send more product brochures," indeed! Thanks for all the sales leads, Larry! We'll get around to correcting all your misconceptions about our product shortly.

 

 
