[ www.netezza.com ]

Netezza Director of Product Marketing Razi Raziuddin is blogging today.


     

I’ve been at the 2010 TDWI World Conference in San Diego this week, where the theme is “agile BI that delivers data (I would use the term ‘insights’) at the speed of thought.” Timing is everything when it comes to making decisions – and influencing others to make the decisions we’d like to see.

 

We’ve all experienced Red Car Syndrome at some point or another. You test drive a red car. You like it. Suddenly, you start noticing red cars everywhere – not because the number of red cars has increased, but because the experience of driving a red car is now personalized. Online advertisers use Red Car Syndrome to connect consumers with the products they genuinely want, as I was reminded first-hand recently. While searching for kitchen fixtures online, I noticed that many of the ads featured a pair of pricey fixtures that had initially caught our eye, but that we had rejected as exceeding our budget. The ads seemed to know our tastes better than we did, and ultimately we succumbed and made the purchase.

 


 

The experience brought home the power of right-time analytics. Speed is critical in making analytics actionable and delivering real value to the business. The trifecta of huge data volumes, complex analytics and query performance is an increasingly common thread in the BI and data warehousing world. It is true not just for online marketers, but cuts across industry lines. Whether it is an insurance provider trying to prevent fraud, a telco determining the cheapest and best path to route a call or a government agency unearthing criminal activity, time to insight from big data makes the difference in every case.

 

Doug Henschen recently wrote a good article on this topic for InformationWeek in which he calls out success in the Big Data era as the ability to get faster insights from huge data sets. The article highlights Catalina Marketing’s petascale data warehouse environment and the fast insights they derive from a huge database of 195 million consumers.

 

Although not every enterprise has a data warehouse environment quite that large, the need to perform complex analytics and derive insight in the shortest time possible is common in every environment, big or small. While scalable MPP architectures address the big data problem quite well, the big math problem associated with complex and advanced analytics is what many customers still wrestle with. There’s general agreement that in-database processing, especially in scalable MPP systems, is the right solution to the big math problem. Doug’s article again highlights Catalina’s use of in-database analytics to radically streamline their analytic modeling environment and gain efficiencies of 10X as a result.

 

However, not every data warehouse platform is geared up for the challenges of performing in-database analytics at scale. The first and obvious challenge is the additional processing overhead required to run advanced analytic algorithms alongside the traditional data warehouse workload. You need a system architecture that is not overwhelmed by the data volumes typical of data warehouses in the Big Data era. Then there is the question of what analytics you want to perform. The majority of commonly available analytic libraries are written for in-memory processing in SMP systems and need to be parallelized in order to take advantage of MPP architectures. The analytic system should not only offer parallelized versions of the analytics you desire, but also provide primitives to easily parallelize advanced analytic algorithms while hiding the complexity of parallel programming from developers.
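
To make the parallelization point concrete, here is a minimal Python sketch. It is not Netezza code. It assumes the statistic you want can be expressed as small partial aggregates that each node computes on its own slice of the data and that a coordinator then merges. The names `Partial`, `summarize`, `merge`, `finalize` and `parallel_mean_variance` are made up for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Partial:
    """Small, mergeable summary of one data slice (hypothetical type)."""
    n: int
    total: float
    total_sq: float


def summarize(values: Sequence[float]) -> Partial:
    # Runs independently on each slice; on an MPP system this step
    # would execute on the node that already holds the slice.
    return Partial(len(values), sum(values), sum(v * v for v in values))


def merge(a: Partial, b: Partial) -> Partial:
    # Merging is associative, so partials can be combined in any order.
    return Partial(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def finalize(p: Partial) -> Tuple[float, float]:
    # Mean and population variance from the merged summary.
    mean = p.total / p.n
    variance = p.total_sq / p.n - mean * mean
    return mean, variance


def parallel_mean_variance(slices: List[Sequence[float]]) -> Tuple[float, float]:
    # The loop below is sequential for simplicity; each summarize() call
    # is independent and could run in parallel on a separate node.
    partials = [summarize(s) for s in slices]
    return finalize(reduce(merge, partials))


if __name__ == "__main__":
    slices = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0]]
    print(parallel_mean_variance(slices))
```

Only three numbers per slice travel to the coordinator, no matter how many rows each slice holds. The sum-of-squares formula is the simplest one to show but can lose precision on large, tightly clustered values, so a production version would likely use a more stable merge.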

 

Finally, the dearth of universally accepted standards in the advanced analytics world poses yet another challenge. A typical analytic environment may consist of a mish-mash of commercially available tools such as SAS and SPSS, open source ones such as R and Hadoop (which are gaining popularity), and tons of application code written in various languages such as Java and Python. The underlying system must offer tremendous flexibility in integrating with a wide array of analytic tools and support for a variety of frameworks and languages.

 

In subsequent posts, I’ll talk about Netezza’s advanced analytic capabilities to enable big math on big data. In the meantime, as you plan your analytic infrastructures for the Big Data era, tell us what challenges you are coming up against.


In theory there is no difference between theory and practice. In practice there is.
-- Jan van de Snepscheut, 1953-1994, computer scientist and educator, California Institute of Technology

So, yesterday I wrote about "the Netezza's" transformation into a platform for deep analytics. Now I know a platform is only as good as the applications available on it, which brings me to our announcement this morning.

 

Last September, we got together with a handful of visionary partners and customers and created the Netezza Developer Network (NDN) with the goal of developing truly innovative analytic applications. We announced the first wave of these offerings today, with five NDN partners delivering game-changing applications built using Netezza's OnStream analytics. Let me highlight a couple of them here.

 

 

  • Systech Solutions' profitability analysis application for retail and CPG companies provides cost and revenue analysis at the detailed SKU and customer level. It gives business users the ability to build and run profitability models using a GUI, instead of relying on IT to do it for them. This is pretty unique, because traditionally something like this would take huge amounts of time - measured in many months - not to mention the resources required. Their app cuts this down by orders of magnitude! So you not only get very fine-grained profitability analysis, but it's available very, very quickly. That makes all the difference between gut-feel decisions and data-based ones about which products and customers to keep, which prices to re-negotiate and how to truly impact the bottom line.

 

  • Imagine if telco service providers could analyze each and every one of the many millions of call detail records they collect and store, before making very important decisions - the kinds that can dramatically alter their earnings statements. That's what RateIntegration's app offers - a tool for business users that allows them to model the impact of competitors' pricing and regulatory changes to figure out the optimal rate plans. Business analysts can also directly implement custom scoring algorithms for customer segmentation and profiling using the app's flexible rules engine.

 

Apart from these, we have Multi-Threaded Inc's fuzzy name and text matching app for critical anti-terrorism, money laundering and digital forensics operations; HCL Technology's implementation of Monte Carlo simulations for pricing derivatives; and Edge Associate's library of SQL functions to speed up migrations to "the Netezza". Make sure you check out the brand spanking new applications webpage to get more details about each of these members and their applications, and don't forget to stop by their booths at the User Conference.

 

 

"As long as one does not have to wait minutes to hours between computational gestures, something amazing happens; one gets problem solving at the speed of human insight"
-- Data-Centric Computing with the Netezza Architecture, Sandia National Laboratories

While the new applications developed by NDN members are unique and serve very different markets - retail, telecommunications, financial services and government - they have remarkable similarities in the value they offer to customers. The applications power complex analytics orders of magnitude faster than was economically feasible before, allowing users to perform "what-if" analyses to more accurately predict future outcomes. These analyses can be performed on large volumes of detailed data, providing unique business insights that would otherwise be lost in sampled and summarized data. The deployment and management of the overall solution is greatly simplified, freeing up business users to focus on results rather than worrying about tuning and maintaining the system.

 

What's really neat about Netezza's open platform approach is the ideas and innovation it is generating and the differentiated applications it is helping to launch. Now that's what platform innovation is all about, isn't it? At Netezza, it's about bringing the power of analytics to the mainstream.


"The milk of disruptive innovation doesn't flow from cash-cows "
-- David Isenberg, Blogger, Musings About Loci of Intelligence and Stupidity

Dare I say ... "orders of magnitude performance" for data warehouse applications is old news as far as Netezza customers are concerned! It became fairly obvious to me at the Netezza European User Conference, held a few months ago. In presentation after presentation, customers talked about the performance and simplicity benefits they got from "the Netezza" - how the proof-of-concept (against their favorite legacy data warehouse vendor) seemed unbelievable at first, but certainly proved true in production; the fact that they did indeed get orders of magnitude better performance; and how all this changed the way they did business. Brian Ganly of The Carphone Warehouse used this chart to highlight Netezza performance during his talk about the "Netezza Experience." I think it captures the sentiment really well ...

 

 

 

It's not that data warehouse performance is not important any more, or that somehow the 100X performance that Netezza delivers is "enough". In fact, what the Netezza customers were alluding to, in a customer's own words, is: "Netezza does what it says on the tin!" We talk about blisteringly fast performance without requiring tuning and aggregations at half the cost of other systems, and we deliver. Once customers see for themselves what "the Netezza" can do for their data warehouse, they get intrigued about the possibility of what else it could do for their business. And that quickly leads them to look beyond raw performance for data warehouses and apply "the Netezza" to new and interesting big-data analytic problems.

 

 

"... the best products become platforms at some point."
-- Bob Warfield, author of the SmoothSpan Blog

As the data warehouse market continues to evolve, more and more companies are looking to use information as a competitive lever across their organizations. The most successful will be those that make use of information to exploit arbitrage windows in the marketplace and predict future outcomes more accurately. These companies will differentiate themselves by making high performance analytics pervasive, providing employees, partners and vendors access to the kinds of analytics that are only available to a select few in the enterprise today.

 

What's needed to deliver on the promise of advanced analytics is a platform that can overcome the challenges of doing deep analytics on large data volumes - performance, complexity and cost. Let's look at how advanced analytics are done on traditional systems. In most cases, these poor data warehouses are so overtaxed that adding any more processing is a certain way to bring them to their knees. And so the usual approach is to extract huge data sets onto an outsized SMP server or compute grid, perform the analytic computation on it and load result sets back to the data warehouse for querying. You can clearly see the problems with this approach. It's expensive, especially when you're talking about a large SMP or grid; it's complex since you have more systems to maintain; but most importantly you get poor performance even if you spend tons of time and money on the infrastructure. The data movement back and forth introduces the same latency and performance bottlenecks that still plague traditional data warehouse architectures.

 

What we've done with "the Netezza" is create just such a platform, one that overcomes these complex analytics challenges. The idea is quite simple actually. Algorithms for analytic tasks such as scoring, text and spatial processing, image and video analysis and financial simulations can be run directly on the intelligent nodes inside the Netezza. So these algorithms can act on the data where it resides, rather than sending it off-board for processing. You not only get the benefit of fully parallelized execution across hundreds of processors resulting in orders of magnitude better performance for analytics, but also the simplicity and economy of an appliance. Plus the Netezza is able to handle all this extra processing because of the spare processing capacity built into each of its intelligent nodes. Let me refer you to Phil Francisco's blog for a blow-by-blow version of how "OnStream analytics" works in practice.
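
The contrast between shipping data out and running the algorithm where the data lives can be sketched in a few lines of Python. This is an illustration only, not the OnStream API: the node layout, the `score` function and its weights, and `score_in_place` are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, float]
Node = List[Row]  # the rows stored on one processing node (assumed layout)


def score(row: Row) -> float:
    # Hypothetical linear scoring model; the weights are made up.
    return 0.7 * row["spend"] + 0.3 * row["visits"]


def score_in_place(nodes: List[Node], threshold: float) -> List[int]:
    """Run the scoring function next to the data and return only the hits.

    Each pass of the outer loop stands in for work that a real MPP
    system would run in parallel on the node holding those rows.
    """
    hits: List[int] = []
    for rows in nodes:
        hits.extend(int(r["id"]) for r in rows if score(r) > threshold)
    return hits


def rows_moved_by_extract(nodes: List[Node]) -> int:
    # The extract-then-compute approach has to move every row off the
    # warehouse before any scoring can happen.
    return sum(len(rows) for rows in nodes)


if __name__ == "__main__":
    nodes = [
        [{"id": 1, "spend": 100.0, "visits": 10.0},
         {"id": 2, "spend": 10.0, "visits": 1.0}],
        [{"id": 3, "spend": 50.0, "visits": 50.0}],
    ]
    print(score_in_place(nodes, threshold=40.0))
    print(rows_moved_by_extract(nodes))
```

In the in-place version only the identifiers of qualifying rows leave the nodes, while the extract approach moves every row before it can score any of them. With three rows the difference is trivial, but it grows with the size of the warehouse.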

 

This is all great so far - I mean any platform that provides these kinds of advantages has to be quite extraordinary! But the true value of a platform is determined by the applications that run on it and how innovative and differentiated they are. That's where there is a lot of interest and excitement in the enzee community. More on that very soon ...
