41510 Views 22 Replies Latest reply: Apr 5, 2010 9:27 PM by gb2312
artmt New Enzee 9 posts since
Mar 2, 2010

Mar 19, 2010 2:54 AM

Looking for advice on finding a Netezza job

I am a senior-level Oracle professional. I am now considering investing in Netezza training and certification and wondering whether this would be a good investment.
I have a strong background as an Oracle developer and data architect. I have done DBA work as well, but that is not my main expertise. Several years ago I worked on a project where I got some exposure to Netezza and was very impressed with the platform.
In today's job market I have had to drop my rate quite a bit, and it is still not easy to find good Oracle projects. Yet just because I briefly mention Netezza on my resume, I get a lot of calls from recruiters who are desperately searching for Netezza professionals.
This is why I am considering taking Netezza classes and getting certified.
How do you see the prospects of finding a senior-level, Netezza-focused position for someone with a strong Oracle data warehousing background and Netezza training and certification, but little hands-on experience?

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    1. Mar 19, 2010 3:52 AM (in response to artmt)

    Practically everybody starts with Netezza from the same experience base that you have. They get a project or get exposure, are impressed with the technology and want to do more with it. Do not be daunted. Learn the technology and jump in with both feet first.

  • Superuser Rookie 90 posts since
    Sep 19, 2008
    2. Mar 19, 2010 5:48 AM (in response to artmt)

    There are many positions. I would say many of the people here at my workplace who are experienced in Netezza started the same way (from an Oracle professional background), including me.

    Try these: http://netezzaforum.com/netezza-jobs-f11/

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    4. Mar 19, 2010 9:21 PM (in response to artmt)

    Define "senior-level position". The Netezza platform is a data warehouse appliance. As such, those with a significant background in data warehousing are more successful with it, especially as consultants. Keep in mind that the average Netezza database sports tables in the many billions of records, most of them in the hundreds of billions of records. If you are not accustomed to this scale, perhaps you are not at the senior level, but there's no time like the present to start learning it.

     

    Short answer, when we hire senior level people we look at their background in data warehousing. If they don't understand data warehousing, they (typically) won't understand the Netezza platform.

     

    Case in point: Netezza has no indexes. It is not a transactional platform. Solving problems with it requires set-based thinking, not transactional, cursor-based thinking. One of the reasons Oracle and RDBMS folks have a hard time transitioning to Netezza is that the platform encourages processing data inside the database with typical insert-select statements that would be insane on an Oracle platform. The typical Oracle equivalent would be to pull the data into an ETL environment, perform set-based processing on it, and put it back. In Netezza space, this would be like pulling the data out of a 200-processor machine, processing it on a 16-processor machine and then putting it back into the 200-processor machine. The power is in the 200-processor machine, not the ETL environment.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    7. Mar 20, 2010 1:49 PM (in response to artmt)

    That would be incorrect. Any large-scale operation on the data, such as validation and exception handling, would take place inside the machine, whether you do it all in SQL or with a combination of an ETL tool and SQL.

     

    For example, one group uses DataStage to manage the SQL statements it sends to the machine. It handles exception and control logic in DataStage but manipulates all data inside the machine; the data itself never leaves the box. This tends to relegate a high-powered ETL environment to the role of "firing over SQL statements", but so be it: that's where the power is.
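    Here is a minimal sketch of that division of labor. sqlite3 stands in for the appliance and a plain Python function stands in for the ETL tool (both assumptions; this is not how DataStage jobs are actually built). The controller only sends SQL text and decides what to do from the affected-row counts, so the rows themselves never cross into Python. The helper `run_steps`, the `StepFailed` exception and all table names are hypothetical.

```python
import sqlite3


class StepFailed(Exception):
    """Raised when a step's affected-row count fails its check."""


def run_steps(conn, steps):
    """Send each SQL step to the database; keep control/exception logic here.

    `steps` is a list of (name, sql, min_rows) tuples. Only the affected-row
    count comes back to the controller, never the rows themselves.
    """
    log = []
    for name, sql, min_rows in steps:
        affected = conn.execute(sql).rowcount
        log.append((name, affected))
        if affected < min_rows:
            conn.rollback()  # undo every step of this batch
            raise StepFailed(f"{name}: expected >= {min_rows} rows, got {affected}")
    conn.commit()
    return log


def make_demo_db():
    """Hypothetical staging, target and error tables with a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stage (id INTEGER, amount INTEGER);
        CREATE TABLE target (id INTEGER, amount INTEGER);
        CREATE TABLE stage_errors (id INTEGER, amount INTEGER);
        INSERT INTO stage VALUES (1, 10), (2, -5), (3, 7);
    """)
    return conn


STEPS = [
    ("load_valid", "INSERT INTO target SELECT id, amount FROM stage WHERE amount >= 0", 1),
    ("load_errors", "INSERT INTO stage_errors SELECT id, amount FROM stage WHERE amount < 0", 0),
]

if __name__ == "__main__":
    print(run_steps(make_demo_db(), STEPS))
```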

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    8. Mar 20, 2010 1:53 PM (in response to artmt)

    Netezza support is the best in the biz. However, keep in mind that several roles exist in a data warehouse project, and in a Netezza environment the DBA is a part-time job, if that. The machine is so self-contained that having a formal DBA can actually slow down progress. The create-table DDL in Oracle, once all of its supporting objects are instantiated, can take thousands of statements, where the equivalent in Netezza is tens of statements. Seriously, the DDL to create a table in Netezza amounts to naming the table, the distribution key and the columns. There's no tablespace, indexing or other management.
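    To show how little a Netezza table definition carries, here is a small Python helper that assembles the DDL text from the three ingredients named above: the table name, the columns and the distribution key. The helper `netezza_create_table` and the sample table are hypothetical. The sketch relies only on the `DISTRIBUTE ON (...)` and `DISTRIBUTE ON RANDOM` clauses of Netezza's CREATE TABLE, and it does not validate type names or quote identifiers.

```python
def netezza_create_table(table, columns, distribute_on=None):
    """Build a Netezza-style CREATE TABLE statement.

    columns: list of (name, sql_type) pairs.
    distribute_on: list of column names, or None for random distribution.
    """
    if not columns:
        raise ValueError("a table needs at least one column")
    names = {name for name, _ in columns}
    col_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    if distribute_on is None:
        dist_sql = "DISTRIBUTE ON RANDOM"
    else:
        missing = [c for c in distribute_on if c not in names]
        if missing:
            raise ValueError(f"distribution columns not in table: {missing}")
        dist_sql = f"DISTRIBUTE ON ({', '.join(distribute_on)})"
    # No tablespace, index or storage clauses: name, columns, distribution.
    return f"CREATE TABLE {table} (\n    {col_sql}\n)\n{dist_sql};"
```

    For example, `netezza_create_table("sales_fact", [("customer_id", "BIGINT"), ("sale_date", "DATE"), ("amount", "NUMERIC(12,2)")], ["customer_id"])` yields a statement ending in `DISTRIBUTE ON (customer_id);`, with no tablespace or index clauses anywhere.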

     

    So it gives the applications folks a lot of freedom to create and destroy intermediate workspace tables on the fly, effectively processing data inside the machine with "ELT", where the "T" happens inside the machine, rather than with ETL, where the "T" happens outside it.
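    Here is a minimal ELT sketch in that spirit, again with sqlite3 standing in for the appliance (an assumption). A scratch table is created from already-staged rows, refined with set-based SQL, folded into the target and then dropped, all without the rows leaving the database. The tables `raw_orders`, `orders` and `work_orders` and the function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Hypothetical raw feed (already extracted and loaded) plus an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_orders (order_id INTEGER, country TEXT, qty INTEGER, unit_price REAL);
        CREATE TABLE orders (order_id INTEGER, country TEXT, total REAL);
        INSERT INTO raw_orders VALUES (1, ' us ', 2, 5.0), (2, 'de', 0, 9.0), (3, 'Fr', 1, 4.5);
    """)
    return conn


def elt_transform(conn):
    """Do the "T" inside the database using a throwaway work table."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS work_orders")
    # Create the intermediate workspace table straight from the staged rows.
    cur.execute(
        "CREATE TABLE work_orders AS "
        "SELECT order_id, UPPER(TRIM(country)) AS country, qty * unit_price AS total "
        "FROM raw_orders"
    )
    # Further set-based refinement of the workspace table.
    cur.execute("DELETE FROM work_orders WHERE total <= 0")
    cur.execute("INSERT INTO orders SELECT order_id, country, total FROM work_orders")
    inserted = cur.rowcount
    cur.execute("DROP TABLE work_orders")  # destroy the workspace when done
    conn.commit()
    return inserted


if __name__ == "__main__":
    conn = make_demo_db()
    print(elt_transform(conn), conn.execute("SELECT * FROM orders").fetchall())
```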

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    10. Mar 21, 2010 8:26 AM (in response to artmt)

    The same way you would solve it in "standard" data warehousing. You would not involve the Oracle database in this problem, because that would violate Rule #10: never use an RDBMS for bulk processing. So if the RDBMS is not supposed to be involved (e.g., unique key checking at the row level), how would we solve the problem otherwise? Solving this problem in bulk requires a set-based bulk operation. Just so there's no confusion, Netezza does not enforce uniqueness in tables even if a PK constraint is applied. This is in keeping with data warehousing best practices. Can you imagine why Netezza would never want to invoke this kind of functionality, or tempt a developer into using it?
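    Since the engine will not reject duplicates for us, one set-based way to find them is a single GROUP BY over the incoming keys instead of row-at-a-time lookups. The sketch below uses sqlite3 as a stand-in (an assumption) and deliberately declares no key on the hypothetical `customer_stage` table, mirroring a table whose primary key is not enforced.

```python
import sqlite3


def duplicate_keys(conn, table, key_cols):
    """Find key values occurring more than once, in one set-based pass.

    Table and column names are interpolated into the SQL, so they must come
    from trusted metadata, never from user input.
    """
    keys = ", ".join(key_cols)
    sql = (f"SELECT {keys}, COUNT(*) AS n FROM {table} "
           f"GROUP BY {keys} HAVING COUNT(*) > 1 ORDER BY {keys}")
    return conn.execute(sql).fetchall()


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    # No PRIMARY KEY declared: nothing stops duplicates from landing.
    conn.execute("CREATE TABLE customer_stage (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer_stage VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bo"), (1, "Ann B."), (3, "Cy"), (3, "Cy")])
    return conn


if __name__ == "__main__":
    print(duplicate_keys(make_demo_db(), "customer_stage", ["customer_id"]))
```

    Run on the sample rows, it reports keys 1 and 3, each with a count of 2.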

     

    See the book "Netezza Underground" available from Amazon.com for more insights.

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    12. Mar 21, 2010 9:59 PM (in response to artmt)

    Netezza can handle data in any state of cleanliness or disarray. I've assembled data into Netezza using raw feeds into staging tables that were taken through ELT steps toward an awaiting target. The data arrived with extraordinary dirt and data risk, and landed on the target in pristine, consumption-ready form.

     

    In a standard RDBMS data warehouse, "load ready" means that the data must arrive at the database's front door "consumption ready", not just load ready. It has to be exceptionally clean when it is loaded into the RDBMS, because we cannot afford to involve the RDBMS in row-by-row processing (Rule #10).

     

    Clumsy is a word I would reserve for constraint usage, mainly because when exceptions occur, the constraints cannot fix the problem; they can only make it worse.

     

    Think about the operations in place here. If the constraints are on during the load, the RDBMS is involved in row-by-row key checking, a violation of Rule #10. So we would always turn them off. People who don't do this have performance issues.

     

    If an exception happens during the load while the constraints are on, the entire load process is halted to handle the exception. This is a very bad state of affairs when loading billions of rows.

     

    If we load bad data into the table with constraints turned off, then attempt to turn them on, the database will complain and we then have a messy rollback issue. Can the constraint fix this problem? No, it can only report the problem. Where is the problem actually fixed? In the process creating the data, of course. The only deterministic and resilient means to guarantee that the data is ready to load into the target table is to make it load-ready by checking the data, the keys and anything else that would cause it to fail.

     

    This is where the term "load ready" is often confused. "Load ready" means more than ready because the data is scrubbed. It is ready because all anticipated errors have been removed.

     

    Unique and PK/FK constraint violations are easily anticipated.

     

    In a standard RDBMS data warehouse, this would be accomplished by:

     

    1. Download the primary / foreign key values (from the target table) into the ETL environment
    2. Scrub the data for exceptions in bulk with a massive key-join operation prior to a bulk load.
    3. Once exceptions are pulled out, the constraints on the database are turned off to execute a bulk load.
    4. Once the load completes, the constraints are restored.

     

    No anticipated errors arise from the process, and all exceptions are captured in step 2, proactively, rather than in step 4, reactively.
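    Steps 1 and 2 amount to a key join performed outside the database. Below is a minimal sketch in plain Python. It assumes the target's primary keys and the parent table's keys have already been downloaded into in-memory sets, and the record layout and the names `screen_batch`, `order_id` and `customer_id` are hypothetical. Set membership makes this a hash join, so the whole batch is screened in one pass before anything is loaded.

```python
def screen_batch(batch, existing_pks, valid_fks):
    """Split incoming rows into load-ready rows and exceptions before any load.

    batch: list of dicts with 'order_id' (PK) and 'customer_id' (FK).
    existing_pks: set of order_id values already in the target table.
    valid_fks: set of customer_id values present in the parent table.
    """
    load_ready, exceptions = [], []
    seen = set()  # catches duplicates within the batch itself
    for row in batch:
        pk, fk = row["order_id"], row["customer_id"]
        if pk in existing_pks or pk in seen:
            exceptions.append((row, "duplicate key"))
        elif fk not in valid_fks:
            exceptions.append((row, "orphan foreign key"))
        else:
            seen.add(pk)
            load_ready.append(row)
    return load_ready, exceptions
```

    Here the first occurrence of a key within the batch is treated as good and later repeats as exceptions. That tie-breaking rule is an assumption, and a real process might instead reject every copy.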

     

    If we don't take the prior steps, we will have to endure the following clumsy operation:

     

    1. Scrub the data for all exceptions "except" for the constraints
    2. Turn off key constraints on the target table, because leaving them on is too costly in performance (Rule #10)
    3. Load the data
    4. Attempt to turn the constraints back on
    5. If no exceptions, whew!
    6. If exceptions, we have to back out everything we just loaded, or at least remove the erroneous records (from a table that has corrupted data), and try again
    7. If the data loaded duplicates, however, how can we unload the duplicate without unloading the original good record?
    8. In the end, the data processing environment, not the database, is still charged with making the data correct and load-ready.
    9. But look how much time we lost in the recovery cycle!
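    Steps 2 through 7 can be reproduced in miniature with sqlite3, where a unique index plays the role of the constraint. This is an illustration under assumptions: Netezza itself does not enforce uniqueness at all, and any error text is SQLite's. The table is loaded with the index absent, and re-creating it afterwards merely reports the duplicate without saying which copy to keep. The function `reactive_load` and the `accounts` table are hypothetical.

```python
import sqlite3


def reactive_load(rows):
    """Load with the uniqueness constraint 'off', then try to switch it back 'on'.

    A UNIQUE index stands in for the constraint. Returns (connection, error),
    where error is None if re-enabling the constraint succeeded.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (account_id INTEGER, balance INTEGER)")
    # Steps 2-3: no unique index exists, so every row loads, duplicates included.
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    conn.commit()
    try:
        # Step 4: attempt to turn the constraint back on.
        conn.execute("CREATE UNIQUE INDEX ux_accounts_id ON accounts (account_id)")
        return conn, None  # Step 5: whew!
    except sqlite3.IntegrityError as exc:
        # Step 6: the constraint can only report the problem, not repair it.
        return conn, str(exc)


if __name__ == "__main__":
    conn, err = reactive_load([(1, 100), (2, 50), (1, 100)])
    print(err)
    # Step 7: both copies share the key, so deleting by key would remove both.
    print(conn.execute("SELECT COUNT(*) FROM accounts WHERE account_id = 1").fetchone())
```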

     

     

    Once again, this is a standard data warehouse 101 approach in any data warehouse. Constraints are always turned off during a load because they radically hinder performance. When we're talking about billions of records, we cannot afford the performance hit, especially for something that we have to proactively fix anyhow.

     

    The simplest way to achieve this in Netezza is with a massively parallel table-to-table join. All exceptions simply fall out and are processed at bulk scale. This is the only high-performance means to capture the exceptions. It is also easy enough (in both script and ETL/ELT) to formulate reusable resources that accomplish these goals more or less automatically, without having to code them separately for each table.
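    Here is a minimal sketch of join-based screening, written once as a reusable function parameterized by table and key names so it need not be hand-coded per table. sqlite3 stands in for Netezza here (an assumption; on the appliance the same kind of SQL would run in parallel across data slices), and `split_on_key` plus all table names are hypothetical. Exceptions fall out of one join, and the load-ready rows are its complement.

```python
import sqlite3


def split_on_key(conn, stage, target, exceptions, key):
    """Route stage rows to `exceptions` or `target` with two set-based joins.

    A row is an exception if its key already exists in `target` or repeats
    within `stage`. Assumes non-NULL keys, a `target` that is already unique
    on `key`, and identical column layouts. Identifiers are interpolated into
    SQL, so they must come from trusted metadata.
    """
    dup_keys = f"SELECT {key} FROM {stage} GROUP BY {key} HAVING COUNT(*) > 1"
    n_bad = conn.execute(
        f"INSERT INTO {exceptions} SELECT s.* FROM {stage} s "
        f"LEFT JOIN {target} t ON t.{key} = s.{key} "
        f"WHERE t.{key} IS NOT NULL OR s.{key} IN ({dup_keys})"
    ).rowcount
    # The complement of the condition above: the load-ready rows.
    n_good = conn.execute(
        f"INSERT INTO {target} SELECT s.* FROM {stage} s "
        f"LEFT JOIN {target} t ON t.{key} = s.{key} "
        f"WHERE t.{key} IS NULL AND s.{key} NOT IN ({dup_keys})"
    ).rowcount
    conn.commit()
    return n_good, n_bad


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stage (id INTEGER, val TEXT);
        CREATE TABLE target (id INTEGER, val TEXT);
        CREATE TABLE stage_exceptions (id INTEGER, val TEXT);
        INSERT INTO target VALUES (1, 'old');
        INSERT INTO stage VALUES (1, 'new'), (2, 'b'), (3, 'c'), (3, 'c2'), (4, 'd');
    """)
    return conn


if __name__ == "__main__":
    print(split_on_key(make_demo_db(), "stage", "target", "stage_exceptions", "id"))
```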

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    14. Mar 23, 2010 9:36 AM (in response to artmt)

    We then need to define "scale". If by scale we mean that we intend to require the user to always use the primary key index to get on-demand performance, then constraining the user this way can yield upward scalability. If we allow the user to query on any arbitrary column, then the primary key does not scale in any sense of the word. A table scan on billions of rows in a traditional RDBMS will take longer than the user is willing to wait, and certainly longer than the environment is willing to support. Each key constraint adds to the total data storage requirement, and regenerating and maintaining keys then becomes a regular part of data administration.

     

    On the other hand, some time back I had a table with over 200 columns, sporting 14 terabytes and 40 billion rows of information. A "big dumb" scan took no longer than 7 minutes. Applying some simple Netezza optimizations brought any arbitrary query to under 30 seconds, most under 10 seconds. This kind of scale is impossible in Oracle, in every sense of the word.
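    For a sense of what those figures imply, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes) and a single full pass over the table's stated size during the 7-minute scan. If compression meant fewer bytes were physically read, the raw disk throughput would be correspondingly lower than the logical rate below.

$$
\frac{14 \times 10^{12}\ \text{bytes}}{7 \times 60\ \text{s}} \approx 3.3 \times 10^{10}\ \text{bytes/s} \approx 33\ \text{GB/s},
\qquad
\frac{40 \times 10^{9}\ \text{rows}}{420\ \text{s}} \approx 9.5 \times 10^{7}\ \text{rows/s}
$$

    The same figures imply an average logical row size of about $14 \times 10^{12} / (40 \times 10^{9}) = 350$ bytes.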

     

    And this with no primary keys, indexes or other common props required in a traditional RDBMS. This means that all columns are "fair game" all the time. Netezza even allows the natural processes of data warehousing to provide inherent boosts with zone maps. By following simple rules, performance arrives for free. This is how an appliance should work.
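    Zone maps can be pictured as a per-block minimum and maximum for a column: a query with a range predicate skips any block whose range cannot contain a match. The Python below is a deliberately simplified model. The block size, the helpers `build_zone_map` and `scan_with_zone_map`, and the list-of-blocks layout are assumptions, not Netezza internals. The model also suggests why data loaded in date order, as warehouse loads naturally are, prunes so well: each block then covers a narrow, non-overlapping range.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return blocks, [(min(b), max(b)) for b in blocks]


def scan_with_zone_map(blocks, zone_map, lo, hi):
    """Return values in [lo, hi] and how many blocks actually had to be read."""
    hits, blocks_read = [], 0
    for block, (bmin, bmax) in zip(blocks, zone_map):
        if bmax < lo or bmin > hi:
            continue  # the zone map proves no row in this block can match
        blocks_read += 1
        hits.extend(v for v in block if lo <= v <= hi)
    return hits, blocks_read


if __name__ == "__main__":
    ordered = list(range(1, 366))  # day-of-year values, loaded in order
    scattered = [(i * 37) % 365 + 1 for i in range(365)]  # same values, mixed up
    for name, column in (("ordered", ordered), ("scattered", scattered)):
        blocks, zm = build_zone_map(column, 10)
        hits, read = scan_with_zone_map(blocks, zm, 100, 109)
        print(name, len(hits), "matches,", read, "of", len(blocks), "blocks read")
```

    With the days of a year stored in order in blocks of 10, a query for days 100 to 109 reads 2 of the 37 blocks. The same values stored in a scattered order force nearly every block to be read.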

     

    Perhaps this is why, in 2007, Business Objects Strategic group declared the traditional RDBMS (with Oracle at the top of the list) to be a "secondhand technology" for data warehousing, now eclipsed by parallel appliance technology, a shift that also became the impetus for Oracle to create its own appliance offering. The traditional RDBMS simply requires too much structural engineering (star schemas, index manipulation, etc.) to be agile enough for a highly demanding data warehouse environment. It is this heavy engineering of the database structures that eventually paints the RDBMS into a functional box.

     

    In addition, in the traditional RDBMS, query engineering is the key to performance gains. In Netezza, the query is largely along for the ride. We could write a really bad query, but once a query is "good" it's not likely to get any better without changing the data configuration itself. So data engineering is the more important key to performance success. If the data is laid out correctly, the environment will scale without query engineering. In fact, de-engineering the queries often yields better performance than trying to carefully craft a query. This effectively simplifies the BI/presentation environment and puts the onus for performance on the data engineers, a much better state of affairs than punting it to the reporting and query engineers.
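    One concrete part of laying out the data correctly is the choice of distribution key from the DDL discussion. Rows are spread across parallel data slices by hashing the key, and a query finishes only when its busiest slice does. A toy model in Python follows. The slice count, CRC32 as the hash and the sample keys are assumptions, and Netezza's actual hashing scheme is not reproduced.

```python
import zlib
from collections import Counter


def slice_counts(keys, n_slices):
    """Count rows per data slice when rows are hashed on the given key."""
    counts = Counter(zlib.crc32(str(k).encode()) % n_slices for k in keys)
    return [counts.get(s, 0) for s in range(n_slices)]


def skew(counts):
    """Ratio of the busiest slice to the average; 1.0 means perfectly even."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg if avg else 0.0


if __name__ == "__main__":
    customers = list(range(10_000))  # high-cardinality key
    statuses = [("open", "closed", "pending")[i % 3] for i in range(10_000)]  # 3 values
    for name, keys in (("customer_id", customers), ("status", statuses)):
        counts = slice_counts(keys, 8)
        print(name, counts, round(skew(counts), 2))
```

    A high-cardinality key such as a customer ID should spread the 10,000 rows fairly evenly over 8 slices. A three-valued status column can populate at most 3 of them, so at least one slice carries roughly 2.7 times the average load no matter how the query is written.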
