Skip navigation
17182 Views 4 Replies Latest reply: Nov 2, 2009 1:01 PM by Chris Rodgers RSS
tom_k New Enzee 8 posts since
Aug 7, 2009
Currently Being Moderated

Nov 2, 2009 11:33 AM

int4 overflow error

What is the best way to dissect this error?  I know that I am getting this due to bad data, but Im trying to figure out the best solution to find the culprits when dealing with 70 billion rows.

 

Im trying to load click stream data and turning a data type of DATE to an integer. Sounds simple enough and most loads are working except a few.

 

here is what I am doing

 

cast(to_char(day,'YYYYMMDD') as int) as date_id

 

Im joining on an INTEGER id as well as the day which is a DATE.

 

Im inserting into a column DATE_ID which is INTEGER

 

 

insert into fct_click

(date_id)

select cast(to_char(day,'YYYYMMDD') as int) as date_id

from stage_click a

inner join stage_click_processed b on a.id = b.id

and a.day = b.day;

 

id = integer

day = date

 

date_id = integer

 

Any ideas?

 

Thanks in advance!

 

tom_k

  • Chris Rodgers Active Enzee 172 posts since
    Sep 19, 2006
    Currently Being Moderated
    1. Nov 2, 2009 12:10 PM (in response to tom_k)
    Re: int4 overflow error

    I would try to run a few parts separately, seeing if they fail, or looking for min/max values.  The overflow sounds like it would be happening on a conversion, aggregation, or insert process.

     

    For example, run just this part:

    select cast(to_char(day,'YYYYMMDD') as int) as date_id

    from stage_click

     

    and see if that is what is giving you issues.  If it does not fail, get the min/max values from date_id that is generated and see if they are beyond the definition of an int.

     

    Additionally, double check the definition of your target table compared to your source data.  Is ID really an int? or a bigint in the source data?

  • David Birmingham Active Enzee 426 posts since
    Sep 24, 2007
    Currently Being Moderated
    3. Nov 2, 2009 1:13 PM (in response to tom_k)
    Re: int4 overflow error

    We had a case of multi-billion-row sifting for bogus dates in files - namelly that they had illegal/garbage characters, some of which did not display

     

    a quick way to scan your data for bogus values is to load the data into a varchar, and then follow up with the function as so:

     

    when (translate(trim(column_name), '-:/0123456789','')='') then date(colum_name) else null end column_name

     

    the above checks for valid chars in our date-type values, so just give it a character list that makes send for your data type - if all the characters reviewed end up forming an empty string, then the entire value "passes" and it's a keeper - otherwise do something else with it (as in the above, we made it null) but I am not a big fan of nulls in the final target - our downstream processing scrubbed nulls before the repository received them

     

    In addition, when converting from the varchar, capture any values with a string length of > 8 - this will help you drive out the culprits and discover any common patterns in the bad data.

  • Chris Rodgers Active Enzee 172 posts since
    Sep 19, 2006
    Currently Being Moderated
    4. Nov 2, 2009 1:01 PM (in response to tom_k)
    Re: int4 overflow error

    If you can find a day that will not load, whittle it down to a record that will not load.  Create the reproducable case where you have 1 row in each of the tables being joined that will cause the failure.  At that point, the error may be come much clearer (take a look at the plan after it runs to see which snippet is failing).  If not, then post the ddl and row values here and maybe we can spot it.  Otherwise, open a ticket with Netezza and they can torubleshoot it for you - especially if you have a repro set up with a couple of table and small set of data.

     

    That is how I approach most sql bugs - whittle it down to the specific data that does not work...

More Like This

  • Retrieving data ...

Bookmarked By (0)