Jun 30, 2010 11:30 AM
Could not serialize transaction error
-
Like (0)
HI all,
We have a procedure that inserts into a table named QA_SUB_PROCESS_SOURCE_TOTAL, after which it computes the count on the number of rows inserted in the current run by using a process id. This count is used for further processing. It is to be noted that the PROCESS ID is unique for every procedure call.
I get this error " could not serialize transaction" when two or more calls are made to this procedure in parallel.
(i.e) when two instances of the same proc is run simultaneously. However when i delay the execution on second procedure by a few seconds i don t get any error.
The structure of the procedure is somewhat like the following one:
CREATE OR REPLACE PROC...
..
BEGIN
INSERT INTO QA_SUB_PROCESS_SOURCE_TOTAL..
SELECT COUNT(1) INTO v_COUNT FROM QA_SUB_PROCESS_SOURCE_TOTAL -- error encountered at this point
WHERE process_id = v_proc_id;
...
END_PROC;
i feel there is some conflict between the INSERT and the SELECT operation on the QA_SUB_PROCESS_SOURCE_TOTAL.
It would be great if any one could let me know the cause of this problem and the solution to overcome it.
This is typical of a common design / implementation flaw - that is treatng Netezza like a transactional database. Netezza is not a transactional database in any sense of the word or concept. It will not behave with the same concurrency as say Oracle, and single-row inserts are extremely inefficient and are to be avoided completely in bulk data processing.
Are you certain that no updates or deletes are happening on this table? This type of error typically happens when more than one thread is addressing a table in an inconsistent manner (one doing an insert while another is doing an update etc) Check out the concurrency rules for a transactional session - if you are doing anything to the tables (e.g truncate or select on the inserted data) inside the stored proc, it will throw an error.
As a practice, whenever we perform an insert, we should not go back to count the records in the target table. If an error occurs, something else is wrong. But if an error does not occur, we have no reason to believe that the records didn't insert. It is therefore better to derive our count, if that's what we're really after, from the incoming data rather than the target table. Think about what this actually means. It's the equivalent of throwing a truckload of fish into a lake, then going back into the lake to count the fish. In all of data warehouseing, it is better to count the fish while they are still in the truck.
If we are performing multiple mini-inserts, this is also problematic. Netezza is not efficient with single row or small-count inserts, so it is better to bundle up smaller counts into a larger critcal mass. If we are doing single-row inserts for job control tables and the like, it's still just as inefficient, but it's usually not as frequent if we are collecting smaller data sets into larger ones.
tricks and tips for how to navigate these issues (as a practice) are found in Netezza Underground. (Amazon.com)
You will have to COMMIT the transaction prior to running the SELECT statement to avoid the 'could not serialize' error. Unfortunatly you cannot COMMIT inside of a stored procedure, so you would have to commit by calling the stored procedure using an external language such as a shell script or java, etc. The current releases of Netezza support up to 31 concurrent INSERT processes, but the INSERT process is not allowed to do anything else other than INSERT/COMMIT. If you try to perform any SELECT statements in the same transaction where you have INSERTed into a table, you will run into 'could not serialize' errors if any other process is trying to INSERT/UPDATE/DELETE data in the same tables you are selecting from.
It is perfectly fine for other processes to SELECT from the same tables even if multiple INSERT processes are happening at the same time, but it is not possible for a transaction which executes an INSERT to also execute a SELECT statement (one process can perform insert/select, but not more than one).
So to summarize, you can have up to 31 concurrent inserts running without problems. Netezza only supports 1 concurrent update or delete statement against a table and also only supports 1 concurrent transaction which performs an INSERT/SELECT against the same table.
Thanks for the response David, The approach you state makes a lot of sense.
I would try to redesign the procedure. But considering the time constraints, i would like to know if there is any other alternative.
I would like to mention here that the table involved here is typically a LOG table.
There are NO DELETES/UPDATES happening on it.
Thanks in advance,
Sathish
@Shawn,
Thanks for the detailed explanation. It helped me in understanding the exact cause of the problem.
Hi all,
I was able to arrive at a tactical fix to deal wit this error, without much change to the code.
The procedure causing error consisted of a INSERT and SELECT operation in a single stored procedure. refer below:
CREATE OR REPLACE PROC...
..
BEGIN
INSERT INTO QA_SUB_PROCESS_SOURCE_TOTAL..
SELECT COUNT(1) INTO v_COUNT FROM QA_SUB_PROCESS_SOURCE_TOTAL -- error encountered at this point
WHERE process_id = v_proc_id;
...
END_PROC;
I have now broken the procedure into TWO different procedures.
The first one would handle only the INSERTS and then second procedure performing just the SELECT operation would be invoked.
procedure 1:
BEGIN
<< INSERT>>
END;
procedure 2 :
BEGIN
<<SELECT>>
END;
Hope this helps a few.

