Best practice to corrupt data

gojko's picture
articles: 

A common practice for handling errors in PL/SQL procedures is to catch all errors in the top-most database layer and convert them into error codes and human readable messages for client applications. This technique is a relict from the past and, in fact, a very bad practice from today's perspective, since it can lead to data corruption.

Great expectations

Procedures build by this rule are easy to spot. They typically have one or two additional OUT parameters on the end – error code and message. Here is a typical example:


build_references (
start_ref IN reference_tbl.idref%TYPE,
start_date IN DATE := SYSDATE,
error_code OUT INTEGER,
error_msg OUT VARCHAR2);

This type of error handling really was a best practice several years ago, and is even published in Steven Feuersteins book “PL/SQL Best Practices” (EXC-03: Catch all exceptions and convert to meaningful return codes before returning to non-PL/SQL host programs). O'Reilly published that book in 2001, at the time when popular client platforms did not know what do to with database exceptions or how to handle them, so they were given an error code and message through output parameters. Feuerstein mentions three client platforms as typical examples: Powerbuilder, Visual basic and Java (a bit strange, since Java has proper exception handling) . The situation changed significantly since then, but the practice lives on, and is still followed with a religious zeal in some companies.

Other than dealing with platforms that do not convert Oracle exceptions into host exceptions, I've come across several explanations for this “best practice”:

  • dealing with all possible errors in the database, and not having to bother with errors in web/client application code
  • giving clients a single, clear and human readable message instead of a strange stack trace

No real benefits

The original premise, client platforms which don't know what to do with database exceptions, no longer holds. Most client platforms now support exception handling, and even Visual basic will transform exceptions from ODBC drivers into it's embedded Err object. Database exceptions will just be wrapped into a SQLException (Java), DbException (.Net) or whichever class is appropriate. But, on the end, you will get a proper host exception, with the database stack trace included. In fact, Java will make you handle a checked SQLException, whether you like it or not.

The other two explanations, as far as I am concerned, never were valid. First of all, you will never be able to take care of all the errors in the database. Never. Some edge cases will still give you problems - network glitch will cause web to disconnect from the database, connection pool will decide to recycle and Oracle will not be able to start a dedicated listener process, or the database will run out of storage. These problems are not frequent, and represent a very irregular situation - an Exception. You might explain to yourself that all the problems were handled in the database, but your client could not care less about that. His customer placed a bet on a winning horse, but there is no record of the bet in the database.

The “strange stack trace” concept was never clear to me. I like to have as much troubleshooting information as I can. Most login operations will fail because the password was incorrect, but if someone added a trigger to the session table that cannot execute because it's mutating, having an 'Incorrect password' as the only information in application logs does not help me very much. I think that the third explanation is just a sign of laziness – front end programmers not wanting to get the message out of the exception. On the end, exceptions in most platforms carry both the error code and string message – instead of dumping the stack trace, front end can show only the message (and log the trace in any case).

So there are no real benefits of the approach. However, the story does not end here – serious problems can be caused by the assumed “best practice”.

Feuerstein suggests maintaining an additional error handling layer as a wrapper around original procedures/functions, and then using the underlying layer in the database, and the wrapper when connecting from outside. But, this is rarely the case. Most of such procedures I've seen are used both from the database and from the outside world. So, this approach effectively cancels all benefits of exceptions in the database. Your application logic becomes mixed with error handling code, and you have to check for errors after all operations. Error codes are easy to ignore which is often the case in code written under tight deadlines. This can lead to errors that are extremely hard to troubleshoot. All of this also applies to client code, where error codes will have to be checked after every operation. In fact, most of the properly written clients would check for the error code and throw an exception in case of error. So additional code is being written on the client side to throw the exception, but the original cause of problems was lost along with the database stack trace.

If web developers are promised that no errors will get thrown from the database, they might just choose to ignore any unexpected problems, and blame you for everything.

Now you see it, now you don't

If cluttered code and passing the buck are not enough for you – here is a common scenario with a really big, real and serious problem that will make you think twice before writing those two additional output parameters again:

Imagine a batch update method that changed half the records and then ran into a problem (for example, money is taken from account but one of requested books is no longer available). The method could not handle the problem at that level, and threw an exception. Outer Pl/Sql procedure or block handles the exception by setting the error code parameter and returning. The web displays the message and asks the user kindly to choose another book and start again. A relatively common scenario – I'm sure that you can find a similar example in your applications. What you don't expect is that the account changes would get committed along the way – which is exactly what will happen if the client was in autocommit mode.

Most modern database connectivity drivers support “autocommit” mode, executing each call as an implicit transaction which is committed or rolled back depending on whether the call completed successfully or raised an exception. In the account example, the call completed successfully - there was no exception and the driver could not care less about your custom error message. The problem becomes even more interesting because autocommit mode is default for .Net and Java clients. So, in order for your code to be used properly, clients must explicitly turn off autocommit mode, begin and complete transactions.

To those of you who are now thinking "I don't care about the web code" or "it's not so hard to write a commit", you are not doing your job if it's harder to use your code properly than to misuse it. And yes, it's not hard to call the commit method, but wrapping exceptions into error messages has no real benefit, and requires additional code both in the database (to catch the exception) and in web (to throw it again, commit or rollback).

A workaround might be to commit or rollback directly in stored procedures, but then you are implying that they will never be used in a wider scope, which is, in my experience, a sure a sure ticket to disaster. Anyway, you are writing and maintaining more code than you should.

The way of the Samurai

The solution is quite easy - follow the way Samurai Principle - "Return victorious, or not at all". All popular client platforms now support exceptions, and will wrap the underlying database exception properly, so it's actually less code for you, less code for your clients, and less chance to get an urgent support call at 3 AM on a Saturday.

To conclude, wrapping exceptions into error messages:

  • clutters code both on client and in the database
  • requires more effort to write and maintain additional code
  • cancels all benefits of exceptions in database
  • makes it very hard to troubleshoot strange edge cases
  • makes it easy to ignore errors, causing problems that are hard to diagnose and troubleshoot
  • may lead to data corruption

and should not be considered a best practice by any standard. Use exceptions for error handling - that's what they are intended for. You will write less code and the system will be much easier to maintain.

Gojko Adzic
http://www.gojko.com

Comments

Kevin Meade's picture

I always hated this practice. Will be interesting to see someone with a counterpoint (hey, maybe Steve will stop by).

there are three things I religiously avoid when building Oracle systems:

1) use of ROWID for anything
2) use of COMMIT in stored code
3) use of any kind of error masking

I have lost count of how many times these three things have hosed systems I have worked on, or how much crap code was wrapped around having to deal with these three practices. With every passing year, I feel stronger about why I avoid these like the plague.

I think understanding this has to do with how well a developer or dba can get beyond a single piece of code so that they can appreciate the bigger picture of data systems and their life cycles. Today it is all about choice; not doing stuff that limits your options later. It is a sad commentary that most of the developers in our business never make the leap beyond a basic programming mindset.

As a last comment let me suggest, that error trapping code is like all other code, less is more, the best exception handler is the one you never wrote.

Thanks, Kevin

gojko's picture

Hi - completely agree about transactional control in stored code. A friend of mine always talks about the previous company where he worked - writing a commit or rollback in stored code without obtaining an approval from senior architects was defined as a gross misconduct in official company policy, and could get people fired.

What bad experience did you have with rowid? I have not run into problems with that yet (though I don't see it used much).

Gojko Adzic
http://www.gojko.com

Interesting article. I work with a lot of code that is never called by anything outside the database - for example, internal data archiving sp's that are kicked off by Oracle Scheduler, never interact with anything outside the database, and log their exceptions to a table. In this case, the calling program is always a PL/SQL stored procedure or an anonymous PL/SQL block, and it's still good practice to have any functions that it calls return error codes and/or human readable messages.

Kevin Meade's picture

Clearly there are supporters of the practice and those who are not. I for one have been burned by it too many times when working in an environment of sloppy programmers. I find it highly unintuitive and counter to the efficient and clean Exception Handling concept exposed by PL/SQL. Indeed, I have been practicing Oracle for 25 years now, and pl/sql since before it was called pl/sql (one failed early attempt by Oracle was SPUFI (Sql Procedural User Friendly Interface)). In all my days I have never found the need nor value of this practice. I find it is implemented by programmers who are simply copying old practices from legacy systems because they don't know any better (or dont' want to know any better).

gojko: ROWIDs change. Yes they do. Skipping over the history of ROWID over the years, try deleting a row and then reinserting it. Try exporting your database and importing it to make a copy. ... ROWID is a system value, not a data value. Using system values as data will eventually fail. There is little that can be done with ROWID that cannot be done some other way. Why add such an abusable point of failure when it can be avoided. It still see people building systems that read ROWID and save it to a table so they can "get the rows the fastest way possible later". After several recoveries and migrations, they want to look up a row and can't find it, then wonder why, eventually realizing the associations they created are gone forever (uhoh!).

Then there is the whole discussion on what does "select * from emp" really mean? Is emp a table, an external table, a view, an object view, a complex view, a materialized view, a synonym (even a packaged procedure if you are dealing with Oracle Forms). And most importantly, why should we care what kind of object emp is? "select * from emp" should just return some emp rows. OK, so what will "select rowid from emp" return? If you think this will bring back a rowid, you are right, as long the underlying object can resolve back to a table from which to get a ROWID. Otherwise, it breaks... Why can it break? What did the programmer do wrong? Answer: in using ROWID, the programmer tightly coupled code to the type of object being queried. Thus when the type of object being queried by the code changes, the code breaks.

So why would the type or rowsource that a rowsource is, change? I think I'll write an article about this, thanks, Kevin.

gojko's picture

Quote:
ROWIDs change. Yes they do. Skipping over the history of ROWID over the years,

Kevin - it looks like I was lucky never to run into a table with stored refererence ROWIDs so far. The only use of ROWID I've seen (and applied) is for processing a large number of records in a loop - I typically fetch ROWID from the cursor and then update rows by ROWID, in order not to use the index twice. Would you count this also as bad practice (does it fall into "anything" from "use of ROWID for anything")?

Gojko Adzic
http://www.gojko.com

Kevin Meade's picture

Gojko: you are clearly an advanced developer. This use of ROWID is an advanced practice, and has long been favored by many as a means of providing their applicationS with a slight performance boost. Considered safe by many, is does in fact have a serious flaw. To understand why you have to go beyond a programmers mind set and appreciate the inter-relations of data management and application development, and how the two can help each other or hurt each other. This could be a long discussion but I shall try to be succinct.

A good data model translates into application code that is easier to develop, test, deploy, and extend. A bad data model does the opposite. What do we learn from this: we learn that certain data practices can make a developer’s job easier (usually much easier), or harder (always much harder).

But the reverse is also true: that certain programming practices can place restrictions on what a data management team can do with a database. Use of ROWID is a restricting practice. Why? Because it presumes to know the type of "resolved" object being referenced. Here is an example:

select * from emp;

What is EMP? Is it a table?

If you do "select rowid rwid,emp.* from emp;" then you'd better hope it is a table or at least something that oracle can resolve to an underlying table from which to get a rowid. Otherwise your code is hosed. In turn, in order not to hose your code, I as a data manager (DBA, Architect, or whatever), I can't exchange this emp table for some alternative rowsource that does something cool because then the table would be gone and that would cause your code to fail because your code expects the underlying object to be resolvable to a table.

Oh man! That stinks!. How am I as a data guy supposed to do magic in the database if I have no control over the rowsources presented to the outside world? Consider this following sequence of SQL:

create table emp (
empno number not null
,empname varchar2(10) not null
)
/

alter table emp
add primary key (empno)
add unique (empname)
/

insert into emp values (1,'Gojko')
/

commit
/

SQL> select * from emp
2 /

EMPNO EMPNAME
---------- ----------
1 Gojko

Nothing special here, just an emp table containing the name of a new friend I met recently online in a discussion forum on OraFaq. But now check this out.

create table emp_lt
as
select
emp.*
,cast(null as varchar2(1)) delete_flag
from emp
where rownum = 0
/

alter table emp_lt
add primary key (empno)
/

create index emp_lt_xk1 on emp_lt(empname)
/

desc emp_lt

/*
SQL> desc emp_lt
Name Null? Type
----------------------------------------- -------- --------------
EMPNO NOT NULL NUMBER
EMPNAME NOT NULL VARCHAR2(10)
DELETE_FLAG VARCHAR2(1)
*/

rename emp to emp_table
/

create or replace
view emp
as
select *
from emp_table
where not exists
(
select null
from emp_lt
where emp_table.empno = emp_lt.empno
)
union all
select empno,empname
from emp_lt
where delete_flag is null
/

create or replace
trigger ioiup_emp
instead of insert or update or delete on emp
for each row
begin
if inserting then
update emp_lt set
empname = :new.empname
,delete_flag = null
where empno = :new.empno
and delete_flag = 'Y';
if sql%rowcount = 0 then
insert into emp_lt
values
(
:new.empno
,:new.empname
,null
);
end if;
elsif updating then
update emp_lt set
empname = :new.empname
where empno = :old.empno
and delete_flag is null;
if sql%rowcount = 0 then
insert into emp_lt
values
(
:new.empno
,:new.empname
,null
);
end if;
else
update emp_lt set
delete_flag = 'Y'
where empno = :old.empno;
if sql%rowcount = 0 then
insert into emp_lt
values
(
:old.empno
,:old.empname
,'Y'
);
end if;
end if;
end;
/
show errors

Hmm... What is all this. This is known as a HYPOTHETICAL VIEW layer. This view uses an instead of trigger to redirect changes on a table to a "differential" table instead. This has many uses, indeed, it is so powerful, Oracle created an entire product around it called Oracle Workspace Manager (check it out). OWM is done using hypothetical views (done Oracle's way of course).

With hypothetical views, I as a data base guy can offer support for advance things like: LONG TRANSACTIONS, unlimited and independent but synchronized testing logons, and user controlled on demand database restore to prior states. If I wanted to make the implementation more sophisticated (as in OWM), then I can even support point in time queries and primitive transaction event microscopes. And the list goes on...

But best of all, I can do all this for any application system without asking you, the application developer, to change a single line of your code, because the database does it all behind the scenes in a transparent manner. The trick is that to your applications, "select * from emp" is still "select * from emp"... right? Here, this proves it...

SQL>
SQL> select * from emp;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_table;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_lt;

no rows selected

SQL>
SQL> insert into emp values (2,'Kevin')
2 /

1 row created.

SQL>
SQL> select * from emp;

EMPNO EMPNAME
---------- ----------
1 Gojko
2 Kevin

SQL> select * from emp_table;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_lt;

EMPNO EMPNAME D
---------- ---------- -
2 Kevin

SQL>
SQL> update emp set
2 empname = 'The G.'
3 where empname = 'Gojko'
4 /

1 row updated.

SQL>
SQL> select * from emp;

EMPNO EMPNAME
---------- ----------
2 Kevin
1 The G.

SQL> select * from emp_table;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_lt;

EMPNO EMPNAME D
---------- ---------- -
2 Kevin
1 The G.

SQL>
SQL> delete from emp where empname = 'The G.'
2 /

1 row deleted.

SQL>
SQL> select * from emp;

EMPNO EMPNAME
---------- ----------
2 Kevin

SQL> select * from emp_table;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_lt;

EMPNO EMPNAME D
---------- ---------- -
2 Kevin
1 The G. Y

SQL>
SQL> rollback
2 /

Rollback complete.

SQL>
SQL> select * from emp;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_table;

EMPNO EMPNAME
---------- ----------
1 Gojko

SQL> select * from emp_lt;

no rows selected

SQL>

This little sequence of changes shows how the view accepted your app's changes, and rerouted them to the differential table yet still presented you with a correct version of the EMP rowsource after each change. Transaction Atomicity is maintained as witnessed by the rollback. The Original data was never touched but all your changes looked like they were made. If you are shaking you head thinking why would anyone do this, then read up on Oracle Workspace Manager, and things like LONG TRANSACTIONS. Trust me, there is a world of value in this and its free to the world outside the database, no program changes required. All I had to do was run a script to generate some code and then apply it, (an hour worth of work maybe).

Just imagine how easy it is to support cycling test environments during an application system upgrade, where you have to "reset" a test database back to a known state for each cycle (aka. every time a bug is fixed and the test suite must run again). Instead of going to some support area, asking for a database restore and then waiting days for them to put it on their schedule; your users simply press a button, and all data is deleted from the differential tables thus "resetting" your database back to its starting state. At least that is what their application code thinks.

The Oracle database is just so cool!

SQL>
SQL> select rowid from emp;
select rowid from emp
*
ERROR at line 1:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY,
etc.

SQL>

Hi Kevin,

Refer your post dated 2006-10-04 20:23 in response to Gojko's post.

"Just imagine how easy it is to support cycling test environments during an application system upgrade, where you have to "reset" a test database back to a known state for each cycle (aka. every time a bug is fixed and the test suite must run again). Instead of going to some support area, asking for a database restore and then waiting days for them to put it on their schedule; your users simply press a button, and all data is deleted from the differential tables thus "resetting" your database back to its starting state. At least that is what their application code thinks."

The above situation is a very familiar situation I face in daily work. We have recently got into Oracle as a base for some of our clients and this is one of the issues that I have been pondering on. The current practice, in most places I have seen so far seems to be a restore of a DB when all that is needed is to bring the database to a known state for each cycle. Despite talking to a few Oracle DBAs, I am none the wiser. Reading the post response above, I have a glimmer of hope that maybe, just maybe there are people who know how to do this without having to do a DB Restore.

Can you elaborate on that? if you think this should be a separate discussion, I would be glad to start a separate one. Awaiting your valuable views...

Cheers
Dev

Kevin Meade's picture

There are many restrictions to this product. You will have to review your database to see how much work will be involved to get it going for you. If you have questions, just email me (can we do that frm this forum?).

Kevin Meade

gojko's picture

Kevin - first of all, thanks for taking the time to write such a long followup. I see your point of view now, and completely agree that using rowid is a dirty hack, and as any other dirty hack, should be used cautiously and avoided in everyday work. I've used that technique in the past many times, mostly for ad-hoc batch updates directly on tables, and have never really thought about it from the perspective of higher level abstractions. Thanks again!

Gojko Adzic
http://www.gojko.com

Kevin Meade's picture

Another IT professional makes the leap to a higher data plain.

But lest we get a nose bleed, we should also say, that every tool has its place, even ROWID I suppose. If by ad-hoc batch update you mean some kind of data oneshot, written at the time of need, to be used only once, then it would seem a dirty hack such as ROWID might be quite useful, especially if all the "hackers" doing oneshots for you are familiar with the practice. There would be no risk of your code failing later because you aren't planning on using it later. You would instantly know if ROWID was workable.

Under such circumstances I would not suggest changing the way you work (I am reasonable). Indeed, I am glad your response, in agreement, still recognizes that ROWID has a place.

Thanks buddy, hope to see you again here. And thanks again for being brave enough to takle the concepts of your original post. A useful article you created. Please write something else soon.