Sunday, May 2, 2010

Recoverable vs. Unrecoverable System Faults

My overall problem is:

I invoke an external service and receive a fault.
How do I distinguish between a System Fault and a Business Fault?
If another process is updating a DB table and locking the data, so that my call times out, is this a Business Fault or a System Fault?

In case of any fault:
- Should I retry after some time, with the same parameters?
- Should I retry after some time, tweaking some parameters?
- Should I simply give up?


Another example:
BEA-380002 means "unable to connect". If the series of endpoints configured for the Business Service has been exhausted, and the configured number of retries has been used up, then I should simply give up.

Anyway, someone should go over the list of transport errors at http://download.oracle.com/docs/cd/E11036_01/alsb30/messages/alsb/kernel/l10n/TransportKernel.html and tell me, for each one, whether we can retry it or not.


OSB gives you a very limited set of policies on a Business Service:
- retry count
- retry interval
- retry application errors (i.e. SOAP faults; this is a new feature in 3.0)


Reading the Oracle SOA Suite Developer's Guide (Error Handling chapter), I learn that
Oracle BPM has a Fault Management Framework,
where you can define FaultPolicy elements in an XML file,
each with Fault Conditions and Actions. The Fault Condition classifies a Fault Type, and the Action defines how to handle that specific Fault Type.

The Condition is based on the FaultName (as in the WSDL "fault" clause) and the FaultCode (as in the $fault/faultcode element). It can also be expressed with a test clause, where you extract part of the fault message using an XPath expression and check its content, as in:
<test>$fault.payload/tns:fault/tns:code="380002"</test>

The Action can be: retry, humanIntervention, rethrow, abort, replayScope, or javaAction.
Each Action type is complemented by a specific parameter set (see the XSD linked below).
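
To make the Condition/Action pairing concrete, here is a minimal sketch of what a fault-policies.xml could look like. It is written from my reading of the Oracle documentation, not copied from a working project. The policy id, the action ids, the tns namespace and the fault name are all hypothetical placeholders, and the retry count and interval are arbitrary values. Check the element names against the XSD before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <!-- "ConnectionFaults" is a hypothetical policy id -->
  <faultPolicy version="2.0.1" id="ConnectionFaults">
    <Conditions>
      <!-- tns and serviceFault are placeholders for the fault declared in your WSDL -->
      <faultName xmlns:tns="http://example.com/service" name="tns:serviceFault">
        <!-- endpoints and retries already exhausted on the OSB side: give up -->
        <condition>
          <test>$fault.payload/tns:fault/tns:code="380002"</test>
          <action ref="give-up"/>
        </condition>
        <!-- no test: intended as the fallback for any other fault with this name -->
        <condition>
          <action ref="retry-then-human"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <!-- action ids are free-form labels referenced by the conditions above -->
      <Action id="retry-then-human">
        <retry>
          <retryCount>3</retryCount>
          <!-- interval in seconds, as far as I understand the documentation -->
          <retryInterval>5</retryInterval>
          <retryFailureAction ref="ask-a-human"/>
        </retry>
      </Action>
      <Action id="ask-a-human">
        <humanIntervention/>
      </Action>
      <Action id="give-up">
        <abort/>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>
```

The idea is that the retry decision lives in the policy file rather than in the process itself: the same fault name can lead to "give up" or "retry, then ask a human" depending on what the test finds in the payload.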

The whole enchilada is explained here: 

You provide a fault-policies.xml file and a fault-bindings.xml file. In the fault-bindings you define which policies apply to each process.
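
A minimal sketch of a fault-bindings.xml, assuming the 11g layout as I understand it from the documentation. The policy id ConnectionFaults and the component name OrderProcess are hypothetical; the id must match a faultPolicy defined in fault-policies.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <!-- bind the (hypothetical) policy to the whole composite -->
  <composite faultPolicy="ConnectionFaults"/>
  <!-- or bind it to one named component, e.g. a single BPEL process -->
  <component faultPolicy="ConnectionFaults">
    <name>OrderProcess</name>
  </component>
</faultPolicyBindings>
```

Normally you would pick one of the two levels; both are shown here only to illustrate the choice.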

Some more concepts are explained at http://download.oracle.com/docs/cd/E12839_01/integration.1111/e10224/med_faulthandling.htm; that page also contains a very valuable XSD for the policies and bindings.


A quick link to the book



