Wednesday, September 14, 2011

XA resources and Transaction Timeout

Interesting case:

We had a couple of JMS messages stuck in a queue, and CPU 100%.

JMS message state: receive transaction

JMSXDeliveryCount 2060

The transactions were of very long duration, above 100 seconds, and they would end with an application rollback (not a Timeout Rollback):


####<Sep 13, 2011 12:00:12 AM BST> <Error> <EJB> <myserver.unix.acme.com> <pluto_AdminServer> <[ACTIVE] ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1315868412458> <BEA-010026> <Exception occurred during commit of transaction Xid=BEA1-556D788B8665743465EB(33150705),Status=Rolled back. [Reason=oracle.jdbc.xa.OracleXAException],numRepliesOwedMe=0,numRepliesOwedOthers=0,seconds since begin=124,seconds left=60,XAServerResourceInfo[WLStore_pluto01_Domain__WLS_pluto_AdminServer]=(ServerResourceInfo[WLStore_pluto01_Domain__WLS_pluto_AdminServer]=(state=rolledback,assigned=pluto_AdminServer),xar=WLStore_pluto01_Domain__WLS_pluto_AdminServer28341129,re-Registered = false),XAServerResourceInfo[rmsscottish]=(ServerResourceInfo[rmsscottish]=(state=rolledback,assigned=pluto_AdminServer),xar=rmsscottish,re-Registered = false),SCInfo[pluto01_Domain+pluto_AdminServer]=(state=rolledback),local properties=({weblogic.jdbc.jta.rmsscottish=[ No XAConnection is attached to this TxInfo ]}),OwnerTransactionManager=ServerTM[ServerCoordinatorDescriptor=(CoordinatorURL=pluto_AdminServer+10.3.75.186:7001+pluto01_Domain+t3+, XAResources={WLStore_pluto01_Domain__WLS_pluto_AdminServer, WLStore_pluto01_Domain_FileStore, WLStore_pluto01_Domain_WseeFileStore, rmsscottish},NonXAResources={})],CoordinatorURL=pluto_AdminServer+10.3.75.186:7001+pluto01_Domain+t3+): weblogic.transaction.RollbackException: Could not prepare resource 'rmsscottish
at weblogic.transaction.internal.TransactionImpl.throwRollbackException(TransactionImpl.java:1818)
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:336)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:230)
at weblogic.ejb.container.internal.MDListener.execute(MDListener.java:500)
at weblogic.ejb.container.internal.MDListener.transactionalOnMessage(MDListener.java:371)
at weblogic.ejb.container.internal.MDListener.onMessage(MDListener.java:327)
at weblogic.jms.client.JMSSession.onMessage(JMSSession.java:4547)
at weblogic.jms.client.JMSSession.execute(JMSSession.java:4233)
at weblogic.jms.client.JMSSession.executeMessage(JMSSession.java:3709)
at weblogic.jms.client.JMSSession.access$000(JMSSession.java:114)
at weblogic.jms.client.JMSSession$UseForRunnable.run(JMSSession.java:5058)
at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
Caused by: oracle.jdbc.xa.OracleXAException
at oracle.jdbc.xa.client.OracleXAResource.prepare(OracleXAResource.java:679)
at weblogic.jdbc.wrapper.VendorXAResource.prepare(VendorXAResource.java:70)
at weblogic.jdbc.jta.DataSource.prepare(DataSource.java:885)
at weblogic.transaction.internal.XAServerResourceInfo.prepare(XAServerResourceInfo.java:1277)
at weblogic.transaction.internal.XAServerResourceInfo.prepare(XAServerResourceInfo.java:500)
at weblogic.transaction.internal.ServerSCInfo$1.run(ServerSCInfo.java:361)
... 3 more
.>


The transaction monitoring page would say:


This page shows information about current transactions coordinated by the server or in which server resources participate.

Transaction statistics(Filtered - More Columns Exist)

Transaction ID Status Seconds Active XA Resources Servers

BEA1-004042C8BFB5743465EB Active 178 {myds=ended, WLStore_pluto01_Domain__WLS_pluto_AdminServer=suspended} {pluto01_Domain+pluto_AdminServer=active}

BEA1-004442C8BFB5743465EB Active 80 {myds=ended, WLStore_pluto01_Domain__WLS_pluto_AdminServer=suspended} {pluto01_Domain+pluto_AdminServer=active}



It turns out that the Proxy Service consuming the messages has "Force XA transaction" set to true, and myds was a Data Source with a JDBC driver XA compliant, hence it was enrolled in the transaction even if the DB was used only for a lookup.

I am persuaded that the DB was throwing an exception because it had already internally timed out the transaction (XA timeout is 60 seconds by default).

Replacing the JDBC driver with a non-XA avoided enrolling the DB in the XA transaction, hence no more rollbacks.

Incidentally when we first restarted the server, the transaction recovery service recovered the 2 previous transactions, and they got stuck because now the DS was not XA compliant any more. Restarting again fixed the problem, because all new transactions would not enroll the DB any more.

Interestingly, the JTA timeout at domain level is set to 30 seconds, but this timeout doesn't refer to the whole duration of the 2 Phase Commit transaction, it refers only to the "prepare" phase. So the Transactions can well last 100 seconds without JTA complaining about it. What you should look at is the "abandon timeout period" which is by default enormously high.

No comments: