Thursday, May 16, 2013

ServerFailureTriggerMBean, OverloadProtectionMBean, numberOfStuckThreads

Excellent discussion on server health and stuck threads from user csoto: https://forums.oracle.com/forums/thread.jspa?messageID=10074875


Here is the Javadoc of ServerFailureTriggerMBean.

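A thread counts as stuck once it has been continuously busy for longer than MaxStuckThreadTime (600 seconds in the scripts in this post). If more than 25 threads are stuck (StuckThreadCount, a configurable value), WebLogic moves the server to the FAILED state.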
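To make the rule concrete, here is a minimal plain-Python model of what the trigger checks, using the two ServerFailureTriggerMBean values from this post (MaxStuckThreadTime = 600, StuckThreadCount = 25). The function names are hypothetical, this is not WebLogic code, and the strict "more than" comparison is an assumption taken from the wording above.

```python
def count_stuck(busy_seconds, max_stuck_thread_time=600):
    """Count threads that have been busy longer than max_stuck_thread_time seconds."""
    return sum(1 for t in busy_seconds if t > max_stuck_thread_time)


def should_fail(busy_seconds, max_stuck_thread_time=600, stuck_thread_count=25):
    """Return True when more than stuck_thread_count threads are stuck.

    Hypothetical model of the ServerFailureTrigger rule, not WebLogic's implementation.
    """
    return count_stuck(busy_seconds, max_stuck_thread_time) > stuck_thread_count


# 25 threads busy for 700 s plus 10 healthy ones: not yet FAILED
threads = [700] * 25 + [5] * 10
print(should_fail(threads))          # False
# One more stuck thread tips the server over the threshold
print(should_fail(threads + [650]))  # True
```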

Make sure the "Auto Kill If Failed" attribute is false in "Health Monitoring".

http://weblogicserveradministration.blogspot.com/2010/10/server-overload-protection-actions.html

See also http://www.javamonamour.org/2012/08/weblogic-finding-health-state-through.html

The WLST commands to achieve this are:

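edit()
startEdit()
# When the server goes FAILED, force it to shut down
cd('/Servers/osbdev1ms1/OverloadProtection/osbdev1ms1')
cmo.setFailureAction('force-shutdown')
# A thread busy for more than 600 s is stuck; more than 25 stuck threads -> FAILED
cd('/Servers/osbdev1ms1/OverloadProtection/osbdev1ms1/ServerFailureTrigger/osbdev1ms1')
cmo.setMaxStuckThreadTime(600)
cmo.setStuckThreadCount(25)
save()
activate()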
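Both cd() paths follow one pattern: the OverloadProtection and ServerFailureTrigger child MBeans carry the same name as their server. If you need these settings on several managed servers, a small helper like the hypothetical one below can build the paths. It is a sketch inferred from the two paths above, not an official API, and it sticks to Python 2 compatible syntax so it should also work inside a WLST session.

```python
def overload_protection_paths(server_name):
    """Return (OverloadProtection path, ServerFailureTrigger path) for one server.

    Hypothetical helper: the child MBeans are assumed to be named after
    their parent server, as in the osbdev1ms1 example.
    """
    overload = '/Servers/%s/OverloadProtection/%s' % (server_name, server_name)
    trigger = '%s/ServerFailureTrigger/%s' % (overload, server_name)
    return overload, trigger


print(overload_protection_paths('osbdev1ms1')[1])
# /Servers/osbdev1ms1/OverloadProtection/osbdev1ms1/ServerFailureTrigger/osbdev1ms1
```

In WLST you would pass each returned path to cd() before calling the setters shown above.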



In fact, it seems better to me to act at the cluster level, since I don't really want to change the Admin server:

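# assumes you are already connected to the Admin server with connect()
clustername = 'osb_cluster'   # hypothetical name: replace with your cluster

edit()
startEdit()

# When a member goes FAILED, force it to shut down
cd('/Clusters/' + clustername + '/OverloadProtection/' + clustername)
cmo.setFailureAction('force-shutdown')

# More than 25 threads busy for over 600 s -> FAILED
cd('/Clusters/' + clustername + '/OverloadProtection/' + clustername + '/ServerFailureTrigger/' + clustername)
cmo.setMaxStuckThreadTime(600)
cmo.setStuckThreadCount(25)

# Enable kill and restart on the cluster members only (skips the Admin server)
cd('/')
for server in cmo.getServers():
    cluster = server.getCluster()
    if cluster is not None and cluster.getName() == clustername:
        server.setAutoKillIfFailed(true)
        server.setAutoRestart(true)

save()
activate()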





This worked like a charm: when there are more than 25 stuck threads, the server state goes to FAILED and it's immediately restarted.
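The restart cycle can be pictured as a small state machine. The Python sketch below is a hypothetical model, not WebLogic code: it assumes the trigger moves the server to FAILED once more than StuckThreadCount threads are stuck, that the force-shutdown FailureAction then stops it, and that AutoRestart (carried out by Node Manager in a real domain) starts it again. Real servers pass through intermediate states such as STARTING, which the model skips.

```python
class ManagedServer(object):
    """Hypothetical model of the FAILED -> shutdown -> restart cycle."""

    def __init__(self, stuck_thread_count=25, auto_restart=True):
        self.stuck_thread_count = stuck_thread_count
        self.auto_restart = auto_restart
        self.state = 'RUNNING'
        self.history = ['RUNNING']

    def _move(self, state):
        self.state = state
        self.history.append(state)

    def report_stuck_threads(self, stuck):
        # ServerFailureTrigger: too many stuck threads -> FAILED
        if self.state == 'RUNNING' and stuck > self.stuck_thread_count:
            self._move('FAILED')
            # OverloadProtection FailureAction = 'force-shutdown'
            self._move('SHUTDOWN')
            # AutoRestart: Node Manager brings the server back
            if self.auto_restart:
                self._move('RUNNING')


server = ManagedServer()
server.report_stuck_threads(10)   # below the threshold: nothing happens
server.report_stuck_threads(26)
print(server.history)  # ['RUNNING', 'FAILED', 'SHUTDOWN', 'RUNNING']
```

With auto_restart=False the model stops in SHUTDOWN, which is what you would expect without AutoRestart enabled.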

PS: Somehow a JSP doesn't seem able to produce threads that get marked as STUCK. I had to write an OSB HTTP proxy that calls Thread.sleep(1000000) in a Java callout (1,000,000 ms is 1,000 seconds, longer than the 600-second MaxStuckThreadTime) before I could see the threads marked as STUCK and the server restarted. Weird.
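The long sleep works because it keeps a request thread busy past MaxStuckThreadTime. Below is a scaled-down Python 3 analogue, not OSB or WebLogic code: a hypothetical watchdog records when each worker starts a request and reports as stuck any worker still busy after a threshold (0.2 seconds here, standing in for 600 seconds), while a long time.sleep plays the role of the Thread.sleep(1000000) callout.

```python
import threading
import time

busy_since = {}          # worker name -> start time of its current request
lock = threading.Lock()


def handle_request(name, seconds):
    """Hypothetical request handler; the sleep stands in for the Java callout."""
    with lock:
        busy_since[name] = time.monotonic()
    time.sleep(seconds)
    with lock:
        del busy_since[name]


def stuck_workers(max_stuck_thread_time):
    """Names of workers busy for longer than max_stuck_thread_time seconds."""
    now = time.monotonic()
    with lock:
        return sorted(name for name, start in busy_since.items()
                      if now - start > max_stuck_thread_time)


workers = [threading.Thread(target=handle_request, args=('slow', 5.0), daemon=True),
           threading.Thread(target=handle_request, args=('fast', 0.05), daemon=True)]
for w in workers:
    w.start()
time.sleep(0.5)
print(stuck_workers(0.2))
```

With these timings the slow worker should be reported, while the fast one has already finished and is no longer tracked.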
