Java mon amour: October 2013

Thursday, October 31, 2013

WLST: expand the PYTHONPATH

Official doc here: http://docs.python.org/2/tutorial/modules.html#the-module-search-path
I ran this test:


/opt/oracle/fmw11_1_1_5/osb/common/bin/wlst.sh

wls:/offline> print sys.path


['/opt/oracle/fmw11_1_1_5/wlserver_10.3/server/lib/weblogic.jar/Lib'
 '__classpath__'
 '/opt/oracle/fmw11_1_1_5/wlserver_10.3/server/lib/weblogic.jar'
 '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/wlst/modules/jython-modules.jar/Lib'
 '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/wlst'
 '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/wlst/lib'
 '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/wlst/modules'
 '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst'
 '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/lib'
 '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/modules'
 '/opt/oracle/fmw11_1_1_5/oracle_common/common/script_handlers'
 '/opt/oracle/fmw11_1_1_5/osb/common/wlst'
 '/opt/oracle/fmw11_1_1_5/osb/common/wlst/lib'
 '/opt/oracle/fmw11_1_1_5/osb/common/wlst/modules'
 '.']

Not all of those folders/files actually exist by default:

/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/modules , /opt/oracle/fmw11_1_1_5/osb/common/wlst/modules

don't exist...
but
/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/wlst/modules
exists, and that's where I would put my own modules.
However, you can also add module directories at runtime. I put a module cmdb.py in /opt/oracle/usr. It contains a function "getwhoami":

wls:/offline> sys.path.append('/opt/oracle/usr')
wls:/offline> from cmdb import getwhoami
wls:/offline> print getwhoami()
soa
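
The same mechanism can be tried outside WLST. The following is a minimal sketch for a standard Python 3 interpreter (not the WLST shell), assuming a throwaway temporary directory in place of /opt/oracle/usr and a stub cmdb.py whose getwhoami() simply returns a hard-coded string; both are stand-ins for illustration.

```python
import os
import sys
import tempfile

# Hypothetical stand-in for /opt/oracle/usr: a throwaway directory.
module_dir = tempfile.mkdtemp()

# Write a stub cmdb.py exposing getwhoami(); the hard-coded return value
# is an assumption, the real module is not shown in this post.
with open(os.path.join(module_dir, "cmdb.py"), "w") as f:
    f.write("def getwhoami():\n    return 'soa'\n")

# Add the directory only once, so repeated runs do not grow sys.path.
if module_dir not in sys.path:
    sys.path.append(module_dir)

# The import now succeeds because the interpreter searches every
# directory listed in sys.path.
from cmdb import getwhoami

whoami = getwhoami()
print(whoami)
```

Appending (rather than inserting at position 0) keeps the stock WLST modules first in the search order, so a custom module cannot accidentally shadow one of them.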

yahoo email forwarding not working

Recently Yahoo apparently had big trouble with a new release, and I was one of the victims.

No email was forwarded to my Gmail account for some 15 days, leaving me really puzzled: "hey, why is nobody answering my emails?"

Click on the cogwheel at the top right, choose "Settings"; where it says "Yahoo account", click on the Edit button on the right and check that the "Forward" radio button is selected, with the proper email address.

On the same Account page, make sure that Yahoo didn't replace the main account email with something weird (in my case it replaced it with a very old Hotmail account)...

I have read somewhere that Yahoo has a culture of "don't deploy anything new to PROD for fear of breaking it", with only a few mega-releases once in a while. I don't know if this is true, but by the look of it, it seems plausible: these MAJOR botch-ups are typical of companies that deliver too many changes at a time and don't have a culture of small, frequent, incremental changes.

Tuesday, October 29, 2013

OSB thread management across LOCAL Proxy Service invocation

I grew up believing that a Publish (one-way) operation would be carried out in a separate thread from the calling thread, even more so when you use QoS = Best Effort and no transactions.
Here http://docs.oracle.com/cd/E23943_01/admin.1111/e15867/modelingmessageflow.htm#OSBAG181 at chapter 37.12.1.3 it says:

The Oracle Service Bus threading model works as follows:

    The request and response flows in a proxy service execute in different threads.

    Service callouts are always blocking. An HTTP route or publish action is non-blocking 
    (for request/response or one-way invocation), if the value of the qualityOfService 
    element is best-effort.

    JMS route actions or publish actions are always non-blocking, but the response 
    is lost if the server restarts after the request is sent because Oracle Service Bus 
    has no persistent message processing state.

I tried from a HTTP ServiceA to do:

a service callout
a publish
a route

to a LOCAL ServiceB, and with a Java callout I logged the thread name in both ServiceA and ServiceB. I tried this with Transaction Required both enabled and disabled (on both services), and with QoS = Exactly Once and Best Effort.
In ALL cases, much to my surprise, the same thread was executing both ServiceA and ServiceB.
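
The test relies on one simple fact: code can read the name of the thread that is executing it, so comparing the name seen by the caller with the name seen by the callee tells you whether a hand-off happened. The sketch below illustrates only that comparison in plain Python; it is an analogy and says nothing about OSB internals. The functions service_a_direct, service_a_handoff and service_b are hypothetical stand-ins.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def service_b():
    # Stand-in for the callout in ServiceB: report the executing thread.
    return threading.current_thread().name

def service_a_direct():
    # In-process invocation: ServiceB runs on the caller's thread.
    return threading.current_thread().name, service_b()

def service_a_handoff(pool):
    # Hand-off to a worker pool: ServiceB runs on a different thread.
    return threading.current_thread().name, pool.submit(service_b).result()

caller, callee = service_a_direct()
same_thread_direct = (caller == callee)

with ThreadPoolExecutor(max_workers=1) as pool:
    caller, callee = service_a_handoff(pool)
same_thread_handoff = (caller == callee)

print(same_thread_direct, same_thread_handoff)
```

What I observed for LOCAL proxy invocations corresponds to the first case: identical thread names on both sides.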

Deleting a million directories in Linux

This morning a component went ballistic and created more than a million folders under /path/to/myfolder, until the file system was completely full.
This command was showing 100% of inodes used:
watch df -i /opt/oracle/domains/osbpr1do/shared/apps/fileadapter/controlDir/fileftp/controlFiles
You can delete folders/files in several ways, most of them VERY inefficient:

#this takes forever and deletes also the parent folder
rm -rf /path/to/myfolder

#this funnily fails with "file not found", most likely because there are 
#special characters like a equal sign = in the folder name
find /path/to/myfolder -type d -mtime +5 -exec ls -ltr {} \;

#of course I have tried also with quotes, with same result
find /path/to/myfolder -type d -mtime +5 -exec ls -ltr "{}" \;

#I have tried deleting only subsets
rm -rf /opt/oracle/domains/osbpr1do/shared/apps/fileadapter/controlDir/fileftp/controlFiles/a*
#but the problem is that the shell expands the a*, and this can resolve to too many arguments
#so the command fails


#this is very simple but also deletes the parent folder
find /path/to/myfolder/ -type d -delete

then I read this article and I tested this:

rsync -av --delete --remove-source-files /tmp/empty/ /path/to/myfolder/

where /tmp/empty/ is an empty folder, and it works like magic. From what I have read, this is because rsync reads file system info in LIFO rather than FIFO order.
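
The two problems hit above (the shell expanding a* into too many arguments, and the parent folder being deleted as well) can also be avoided by walking the directory lazily from a script. This is a minimal Python 3 sketch under those assumptions, not something I benchmarked against rsync; empty_directory and the folder names in the demo are hypothetical.

```python
import os
import shutil
import tempfile
from itertools import islice

def empty_directory(parent, batch_size=1000):
    """Delete everything inside parent, but keep parent itself."""
    removed = 0
    while True:
        # Read a bounded batch of entries; os.scandir is lazy, so no huge
        # argument list is ever built (unlike the shell expanding a*).
        with os.scandir(parent) as entries:
            batch = [(e.path, e.is_dir(follow_symlinks=False))
                     for e in islice(entries, batch_size)]
        if not batch:
            return removed
        for path, is_dir in batch:
            if is_dir:
                shutil.rmtree(path)
            else:
                os.unlink(path)
            removed += 1

# Demo on a throwaway folder (hypothetical names, including an "=" sign).
parent = tempfile.mkdtemp()
for i in range(50):
    os.mkdir(os.path.join(parent, "ctl=%d" % i))
open(os.path.join(parent, "stray.txt"), "w").close()

removed = empty_directory(parent, batch_size=20)
print(removed, os.listdir(parent))
```

Names are passed to the operating system as they are, so special characters such as "=" need no quoting, and re-opening the directory for each batch avoids relying on how the file system behaves when entries are removed during a scan.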

Monday, October 28, 2013

maven-release-plugin quirks... maven hell as usual


27-Oct-2013 10:51:49       [exec] [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-release-plugin:2.0:prepare (default-cli) 
on project OSBJavaCustomXPaths: Cannot prepare the release because you have local modifications :
27-Oct-2013 10:51:49       [exec] [ERROR] [build-number.txt:unknown]
27-Oct-2013 10:51:49       [exec] [ERROR] -> [Help 1]
27-Oct-2013 10:51:49       [exec] [ERROR]
27-Oct-2013 10:51:49       [exec] [ERROR] To see the full stack trace of the errors, 
re-run Maven with the -e switch.
27-Oct-2013 10:51:49       [exec] [ERROR] Re-run Maven using the -X switch to 
enable full debug logging.
27-Oct-2013 10:51:49       [exec] [ERROR]
27-Oct-2013 10:51:49       [exec] [ERROR] For more information about the errors and possible solutions,
 please read the following articles:
27-Oct-2013 10:51:49       [exec] [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
27-Oct-2013 10:51:49  
27-Oct-2013 10:51:49  BUILD FAILED

Look no further: just add this SVN property to your project root (trunk) folder:

svn:ignore

target
build-number.txt

The local modifications thing is just BS, ignore it.

Quick personal wiki with ZIM

the better alternative to text files in Notepad++:
http://www.glump.net/software/zim-windows
you can save the wiki files in a local Dropbox folder, so you can automatically recover them from any other location.

Sunday, October 27, 2013

WebLogic JMS connection factory: Maximum Messages per Session caveat

In OSB, a JMS Proxy Service will create 16 consumers on a queue. This should enable tremendous message-processing parallelism.
Not necessarily.
The way JMS messages are dispatched to the consumers is regulated by the Connection Factory property "Maximum Messages per Session", which defaults to 10 (see the "Client" tab).

The console help says:

The maximum number of messages that can exist for an asynchronous session 
and that have not yet been passed to the message listener. 
When the Synchronous Prefetch Mode is enabled, 
this value also affects synchronous sessions with a message consumer 
that will prefetch messages in one server access.

(the whole story is told here)

So in our case a single MDB prefetched 10 messages: 1 went to current and the other 9 to pending. The single thread associated with that consumer got stuck on a resource, and the other 9 JMS messages stayed pending forever.

This clearly breaks the parallel processing paradigm, and it also impacts the SLA for a conspicuous number of messages. We shall consider setting this parameter to 1, thereby disabling batched delivery of JMS messages.

I am not 100% sure that THIS parameter is actually the culprit, but I don't see any other explanation.

See also "WebLogic Server (WLS) Support Pattern: Troubleshooting Pending JMS Messages (Doc ID 1204064.1)":

"This message buffer or pipeline can be configured using the MessagesMaximum parameter in the Connection Factory settings (http://download.oracle.com/docs/cd/E13222_01/wls/docs81/jms/implement.html#1080748). The default value of this parameter is 10 per instance of the asynchronous client. In case of a MDB, each instance is a separate client. Hence, if there are 15 instances of a given MDB, there can be 10*15=150 pending messages buffered on the MDB side in the message pipeline."

These buffered messages will not be consumed by the client if the client happens to be hanging for any reason while processing a message. Other messages in the pipeline will wait and contribute to the number of pending messages. Each such hanging thread corresponds to a hanging asynchronous client and contributes to pending messages equal to the MessagesMaximum parameter. For instance, 5 threads hanging in onMessage() method will contribute to 50 pending messages, assuming the MessagesMaximum parameter has the default value of 10.

You can also reduce the value of the MessagesMaximum property to 1 from the default value of 10. This means that there will be no messages in the pipeline. A second message will be delivered to the asynchronous consumer only after the first message processing has been completed. This can have some performance impact.

It is important to note that this MessagesMaximum property defined on the Connection Factory is different from the MessagesMaximum property for a JMS destination. The property for a JMS destination is used to configure the maximum number of messages that can be stored in the destination.

In WLST:

jmsmodule='PVModule'
jmsCF='PVCF'

cd('/JMSSystemResources/' + jmsmodule + '/JMSResource/' + jmsmodule + '/ConnectionFactories/' + jmsCF + '/ClientParams/' + jmsCF)
cmo.setMessagesMaximum(1)

Once I do this, at most 16 messages (= the number of consumers) are in Pending state at any time; the other messages are current.
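
The arithmetic from the support note is easy to write down. This small Python helper is only a sketch of that multiplication (hanging consumers times MessagesMaximum); the function name max_pending_messages is made up for illustration and it models nothing else about WebLogic JMS.

```python
def max_pending_messages(hung_consumers, messages_maximum=10):
    """Messages held in consumer pipelines, following the support note:
    each hanging asynchronous consumer contributes MessagesMaximum messages."""
    return hung_consumers * messages_maximum

# Figures quoted in the note, with the default MessagesMaximum of 10.
fifteen_mdb_instances = max_pending_messages(15)
five_hung_threads = max_pending_messages(5)

# Our case after the WLST change: 16 consumers, MessagesMaximum set to 1.
after_change = max_pending_messages(16, messages_maximum=1)

print(fifteen_mdb_instances, five_hung_threads, after_change)
```

With MessagesMaximum set to 1 the worst case shrinks to one message per consumer, which matches the 16 pending messages observed above.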

Saturday, October 26, 2013

OSB Blackbox, the simplest possible tool to monitor in-flight requests

A common issue we have in PROD is that we occasionally get stuck threads and don't have a clue which payload they are processing.
In the logs, the STUCK thread notification gives you a Thread Name (Execute Thread 27...) and a long stack trace, from which you can only tell whether it was processing a JMS message or a file, but not which Proxy Service was being executed.
We do log all incoming requests, but it is not straightforward to associate the stuck thread with them.
So I have come up with a simple Custom XPath which should be invoked at the beginning of each Proxy Service - or simply in the MessageTracker service that we invoke to trace every incoming request - to store ThreadName, ProxyName (we call this "InterfaceID"), payload and some unique identifier (like PONumber, CustomerID....) that we call technicalMessageID and BusinessID.
This is the - REALLY bare bones - implementation:

This is the Blackbox, the entry point of all our calls. It holds a RequestMap containing all our in-flight requests

package com.acme.osb.logging;

import java.util.Date;

import org.apache.xmlbeans.XmlObject;

/**
 * Keeps in memory the execution status of all current requests
 * @author PIPPO
 *
 */
public class Blackbox {
 public static final RequestMap requests = new RequestMap();
 
 /**
  * Archive a request in our Map, keyed by the name of the current thread
  * @param interfaceId
  * @param technicalId
  * @param businessId
  * @param payload
  * @return the name of the thread used as key
  */
 public static String trackRequest(String interfaceId, String technicalId, String businessId, XmlObject payload) {
  String threadName = getThreadName(); 
  return trackRequestWithThreadName(threadName, interfaceId, technicalId, businessId, payload);
 }

 public static String trackRequestWithThreadName(String threadName, String interfaceId, String technicalId, String businessId, XmlObject payload) {
  RequestDescriptor rd = new RequestDescriptor(threadName, new Date(), interfaceId, technicalId, businessId, payload);
  requests.put(threadName, rd);
  return threadName;
 }
 
 public static String getThreadName() {
  return Thread.currentThread().getName();
 }
 
 public static String dumpBlackbox() {
  StringBuffer result = new StringBuffer();
  result.append("size=").append(requests.size()).append("\n");
  result.append(requests.toString());
  return result.toString();
 }
 
 public static String clearBlackbox() {
  requests.clear();
  return "OK";
 }
 
}

This is the RequestDescriptor, entry in the RequestMap to hold all info about an in-flight request

package com.acme.osb.logging;

import java.util.Date;

import org.apache.xmlbeans.XmlObject;

/**
 * Hold info about all incoming requests
 * @author PIPPO
 *
 */
public class RequestDescriptor {
 String threadName;
 Date startTime;
 XmlObject payload;
 String interfaceId;
 String technicalId;
 String businessId;
 
 
 
 public RequestDescriptor(String threadName, Date startTime, String interfaceId, String technicalId, String businessId, XmlObject payload) {
  super();
  this.threadName = threadName;
  this.startTime = startTime;
  this.payload = payload;
  this.interfaceId = interfaceId;
  this.technicalId = technicalId;
  this.businessId = businessId;
 }



 public String getThreadName() {
  return threadName;
 }



 public void setThreadName(String threadName) {
  this.threadName = threadName;
 }



 public Date getStartTime() {
  return startTime;
 }



 public void setStartTime(Date startTime) {
  this.startTime = startTime;
 }



 public XmlObject getPayload() {
  return payload;
 }



 public void setPayload(XmlObject payload) {
  this.payload = payload;
 }



 public String getInterfaceId() {
  return interfaceId;
 }



 public void setInterfaceId(String interfaceId) {
  this.interfaceId = interfaceId;
 }



 public String getBusinessId() {
  return businessId;
 }



 public void setBusinessId(String businessId) {
  this.businessId = businessId;
 }



 public String getTechnicalId() {
  return technicalId;
 }



 public void setTechnicalId(String technicalId) {
  this.technicalId = technicalId;
 }



 public String toString() {
  return "threadName=" + threadName + ", startTime=" + startTime.toString() + ", interfaceId= " + interfaceId + ", technicalId=" + technicalId + ", businessId=" + businessId + ", payload=" + payload.xmlText();
 }
 
}

This is the RequestMap, a simple collection with some utility method

package com.acme.osb.logging;

import java.util.Date;
import java.util.Enumeration;
import java.util.concurrent.ConcurrentHashMap;

/**
 * A self-managing container of requests
 * @author PIPPO
 *
 */

public class RequestMap {
 public static final int MAXSIZE = 100;
 
 long numberOfRemoval = 0; 
 
 public long getNumberOfRemoval() {
  return numberOfRemoval;
 }

 public void setNumberOfRemoval(long numberOfRemoval) {
  this.numberOfRemoval = numberOfRemoval;
 }

 ConcurrentHashMap<String, RequestDescriptor> requests = new ConcurrentHashMap<String, RequestDescriptor>();

 /**
  * Add an element to the map, removing oldest element if maxsize is reached
  * @param threadName
  * @param rd
  */
 public void put(String threadName, RequestDescriptor rd) {
  requests.put(threadName, rd);
  if (requests.size() > MAXSIZE) {
   RequestDescriptor oldest = findOldestEntry();
   // findOldestEntry() returns null if no entry is older than "now"
   if (oldest != null) {
    requests.remove(oldest.getThreadName());
    numberOfRemoval ++;
   }
  }
 }
 
 /**
  * Return the entry with the oldest timestamp
  * @return
  */
 public RequestDescriptor findOldestEntry() {
  RequestDescriptor result = null;
  Date oldestDate = new Date();
  Enumeration<RequestDescriptor> en = requests.elements();
  while (en.hasMoreElements()) {
   RequestDescriptor rd = en.nextElement(); 
   if (rd.getStartTime().before(oldestDate)) {
    oldestDate = rd.getStartTime();
    result = rd;
   }
  }
  return result;
 }
 
 public int size() {
  return requests.size();
 }
 
 public String toString() {
  StringBuffer result = new StringBuffer();
  Enumeration<RequestDescriptor> en = requests.elements();
  while (en.hasMoreElements()) {
   RequestDescriptor rd = en.nextElement();
   result.append(rd.toString());
  }
  return result.toString();
 }

 public void clear() {
  requests.clear();
 }
 
}

In the RequestMap I had to do some alchemy to make sure we never store more than 100 concurrent requests... I am paranoid about memory leaks!
You can expose the dumpBlackbox() method with a JMX agent, or more simply with an HTTP Proxy Service. If the WebLogic server is completely unresponsive, JMX is probably more robust. Another approach could be to implement a "kill -3" (SIGQUIT) handler at JVM level, to dump not only the thread dump but also the OSB dump.

NOTA BENE: if you have huge payloads, this could lead to a severe hit on your memory.... be careful...

Linux directory read and execute bits

file pippo.txt is in /vagrant/one/two, and it belongs to root, just like directories one and two.

[root@osb-vagrant vagrant]# ls -Rlt /vagrant
/vagrant:
total 4
drwxr-xr-x 3 root root 4096 Oct 26 08:29 one

/vagrant/one:
total 4
drwxr-x--x 2 root root 4096 Oct 26 08:29 two

/vagrant/one/two:
total 0
-rw-r--r-- 1 root root 0 Oct 26 08:29 pippo.txt

Question: will user vagrant be able to do ls /vagrant/one/two?
Answer: NO

[vagrant@osb-vagrant vagrant]$ ls /vagrant/one/two/
ls: cannot open directory /vagrant/one/two/: Permission denied

Why not? Because the read bit on "two" is not set for vagrant (it is set on "one", however). The "read" bit on a folder means "let me list its content". However, user vagrant can "cat /vagrant/one/two/pippo.txt", because the read bit is set on pippo.txt and the execute bit is set on "two".
If I remove the execute bit on "two":

chmod 770 /vagrant/one/two
ls -ltr /vagrant/one/two
total 0
-rw-r--r-- 1 root root 0 Oct 26 08:29 pippo.txt

then I even lose the right to view pippo.txt content, although the file itself is readable for vagrant.

[vagrant@osb-vagrant vagrant]$ cat /vagrant/one/two/pippo.txt
cat: /vagrant/one/two/pippo.txt: Permission denied

To recap: the execute bit on a folder allows me to "traverse" it; the read bit on a folder allows me to list its content. This is very unintuitive and derives from an overloaded use of bits which were originally meant for files. Files and directories are totally different beasts, so they should be modeled differently.

How about deleting files? Write access to the file itself is not what counts: deleting removes an entry from the folder, so you need write and execute access on the folder.
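
The rules can be checked mechanically by decoding the mode bits. Below is a minimal Python sketch, assuming the user falls in the "other" class (as vagrant does here, being neither root nor in group root); the function name other_access and the dictionary keys are made up for illustration.

```python
import stat

def other_access(mode):
    """What a user who is neither owner nor group member may do with a directory."""
    can_read = bool(mode & stat.S_IROTH)
    can_write = bool(mode & stat.S_IWOTH)
    can_exec = bool(mode & stat.S_IXOTH)
    return {
        "list": can_read,                   # read bit: list the entries
        "traverse": can_exec,               # execute bit: reach entries by name
        "modify": can_write and can_exec,   # create/delete entries: write + execute
    }

one = other_access(0o755)         # drwxr-xr-x
two_before = other_access(0o751)  # drwxr-x--x
two_after = other_access(0o770)   # after chmod 770

print(one)
print(two_before)
print(two_after)
```

For "two" before the chmod this gives list=False and traverse=True, which is exactly the "ls fails, cat works" behaviour shown above; after chmod 770 everything is False for vagrant.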

See also these excellent tutorials http://www.hackinglinuxexposed.com/articles/20030417.html http://www.hackinglinuxexposed.com/articles/20030424.html

Thursday, October 24, 2013

SQLDeveloper 4.0 preview: unable to run it

I have installed Java 7 and added 2 extra lines to sqldeveloper.bat:

cd C:\pierre\sqldeveloper-4\sqldeveloper\sqldeveloper\bin
sqldeveloper.bat

set JAVA_HOME=C:\pierre\Java\jdk1.7.0_45

set PATH=C:\pierre\Java\jdk1.7.0_45\bin

java -Xmx640M -Xms128M -Xverify:none -Doracle.ide.util.AddinPolicyUtils.OVERRIDE_FLAG=false -Dsun.java2d.ddoffscreen=false -Dwindows.shell.font.languages= -XX:MaxPermSize=128M -Dide.AssertTracingDisabled=true -Doracle.ide.util.AddinPolicyUtils.OVERRIDE_FLAG=true -Djava.util.logging.config.file=logging.conf -Dsqldev.debug=false -Dide.conf="./sqldeveloper.conf" -Dide.startingcwd="." -classpath ../../ide/lib/ide-boot.jar oracle.ide.boot.Launcher

ERROR: You're trying to run the product with the legacy launcher oracle.ide.boot.Launcher . Check your .conf file and be sure to include:
        AddJavaLibFile  ../../ide/lib/fcpboot.jar
        SetMainClass    oracle.ide.osgi.boot.NbLauncher

I have added the 2 extra lines

AddJavaLibFile  ../../ide/lib/fcpboot.jar
SetMainClass    oracle.ide.osgi.boot.NbLauncher

to sqldeveloper.conf, but I still get the same error.
Giving up.
So sad.

Geeks on a dying planet: the Ocean is broken

This blog is mainly about technology, but let me occasionally reblog something about an extraordinary event, unique in the history of this planet: the complete destruction of every form of life, perpetrated by industry and the military. It seems like a science fiction movie, but it's history, our history.

What can we do? Be vegetarian. Live locally. Use a bicycle. Don't consume resources unless strictly necessary. Keep heating to a minimum. Put "being" before "having".

Getting started with Puppet - hands on tutorial

The problem with most books/documentation on Puppet is the disproportion between the blablabla and the examples.
If, like me, you learn best by example, then this tutorial (written by a colleague, David Portabella) will be very useful.
Thank you David.

Tuesday, October 22, 2013

WLST: easy script to suspend or resume consumption from JMS queues

The painful thing about the traditional WLST scripts is having to pass the JMSModule, JMSRuntime etc.
This script simply searches all available queues anywhere on a ManagedServer until it finds a match.


def controlConsumption(queue, suspend):
    found = False
    serverRuntime()
    myjmsruntimes = ls('/JMSRuntime/', returnMap='true')
    for myjmsruntime in myjmsruntimes:
        myjmsservers = ls('/JMSRuntime/' + myjmsruntime + '/JMSServers', returnMap='true')
        for myjmsserver in myjmsservers:
          jmsserverroot = '/JMSRuntime/' + myjmsruntime + '/JMSServers/' + myjmsserver + '/'
          cd(jmsserverroot) 
          cd('Destinations')
          mydestinations = ls('c', returnMap='true')
          #print mydestinations
          for mydestination in mydestinations:
            if (mydestination.endswith('@' + queue)):
              found = True
              #print "cding into", jmsserverroot + mydestination
              cd(jmsserverroot + 'Destinations/' + mydestination)
              if (suspend): 
                  cmo.pauseConsumption()
                  print "\n##############\nconsumption paused on ", mydestination, "\n##############\n"
              else:
                  cmo.resumeConsumption()
                  print "\n##############\nconsumption resumed on ", mydestination, "\n##############\n"
    if (not found):
        print "ERROR: queue ", queue, " could not be found"

where queue is just the Queue name (not the JNDI), and suspend is True or False.
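
The matching rule in the script is just a suffix test on the destination runtime name. Here is a small plain-Python sketch of that test, assuming runtime names of the form module!jmsserver@queue (which is what the endswith('@' + queue) check implies); the module, server and queue names below are hypothetical.

```python
def matches_queue(destination_name, queue):
    # Same test as in controlConsumption(): the runtime name must end
    # with "@" followed by the plain queue name.
    return destination_name.endswith("@" + queue)

# Hypothetical destination runtime names, for illustration only.
destinations = [
    "PVModule!JMSServer1@OrdersQueue",
    "PVModule!JMSServer2@OrdersQueue",
    "PVModule!JMSServer1@OrdersQueueError",
    "PVModule!JMSServer1@MyOrdersQueue",
]

matched = [d for d in destinations if matches_queue(d, "OrdersQueue")]
print(matched)
```

Including the "@" in the suffix is what keeps a queue whose name merely ends with the same letters (MyOrdersQueue in this example) from being paused by mistake.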

WebLogic, debugging authorization issues

In the WebLogic Server debug flags page, enable atz, and make sure your logging level is debug. IMPORTANT: to troubleshoot console issues, you should enable the flags and logs on the ADMIN server, not on the Managed Server.
For each operation you do on the console, you should see an entry like this one, which is for user weblogic, an Administrator:
####<Oct 22, 2013 11:13:21 AM CEST> <Debug> <SecurityAtz> <hqchacme104> <osbpl1ms1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <9455361429c2e897:-165939bd:141df6a556f:-8000-000000000000003e> <1382433201784> <BEA-000000> <XACML Authorization isAccessAllowed(): input arguments:>
####<Oct 22, 2013 11:13:21 AM CEST> <Debug> <SecurityAtz> <hqchacme104> <osbpl1ms1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <9455361429c2e897:-165939bd:141df6a556f:-8000-000000000000003e> <1382433201784> <BEA-000000> < Subject: 4
Principal = weblogic.security.principal.WLSUserImpl("weblogic")
Principal = weblogic.security.principal.WLSGroupImpl("Administrators")
Principal = weblogic.security.principal.WLSGroupImpl("IntegrationAdministrators")
Principal = weblogic.security.principal.WLSGroupImpl("AdminChannelUsers")
>
and then it will tell you:

Roles:AdminChannelUser, Anonymous, IntegrationAdmin, Admin

then something about the resource you are trying to access:
Resource: type=<jmx>, operation=get, application=, mbeanType=weblogic.management.runtime.ServerRuntimeMBean, target=PendingRestartSystemResources
then the policy applying to that resource:
urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Eget, 1.0 evaluates to Permit
the result of checking the policy:

XACML Authorization isAccessAllowed(): returning PERMIT

and again:
com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT
In case something goes wrong, you will get the dreaded

XACML Authorization isAccessAllowed(): returning DENY

and
urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Einvoke@M@Oapplication@E@M@OmbeanType@Eweblogic.management.mbeanservers.edit.ConfigurationManagerMBean, 1.0 evaluates to Deny
where at the beginning we have the policy name: urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@
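
Comparing the "Resource:" line with the policy name in the log above suggests that the policy name is the resource string with some characters escaped. The sketch below decodes it in plain Python; the escape table is inferred only from those log lines (it is an assumption, not documented behaviour, and other escapes may exist), and the name decode_policy_id is made up.

```python
# Escape table inferred by comparing "type=<jmx>, operation=get" with
# "type@E@Fjmx@G@M@Ooperation@Eget"; treat it as an assumption.
ESCAPES = {"@E": "=", "@F": "<", "@G": ">", "@M": ",", "@O": " "}
PREFIX = "urn:bea:xacml:2.0:entitlement:resource:"

def decode_policy_id(policy_id):
    """Turn an escaped policy name back into a readable resource string."""
    text = policy_id[len(PREFIX):] if policy_id.startswith(PREFIX) else policy_id
    for token, char in ESCAPES.items():
        text = text.replace(token, char)
    return text

denied = decode_policy_id(
    "urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Einvoke"
    "@M@Oapplication@E@M@OmbeanType@E"
    "weblogic.management.mbeanservers.edit.ConfigurationManagerMBean"
)
print(denied)
```

Decoding the name of a policy that evaluates to Deny makes it much quicker to see which MBean type and operation the user is missing.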

WebLogic: check if a Group exists

When you create users and need to assign them to Groups, chances are that you will also have to dynamically create those groups. Luckily there is a function atnr.groupExists('somegroup').

This will work only if the JMSGroup doesn't exist:

connect(...)
atnr = cmo.getSecurityConfiguration().getDefaultRealm().lookupAuthenticationProvider('DefaultAuthenticator')
atnr.createGroup('JMSGroup', 'JMSGroup')

the second time you will get a "weblogic.management.utils.AlreadyExistsException: [Security:090267]Group JMSGroup" exception. You can decide to simply catch and ignore the exception.
If you do viewMBean(atnr) you will notice that there is a host of operations available:


setGroupDescription
changeUserPassword
setUserDescription
listMemberGroups
removeMemberFromGroup
groupExists
getGroupDescription
advance
getUserDescription
haveCurrent
listGroupMembers
unSet
getSupportedUserAttributeType
getUserAttributeValue
wls_getDisplayName
userExists
close
isSet
createGroup
listGroups
resetUserPassword
createUser
removeUser
addMemberToGroup
listAllUsersInGroup
setUserAttributeValue
importData
isMember
removeGroup
listUsers
exportData
isUserAttributeNameSupported
getCurrentName

so the code becomes:

#ROLES contains a CSV list of groups for the user USERNAME 
for role in ROLES.split(','):
    if not atnr.groupExists(role):
        atnr.createGroup(role, role)
        print "WARNING: I have  created group ", role
    print "adding ", USERNAME, "to group", role
    atnr.addMemberToGroup(role, USERNAME)

Saturday, October 19, 2013

keytool: export a private key + certificate to a PKCS12 store

I have a JKS store pippov2.dev.acme.com.jks containing some trustedCert entries (caacme, caswisssign) and a private key (pippov2.dev.acme.com).

I want to be able to store separately the private key. Keytool allows you to export only to a PKCS12-type store:

keytool -importkeystore -srckeystore pippov2.dev.acme.com.jks -destkeystore new-store.p12 -deststoretype PKCS12

Enter destination keystore password:
Re-enter new password:
Enter source keystore password:
Problem importing entry for alias caacme: java.security.KeyStoreException: TrustedCertEntry not supported.
Entry for alias caacme not imported.
Do you want to quit the import process? [no]:
Problem importing entry for alias caswisssign: java.security.KeyStoreException: TrustedCertEntry not supported.
Entry for alias caswisssign not imported.
Do you want to quit the import process? [no]:
Enter key password for <pippov2.dev.acme.com>
Entry for alias pippov2.dev.acme.com successfully imported.
Import command completed: 1 entries successfully imported, 2 entries failed or cancelled

This takes a LOOOOONG time, so be patient.

The file new-store.p12 is generated:

keytool -keystore new-store.p12 -list -storetype PKCS12

Enter keystore password:

Keystore type: PKCS12
Keystore provider: SunJSSE

Your keystore contains 1 entry

pippov2.dev.acme.com, Oct 19, 2013, PrivateKeyEntry,
Certificate fingerprint (MD5): 46:A7:6C:E5:13:4C:2F:7B:65:10:42:B0:3B:A9:B1:23

OSB : this service is not testable since all its operations require java arguments

I have created a simple Business Service to post a JMS message to a JMS queue.
When I test it from the Test Console being logged in with a user belonging to Administrators group, all is fine.
When I try with a user belonging to Operators, IntegrationOperators, Monitors, IntegrationMonitors, it gives me a disabled "debug" button and a funny message "this service is not testable since all its operations require java arguments".
Adding "IntegrationDeployers" group, logging out and in again seems to solve the issue.

Friday, October 18, 2013

Enterprise Manager (EMGC): how to display more than 1 hour of Activity

I keep forgetting it, so I write it down:

You choose the menu Performance/Top Activity

then in the top right, where it says "view data", choose "historical" and play around with the mouse and shift the grey "view window"

This is the normal display (last hour):

this is the extended display

Thursday, October 17, 2013

Access not allowed for subject: principals on ResourceType: JMSDestinationRuntime Action: execute, Target: getMessages

Not everybody can view the messages in a plain vanilla JMS queue in WebLogic. By default only Administrators can.

cd('/SecurityConfiguration/mydomain/Realms/myrealm')
cmo.setDelegateMBeanAuthorization(true)

(this requires an immediate restart)
Changes to XACMLAuthorizer that add a policy on the JMSDestinationRuntimeMBean invoke operation are not recorded by the WLS console, unfortunately, so I will have to study how to script it.

see Oracle Doc "WebLogic Server: Error When Attempting to View JMS Messages in Admin Console: Access not allowed for subject (Doc ID 1327324.1)":

Please go to Security Realms ->  -> Configuration -> General.

    Please check the "Use Authorization Providers to Protect JMX Access" parameter.
    Go to "Roles and Policies" -> Realm Policies.
    In the Policy table, select "JMX Policy Editor."
    Select "Global Scope" and click Next.
    From MBean Types, select "weblogic.management.runtime."
    Select "JMSDestinationRuntimeMBean" and click next.
    In Attributes and Operations, select "Operations: Permission to Invoke."
    Click on "Create Policy" button and save.
    Click on "Add Condition" and select "User/Group" in "Predicate List." Click next.
    Type username (USER)/Group, and click Add. Click Finish.
    Reboot the server and login using the user you just created.

In fact it's better to combine these 2 rules: group = IntegrationOperators OR Administrators, so that weblogic user can still see the JMS messages.
The resulting policy added to XACMLAuthorizer.dat is:

<Policy PolicyId="urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Einvoke@M@Oapplication@E@M@OmbeanType@Eweblogic.management.runtime.JMSDestinationRuntimeMBean" RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable"><Description>Grp(IntegrationOperators)</Description><Target><Resources><Resource><ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal"><AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">type=<jmx>, operation=invoke, application=, mbeanType=weblogic.management.runtime.JMSDestinationRuntimeMBean</AttributeValue><ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:resource:resource-ancestor-or-self" DataType="http://www.w3.org/2001/XMLSchema#string" MustBePresent="true"/></ResourceMatch></Resource></Resources></Target><Rule RuleId="primary-rule" Effect="Permit"><Condition><Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-is-in"><AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">IntegrationOperators</AttributeValue><SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:group" DataType="http://www.w3.org/2001/XMLSchema#string"/></Apply></Condition></Rule><Rule RuleId="deny-rule" Effect="Deny"></Rule></Policy>

An extra entry will be added to the file:

<WLSMetaData PolicyId="urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Einvoke@M@Oapplication@E@M@OmbeanType@Eweblogic.management.runtime.JMSDestinationRuntimeMBean" Status="3"><WLSPolicyInfo wlsCreatorInfo="mbean"/>

The corresponding WLST is:


def allowJMSAccessForGroup(domainName):
    try:
        print "applying JMS access policy for domain", domainName
        policy = '<Policy PolicyId="urn:bea:xacml:2.0:entitlement:resource:type@E@Fjmx@G@M@Ooperation@Einvoke@M@Oapplication@E@M@OmbeanType@Eweblogic.management.runtime.JMSDestinationRuntimeMBean" RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable"><Description>Grp(IntegrationOperators)</Description><Target><Resources><Resource><ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal"><AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">type=<jmx>, operation=invoke, application=, mbeanType=weblogic.management.runtime.JMSDestinationRuntimeMBean</AttributeValue><ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:resource:resource-ancestor-or-self" DataType="http://www.w3.org/2001/XMLSchema#string" MustBePresent="true"/></ResourceMatch></Resource></Resources></Target><Rule RuleId="primary-rule" Effect="Permit"><Condition><Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-is-in"><AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">IntegrationOperators</AttributeValue><SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:group" DataType="http://www.w3.org/2001/XMLSchema#string"/></Apply></Condition></Rule><Rule RuleId="deny-rule" Effect="Deny"></Rule></Policy>'
        print "applying policy", policy.replace("<", "&lt;") #the second one is ampersand followed by lt;
        print "cd('/SecurityConfiguration/' + domainName + '/DefaultRealm/myrealm/Authorizers/XACMLAuthorizer')"
        cd('/SecurityConfiguration/' + domainName + '/DefaultRealm/myrealm/Authorizers/XACMLAuthorizer')
        cmo.addPolicy(policy)
        print "done applying policy"
        return True
    except Exception, inst:
        print inst
        print sys.exc_info()[0]
        dumpStack()
        sys.stderr.write("unable to apply JMS access policy for domain " + domainName)
        return False

  
serverConfig()
allowJMSAccessForGroup(domain)

(no edit() statement is necessary to apply this change)
BE VERY CAREFUL: authorization information is cached in the WebLogic console, so to see the effect of this change it's safer to log out and log in again.
If you get a "weblogic.management.utils.AlreadyExistsException: Policy with matching ID and version already exists in store", don't worry, it's all right. Just check that the policy is in place.
Be aware that for old versions of WebLogic there are several bugs:
Bug 8912918
Bug 9764721
Bug 11778631
so make sure you apply the patches.
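
The policy string above is easy to break when it is pasted around, because the resource attribute contains a literal <jmx>, which has to be XML-escaped for the policy to be well-formed XML. Here is a minimal sketch, in plain Python meant to be run outside WLST, that generates the same policy for any group name. The function name build_policy and the constants are made up for illustration, and the sketch assumes the authorizer parses the policy as regular XML:

```python
from xml.sax.saxutils import escape

# All identifiers below are copied from the policy shown in this post.
MBEAN_TYPE = 'weblogic.management.runtime.JMSDestinationRuntimeMBean'
POLICY_ID = ('urn:bea:xacml:2.0:entitlement:resource:'
             'type@E@Fjmx@G@M@Ooperation@Einvoke@M@Oapplication@E@M@OmbeanType@E'
             + MBEAN_TYPE)
RESOURCE = 'type=<jmx>, operation=invoke, application=, mbeanType=' + MBEAN_TYPE
XS_STRING = 'http://www.w3.org/2001/XMLSchema#string'


def build_policy(group):
    """Return the XACML policy granting 'invoke' on JMS destinations to a group."""
    parts = [
        '<Policy PolicyId="%s" RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:'
        'rule-combining-algorithm:first-applicable">' % POLICY_ID,
        '<Description>Grp(%s)</Description>' % escape(group),
        '<Target><Resources><Resource>',
        '<ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">',
        # escape() turns the literal <jmx> into &lt;jmx&gt;
        '<AttributeValue DataType="%s">%s</AttributeValue>' % (XS_STRING, escape(RESOURCE)),
        '<ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:'
        'resource:resource-ancestor-or-self" DataType="%s" MustBePresent="true"/>' % XS_STRING,
        '</ResourceMatch></Resource></Resources></Target>',
        '<Rule RuleId="primary-rule" Effect="Permit"><Condition>',
        '<Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-is-in">',
        '<AttributeValue DataType="%s">%s</AttributeValue>' % (XS_STRING, escape(group)),
        '<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:'
        'subject:group" DataType="%s"/>' % XS_STRING,
        '</Apply></Condition></Rule>',
        '<Rule RuleId="deny-rule" Effect="Deny"></Rule></Policy>',
    ]
    return ''.join(parts)
```

The returned string can be pasted into the WLST function in place of the hardcoded one. Parsing it with an XML parser before calling addPolicy is a cheap way to catch copy/paste damage.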

OSB xmlbeans outofmemory

We had a couple of cases of OOM lately, both preceded by this stuck thread:

      org.apache.xmlbeans.impl.schema.SchemaParticleImpl.isFixed(SchemaParticleImpl.java:203)
        org.apache.xmlbeans.impl.validator.Validator.validateSimpleType(Validator.java:1205)
        org.apache.xmlbeans.impl.validator.Validator.handleText(Validator.java:829)
        org.apache.xmlbeans.impl.validator.Validator.textEvent(Validator.java:814)
        org.apache.xmlbeans.impl.validator.Validator.nextEvent(Validator.java:244)
        org.apache.xmlbeans.impl.store.Validate.emitEvent(Validate.java:168)
        org.apache.xmlbeans.impl.store.Validate.process(Validate.java:84)
        org.apache.xmlbeans.impl.store.Validate.<init>(Validate.java:39)
        org.apache.xmlbeans.impl.store.Xobj.validate(Xobj.java:1878)
        org.apache.xmlbeans.impl.values.XmlObjectBase.validate(XmlObjectBase.java:386)
        stages.transform.runtime.ValidateRuntimeStep.validate(ValidateRuntimeStep.java:225)
        stages.transform.runtime.ValidateRuntimeStep.processMessage(ValidateRuntimeStep.java:124)
        com.bea.wli.sb.stages.StageMetadataImpl$WrapperRuntimeStep.processMessage(StageMetadataImpl.java:346)
        com.bea.wli.sb.pipeline.PipelineStage.processMessage(PipelineStage.java:84)
        com.bea.wli.sb.pipeline.PipelineContextImpl.execute(PipelineContextImpl.java:1055)
        com.bea.wli.sb.pipeline.Pipeline.processMessage(Pipeline.java:141)
        com.bea.wli.sb.pipeline.PipelineContextImpl.execute(PipelineContextImpl.java:1055)
        com.bea.wli.sb.pipeline.PipelineNode.doRequest(PipelineNode.java:55)
        com.bea.wli.sb.pipeline.Node.processMessage(Node.java:67)
        com.bea.wli.sb.pipeline.PipelineContextImpl.execute(PipelineContextImpl.java:1055)
        com.bea.wli.sb.pipeline.Router.processMessage(Router.java:214)
        com.bea.wli.sb.pipeline.MessageProcessor.processRequest(MessageProcessor.java:96)
        com.bea.wli.sb.pipeline.RouterManager$1.run(RouterManager.java:593)
        com.bea.wli.sb.pipeline.RouterManager$1.run(RouterManager.java:591)

Doing a heap dump and using Eclipse MAT, I could trace them back to the _toPrevBookmark method of the org.apache.xmlbeans.impl.store.Cursor class.
Still investigating.
PS: it seems this was due to a very large payload (> 20 MB). Still, it's quite frustrating.

Monday, October 14, 2013

Windows: grant Everyone full control on your external hard disk

If you have Windows on your company's laptop, here is some good news: it could be worse, you could be sleeping under a bridge and being gnawed by rats. That's definitely worse than having to use Windows. Not MUCH worse, though.
So you have plugged in your external hard drive and copied some files. Big mistake. Windows will immediately take ownership of your device and change ownership and access rights, so that the same hard drive will not be readable from a different machine with different users. This is Windows. Remember: Eastern Europe under Nazi occupation in WWII was worse off, so don't complain.

To grant Everyone full control on all files recursively, just do this:

icacls g:\ /grant Everyone:F /T

where g: is your external hard drive and /T makes icacls recurse into all subfolders and files.

Friday, October 11, 2013

How to look inside a JMS ObjectMessage

http://docs.oracle.com/javaee/6/api/javax/jms/ObjectMessage.html
In OSB, all reporting messages are instances of com.bea.wli.reporting.jmsprovider.runtime.ReportMessage, and in the JMS message they are stored as an ObjectMessage. So, you won't be able to view their content in the WebLogic console, when they end up in the jmsResources module/dist_wli.reporting.jmsprovider_error.queue_auto queue.

Solution: export the message (one at a time) into a jmsmessages.xml file using the WebLogic console Export tool (show messages).
In the resulting XML, the object is stored as base64-encoded text inside a mes:Object tag. Copy this text and put it in a messagebody.xml file (this can be automated, of course).

Make sure these 2 jars are in the WLST classpath (you can directly edit the wlst.sh file):
/opt/oracle/fmw11_1_1_5/osb/lib/modules/com.bea.alsb.reporting.api.jar
/opt/oracle/fmw11_1_1_5/osb/lib/modules/com.bea.alsb.reporting.impl.jar


from org.apache.commons.codec.binary import Base64
f = open('messagebody.xml', 'r')
bytes = Base64.decodeBase64(f.read())

from java.io import ByteArrayInputStream
from java.io import ObjectInputStream
inputStream = ByteArrayInputStream(bytes)
objectStream = ObjectInputStream(inputStream)
obj = objectStream.readObject()

print obj.class
#it will print com.bea.wli.reporting.jmsprovider.runtime.ReportMessage


print obj.getXmlPayload()
print obj.getMetadata()
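
The copy/paste step can be automated. Here is a minimal sketch, in plain Python run outside WLST, that pulls the base64 text out of every Object element in the exported file. I am assuming nothing about the export schema beyond the mes:Object tag mentioned above, so elements are matched by local name only. The function names and the messagebody-N.xml naming scheme are made up for illustration:

```python
import xml.etree.ElementTree as ET


def extract_object_payloads(xml_text):
    """Return the base64 text of every element whose local name is 'Object'."""
    root = ET.fromstring(xml_text)
    payloads = []
    for element in root.iter():
        # ElementTree spells namespaced tags as '{namespace-uri}LocalName'
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'Object' and element.text:
            # base64 text may be wrapped over several lines: drop all whitespace
            payloads.append(''.join(element.text.split()))
    return payloads


def write_payloads(export_file, prefix='messagebody'):
    """Write each payload to <prefix>-<n>.xml and return the file names."""
    with open(export_file) as source:
        payloads = extract_object_payloads(source.read())
    names = []
    for index, payload in enumerate(payloads):
        name = '%s-%d.xml' % (prefix, index)
        with open(name, 'w') as target:
            target.write(payload)
        names.append(name)
    return names
```

Calling write_payloads('jmsmessages.xml') would then produce one file per exported message, each of which can be fed to the WLST snippet above.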

Javamonamour might die soon, thanks to Google Apps Support unavailability

I put javamonamour.org in the hands of Google Apps and GoDaddy. They are asking me to update my credit card information by logging in to the Google Apps console, but every time I try to do it I get an "Invalid request" message (it's not an invalid password, which gives a different error message).

If you want to contact Google Support you must provide a PIN, and to find the PIN you must log into your account... catch-22. I have opened a case by email, and a robot replied with some useless advice. There is NO WAY you can talk to a human within the organization.

I have read many posts by desperate people in the same situation, and Google seems to ignore all the cases.

I will try to migrate the blog to a new provider...

UPDATE: I managed to access the Google Apps console... the trick is that you should use the email you set up as administrator of the site. If you go back through the emails you have received, Google Apps has definitely sent you an email to that address (automatically created as a valid Google email address) PLUS to your main address. You can easily initiate a password reset procedure for that administrative address; they will send the password reset to your main address, and once you reset the password you can log in to the Google Apps console.

Of course they don't bother to explain all this to you, nor to give you a proper error message (Invalid request!).

Thursday, October 10, 2013

base64 in WLST

If you try using base64 in WLST:

import base64
encoded = base64.b64encode('data to be encoded')

you get a:

AttributeError: 'module' object has no attribute 'b64encode'

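The reason is that there are two different base64.py files on the machine. The system Python one:
/usr/lib64/python2.4/base64.py
defines b64encode, but WLST loads the one bundled with Jython, which apparently does not:
/opt/oracle/fmw11_1_1_5/oracle_common/util/jython/Lib/base64.py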

b64decode does:

import binascii
binascii.a2b_base64(s)

b64encode does:

import binascii
binascii.b2a_base64(s)[:-1]

So basically:

import binascii
s='string to encode'
encoded = binascii.b2a_base64(s)[:-1]
print binascii.a2b_base64(encoded)
string to encode
print encoded
c3RyaW5nIHRvIGVuY29kZQ==
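
To keep scripts portable between interpreters, the workaround can be wrapped in a small helper. This is a minimal sketch (the names b64_encode and b64_decode are made up): it uses base64.b64encode/b64decode when the loaded base64 module provides them and falls back to binascii otherwise. I have not tested it on every WLST version:

```python
import base64
import binascii


def b64_encode(data):
    """Base64-encode data, with or without base64.b64encode being available."""
    encoder = getattr(base64, 'b64encode', None)
    if encoder is not None:
        return encoder(data)
    # b2a_base64 appends a newline, hence the [:-1]
    return binascii.b2a_base64(data)[:-1]


def b64_decode(encoded):
    """Inverse of b64_encode."""
    decoder = getattr(base64, 'b64decode', None)
    if decoder is not None:
        return decoder(encoded)
    return binascii.a2b_base64(encoded)
```

Under a recent Python the input has to be a byte string, for example 'string to encode'.encode('ascii').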

Elephants in Geneva Lake

Every year in Lausanne the Knie Circus brings a great show of Camels and Elephants.... kids are really enjoying it.

Tuesday, October 8, 2013

Poll result: Collapse

In the face of the ongoing collapse of ecosystems and economies, I shall:

This was more or less expected: political and ecological awareness is very unusual in the IT community.

Monday, October 7, 2013

puppet-lint: how to sort your diagnostics

I find it a bit irritating that the output of puppet-lint is not sorted. I prefer to fix my Puppet manifests top-down rather than by class of error. This does the job:

puppet-lint /vagrant/modules/mymodule/manifests/mymanifest.pp | awk '{ print $NF , $0}' | sort -n

and the result is neatly sorted by line number:

23 ERROR: create_certificate_file not in autoload module layout on line 23
33 ERROR: create_jks_store not in autoload module layout on line 33
38 WARNING: indentation of => is not properly aligned on line 38
39 WARNING: indentation of => is not properly aligned on line 39
41 ERROR: trailing whitespace found on line 41
44 WARNING: indentation of => is not properly aligned on line 44
45 WARNING: indentation of => is not properly aligned on line 45
46 ERROR: trailing whitespace found on line 46
48 ERROR: trailing whitespace found on line 48
52 ERROR: jks_java_wrapper_certificate not in autoload module layout on line 52
52 WARNING: line has more than 80 characters on line 52
66 ERROR: trailing whitespace found on line 66
70 ERROR: jks_java_wrapper_key not in autoload module layout on line 70
70 WARNING: defined type not documented on line 70
70 WARNING: line has more than 80 characters on line 70
76 WARNING: indentation of => is not properly aligned on line 76
77 WARNING: indentation of => is not properly aligned on line 77
78 WARNING: indentation of => is not properly aligned on line 78
80 WARNING: indentation of => is not properly aligned on line 80
81 ERROR: two-space soft tabs not used on line 81
90 ERROR: trailing whitespace found on line 90
91 WARNING: line has more than 80 characters on line 91
94 WARNING: line has more than 80 characters on line 94
97 ERROR: trailing whitespace found on line 97
100 ERROR: trailing whitespace found on line 100
101 ERROR: trailing whitespace found on line 101
113 WARNING: indentation of => is not properly aligned on line 113
114 ERROR: trailing whitespace found on line 114
114 WARNING: indentation of => is not properly aligned on line 114
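
If awk is not at hand, the same sort can be sketched in Python. This is only an illustration (the function name sort_by_line is made up): it assumes every diagnostic ends with "on line N", as in the output above, and unlike the awk version it does not prepend the number to each line:

```python
import re

# puppet-lint diagnostics shown above end with "on line <number>"
LINE_NUMBER = re.compile(r'on line (\d+)\s*$')


def sort_by_line(diagnostics):
    """Sort diagnostics by trailing line number; the sort is stable."""
    def line_number(diagnostic):
        match = LINE_NUMBER.search(diagnostic)
        # Lines without a number sort first
        return int(match.group(1)) if match else 0
    return sorted(diagnostics, key=line_number)
```

Because the sort is stable, several diagnostics on the same line keep the order in which puppet-lint reported them.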

Incidentally, most trivial issues can be fixed with a "pre" version of puppet-lint, or with CTRL + SHIFT + F in Geppetto.

Sunday, October 6, 2013

Puppet grammar and parser

Fascinating and priceless post on the topic: http://www.masterzen.fr/2011/12/27/puppet-internals-the-parser/
Here is the racc (sort of yacc for Ruby) grammar and here is the parser (lexer) (in Ruby, sob)

I will add more to this post as I discover more...

Friday, October 4, 2013

Puppet "exec" always requires a path

I needed to run a simple "mv" command in Puppet, with an "exec" type (I know, exec should be used only in desperate cases):


  exec { "mv OracleDomainScript-${version}":
    command => "mv OracleDomainScript-${version}/* .",
    cwd     => "$destination_folder",
  }

and I got a weird error message:
'mv OracleDomainScript-2.5/* .' is not qualified and no path was specified. Please qualify the command or specify a path

The fix is to specify a path explicitly:


  exec { "mv OracleDomainScript-${version}":
    command => "mv OracleDomainScript-${version}/* .",
    path    => "/bin",
    cwd     => "$destination_folder",
  }

The "default" path of the current user is not used by Puppet:

echo $PATH
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

and the "mv" command is in /bin, so fully qualifying the command as /bin/mv would work too.

I bet EVERY Puppet beginner has made this mistake.

Huge caveat working with Vagrant and Puppet

Never, ever remove the content of the /tmp folder on your VirtualBox guest.
I did it today, and after getting a weird warning:


[root@osb-vagrant tmp]# rm -rf *
rm: cannot remove `vagrant-puppet/manifests': Device or resource busy
rm: cannot remove `vagrant-puppet/modules-0': Device or resource busy

I discovered that all the Puppet files on my HOST machine had been deleted... the Vagrant mapping between the Vagrantfile directory on the host and the /tmp/vagrant-puppet folder on the guest seems to work BOTH WAYS.
So, just leave those folders alone. I know, it's painful.

Thursday, October 3, 2013

Configuring NIC network interface on RHEL

Note: ifconfig is deprecated in favour of ip; however, a lot of legacy code still uses it.
Very interesting tutorial here and here
This is a UI to edit all configuration without requiring root privileges, but if you are not root you probably won't even be able to see any configuration values, let alone change them:

/usr/sbin/system-config-network-tui

This is what you get:



┌─────────┤ Network Configuration ├─────────┐
│                                           │
│                                           │
│ Name                 eth0________________ │
│ Device               eth0________________ │
│ Use DHCP             [*]                  │
│ Static IP            ____________________ │
│ Netmask              ____________________ │
│ Default gateway IP   ____________________ │
│ Primary DNS Server   ____________________ │
│ Secondary DNS Server ____________________ │
│                                           │
│       ┌────┐            ┌────────┐        │
│       │ Ok │            │ Cancel │        │
│       └────┘            └────────┘        │
│                                           │
│                                           │
└───────────────────────────────────────────┘

This is equivalent, but requires root privileges:

/usr/bin/system-config-network

Neither of them requires an X session: both support a text UI mode.

To manually hack the configuration:
cd /etc/sysconfig/network-scripts/
less ifcfg-eth0

DEVICE="eth0"
BOOTPROTO="dhcp"
HWADDR="08:00:27:29:06:6F"
IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="ca6ef823-ed13-46ad-8a83-9d688f6cf239"
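
To check these values from a script rather than with less, the file can be read as simple KEY="value" pairs. A minimal Python sketch follows (parse_ifcfg is a made-up name). It handles only plain assignments like the ones shown above, not arbitrary shell syntax:

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY="value" lines of an ifcfg-style file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        key, separator, value = line.partition('=')
        if not separator:
            continue  # not an assignment
        # shlex removes the surrounding quotes the way a shell would
        tokens = shlex.split(value)
        settings[key.strip()] = tokens[0] if tokens else ''
    return settings
```

For example, parse_ifcfg(open('/etc/sysconfig/network-scripts/ifcfg-eth0').read())['BOOTPROTO'] would tell whether the interface uses DHCP.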

Some other "global" info is here:
vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=osb-vagrant.acme.com

"/sbin/ip -o addr" displays all IPs.
To add an extra IP, you must specify an alias interface eth0:N, where N > 0:

/sbin/ifconfig eth0:1 10.0.2.16 netmask 255.255.255.0

(there is an implicit "up" at the end)
Be careful: entering "/sbin/ifconfig eth0 10.0.2.16 netmask 255.255.255.0" instead will cut off the NIC, because it removes the current primary address and replaces it with the new one.

Wednesday, October 2, 2013

oracle.repackaged.ucp.jdbc.oracle.RACCallbackGuard

Today we saw this in the logs:


####<Oct 2, 2013 8:47:23 AM CEST> <Warning>
 <oracle.repackaged.ucp.jdbc.oracle.RACCallbackGuard> <acme111>
 <osbpr1ms4> <Thread-100> <<anonymous>> <> 
<0000K5o7QgvE4Uk5ozS4yY1IIgWK000003> <1380696443473> <BEA-000000> 
<RAC callback: guarded method threw exception>

No idea what happened... it could be a dead RAC instance and transactions being switched to another instance... still under investigation...

OK, what happened is that the DBA team started the second node of the RAC cluster, and the JDBC drivers somehow got the notification, but something must have gone wrong.

Tuesday, October 1, 2013

Consensus-based server migration: caveat

I used to have a Database-based lease mechanism for Server Migration, but occasionally it was failing ("unable to contact DB", no clue why...) and the server would restart itself.
We changed to Consensus, hoping the network would be more robust. However, due to a network reconfiguration, some IPs were left undefined and the cluster broke:


<[ACTIVE] ExecuteThread: '37' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <f56960fb001bca18:-33f939f3:1413c1039a1:-8000-0000000000065ea0> <1380619475997> <BEA-000802> <ExecuteRequest failed
java.lang.AssertionError: Invalid state transition from failed to stable_leader.
java.lang.AssertionError: Invalid state transition from failed to stable_leader
        at weblogic.cluster.leasing.databaseless.ClusterState.setState(ClusterState.java:100)
        at weblogic.cluster.leasing.databaseless.ClusterState.setState(ClusterState.java:59)
        at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.leaderInitialization(ClusterFormationServiceImpl.java:318)
        at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.formClusterInternal(ClusterFormationServiceImpl.java:148)
        at weblogic.cluster.leasing.databaseless.ClusterFormationServiceImpl.timerExpired(ClusterFormationServiceImpl.java:339)
        at weblogic.timers.internal.TimerImpl.run(TimerImpl.java:273)
        at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
>

The issue is that once the network had been fixed, the cluster didn't recover and we had to restart the servers... however this could simply be because when we restart a server, the Virtual IP associated with it is re-added to the NIC (/sbin/ifconfig -addif). Instead of restarting the servers I should have tried to add the IP manually... one should really monitor the availability of those IPs continuously...
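
As a starting point for that kind of monitoring, here is a minimal Python sketch that checks which expected IPs are missing from the output of "/sbin/ip -o addr". The function names are made up, the list of expected IPs would have to come from your own configuration, and this only checks that the address is assigned locally, not that it is reachable:

```python
import re
import subprocess

# In "ip -o addr" output each address appears as "inet <ip>/<prefix>" or "inet6 ..."
ADDRESS = re.compile(r'\binet6?\s+([0-9A-Fa-f:.]+)')


def assigned_ips(ip_output):
    """Return the set of addresses found in 'ip -o addr' output."""
    return set(ADDRESS.findall(ip_output))


def missing_ips(expected, ip_output):
    """Return the expected addresses that are not assigned, in the given order."""
    present = assigned_ips(ip_output)
    return [ip for ip in expected if ip not in present]


def check_local(expected):
    """Run /sbin/ip on this machine and report the missing addresses."""
    output = subprocess.check_output(['/sbin/ip', '-o', 'addr']).decode()
    return missing_ips(expected, output)
```

Run periodically (from cron, for instance) with the Virtual IPs of the servers hosted on the machine, a non-empty result from check_local would be the signal to re-add the address or raise an alert.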