Friday, May 31, 2013

WebLogic: [Deployer:149189]An attempt was made to execute the 'start' operation on an application named bla

Shut down all your servers, including the admin.

Then go to DOMAIN_HOME and run "find . -name bla -exec rm -rf {} \;" to delete every leftover file or directory named bla.

Then edit config/config.xml and manually remove the app-deployment entry for bla.

This should take care of it.
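Before restarting, it can help to double-check that the entry is really gone. Here is a minimal Python sketch that lists the app-deployment names found in a config.xml document. The function names are made up for illustration, and the sketch assumes each app-deployment element carries its name in a child element called name; it only reads the file and never modifies it.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def list_app_deployments(config_xml):
    """Return the names of all app-deployment entries in a config.xml document."""
    root = ET.fromstring(config_xml)
    names = []
    for element in root.iter():
        if local_name(element.tag) != "app-deployment":
            continue
        for child in element:
            if local_name(child.tag) == "name":
                names.append((child.text or "").strip())
    return names


if __name__ == "__main__":
    # Relative path as used above; run this from DOMAIN_HOME
    with open("config/config.xml", "rb") as handle:
        print(list_app_deployments(handle.read()))
```

If bla still shows up in the printed list, the manual edit of config.xml was not saved or the wrong entry was removed.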

Nice little JSP to generate threads on demand


<%
 final String threads = request.getParameter("threads");
 final String period = request.getParameter("period");
 try {
  // Parse once, outside the loop; a bad value is caught and logged below
  final int threadCount = Integer.parseInt(threads);
  final long sleepMillis = Integer.parseInt(period) * 1000L;
  for (int t = 0; t < threadCount; t++) {
   System.out.println("starting thread " + t + " of " + threads);
   Thread thread = new Thread(new Runnable() {
    public void run() {
     try {
      System.out.println("started with period " + period);
      // Do nothing for "period" seconds
      Thread.sleep(sleepMillis);
     }
     catch (Throwable e) {
      e.printStackTrace();
     }
    }
   });
   thread.start();
  }
 }
 catch (Throwable e) {
  e.printStackTrace();
 }
%>



and you invoke it as

http://acme.com:10001/wlstuck/index.jsp?threads=30&period=40

and you redeploy with

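import os
import sys

# Name of the application to redeploy, passed as the first script argument
theapplication = str(sys.argv[1])

print "redeploying", theapplication
# Remove the application's temporary directory under _WL_user
os.system("rm -rf /opt/oracle/domains/osbts1do/servers/osbts1as/tmp/_WL_user/" + theapplication + "/")
connect('Pierluigi', 'weblogic1', 't3://acme.com:9001')
redeploy(theapplication)
exit()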



The problem is that these threads are not part of the WebLogic thread pool, and as such they are not monitored.

A better option is to keep WebLogic's own execute threads busy, by firing concurrent requests from a shell script:

for i in {1..30}
do
     wget http://acme.com:10001/wlstuck/index.jsp &
done



and index.jsp is simply:

<% Thread.sleep(390 * 1000); %>




WebLogic admin failing with "PKI-02002: Unable to open the wallet. Check password."

"oracle.security.jps.service.credstore.CredStoreException: JPS-01050: Opening of wallet based credential store failed. Reason java.io.IOException: PKI-02002: Unable to open the wallet. Check password."

find /opt/oracle -user root

This revealed that the EmbeddedLDAP.tran file was owned by root, which is not ideal, since the domain files should belong to the user that runs WebLogic.
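The same check can be scripted when find is not handy. This is a minimal Python 3 sketch, assuming a Unix host; the function name files_owned_by is made up for illustration, and uid 0 stands for root.

```python
import os


def files_owned_by(top, uid=0):
    """Yield every path under 'top' whose owner has the given uid (0 is root)."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    yield path
            except OSError:
                # The entry vanished or cannot be inspected; skip it
                continue


if __name__ == "__main__":
    for path in files_owned_by("/opt/oracle"):
        print(path)
```

Unlike find, this sketch does not report the top directory itself, only what lies beneath it.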

Wednesday, May 29, 2013

Servlet: "Spy" failed to preload on startup in Web application: "dms.war"

Strange case today: in a newly created domain, the Admin server starts with the "DMS Application" in a failed state. The logs show:

<29.May.2013 11:48:26 AM CEST> <Error> <HTTP> <BEA-101216> <Servlet: "Spy" failed to preload on startup in Web application: "dms.war".
java.lang.IllegalArgumentException: config=oracle.dms.config.DumpConfig@41755a16 logDir=null agency=oracle.dms.reporter.SpyAgency@7f6d7bec dmsTimer=java.util.concurrent.ScheduledThreadPoolExecutor@6ce3044f
 at oracle.dms.impl.producer.Dumper.<init>(Dumper.java:100)
 at oracle.dms.app.DomainInitializer.init(DomainInitializer.java:151)
 at oracle.dms.app.BaseInitializer.getInitializer(BaseInitializer.java:360)
 at oracle.dms.app.DmsSpy.init(DmsSpy.java:210)
 at weblogic.servlet.internal.StubSecurityHelper$ServletInitAction.run(StubSecurityHelper.java:283)
 at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
 at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
 at weblogic.servlet.internal.StubSecurityHelper.createServlet(StubSecurityHelper.java:64)
 at weblogic.servlet.internal.StubLifecycleHelper.createOneInstance(StubLifecycleHelper.java:58)
 at weblogic.servlet.internal.StubLifecycleHelper.<init>(StubLifecycleHelper.java:48)
 at weblogic.servlet.internal.ServletStubImpl.prepareServlet(ServletStubImpl.java:539)
 at weblogic.servlet.internal.WebAppServletContext.preloadServlet(WebAppServletContext.java:1985)
 at weblogic.servlet.internal.WebAppServletContext.loadServletsOnStartup(WebAppServletContext.java:1959)
 at weblogic.servlet.internal.WebAppServletContext.preloadResources(WebAppServletContext.java:1878)
 at weblogic.servlet.internal.WebAppServletContext.start(WebAppServletContext.java:3153)
 at weblogic.servlet.internal.WebAppModule.startContexts(WebAppModule.java:1508)
 at weblogic.servlet.internal.WebAppModule.start(WebAppModule.java:482)
 at weblogic.application.internal.flow.ModuleStateDriver$3.next(ModuleStateDriver.java:425)
 at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
 at weblogic.application.internal.flow.ModuleStateDriver.start(ModuleStateDriver.java:119)
 at weblogic.application.internal.flow.ScopedModuleDriver.start(ScopedModuleDriver.java:200)
 at weblogic.application.internal.flow.ModuleListenerInvoker.start(ModuleListenerInvoker.java:247)
 at weblogic.application.internal.flow.ModuleStateDriver$3.next(ModuleStateDriver.java:425)
 at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
 at weblogic.application.internal.flow.ModuleStateDriver.start(ModuleStateDriver.java:119)
 at weblogic.application.internal.flow.StartModulesFlow.activate(StartModulesFlow.java:27)
 at weblogic.application.internal.BaseDeployment$2.next(BaseDeployment.java:636)
 at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
 at weblogic.application.internal.BaseDeployment.activate(BaseDeployment.java:205)
 at weblogic.application.internal.SingleModuleDeployment.activate(SingleModuleDeployment.java:43)
 at weblogic.application.internal.DeploymentStateChecker.activate(DeploymentStateChecker.java:161)
 at weblogic.deploy.internal.targetserver.AppContainerInvoker.activate(AppContainerInvoker.java:79)
 at weblogic.deploy.internal.targetserver.BasicDeployment.activate(BasicDeployment.java:184)
 at weblogic.deploy.internal.targetserver.BasicDeployment.activateFromServerLifecycle(BasicDeployment.java:361)
 at weblogic.management.deploy.internal.DeploymentAdapter$1.doActivate(DeploymentAdapter.java:51)
 at weblogic.management.deploy.internal.DeploymentAdapter.activate(DeploymentAdapter.java:200)
 at weblogic.management.deploy.internal.AppTransition$2.transitionApp(AppTransition.java:30)
 at weblogic.management.deploy.internal.ConfiguredDeployments.transitionApps(ConfiguredDeployments.java:240)
 at weblogic.management.deploy.internal.ConfiguredDeployments.activate(ConfiguredDeployments.java:169)
 at weblogic.management.deploy.internal.ConfiguredDeployments.deploy(ConfiguredDeployments.java:123)
 at weblogic.management.deploy.internal.DeploymentServerService.resume(DeploymentServerService.java:180)
 at weblogic.management.deploy.internal.DeploymentServerService.start(DeploymentServerService.java:96)
 at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
 at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
 at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
> 



After digging into the code, we discovered that adding -Doracle.server.log.dir=/path/to/a/valid/directory to the Java properties fixed the issue (note the logDir=null in the exception message).

We are still investigating the root cause. We checked that -Doracle.server.config.dir and -Doracle.domain.config.dir are present, that /opt/oracle/fmw11_1_1_5/oracle_common/modules/oracle.dms_11.1.1/dms.war is OK, and that dms_config.xml is OK.

Tuesday, May 28, 2013

Poor man's firewall test

On the Destination host:

nc -l myhost.acme.com 3872

and make sure you are actually listening:

netstat -an | grep 3872
tcp        0      0 10.33.80.121:3872           0.0.0.0:*                   LISTEN

On the Source host:

echo ciao | nc myhost.acme.com 3872

and "ciao" should appear on the Destination host, after which nc should exit.

If you don't have nc installed, there are alternatives.

WLST or Python:

import socket
HOST = 'myhost.acme.com'
PORT = 3872
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT)) 
s.send('Hello, world')
data = s.recv(1024)
s.close()
 
 
(see http://docs.python.org/release/2.5.2/lib/socket-example.html)
 

or simply run
telnet myhost.acme.com 3872

 
To receive data, run this in WLST (Jython), which can use Java classes directly:
 
from java.net import ServerSocket
ss = ServerSocket(3872)
ss.accept()


(see http://docs.oracle.com/javase/6/docs/api/java/net/ServerSocket.html )
 
 
The great advantage of nc is that you can bind to any IP on the source host:
 
nc -s "your_ip_here"





To check whether nc could actually connect, look at its exit code:
echo ciao | nc....
echo $?

1 means "unable to connect", 0 means "connected". For example:

echo a | nc -s "10.26.20.116" -w 1 10.51.87.24 1722 ; echo $?

A script to check the firewall could very well be:

#!/bin/bash
# This script checks that a firewall rule is operational.
# It needs bash ("function" and "[[ ]]" are not plain sh).
# Author name : Pierluigi Vernetto


function checkFirewall {
    sourceIPsArray=$(echo $sourceIPs | tr "," "\n")
    destinationIPsArray=$(echo $destinationIPs | tr "," "\n")
    for sourceIP in $sourceIPsArray
    do
        for destinationIP in $destinationIPsArray
        do
            # -s binds to the source IP, -w 2 gives up after 2 seconds
            echo a | nc -s "$sourceIP" -w 2 $destinationIP $port
            if [[ $? -eq 0 ]]
            then echo $sourceIP $destinationIP $port success
            else echo $sourceIP $destinationIP $port failure
            fi
        done
    done
}

sourceIPs=10.56.218.91,10.56.218.93,10.56.218.90,10.56.218.94,10.56.218.92
destinationIPs=10.56.128.10,10.56.128.8,10.56.128.9
port=1522

checkFirewall
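When nc is not installed, the same source/destination matrix can be tested from Python. The sketch below is only an illustration under some assumptions: it targets Python 3 (not the Jython shipped with WLST), the function names can_connect and check_firewall are made up, and it uses socket.create_connection with its source_address argument to bind to a given local IP, which plays the role of nc -s.

```python
import socket


def can_connect(destination_ip, port, source_ip=None, timeout=2.0):
    """Return True if a TCP connection to destination_ip:port can be opened."""
    # Port 0 lets the operating system pick any free local port
    source_address = (source_ip, 0) if source_ip else None
    try:
        connection = socket.create_connection(
            (destination_ip, port), timeout=timeout, source_address=source_address
        )
    except OSError:
        # Refused, timed out, unreachable, or the source IP is not a local one
        return False
    connection.close()
    return True


def check_firewall(source_ips, destination_ips, port):
    """Try every source/destination pair; return (source, destination, ok) tuples."""
    results = []
    for source_ip in source_ips:
        for destination_ip in destination_ips:
            ok = can_connect(destination_ip, port, source_ip)
            results.append((source_ip, destination_ip, ok))
    return results


if __name__ == "__main__":
    for source_ip, destination_ip, ok in check_firewall(
        ["10.56.218.91"], ["10.56.128.10"], 1522
    ):
        print(source_ip, destination_ip, 1522, "success" if ok else "failure")
```

As with the shell script, a "failure" only says the connection could not be opened; it does not distinguish a firewall drop from a port nobody is listening on.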




Monday, May 27, 2013

BEA-000150, Server failed to get a connection to the database in the past 30 seconds for lease renewal. Server will shut itself down.

We use database-based leasing for automated server migration (I personally think it's a bad idea), and occasionally (about once a month) a server restarts with:

BEA-000150 Server failed to get a connection to the database in the past 30 seconds for lease renewal. Server will shut itself down.



The Oracle Support document "With Database Leasing Server Migration WebLogic Servers Part Of Cluster Restarting Intermittently With Unable Renew Lease Errors [ID 1550164.1]" suggests checking for time skew between the RAC nodes, which makes perfect sense. However, this is not our case. We shall keep investigating.

DevOps: 10 deploys a day



Today this sounds quite self-evident, but 4 years ago it was quite pioneering.

Sunday, May 26, 2013

Analyzing logs with logstash

http://logstash.net/


I feel that if you find yourself running shell scripts in crontab to search for events in logs, then you might want to take a look at some better-engineered alternatives. (Skip the first 5 minutes of the video.)

Buddha would have coded in Java, Python, Ruby, Scala or Groovy?

From the (very nice) presentation of REXML I read:

in Java:
for (Enumeration e = parent.getChildren(); e.hasMoreElements(); ) {
  Element child = (Element) e.nextElement();
  // Do something with child
}


in Ruby:
parent.each_child { |child|
  # Do something with child
}

Then he says: Can't you feel the peace and contentment in this block of code? Ruby is the language Buddha would have programmed in.
However, in groovy:
parent.each {
 // do something with "it", process(it) 
}

or
parent.each { child ->
 // do something with child
}

in Scala:
parent.foreach ( thing =>
 // do something with thing
)

in python:
for child in parent:
   # do stuff with child
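To make the Python case concrete, here is a small runnable sketch using the standard library's ElementTree, where iterating over an element yields its direct children. The XML document and the helper name child_tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document, only for illustration
parent = ET.fromstring("<parent><a/><b/><c/></parent>")


def child_tags(element):
    """Collect the tag name of each direct child, in document order."""
    tags = []
    for child in element:
        tags.append(child.tag)
    return tags


print(child_tags(parent))
```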


So, all in all, if I were Buddha (all we have in common is a large belly), I would have coded in Python.
Keep it Simple, Luke!

Saturday, May 25, 2013

Presentations on Puppet enterprise

This is how to install Puppet Enterprise:


Awesome demo on what you can do with Puppet enterprise console... like comparing resources on several nodes...



This is a basic presentation on how to start using Puppet (pretty old, some commands have changed)



Awesome presentation on Puppet on EC2, with Nagios as special guest



Quick overview of how you can deploy packages and compare hosts with Live Management:





Alexander the Great and IT

Reading about the Battle of Gaugamela, I learned that Alexander's tremendous military success rested on the following:
- his troops were highly skilled veterans,
- with a very short chain of command;
- the troops were also highly motivated and very devoted to their general,
- the general would camp and fight with the troops,
- the troops had a vested interest in the war, in terms of booty, land concessions etc.

Compare this to the IT world, which is more and more flooded with cheap, unskilled, overworked, underpaid troops, with almost more officers than troops (the Persian army), and you will see why most IT projects fail. Alexander would have fared better.



(You can admire this mosaic at the Naples archaeological museum, one of the most amazing museums on the planet.)

Friday, May 24, 2013

Unable to connect to the Oracle WSM Policy Manager

On an OSB domain with OWSM installed, we got this error when editing the policies associated with a WSDL-based service:




oracle.wsm.policymanager.PolicyManagerException: WSM-02120 : Unable to connect to the Oracle WSM Policy Manager due to the following error "javax.naming.CommunicationException [Root exception is java.net.ConnectException: t3s://acme.com:8002: Destination unreachable; nested exception is:
javax.net.ssl.SSLKeyException: [Security:090477]Certificate chain received from hqchnesoa102.acme.com - 10.53.5.192 was not trusted causing SSL handshake failure.; No available router to destination]". [Possible Cause : Destination unreachable; nested exception is:
javax.net.ssl.SSLKeyException: [Security:090477]Certificate chain received from hqchnesoa102.acme.com - 10.53.5.192 was not trusted causing SSL handshake failure.; No available router to destination]



The root cause was that the Admin server was using the Demo identity and trust stores, while the Managed Servers were using custom ones. Configuring the Admin server the same way as the Managed Servers fixed the issue.





Thursday, May 23, 2013

OSB Operations/Dashboard/Server Health

In PROD we often get stuck threads on the Admin server whenever someone opens the "Operations/Dashboard/Server Health" tab of the sbconsole. The function apparently scans plenty of log files:

"[ACTIVE] ExecuteThread: '16' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE
          
    antlr.InputBuffer.LA(InputBuffer.java:86)
    antlr.CharScanner.LA(CharScanner.java:166)
    weblogic.diagnostics.archive.filestore.LogLexer.mLOGFIELD(LogLexer.java:141)
    weblogic.diagnostics.archive.filestore.LogLexer.nextToken(LogLexer.java:83)
    antlr.TokenBuffer.fill(TokenBuffer.java:69)
    antlr.TokenBuffer.LT(TokenBuffer.java:86)
    antlr.LLkParser.LT(LLkParser.java:56)
    weblogic.diagnostics.archive.filestore.ServerLogFileParser.getNextServerLogEntry(ServerLogFileParser.java:109)
    weblogic.diagnostics.archive.filestore.ServerLogRecordParser.parseRecord(ServerLogRecordParser.java:35)
    weblogic.diagnostics.archive.filestore.RecordReader.getRecord(RecordReader.java:199)
    weblogic.diagnostics.archive.filestore.FileRecordIterator.readRecords(FileRecordIterator.java:75)
    weblogic.diagnostics.archive.filestore.FileRecordIterator.fill(FileRecordIterator.java:246)
    weblogic.diagnostics.archive.RecordIterator.fetchMore(RecordIterator.java:157)
    weblogic.diagnostics.archive.RecordIterator.hasNext(RecordIterator.java:130)
    weblogic.diagnostics.archive.DataArchive.countRecords(DataArchive.java:240)
    weblogic.diagnostics.archive.DataArchive.getDataRecordCount(DataArchive.java:273)
    weblogic.diagnostics.accessor.DataAccessRuntime.getDataRecordCount(DataAccessRuntime.java:380)
    sun.reflect.GeneratedMethodAccessor76689.invoke(Unknown Source)






Wednesday, May 22, 2013

WLST : *sys-package-mgr*: can't write cache file for

On a new box, when I run /opt/oracle/fmw11_1_1_5/osb/common/bin/wlst.sh I get:


Initializing WebLogic Scripting Tool (WLST) ...

*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/modules/oracle.jrf_11.1.1/jrf-wlstman.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/lib/adfscripting.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/lib/adf-share-mbeans-wlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/lib/mdswlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/auditwlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/igfwlsthelp.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/jps-wlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/jrf-wlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/oamap_help.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/oamAuthnProvider.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/ossoiap_help.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/ossoiap.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/ovdwlsthelp.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/sslconfigwlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/oracle_common/common/wlst/resources/wsm-wlst.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/utils/config/10.3/config-launch.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/derby/lib/derbynet.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/derby/lib/derbyclient.jar'
*sys-package-mgr*: can't write cache file for '/opt/oracle/fmw11_1_1_5/wlserver_10.3/common/derby/lib/derbytools.jar'
*sys-package-mgr*: can't write index file




Running ls -ltr /tmp, I found that wlstTempsoa and alsbTempJars belong to a strange user with uid 1102:


drwxr-xr-x 3    1102   506   4096 May 13 23:05 wlstTempsoa
-rw-r----- 1    1102   506      0 May 14 10:08 pki_data-1980089318.lck.tmp
-rw-r----- 1    1102   506      0 May 14 10:08 pki_data638484921.lck.tmp
-rw-r----- 1    1102   506      0 May 14 10:08 jazn-data1986377050xml.lck
drwxr----- 2    1102   506   4096 May 14 10:08 alsbTempJars




See https://forums.oracle.com/forums/thread.jspa?messageID=3018833

The fix is to create a cache directory that your user can write to:

mkdir /tmp/wlstTmp

and to edit /opt/oracle/fmw11_1_1_5/osb/common/bin/wlst.sh, adding:

export WLST_PROPERTIES="-Dpython.cachedir=/tmp/wlstTmp"
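The "can't write cache file" messages typically mean that the Jython package cache directory is not writable by the user running WLST. A minimal Python sketch to check a candidate directory before pointing python.cachedir at it (the function name is_usable_cache_dir is made up for illustration):

```python
import os


def is_usable_cache_dir(path):
    """True if 'path' is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # The two directories mentioned above
    for candidate in ("/tmp/wlstTempsoa", "/tmp/wlstTmp"):
        print(candidate, is_usable_cache_dir(candidate))
```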



Tuesday, May 21, 2013

Unable to obtain metrics data from the server

In an OSB cluster, the application "ALSB Cluster Singleton Marker Application" is deployed only on ms1. This marker application determines on which server the Metrics Aggregator for the OSB console should run.

If you start only ms2, you will get the message "Unable to obtain metrics data from the server", because no Aggregator is running. Not a big deal. As soon as you start ms1, the metrics will be available and the warning will go away.

Revoking JMS Service Account, security caching

I disabled a JMS Proxy, inserted 3 JMS messages into the queue, then removed the rights from the JMS Service Account. Once the proxy is re-enabled, the MDB tries to connect and consume the messages, but it gets a composite error:

####<May 21, 2013 2:18:19 PM CEST> <Warning> <EJB> <hqchnesoa102> <osbdev1ms1> <[ACTIVE] ExecuteThread: '40' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <7f9b72b69446518a:565dc0ec:13ea91cf851:-8000-0000000000011d31> <1369138699444> <BEA-010061> <The Message-Driven EJB: RequestEJB-7978421337537108199--2e2610ca.13ea3d3bbf0.-7fe8 is unable to connect to the JMS destination: PV_OSB_TESTQ. The Error was:
weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=PV_OSB_TESTModule, destinationType=queue, resource=PV_OSB_TESTQ, action=receive
Nested exception: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=PV_OSB_TESTModule, destinationType=queue, resource=PV_OSB_TESTQ, action=receive
Nested exception: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=PV_OSB_TESTModule, destinationType=queue, resource=PV_OSB_TESTQ, action=receive
Nested exception: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=PV_OSB_TESTModule, destinationType=queue, resource=PV_OSB_TESTQ, action=receive
Nested exception: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=PV_OSB_TESTModule, destinationType=queue, resource=PV_OSB_TESTQ, action=receive>

Once I restore the JMS Service Account, after 1 minute the MDB reconnects:








####<May 21, 2013 2:19:29 PM CEST> <Info> <EJB> <hqchnesoa102> <osbdev1ms1> <[ACTIVE] ExecuteThread: '38' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <7f9b72b69446518a:565dc0ec:13ea91cf851:-8000-0000000000011d56> <1369138769467> <BEA-010060> <The Message-Driven EJB: RequestEJB-7978421337537108199--2e2610ca.13ea3d3bbf0.-7fe8 has connected/reconnected to the JMS destination: PV_OSB_TESTQ.>



In general, security changes are not instantaneous; I believe they are cached somehow.




Oracle DB: Am I a SYSDBA?

Given the user INSTALL, here are all the SQL statements you can try in order to check whether you are a SYSDBA:

select * from session_privs;
select * from user_sys_privs;
select * from dba_role_privs where GRANTEE = 'INSTALL';
select * from dba_sys_privs where GRANTEE = 'INSTALL';

select * from dba_tab_privs where GRANTEE = 'INSTALL';

select dbms_metadata.get_granted_ddl('ROLE_GRANT', 'INSTALL')  from dual;
select dbms_metadata.get_granted_ddl('SYSTEM_GRANT', 'INSTALL') from dual;
select dbms_metadata.get_granted_ddl('OBJECT_GRANT', 'INSTALL') from dual;



In my case, the result of "select * from dba_role_privs where GRANTEE = 'INSTALL';" is:

"GRANTEE" "GRANTED_ROLE" "ADMIN_OPTION" "DEFAULT_ROLE"
"INSTALL" "CONNECT" "NO" "YES"
"INSTALL" "ROLE_DBA_ADMIN" "NO" "YES"



WliSbTransports:381543 : URI does not have host and port information

In OSB, when you enter a JMS URI without hostname and port - meaning a LOCAL URI - you get this warning:

URI: jms:///weblogic.jms.XAConnectionFactory/PV_OSB_TESTQ, does not have host and port information. 
Host and port information is configured generally as empty when a foreign JMS provider is used.



This message is described here:

http://docs.oracle.com/cd/E14571_01/apirefs.1111/e15034/JmsTransport.html

"JMS proxy service can have empty host and port information in the URI when it is supposed to use foreign JMS provider. The destination is not created if it does not exist."

In fact, the syntax jms:///connectionFactoryJNDI/queueJNDI works perfectly even for local queues. And we really dislike that OSB creates the queues if they are missing, so this is one more reason to avoid using host:port in the JMS URI.
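To make the shape of the URI explicit, here is a minimal Python 3 sketch that splits a jms:// URI with the standard urllib.parse module. The function name describe_jms_uri is made up for illustration, and the sketch assumes at most a single host:port pair (a comma-separated cluster address would need extra handling).

```python
from urllib.parse import urlsplit


def describe_jms_uri(uri):
    """Split a jms:// URI into host, port, connection factory and destination."""
    parts = urlsplit(uri)
    if parts.scheme != "jms":
        raise ValueError("not a jms URI: " + uri)
    # The path looks like /connectionFactoryJNDI/queueJNDI
    factory, _, destination = parts.path.lstrip("/").partition("/")
    return {
        "host": parts.hostname,  # None for a local URI (jms:///...)
        "port": parts.port,      # None for a local URI
        "connection_factory": factory,
        "destination": destination,
    }


if __name__ == "__main__":
    print(describe_jms_uri("jms:///weblogic.jms.XAConnectionFactory/PV_OSB_TESTQ"))
```

A host of None is exactly the situation that triggers the warning above.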



Sunday, May 19, 2013

Connecting to a secure port with WLST

9002 is the SSL port of the admin.

On the WLST side:

connect(userConfigFile='/opt/oracle/domains/mydomain/serveruserconfigfile.secure',userKeyFile='/opt/oracle/domains/mydomain/serveruserkeyfile.secure',url='t3://acme.com:9002')

java.net.SocketException: Connection reset; No available router to destination

On the server side:

####<May 17, 2013 6:25:13 PM CEST> <Warning> <Security> <hqchnesoa104> <osbpl1as> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <9455361429c2e897:-bf0d58f:13eb3468b62:-8000-000000000000001a> <1368807913292> <BEA-090475> <Plaintext data for protocol T3 was received from peer acme.com - 10.11.5.190 instead of an SSL handshake.>

You should use the t3s protocol:

connect(userConfigFile='/opt/oracle/domains/mydomain/serveruserconfigfile.secure',userKeyFile='/opt/oracle/domains/mydomain/serveruserkeyfile.secure',url='t3s://acme.com:9002')

Watch this great video on using Administration Port in WebLogic:





Friday, May 17, 2013

Security:090504

If you get an error

[Security:090504]Certificate chain received from BLA failed hostname verification check. Certificate contained MUMBLE but check expected BLA

See http://docs.oracle.com/cd/E24902_01/doc.91/e24286/trblshoot.htm
Setting "Hostname Verification" to None made the message disappear (of course, this doesn't solve the underlying issue).


Thursday, May 16, 2013

ServerFailureTriggerMBean, OverloadProtectionMBean, numberOfStuckThreads

Excellent discussion on server health and stuck threads from user csoto: https://forums.oracle.com/forums/thread.jspa?messageID=10074875


See also the Javadoc of ServerFailureTriggerMBean.

If there are more than 25 stuck threads (the value is configurable), WebLogic goes into the FAILED state.

Make sure the "Auto Kill If Failed" attribute is false in "Health Monitoring".

http://weblogicserveradministration.blogspot.com/2010/10/server-overload-protection-actions.html

See also http://www.javamonamour.org/2012/08/weblogic-finding-health-state-through.html

The WLST to achieve this is:

cd('/Servers/osbdev1ms1/OverloadProtection/osbdev1ms1')
cmo.setFailureAction('force-shutdown')

cd('/Servers/osbdev1ms1/OverloadProtection/osbdev1ms1/ServerFailureTrigger/osbdev1ms1')
cmo.setMaxStuckThreadTime(600)
cmo.setStuckThreadCount(25)



In fact, it seems better to me to act at the cluster level, since I don't really want to act on the Admin server:

# clustername must already hold the name of the target cluster
edit()
startEdit()
cd('/')


cd('/Clusters/' + clustername + '/OverloadProtection/' + clustername)
cmo.setFailureAction('force-shutdown')

cd('/Clusters/' + clustername + '/OverloadProtection/' + clustername + '/ServerFailureTrigger/' + clustername)
cmo.setMaxStuckThreadTime(600)
cmo.setStuckThreadCount(25)


cd('/')
serverList=cmo.getServers()
for server in serverList:
    name=server.getName()
    cd('/Servers/' + name)
    cmo.setAutoKillIfFailed(true)
    cmo.setAutoRestart(true)

save()
activate()





This worked like a charm: when there are more than 25 stuck threads, the server state goes to FAILED and it's immediately restarted.

PS: somehow the JSP doesn't seem to be able to generate threads that are marked as STUCK. I had to write an OSB HTTP proxy invoking Thread.sleep(1000000) in a Java callout to see the threads marked as STUCK and the server restarted. Weird.
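To verify how many threads WebLogic itself flags as STUCK, one can count the state flags in a thread dump. The sketch below is an illustration only: it assumes the usual "[STATE] ExecuteThread: 'N' ..." naming of WebLogic execute threads, the function names are made up, and the default of 25 simply mirrors the StuckThreadCount used above.

```python
import re

# Matches thread names such as
# [STUCK] ExecuteThread: '16' for queue: 'weblogic.kernel.Default (self-tuning)'
THREAD_NAME = re.compile(r"\[(?P<state>[A-Z]+)\] ExecuteThread: '(?P<id>\d+)'")


def count_execute_threads(thread_dump_text):
    """Count WebLogic execute threads per state flag (ACTIVE, STUCK, ...)."""
    counts = {}
    for match in THREAD_NAME.finditer(thread_dump_text):
        state = match.group("state")
        counts[state] = counts.get(state, 0) + 1
    return counts


def exceeds_stuck_threshold(thread_dump_text, stuck_thread_count=25):
    """True when the dump shows more STUCK threads than the given threshold."""
    stuck = count_execute_threads(thread_dump_text).get("STUCK", 0)
    return stuck > stuck_thread_count
```

Feed it the text of a single thread dump; if the input contains several dumps, each thread is counted once per dump.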

Changing administration port for the admin to SSL

This WLST changes the admin's Administration port to 9002:

cd('/')
cmo.setExalogicOptimizationsEnabled(false)
cmo.setAdministrationPort(9002)
cmo.setClusterConstraintsEnabled(false)
cmo.setGuardianEnabled(false)
cmo.setAdministrationPortEnabled(true)

(in the domain tab of the console, it's "SSL Listen Port Enabled" = true)

The only issue is that you should first shut down all managed servers and then apply the change; otherwise you get the error "Cannot dynamically enable adminstration port on Managed servers when they are running".

http://docs.oracle.com/cd/E28280_01/apirefs.1111/e13952/taskhelp/domainconfig/EnableTheDomainwideAdministrationPort.html

Once you enable the Administration port, all attempts to connect to the old port will fail with "Console/Management requests or requests with <require-admin-traffic> specified to 'true' can only be made through an administration channel".

After that, connecting with WLST using t3s can be problematic.

On the WLST side:

javax.net.ssl.SSLKeyException: [Security:090542]Certificate chain received from hqchnesoa104.acme.com - 10.11.5.190 was not trusted causing SSL handshake failure. Check the certificate chain to determine if it should be trusted or not. If it should be trusted, then update the client trusted CA configuration to trust the CA certificate that signed the peer certificate chain. If you are connecting to a WLS server that is using demo certificates (the default WLS server behavior), and you want this client to trust demo certificates, then specify -Dweblogic.security.TrustKeyStore=DemoTrust on the command line for this client.; No available router to destination]

on WebLogic Admin side:

####<May 16, 2013 6:01:07 PM CEST> <Warning> <Security> <hqchnesoa104> <osbpl1as> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <9455361429c2e897:4bba59a7:13eae0910e1:-8000-000000000000001f> <1368720067334> <BEA-090482> <BAD_CERTIFICATE alert was received from acme.com - 10.11.5.190. Check the peer to determine why it rejected the certificate chain (trusted CA configuration, hostname verification). SSL debug tracing may be required to determine the exact reason the certificate was rejected.>

connect('weblogic', 'weblogic1', 't3s://acme.com:9003')

http://weblogic-wonders.com/weblogic/2010/03/03/ssl-exceptions-in-admin-server-and-node-manager/

I finally found how to make it work:

java -Dweblogic.security.TrustKeyStore=DemoTrust -Dssl.debug=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.security.SSL.enforceConstraints=off weblogic.WLST

but you should rather consult the Oracle Support document

"How to Enable the WebLogic Server Administration Port for WLST [ID 1511115.1]"
or the WLST FAQ.

With the enabled Administration port, communication with SSL requires the keystore be configured. For example:

-Dweblogic.security.SSL.ignoreHostnameVerification=true 
-Dweblogic.security.TrustKeyStore=CustomTrust 
-Dweblogic.security.CustomTrustKeyStoreFileName=C:\oracle\Middleware\924\weblogic92\server\lib\DemoTrust.jks 
-Dweblogic.security.CustomTrustKeyStorePassPhrase=DemoTrustKeyStorePassPhrase 
-Dweblogic.security.CustomTrustKeyStoreType=JKS



The above Java properties can be set in the WLST_PROPERTIES environment variable, since wlst.sh builds the Java command line like this:

JVM_ARGS="-Dprod.props.file='${WL_HOME}'/.product.properties ${WLST_PROPERTIES} ${JVM_D64} ${MEM_ARGS} ${CONFIG_JVM_ARGS}"

eval '"${JAVA_HOME}/bin/java"' ${JVM_ARGS} weblogic.WLST '"$@"'
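If the properties are kept in a script, the value of WLST_PROPERTIES can be assembled from a list of name/value pairs. This is a minimal Python sketch with a made-up helper name; it assumes that no value contains spaces or quotes, since wlst.sh passes the string through eval.

```python
def build_wlst_properties(properties):
    """Render (name, value) pairs as a single string of -Dname=value options."""
    return " ".join("-D%s=%s" % (name, value) for name, value in properties)


# Two of the settings from the working command line shown earlier
TRUST_SETTINGS = [
    ("weblogic.security.TrustKeyStore", "DemoTrust"),
    ("weblogic.security.SSL.ignoreHostnameVerification", "true"),
]

if __name__ == "__main__":
    print('export WLST_PROPERTIES="%s"' % build_wlst_properties(TRUST_SETTINGS))
```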



Tuesday, May 14, 2013

Sunday, May 12, 2013

WebLogic: protect and secure JMS queues

My first experiment is protecting the individual queue with a Security Policy "user = weblogic".

If I go to the monitoring tab and do "show messages", I get this error:

Access denied to resource: type=<jms>, application=ACMEJMSModule, destinationType=queue, resource=ACMEQ, action=browse
weblogic.management.ManagementException: Authorization failure.

The same happens if instead of protecting the individual queue, I protect the JMSModule.

CAVEAT: when you ADD the policy, the effect is immediate. When you REMOVE it, the restriction stays cached, and only a restart sets the resource free again.

Now you must enable a Business Service to WRITE to the JMS queue:

http://docs.oracle.com/cd/E17904_01/doc.1111/e15866/transport_level.htm#i1078093

a) create a static service account with the same username/password used to protect the JMS queue

b) in the Business Service producing JMS messages, assign the above service account as "JMS Service Account"

c) in the Proxy Service consuming JMS messages, assign the above service account as "JMS Service Account"

It can't be simpler than this.

If I connect from Domain B to the (protected) JMS queue on Domain A, I get an error:

The Message-Driven EJB: RequestEJB-4191753809964957369-ea7ff4.13e88fddc7c.-7ef2 is unable to connect to the JMS destination: jms.jndi.dq.BLA.BLAQ. The Error was: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=BLAJMSModule, destinationType=queue, resource=BLAQ, action=receive Nested exception: weblogic.jms.common.JMSSecurityException: Access denied to resource: type=<jms>, application=BLAJMSModule, destinationType=queue, resource=BLAQ, action=receive

After creating the service account (static, with username and password), the JMS Proxy Service on B connects fine to A:

The Message-Driven EJB: RequestEJB-4191753809964957369-ea7ff4.13e88fddc7c.-7ef0 has connected/reconnected to the JMS destination: jms.jndi.dq.BLA.BLAQ

This is strange, because the documentation here says:

http://docs.oracle.com/cd/E17904_01/doc.1111/e15867/service_accounts.htm

It cannot be used in outbound requests that authenticate Oracle Service Bus to a local or remote server or system resource, such as an FTP server or a JMS server.

Sunday, May 5, 2013

Soylent green, we are almost there

I have just finished watching Soylent Green (http://www.imdb.com/title/tt0070723/). It's amazing how in 1966 (when the novel was written) they already had such a clear picture of what would eventually happen, and is already happening, on this planet: the collapse of ecosystems and of quality of life, and the total control of corporations over our lives.



Stay local, eat vegetarian, ride a bicycle, preserve the planet, avoid mainstream media.



Thursday, May 2, 2013

GridLink: oracle.repackaged.ons.ReceiverThread.run and oracle.repackaged.ons.SenderThread.run

We have 7 GridLink Datasources, all configured with the same ONS parameter (same host and port).

However, we have 21 (=7*3) threads with oracle.repackaged.ons.ReceiverThread.run, and 21 with oracle.repackaged.ons.SenderThread.run.

I wonder:

a) why 3 Receiver threads and 3 Sender threads are required for each Datasource

b) why all these Datasources cannot simply SHARE the same Receiver and Sender threads, since most likely they will be sharing exactly the same information.

We also have 7 threads oracle.repackaged.ucp.jdbc.oracle.ONSRuntimeLBEventHandlerThread.run. And 21 tcp ESTABLISHED connections to ONS.

I really don't like having 42 threads dedicated only to GridLink notifications (plus all the sockets opened by them).

Wednesday, May 1, 2013

Beware of the psychopaths

99% of IT managers are reasonable people. Don't do anything extreme, like coming to the office barefoot or venting your opinions on 9/11 too openly, and as long as you deliver your stuff they will get along with you.

But occasionally there is a black swan: some real weirdo Project Manager.

I met one of them in Germany, 11 years ago. He was from some Middle East country and had just arrived from the USA. Although he could speak fluent German, he hated Germany and was outspoken about it. Chronically depressed, a chain smoker, he had that typical hallucinated look in the eye that makes you feel you are going to have trouble with him. My colleagues told me that they were all avoiding his company and that he always had lunch alone.

Needless to say, the project was very late and the customer very angry.



I was very happy working with my team, really excellent people, in a different city from the weirdo, until one day he came for a meeting with the customer. He arrived several hours late, under a driving rain, and he called me to urgently bring him a power adapter, since he had forgotten his. I ran like crazy under the driving rain to bring him the adapter.

A few days later the agency called me, asking me not to show up at the office the following day, since I had dared to appear in messy attire in front of the weirdo. I tried to explain that I was messy because of the rain, but the weirdo was adamant about kicking me out.

It was a huge shock for me; it took me months to recover.

I even sent an email to all my very friendly colleagues, and when the weirdo came to know, he was again very angry at me because I was still in touch with my former colleagues.

The message is: if someone looks hallucinated and miserable, be VERY careful, trouble is around the corner.

Luckily this was the only such episode in my 27-year professional career. But I occasionally tell this story to friends and colleagues.