CPU running high, AppDynamics help

[Image: AppD CPU notification, 2014-11-10]

We are still trying to get to the root of this error, but at least AppD now notifies us when it occurs, so we can either give the process time to complete or just kill it off.

When the issue occurs, we typically use the Unix ‘top’ command to see which PID is pegging the CPU, then stop the WebSphere node and kill off the PID. The hope is to get AppD to help us track down the runaway Java method that is causing the CPU spike, so we can fix the issue instead of killing off the symptom.

waspapps02:~> top
top - 16:45:47 up 63 days, 14:26,  1 user,  load average: 3.92, 3.58, 3.52
Tasks: 176 total,   1 running, 175 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.3%us,  8.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.2%si,  0.0%st
Mem:   8129720k total,  8064584k used,    65136k free,    35760k buffers
Swap:  4192956k total,  2240932k used,  1952024k free,   212712k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18679 wasadmin  20   0 2966m 1.4g 6516 S  159 18.2 148:32.77 java
18211 wasadmin  20   0 1852m 1.3g 5904 S   15 16.7  49:43.31 java
18727 wasadmin  20   0 3388m 2.2g 7772 S    3 28.2  60:20.76 java
 5378 wasadmin  20   0 1877m 1.2g 7288 S    1 15.3  55:03.02 java
 8031 root      20   0  208m 3628 2252 S    1  0.0 134:01.03 aex-metricprovi
18541 wasadmin  20   0 1946m 940m 7556 S    1 11.8  27:35.41 java
 3278 wasadmin  20   0  165m  15m 2956 S    0  0.2   3:19.99 splunkd
17419 wasadmin  20   0  8772 1236  852 R    0  0.0   0:00.01 top
    1 root      20   0 10376   88   56 S    0  0.0   0:47.11 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.56 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:09.55 migration/0
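Until AppD points at the runaway method, the same top session can be narrowed from process to thread by hand. The following is only a minimal sketch, assuming Linux procps top and the IBM JDK that ships with WebSphere, where kill -3 asks the JVM to write a javacore thread dump rather than exit. The helper names tid_to_hex and busiest_threads are made up for illustration, and where the javacore file lands depends on your profile configuration.

```shell
#!/bin/sh
# Sketch: narrow a busy Java PID down to individual threads.
# Assumptions: Linux procps `top`, IBM JDK (kill -3 writes a javacore).

# Convert a decimal thread ID, as shown by `top -H`, to the hex form
# that thread dumps use for native thread IDs.
tid_to_hex() {
    printf '0x%X\n' "$1"
}

# Show the busiest threads of one process (batch mode, one iteration).
busiest_threads() {
    top -H -b -n 1 -p "$1" | head -n 20
}

# Example session, using the PID from the top output above:
#   busiest_threads 18679     # the PID column now lists thread IDs
#   kill -3 18679             # request a thread dump from the JVM
#   tid_to_hex <thread id>    # search the dump for this value (grep -i)
```

The stack trace of the thread whose native ID matches the hex value is the first place to look for the method that is burning CPU.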

AppDynamics License Issues

[Image: AppDynamics license issue]

We renewed our AppDynamics license, and received the new license.lic file in an email like we always have. I FTP’d it over to our Controller, restarted the Controller, and hit the home page, but it was apparent that there was an issue.

I did the usual troubleshooting of restarting again (same result) and reverting to the original license.lic, but that failed too, since the original license was out of date by a day. I opened a ticket with their support. We were using an on-premise 3.8.0.1 Controller, and the latest version was 3.8.4. That was a problem, since their license structure had changed drastically in one of the recent releases, causing this issue. Instead of having them generate a new license for the older 3.8.0.1 Controller, I just upgraded to the latest Controller version, 3.8.4.

However, after the upgrade, we still had the same issue. Their support had me run the commands below, and send them the output.

/appd_home/bin/controller.sh login-db
SELECT * FROM account\G
quit
/sbin/ifconfig -a

From this output, they were able to see that the machine’s MAC address had changed from what was originally on the AppD license. I followed up with our Unix team to ask if anything had changed on the machine’s VM, but nothing had, as far as they could see. Given this info, I could either have the Unix team change the MAC address back, or have AppD generate a new license with the new MAC address. We elected to have AppD generate a new license, since this was by far the least invasive option in our environment.
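Before opening a ticket, it may be worth checking yourself whether the MAC address the license was issued for is still present on the host. This is only a sketch, assuming a Linux host that exposes interface addresses under /sys/class/net. The helper names normalize_mac and mac_present are hypothetical, and the MAC you pass in has to come from your own records or from AppD support, since the layout of the license file is not something this sketch tries to parse.

```shell
#!/bin/sh
# Sketch: check whether an expected MAC address exists on this machine.
# Assumption: Linux, with one `address` file per interface in /sys/class/net.

# Normalize a MAC to lowercase, colon-separated form.
normalize_mac() {
    printf '%s\n' "$1" | tr 'ABCDEF-' 'abcdef:'
}

# Return 0 if any interface carries the given MAC, 1 otherwise.
mac_present() {
    want=$(normalize_mac "$1")
    for f in /sys/class/net/*/address; do
        [ -r "$f" ] || continue
        [ "$(cat "$f")" = "$want" ] && return 0
    done
    return 1
}

# Example (placeholder MAC, not a real value from our license):
#   mac_present "00-1A-2B-3C-4D-5E" || echo "licensed MAC not found on this host"
```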

Hope this helps someone else going through an AppDynamics upgrade or license renewal from an older version.

Cloned Cluster on WebSphere does not start

We had a need to create a new cluster of WebSphere 7 JVMs (Cluster_B) that are identical to an existing cluster (Cluster_A).  No problem, an easy task that I’ve done many times before. I proceeded to venture through the WAS console to create a new cluster using the existing Cluster_A_was01 member as a template. The new config was told to create new ports, I clicked through the save buttons, and gave the cluster members a few minutes to ensure they were synced up properly with the new configuration.

Everything worked as expected right up to the point that the server did not start after issuing the start command from the CLI (Command Line Interface).

websphere_01:~> /was/AppServer/profiles/AppServer/bin/startServer.sh Cluster_B
ADMU0116I: Tool information is being logged in file
/was/AppServer/profiles/AppServer/logs/Cluster_B/startServer.log
ADMU0128I: Starting tool with the AppServer profile
ADMU3100I: Reading configuration for server: Cluster_B
ADMU3200I: Server launched. Waiting for initialization status.
ADMU3011E: Server launched but failed initialization. startServer.log,
SystemOut.log(or job log in zOS) and other log files under
/was/AppServer/profiles/AppServer/logs/Cluster_B
should contain failure information.
websphere_01:~>

This is a new server, that was cloned from an existing one, so there could be a conflict of a param that I missed (ports, cookie names, etc.).  I look inside the Cluster_B log directory, and there is no SystemOut.log to be found.

websphere_01:~> cd /was/AppServer/profiles/AppServer/logs/Cluster_B
websphere_01:/was/AppServer/profiles/AppServer/logs/Cluster_B> ls -latr
total 16
-rw-r--r--  1 websphereUser websphereGroup    0 2014-02-26 15:19 native_stdout.log
-rw-r--r--  1 websphereUser websphereGroup    5 2014-02-26 15:35 Cluster_B.pid
-rw-r--r--  1 websphereUser websphereGroup 1935 2014-02-28 13:26 startServer.log
-rw-r--r--  1 websphereUser websphereGroup 2259 2014-02-28 13:26 native_stderr.log

Note that I tried to start the server, it failed, and told me to look in the SystemOut.log. There is no SystemOut.log listed. I’m now in uncharted waters. I’ve never seen an instance of starting up a new JVM where no SystemOut.log or SystemErr.log is created. Thanks for nothing, WebSphere.

After verifying the ports are different from the cloned JVM from Cluster_A, kicking kittens, and other config comparisons, I thought to look at the JVM args, which would be identical to Cluster_A, since it is a clone. I see that AppDynamics is there, and right next to them is the bane of the past couple of hours: a check mark next to Debug with the port set to 7777, just like Cluster_A’s debug configuration.

To be sure that the identical debug ports were the issue (and not AppD), I first removed the AppDynamics JVM params and tried again. Failure. Next I removed the debug config altogether, and the server booted right up. I then re-enabled debug on Cluster_B with the port changed to 7778, restarted, and it again started right up.
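A quick check like the one below could catch this kind of clash before the start command is issued. It is a rough sketch, assuming the debug arguments use the common JDWP form with address=PORT and that netstat is available on the host. The function names jdwp_port and port_in_use are hypothetical, and the argument string in the example is an assumed typical value, not copied from our configuration.

```shell
#!/bin/sh
# Sketch: detect a JDWP debug port that is already taken.
# Assumptions: debug args contain `address=<port>`; Linux `netstat`.

# Pull the port number out of a JVM debug argument string.
jdwp_port() {
    printf '%s\n' "$1" | sed -n 's/.*address=\([0-9][0-9]*\).*/\1/p'
}

# Return 0 if something on this host is already listening on the port.
port_in_use() {
    netstat -an 2>/dev/null | grep -Eq "[:.]$1[[:space:]].*LISTEN"
}

# Example (assumed argument string):
#   args='-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7777'
#   port=$(jdwp_port "$args")
#   port_in_use "$port" && echo "debug port $port is already taken"
```

Running it against the debug arguments of a freshly cloned member, while the original member is up, would flag the shared port straight away.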

It would have been nice for the WAS server to let me know that there was a debug port conflict, instead of me fumbling around in the dark with no idea of where to start.  It would have saved me a couple of hours, and several kicks to kittens.