Get information about a failed Platform component

If a Platform component fails, you may need to obtain a crash dump and other information to help you (or Caplin Support) diagnose the problem. Here’s some guidance on how to do this.

These instructions assume your Platform components run under the Caplin Deployment Framework.

Before you run any of the Platform components

  1. Make sure that you’ve enabled core dumps on your Linux or Windows machine:

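    1. In a Linux environment, enable core dumps of unlimited size: ulimit -c unlimited

      Run this in the shell from which you start the components. The setting applies only to that shell and to the processes started from it. Without the -H or -S option, ulimit sets both the hard and soft limits. The soft limit is the one that decides whether a core dump is written, so setting only the hard limit (-H) is not enough.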

    2. In a Windows environment, it’s a bit more complicated to enable core dumps; for how to do this, see How can I… Get a Windows crash dump of a failed Platform component.

  2. In a Linux environment, make sure that the GNU Debugger (GDB) is installed on your machine.

  3. If you suspect that a component is failing because of a memory leak in its Java Virtual Machine (JVM), enable Java Garbage Collection (GC) logging. To do this, edit the java.conf file in the directory for that component:

    • Liberator: <Framework-root>/global_config/overrides/servers/Liberator/etc/

    • Transformer: <Framework-root>/global_config/overrides/servers/Transformer/etc/

    • Integration Adapter: <Framework-root>/global_config/overrides/<adapter-blade-name>/DataSource/etc/

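    Add this line to the java.conf file:

    jvm-options -XX:+PrintConcurrentLocks -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=var -Xloggc:var/gc-%p.log

    The paths in these options are relative to the component’s current directory, so the GC log (gc-<pid>.log) and any heap dump are written to the component’s var/ directory.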
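The checks in the steps above can be run as a script before you start the components. The following is a minimal sketch, not part of the Caplin Deployment Framework, and its function names are hypothetical. It reports the soft and hard core-size limits, checks that gdb is on the PATH, and prints a suggested jvm-options line for the java on your PATH (assumed, possibly wrongly, to be the Java installation the components use). For Java 8 it prints the options shown above. For Java 9 and later it prints an assumed equivalent that uses unified logging (-Xlog:gc*), because those versions refuse to start with -XX:+PrintGCDateStamps and deprecate -Xloggc and -XX:+PrintGCDetails; check that line against your own JVM before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not a Caplin tool). Run it from the shell
# you will start the components from: resource limits are inherited from it.

# Report the core file size limits. The kernel uses the soft limit when it
# decides whether, and how much of, a core dump to write.
check_core_limits() {
  local soft hard
  soft=$(ulimit -S -c)
  hard=$(ulimit -H -c)
  echo "core file size limit: soft=$soft hard=$hard"
  if [ "$soft" != "unlimited" ]; then
    echo "WARNING: run 'ulimit -c unlimited' in this shell first" >&2
    return 1
  fi
}

# Check that the GNU Debugger is on the PATH.
check_gdb() {
  if command -v gdb >/dev/null 2>&1; then
    echo "gdb found: $(command -v gdb)"
  else
    echo "WARNING: gdb is not installed or not on the PATH" >&2
    return 1
  fi
}

# Convert a version string from `java -version` to a major version,
# e.g. "1.8.0_292" -> 8, "17.0.2" -> 17.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  echo "${v%%[._+-]*}"
}

# Print a suggested java.conf line for the given Java major version.
# Java 8: the options from the steps above.
# Java 9+: an assumed unified-logging equivalent (-Xlog), because Java 9+
# rejects -XX:+PrintGCDateStamps and deprecates -Xloggc.
gc_log_options() {
  if [ "$1" -le 8 ]; then
    echo "jvm-options -XX:+PrintConcurrentLocks -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=var -Xloggc:var/gc-%p.log"
  else
    echo "jvm-options -XX:+PrintConcurrentLocks -XX:+PrintCommandLineFlags -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=var -Xlog:gc*:file=var/gc-%p.log:time,uptime"
  fi
}

main() {
  check_core_limits
  check_gdb
  # Assumes the java on the PATH is the one the Framework uses.
  if command -v java >/dev/null 2>&1; then
    local ver major
    ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    if [ -n "$ver" ]; then
      major=$(java_major "$ver")
      echo "java on PATH is Java $major; suggested java.conf line:"
      gc_log_options "$major"
    fi
  fi
}

main
```

Any WARNING lines the script prints point to a step that still needs attention.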

If a crash occurs…

Obtain the following information about the failed component.

  1. [Required] In a Linux environment, obtain the Linux core dump file produced by the failed component.

    1. If Liberator has failed, the core dump file is normally in: <Framework-root>/servers/Liberator/core.<pid>

    2. If Transformer has failed, the core dump file is normally in: <Framework-root>/servers/Transformer/core.<pid>

    3. If an Integration Adapter has failed, the core dump file is normally in: <Framework-root>/kits/<adapter-blade-name>/latest/

      For Adapters written entirely in C, the core dump file is normally called core.<pid>.

      If the Adapter is Java-based (for example, it’s been implemented using the Caplin Integration Suite), look for an hs error file called hs_err_pid<pid>.log as well as, or instead of, a core dump file (see step 6 below).

    4. If you can’t see a file called core.<pid> in the appropriate directory when you think there should be one, check whether the default name and/or location of core dump files has been changed for your Linux installation:

      Run the command:

      cat /proc/sys/kernel/core_pattern

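      If the command outputs “core”, you should find a core.<pid> file in the failed component’s current directory (as shown above). Otherwise, the output shows the changed filename of the core dump and/or its changed directory path; a path that doesn’t start with / is relative to the component’s current directory. If the output starts with a pipe character (|), core dumps are passed to the program named after it (for example, systemd-coredump) instead of being written to a file, and you need to retrieve them with that program’s tools (for systemd-coredump, the coredumpctl command).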

  2. [Required] In a Windows environment, obtain the Windows core dump file produced by the failed component.

    For detailed instructions, see How can I… Get a Windows crash dump of a failed Platform component.

  3. [Required] In a Linux environment, obtain information about the libraries used by the failed component.

    This information is essential for interpreting the contents of the core dump file.

    For detailed instructions, see How can I… Get information about the libraries used by a failed Platform component.

  4. [Required] Record the time of the crash.

    You can get this from the date and time at which the core dump file was last modified, as shown by the ls -l command.

  5. Make a note of what was happening on the system when the crash occurred.

    For example: "We were testing clustered Liberators with this configuration (supplied) when one of the Liberators failed."

  6. Obtain the hs error file.

    The JVM writes one of these files when it terminates because of a fatal error, for example a crash in native code that brings down the failed component with it. It contains details of the thread and signal that caused the crash, a summary of the heap at the time of the crash, and other information about the machine and the crash.

    If Liberator has failed, the hs error file is in: <Framework-root>/servers/Liberator/hs_err_pid<pid>.log

    If Transformer has failed, the hs error file is in: <Framework-root>/servers/Transformer/hs_err_pid<pid>.log

    If an Integration Adapter has failed, the hs error file is in: <Framework-root>/kits/<adapter-blade-name>/latest/hs_err_pid<pid>.log

  7. Obtain the Java Garbage Collection (GC) log, if you previously enabled GC logging.

    The -Xloggc:var/gc-%p.log option names the file after the process ID (pid) of the component.

    If Liberator has failed, the GC log file is in: <Framework-root>/servers/Liberator/var/gc-<pid>.log

    If Transformer has failed, the GC log file is in: <Framework-root>/servers/Transformer/var/gc-<pid>.log

    If an Integration Adapter has failed, the GC log file is in: <Framework-root>/kits/<adapter-blade-name>/latest/DataSource/var/gc-<pid>.log

  8. Obtain the logs for the failed component as at the time of the crash.

    Make sure that the logs provided match the time of the crash.

    Liberator’s log files are in: <Framework-root>/servers/Liberator/var/

    Transformer’s log files are in: <Framework-root>/servers/Transformer/var/

    An Integration Adapter’s log files are in: <Framework-root>/kits/<adapter-blade-name>/latest/DataSource/var/

  9. Obtain statistics about the machine’s CPU, memory and hard-disk usage around the time of the crash.

    This helps in assessing the loading and overall behaviour of the machine when the component failed. It can also help to determine whether the component’s process was using too much CPU, and whether it was suffering from a severe memory leak.
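Several of the steps above can be partly automated on Linux. The following is a hypothetical sketch (the script and its function names are not Caplin tools) that assumes GNU coreutils and tar, that it runs on the machine where the crash happened, and that the directory you pass is the failed component’s current directory, for example <Framework-root>/servers/Liberator or <Framework-root>/kits/<adapter-blade-name>/latest. It records how core_pattern is interpreted, lists core and hs error files with full timestamps, copies them along with the var/ directory (or DataSource/var/ for an adapter), and packs the result into a tarball. It does not cover Windows dumps or library information (steps 2 and 3), and its system snapshot shows the machine’s state now, not at the time of the crash; for historical CPU, memory and disk figures you need a monitoring tool that was already recording, such as sar from the sysstat package.

```shell
#!/usr/bin/env bash
# Hypothetical crash-collection helper (not a Caplin tool).
# Assumes Linux with GNU coreutils and tar.
# Usage: collect-crash-info.sh <component-dir> [output-dir]

# Describe where core dumps go for a given core_pattern value.
# %-specifiers such as %p are expanded by the kernel; with a plain pattern
# such as "core", it appends .<pid> if /proc/sys/kernel/core_uses_pid is 1.
describe_core_pattern() {
  local pattern="$1" dir="$2"
  case "$pattern" in
    "|"*) echo "piped to helper: ${pattern#"|"}" ;;
    /*)   echo "absolute path: $pattern" ;;
    *)    echo "relative to component: $dir/$pattern" ;;
  esac
}

# Copy crash artefacts from a component's current directory into out-dir.
collect_crash_info() {
  local comp="$1" out="$2" f
  mkdir -p "$out" || return 1

  # Step 1: where this machine writes core dumps.
  if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)" "$comp" \
      > "$out/core_pattern.txt"
  fi

  # Step 4: full modification timestamps of core and hs error files.
  ls -l --time-style=full-iso "$comp"/core* "$comp"/hs_err_pid*.log \
    > "$out/crash_files.txt" 2>/dev/null

  # Steps 1 and 6: the files themselves (unmatched globs are skipped).
  for f in "$comp"/core* "$comp"/hs_err_pid*.log; do
    if [ -f "$f" ]; then cp -p "$f" "$out/"; fi
  done

  # Steps 7 and 8: logs and GC logs; adapters keep them under DataSource/var.
  if [ -d "$comp/var" ]; then cp -rp "$comp/var" "$out/var"; fi
  if [ -d "$comp/DataSource/var" ]; then
    cp -rp "$comp/DataSource/var" "$out/DataSource-var"
  fi

  # Step 9 (partial): the machine's state now, not at the time of the crash.
  { uptime; free -m; df -h; } > "$out/system_now.txt" 2>&1
}

main() {
  local comp="$1"
  local out="${2:-crash-$(basename "$comp")-$(date +%Y%m%d-%H%M%S)}"
  collect_crash_info "$comp" "$out" || return 1
  tar -czf "$out.tar.gz" "$out" && echo "Wrote $out.tar.gz"
}

# Only run when given arguments, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then main "$@"; fi
```

For example, if you saved it as collect-crash-info.sh (a hypothetical name), bash collect-crash-info.sh <Framework-root>/servers/Liberator writes a timestamped directory and a matching .tar.gz in the current directory. Core dumps can be large, so check the free disk space first.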

What if the fault’s already fixed?

If the component that’s failed isn’t the latest version available from Caplin Systems, you may find it helpful to deploy the latest version and run the system again. If the new version of the component doesn’t crash, it’s likely that it contains a bug fix that cures the fault, so you should continue to use this latest version if possible.


See also: