Get information about a failed Platform component

If a Platform component fails, you may need to obtain a crash dump and other information to help you (or Caplin Support) diagnose the problem. Here’s some guidance on how to do this.

These instructions assume your Platform components run under the Caplin Deployment Framework.

Before you run any of the Platform components

Follow the steps below:

  1. Enable core dumps on your Linux or Windows machine:

    1. On Linux, run the command below to enable core dumps (unlimited size):

      ulimit -H -c unlimited
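
      This limit applies to the shell in which you run the command and to processes started from it, so set it in the same session (or startup script) that launches the Platform components. To check the limits in effect, and to raise the soft limit if your shell defaults it to a lower value, you can run:

      ulimit -H -c            # show the hard limit for core file size
      ulimit -S -c            # show the soft limit for core file size
      ulimit -S -c unlimited  # raise the soft limit if it isn't already 'unlimited'
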
    2. In a Windows environment, follow the instructions in Get a Windows crash dump of a failed Platform component.

  2. On Linux, install the GNU Debugger (GDB).
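
    For example, on a Red Hat-based distribution (the package manager may differ on your distribution):

      sudo yum install gdb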

  3. If you suspect that a component is failing because of a memory leak in the Java Virtual Machine, enable Java Garbage Collection (GC) logging for the component.

    For Liberator, Transformer, and any other C DataSource application with an embedded JVM, edit the component’s java.conf file in the global_config/overrides directory hierarchy and add the following jvm-options configuration items:

    File: java.conf
    jvm-options -Xloggc:var/gc.log
    jvm-options -XX:+PrintGCDetails
    jvm-options -XX:+PrintGCDateStamps
    jvm-options -XX:+UseGCLogFileRotation
    jvm-options -XX:NumberOfGCLogFiles=10
    jvm-options -XX:GCLogFileSize=5M
    jvm-options -XX:+PrintConcurrentLocks
    jvm-options -XX:+PrintCommandLineFlags
    jvm-options -XX:+HeapDumpOnOutOfMemoryError
    jvm-options -XX:HeapDumpPath=var

    For Java DataSource applications, the JVM options must be inserted manually into the java command that starts the application. For example, for an adapter deployed in the Deployment Framework, edit the application’s DataSource/bin/start-jar.sh script:

    File: start-jar.sh
    java \
      -Xloggc:var/gc.log \
      -XX:+PrintGCDetails \
      -XX:+PrintGCDateStamps \
      -XX:+UseGCLogFileRotation \
      -XX:NumberOfGCLogFiles=10 \
      -XX:GCLogFileSize=5M \
      -XX:+PrintConcurrentLocks \
      -XX:+PrintCommandLineFlags \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=var \
      (1)
    1 Continue the java command’s original arguments here
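
    The GC logging options shown above are for a Java 8 JVM. On Java 9 or later, most of them were removed or deprecated in favour of unified JVM logging, so use the -Xlog form instead, either as a jvm-options item in java.conf or directly on the java command line. A rough equivalent (a sketch only; check the option syntax against your JVM version) is:

      -Xlog:gc*:file=var/gc.log:time,uptime:filecount=10,filesize=5M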

If a crash occurs…

Obtain the following information about the failed component:

  1. [Required] In a Linux environment, obtain the Linux core dump file produced by the failed component.

    1. If Liberator has failed, the path to the core dump file is: <Framework-root>/servers/Liberator/core.<pid>

    2. If Transformer has failed, the path to the core dump file is: <Framework-root>/servers/Transformer/core.<pid>

    3. If an Integration Adapter has failed, the core dump file is normally in: <Framework-root>/kits/<adapter-blade-name>/latest/

      For C-based adapters, the core dump file is normally called core.<pid>

      For Java-based adapters, instead of a core dump file there will be an hs error file called hs_err_pid<pid>.log (see step 6 below).

    4. If you can’t see a file called core.<pid> in the appropriate directory, run the command below to check the template used by Linux to generate the filenames of core dump files:

      $ cat /proc/sys/kernel/core_pattern
      core

      For more information on the naming and location of core files, see Naming of core dump files in the core man page.
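
    Once you've located the core dump file, you can use GDB (installed earlier) to capture an initial backtrace to send along with the core file. The binary path below is a placeholder; substitute the executable of the component that failed:

      $ gdb <path-to-component-binary> core.<pid>
      (gdb) bt full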

  2. [Required] In a Windows environment, obtain the Windows core dump file produced by the failed component.

    For detailed instructions, see How can I… Get a Windows crash dump of a failed Platform component.

  3. [Required] In a Linux environment, obtain information about the libraries used by the failed component.

    This information is essential for interpreting the contents of the core dump file.

    For detailed instructions, see How can I… Get information about the libraries used by a failed Platform component.
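
    The linked article describes the full procedure. As a quick illustration, you can list the shared libraries that a component binary links against with ldd (the binary path is a placeholder):

      $ ldd <path-to-component-binary>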

  4. [Required] Record the time of the crash.

    You can get this from the date and time when the core dump file was created (use the ls -l command).
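
    For example (--full-time is a GNU ls option; the output format may differ on other systems):

      $ ls -l --full-time core.<pid>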

  5. Make a note of what was happening on the system when the crash occurred.

    For example: "We were testing clustered Liberators with this configuration (supplied) when one of the Liberators failed."

  6. Obtain the hs error file.

    You get one of these files if the Java Virtual Machine (JVM) crashed along with the failed component and the exception wasn’t handled by the component. It contains a summary of the heap at the time of the crash and other information about the machine and the crash.

    If Liberator has failed, the hs error file is in: <Framework-root>/servers/Liberator/hs_err_pid<pid>.log

    If Transformer has failed, the hs error file is in: <Framework-root>/servers/Transformer/hs_err_pid<pid>.log

    If an Integration Adapter has failed, the hs error file is in: <Framework-root>/kits/<adapter-blade-name>/latest/hs_err_pid<pid>.log
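
    By default the JVM writes the hs error file to the process's working directory, so if it isn't in the location you expect, search the Deployment Framework directory for it:

      $ find <Framework-root> -name 'hs_err_pid*.log'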

  7. Obtain the Java Garbage Collection (GC) log (if you previously enabled GC logging).

    If Liberator has failed, the GC log file is in: <Framework-root>/servers/Liberator/var/gc.log

    If Transformer has failed, the GC log file is in: <Framework-root>/servers/Transformer/var/gc.log

    If an Integration Adapter has failed, the GC log file is in: <Framework-root>/kits/<adapter-blade-name>/latest/DataSource/var/gc.log
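
    Because log file rotation was enabled, the GC output may be spread across several numbered files; collect every file that matches the gc.log prefix, for example:

      $ ls <Framework-root>/servers/Liberator/var/gc.log*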

  8. Obtain the logs for the failed component covering the time of the crash.

    Make sure that the logs provided match the time of the crash.

    Liberator’s log files are in: <Framework-root>/servers/Liberator/var/

    Transformer’s log files are in: <Framework-root>/servers/Transformer/var/

    An Integration Adapter’s log files are in: <Framework-root>/kits/<adapter-blade-name>/latest/DataSource/var/
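
    A quick way to confirm that the files you're packaging cover the crash is to list them by modification time, for example for Liberator:

      $ ls -lt <Framework-root>/servers/Liberator/var/ | head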

  9. Obtain statistics about the machine’s CPU, memory and hard-disk usage around the time of the crash.

    This helps in assessing the loading and overall behaviour of the machine when the component failed. It can also help to determine whether the component’s process was using too much CPU, and whether it was suffering from a severe memory leak.
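
    If the sysstat package is installed and collecting data on the machine, sar can report historical CPU, memory, and disk activity for the day of the crash. The data file path below is typical of Red Hat-based systems (DD is the day of the month) and may differ on your distribution:

      $ sar -u -r -d -f /var/log/sa/saDD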

What if the fault is already fixed?

If the failed component isn't the latest version available from Caplin Systems, you may find it helpful to deploy the latest version and run the system again. If the new version doesn't crash, it probably contains a bug fix for the fault, so continue to use the latest version if possible.

