Perfect Monitoring - collectd, Confluence and JMX

Posted on Fri 04 March 2016 in monitoring

In this article I'll describe why you should revisit your current collectd configuration when it comes to Atlassian Confluence. It might also be another reason to finally take a look at collectd.

Back in 2014 I wrote about Java VM Monitoring with Jolokia. It is still a good solution, and I believe Jolokia will fulfil the requirements when you're stuck in an environment without collectd.

Why?

It's always a good idea to pause for a second and ask yourself: Why? In the case of Confluence or any other Java software the answer is easy: because you have a monster under your bed and you're poking it with a stick. What could possibly go wrong?

When there are issues or unplanned downtime for unknown reasons, your users will call in, and passive monitoring data is a huge help when investigating further.

  • Is there a networking issue that slows down Confluence?
  • Is there an issue with the mail server?
  • Is the amount of memory for the JavaVM chosen correctly?
  • Is there an unusual CPU load?

With monitoring data at hand, these questions can be answered within minutes. Without it, finding the root cause of an issue can take hours or days.

What you need

Oh, wait... That's it!

collectd comes with a Java as well as a JMX plugin.

Dependency installation and configuration

On CentOS 7 we can install the required packages from the EPEL repository:

yum install collectd collectd-generic-jmx collectd-java

Due to some dependency foo we additionally need to install OpenJDK and add an ld configuration so that the Java plugin can find libjvm.so:

yum install -y java-1.8.0-openjdk-headless
echo "/usr/lib/jvm/jre/lib/amd64/server/" > /etc/ld.so.conf.d/java.conf
ldconfig
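
Before moving on, it can be worth confirming that the dynamic linker cache now lists libjvm.so. The following is only a minimal sketch: the helper name has_libjvm is made up for this example, and it simply searches whatever text it receives on stdin.

```shell
# Hypothetical helper: succeeds when the linker cache listing on stdin
# contains an entry for libjvm.so.
has_libjvm() {
  grep -q 'libjvm\.so'
}

# Usage on the collectd host:
#   ldconfig -p | has_libjvm && echo "libjvm.so found" || echo "libjvm.so missing"
```

If the library is reported missing, the path written to /etc/ld.so.conf.d/java.conf probably does not match the installed JRE.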

Confluence configuration

Next we edit the Confluence configuration on the server to [enable JMX](https://tomcat.apache.org/tomcat-8.0-doc/monitoring.html) and add the necessary collectd JARs to the Tomcat startup configuration.

vi /opt/atlassian/confluence/bin/setenv.sh

Within the file the following part must be extended:

CATALINA_OPTS="-XX:-PrintGCDetails -XX:+PrintGCTimeStamps -XX:-PrintTenuringDistribution ${CATALINA_OPTS}"
CATALINA_OPTS="-Xloggc:$LOGBASEABS/logs/gc-`date +%F_%H-%M-%S`.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M ${CATALINA_OPTS}"
CATALINA_OPTS="-Djava.awt.headless=true ${CATALINA_OPTS}"
CATALINA_OPTS="-Datlassian.plugins.enable.wait=300 ${CATALINA_OPTS}"
CATALINA_OPTS="-Xms1024m -Xmx1024m -XX:+UseG1GC ${CATALINA_OPTS}"

We add the following line before export CATALINA_OPTS:

CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.class.path=/usr/share/collectd/java/collectd-api.jar:/usr/share/collectd/java/generic-jmx.jar"

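This exposes the JMX interface on port 9999. Note that java.rmi.server.hostname only sets the host name handed out to RMI clients; it does not restrict the listening address. Since authentication and SSL are disabled here, the port should be firewalled from other hosts.

At this point we can restart Confluence and check whether port 9999 is listening: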

netstat -tlpen | grep 9999
tcp6       0      0 :::9999                 :::*                    LISTEN      1101       75274719   7658/java
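
To make this check scriptable, the listening address can be pulled out of the netstat output. This is a minimal sketch that assumes the column layout shown above (local address in column 4, state in column 6); listen_addr is a hypothetical helper name.

```shell
# Hypothetical helper: print the local address of every LISTEN socket
# on the given port. Reads `netstat -tlpen` style lines from stdin.
listen_addr() {
  awk -v suffix=":$1" '
    $6 == "LISTEN" && length($4) >= length(suffix) &&
    substr($4, length($4) - length(suffix) + 1) == suffix { print $4 }
  '
}

# Usage on the Confluence host:
#   netstat -tlpen | listen_addr 9999
```

A result such as :::9999 or 0.0.0.0:9999 is a wildcard address, which means the port accepts connections on all interfaces.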

collectd configuration

Now the only thing that's left is the collectd configuration.

In /etc/collectd.conf or a separate file (if an include directory is in use) we add the following line:

TypesDB "/usr/share/collectd/confluence.db"

The specified file will contain two custom data types that are used by collectd:

echo "jmx_memory      value:GAUGE:0:U
time_ms      value:GAUGE:0:U" > /usr/share/collectd/confluence.db
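
Each line in a types database has the form "type-name ds-name:ds-type:min:max", where U means unbounded. A typo here is easy to miss, so a quick format check may help. The sketch below is an assumption-laden simplification: typesdb_invalid_lines is a made-up helper, and it only understands lines with a single data source and no comments or blank lines.

```shell
# Hypothetical helper: print every line from stdin that does NOT look like
# "name  ds:TYPE:min:max" with exactly one data source.
typesdb_invalid_lines() {
  grep -Ev '^[A-Za-z0-9_]+[[:space:]]+[A-Za-z0-9_]+:(GAUGE|COUNTER|DERIVE|ABSOLUTE):(U|-?[0-9.]+):(U|-?[0-9.]+)$' || true
}

# Usage:
#   typesdb_invalid_lines < /usr/share/collectd/confluence.db
```

No output means both custom types are well-formed according to this simplified pattern.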

Within the collectd.conf we add the Java and JMX plugin configuration:

# Replace the double quotes from the Tomcat Connectors to store values in the correct name format

LoadPlugin "target_replace"

<Chain "PreCache">
  <Rule "strip_slash_quote">
    <Target "replace">
      PluginInstance "\\\"" ""
      PluginInstance "\\\"" ""
    </Target>
  </Rule>
</Chain>
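
The net effect of this rule is that the double quotes Tomcat puts around connector names disappear from the plugin instance. The sketch below mimics that outcome with sed so it can be tried outside collectd; strip_quotes is a made-up helper and the connector name "http-nio-8090" is only an assumed example.

```shell
# Hypothetical helper: remove all double quotes from each line on stdin,
# imitating the result of the target_replace rule above.
strip_quotes() {
  sed 's/"//g'
}

# Usage:
#   printf '%s\n' 'request_processor-"http-nio-8090"' | strip_quotes
```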

# Load the java plugin; GenericJMX is a Java class that the java plugin loads below

LoadPlugin java

<Plugin "java">
  # required JVM argument is the classpath
  # JVMArg "-Djava.class.path=/installpath/collectd/share/collectd/java"
  # Since version 4.8.4 (commit c983405) the API and GenericJMX plugin are
  # provided as .jar files.
  JVMARG "-Djava.class.path=/usr/share/collectd/java/collectd-api.jar:/usr/share/collectd/java/generic-jmx.jar"
  LoadPlugin "org.collectd.java.GenericJMX"

  <Plugin "GenericJMX">
    # Memory usage by memory pool
    <MBean "memory_pool">
      ObjectName "java.lang:type=MemoryPool,*"
      InstancePrefix "memory_pool-"
      InstanceFrom "name"
      <Value>
        Type "memory"
        #InstancePrefix ""
        #InstanceFrom ""
        Table true
        Attribute "Usage"
      </Value>
    </MBean>

    # Heap memory usage
    <MBean "memory_heap">
      ObjectName "java.lang:type=Memory"
      #InstanceFrom ""
      InstancePrefix "memory-heap"

      # Creates four values: committed, init, max, used
      <Value>
        Type "jmx_memory"
        #InstancePrefix ""
        #InstanceFrom ""
        Table true
        Attribute "HeapMemoryUsage"
      </Value>
    </MBean>

    # Non-heap memory usage
    <MBean "memory_non_heap">
      ObjectName "java.lang:type=Memory"
      #InstanceFrom ""
      InstancePrefix "memory-nonheap"

      # Creates four values: committed, init, max, used
      <Value>
        Type "jmx_memory"
        #InstancePrefix ""
        #InstanceFrom ""
        Table true
        Attribute "NonHeapMemoryUsage"
      </Value>
    </MBean>

    # Java classes that are currently loaded
    <MBean "classes">
      ObjectName "java.lang:type=ClassLoading"
      #InstancePrefix ""
      #InstanceFrom ""

      <Value>
        Type "gauge"
        InstancePrefix "loaded_classes"
        #InstanceFrom ""
        Table false
        Attribute "LoadedClassCount"
      </Value>
    </MBean>

    # Compilation time of the JavaVM
    <MBean "compilation">
      ObjectName "java.lang:type=Compilation"
      #InstancePrefix ""
      #InstanceFrom ""

      <Value>
        Type "total_time_in_ms"
        InstancePrefix "compilation_time"
        #InstanceFrom ""
        Table false
        Attribute "TotalCompilationTime"
      </Value>
    </MBean>

    # Details about the JavaVM Garbage Collector
    <MBean "garbage_collector">
      ObjectName "java.lang:type=GarbageCollector,*"
      InstancePrefix "gc-"
      InstanceFrom "name"

      # How often the GC ran since last collectd collection
      <Value>
        Type "invocations"
        #InstancePrefix ""
        #InstanceFrom ""
        Table false
        Attribute "CollectionCount"
      </Value>

      # How much time has been spent on GC since last collectd collection
      <Value>
        Type "total_time_in_ms"
        InstancePrefix "collection_time"
        #InstanceFrom ""
        Table false
        Attribute "CollectionTime"
      </Value>
    </MBean>

    # Information about all the enabled Tomcat HTTP and AJP connectors
    <MBean "thread_pool">
      ObjectName "Standalone:*,type=ThreadPool"
      InstancePrefix "request_processor-"
      InstanceFrom "name"

      # Currently running threads
      <Value>
        Type "threads"
        InstancePrefix "total"
        #InstanceFrom ""
        Table false
        Attribute "currentThreadCount"
      </Value>

      # Max. threads
      <Value>
        Type "threads"
        InstancePrefix "max"
        #InstanceFrom ""
        Table false
        Attribute "maxThreads"
      </Value>

      # Min. spare threads
      <Value>
        Type "threads"
        InstancePrefix "min_spare"
        #InstanceFrom ""
        Table false
        Attribute "minSpareThreads"
      </Value>

      # Current busy threads
      <Value>
        Type "threads"
        InstancePrefix "running"
        #InstanceFrom ""
        Table false
        Attribute "currentThreadsBusy"
      </Value>
    </MBean>

    # System properties related to the JavaVM
    <MBean "jvm_localhost_os">
      ObjectName "java.lang:type=OperatingSystem"

      # Open file descriptors
      <Value>
        Type "file_handles"
        InstancePrefix "os-open_fd_count"
        Table false
        Attribute "OpenFileDescriptorCount"
      </Value>

      # Max. allowed handles for user under which the JavaVM is running
      <Value>
        Type "file_handles"
        InstancePrefix "os-max_fd_count"
        Table false
        Attribute "MaxFileDescriptorCount"
      </Value>

      # Process time used by the JavaVM 
      <Value>
        Type "counter"
        InstancePrefix "os-process_cpu_time"
        Table false
        Attribute "ProcessCpuTime"
      </Value>
    </MBean>

    # Uptime of the JavaVM
    <MBean "uptime">
      ObjectName "java.lang:type=Runtime"
      InstancePrefix ""
      #InstanceFrom ""

      <Value>
        Type "uptime"
        InstancePrefix "uptime"
        #InstanceFrom ""
        Table false
        Attribute "Uptime"
      </Value>
    </MBean>

    # Confluence indexing statistics
    <MBean "confluence_index">
      ObjectName "Confluence:name=IndexingStatistics"
      InstancePrefix "confluence_index"
      #InstanceFrom ""

      # Items currently waiting for indexing
      <Value>
        Type "gauge"
        InstancePrefix "queue"
        #InstanceFrom ""
        Table false
        Attribute "getTaskQueueLength"
      </Value>

      # Time in ms of the last index run
      <Value>
        Type "time_ms"
        InstancePrefix "elapsed_time_in_ms"
        #InstanceFrom ""
        Table false
        Attribute "LastElapsedMilliseconds"
      </Value>
    </MBean>

    # Confluence mail statistics
    <MBean "confluence_mail_task_queue">
      ObjectName "Confluence:name=MailTaskQueue"
      InstancePrefix "confluence_mail_task_queue"
      #InstanceFrom ""

      # Mails currently waiting to be sent
      <Value>
        Type "email_count"
        InstancePrefix "tasks"
        #InstanceFrom ""
        Table false
        Attribute "TasksSize"
      </Value>

      # Mails currently stuck due to an error
      <Value>
        Type "email_count"
        InstancePrefix "error_queue"
        #InstanceFrom ""
        Table false
        Attribute "ErrorQueueSize"
      </Value>
    </MBean>

    # Confluence database example latency as seen on the System Information page
    <MBean "confluence_database">
      ObjectName "Confluence:name=SystemInformation"
      InstancePrefix "confluence_database"
      #InstanceFrom ""

      <Value>
        Type "latency"
        InstancePrefix "example_latency"
        #InstanceFrom ""
        Table false
        Attribute "getDatabaseExampleLatency"
      </Value>
    </MBean>

    <Connection>
      Host "${node.name}"
      ServiceURL "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"
      Collect "memory_pool"
      Collect "memory_heap"
      Collect "memory_non_heap"
      Collect "classes"
      Collect "compilation"
      Collect "garbage_collector"
      Collect "thread_pool"
      Collect "jvm_localhost_os"
      Collect "uptime"
      Collect "confluence_index"
      Collect "confluence_mail_task_queue"
      Collect "confluence_database"
    </Connection>
  </Plugin>
</Plugin>

To apply the configuration collectd must be restarted:

systemctl restart collectd
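
Because a syntax error in such a long configuration would stop collectd from starting, it may be safer to test the file first. collectd offers -t (test the configuration only) and -C (configuration file) for this. The wrapper below is a minimal sketch; safe_restart_collectd is a hypothetical name and the default path is the one used in this article.

```shell
# Hypothetical wrapper: restart collectd only if the configuration test passes.
safe_restart_collectd() {
  conf="${1:-/etc/collectd.conf}"
  if collectd -t -C "$conf"; then
    systemctl restart collectd
  else
    echo "configuration test failed, not restarting" >&2
    return 1
  fi
}

# Usage:
#   safe_restart_collectd /etc/collectd.conf
```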

collectd output

Here are some sample graphs from my own Confluence instance.

[Figures: collectd-confluence-jmx-graph-1 through collectd-confluence-jmx-graph-6]