Nugget Friday - Prevent Jakarta EE App Crashes with Payara Server's HealthCheck

Published on 01 Nov 2024

Consistent performance and early detection of potential system issues are critical in every Jakarta EE (formerly Java EE) production deployment. The Payara HealthCheck Service is a built-in tool designed specifically to help you monitor the health of your Payara Server deployments. While often overlooked, this service is an essential asset for operations teams who seek to minimise downtime and maximise system reliability. Let's dive into what the HealthCheck Service does, how it can be configured, and why it's valuable for monitoring your app server's critical components.

The Problem: Silent Performance Degradation

Traditional monitoring often relies on external tools, introducing complexity and potential blind spots. This can make it difficult to detect performance issues until they become major outages or negatively impact user experience. In Java EE or Jakarta EE environments, several performance indicators can drift out of their healthy range without being easy to spot early. These include high CPU usage, excessive memory consumption, and garbage collection irregularities that can lead to performance degradation. Without consistent monitoring, operations teams may struggle to diagnose complex failures after they occur, resulting in extended downtime and missed SLAs.

The Solution: Payara's Built-in HealthCheck Service

The Payara HealthCheck Service provides an automated and continuous way to monitor the performance and health of your Payara Server. It works by regularly checking specific metrics and issuing notifications when predefined thresholds are exceeded. These notifications allow you to identify problems early, respond faster, and speed up root cause analysis.

Core Metrics Monitored by HealthCheck

When enabled, Payara's HealthCheck service can track the following vital statistics:

Host CPU Usage – Monitor general CPU metrics of the server host.
Host Memory Usage – Track the system’s memory consumption (Linux and BSD derivatives only).
JVM Garbage Collections – Focus on Java Garbage Collection cycles that can affect performance.
JVM Heap Usage – Keep an eye on the critical JVM heap size to avoid memory-related crashes.
CPU Usage of Individual Threads – Discover workload-heavy or "hogging" threads that could be consuming too many resources.
Stuck Threads – Detect threads that are no longer making progress, helping mitigate the risk of stalls.
MicroProfile Metrics – Integrate with pre-existing MicroProfile monitoring for seamless metric aggregation.
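
To make the list above more concrete, here is a minimal sketch of how comparable figures can be read from inside any JVM with the standard java.lang.management API. It is not Payara's implementation: the class name JvmHealthSnapshot and its methods are hypothetical and only illustrate the kind of data that heap, garbage collection, and host load checks work with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper (not a Payara class): reads a few JVM and host figures via standard MXBeans. */
final class JvmHealthSnapshot {

    /** Used heap as a percentage of the maximum heap, or of the committed heap when no maximum is defined. */
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 100.0 * heap.getUsed() / limit;
    }

    /** Total collections across all collectors; a collector that reports -1 (undefined) counts as 0. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** One-minute system load average; negative on platforms that do not provide it. */
    static double systemLoadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedPercent());
        System.out.println("gc count: " + totalGcCount());
        System.out.println("load average: " + systemLoadAverage());
    }
}
```

Reading these values once tells you little. The value of a service like HealthCheck comes from sampling them on a schedule and comparing each reading against thresholds.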

How It Works

Once set up, Payara’s HealthCheck Service can be configured to send status notifications when specific thresholds are crossed. Thresholds can be set for three status levels, GOOD, WARNING, and CRITICAL, allowing you to classify the severity of the issue. For example, while a WARNING-level notification might not require immediate action, a CRITICAL notification calls for swift intervention to avoid server failure.
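
The idea of mapping a reading to a severity level can be sketched in a few lines of Java. This is an illustration only, not Payara's code: the ThresholdClassifier class is hypothetical, and the cut-off values of 50 and 80 percent are assumptions chosen for the example.

```java
/** Hypothetical sketch (not a Payara class): maps a percentage reading to a severity level. */
final class ThresholdClassifier {

    enum Status { GOOD, WARNING, CRITICAL }

    private final double warningAt;
    private final double criticalAt;

    ThresholdClassifier(double warningAt, double criticalAt) {
        if (warningAt > criticalAt) {
            throw new IllegalArgumentException("warning threshold must not exceed critical threshold");
        }
        this.warningAt = warningAt;
        this.criticalAt = criticalAt;
    }

    /** Checks the most severe level first so that a reading is reported at its highest matching level. */
    Status classify(double value) {
        if (value >= criticalAt) {
            return Status.CRITICAL;
        }
        if (value >= warningAt) {
            return Status.WARNING;
        }
        return Status.GOOD;
    }

    public static void main(String[] args) {
        // Illustrative thresholds only: warn at 50%, critical at 80%.
        ThresholdClassifier heap = new ThresholdClassifier(50.0, 80.0);
        System.out.println(heap.classify(42.0));
        System.out.println(heap.classify(65.0));
        System.out.println(heap.classify(93.5));
    }
}
```

Keeping the classification separate from the measurement makes it easy to give each metric its own thresholds, which mirrors how each checker can be tuned independently.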

Here’s an example of what a log entry might look like when a metric hits its threshold:

[SEVERE] [fish.payara.nucleus.healthcheck.HealthCheckService] [status=CRITICAL, message='Thread with id: 145-testing-thread-1 is a hogging thread for the last 59 seconds 999 milliseconds']

In this specific instance, we’re looking at a thread that’s been identified as a "hogging thread" and has exceeded the allowed operating window. This immediate feedback allows operations teams to quickly identify and address potential issues before they escalate.
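
If you forward logs to another system, you may want to pull the status and message out of entries like the sample above. The following is a minimal sketch that assumes the status=..., message='...' layout of that sample; the real layout depends on your logging configuration, and the HealthLogParser class is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts status and message from a line shaped like the sample entry. */
final class HealthLogParser {

    /** Simple value holder for the two extracted fields. */
    static final class Entry {
        final String status;
        final String message;

        Entry(String status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    // Assumed layout: [status=LEVEL, message='free text']
    private static final Pattern ENTRY =
            Pattern.compile("\\[status=(\\w+), message='(.*)'\\]");

    /** Returns the parsed entry, or an empty Optional when the line does not match the assumed layout. */
    static Optional<Entry> parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Entry(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        String line = "[SEVERE] [fish.payara.nucleus.healthcheck.HealthCheckService] "
                + "[status=CRITICAL, message='Thread with id: 145-testing-thread-1 is a hogging thread "
                + "for the last 59 seconds 999 milliseconds']";
        parse(line).ifPresent(e -> System.out.println(e.status + " -> " + e.message));
    }
}
```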

Configuring HealthCheck for Your Needs

One of the strengths of Payara's HealthCheck Service is its flexibility. You can fine-tune each aspect of the service to match your application's specific requirements:

General Settings: Control the overall service behavior, including enabling/disabling, setting check intervals, and configuring historical trace storage.
Individual Checkers: Each performance metric (CPU, Memory, Garbage Collection, etc.) can be configured independently. This allows you to set different thresholds and check frequencies for each aspect of your application's health.
Notification Integration: HealthCheck events can be routed to various notifiers, including logs, JMS queues, or even custom notifiers you've implemented.

Getting Started

Enabling the HealthCheck Service is straightforward. You can use the Payara Admin Console or the command line:

asadmin set-healthcheck-configuration --enabled=true --dynamic=true --historic-trace-enabled=true --historic-trace-store-size=20 --set-notifiers=log-notifier,jms-notifier

This command enables the service, turns on historical tracing, and sets up log and JMS notifiers.

Additionally, the HealthCheck service integrates smoothly with MicroProfile Health endpoints, enabling you to expose certain metrics for quick health checks via RESTful APIs. If you’re using monitoring tools compatible with these endpoints, you can further streamline your monitoring processes.

Advanced Features: Hogging and Stuck Threads

Two particularly powerful features of the HealthCheck Service are its ability to detect hogging and stuck threads:

Hogging Threads: Identifies threads that are consuming an excessive amount of CPU time. This is critical for detecting poorly optimized code or unexpected processing bottlenecks.
Stuck Threads: Detects threads that have been unresponsive for a specified period. This can help identify deadlocks or long-running operations that might be impacting your application's responsiveness.
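
To illustrate the hogging-thread idea, the sketch below samples per-thread CPU time twice with the standard ThreadMXBean and flags threads whose CPU share of the interval reaches a threshold. It shows the general technique under stated assumptions and is not how Payara implements its checker: the HoggingThreadSampler class, the 200 ms interval, and the 90 percent threshold are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not a Payara class): flags threads that use a large share of CPU in an interval. */
final class HoggingThreadSampler {

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    /** CPU time in nanoseconds per live thread id; empty when the JVM cannot measure it. */
    Map<Long, Long> sampleCpuTimes() {
        Map<Long, Long> sample = new HashMap<>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return sample;
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id);
            if (cpuNanos >= 0) { // -1 means the thread has died or has no reading
                sample.put(id, cpuNanos);
            }
        }
        return sample;
    }

    /** Returns ids of threads whose CPU time grew by at least thresholdPercent of the interval. */
    static List<Long> findHogs(Map<Long, Long> before, Map<Long, Long> after,
                               long intervalNanos, double thresholdPercent) {
        List<Long> hogs = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : after.entrySet()) {
            Long start = before.get(entry.getKey());
            if (start == null) {
                continue; // thread started during the interval, so there is no baseline
            }
            double percent = 100.0 * (entry.getValue() - start) / intervalNanos;
            if (percent >= thresholdPercent) {
                hogs.add(entry.getKey());
            }
        }
        return hogs;
    }

    public static void main(String[] args) throws InterruptedException {
        HoggingThreadSampler sampler = new HoggingThreadSampler();
        Map<Long, Long> before = sampler.sampleCpuTimes();
        long startedAt = System.nanoTime();
        Thread.sleep(200); // illustrative sampling interval
        Map<Long, Long> after = sampler.sampleCpuTimes();
        long elapsed = System.nanoTime() - startedAt;
        System.out.println("hogging thread ids: " + findHogs(before, after, elapsed, 90.0));
    }
}
```

Detecting stuck threads needs a different signal, because a blocked thread uses almost no CPU. A checker has to track how long a thread has been working on the same task and compare that duration against a time limit.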

Why You Should Care

Proactive Monitoring: Get notifications before small issues snowball into critical outages.
Reduced Diagnostic Time: With logs clearly showing when and why an issue occurred, your team spends less time diagnosing problems.
Integration with MicroProfile: For enterprise teams already using MicroProfile, HealthCheck brings several metrics together for a holistic system overview.
Customizable Thresholds: Configure the service to align with the specific performance requirements of your deployment environments, whether in production, test, or development.

Conclusions

The Payara HealthCheck Service is an indispensable tool for any operations team focused on ensuring the performance and stability of Payara Server environments. From CPU metrics and memory tracking to thread management and garbage collection monitoring, it provides an extensive set of tools to identify and fix bottlenecks early.

Configuring and keeping an eye on the metrics that matter to your specific deployment can drastically reduce downtime, improve response times, and keep your systems operating at an optimal level of performance. So why not give it a try? Download your trial copy of Payara Enterprise, enable the Payara HealthCheck Service, and take advantage of real-time monitoring tailored to your enterprise needs. Your systems, and your team, will thank you. Happy Coding!
