Preview: MicroProfile Fault Tolerance in Payara Micro

Published on 01 Feb 2018

Our Payara Engineers have been working very hard on lots of new features ready for our final 5.181 release! One of the key features we intend to deliver is compatibility with MicroProfile 1.2, which will include (among other things) a Fault Tolerance API.

What is MicroProfile?

The Eclipse MicroProfile project is a collaboration between application server vendors and the community. The initial aim was to create a project that would allow developers with skill in Java EE to use that knowledge in creating modern cloud-native or microservice-based applications.

Fault Tolerance in Payara Micro (and Payara Server)

Along with the benefits of modularising and distributing a system across multiple instances, there also come new challenges. In a monolith, the whole system is either up or down. With microservices, part of the system can experience any number of problems which can propagate through the rest of the system in very unpredictable and subtle ways.

To mitigate this, there are a number of design patterns which have emerged to make microservice architectures more fault tolerant. These patterns have all been incorporated into the MicroProfile Fault Tolerance specification and include:

Circuit Breaker
Offers a way to fail fast by automatically failing execution to prevent the system overloading and thereby prevent indefinite wait or timeout from clients
Bulkhead
Isolate failures in part of the system while the rest of the system can still function.
Fallback
Provide an alternative solution for a failed execution
Retry
Define a criteria for when and how often to retry
Timeout
Define a duration for a timeout, after which execution is abandoned

Source: MicroProfile Fault Tolerance 1.0 Architecture

A Worked Example

To demonstrate some of the fault tolerance behaviour available in Payara Micro (and Payara Server), I have created a simple example which demonstrates Retry, Fallback and Timeout which can be found in the payara/payara-examples project on GitHub, in the microprofile/fault-tolerance directory.

There are 2 examples of fault tolerance. One which demonstrates the @Retry annotation to re-attempt an execution which has failed, and another which demonstrates a @Timeout with a @Fallback method to handle a slow invocation.

Build the Example

To build the example, simply run mvn clean install

Run the Example

To run the Uber JAR, run
java -jar target/fault-tolerance-1.0-SNAPSHOT-microbundle.jar
To run the WAR file on a different Payara Micro instance, run
java -jar /path/to/payara-micro.jar -deploy target/fault-tolerance-1.0-SNAPSHOT.war

Watch the output of Payara Micro to see the URLs created, and visit the endpoints to trigger an example of fault tolerance. Watch the logs to see how Payara Micro behaves.

How it works

The example is a simple JAX-RS class with two @GET methods, getEmployeeById and getAllEmployees. There is a list of 4 people which can be returned.

There are 2 methods to introduce simple problematic behaviour, one isDown() which simply returns a boolean based on Math.random() and isSlow() which will sleep for a second based on Math.random()

Retry

The @Retry annotation has been added to the getEmployeeById, where the isDown() check may cause a RuntimeException.

@GET
    @Path("{id}")
    @Retry(maxRetries = 4, retryOn = {RuntimeException.class})
    public String getEmployeeById(@PathParam("id") int id) {
        System.out.println("Called getEmployeeById a total of " + ++retryCounter + " times");
        if (id >= employees.size()) return "No such employee. Try a number lower than " + employees.size();
        if (isDown()) throw new RuntimeException();
        return employees.get(id);
    }

    private boolean isDown() {
        // approx 80% chance
        return Math.random() > 0.2;
    }

There is a retryCounter to show the total amount of times the method has been called, so we can see the retry in action. The isDown() method should cause failures around 80% of the time but, with 4 retries, this probability is reduced. If the Retry eventually gets a success, then the result will be returned and the user will not see any problem. If the Retry gets a failure every time, then it will fail as normal and the user will see an ungraceful exception.

An example of the log output for one test is below. I invoked the method 3 times, and have indicated this among the log messages:

---> invoke <---
 
Called getEmployeeById a total of 1 times
 
 ---> invoke <---
 
Called getEmployeeById a total of 2 times
[2018-01-10T09:25:19.063+0000] [] [INFO] [] [fish.payara.microprofile.faulttolerance.interceptors.RetryInterceptor] [tid: _ThreadID=20 _ThreadName=http-thread-pool::http-listener(1)] [timeMillis: 1515576319063] [levelValue: 800] Retrying as long as maxDuration isnt breached, and no more than {0} times
 
Called getEmployeeById a total of 3 times
Called getEmployeeById a total of 4 times
Called getEmployeeById a total of 5 times
 
 ---> invoke <---
 
Called getEmployeeById a total of 6 times
[2018-01-10T09:25:37.861+0000] [] [INFO] [] [fish.payara.microprofile.faulttolerance.interceptors.RetryInterceptor] [tid: _ThreadID=25 _ThreadName=http-thread-pool::http-listener(6)] [timeMillis: 1515576337861] [levelValue: 800] Retrying as long as maxDuration isnt breached, and no more than {0} times
 
Called getEmployeeById a total of 7 times
Called getEmployeeById a total of 8 times
Called getEmployeeById a total of 9 times
Called getEmployeeById a total of 10 times
[2018-01-10T09:25:38.147+0000] [] [WARNING] [] [javax.enterprise.web] [tid: _ThreadID=25 _ThreadName=http-thread-pool::http-listener(6)] [timeMillis: 1515576338147] [levelValue: 900] [[
  StandardWrapperValve[RestApplication]: Servlet.service() for servlet RestApplication threw exception
java.lang.RuntimeException
        at EmployeeResource.getEmployeeById(EmployeeResource.java:36)
        at org.jboss.weld.proxies.EmployeeResource$Proxy$_$$_WeldSubclass.getEmployeeById$$super(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:51)

In this example, the first invocation worked. The second did not work, and the Retry began to work, as shown in the log message. The method is then tried 4 more times and, on the 3rd attempt (number 5 shown by the counter), the method returned successfully. Next, I tried the invocation again and found that the method did not succeed, even after 4 retries, so an exception was thrown.

Timeout and Fallback

Both the @Timeout and @Fallback annotations have been added to getAllEmployees(), and an extra method added called getAllEmployeesFallback() to handle invocations when the @Timeout is triggered.

private final long TIMEOUT = 500;
    private final long SLEEPTIME = 1000;

    @GET
    @Fallback(fallbackMethod = "getAllEmployeesFallback")
    @Timeout(TIMEOUT)
    public String getAllEmployees() throws InterruptedException {
        if (isSlow()) return employees.toString();
        return employees.toString();
    }

    public String getAllEmployeesFallback() {
        return "It took longer than expected to get all employees. Try again later!";
    }

    private boolean isSlow() throws InterruptedException {
        if (Math.random() > 0.4) {
            // approx 60% chance
            Thread.sleep(SLEEPTIME);
            return true;
        }
        return false;
    }

In this example, the method will sporadically sleep for longer than the configured timeout. In this case, the @Fallback will intercept the invocation and call getAllEmployeesFallback() to give the user a different message.

There are minimal configuration options for Fault Tolerance through asadmin commands. The only configurable aspect relates to the @Asynchronous annotation, which will ensure that the execution of the client request will be on a separate thread. There are two asadmin commands which allow you to get and set the ManagedExecutorService name or ManagedScheduledExecutorService name as follows:

  Usage: asadmin [asadmin-utility-options] set-fault-tolerance-configuration
    [--managedexecutorservicename <managedexecutorservicename>]
      [--managedscheduledexecutorservicename <managedscheduledexecutorservicename>]
      [--target <target(default:server-config)>]
      [-?|--help[=<help(default:false)>]]

Fault Tolerance is just one of several MicroProfile 1.2 specs that are already implemented, with more on the way.

Stay tuned for more updates on using MicroProfile in Payara Server/Micro!