Preview: MicroProfile Fault Tolerance in Payara Micro
Published on 01 Feb 2018
by Mike Croft
Our Payara Engineers have been working very hard on lots of new features ready for our final 5.181 release! One of the key features we intend to deliver is compatibility with MicroProfile 1.2, which will include (among other things) a Fault Tolerance API.
What is MicroProfile?
The Eclipse MicroProfile project is a collaboration between application server vendors and the community. The initial aim was to create a project that would allow developers with skill in Java EE to use that knowledge in creating modern cloud-native or microservice-based applications.
Fault Tolerance in Payara Micro (and Payara Server)
Along with the benefits of modularising and distributing a system across multiple instances, there also come new challenges. In a monolith, the whole system is either up or down. With microservices, part of the system can experience any number of problems which can propagate through the rest of the system in very unpredictable and subtle ways.
To mitigate this, there are a number of design patterns which have emerged to make microservice architectures more fault tolerant. These patterns have all been incorporated into the MicroProfile Fault Tolerance specification and include:
- Circuit Breaker
Offers a way to fail fast by automatically failing execution to prevent the system overloading and thereby prevent indefinite wait or timeout from clients
- Bulkhead
Isolate failures in part of the system while the rest of the system can still function.
- Fallback
Provide an alternative solution for a failed execution
- Retry
Define a criteria for when and how often to retry
- Timeout
Define a duration for a timeout, after which execution is abandoned
Source: MicroProfile Fault Tolerance 1.0 Architecture
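The worked example later in this post does not cover Circuit Breaker, so here is a minimal plain-Java sketch of the fail-fast idea. It is an illustration only, not the MicroProfile API or Payara's implementation: the class name SimpleCircuitBreaker, the consecutive-failure threshold and the lack of any half-open or recovery state are all assumptions made to keep it short.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker: it opens after a number of
// consecutive failures and then fails fast without calling the target.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    SimpleCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    <T> T call(Supplier<T> target) {
        if (isOpen()) {
            // Fail fast: the target is not invoked at all.
            throw new IllegalStateException("circuit open");
        }
        try {
            T result = target.get();
            consecutiveFailures = 0; // a success resets the count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++; // count the failure, then let it propagate
            throw e;
        }
    }
}
```

With a threshold of 2, two failing calls reach the target; a third call is rejected immediately, which is what protects an overloaded service from further traffic.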
A Worked Example
To demonstrate some of the fault tolerance behaviour available in Payara Micro (and Payara Server), I have created a simple example which demonstrates Retry, Fallback and Timeout. It can be found in the payara/payara-examples project on GitHub, in the microprofile/fault-tolerance directory.
There are 2 examples of fault tolerance: one which demonstrates the @Retry annotation to re-attempt an execution which has failed, and another which demonstrates a @Timeout with a @Fallback method to handle a slow invocation.
Build the Example
mvn clean install
Run the Example
- To run the Uber JAR, run
java -jar target/fault-tolerance-1.0-SNAPSHOT-microbundle.jar
- To run the WAR file on a different Payara Micro instance, run
java -jar /path/to/payara-micro.jar --deploy target/fault-tolerance-1.0-SNAPSHOT.war
Watch the output of Payara Micro to see the URLs created, and visit the endpoints to trigger an example of fault tolerance. Watch the logs to see how Payara Micro behaves.
How it works
The example is a simple JAX-RS class with two @GET methods, getEmployeeById and getAllEmployees. There is a list of 4 people which can be returned.
There are 2 methods to introduce simple problematic behaviour: isDown(), which simply returns a boolean based on Math.random(), and isSlow(), which will sleep for a second based on Math.random().
Retry
The @Retry annotation has been added to getEmployeeById, where the isDown() check may cause a RuntimeException.
@GET
@Path("{id}")
@Retry(maxRetries = 4, retryOn = {RuntimeException.class})
public String getEmployeeById(@PathParam("id") int id) {
    System.out.println("Called getEmployeeById a total of " + ++retryCounter + " times");
    // Reject ids outside the list, including negative ones, before simulating a failure
    if (id < 0 || id >= employees.size()) return "No such employee. Try a number from 0 to " + (employees.size() - 1);
    if (isDown()) throw new RuntimeException();
    return employees.get(id);
}

private boolean isDown() {
    // approx 80% chance
    return Math.random() > 0.2;
}
There is a retryCounter to show the total number of times the method has been called, so we can see the retry in action. The isDown() method should cause failures around 80% of the time but, with 4 retries, a request only fails if the initial call and all 4 retries fail, which happens roughly a third of the time (0.8^5 ≈ 0.33). If the Retry eventually gets a success, then the result will be returned and the user will not see any problem. If the Retry gets a failure every time, then it will fail as normal and the user will see an ungraceful exception.
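To make the retry arithmetic concrete, here is a small plain-Java sketch. It is not the MicroProfile interceptor or Payara's code: callWithRetry and chanceAllAttemptsFail are hypothetical helpers, and the probability calculation assumes every attempt fails independently at the same rate.

```java
import java.util.function.Supplier;

// Plain-Java illustration only; not the MicroProfile or Payara implementation.
final class RetrySketch {

    // Calls the target once, then up to maxRetries more times on RuntimeException.
    static <T> T callWithRetry(Supplier<T> target, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return target.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed, so surface the last exception
    }

    // Probability that the initial call and every retry all fail,
    // assuming independent attempts with the same failure rate.
    static double chanceAllAttemptsFail(double failureRate, int maxRetries) {
        return Math.pow(failureRate, maxRetries + 1);
    }

    public static void main(String[] args) {
        // maxRetries = 4 means up to 5 attempts in total.
        System.out.println(chanceAllAttemptsFail(0.8, 4)); // roughly 0.33
    }
}
```

The loop bound is the detail worth noticing: maxRetries counts the extra attempts, so a value of 4 allows five calls to the method before the exception reaches the user.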
An example of the log output for one test is below. I invoked the method 3 times, and have indicated this among the log messages:
---> invoke <---
Called getEmployeeById a total of 1 times
---> invoke <---
Called getEmployeeById a total of 2 times
[2018-01-10T09:25:19.063+0000] [] [INFO] [] [fish.payara.microprofile.faulttolerance.interceptors.RetryInterceptor] [tid: _ThreadID=20 _ThreadName=http-thread-pool::http-listener(1)] [timeMillis: 1515576319063] [levelValue: 800] Retrying as long as maxDuration isnt breached, and no more than {0} times
Called getEmployeeById a total of 3 times
Called getEmployeeById a total of 4 times
Called getEmployeeById a total of 5 times
---> invoke <---
Called getEmployeeById a total of 6 times
[2018-01-10T09:25:37.861+0000] [] [INFO] [] [fish.payara.microprofile.faulttolerance.interceptors.RetryInterceptor] [tid: _ThreadID=25 _ThreadName=http-thread-pool::http-listener(6)] [timeMillis: 1515576337861] [levelValue: 800] Retrying as long as maxDuration isnt breached, and no more than {0} times
Called getEmployeeById a total of 7 times
Called getEmployeeById a total of 8 times
Called getEmployeeById a total of 9 times
Called getEmployeeById a total of 10 times
[2018-01-10T09:25:38.147+0000] [] [WARNING] [] [javax.enterprise.web] [tid: _ThreadID=25 _ThreadName=http-thread-pool::http-listener(6)] [timeMillis: 1515576338147] [levelValue: 900] [[
StandardWrapperValve[RestApplication]: Servlet.service() for servlet RestApplication threw exception
java.lang.RuntimeException
at EmployeeResource.getEmployeeById(EmployeeResource.java:36)
at org.jboss.weld.proxies.EmployeeResource$Proxy$_$$_WeldSubclass.getEmployeeById$$super(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:51)
In this example, the first invocation worked. The second did not, and the Retry kicked in, as shown in the log message. The method was then retried and, on the 3rd retry (number 5 shown by the counter), it returned successfully. Next, I tried the invocation again and found that the method did not succeed, even after 4 retries, so an exception was thrown.
Timeout and Fallback
Both the @Timeout and @Fallback annotations have been added to getAllEmployees(), and an extra method added called getAllEmployeesFallback() to handle invocations when the @Timeout is triggered.
// The @Timeout value is in milliseconds by default
private final long TIMEOUT = 500;
private final long SLEEPTIME = 1000;

@GET
@Fallback(fallbackMethod = "getAllEmployeesFallback")
@Timeout(TIMEOUT)
public String getAllEmployees() throws InterruptedException {
    // isSlow() may sleep past the timeout; both branches return the same list
    if (isSlow()) return employees.toString();
    return employees.toString();
}

public String getAllEmployeesFallback() {
    return "It took longer than expected to get all employees. Try again later!";
}

private boolean isSlow() throws InterruptedException {
    if (Math.random() > 0.4) {
        // approx 60% chance
        Thread.sleep(SLEEPTIME);
        return true;
    }
    return false;
}
In this example, the method will sporadically sleep for longer than the configured timeout. When that happens, the @Fallback will intercept the invocation and call getAllEmployeesFallback() to give the user a different message.
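The same timeout-then-fallback flow can be sketched in plain Java using only the JDK. This is an illustration under assumptions, not how Payara implements the annotations: callWithTimeout is a hypothetical helper, and it runs the target on the common fork-join pool via CompletableFuture rather than through an interceptor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Plain-Java illustration only; not the MicroProfile or Payara implementation.
final class TimeoutFallbackSketch {

    // Runs the target on another thread and waits at most timeoutMillis for it.
    // If it is too slow, or it fails, the fallback supplies the result instead.
    static <T> T callWithTimeout(Supplier<T> target, long timeoutMillis, Supplier<T> fallback) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(target);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The slow task is abandoned here, not forcibly stopped.
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback.get();
        } catch (ExecutionException e) {
            // The target threw an exception; fall back as well.
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String result = callWithTimeout(() -> {
            try {
                Thread.sleep(1000); // slower than the 500 ms timeout below
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "all employees";
        }, 500, () -> "It took longer than expected to get all employees. Try again later!");
        System.out.println(result);
    }
}
```

The caller always gets a String back, either the real result or the fallback message, so the slow path never surfaces as an error to the user.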
There are minimal configuration options for Fault Tolerance through asadmin commands. The only configurable aspect relates to the @Asynchronous annotation, which will ensure that the execution of the client request will be on a separate thread. There are two asadmin commands which allow you to get and set the ManagedExecutorService name or ManagedScheduledExecutorService name as follows:
Usage: asadmin [asadmin-utility-options] set-fault-tolerance-configuration
[--managedexecutorservicename <managedexecutorservicename>]
[--managedscheduledexecutorservicename <managedscheduledexecutorservicename>]
[--target <target(default:server-config)>]
[-?|--help[=<help(default:false)>]]
Fault Tolerance is just one of several MicroProfile 1.2 specs that are already implemented, with more on the way.
Stay tuned for more updates on using MicroProfile in Payara Server/Micro!