Scaling Payara Micro Applications with Kubernetes

Originally published on 14 May 2019
Last updated on 14 May 2019

Organizations that deploy their applications as Docker images often use Kubernetes to manage the containerized versions. This blog gives you a short overview of Kubernetes and shows how to run your Payara Micro application in a scaled fashion, either by defining the scaling manually or automatically through the Horizontal Pod Autoscaler.

Understanding Kubernetes Pods and Services

A Kubernetes Pod is a group of one or more containers (such as Docker containers) with a shared storage/network, and a specification for how to run these containers. The Pod is the basic building block within Kubernetes just as a Container is the basic building block for Docker.

Although you are able to run multiple Docker containers in one pod, most of the time there will only be one. A nice thing, however, is that all containers within a Pod share the same IP address and they can address each other through `localhost`. So in that sense, the Pod resembles your physical machine. The Machine concept within Kubernetes is called the Node and most likely runs multiple Pods.

Since the Kubernetes scheduler decides where a Pod will be instantiated, and thus where your application runs, you do not know in advance on which machine it will run or under which IP address it will be available.

As mentioned in the definition, the Pod also contains instructions on how to run these containers. When a Pod is restarted, it may well be hosted on another Node and end up with a different IP address.

Therefore, you need the other important concept of Kubernetes, the Service. The Service is a logical set of Pods and a policy by which to access them (something like a load-balancer and/or reverse proxy that knows how to access the application). From that point of view, Kubernetes is really a system to run your services (micro-services or basically any of your applications) in a transparent way.

To the outside world, you define that a certain Service is available, and this is internally routed to your application in a certain Pod, running on a certain Node that is under the control of the Kubernetes control plane. The outside world sees just one single access point, but internally the requests can be handled by multiple instances of your application running in different Pods.

The Readiness Probe

When is the Pod considered 'ready' so that requests made to the Kubernetes Service are routed to that Pod?

Without any specific configuration, requests can be routed to a Pod as soon as it is active according to the Kubernetes logic, that is, once the Pod is assigned to a Node and its containers are running.

Of course, all applications need some initialization before they can process a request. No Container, whatever technology its application is written in, can respond the instant it is started.

A request that arrives during initialization, however short that period may be, will result in an error because the application cannot handle it yet. Therefore it is important to configure the Readiness Probe.

As the name suggests, it gives us a way to define when the Pod can be considered ready and requests can be handled. As with all the features of Kubernetes, you can define the Readiness Probe through the YAML configuration of the Deployment.

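# extensions/v1beta1 is no longer served since Kubernetes 1.16; use apps/v1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payara-micro
  labels:
    name: payara
spec:
  # apps/v1 requires an explicit selector that matches the Pod template labels
  selector:
    matchLabels:
      name: payara
  template: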
    metadata:
      labels:
        name: payara
    spec:
      containers:
        - name: payara-micro
          image: dynamic-cluster:demo
          imagePullPolicy: IfNotPresent
          ports:
            - name: payara-micro
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 2
            failureThreshold: 20

The readinessProbe section defines the configuration for the Readiness Probe. In the case of Payara Micro, we can use the Health endpoint.

The Health endpoint is defined by the MicroProfile Health Specification. It allows you to define checks that are executed in response to a request to the /health endpoint and that determine whether your application is ready and healthy enough to accept requests.

The default implementation within Payara Micro is that this endpoint doesn’t return a success response (HTTP status 200 and status UP) until the system is completely initialized and an application is deployed.

When you define custom Health checks (see the MicroProfile Health specification on how to do this), you can influence the moment at which the Pod is marked as ready and take into account any initialization your application might need.

As you can see in the example YAML configuration above, the same endpoint is also defined as the liveness check. So when you have defined some important checks, a failing check signals that your instance of the application is no longer considered healthy, and Kubernetes will restart it.
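If the readiness and liveness checks should differ, the two probes can point at different endpoints. The fragment below is a sketch that assumes a Payara Micro release implementing MicroProfile Health 2.0 or later, which adds `/health/ready` and `/health/live` next to the combined `/health` endpoint. The timing values are simply copied from the Deployment above.

```yaml
# Fragment of the container spec in the Deployment.
# Assumption: the runtime implements MicroProfile Health 2.0 or later.
livenessProbe:
  httpGet:
    path: /health/live     # runs only the checks annotated with @Liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 3
readinessProbe:
  httpGet:
    path: /health/ready    # runs only the checks annotated with @Readiness
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 2
  failureThreshold: 20
```

With this split, a check that only affects readiness can take the Pod out of rotation temporarily without causing a restart.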

Manual Scaling

You can easily scale a specific service manually within Kubernetes with the scale command. Let us have a quick look at the Kubernetes Service first.

As already mentioned, it defines how a certain Kubernetes deployment will be exposed to the outside world.

apiVersion: v1
kind: Service
metadata:
  name: payara-micro
  labels:
    name: payara
spec:
  type: LoadBalancer
  #type: NodePort
  ports:
    - port: 8080
  selector:
    name: payara

Only a few things need to be defined. In the metadata section, we define the name of the Service. Within the spec section, we define on which port the Service will be available and, through the selector part, which Pods receive the traffic: those carrying the label name: payara, which is the label our Deployment gives its Pods.

By default, only one instance of our application is started. With the following scale command, we can double this for our example.

kubectl scale deployments/payara-micro --replicas=2

Since we have defined the Service type as LoadBalancer, the second instance of our application is picked up automatically as soon as the readiness probe indicates that it is safe to route requests to this Pod and the application instance it is running.

The load balancer should, of course, be correctly configured within your cloud provider (see the documentation of your cloud environment on how to do this), but this also works locally with Docker Desktop for Mac, Docker Desktop for Windows, or a Minikube installation on your development machine.
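The replica count can also be kept in the Deployment manifest instead of being set from the command line. The fragment below is a minimal sketch of that declarative form. Only the `replicas` line is new; the rest repeats names from the Deployment shown earlier.

```yaml
# Fragment of the payara-micro Deployment: declare the desired number of Pods
spec:
  replicas: 2        # defaults to 1 when omitted
  template:
    metadata:
      labels:
        name: payara
```

Applying the updated manifest with `kubectl apply -f` and the name of your manifest file has the same effect as the scale command, with the advantage that the desired state lives in a file you can version.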

Horizontal Pod Scaling

Besides manual scaling, we can also use a feature of the Kubernetes platform itself, the Horizontal Pod Autoscaler, which determines the optimal number of Pods for your application based on a metric such as the CPU load of your Pods.

In case you experience a higher load, some additional Pods are created and initialized automatically.

On the other hand, during off-peak hours, the number of Pods will be scaled down.

This feature requires the installation of the Metrics Server, but otherwise it just comes down to defining what the ideal load of your application is.

kubectl autoscale deployment payara-micro --cpu-percent=40 --min=1 --max=4

With this command, we enable the autoscaling feature for our payara-micro deployment artifact. We have specified that there can be between 1 and 4 pods and that the optimal CPU load should be 40%.
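The same autoscaling rule can also be written as a manifest. The sketch below assumes a cluster that serves the `autoscaling/v2` API (Kubernetes 1.23 or later) and a Deployment that is available under `apps/v1`. The values mirror the flags of the command above.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payara-micro
spec:
  scaleTargetRef:            # the Deployment whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: payara-micro
  minReplicas: 1             # --min=1
  maxReplicas: 4             # --max=4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 40   # --cpu-percent=40
```

The manifest form is also the place where further metrics can be added to the `metrics` list later on.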

A few examples make it clear how this scaling works.

If we have only one Pod at a given time and its CPU load approaches 80%, Kubernetes creates a second Pod so that, on average, each of the Pods has a CPU load of around 40%.

If the load drops afterward in our two-Pod scenario, the scaling algorithm stops one of the Pods. If the average CPU load of the two Pods is 20%, for example, the load can easily be handled by one instance, and thus a scale-down happens.
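The arithmetic behind these two examples can be written out. The formula is the one given in the Kubernetes documentation for the Horizontal Pod Autoscaler; the symbols $R$ (number of Pods) and $U$ (average CPU utilization in percent) are shorthand introduced here.

```latex
R_{\text{desired}} = \left\lceil R_{\text{current}} \cdot \frac{U_{\text{current}}}{U_{\text{target}}} \right\rceil

\text{scale up: } \left\lceil 1 \cdot \frac{80}{40} \right\rceil = 2
\qquad
\text{scale down: } \left\lceil 2 \cdot \frac{20}{40} \right\rceil = 1
```

The result is then kept within the configured minimum and maximum, 1 and 4 in our example. By default, the autoscaler also skips a change when the ratio of current to target utilization is within roughly 10% of 1, so small fluctuations around the target do not cause constant rescaling.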

This feature is highly customizable, and custom metrics can be defined and used. The interval at which the calculations are performed can also be configured; it is 15 seconds by default.
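One detail to keep in mind with a CPU percentage target: utilization is measured relative to the CPU request of the containers, so the autoscaler has nothing to compute against when no request is set. The fragment below sketches such a request. The value `250m` is a placeholder chosen for illustration, not a recommendation for Payara Micro.

```yaml
# Fragment of the container spec in the Deployment
containers:
  - name: payara-micro
    image: dynamic-cluster:demo
    resources:
      requests:
        cpu: 250m    # hypothetical value; a 40% target then means 100m on average
```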

Payara Data Grid on K8s

Payara Micro has another feature that is specifically targeted at Kubernetes.

The Data Grid, which is bundled with Payara Micro, can be used as a cache to keep information between the stateless calls to the endpoints of your application.

You can store information in the Data Grid, which is powered by Hazelcast, using the JCache specification or by injecting a HazelcastInstance into a CDI bean. In the latter case, you have full programmatic access to the Hazelcast functionality.

Payara Micro has added a discovery mechanism so that, when multiple Payara instances run on Kubernetes, their Hazelcast instances join the same cluster. So when your application is managed through Kubernetes, the Data Grid works out of the box and no configuration is required. This discovery mechanism is implemented using the Kubernetes API.

Payara Micro is Well Suited for Use with Kubernetes

With Payara Micro's zero-configuration setup, the executable JAR can be packaged within a Docker image that runs your application following the hollow JAR principle.

Specifically for Kubernetes, it has the optimized health endpoint, which is required for the correct routing of requests to the Pod. And with the specific Kubernetes discovery mechanism, the Data Grid automatically finds all instances running in the Pods by interrogating the Kubernetes API, so data can be shared seamlessly in your cluster.