The JBatch (Jakarta Batch) specification describes how background jobs are executed on a Jakarta EE compatible runtime. Using the Batch specification, the runtime can execute jobs that don't require any user input. Most of the time they are scheduled to run at a certain moment of the day, but they can also be triggered on demand.
The JobRepository contains the definitions, the execution history, and the metadata of the jobs known to the server runtime. For the feature to function properly, this metadata needs to be persisted. However, the specification defines neither where the metadata is stored nor how it can be cleaned up.
Payara Platform products can store the JobRepository in any supported database, and starting with the Payara Community 5.2021.1 and Payara Enterprise 5.25.0 releases, we have introduced an Asadmin Tool command to clean up old execution data.
Two types of metadata need to be kept for the JBatch system:
- Job definitions and the steps within a Job
- Information about the execution of the jobs and the steps
The job definition itself is provided by the developer in a batch job XML file contained within the artifact that also contains the job and step code. A job, for example a daily job, is composed of one or more steps, such as backing up files, backing up database records, or aggregating information. Steps can be executed in parallel or sequentially based on how you define the flow in the XML file.
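As a sketch of such a definition, the following job XML runs one step first and then two steps in parallel inside a split. The step and batchlet names here are hypothetical, chosen to match the daily job example:

```xml
<!-- Hypothetical daily job: one sequential step, then two steps run in parallel -->
<job id="dailyJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <!-- Runs first; next= points to the split that follows -->
    <step id="backupFiles" next="parallelWork">
        <batchlet ref="fileBackupBatchlet"/>
    </step>
    <!-- A split executes its flows concurrently -->
    <split id="parallelWork">
        <flow id="backupRecordsFlow">
            <step id="backupRecords">
                <batchlet ref="recordBackupBatchlet"/>
            </step>
        </flow>
        <flow id="aggregateFlow">
            <step id="aggregateInformation">
                <batchlet ref="aggregationBatchlet"/>
            </step>
        </flow>
    </split>
</job>
```

Removing the split and chaining the steps with `next` attributes instead would run them sequentially.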
When the daily job starts on Monday morning, for example, an entry is created in the Job Instance Data table. The detailed information of that execution is stored in the Job Execution Data table, which contains the start and end time of the job execution and its status (success or failure, for example). When some steps of the job fail, you can restart the job, and execution resumes from the step that failed. A new entry is created for this attempt in the Job Execution Data table but not in the Job Instance Data table, as it is still the same job instance that is restarted.
Similar information is kept for each step of the job so that only failed steps need to be retried later on. If you have many jobs that run regularly, the data stored in the database about these executions can grow large. The cleanup of this information is not described in the specification, so it is not implemented by default.
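This per-step bookkeeping is what the restart attributes in the job XML build on. As a sketch (the attribute values are illustrative), a step that already completed is skipped on restart unless the definition says otherwise:

```xml
<!-- Restart behavior is tuned per job and per step (values illustrative) -->
<job id="dailyJob" restartable="true"
     xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <!-- On restart, a COMPLETED step is skipped by default;
         allow-start-if-complete="true" forces it to run again,
         and start-limit caps how often the step may be started. -->
    <step id="backupFiles" allow-start-if-complete="true" start-limit="3"
          next="aggregateInformation">
        <batchlet ref="fileBackupBatchlet"/>
    </step>
    <step id="aggregateInformation">
        <batchlet ref="aggregationBatchlet"/>
    </step>
</job>
```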
Asadmin Tool Command
There is a new Asadmin Tool command, clean-jbatch-repository, which can be used with both Payara Server and Payara Micro to selectively clean up the data stored for each job execution. Our examples cover the use of the command in Payara Server.
Returning to the daily job example from the section above, it is important to know whether the job ran successfully. In case of failure, the stored data gives you an indication of what the problem was and allows the system to restart the job, retrying only the steps that failed.
But you do not need confirmation that the daily job from two weeks ago completed successfully. So once enough time has passed and you are no longer interested in the information, it can be removed from the database tables.
When you have the following definition of the job in XML:
<job id="dailyJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
you can execute the following command:
./asadmin clean-jbatch-repository --status=COMPLETED --days=7 dailyJob
This will remove all information related to our dailyJob for completed executions (so not failures) that are older than 7 days.
A more thorough cleanup can be performed with the following command, removing even more data a month after execution, as by then the information is no longer required in most cases:
./asadmin clean-jbatch-repository --status=ALL --days=30 dailyJob
With the new Asadmin Tool command clean-jbatch-repository, you can clean up the data stored for the JBatch system. This command makes it easy to restrict the amount of stored data, especially when you have jobs executing regularly. The data is vital for the correct execution of the jobs, but it is no longer required once a job has completed. This command is an addition in the Payara Platform, as the specification doesn't define any cleanup requirement. With the --status and --days parameters, you can define a fine-grained policy on how the cleanup is performed.