Nugget Friday - Understanding Jakarta Batch Step Processing: A Developer's Guide

One of the most common questions about the Jakarta Batch specification is "How do steps actually work under the hood?" Today's nugget takes a look at batch processing in enterprise Java applications by exploring step execution and the different types of steps available in Jakarta Batch. So grab your favorite beverage and let's dig in!

The Problem

When building batch applications, it's easy to struggle with choosing between the different step types and understanding how they behave during execution. Should you use a chunk-oriented step or a batchlet? How do you configure them properly? And what about execution monitoring? These decisions can significantly impact your batch application's performance and maintainability.

The Solution: Grokking Step Types and Configuration

The Jakarta Batch specification provides two main types of steps: chunk-oriented steps and batchlets. Each type requires proper configuration in both Java code and the Job XML. Let's break them down.

Chunk-Oriented Steps

Chunk steps are perfect for data-intensive operations that process items in a "read-process-write" pattern. Here's a complete example including both code and configuration.

First, let's define our batch artifacts:

import java.util.List;

import jakarta.batch.api.BatchProperty;
import jakarta.batch.api.chunk.AbstractItemReader;
import jakarta.batch.api.chunk.AbstractItemWriter;
import jakarta.batch.api.chunk.ItemProcessor;
import jakarta.inject.Inject;
import jakarta.inject.Named;

@Named
public class CustomerReader extends AbstractItemReader {

    @Inject
    @BatchProperty(name = "input.file")
    private String inputFile;

    @Override
    public Object readItem() throws Exception {
        // Read the next customer record from inputFile;
        // returning null signals that the input is exhausted
        Customer customer = null; // replace with the actual read logic
        return customer;
    }
}


@Named
public class CustomerProcessor implements ItemProcessor {

    @Override
    public Object processItem(Object item) throws Exception {
        Customer customer = (Customer) item;
        // Enrich or transform the customer data;
        // returning null filters the item out of the chunk
        return customer;
    }
}


@Named
public class CustomerWriter extends AbstractItemWriter {

    @Override
    public void writeItems(List<Object> items) throws Exception {
        // Write the whole chunk of customers to the database
    }
}

Then let's configure the step in our Job XML:

<step id="processCustomers">

    <chunk item-count="10">

        <reader ref="customerReader">

            <properties>

                <property name="input.file" value="customers.csv"/>

            </properties>

        </reader>

        <processor ref="customerProcessor"/>

        <writer ref="customerWriter"/>

    </chunk>

</step>

The key thing to understand about chunk processing is that it:

  • Reads items one at a time
  • Processes each item as it is read
  • Collects the processed items into a chunk of item-count items (10 in the example above)
  • Writes the entire chunk in a single transaction, which also spans the reads and processing for that chunk
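
With the artifacts and Job XML in place, you start the job through the standard JobOperator API. Here's a minimal sketch, assuming the Job XML above is packaged as META-INF/batch-jobs/customer-job.xml (the file name is just our choice for this example):

import java.util.Properties;

import jakarta.batch.operations.JobOperator;
import jakarta.batch.runtime.BatchRuntime;

public class CustomerJobLauncher {

    public long launchCustomerJob() {
        // Obtain the runtime's JobOperator
        JobOperator operator = BatchRuntime.getJobOperator();

        // Job parameters are available to the Job XML through
        // #{jobParameters['...']} substitution
        Properties params = new Properties();
        params.setProperty("input.file", "customers.csv");

        // start() returns an execution id for later status queries
        return operator.start("customer-job", params);
    }
}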

Batchlet Steps

Batchlets are perfect for task-oriented processing that doesn't fit the item-oriented pattern. Think file transfer, system command execution, or any other "do something once" type operation.

import jakarta.batch.api.BatchProperty;
import jakarta.batch.api.Batchlet;
import jakarta.inject.Inject;
import jakarta.inject.Named;

@Named
public class DataCleanupBatchlet implements Batchlet {

    @Inject
    @BatchProperty(name = "retention.days")
    private String retentionDays;

    @Override
    public String process() throws Exception {
        // Perform the cleanup and return the step's exit status
        cleanupStaleData(Integer.parseInt(retentionDays));
        return "COMPLETED";
    }

    @Override
    public void stop() throws Exception {
        // Handle a stop request gracefully
    }

    private void cleanupStaleData(int retentionDays) {
        // Application-specific cleanup logic
    }
}

And its corresponding Job XML configuration:

<step id="cleanup">

    <batchlet ref="dataCleanupBatchlet">

        <properties>

            <property name="retention.days" value="30"/>

        </properties>

    </batchlet>

</step>

Step Execution Monitoring with Listeners

Listeners allow you to monitor and intercept step execution at various points. They're perfect for logging, metrics collection, and other cross-cutting concerns. There are several types:

  • StepListener: Monitors overall step lifecycle
  • ChunkListener: Monitors chunk processing (for chunk steps)
  • ItemReadListener: Monitors individual item reads
  • ItemProcessListener: Monitors individual item processing
  • ItemWriteListener: Monitors chunk writes

Here's an example of a step listener:

import java.util.logging.Logger;

import jakarta.batch.api.listener.StepListener;
import jakarta.inject.Named;

@Named
public class CustomerStepListener implements StepListener {

    private static final Logger logger =
            Logger.getLogger(CustomerStepListener.class.getName());

    @Override
    public void beforeStep() throws Exception {
        // Called once before step execution begins
        logger.info("Starting customer processing step");
    }

    @Override
    public void afterStep() throws Exception {
        // Called once after step execution completes
        logger.info("Completed customer processing step");
    }
}

Then we can register it in the Job XML:

<step id="processCustomers">

    <listeners>

        <listener ref="customerStepListener"/>

    </listeners>

    <chunk item-count="10">

        <!-- chunk configuration as before -->

    </chunk>

</step>
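
For chunk-level visibility, a ChunkListener can be registered the same way, with an additional <listener ref="customerChunkListener"/> element. Here's a minimal sketch of one:

import java.util.logging.Logger;

import jakarta.batch.api.chunk.listener.ChunkListener;
import jakarta.inject.Named;

@Named
public class CustomerChunkListener implements ChunkListener {

    private static final Logger logger =
            Logger.getLogger(CustomerChunkListener.class.getName());

    @Override
    public void beforeChunk() throws Exception {
        // Called before each chunk transaction begins
        logger.fine("Starting a new chunk");
    }

    @Override
    public void onError(Exception ex) throws Exception {
        // Called when chunk processing fails with an exception
        logger.warning("Chunk failed: " + ex.getMessage());
    }

    @Override
    public void afterChunk() throws Exception {
        // Called after each chunk transaction commits
        logger.fine("Chunk committed");
    }
}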

Step Partitioning for Parallel Processing

With partitioning, a step can execute multiple instances of itself in parallel, with each instance working on a subset of the data. This is configured through a partition plan or mapper:

import java.util.Properties;

import jakarta.batch.api.partition.PartitionMapper;
import jakarta.batch.api.partition.PartitionPlan;
import jakarta.batch.api.partition.PartitionPlanImpl;
import jakarta.inject.Named;

@Named
public class CustomerPartitionMapper implements PartitionMapper {

    @Override
    public PartitionPlan mapPartitions() throws Exception {
        PartitionPlan plan = new PartitionPlanImpl();
        plan.setPartitions(3); // split the work into 3 partitions, one thread each by default

        // Give each partition its own slice of the data
        Properties[] props = new Properties[3];
        for (int i = 0; i < 3; i++) {
            props[i] = new Properties();
            props[i].setProperty("partition.id", String.valueOf(i));
            props[i].setProperty("chunk.start", String.valueOf(i * 1000));
            props[i].setProperty("chunk.end", String.valueOf((i + 1) * 1000));
        }
        plan.setPartitionProperties(props);

        return plan;
    }
}

Configure partitioning in your Job XML:

<step id="processCustomers">

    <chunk item-count="10">

        <!-- chunk configuration as before -->

    </chunk>

    <partition>

        <mapper ref="customerPartitionMapper"/>

    </partition>

</step>
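
The runtime injects each partition's properties into that partition's artifacts through @BatchProperty, which is how every instance learns which slice of the data it owns. Here's a sketch of a reader consuming the chunk.start and chunk.end properties defined by the mapper above; the record-by-offset lookup is a hypothetical placeholder:

import java.io.Serializable;

import jakarta.batch.api.BatchProperty;
import jakarta.batch.api.chunk.AbstractItemReader;
import jakarta.inject.Inject;
import jakarta.inject.Named;

@Named
public class PartitionedCustomerReader extends AbstractItemReader {

    @Inject
    @BatchProperty(name = "chunk.start")
    private String start;

    @Inject
    @BatchProperty(name = "chunk.end")
    private String end;

    private int current;
    private int last;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        // Each partition works only on its own [start, end) range
        current = Integer.parseInt(start);
        last = Integer.parseInt(end);
    }

    @Override
    public Object readItem() throws Exception {
        if (current >= last) {
            return null; // this partition's slice is exhausted
        }
        // Hypothetical application-specific lookup by offset
        return findCustomerByOffset(current++);
    }

    private Object findCustomerByOffset(int offset) {
        return new Object(); // placeholder for real data access
    }
}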

Why You Should Care

Understanding these step components helps you:

  1. Make Better Design Decisions: Choose the right step type and know when to use listeners and partitioning.
  2. Improve Performance: Use chunk steps for efficient data processing and partitioning for parallel execution.
  3. Monitor Execution: Implement listeners to track progress and gather metrics.

Caveats

  1. Transaction Scope: Chunk steps automatically handle transactions, but batchlets run outside that automatic transaction management, so if a batchlet needs transactional behavior you have to manage it yourself (see the sketch after this list).
  2. Memory Management: With chunk steps, choose your chunk size carefully. Too large, and you risk memory issues; too small, and you lose performance.
  3. Partition Overhead: While partitioning can improve performance, there's overhead in managing multiple threads. For small datasets, the overhead might outweigh the benefits.
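
For the first caveat, one common approach in a full Jakarta EE environment is to inject a UserTransaction into the batchlet and demarcate the transaction yourself. A minimal sketch, with the actual cleanup work elided:

import jakarta.batch.api.AbstractBatchlet;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import jakarta.transaction.UserTransaction;

@Named
public class TransactionalCleanupBatchlet extends AbstractBatchlet {

    @Inject
    private UserTransaction transaction;

    @Override
    public String process() throws Exception {
        transaction.begin();
        try {
            // Perform the transactional cleanup work here
            transaction.commit();
            return "COMPLETED";
        } catch (Exception e) {
            transaction.rollback();
            throw e;
        }
    }
}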

Conclusions

Steps are the fundamental components of Jakarta Batch jobs. Learning about the different types of steps and their supporting components, such as listeners and partitioning, will help you to create more resilient and efficient batch applications.

  • Use chunk steps for data-intensive, item-oriented processing
  • Use batchlets for task-oriented processing
  • Add listeners for monitoring and cross-cutting concerns
  • Consider partitioning for parallel processing of large datasets

That's it for this week's nugget! Download your free copy of Payara Platform Community and start building better batch applications today. Happy Friday and happy coding!
