Spring Batch provides a neat mechanism for controlling job execution flow. At first glance there is nothing really complicated about it: just a couple of configuration instructions that control step transitions based on the execution result of the previous step. Nevertheless, I often find developers confusing some basic concepts and making mistakes that may lead to serious problems, especially in job recovery.
In this post, I’d like to untangle the most common misunderstandings about Spring Batch flow control that I have observed.
Before going any further, we must realise the difference between BatchStatus and ExitStatus. To make a long story short, BatchStatus represents the status of a step or job from the “technical” side. It’s an enumeration, so it has a fixed number of possible values: COMPLETED, STARTING, STARTED, STOPPING, STOPPED, FAILED, ABANDONED or UNKNOWN. Depending on the BatchStatus, the framework may or may not let you restart the failed job.
ExitStatus represents the business logic outcome. It’s a class, so its value can be customised. It comes with a predefined set of values (COMPLETED, EXECUTING, FAILED, NOOP, STOPPED and UNKNOWN), but it allows you to provide a custom status value depending on business logic needs.
When we define conditional flow, the on attribute refers to the ExitStatus. BatchStatus is either determined automatically by the framework based on the ExitStatus or set by the conditional flow settings. For instance, a fail transition sets BatchStatus to FAILED when the step ends with the ExitStatus named in its on attribute.
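To make the distinction concrete, here is a minimal sketch of a step whose transitions match on ExitStatus values. The step, tasklet and target names (loadStep, loadTasklet, reportStep, archiveStep) and the custom exit status COMPLETED WITH SKIPS are assumptions made for illustration. A custom value like this would have to be set by your own code, for example in a step listener. The fragment is meant to sit inside a job definition.

```xml
<batch:step id="loadStep" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:tasklet ref="loadTasklet"/>
    <!-- "on" is matched against the step's ExitStatus, not its BatchStatus. -->
    <!-- The element type decides the outcome: fail ends the job with BatchStatus FAILED. -->
    <batch:fail on="FAILED"/>
    <!-- A custom (hypothetical) ExitStatus value routed to its own step. -->
    <batch:next on="COMPLETED WITH SKIPS" to="reportStep"/>
    <!-- Every other ExitStatus continues with the regular flow. -->
    <batch:next on="*" to="archiveStep"/>
</batch:step>
```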
Order of transitions defined in XML does not matter
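A step that declares its * (wildcard) transition first and its FAILED transition second behaves exactly the same as a step that declares FAILED first and * second. The same is true of any other reordering of the transitions within a step.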
Spring Batch sorts state transition definitions by decreasing specificity of the pattern, which it measures by counting wildcards. If the wildcard counts are equal, it falls back to lexicographic order. This way, specific transitions such as FAILED are always checked before the * (asterisk) transition.
It is good practice to keep the most specific patterns on top, but violating this rule has no impact on the processing flow.
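As an illustration of the ordering rule, the sketch below deliberately declares the wildcard first. The names firstStep, firstTasklet and secondStep are hypothetical, and the fragment is assumed to live inside a job definition.

```xml
<batch:step id="firstStep" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:tasklet ref="firstTasklet"/>
    <!-- Wildcard declared first on purpose: FAILED is still checked before "*". -->
    <batch:next on="*" to="secondStep"/>
    <batch:fail on="FAILED"/>
    <!-- Swapping the two transitions above would give the same flow. -->
</batch:step>
```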
There is no need to cover all available exit statuses
When defining step transitions, it’s not necessary to cover all available ExitStatus values. For instance, if your step can end only as COMPLETED or FAILED, you don’t have to define a wildcard (*) transition for it. The Spring Batch reference guide says that “all step’s transitions must be defined explicitly”, but that doesn’t mean you need to cover all available ExitStatus values. Define only those values that are really possible for your step. For example, for the previously mentioned step with two possible exit codes, COMPLETED and FAILED, it is enough to define one transition for each of them.
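A minimal sketch of such a step could look like the fragment below. The names importStep, importTasklet and exportStep are assumptions, and the choice of next for COMPLETED and fail for FAILED is just one plausible pairing.

```xml
<batch:step id="importStep" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:tasklet ref="importTasklet"/>
    <!-- Only the two exit statuses this step can actually produce are covered. -->
    <batch:next on="COMPLETED" to="exportStep"/>
    <batch:fail on="FAILED"/>
    <!-- No "*" transition is declared, because no other ExitStatus is expected here. -->
</batch:step>
```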
One important thing to bear in mind is that if your step ends with an ExitStatus for which no transition is defined, Spring throws JobExecutionException: Flow execution ended unexpectedly, caused by FlowExecutionException: Next state not found, and automatically marks the step as FAILED. That’s why it’s good practice to put an explicit transition for the FAILED exit status on each step that should be recoverable and whose execution can result in an exception. With this statement in place, the step fails with the exception as the actual reason for its failure, instead of failing because of FlowExecutionException: Next state not found.
Job should fail at the exact point of failure
If some undesirable behaviour is detected in one of the job’s steps, for example an IOException is thrown when trying to rename a file, that particular step should be failed. The rule of thumb is to fail the step that actually failed, instead of routing the flow to some common jobHasFailedLogAndAlertSomeAdult step and failing it there. The examples below illustrate this situation.
Consider an ANTI-PATTERN that should be avoided because it prevents the job from being recovered correctly: myTestStep runs myTasklet and, when it fails, hands control to a separate jobFailedStep instead of failing itself.
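The fragment below is a sketch of one configuration that matches this description, not a definitive listing. The job id myJob and the tasklet reference jobFailedTasklet are assumptions, and the exact transitions used in a real job may differ.

```xml
<batch:job id="myJob" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:step id="myTestStep">
        <batch:tasklet ref="myTasklet"/>
        <!-- ANTI-PATTERN: the failure is routed away from the step that actually failed. -->
        <batch:next on="FAILED" to="jobFailedStep"/>
        <batch:end on="*"/>
    </batch:step>
    <batch:step id="jobFailedStep">
        <!-- Hypothetical tasklet that logs or alerts; the job is failed here instead. -->
        <batch:tasklet ref="jobFailedTasklet"/>
        <batch:fail on="*"/>
    </batch:step>
</batch:job>
```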
When myTasklet throws an exception, Spring Batch routes processing to jobFailedStep, which calls some tasklet and fails the job. Why is this an anti-pattern? When job recovery is triggered, it iterates through the job’s steps and re-runs the first step that has failed. In this particular case that would be jobFailedStep instead of myTestStep, so myTasklet would never get a chance to execute again. The job fails in jobFailedStep straight after recovery has been run (this is the first step that has an ExitStatus.FAILED). Every time job recovery is run on this job, the same thing happens, so there is no chance to actually recover.
What should be done instead is to fail the job at the exact point of failure. If myTasklet throws an exception, it is myTestStep that should be failed.
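A minimal sketch of the corrected step, assuming it sits inside a job definition and that every non-failing outcome simply ends the job:

```xml
<batch:step id="myTestStep" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:tasklet ref="myTasklet"/>
    <!-- The step that threw the exception is the one that fails the job. -->
    <batch:fail on="FAILED"/>
    <!-- Assumed for illustration: any other outcome ends the job normally. -->
    <batch:end on="*"/>
</batch:step>
```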
This way, when job recovery starts, it picks up myTestStep and executes the tasklet once again, giving it a chance to complete successfully this time.
If step failed its execution, fail job instead of ending it
When step execution fails for some reason and you want to make the job eligible for recovery, you need to use <batch:fail> instead of <batch:end>. The former sets BatchStatus to FAILED and thus makes the job eligible for recovery. The latter sets BatchStatus to COMPLETED, and the job cannot be started again. If you try to do so, JobInstanceAlreadyCompleteException will be thrown.
For example, an end transition can end the job with ExitStatus.FAILED, but it does NOT make the job recoverable (its BatchStatus is set to COMPLETED).
To make the job recoverable at this step, it is required to fail processing instead of ending it. A fail transition sets both BatchStatus and ExitStatus to FAILED.
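The contrast between the two transitions can be sketched in a single step. The non-recoverable variant is left commented out, and the use of the exit-code attribute to report FAILED from an end transition is an assumption about how such a transition would be written.

```xml
<batch:step id="myTestStep" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:tasklet ref="myTasklet"/>
    <!-- Not recoverable: ExitStatus is FAILED, but BatchStatus ends up COMPLETED.
    <batch:end on="FAILED" exit-code="FAILED"/>
    -->
    <!-- Recoverable: both BatchStatus and ExitStatus are FAILED. -->
    <batch:fail on="FAILED"/>
    <!-- Assumed for illustration: any other outcome ends the job normally. -->
    <batch:end on="*"/>
</batch:step>
```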