Untangling Spring Batch Flow Control

Spring Batch provides a neat mechanism for controlling job execution flow. At first glance there is nothing complicated about it: just a couple of configuration instructions that control step transitions based on the execution result of the previous step. Nevertheless, I often see developers confusing some basic concepts and making mistakes that may lead to serious problems, especially in job recovery.

In this post, I’d like to untangle the most common misunderstandings about Spring Batch flow control that I have observed.

Before going any further, we must understand the difference between BatchStatus and ExitStatus. To make a long story short, BatchStatus represents the status of a step or job from the “technical” side. It’s an enumeration, so it has a fixed set of possible values: ABANDONED, COMPLETED, FAILED, STARTED, STARTING, STOPPED, STOPPING and UNKNOWN. Depending on the BatchStatus, the framework may or may not let you restart the job. ExitStatus represents the business logic outcome. It’s a class, so its value can be customised: it comes with a set of predefined values (COMPLETED, EXECUTING, FAILED, NOOP, STOPPED, UNKNOWN), but you can also provide your own status value depending on business logic needs.

When we define a conditional flow, the on attribute refers to the ExitStatus. The BatchStatus is either derived automatically by the framework from the ExitStatus or determined by the conditional flow settings. For instance, the following instruction sets the BatchStatus to FAILED when the step ends with the ExitStatus COMPLETED_WITH_ERRORS:

<batch:fail on="COMPLETED_WITH_ERRORS" />
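
For context, a transition element like this lives inside a step definition, next to the tasklet. The fragment below is only a minimal sketch: the ids importJob, importStep and reportStep and the tasklet beans importTasklet and reportTasklet are hypothetical, the batch namespace prefix is assumed to be declared on the enclosing beans element, and the step is assumed to report the custom COMPLETED_WITH_ERRORS exit status itself (for example from a StepExecutionListener).

```xml
<batch:job id="importJob">
  <batch:step id="importStep">
    <!-- importTasklet is a hypothetical bean defined elsewhere -->
    <batch:tasklet ref="importTasklet" />
    <!-- "on" is matched against the step's ExitStatus -->
    <batch:next on="COMPLETED" to="reportStep" />
    <!-- Custom exit status: stop here and mark the job as FAILED -->
    <batch:fail on="COMPLETED_WITH_ERRORS" />
    <!-- An exception in the tasklet ends the step with ExitStatus FAILED -->
    <batch:fail on="FAILED" />
  </batch:step>

  <batch:step id="reportStep">
    <!-- reportTasklet is hypothetical as well -->
    <batch:tasklet ref="reportTasklet" />
  </batch:step>
</batch:job>
```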

Order of transitions defined in XML does not matter

The following transition:

<batch:next on="COMPLETED" to="onCompletedStep" />
<batch:next on="COMPLETED_WITH_ERRORS" to="onCompletedWithErrorsStep" />
<batch:fail on="FAILED" />

is exactly the same as:

<batch:fail on="FAILED" />
<batch:next on="COMPLETED" to="onCompletedStep" />
<batch:next on="COMPLETED_WITH_ERRORS" to="onCompletedWithErrorsStep" />

Also, the following transition:

<batch:next on="COMPLETED" to="nextStep" />
<batch:fail on="*" />

is the same as:

<batch:fail on="*" />
<batch:next on="COMPLETED" to="nextStep" />

Spring Batch sorts state transition definitions by decreasing specificity of the pattern, which it determines by counting wildcards. If the wildcard counts are equal, it falls back to lexicographic order. This way, COMPLETED or FAILED transitions are always checked before the * (asterisk) transition.

It is good practice to keep the most specific patterns on top, but violating this rule has no impact on the processing flow.

There is no need to cover all available exit statuses

When defining step transitions, it’s not necessary to cover all available ExitStatus values. For instance, if your step can only end as either COMPLETED or FAILED, you don’t have to define a wildcard (*) transition for it. The Spring Batch reference guide says that “all step’s transitions must be defined explicitly”, but that doesn’t mean you need to cover all available ExitStatus values. Define only those values that are really possible for your step. For example, for the previously mentioned step with two possible exit codes, COMPLETED and FAILED, it is enough to define:

<batch:next on="COMPLETED" to="onCompletedStep" />
<batch:fail on="FAILED" />

One important thing to bear in mind is that if your step ends with an ExitStatus for which no transition is defined, Spring Batch throws JobExecutionException: Flow execution ended unexpectedly, caused by FlowExecutionException: Next state not found, and automatically marks the step as FAILED. That’s why it’s good practice to put the

<batch:fail on="FAILED" />

transition on each step that should be recoverable and whose execution can result in an exception. With this statement, the step fails with the exception as the actual reason for its failure, instead of failing with FlowExecutionException: Next state not found when an exception occurs.

Job should fail at the exact point of failure

If some undesirable behaviour is detected in one of the job’s steps, for example an IOException is thrown when trying to rename a file, that particular step should be failed. The rule of thumb is to fail the step that has actually failed, instead of routing the flow to some common jobHasFailedLogAndAlertSomeAdult step and failing it there. The examples below illustrate this situation.

The following example presents an ANTI-PATTERN that should be avoided. Such a construction is wrong and prevents the job from being recovered correctly:

<batch:step id="myTestStep">
  <batch:tasklet ref="myTasklet" />
  <batch:next on="COMPLETED" to="someAnotherStep" />
  <batch:next on="FAILED" to="jobFailedStep" />
</batch:step>

<batch:step id="jobFailedStep">
  <batch:tasklet ref="callAnAdult" />
  <batch:fail on="*" />
</batch:step>

If myTasklet throws an exception, Spring Batch routes processing to jobFailedStep, which calls some tasklet and fails the job. Why is this an anti-pattern? When job recovery is triggered, it iterates through the job’s steps and re-runs the first step that has failed. In this particular case that is jobFailedStep rather than myTestStep (it is the first step that has an ExitStatus.FAILED), so myTasklet never gets a chance to execute again and the job fails in jobFailedStep straight after recovery has been run. Every time job recovery is run on this job, the same thing happens: there is no chance to actually recover.

What should be done instead is to fail the job at the exact point of failure. If myTasklet throws an exception, it is myTestStep that should be failed:

<batch:step id="myTestStep">
  <batch:tasklet ref="myTasklet"/>
  <batch:next on="COMPLETED" to="someAnotherStep" />
  <batch:fail on="FAILED" />
</batch:step>

This way, when job recovery starts, it picks up myTestStep and executes the tasklet once again, giving it a chance to complete successfully this time.
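
If the alerting done by jobFailedStep is still needed, one option is to move it out of the flow entirely, for instance into a job-level listener, so that no extra step ever fails on behalf of another one. This is only a sketch under assumptions: callAnAdultListener is a hypothetical bean assumed to implement JobExecutionListener and to send the alert from afterJob when the job did not complete, and myJob and someAnotherTasklet are made-up names.

```xml
<batch:job id="myJob">
  <batch:step id="myTestStep">
    <batch:tasklet ref="myTasklet" />
    <batch:next on="COMPLETED" to="someAnotherStep" />
    <!-- Fail right here so that recovery re-runs myTestStep -->
    <batch:fail on="FAILED" />
  </batch:step>

  <batch:step id="someAnotherStep">
    <!-- someAnotherTasklet is a hypothetical bean -->
    <batch:tasklet ref="someAnotherTasklet" />
  </batch:step>

  <!-- Alerting happens outside the step flow (hypothetical listener bean) -->
  <batch:listeners>
    <batch:listener ref="callAnAdultListener" />
  </batch:listeners>
</batch:job>
```

With this layout the failed step stays the point of failure, and the alert is sent without adding a step that would itself have to fail.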

If step failed its execution, fail job instead of ending it

When a step’s execution has failed for some reason and you want to make the job eligible for recovery, you need to use <batch:fail> instead of <batch:end>. The former sets the BatchStatus to FAILED and thus makes the job eligible for recovery. The latter sets the BatchStatus to COMPLETED, so the job cannot be started again. If you try to do so, JobInstanceAlreadyCompleteException will be thrown.

For example, the following transition ends the job with ExitStatus.FAILED, but it does NOT make it recoverable (its BatchStatus is set to COMPLETED):

<batch:step id="myTestStep">
  <batch:tasklet ref="myTasklet"/>
  <batch:next on="COMPLETED" to="onCompletedStep" />
  <batch:end on="FAILED" exit-code="FAILED" />
</batch:step>

To make the job recoverable at this step, it is required to fail processing instead of ending it. The following transition sets both BatchStatus and ExitStatus to FAILED:

<batch:step id="myTestStep">
  <batch:tasklet ref="myTasklet"/>
  <batch:next on="COMPLETED" to="onCompletedStep" />
  <batch:fail on="FAILED" />
</batch:step>
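
Failing the job does not force its exit code to be FAILED as well. The fail element accepts an optional exit-code attribute, so the job can stay restartable (BatchStatus FAILED) while its ExitStatus carries a business-level value. A minimal sketch, assuming the step can report the custom COMPLETED_WITH_ERRORS status used earlier in this post:

```xml
<batch:step id="myTestStep">
  <batch:tasklet ref="myTasklet"/>
  <batch:next on="COMPLETED" to="onCompletedStep" />
  <!-- BatchStatus becomes FAILED (restartable), ExitStatus keeps the custom code -->
  <batch:fail on="COMPLETED_WITH_ERRORS" exit-code="COMPLETED_WITH_ERRORS" />
  <!-- Without exit-code the ExitStatus defaults to FAILED -->
  <batch:fail on="FAILED" />
</batch:step>
```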

Author

Piotr Dyraga
Software engineering consultant experienced in a wide range of projects (banking, logistics, computer networks and others). Please feel free to contact me if you are looking for development services for your project.