🗓️ February 2020

Retry Mechanisms in Apache Camel

The Apache Camel library is, among other things, great for consuming messages on a retry schedule, but the setup can be a bit daunting. So in this article, I'll give you an overview of how to do it, as well as some useful tips.

Tags: architecture, apache-camel, java

Apache Camel is an incredibly powerful Java framework for (asynchronously) routing messages between different components within your system.

In this context, a component is very loosely defined as something capable of receiving or producing data, often in the form of so-called messages. Camel supports over 300 different components; examples include ActiveMQ, Kafka, the file system, HTTP endpoints and plain Java beans.

When you're not simply routing messages between different components, but also processing them in Java (via the bean component, for example), there are many reasons why the processing of a particular message could fail.

So in this article, we will explore Apache Camel's retry pattern by taking a look at the different built-in retry mechanisms when consuming messages from a queue.

Table of Contents

- The Application
- Dead Letter Queues (DLQs)
- Reusable Builder Classes
- Basic Retries
- Exponential Backoff
- Logging
- Conclusion

The Application

We're going to need some kind of application to discuss in this article, so consider an e-mail microservice as part of some larger system.

This microservice is connected to a message broker, where it's constantly listening for CustomerCreated application events on a particular queue. In reaction to such an event, the microservice should simply send the newly registered customer a welcome e-mail.

Diagram where the application subscribes to a 'customer.created' JSON message from an ActiveMQ instance, and sends this customer a welcome e-mail
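For reference, here's a minimal sketch of what the CustomerCreated event payload could look like, based on the getEmail() and getName() accessors used later in this article (a hypothetical class shape; the actual definition lives in the linked repository):

import lombok.Data;

// Hypothetical sketch of the event payload consumed from the queue;
// the actual class in the article's repository may be shaped differently.
@Data
public class CustomerCreated {

    private String email;

    private String name;

}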

For our technologies, we'll go with Spring Boot for the microservice app and ActiveMQ for the message broker. You can find this application's source code on GitHub - to follow along, please make sure to have the following tools installed:

To keep things simple, all e-mails will just be logged to the console, without actually integrating with an external e-mail provider like Mailgun or SendGrid.
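For completeness, a console-only EmailService could look something like the following minimal sketch (hypothetical; the actual implementation is part of the linked repository):

import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

// Hypothetical console-only e-mail service; no external provider is involved.
@Slf4j
@Service
public class EmailService {

    public void sendWelcomeEmail(String address, String name) {
        log.info("Attempting to send e-mail; address={}, name={}", address, name);
        // Later in this article, this method is deliberately made to throw an
        // exception on every call, in order to demonstrate Camel's retry behaviour.
    }

}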

Dead Letter Queues (DLQs)

Once you've added the required (Spring) dependencies for Camel itself, the ActiveMQ component and the Jackson data format, the following route builder can be used for consuming messages from a customer.created queue and sending a welcome e-mail in response.

@Component
@RequiredArgsConstructor
public class CustomerCreatedSubscriber extends SpringRouteBuilder {
 
    private final EmailService emailService;
 
    @Override
    public void configure() {
        from("activemq:queue:customer.created")
                .unmarshal().json(Jackson, CustomerCreated.class)
                .bean(this);
    }
 
    public void process(CustomerCreated event) {
        emailService.sendWelcomeEmail(event.getEmail(), event.getName());
    }
 
}

Unfortunately, there's a potential downside to this (simple) approach. If a CustomerCreated event comes in, but an exception occurs while attempting to send the welcome e-mail, Camel will log the exception to the console, but the original CustomerCreated message will be lost.

This is where Dead Letter Queues (DLQs) come in handy. You can think of a DLQ as a graveyard for messages which weren't successfully processed. Camel makes it very easy to set up DLQs via the built-in errorHandler method, available in every route builder.

@Override
public void configure() {
    errorHandler(deadLetterChannel("activemq:queue:customer.created.dead").useOriginalMessage());
    from("activemq:queue:customer.created")
            .unmarshal().json(Jackson, CustomerCreated.class)
            .bean(this);
}

This errorHandler method accepts an ErrorHandlerBuilder object, which can be conveniently created via the built-in deadLetterChannel method.

Soon we will have a look at the many ways in which ErrorHandlerBuilder objects can be customized, but for now, we'll just enable the useOriginalMessage() option, which makes sure that the message ending up on the DLQ is the original message as it was received from the queue, rather than the message as it looked at the point of failure (after unmarshalling, for example).

DLQs are a must-have if you want to be able to manually or automatically retry or replay failed messages. For example, the ActiveMQ web console, available at localhost:8161 by default, provides a way to move messages between queues, which can be used to manually replay a particular message by moving it back from the DLQ to the original queue.

Reusable Builder Classes

To keep things organized, I usually create two small helper classes: an EndpointBuilder with a static queue(...) method for constructing endpoint URIs, and an EndpointDLQBuilder with a static dlq(...) method for constructing dead letter channel error handlers.

This is what the new CustomerCreatedSubscriber class looks like when making use of these two helper classes.

@Component
@RequiredArgsConstructor
public class CustomerCreatedSubscriber extends SpringRouteBuilder {
 
    private final EmailService emailService;
 
    @Override
    public void configure() {
        errorHandler(dlq("customer.created.dead"));
        from(queue("customer.created").build())
                .unmarshal().json(Jackson, CustomerCreated.class)
                .bean(this);
    }
 
    public void process(CustomerCreated event) {
        emailService.sendWelcomeEmail(event.getEmail(), event.getName());
    }
 
}

Here, the queue(...) method is statically imported from the new EndpointBuilder class, and the dlq(...) method is statically imported from the new EndpointDLQBuilder class, which is defined as follows:

public class EndpointDLQBuilder {
 
    public static DefaultErrorHandlerBuilder dlq(String endpoint) {
        return new DeadLetterChannelBuilder(queue(endpoint).build())
                .useOriginalMessage();
    }
 
}
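The EndpointBuilder class itself isn't shown here, but based on how queue(...).build() is used above, a minimal sketch could look as follows (an assumption on my part; the actual class is part of the linked repository):

public class EndpointBuilder {

    private final String queueName;

    private EndpointBuilder(String queueName) {
        this.queueName = queueName;
    }

    // Starts building an endpoint for the given ActiveMQ queue name.
    public static EndpointBuilder queue(String queueName) {
        return new EndpointBuilder(queueName);
    }

    // Produces the endpoint URI understood by Camel's ActiveMQ component.
    public String build() {
        return "activemq:queue:" + queueName;
    }

}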

Basic Retries

The (Default)ErrorHandlerBuilder, used for configuring DLQs in Camel, provides an easy way to automatically retry messages which couldn't be processed the first time because of some exception. To test this, I've intentionally set up my EmailService to fail every request.

We can use the following DLQ configuration to retry each failed message up to 3 more times, with a 1-second delay between attempts, before finally moving it to the DLQ.

public static DefaultErrorHandlerBuilder dlq(String endpoint) {
    return new DeadLetterChannelBuilder(queue(endpoint).build())
            .useOriginalMessage()
            .maximumRedeliveries(3)
            .redeliveryDelay(1000);
}

When I start the application (which I've set up to publish a dummy event to the customer.created queue during startup), I'm greeted with the following log statements:

2020-02-29 09:47:39.833  INFO   ...   Attempting to send e-mail; address=jessy@example.com, name=Jessy
2020-02-29 09:47:40.867  INFO   ...   Attempting to send e-mail; address=jessy@example.com, name=Jessy
2020-02-29 09:47:41.869  INFO   ...   Attempting to send e-mail; address=jessy@example.com, name=Jessy
2020-02-29 09:47:42.874  INFO   ...   Attempting to send e-mail; address=jessy@example.com, name=Jessy

The original attempt, followed by 3 retries, makes for a total of 4 attempts. Also note how the timestamps are approximately 1 second apart, as configured via the redeliveryDelay(1000) option.

This setup is known as the retry pattern, which is especially useful if you're integrating with external systems. For example, if some external system is known to be unavailable for a few minutes from time to time, it might make sense to configure a Camel retry schedule with a fixed delay of 3-5 minutes between different attempts.
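As a sketch (with made-up numbers), such a slow, fixed-delay schedule could be configured like this:

public static DefaultErrorHandlerBuilder dlq(String endpoint) {
    return new DeadLetterChannelBuilder(queue(endpoint).build())
            .useOriginalMessage()
            // Keep retrying for roughly half an hour before giving up:
            // 10 redeliveries with a fixed 3-minute (180000 ms) delay in between.
            .maximumRedeliveries(10)
            .redeliveryDelay(180000);
}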

Exponential Backoff

When integrating with some external system, it's important to be a good client 😇. If an external system becomes unavailable, possibly due to traffic overload, we mustn't contribute to the problem by continuously trying to hit some struggling server with our tight Camel retry schedules.

This is where the exponential backoff algorithm comes in. Instead of reattempting requests on a fixed schedule, the idea is to wait (exponentially) longer between consecutive requests.

Visual comparison between a fixed-delay schedule and an exponential backoff schedule

Configuring exponential backoff in Camel is pretty easy:

public static DefaultErrorHandlerBuilder dlq(String endpoint) {
    return new DeadLetterChannelBuilder(queue(endpoint).build())
            .useOriginalMessage()
            .maximumRedeliveries(3)
            .redeliveryDelay(1000)
            .useExponentialBackOff()
            .backOffMultiplier(2);
}

Here, the backOffMultiplier(2) option doubles the wait time between consecutive attempts, starting from the configured 1000 milliseconds, before the message is eventually moved to the DLQ.

Attempt    Time
1          09:00:00
2          09:00:01
3          09:00:03
4          09:00:07

The exponential growth may cause the wait time in between requests to grow too large. For example, if we set maximumRedeliveries to 12, the wait time between the 11th and 12th (final) retry will be 2¹¹ × 1 second = 2048 seconds (roughly 34 minutes).

To prevent the exponential backoff from growing too large, one can use the maximumRedeliveryDelay(20000) option to instruct Camel to never wait longer than 20 seconds between attempts: this is also known as the truncated exponential backoff algorithm.
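Continuing the configuration from above, capping the delay is a single extra line on the builder (a sketch):

public static DefaultErrorHandlerBuilder dlq(String endpoint) {
    return new DeadLetterChannelBuilder(queue(endpoint).build())
            .useOriginalMessage()
            .maximumRedeliveries(12)
            .redeliveryDelay(1000)
            .useExponentialBackOff()
            .backOffMultiplier(2)
            // Truncated exponential backoff: never wait longer than 20 seconds between attempts.
            .maximumRedeliveryDelay(20000);
}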

There's one more consideration I would like to mention here. If the schedules of a large number of clients are closely aligned, it's still possible for them to bring a struggling server down (whether the clients are using exponential backoff or not). This phenomenon is known as the thundering herd problem, and it's the reason why it's often advised to add some randomness ("jitter") to the wait time between attempts, but that's outside the scope of this article.

Logging

At this point, it would be hard to tell whether a message in our system has ended up on the DLQ. We could have a look at the ActiveMQ web console, but it would be much better if the error were also printed to the application's console.

For this, Camel offers us the following configuration, where the stack trace of the last failed delivery will be logged to the console, right before the message gets moved to the DLQ.

@Slf4j
public class EndpointDLQBuilder {
 
    public static DefaultErrorHandlerBuilder dlq(String endpoint) {
        return new DeadLetterChannelBuilder(queue(endpoint).build())
                .useOriginalMessage()
                .maximumRedeliveries(3)
                .redeliveryDelay(1000)
                .useExponentialBackOff()
                .backOffMultiplier(2)
                .log(log)
                .loggingLevel(LoggingLevel.ERROR)
                .logHandled(true)
                .logExhausted(true);
    }
 
}

Here, the log variable refers to a private static final org.slf4j.Logger field, auto-generated by Lombok's @Slf4j annotation.

This setup works fairly well, but usually, I also like to add an Exception header to the ActiveMQ message itself, containing the stack trace of the actual exception; this way, there's also some information on why a particular message has ended up on the DLQ.

We can use the onPrepareFailure option for this, which, according to the JavaDoc, was specifically designed to "enrich the message before sending it to a dead letter queue".

@Slf4j
public class EndpointDLQBuilder {
 
    public static DefaultErrorHandlerBuilder dlq(String endpoint) {
        return new DeadLetterChannelBuilder(queue(endpoint).build())
                .useOriginalMessage()
                .maximumRedeliveries(3)
                .redeliveryDelay(1000)
                .useExponentialBackOff()
                .backOffMultiplier(2)
                .log(log)
                .loggingLevel(LoggingLevel.ERROR)
                .logHandled(true)
                .logExhausted(true)
                .onPrepareFailure(EndpointDLQBuilder::attachException);
    }
 
    private static void attachException(Exchange exchange) {
        var message = exchange.getIn();
        var exception = (Exception) exchange.getProperty(Exchange.EXCEPTION_CAUGHT);
        message.setHeader("Exception", getStackTrace(exception));
    }
 
    @SneakyThrows
    private static String getStackTrace(Exception exception) {
        @Cleanup var sw = new StringWriter();
        @Cleanup var pw = new PrintWriter(sw, true);
        exception.printStackTrace(pw);
        return sw.getBuffer().toString();
    }
 
}

Conclusion

As we've discussed, Camel's retry pattern works great when integrating with potentially unreliable external systems.

To learn more about Camel's Dead Letter Channel and message redelivery, feel free to have a look at this very readable section from the official Camel user manual on which most of this article is based.