Graceful Request Retries in Ruby Applications

Failure Management, Fallbacks, Exponential Backoff, Tools and Patterns

Modern applications built on a microservices architecture, and systems based on cloud platforms such as AWS, Azure, or Google Cloud, have to be designed to handle expected failures.

How to handle failures?

  1. Retry the operation in background jobs

Retry failed code

Ruby also has a built-in retry keyword. Called inside a rescue clause, it re-runs the enclosing begin block from the start.
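As a minimal sketch of the retry keyword, assuming a hypothetical helper named with_retries and a limit of three attempts (neither comes from a library), a bounded retry could look like this:

```ruby
require "timeout"

# Runs the block, retrying on Timeout::Error up to max_tries times in total.
# `with_retries` is a hypothetical helper name used only for this example.
def with_retries(max_tries: 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Timeout::Error
    # `retry` jumps back to `begin`; `tries` lives outside it, so it keeps counting.
    retry if tries < max_tries
    raise # out of attempts: re-raise the last error to the caller
  end
end

# Usage: the block fails twice, then succeeds on the third try.
result = with_retries { |try| try < 3 ? raise(Timeout::Error) : "ok on try #{try}" }
puts result # prints "ok on try 3"
```

Without the counter, retry would loop forever on a permanent failure, so some limit is always needed.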

Some best practices related to exceptions:

  1. List the handled errors explicitly in rescue, or at least use rescue StandardError => e
  2. Log every retry of a code block with its attributes and the retry count
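Both practices can be sketched together. The helper name retry_with_logging, the error list, and the log format below are assumptions made for illustration, not a fixed convention:

```ruby
require "logger"
require "timeout"

# Only errors that are plausibly transient are retried; everything else propagates.
# This list is an assumption; adjust it to the errors your client really raises.
RETRYABLE_ERRORS = [Timeout::Error, Errno::ECONNRESET, Errno::ECONNREFUSED].freeze

def retry_with_logging(logger:, max_tries: 3, context: {})
  tries = 0
  begin
    tries += 1
    yield
  rescue *RETRYABLE_ERRORS => e
    if tries < max_tries
      # Each retry is logged with the error class, the try counter, and caller context.
      logger.warn("retrying error=#{e.class} try=#{tries}/#{max_tries} context=#{context}")
      retry
    end
    logger.error("giving up error=#{e.class} tries=#{tries} context=#{context}")
    raise
  end
end

# Usage: ArgumentError is not in the list, so it is raised immediately, without a retry.
logger = Logger.new($stdout)
begin
  retry_with_logging(logger: logger, context: { endpoint: "/rates" }) do
    raise ArgumentError, "bad input"
  end
rescue ArgumentError => e
  logger.info("not retried: #{e.message}")
end
```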

Tools

Code in a Retriable.retriable block (from the retriable gem) will be retried if an exception is raised.

Defaults

  • Rescue any exception inherited from StandardError
  • Make 3 tries (including the initial attempt) before raising the exception
  • Use randomized exponential backoff to calculate each succeeding try interval
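The defaults above can be overridden per call. The sketch below assumes the retriable gem (3.x) is installed; the method name fetch_status_page and the URL are placeholders, and the numeric values are what I understand the gem's defaults to be, written out explicitly:

```ruby
require "net/http"
require "retriable" # third-party gem: gem install retriable

# Hypothetical method, defined but not invoked here; the URL is a placeholder.
def fetch_status_page
  Retriable.retriable(
    on: [Timeout::Error, Errno::ECONNRESET], # narrower than the StandardError default
    tries: 3,            # total attempts, including the first one
    base_interval: 0.5,  # seconds before the first retry
    multiplier: 1.5,     # growth factor for each following interval
    rand_factor: 0.5,    # randomizes each interval by up to +/-50%
    on_retry: proc { |error, try, _elapsed, next_interval|
      warn "#{error.class} on try #{try}, next attempt in #{next_interval}s"
    }
  ) do
    Net::HTTP.get(URI("https://example.com/status"))
  end
end
```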

Exponential backoff is a common algorithm for retrying requests. The waiting time between retries grows exponentially, up to a certain threshold. The idea is that if the server is down temporarily, it is not flooded with simultaneous requests the moment it comes back up.
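To make the schedule concrete, here is a plain-Ruby sketch of randomized exponential backoff. It is not the gem's implementation; the function name backoff_intervals and its parameter defaults are assumptions chosen for the example:

```ruby
# Returns the waiting time (in seconds) before each retry.
def backoff_intervals(retries:, base: 0.5, multiplier: 2.0, max: 60.0, rand_factor: 0.5)
  Array.new(retries) do |n|
    interval = [base * (multiplier**n), max].min # exponential growth, capped at max
    # Spread clients out: shift the interval by a random share of up to +/-rand_factor.
    interval + interval * rand_factor * (2 * rand - 1)
  end
end

# With the randomization switched off the schedule is deterministic.
p backoff_intervals(retries: 5, max: 5.0, rand_factor: 0) # prints [0.5, 1.0, 2.0, 4.0, 5.0]
```

The random component matters as much as the growth: without it, all clients that failed at the same moment would also retry at the same moment.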

The gem also supports per-context configuration: the number of tries and the list of exceptions can be set separately for internal APIs, cloud services such as AWS, and so on.

A configured context is then used by calling Retriable.with_context with its name.
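A configuration sketch, assuming the retriable gem (3.x) is installed. The context names, the numbers, the method fetch_profile, and the host are all made up for illustration:

```ruby
require "net/http"
require "retriable" # third-party gem: gem install retriable

Retriable.configure do |c|
  # One entry per kind of dependency; each hash takes the same options as Retriable.retriable.
  c.contexts[:internal_api] = { tries: 5, base_interval: 0.2, on: [Timeout::Error, Errno::ECONNREFUSED] }
  c.contexts[:aws]          = { tries: 3, base_interval: 1.0 }
end

# Hypothetical method, defined but not invoked here; the host is a placeholder.
def fetch_profile(id)
  Retriable.with_context(:internal_api) do
    Net::HTTP.get(URI("https://profiles.internal.example/users/#{id}"))
  end
end
```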

Unfortunately, the gem doesn’t provide an interface for fallbacks, so you have to implement them yourself.
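One possible shape for a hand-written fallback is a small wrapper placed around the retried call. The helper with_fallback below is hypothetical, and the example raises directly instead of calling the gem so that it runs on its own:

```ruby
# Returns the block's value, or the fallback if the block raises one of `on`.
# The fallback may be a plain value or a callable that receives the error.
def with_fallback(fallback, on: [StandardError])
  yield
rescue *on => e
  warn "falling back after #{e.class}: #{e.message}"
  fallback.respond_to?(:call) ? fallback.call(e) : fallback
end

# Usage: in real code the block would contain the retried request,
# so the fallback is used only after all retries have failed.
rates = with_fallback({ "USD" => 1.0 }) { raise IOError, "rates service is down" }
p rates
```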

Background Jobs

Unfortunately, error handling in background jobs is global: a retry re-runs the whole job, not just the request that failed. Move API requests into separate jobs that contain no unrelated logic, and limit the number of retries for them. For example, Sidekiq retries a failed job 25 times by default (over about 21 days), which in most cases makes no sense when working with HTTP services.

Do not add in-code retries to code that runs in background jobs, because the two retry counts multiply.
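The two points above can be put into numbers. The sketch assumes the delay formula from Sidekiq's error handling documentation, (count ** 4) + 15 seconds plus a random amount that is ignored here; the helper names are hypothetical. In a job class, the limit itself is set with sidekiq_options retry: 3.

```ruby
# Minimum delay in seconds before retry number `count` (0-based), without the random part.
def min_retry_delay(count)
  count**4 + 15
end

# Minimum total time a job spends in the retry cycle.
def min_retry_window(retries)
  (0...retries).sum { |count| min_retry_delay(count) }
end

# Worst case: every job execution (the first run plus each retry) exhausts its in-code tries.
def max_requests(job_retries:, in_code_tries:)
  (1 + job_retries) * in_code_tries
end

puts min_retry_window(25) / 86_400.0 # days for 25 retries, roughly 20.4 before the random part
puts min_retry_window(3)             # prints 62
puts max_requests(job_retries: 25, in_code_tries: 3) # prints 78
```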

Error Handling

Let’s look at a good example, aws-sdk-s3: each service error is handled at the API wrapper level.

Follow this approach when writing your API client for internal and external services.
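A sketch of the same idea for your own client follows. BillingApi, its error classes, the host, and the injectable http callable are all hypothetical; the point is only that callers see the client's own errors and can tell retryable failures from permanent ones:

```ruby
require "net/http"

module BillingApi # hypothetical internal service
  class Error < StandardError; end
  class Unavailable < Error; end # transient: safe to retry
  class NotFound < Error; end    # permanent: a retry will not help

  # Default transport; the host is a placeholder.
  DEFAULT_HTTP = ->(path) { Net::HTTP.get_response(URI("https://billing.internal.example#{path}")) }

  class Client
    def initialize(http: DEFAULT_HTTP)
      @http = http
    end

    def invoice(id)
      response = @http.call("/invoices/#{id}")
      case response.code.to_i
      when 200..299 then response.body
      when 404      then raise NotFound, "invoice #{id} not found"
      when 500..599 then raise Unavailable, "billing API returned #{response.code}"
      else raise Error, "unexpected status #{response.code}"
      end
    rescue Timeout::Error, SocketError, SystemCallError => e
      # Low-level network errors are translated once, here, instead of in every caller.
      raise Unavailable, "billing API unreachable: #{e.class}"
    end
  end
end

# Usage with a stubbed transport, so no network call is made:
stub = ->(_path) { raise Errno::ECONNREFUSED }
begin
  BillingApi::Client.new(http: stub).invoice(42)
rescue BillingApi::Unavailable => e
  puts e.message
end
```

With this split, a retry policy only needs to list BillingApi::Unavailable.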

Conclusion
