SQS vs SNS for Lambda Dead Letter Queues

Serverless computing and event-driven functions are what it’s all about at the moment. But what happens when the event trigger fires, and your process then encounters an error? How do you recover from this given the event has since passed and may never happen again? This is a common question in AWS when working with their serverless, event-driven Lambda Functions.

Fortunately, AWS lets you define Dead Letter Queues for this very scenario. This option allows you to designate either an SQS queue or SNS topic as a DLQ. It applies to asynchronous invocations: when your Lambda function still fails after Lambda’s automatic retries, Lambda pushes the incoming event message (and some additional context) onto the specified resource. If it’s SNS you can send out alerts, trigger other services (maybe even a retry of the same function - although watch out for infinite loops), or any combination of these, given its fan-out nature. If it’s SQS you can persist the message and process it with another service.
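To make the configuration step concrete, here is a minimal sketch of how the DLQ target could be set from Python. `DeadLetterConfig` with a `TargetArn` is the shape Lambda’s UpdateFunctionConfiguration API expects; the helper names, the ARN check and the example function and queue names are my own assumptions for illustration.

```python
import re

# Rough check for SQS queue and SNS topic ARNs, the two DLQ target types.
# The pattern is an illustrative assumption, not an official ARN grammar.
_DLQ_ARN = re.compile(r"^arn:aws[a-z-]*:(sqs|sns):[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]+$")


def dead_letter_kwargs(function_name: str, target_arn: str) -> dict:
    """Build keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    if not _DLQ_ARN.match(target_arn):
        raise ValueError(f"not an SQS queue or SNS topic ARN: {target_arn}")
    return {
        "FunctionName": function_name,
        "DeadLetterConfig": {"TargetArn": target_arn},
    }


def attach_dlq(function_name: str, target_arn: str) -> None:
    """Apply the setting. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the helper above works without boto3

    boto3.client("lambda").update_function_configuration(
        **dead_letter_kwargs(function_name, target_arn)
    )
```

Calling `attach_dlq("orders-fn", "arn:aws:sqs:eu-west-1:123456789012:orders-dlq")` would point the hypothetical `orders-fn` function at the hypothetical `orders-dlq` queue. The function’s execution role also needs permission to send to the target (`sqs:SendMessage` or `sns:Publish`).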

So let’s look at both options in a little more detail.

SQS Dead Letter Queue

Using SQS as a Dead Letter Queue (DLQ) ensures that you have a durable store for failed events that can be monitored (allowing necessary services/individuals to be alerted) and picked up for resolution at your convenience. This allows you to process failures in bulk, wait a defined period before re-triggering the original event, or take some other steps towards resolution.

Diagram showing an AWS architecture with SQS Dead Letter Queue

SQS gives you a durable dead letter queue that can be monitored and polled to collect failed events for re-processing or special attention.

The fact that you don’t reprocess the event straight away gives you a little more flexibility around when and how you deal with Lambda failures.

Pros

  • Durability: process when you’re ready to deal with the issue, maybe in bulk.
  • Retention: can keep messages for up to 14 days (the default is 4 days).
  • Next to guaranteed delivery.

Cons

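  • Latency: SQS doesn’t push messages out, so something has to poll the queue (even when the queue is wired up as a Lambda event source, it’s Lambda doing the polling for you).
  • Single-subscriber: Messages are deleted once a subscriber has consumed them, so it assumes a single process will be taking action on failed messages.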
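As a rough illustration of the polling side, here is a sketch of a worker that drains an SQS DLQ in batches. It assumes `sqs` behaves like `boto3.client("sqs")` and that `handle` is your own resolution logic returning `True` once a failure has been dealt with; both names, and the `max_batches` limit, are assumptions of mine. Lambda attaches its error context (`RequestID`, `ErrorCode`, `ErrorMessage`) as message attributes, which is why they are requested here.

```python
from typing import Any, Callable


def drain_dlq(sqs: Any, queue_url: str, handle: Callable[[str, dict], bool],
              max_batches: int = 10) -> int:
    """Long-poll the queue and delete each message that `handle` resolves.

    Returns the number of messages deleted.
    """
    resolved = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,         # SQS returns at most 10 per call
            WaitTimeSeconds=20,             # long polling, the SQS maximum
            MessageAttributeNames=["All"],  # include Lambda's error context
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue looks empty, stop polling
        for message in messages:
            attrs = {
                name: value.get("StringValue")
                for name, value in message.get("MessageAttributes", {}).items()
            }
            if handle(message["Body"], attrs):
                sqs.delete_message(QueueUrl=queue_url,
                                   ReceiptHandle=message["ReceiptHandle"])
                resolved += 1
            # Unresolved messages become visible again after the
            # queue's visibility timeout and can be picked up later.
    return resolved
```

Deleting only after `handle` succeeds is the important design choice: a message that could not be resolved simply stays on the queue until its retention period runs out.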

SNS Dead Letter Queue

SNS, or Simple Notification Service, is a key part of AWS’s event-driven offering, letting you process events almost instantaneously and fan out to multiple subscribers. It’s a great way to integrate applications in a microservices architecture. You can also use an SNS Topic as a Dead Letter Queue (DLQ). This has the benefit of allowing you to instantly take action on failure, whether that be attempting to re-process the message, alerting an individual/process, storing the event message somewhere for follow-up, or any combination of the above.

Diagram showing an AWS architecture with SNS Dead Letter Queue

SNS Dead Letter Queues allow for multiple actions in response to a failure, for example sending a notification and adding the message to a more durable resource such as an SQS queue.

The key to the SNS approach is its flexibility in sending messages to multiple subscribers. It allows you to take some action immediately, while also passing the message to other, more suitable systems where it can be picked up and processed.

Pros

  • Event-driven: An SNS DLQ will trigger actions instantly upon receiving a message.
  • Fan-out: Configuring multiple subscribers allows multiple actions to be taken by different subscribers at the same time.

Cons

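  • Non-durable: SNS delivers messages rather than storing them. Once delivery to the subscribers has succeeded, or its retries have been exhausted, the message is gone, and a message published while no subscriber is attached is lost.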
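To show what “taking action instantly” might look like, here is a sketch of a Lambda function subscribed to the DLQ topic. The `Records[].Sns.Message` and `Records[].Sns.MessageAttributes` fields are the standard shape of an SNS-triggered Lambda event; the function names and the returned dictionary are assumptions of mine, and the `print` call stands in for whatever alerting or retry logic you would actually use.

```python
import json


def parse_dlq_notification(event: dict) -> list:
    """Return one dict per failed event found in an SNS-triggered invocation."""
    failures = []
    for record in event.get("Records", []):
        sns = record["Sns"]
        # SNS message attributes arrive as {"Type": ..., "Value": ...} pairs.
        attrs = {
            name: value.get("Value")
            for name, value in sns.get("MessageAttributes", {}).items()
        }
        try:
            payload = json.loads(sns["Message"])  # the original event, if JSON
        except json.JSONDecodeError:
            payload = sns["Message"]              # otherwise keep the raw string
        failures.append({
            "payload": payload,
            "request_id": attrs.get("RequestID"),
            "error_code": attrs.get("ErrorCode"),
            "error_message": attrs.get("ErrorMessage"),
        })
    return failures


def handler(event, context):
    """Lambda entry point: report each failure as soon as it is published."""
    failures = parse_dlq_notification(event)
    for failure in failures:
        # Placeholder action: swap in an alert, a retry, or a write to storage.
        print(f"Failed request {failure['request_id']}: {failure['error_message']}")
    return {"failures": len(failures)}
```

Because the topic fans out, this handler can sit alongside other subscribers (an email alert, a queue) without any of them knowing about each other.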

Best of Both Worlds

A pattern that works rather well, and offers the best of both worlds, is to combine both SNS and SQS as in the diagram above. By defining an SNS Topic as the DLQ, and having an SQS subscriber attached to the SNS Topic, you can have your durable store in the SQS queue, while also taking instant action. The only caveat is that if you re-attempt processing straight away and this time it succeeds, a copy of the message is still sitting in the SQS queue, so you need some way to mark it as resolved and have it removed from the queue.
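One possible way to handle that caveat, sketched under assumptions of my own: the instant-retry subscriber records the SNS `MessageId` of every failure it manages to resolve, and the queue consumer deletes any queued copy carrying one of those ids. When raw message delivery is off, the SQS body is a JSON envelope containing `Type`, `MessageId` and `Message`, which is what the sketch relies on. The `resolved_ids` set stands in for whatever shared store you would really use, and `sqs` is assumed to behave like `boto3.client("sqs")`.

```python
import json
from typing import Any, Optional


def sns_message_id(body: str) -> Optional[str]:
    """Return the SNS MessageId from an SQS body, or None if there is no envelope."""
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError:
        return None
    if isinstance(envelope, dict) and envelope.get("Type") == "Notification":
        return envelope.get("MessageId")
    return None  # raw message delivery, or not published via SNS


def acknowledge_resolved(sqs: Any, queue_url: str, messages: list,
                         resolved_ids: set) -> list:
    """Delete queued failures that were already retried successfully.

    Returns the messages that still need attention.
    """
    remaining = []
    for message in messages:
        if sns_message_id(message["Body"]) in resolved_ids:
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=message["ReceiptHandle"])
        else:
            remaining.append(message)
    return remaining
```

The `messages` list here is what a `receive_message` call returns under its `Messages` key, so this check can run as the first step of whatever worker polls the queue.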

Not perfect by any stretch, but it gives a little of the benefit of both.

Summary

There are a huge number of different patterns (and anti-patterns) out there for implementing SQS and SNS, as well as Lambda and event-driven patterns in general. The two above are just basic examples that work well in certain scenarios. I’d be really interested to hear from other people who have worked with serverless/event-driven on AWS and what your opinions are, as well as any patterns you’ve found to be a good way of managing DLQs.

Please leave your comments and thoughts below!