A presentation at Velocity in October 2017 in London, UK by Lorna Jane Mitchell
Handling Failure in RabbitMQ
Lorna Mitchell, IBM
https://speakerdeck.com/lornajane
Queues and RabbitMQ
• Queues are a brilliant addition to any application
• They introduce coupling points
• RabbitMQ is a powerful, open source message queue
• https://www.rabbitmq.com
@lornajane
What is Failure? Reality.
A Selection Box Of Failures
Message Not Processed
Question: Better late than never?
If not:
• set up “at-most-once” delivery
• configure the consumer with auto-ack, so messages count as acknowledged as soon as they are delivered
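A minimal sketch of at-most-once consumption, assuming the Python `pika` client (1.x). In AMQP, auto-ack is chosen per consumer when calling `basic_consume`, not when declaring the queue. The function name and the `handle` callback are hypothetical; `channel` would be a pika channel, and the caller still needs to run `channel.start_consuming()`.

```python
def start_at_most_once(channel, queue, handle):
    """Register handle(body) as an auto-ack consumer on `queue` (hypothetical helper).

    With auto_ack=True the broker treats each message as acknowledged the
    moment it is delivered, so a worker that crashes mid-message loses that
    message instead of seeing it again.
    """
    def on_message(ch, method, properties, body):
        # No basic_ack here: the broker already considers the message done.
        handle(body)

    channel.basic_consume(queue=queue, on_message_callback=on_message,
                          auto_ack=True)
    return on_message
```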
Message Not Processed
To react to unprocessed messages:
• set up “at-least-once” delivery, which requires consumers to acknowledge messages
• beware of duplicate and out-of-order messages
• if the consumer drops its connection or dies, its unacknowledged messages are requeued automatically
• detect failures and reject messages with requeue, or implement retries
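A sketch of an at-least-once callback, again assuming `pika` 1.x, whose consumer callbacks receive `(channel, method, properties, body)`. The message is acknowledged only after `handle` succeeds and is rejected with requeue if it raises. Duplicate detection assumes publishers set the `message_id` property; the in-memory `seen_ids` set is a hypothetical stand-in that only works for a single worker.

```python
def make_at_least_once_callback(handle, seen_ids):
    """Build a pika-style callback for at-least-once delivery (hypothetical helper).

    handle(body) should raise on failure. seen_ids is a set of message IDs
    already processed, used to skip duplicate deliveries.
    """
    def on_message(ch, method, properties, body):
        msg_id = getattr(properties, "message_id", None)
        if msg_id is not None and msg_id in seen_ids:
            # Duplicate delivery (e.g. redelivered after a lost ack): ack and skip.
            ch.basic_ack(delivery_tag=method.delivery_tag)
            return
        try:
            handle(body)
        except Exception:
            # Put the message back on the queue for another attempt.
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
            return
        if msg_id is not None:
            seen_ids.add(msg_id)
        # Ack only after the work has succeeded.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

It would be registered with `channel.basic_consume(queue="jobs", on_message_callback=cb)`; leaving `auto_ack` at its default of `False` is what makes delivery at-least-once. A message that fails every time will be requeued forever, which is why retry limits and dead lettering matter.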
Implementing Retries
If there isn’t built-in support, try this:
1. Identify that the message should be retried
2. Create a new message with the same data
3. Add a retry count/date and publish the new message
4. Ack the original message
5. Reject after X attempts
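One way to sketch the retry steps above in Python. The header names `retry-count` and `retried-at` are hypothetical application headers (not RabbitMQ built-ins), and `publish` is a hypothetical caller-supplied function that republishes the copy; the ack and reject calls follow `pika`'s channel methods. The message is rejected once it has failed `max_attempts` times.

```python
import time

RETRY_HEADER = "retry-count"   # hypothetical application header
RETRIED_AT = "retried-at"      # hypothetical application header

def retry_or_reject(ch, method, properties, body, publish, max_attempts=3):
    """Handle a failed message; returns "retried" or "rejected".

    publish(body, headers) is a hypothetical helper that republishes
    the copy to the original queue.
    """
    # Copy headers so the original message's properties are not mutated.
    headers = dict(getattr(properties, "headers", None) or {})
    attempts = int(headers.get(RETRY_HEADER, 0)) + 1
    if attempts >= max_attempts:
        # Step 5: give up. Without requeue, a configured dead letter
        # exchange receives the message.
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
        return "rejected"
    # Steps 2-3: same body, updated retry count and timestamp.
    headers[RETRY_HEADER] = attempts
    headers[RETRIED_AT] = int(time.time())
    publish(body, headers)
    # Step 4: ack the original only after the copy has been published.
    ch.basic_ack(delivery_tag=method.delivery_tag)
    return "retried"
```

With `pika`, `publish` might wrap `ch.basic_publish(exchange="", routing_key="jobs", body=body, properties=pika.BasicProperties(headers=headers))`. Publishing the copy before acking the original means a crash in between produces a duplicate rather than a lost message.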
Can Never Process Message
When a worker cannot process a message:
• be defensive and, if in doubt, exit
• reject the message (either with or without requeue)
• look out for “poison” messages that can never be processed; requeueing them only delivers them again
• configure the queue with a “dead letter” exchange to catch messages rejected without requeue
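A sketch of dead letter exchange setup plus a defensive callback, assuming `pika` 1.x and JSON message bodies (both assumptions; the exchange, queue and function names are hypothetical). The queue argument `x-dead-letter-exchange` tells RabbitMQ where to send messages rejected without requeue; a fanout exchange delivers them to the bound dead letter queue whatever their routing key.

```python
import json

def declare_with_dead_lettering(channel, queue, dlx, dead_queue):
    """Declare `queue` so rejected messages go to exchange `dlx`, collected in `dead_queue`."""
    # Fanout: every dead-lettered message reaches dead_queue regardless of routing key.
    channel.exchange_declare(exchange=dlx, exchange_type="fanout", durable=True)
    channel.queue_declare(queue=dead_queue, durable=True)
    channel.queue_bind(queue=dead_queue, exchange=dlx)
    channel.queue_declare(queue=queue, durable=True,
                          arguments={"x-dead-letter-exchange": dlx})

def make_defensive_callback(handle):
    """Reject messages that can never be parsed instead of requeueing them."""
    def on_message(ch, method, properties, body):
        try:
            payload = json.loads(body)
        except ValueError:
            # Poison message: it will never parse, so requeueing would loop forever.
            ch.basic_reject(delivery_tag=method.delivery_tag, requeue=False)
            return
        # Exceptions from handle() are deliberately not caught; if one stops the
        # worker, the unacknowledged message is requeued when the connection closes.
        handle(payload)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```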
Dead Letter Exchanges
Reincarnating Messages
From the dead letter exchange, we usually:
• monitor and log what arrives
• collect messages, then re-route them to their original destination once the danger has passed
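When RabbitMQ dead-letters a message it adds an `x-death` header: a list of entries, most recent first, recording the queue, the reason (such as `rejected`, `expired` or `maxlen`), a count, and the original exchange and routing keys. A sketch of reading it for logging and for working out where to re-route a message; the function names are hypothetical, and the headers dict is what `pika` exposes as `properties.headers`.

```python
def describe_dead_letter(headers):
    """Summarise the most recent x-death entry, or None if there is none."""
    deaths = (headers or {}).get("x-death") or []
    if not deaths:
        return None
    latest = deaths[0]  # most recent entry comes first
    return {
        "queue": latest.get("queue"),
        "reason": latest.get("reason"),  # e.g. rejected, expired, maxlen
        "count": latest.get("count"),
        "exchange": latest.get("exchange"),
        "routing_key": (latest.get("routing-keys") or [None])[0],
    }

def original_destination(headers):
    """Return (exchange, routing_key) to re-publish to once the danger has passed."""
    info = describe_dead_letter(headers)
    if info is None:
        raise ValueError("message has no x-death header")
    return info["exchange"], info["routing_key"]
```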
Queue Is Getting Bigger
A constantly growing queue should set off alarms.
The ideal queue length depends on:
• message size
• available consumer capacity
• how long a message spends in the queue
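A sketch of a growth check, assuming `pika` 1.x: a passive `queue_declare` reports the number of ready messages in `frame.method.message_count` without changing the queue. `is_constantly_growing` and its five-sample window are hypothetical choices; in practice the readings would probably feed an existing monitoring system.

```python
def queue_depth(channel, queue):
    """Number of ready messages in `queue`, via a passive declare."""
    frame = channel.queue_declare(queue=queue, passive=True)
    return frame.method.message_count

def is_constantly_growing(samples, min_samples=5):
    """True if the last `min_samples` depth readings never went down
    and ended higher than they started (hypothetical alarm rule)."""
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
    return non_decreasing and recent[-1] > recent[0]
```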
Queue Is Getting Bigger
To stop queues from growing out of control:
• set a maximum queue length (by default, the oldest messages are dropped once the queue reaches it)
• set a TTL on messages so that stale messages expire out of the backlog
In both cases, we can use the dead letter exchange to collect the dropped or expired messages and report on them
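A sketch of the two limits as queue arguments. `x-max-length` and `x-message-ttl` (in milliseconds) are standard RabbitMQ queue arguments; with the default overflow behaviour the oldest messages are dropped from the head of the queue, and both dropped and expired messages are dead-lettered if `x-dead-letter-exchange` is set. The helper name and example values are hypothetical.

```python
def bounded_queue_arguments(max_length=None, message_ttl_ms=None,
                            dead_letter_exchange=None):
    """Build x-arguments for channel.queue_declare(..., arguments=...)."""
    args = {}
    if max_length is not None:
        # Default overflow ("drop-head"): oldest messages leave first.
        args["x-max-length"] = int(max_length)
    if message_ttl_ms is not None:
        # Messages older than this many milliseconds expire.
        args["x-message-ttl"] = int(message_ttl_ms)
    if dead_letter_exchange is not None:
        # Dropped and expired messages are re-published here.
        args["x-dead-letter-exchange"] = dead_letter_exchange
    return args
```

The result would be passed as `channel.queue_declare(queue="jobs", durable=True, arguments=args)`. RabbitMQ refuses to redeclare an existing queue with different arguments, so for queues that already exist a policy is the usual way to apply these limits. A per-message TTL can instead be set with the `expiration` property (a string of milliseconds) in `pika.BasicProperties`.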
Many Queues, Many Workers
• Deploy as many workers as you need; each may consume from multiple queues
• The “right” number of workers may change over time
• Workers can be multi-skilled, handling multiple message types
• If in doubt, use more queues in your setup
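A sketch of one multi-skilled worker consuming several queues with `pika` 1.x. The queue names, handlers and prefetch value are hypothetical; `basic_qos` caps how many unacknowledged messages each consumer holds, so a busy worker does not hoard messages.

```python
def consume_many(channel, handlers, prefetch=10):
    """Make one worker consume several queues.

    handlers maps queue name -> function(body); each queue gets its own
    callback, so one worker can handle several message types.
    """
    # Limit unacknowledged messages per consumer.
    channel.basic_qos(prefetch_count=prefetch)
    for queue, handle in handlers.items():
        # handle=handle binds the current handler into this callback.
        def on_message(ch, method, properties, body, handle=handle):
            handle(body)
            ch.basic_ack(delivery_tag=method.delivery_tag)
        channel.basic_consume(queue=queue, on_message_callback=on_message)
```

To scale, run more copies of the same worker; RabbitMQ distributes a queue's messages round-robin between its consumers.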
Healthy Queues
Good metrics avoid nasty surprises.
At a minimum, track:
• queue size
• worker uptime
• processing time
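A sketch of tracking worker uptime and processing time in-process, using only the standard library. The class and method names are hypothetical; queue size would come from the broker (for example a passive `queue_declare`), and exporting these numbers to a metrics system is left out.

```python
import time

class WorkerMetrics:
    """Track uptime and per-message processing time for one worker (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self.durations = []

    def uptime(self):
        """Seconds since this worker started."""
        return self._clock() - self._started

    def timed(self, handle):
        """Wrap a handler so every call records how long it took."""
        def wrapper(body):
            start = self._clock()
            try:
                return handle(body)
            finally:
                # Record the duration even if the handler raised.
                self.durations.append(self._clock() - start)
        return wrapper

    def mean_processing_time(self):
        """Average seconds per message, or 0.0 before any messages."""
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)
```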
Choose How To Fail
Thanks!
Blog post: http://lrnja.net/rabbitfail
Personal blog: https://lornajane.net
Try RabbitMQ:
• https://rabbitmq.com/
• https://ibm.cloud/
Some stories of failure and how to cope when it happens, presented at VelocityConf London