Last year we took over a Symfony e-commerce application that was struggling under a combination of high customer traffic and a constant flood of write operations from external systems: product updates, price changes, availability feeds, all hitting the application simultaneously. The Messenger setup was already in place. Messages were being dispatched. Workers were running. On paper, everything was fine.
In practice, the workers were consuming 600MB of memory and climbing. The queue had thousands of unprocessed messages backed up. The failure transport (never monitored) contained over 400 entries, some of them weeks old. And because product availability messages were sharing a transport with payment confirmation messages, a slow availability handler was blocking payment confirmations for customers trying to check out.
No single thing was catastrophically wrong. The problems were configuration decisions that looked reasonable in isolation and fell apart under load. We fixed them: restructured the transports, configured proper retry strategy, added memory limits, set up failure transport monitoring, separated handler concerns. The instability went away.
This is a writeup of what we changed and why. It assumes you know Symfony. It is not a getting-started guide.
Your Message Classes Will Outlive Your Code
Most developers treat message classes like DTOs: a constructor, some readonly properties, done. That works fine until you deploy a change, a hundred messages with the old payload are still sitting in the queue, and deserialization starts throwing.
A message class is not a DTO. It is a versioned contract between two processes that may be running different code. Unlike a DTO, you can't just change it and move on.
Versioning is not optional, it is inevitable. Adding a required constructor parameter to a message class will fail deserialization for any messages already sitting in the queue with the old payload. You have two options: make new parameters optional with a default, or version the class explicitly.
Optional parameters are fine for purely additive changes:
final class SendInvoice
{
public function __construct(
public readonly int $orderId,
public readonly ?string $locale = null, // added in v2, defaults gracefully
) {}
}
For anything that changes the meaning of the message, not just the signature, version it explicitly:
// Keep SendInvoice alive until its queue is fully drained.
// Only remove it in a follow-up deployment.
final class SendInvoiceV2
{
public function __construct(
public readonly int $orderId,
public readonly string $locale,
public readonly string $templateId,
) {}
}
The deployment sequence matters: first deploy SendInvoiceV2 alongside both handlers, wait for the SendInvoice queue to drain completely, then remove the old class in a separate deployment. During the drain window you need both handlers live: SendInvoiceHandler consuming the old queue, SendInvoiceV2Handler consuming new dispatches. Do not remove the old handler before the old queue is empty. This is a two-deployment operation, not one.
Skip the drain step and you get deserialization exceptions that are impossible to reproduce locally, appearing hours after a deployment when an old worker picks up a message for a class that no longer exists.
Sometimes the drain window is not an option. If the queue has tens of thousands of messages and the client needs the change deployed now, you have two paths. The first is a compatibility shim: keep the old message class but make it deserialize into the new handler by implementing a custom DenormalizerInterface that maps old payloads to the new shape. It is extra code but it eliminates the drain dependency entirely. The second is a forced drain: temporarily scale workers up aggressively to exhaust the queue fast, then deploy. Neither is elegant, but both are better than deploying a breaking change against a live queue and discovering the consequences at 3am.
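A sketch of the shim approach, assuming the SendInvoice/SendInvoiceV2 classes above and the default Symfony serializer. The fallback values are illustrative and need to match your domain, and you should test this against real queued payloads before relying on it:

```php
// Hypothetical shim: when the serializer is asked to denormalize an old
// SendInvoice payload, produce a SendInvoiceV2 instead, filling the fields
// the old payload never carried with explicit fallbacks.
final class SendInvoiceCompatibilityDenormalizer implements DenormalizerInterface
{
    public function denormalize(mixed $data, string $type, ?string $format = null, array $context = []): mixed
    {
        return new SendInvoiceV2(
            orderId: $data['orderId'],
            locale: $data['locale'] ?? 'en',              // absent in v1 payloads
            templateId: $data['templateId'] ?? 'default', // assumption: a safe default template exists
        );
    }

    public function supportsDenormalization(mixed $data, string $type, ?string $format = null, array $context = []): bool
    {
        // Intercept only the legacy class still sitting in the queue.
        return $type === SendInvoice::class && \is_array($data);
    }

    public function getSupportedTypes(?string $format): array // Symfony 6.3+
    {
        return [SendInvoice::class => true];
    }
}
```

Because the denormalizer returns a SendInvoiceV2, handler resolution sees the new class for old payloads, which is what removes the drain dependency.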
The secondary concern, and it matters just as much, is entity state drift. Don't put Doctrine entities in messages. By the time the handler runs, the order that was pending when the message was dispatched might now be cancelled. Fetch fresh state in the handler, carry only the intent-critical values that must reflect the moment of dispatch:
// This looks convenient. It will eventually cause a subtle bug.
final class ProcessRefund
{
public function __construct(public readonly Order $order) {}
}
// This forces the handler to work with current reality,
// while preserving the amount the customer was actually charged.
final class ProcessRefund
{
public function __construct(
public readonly int $orderId,
public readonly string $currency,
public readonly int $amountCents,
) {}
}
Currency and amount are explicit not because they cannot be fetched from the order (they can), but because they represent the intent at the moment of dispatch. If the order amount is corrected between dispatch and consumption, the refund should reflect what was charged, not what was later edited.
Transport Configuration: The Defaults Will Eventually Let You Down
In the application we inherited, all messages shared a single doctrine:// transport. Product availability updates, price syncs, order confirmations, invoice generation: all in the same queue, all consuming from the same database table under load. That is not a configuration mistake exactly. It is what you get when you follow the getting-started guide and never revisit it.
The Doctrine transport is recommended because it requires no additional infrastructure. Fair enough. But it uses SELECT ... FOR UPDATE SKIP LOCKED to claim messages. Multiple workers polling the same table means database load that scales with worker count, not message volume. Under the kind of write pressure we were seeing (external systems pushing thousands of product updates per hour), the lock contention was visible in slow query logs and was adding latency to the same database handling customer requests.
If you are on PostgreSQL, the same Doctrine transport can use LISTEN/NOTIFY instead of polling. In recent Symfony versions this is controlled by the transport's use_notify option, which defaults to true, so verify it has not been disabled:
transports:
    async:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        options:
            use_notify: true
With notifications active, workers sleep until a message arrives instead of polling. There is no good reason to run the polling behaviour if you are already on PostgreSQL.
Turn off auto_setup in production:
transports:
    async:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        options:
            auto_setup: false
auto_setup: true creates the messenger_messages table the first time a message is dispatched. Convenient in development. In production, your deployment process should own schema changes. Create the table via a Doctrine migration and turn auto_setup off outside local development.
For Redis, set stream_max_entries. Redis Streams are append-only by default and grow indefinitely without trimming:
MESSENGER_TRANSPORT_DSN=redis://localhost:6379/messages/symfony/consumer?stream_max_entries=2000
The number depends on your throughput and how much recent history you want visible for debugging. We use 2000 in our e-commerce setups. Enough to see recent activity, bounded enough that memory usage stays predictable.
Separate transports by concern. This was the most impactful single change we made to the inherited application. Payment confirmations and product availability updates have completely different latency requirements, different failure modes, different acceptable retry windows. One slow handler type on a shared transport holds up everything else:
framework:
    messenger:
        transports:
            catalog_sync:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    queue_name: catalog_sync
            orders_high:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    queue_name: orders_high
        routing:
            App\Message\UpdateProductAvailability: catalog_sync
            App\Message\UpdateProductPrice: catalog_sync
            App\Message\SendOrderConfirmation: orders_high
            App\Message\SendInvoice: orders_high
Give each transport a dedicated worker. A slow catalog sync handler on a shared transport was the direct cause of delayed payment confirmations in the system we inherited. Separate workers, separate queues, separate failure modes:
# Run dedicated workers per transport
php bin/console messenger:consume catalog_sync --memory-limit=128M --time-limit=3600
php bin/console messenger:consume orders_high --memory-limit=256M --time-limit=3600
Retry Strategy: The Defaults Are Too Aggressive
Three retries. One second delay. Multiplier of two. Retries at 1, 2, and 4 seconds. Total window: about 7 seconds before a message lands in the failure transport.
Consider what that covers: a service restarting after a deployment typically needs 20 to 60 seconds. A rate-limited external API may respond with 429 for several minutes. A transient network partition can last longer than 7 seconds. The default retry window is too short for anything involving an external dependency, and in a system receiving constant feed updates from external systems, external dependencies are everywhere.
A more realistic starting point:
transports:
    catalog_sync:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        retry_strategy:
            max_retries: 5
            delay: 5000          # 5 seconds
            multiplier: 3
            max_delay: 300000    # cap at 5 minutes
Retries at 5s, 15s, 45s, 2min 15s, and then 5min (capped by max_delay). Without the cap, retry 5 would be 405 seconds. The max_delay brings it down to a sensible ceiling. That spread covers most transient failures without waiting so long that the failure transport fills up during a routine dependency outage.
Classify your failures explicitly. This is where most Messenger implementations waste their retry budget: retrying errors that will never resolve, burning through retries in seconds, flooding the failure transport with messages that should have been discarded immediately:
#[AsMessageHandler]
final class SyncProductAvailabilityHandler
{
public function __construct(
private readonly ProductRepository $productRepository,
private readonly WarehouseApiClient $warehouseClient,
private readonly LoggerInterface $logger,
) {}
public function __invoke(UpdateProductAvailability $message): void
{
$product = $this->productRepository->findBySku($message->sku);
if ($product === null) {
// The product no longer exists in our catalog.
// Retrying will not change that. Discard immediately.
throw new UnrecoverableMessageHandlingException(
sprintf('Product SKU %s not found, discarding availability update', $message->sku)
);
}
try {
$availability = $this->warehouseClient->getAvailability($message->sku);
$product->updateAvailability($availability);
$this->productRepository->save($product);
} catch (WarehouseApiRateLimitException $e) {
// The API will accept this later. Worth retrying.
throw new RecoverableMessageHandlingException(
'Warehouse API rate limited, will retry',
previous: $e
);
} catch (WarehouseApiProductNotFoundException $e) {
// The warehouse does not know this SKU either. Retrying is pointless.
throw new UnrecoverableMessageHandlingException(
sprintf('SKU %s unknown to warehouse API', $message->sku),
previous: $e
);
}
}
}
UnrecoverableMessageHandlingException bypasses the retry strategy entirely and goes straight to the failure transport. RecoverableMessageHandlingException forces a retry even for exception types Messenger would not normally retry. In the application we inherited, unclassified exceptions from a warehouse API that was frequently rate-limiting were consuming the entire retry budget in 7 seconds, then flooding the failure transport. Classifying them correctly, and extending the retry window, reduced failure transport entries by roughly 80% without changing a single line of handler logic.
Handlers Must Be Idempotent
We found this out the hard way. When we scaled the catalog workers from one process to three, two handlers ran simultaneously on the same SKU. Both read the current stock figure of 1. Both decremented it. We ended up with -1 inventory on a product that had just sold its last unit. The handlers had been written assuming sequential execution. Three parallel workers made that assumption false.
A message can be delivered more than once under normal operating conditions. Not just from bugs. A worker can process a message successfully and crash before sending the acknowledgement. The transport redelivers. On a Redis or AMQP transport with manual acknowledgement, a network hiccup between the handler completing and the ack being sent means the broker considers the message undelivered and requeues it. In a high-volume system this is not an edge case. It is routine.
If running a handler twice produces different outcomes (decremented stock, duplicate charge, second confirmation email), you have a correctness problem that no amount of retry configuration will fix.
For catalog sync the fix was straightforward: replace the decrement with an absolute write. The warehouse feed sends the current stock figure, not a delta. Writing it twice is harmless. That is natural idempotency and it is the easiest kind to achieve: design the operation as a set rather than an increment wherever the domain allows it.
For payment handlers the domain does not allow it. A charge is inherently an increment. The fix there is an idempotency key that derives from the business event, not from the dispatch:
// The key is generated when the payment intent is created, before dispatch.
// It is stable: the same order at the same version always produces the same key.
// A double-submit or a redelivery carries the same key. Both are caught.
final class ProcessPayment
{
public function __construct(
public readonly int $orderId,
public readonly string $idempotencyKey, // e.g. "payment-{orderId}-{orderVersion}"
) {}
}
A UUID generated at dispatch time does not work. If the controller dispatches twice (double form submission, request timeout with retry), each dispatch generates a new UUID and both pass the idempotency check. The key must be stable across any number of deliveries of the same logical event:
#[AsMessageHandler]
final class ProcessPaymentHandler
{
public function __invoke(ProcessPayment $message): void
{
if ($this->paymentRepository->existsByIdempotencyKey($message->idempotencyKey)) {
$this->logger->info('Duplicate payment message, skipping', [
'idempotency_key' => $message->idempotencyKey,
'order_id' => $message->orderId,
]);
return;
}
// ... process payment, store record with idempotency key
}
}
The database-level uniqueness constraint on idempotency_key is the actual safety net. The existsByIdempotencyKey check is an optimisation that avoids calling the payment gateway unnecessarily. Without the constraint, two concurrent deliveries can both pass the check before either has written the record, and both attempt the charge.
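A sketch of the constraint side with Doctrine attributes; the Payment entity and the save method are illustrative:

```php
// The database enforces uniqueness even when two workers pass the
// existsByIdempotencyKey() check at the same moment.
#[ORM\Entity]
#[ORM\UniqueConstraint(name: 'uniq_payment_idempotency_key', columns: ['idempotency_key'])]
class Payment
{
    #[ORM\Column(name: 'idempotency_key')]
    private string $idempotencyKey;
    // ...
}

// In the handler, the losing side of the race surfaces as a constraint
// violation, which should be treated as a duplicate, not retried:
try {
    $this->paymentRepository->save($payment);
} catch (UniqueConstraintViolationException) {
    // The charge is already recorded by the other delivery. Note that the
    // entity manager is closed after this exception; the worker's service
    // reset takes care of that before the next message.
    return;
}
```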
Design for idempotency when you write the handler, not after you find negative inventory in a live catalog.
Memory Leaks: Long-Running Processes Require Different Habits
Workers are long-running PHP processes and PHP accumulates memory. Leaks that do not matter in a 50ms request lifecycle compound over hours in a worker. In the application we took over, workers had no memory limits and no restart schedule. By morning they were consuming 600MB each and climbing. The server was struggling and workers with no ceiling were a direct contributor.
The primary source is Doctrine. The entity manager accumulates every object it loads in its identity map and never releases them. In a worker processing thousands of messages per hour, each loading several entities, the identity map grows without bound.
Symfony's ResetServicesMiddleware handles this by calling $container->reset() after each message, which clears the identity map and resets other stateful services. It is in the default middleware stack. Don't remove it, and verify it is present if you have customized the stack.
The second source is your own code: static properties, unbounded service-level caches, third-party libraries holding references. Profile worker memory over time. Log memory_get_usage(true) with each message, graph it over an hour, watch for growth that does not plateau after the first few messages.
The operational fix regardless of whether you have traced every leak:
php bin/console messenger:consume async --memory-limit=256M --time-limit=3600
--memory-limit causes the worker to finish the current message and exit cleanly when it crosses the threshold. --time-limit restarts it after an hour regardless of memory. This catches slow drifts that ResetServicesMiddleware does not cover. Both are non-negotiable in production.
[program:messenger-catalog-worker]
command=php /var/www/app/bin/console messenger:consume catalog_sync --memory-limit=128M --time-limit=3600
user=www-data
numprocs=3
autostart=true
autorestart=true
process_name=%(program_name)s_%(process_num)02d
stderr_logfile=/var/log/messenger-catalog.err.log
stdout_logfile=/var/log/messenger-catalog.out.log
[program:messenger-orders-worker]
command=php /var/www/app/bin/console messenger:consume orders_high --memory-limit=256M --time-limit=3600
user=www-data
numprocs=2
autostart=true
autorestart=true
process_name=%(program_name)s_%(process_num)02d
stderr_logfile=/var/log/messenger-orders.err.log
stdout_logfile=/var/log/messenger-orders.out.log
The catalog worker runs three processes because catalog sync is high volume and the handlers are fast. The orders worker runs two: lower volume but more critical, and handlers call external APIs that occasionally block.
Graceful Shutdown: SIGTERM, Not SIGKILL
Workers respond to SIGTERM by finishing the current message and exiting cleanly. The message is acknowledged, the worker stops. SIGKILL is not graceful. A message mid-processing is not acknowledged, and depending on your transport's visibility timeout it may be redelivered or lost.
This matters at deployment time. If your deployment pipeline stops workers with SIGKILL (the default for many container orchestration setups if you are not explicit), you will occasionally interrupt a handler mid-execution. In an e-commerce context, that might mean a payment is partially processed, an inventory update is half-written, or an invoice is generated but not sent.
In Kubernetes, set terminationGracePeriodSeconds long enough for your longest-running handler to complete:
spec:
  terminationGracePeriodSeconds: 60
Supervisor uses SIGTERM by default and waits for the process to exit before restarting, so it handles this correctly out of the box.
The piece most deployment guides skip is how workers restart during a code deployment. Supervisor will restart a worker automatically when it exits, but you need the workers running the new code, not stuck processing messages with the old codebase. The reliable approach is to run messenger:stop-workers as a deployment step after your code is on disk but before traffic shifts:
# In your deployment script, after code deploy and cache warmup:
php bin/console messenger:stop-workers
This sets a cache flag that tells each worker to finish its current message and exit cleanly. Supervisor restarts them, they pick up the new code, and the transition happens without a forced kill or a gap in processing. If you are on Kubernetes and running workers as a separate Deployment, a rolling restart achieves the same thing, but only if terminationGracePeriodSeconds is long enough for the current message to finish before the pod is replaced.
The Failed Transport: Your Production Safety Net
Every message that exhausts its retries ends up in the failure transport. If you have not configured one, they are dropped silently. Configure it:
framework:
    messenger:
        failure_transport: failed
        transports:
            failed:
                dsn: 'doctrine://default?queue_name=failed'
The 400 entries we found in the inherited application's failure transport when we took it over (some of them weeks old) were the direct result of nobody monitoring this queue. Product availability for discontinued SKUs, orders referencing deleted customers, payment messages for cancelled subscriptions. Most were legitimately unrecoverable. Some were failures caused by a bug that had since been fixed. Without monitoring, nobody knew.
Treat the failure transport as a production incident queue. We wrap messenger:failed:show in a cron job, pipe the count into our monitoring platform, and page on-call if it exceeds a threshold. The specific tooling does not matter. Growth in the failure transport should be investigated the same day, not discovered during a quarterly review.
Failed messages go through the full middleware stack again when you replay them. If the failure was a handler bug you have since fixed, replaying is safe. If it was a transient infrastructure issue that has resolved, same. If it was data-related (entity deleted, third-party account closed), replaying will produce another UnrecoverableMessageHandlingException and the message will land back in the failure transport. Inspect before replaying blindly:
php bin/console messenger:failed:show --max=20
php bin/console messenger:failed:retry 42 # specific message
php bin/console messenger:failed:retry # all: use with caution
Use separate failure transports per transport. Mixing catalog sync failures with order failures in the same queue makes triage slower than it needs to be:
transports:
    catalog_sync:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        failure_transport: failed_catalog
    orders_high:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        failure_transport: failed_orders
    failed_catalog:
        dsn: 'doctrine://default?queue_name=failed_catalog'
    failed_orders:
        dsn: 'doctrine://default?queue_name=failed_orders'
The Doctrine Transaction Gotcha That Silently Eats Messages
If you have customized the Messenger middleware stack and your messages are mysteriously ending up in the failure transport with "entity not found" errors that cannot be reproduced locally, check whether DoctrineTransactionMiddleware is still in your chain. Its absence is one of the harder failure modes to diagnose because nothing obviously breaks: the message is dispatched, the worker picks it up, the handler fails cleanly, and the retry clock starts. By the time the transaction that created the entity commits, the message has exhausted its retries.
The middleware holds transport sends until after the transaction commits. Remove it and dispatch happens immediately, before the row exists. Verify it is present if you have a custom chain:
framework:
    messenger:
        buses:
            command.bus:
                middleware:
                    - doctrine_transaction
The second scenario bites in a different way. If your handler dispatches a follow-up message and the outer handler later throws, rolling back the transaction, that inner message has already been sent. The follow-up handler runs against data that was never committed. No error, no indication anything is wrong. Just work being done against a row that does not exist.
DispatchAfterCurrentBusStamp defers the inner dispatch until the outer handler completes without throwing:
#[AsMessageHandler]
final class PlaceOrderHandler
{
public function __invoke(PlaceOrder $message): void
{
// ... write order to database inside a transaction
// Not dispatched until this handler returns successfully.
// Transaction rollback = this message never leaves the bus.
$this->bus->dispatch(
new SendOrderConfirmation($message->orderId),
[new DispatchAfterCurrentBusStamp()]
);
}
}
Use this stamp any time you dispatch from inside a handler. The cases where you don't need it are rarer than the cases where you do.
Stamps Worth Knowing Beyond DelayStamp
DelayStamp is well-documented. Three others are just as useful and rarely appear in tutorials.
TransportNamesStamp overrides routing at dispatch time. In our agency work, premium merchants get their order processing routed to a dedicated high-priority transport regardless of what the YAML routing says. Encoding that in YAML would mean a new routing rule for every merchant tier. Encoding it at the dispatch site means the business logic lives where the decision is made:
$stamps = $merchant->isPremium()
? [new TransportNamesStamp(['orders_premium'])]
: [];
$this->bus->dispatch(new ProcessOrder($orderId), $stamps);
RedeliveryStamp tracks retry count. Handlers receive only the message, not the envelope, so read the stamp from a middleware or an event listener when retry state should influence behaviour, for example to escalate to a manual review queue after several failed attempts:
#[AsEventListener]
final class EscalateRepeatedErpFailures
{
    public function __construct(private readonly OperationsQueue $operationsQueue) {}

    public function __invoke(WorkerMessageFailedEvent $event): void
    {
        $message = $event->getEnvelope()->getMessage();
        if (!$message instanceof SyncProductToErp) {
            return;
        }
        $retries = $event->getEnvelope()->last(RedeliveryStamp::class)?->getRetryCount() ?? 0;
        if ($retries >= 3) {
            $this->operationsQueue->escalate($message->productId, $retries);
        }
    }
}
WorkerMessageFailedEvent exposes the envelope, the throwable, and whether another retry is scheduled, which makes it the natural place for this kind of escalation logic.
SentToFailureTransportStamp marks messages that have entered the failure transport. If your monitoring middleware counts all handled messages, use this to separate replays from first-time dispatches. Otherwise your metrics count failure transport replays as new work.
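A sketch of that separation in a counting middleware; the MetricsClient and its increment method are hypothetical stand-ins for whatever your monitoring platform provides:

```php
final class DispatchMetricsMiddleware implements MiddlewareInterface
{
    public function __construct(private readonly MetricsClient $metrics) {}

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        // Replays from the failure transport carry this stamp;
        // first-time dispatches do not.
        $isReplay = $envelope->last(SentToFailureTransportStamp::class) !== null;
        $this->metrics->increment($isReplay ? 'messenger.replayed' : 'messenger.dispatched');

        return $stack->next()->handle($envelope, $stack);
    }
}
```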
Multiple Buses: Earn the Complexity
A single bus is the right default. Add a second bus when you need a different middleware stack, not because CQRS says you should. Those are different reasons.
We learned this the hard way on a project where we introduced a command bus and query bus early, convinced the architectural separation would pay off. It did not. Both buses had identical middleware stacks for the first eight months. The only effect was that every service needed two injected buses, every new developer asked why, and the answer was "for the architecture". That is not a good answer.
We now add a query bus only when the middleware difference is concrete. In our current agency setup, the command bus carries doctrine_transaction middleware and the query bus does not. That difference is real and measurable. Reads do not need transaction overhead:
framework:
    messenger:
        default_bus: command.bus
        buses:
            command.bus:
                middleware:
                    - doctrine_transaction
            query.bus: ~
final class ProductService
{
public function __construct(
#[Target('command.bus')] private readonly MessageBusInterface $commandBus,
#[Target('query.bus')] private readonly MessageBusInterface $queryBus,
) {}
}
If you cannot name a concrete difference in what each bus does, you do not need two buses yet.
Testing: What Goes Where
Test handlers as plain PHP classes. Inject mocks, call __invoke, assert the outcome. No Messenger infrastructure:
final class SyncProductAvailabilityHandlerTest extends TestCase
{
public function testUpdatesAvailabilityForKnownProduct(): void
{
$product = new Product(sku: 'ABC-123', availability: 10);
$repository = $this->createMock(ProductRepository::class);
$repository->method('findBySku')->with('ABC-123')->willReturn($product);
$repository->expects($this->once())->method('save')->with($product);
$warehouseClient = $this->createMock(WarehouseApiClient::class);
$warehouseClient->method('getAvailability')->with('ABC-123')->willReturn(25);
$handler = new SyncProductAvailabilityHandler(
$repository,
$warehouseClient,
$this->createMock(LoggerInterface::class)
);
$handler(new UpdateProductAvailability(sku: 'ABC-123'));
self::assertSame(25, $product->getAvailability());
}
public function testThrowsUnrecoverableForUnknownProduct(): void
{
$repository = $this->createMock(ProductRepository::class);
$repository->method('findBySku')->willReturn(null);
$handler = new SyncProductAvailabilityHandler(
$repository,
$this->createMock(WarehouseApiClient::class),
$this->createMock(LoggerInterface::class)
);
$this->expectException(UnrecoverableMessageHandlingException::class);
$handler(new UpdateProductAvailability(sku: 'UNKNOWN'));
}
}
Use the in-memory:// transport in functional tests to verify that the right message was dispatched in response to an action. If you are following the transport separation from earlier in this article, map each named transport to in-memory:// in your test configuration:
# config/packages/test/messenger.yaml
framework:
    messenger:
        transports:
            catalog_sync: 'in-memory://'
            orders_high: 'in-memory://'
// Assert that placing an order dispatched a confirmation message
// to the orders_high transport, not to catalog_sync, not missing entirely.
/** @var InMemoryTransport $transport */
$transport = self::getContainer()->get('messenger.transport.orders_high');
self::assertCount(1, $transport->getSent());
self::assertInstanceOf(SendOrderConfirmation::class, $transport->getSent()[0]->getMessage());
self::assertSame(42, $transport->getSent()[0]->getMessage()->orderId);
The functional test asserts routing and dispatch. The handler test asserts business logic. They cover different things and should not bleed into each other.
Async Is Not an Upgrade
Adopt async because it feels sophisticated, and you will be living with the consequences for a long time.
In e-commerce specifically, async has costs that are easy to underestimate. An availability update sitting in a queue for 30 seconds is 30 seconds during which a customer can add an out-of-stock product to their cart, proceed through checkout, and reach payment before the handler has run. You have not just delayed a database write. You have created an oversell window. The async layer that was supposed to protect the application under load has introduced a consistency gap directly in the checkout flow.
The same applies to pricing. A price update dispatched asynchronously means there is a window, however brief, where the displayed price and the stored price disagree. For most products that is tolerable. For a flash sale that starts at midnight, it is not.
These are not arguments against Messenger. They are arguments for being deliberate about which operations can afford eventual consistency and which cannot. Availability and pricing updates that feed from external systems are genuinely good candidates for async. The volume is too high for synchronous handling and the latency is bounded. But the decision should be made explicitly, with the consistency window understood, not by defaulting to async because the infrastructure is already in place.
In the application we fixed, async was the right call. The volume of external feed updates was genuinely incompatible with synchronous handling under customer traffic. But the correct Messenger setup for that application required transport separation, failure monitoring, idempotent handlers, memory-bounded workers, and a retry strategy matched to the actual failure modes. The component was already installed. None of that was configured.
That gap, between "Messenger is installed" and "Messenger is working correctly under production conditions", is what this article is about.
What We Actually Delivered
None of the fixes were exotic. The Symfony Messenger component was already doing exactly what it was configured to do. That was the problem.
A single shared transport meant catalog sync volume could starve order processing. Separating them meant the payment confirmation backlog cleared within minutes. Not because anything got faster, but because the two workloads stopped competing for the same queue.
Workers without memory limits ran until the server complained. The 600MB figure was not a memory leak in the traditional sense. It was Doctrine's identity map accumulating every loaded entity in a process that had been running for days. --memory-limit and --time-limit brought it down to a stable 80-120MB range.
The 400 failure transport entries were the most instructive part. Roughly a third were genuinely unrecoverable: deleted products, closed merchant accounts, SKUs the warehouse had never heard of. They had been burning through retries in 7 seconds and sitting in the failure queue unnoticed for weeks. Proper exception classification would have discarded them on the first attempt. The remaining two thirds were transient failures: rate limits, a deployment window where an upstream service was briefly unreachable. A wider retry window would have resolved all of them without ever touching the failure transport.
It was not all clean. When we separated the transports and restarted the workers, we discovered that three handler classes were not idempotent. That had never mattered with a single slow worker, but became obvious immediately when three catalog workers started running in parallel. The first sign was the -1 stock incident described earlier: two availability handlers picked up the same SKU simultaneously, both read the stock figure of 1, and both decremented it. The race had always been possible in theory; the single-worker setup had just made it vanishingly unlikely in practice.
We had to stop the catalog workers, audit all handler classes that touched shared state, add idempotency keys to the messages, add uniqueness constraints at the database level, and redeploy before restoring full worker concurrency. It took most of a day. The fix itself was not complicated: switch from decrements to absolute writes for the availability handlers, add the idempotency key pattern for the two handlers where that was not possible. Finding every affected handler and being confident we had not missed one was the slow part. This is exactly the kind of thing you want to discover in a staging environment under a load test, not on a live catalog at 11am on a Wednesday.
That complication aside, the application has been stable under the same traffic and feed volume ever since. The queue depth stays low. The failure transport gets a handful of entries per week, all unrecoverable, all expected.
That is what a well-configured Messenger setup looks like. Not invisible, but quiet.