EU Datacenter AMQP outage

Postmortem Incident Report – AMQP Connectivity

Incident Summary

• Date: January 7, 2026
• Duration: 13:22 – 14:20 CET
• Affected Service: AMQP (RabbitMQ)
• Impact: Partial to full service unavailability during peak traffic hours

On January 7, 2026, we experienced an incident affecting AMQP connectivity. The incident started as a partial degradation and escalated into a full service outage during peak usage hours.

The service was fully restored by 14:20 CET. We sincerely apologize for the disruption and appreciate your patience.

⸻

What Happened

• 13:22 CET – One backend messaging node became unavailable, reducing overall system capacity.
• Some clients experienced intermittent connection issues.
• During peak hours, traffic shifted to the remaining nodes, significantly increasing system load.
• 13:54 CET – AMQP connectivity became unavailable for all customers.
• Recovery procedures were initiated immediately, and service was fully restored by 14:20 CET.

⸻

Customer Impact

• Some customers experienced connection failures or delays earlier in the incident.
• From 13:54 to 14:20 CET, AMQP connectivity was unavailable for all customers.
• Message processing was temporarily interrupted during the outage period.

⸻

Resolution

To restore service as quickly and safely as possible:

• The underlying infrastructure was stabilized.
• The messaging service was restarted using a fresh deployment to minimize recovery time.
• All data volumes were backed up prior to recovery to preserve investigation options.
• Service availability and message flow were verified before reopening access.

The system has been operating normally since recovery.

⸻

Data Integrity

✔️ No data loss has been identified as a result of this incident. Messages were temporarily unavailable while the service was down; normal processing resumed after recovery.

⸻

Corrective & Preventive Measures

We are implementing the following improvements to reduce the likelihood and impact of similar incidents:

• Increased capacity to better handle peak traffic, even during partial component failures.
• Improved early detection of partial service degradation before it escalates.
• Enhanced monitoring and alerting to ensure visibility remains available under high system load.
• Additional safeguards to reduce the impact of reconnect storms during partial outages.

⸻

Apology & Commitment to Improvement

We understand how critical reliable messaging is for your systems and workflows. We deeply regret the impact this incident had and take full responsibility for improving our resilience and detection capabilities.

We are committed to learning from this event and strengthening our systems to prevent similar incidents in the future. Thank you for your continued trust.

Owner: Platform Operations Team

EU Datacenter AMQP outage

Find Your Subscription

Subscribe to Status Updates