Certificate expired on discovery.myq.cloud

Resolved

Incident Summary

  • Date: February 17, 2025
  • Duration: 01:00 - 10:41 (GMT+1), approximately 9 hours 41 minutes
  • Impact:

    • All API requests to the discovery service from mobile apps and MRCs failed due to TLS handshake errors.
    • Users were unable to retrieve endpoint links for AMQP/Web services, leading to service disruptions.

We sincerely apologize for the disruption this caused. We understand the importance of service availability and deeply regret the inconvenience to our users and teams relying on the discovery service.

Resolution

The incident was resolved by generating and deploying a new TLS certificate for discovery.myq.cloud.

Remediation Actions Taken:

  • Deployed a new TLS certificate to restore service.
  • Added the discovery service to automated certificate management to prevent future expirations.
  • Integrated certificate expiration monitoring into existing probes to provide proactive alerts.

Root Cause Analysis

What Happened

  • The TLS certificate for discovery.myq.cloud expired on February 17, 2025, at 01:00 (GMT+1).
  • Mobile apps and MRCs relying on the discovery service failed due to TLS handshake verification errors.
  • This issue persisted until a new certificate was manually issued and deployed.

Why It Happened

  • The discovery service was excluded from our automated certificate management because we use split-horizon DNS: in local sandbox environments, discovery.myq.cloud resolves to a different server than it does for the rest of the world.
  • No automated expiration monitoring was in place for this service, so the approaching expiration went unnoticed and the certificate was not renewed in time.
  • The certificate renewal process was manual and was not documented accurately, which slowed the response.

Impact

  • All API requests to the discovery service failed due to TLS handshake errors.
  • Users were unable to retrieve endpoint links for AMQP/Web services, impacting connectivity.

Corrective & Preventive Measures

Completed Actions (Post-Incident Remediation)

  1. Added the discovery service to automated certificate management to ensure automatic renewal.
  2. Integrated certificate expiration monitoring into existing probes, providing early alerts for upcoming expirations.
  3. Updated internal documentation to reflect accurate renewal steps and service dependencies.
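
The expiration monitoring in item 2 can be illustrated with a minimal Python sketch. This is not the probe we actually run; the function names, the 30-day threshold, and the example timestamp are assumptions made for illustration only. It reads the certificate's notAfter field and flags the host once the remaining validity drops below the threshold.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # assumed alert threshold, not taken from the incident report


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a notAfter string such as 'Feb 17 00:00:00 2025 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_alert(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires within `warn_days` (or has already expired)."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the peer certificate's notAfter field."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_host(host: str, warn_days: int = WARN_DAYS) -> bool:
    """Hypothetical probe entry point: True means an alert should be raised."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLCertVerificationError:
        # Verification failed (for example, the certificate has already expired).
        return True
    return needs_alert(not_after, datetime.now(timezone.utc), warn_days)
```

A scheduled job could call check_host("discovery.myq.cloud") and page the on-call engineer when it returns True. Because of split-horizon DNS, such a check would need to run from each network view that resolves the name differently.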

Conclusion

This incident exposed a gap in automated certificate management for services that use split-horizon DNS, which led to unexpected downtime. The gap has been closed by integrating the discovery service into certificate automation and monitoring probes. These improvements are intended to detect upcoming certificate expirations early and prevent a repeat of this outage.

Owner: Matus Szepe

Resolved

The TLS certificate for discovery.myq.cloud expired on February 17, 2025, at around 01:00 (GMT+1). From that point on, mobile apps and MRCs that used the discovery service to get links to the AMQP/Web endpoints for a given region received errors because certificate verification failed.

Affected components
  • Discovery