2021-January-23 Service Incident
Postmortem

Dates:

Saturday, January 23, 2021 13:32 - 15:45 EST

What happened:

Customers using Sauce Headless experienced elevated wait times and an increased rate of tests unexpectedly terminating.

Why it happened:

Our Google Kubernetes Engine (GKE) cluster underwent automated upgrades during the incident window, causing services to unexpectedly terminate.

How we fixed it:

The automated cluster upgrade was completed and service returned to normal operations.

What we are doing to prevent it from happening again:

We’re adjusting maintenance windows to reduce customer impact. We’re increasing system robustness by adding graceful shutdown mechanisms and creating a highly available setup for identified services.
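
As an illustration of the graceful shutdown idea mentioned above, here is a minimal sketch, not our production implementation. It assumes a worker process that receives SIGTERM before it is stopped, which is the signal Kubernetes sends to containers when a pod is terminated. The class name GracefulWorker and its methods are hypothetical and chosen only for this example.

```python
import signal
import threading


class GracefulWorker:
    """Hypothetical worker: stops taking new jobs after a shutdown request,
    but lets the job that is already running finish."""

    def __init__(self):
        self._stop = threading.Event()
        self.completed = []

    def request_shutdown(self, signum=None, frame=None):
        # Only record the request here; the run loop decides when to exit.
        self._stop.set()

    def install_signal_handlers(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.request_shutdown)

    def run(self, jobs):
        for job in jobs:
            if self._stop.is_set():
                # Shutdown requested: do not start any further jobs.
                break
            # A job that has started always runs to completion.
            self.completed.append(job())
        return self.completed
```

In this sketch a shutdown request that arrives while a job is running does not interrupt that job; the loop simply declines to start the next one. A real service would also need to finish within the termination grace period configured for its pod.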

Posted Jan 27, 2021 - 15:48 EST

Resolved
Remedial action is finished and error rates are back to normal. All services are fully operational.
Posted Jan 23, 2021 - 15:46 EST
Identified
We have experienced high error rates on our Headless cloud. Error rates have subsided, and we continue to take remedial action and monitor them.
Posted Jan 23, 2021 - 14:24 EST