2020-June-29 Service Incident
Postmortem

Dates:

Monday, June 29, 11:05 - 13:55 EDT

What happened:

Our US East data center was unavailable for automated testing and REST API calls. 

Why it happened:

Our public cloud provider experienced a zone outage that impacted one of the services responsible for state persistence.

How we fixed it:

We removed the dependency on the unavailable resource.

What we are doing to prevent it from happening again:

We are moving to a new high availability service that includes fault tolerance for components and services responsible for state persistence.

Posted Jul 24, 2020 - 04:45 EDT

Resolved
All our Headless services have recovered successfully. All services are now fully operational
Posted Jun 29, 2020 - 14:27 EDT
Update
All Headless services including Headless UI, Headless REST API and Headless Sauce Connect tunnels continue to be affected.

The services are impacted by an incident that our Headless infrastructure provider is currently experiencing: https://status.cloud.google.com/incident/cloud-networking/20005
Posted Jun 29, 2020 - 13:21 EDT
Update
Headless Testing UI, REST API and Sauce Connect tunnels are also affected. We continue investigating.
Posted Jun 29, 2020 - 12:11 EDT
Investigating
Automated Headless tests are not starting. We are investigating.
Posted Jun 29, 2020 - 11:41 EDT