Dear Headless User, due an unplanned deploy, testing on Headless may be affected with intermittent downtimes, causing some tests to possibly fail. This outage is scheduled for about 3 hours maximum, between 7:00pm PDT - 10:00pm PDT on Thursday, August 8, 2019. Things are expected to be back to normal once the deploy has finished. We apologize for the inconvenience.
2020-November-19 Service Incident
Incident Report for Sauce Labs US East Data Center
Postmortem

Dates:

Tuesday, November 19 11:40 - 13:30 EST

What happened:

We experienced an increased error rate for jobs in the us-west-1, eu-central-1 and us-east-1 datacenters.

Why it happened:

Back pressure from a third party provider caused a delay and build up of requests within the internal service responsible for starting new jobs. This delay caused a crash loop within our service.

How we fixed it:

Calls to the third party provider experience back pressure were disabled and rerouted to different end points that weren’t experiencing delays.

What we are doing to prevent it from happening again:

We’ve modified our services to route traffic via a more resilient path to our third party provider.  We’ve also modified our behaviour to drop messages on a queue without waiting for a response to eliminate the chance that back pressure causes our internal services to become overwhelmed.  In addition, we are going to add additional monitoring and alerting for these subsystems

Posted Nov 24, 2020 - 12:42 EST

Resolved
Error rates have subsided. All services are fully operational
Posted Nov 19, 2020 - 14:39 EST
Monitoring
We have taken remedial action and are seeing improvements. We are monitoring.
Posted Nov 19, 2020 - 13:45 EST
Investigating
We are seeing a high error rate on automated headless tests. We are investigating.
Posted Nov 19, 2020 - 13:02 EST
This incident affected: Headless Testing.