At 2:16 AM Coordinated Universal Time (UTC) on Tuesday, we received notifications of SSO failures.
At 2:38 AM UTC we identified the cause. A worker queue processing job requests encountered a failed job at the same time as a high volume of requests. While the failed job was clearing, the request volume created a processing delay. This was isolated to a single customer’s job queue and did not impact any other customers.
At 2:42 AM UTC we cleared the failed job and resumed request processing. We confirmed logins were restored for the customer and monitored the queue to ensure it completed all outstanding tasks.
We apologize for the inconvenience. Our team is reviewing how worker tasks are cleared during periods of high request volume to mitigate future events such as this.