We’d like to share more about the service disruption which occurred last Friday night, June 29th, in one of our Availability Zones in the US East-1 Region. The event was triggered during a large-scale electrical storm which swept through the Northern Virginia area. We regret the problems experienced by customers affected by the disruption and, in addition to giving more detail, also wanted to provide information on actions we’ll be taking to mitigate these issues in the future.
High API latency caused by multiple issues in AWS including SQS API errors
AWS has declared their SQS issues resolved. DynamoDB is mostly fixed – AWS is still throttling requests, but not to a level that impacts our service. We are closing this incident.
Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region – Amazon
Early Sunday morning, September 20, we had a DynamoDB service event in the US-East Region that impacted DynamoDB customers in US-East, as well as some other services in the region. The following are some additional details on the root cause, subsequent impact to other AWS services that depend on DynamoDB, and corrective actions we’re taking.
From 20:30 to 23:50 UTC on 5 September, there were a number of problems on PythonAnywhere. Our own site, and those of our customers, were generally up and running, but were experiencing intermittent failures and frequent slowdowns. We’re still investigating the underlying cause of this issue; this blog post is an interim report.
Now that we have fully restored functionality to all affected services, we would like to share more details with our customers about the events that occurred with the Amazon Elastic Compute Cloud (“EC2”) last week, our efforts to restore the services, and what we are doing to prevent this sort of issue from happening again. We are very aware that many of our customers were significantly impacted by this event, and as with any significant service issue, our intention is to share the details of what happened and how we will improve the service for our customers.
Starting last Thursday, Heroku suffered the worst outage in the nearly four years we’ve been operating. Large production apps using our dedicated database service may have experienced up to 16 hours of operational downtime. Some smaller apps using shared databases may have experienced up to 60 hours of operational downtime. Code deploys were unavailable across some parts of the platform for almost 76 hours – over three days. In short: this was an absolute disaster.