
Postmortem for outage of us-east-1 – Joyent

 


We would like to share the details of what occurred during the outage on 5/27/2014 in our us-east-1 datacenter, what we have learned, and what actions we are taking to prevent it from happening again. On behalf of all of Joyent, we are extremely sorry for this outage and for the severe inconvenience it may have caused you and your customers.

Background

To understand the event, we first need to explain a few basics about the architecture of our datacenters. All of Joyent’s datacenters run our SmartDataCenter product, which provides centralized management of all administrative services and of the compute nodes (servers) used to host customer instances. The system is built so that the control plane, which includes both the API and boot sequences, is highly available within a single datacenter and survives any two failures. In addition to this control plane stack, every server in the datacenter runs a daemon that responds to normal, machine-generated requests for things like provisioning, upgrades, and maintenance-related changes.
