Tagged in: CircleCI

Thoughts Evoked By CircleCI’s July 2015 Outage

An interesting article analyzing the recent post-mortem from the CircleCI team. It makes some good points about using an RDBMS as a queueing system…


After a bit of downtime, CircleCI’s team was kind enough to publish a very detailed post-mortem. I’m a post-mortem junkie, so I always appreciate it when companies are honest enough to openly discuss what went wrong.

I also greatly enjoy analyzing these things, especially through the complex systems lens. Each one of these posts is an opportunity to learn and to reinforce otherwise abstract concepts.

NOTE: This post is NOT about what the CircleCI team should or shouldn’t have done – hindsight is always 20/20, complex systems are difficult, and hidden interactions really are hidden. Everyone’s infrastructure is full of traps like the one that ensnared them, and some days you just land on the wrong square. Basically, that post-mortem made me think of stuff, so here is that stuff. Nothing more.

Linux build queue backing up – CircleCI

Incident Report for CircleCI

CircleCI is a platform for continuous integration and continuous delivery. We take care of all the low-level details so that you have the simplest, fastest continuous integration and deployment possible.

We are sincerely sorry for the outage that prevented builds from running late Wednesday and early Thursday. We know you rely on us to deploy, and that downtime is painful for you and your customers. We take our responsibility to you very seriously, and we’re sorry we let you down.

Here’s what happened, what we learned, and what actions we’re taking to prevent this from happening again:

What We Saw
DB performance issue – CircleCI

CircleCI is a platform for continuous delivery. This means (among other things) we’re building serious distributed systems: hundreds of servers managing thousands of containers, coordinating between all the moving parts, and taking care of all the low-level details so that you have the simplest, fastest continuous integration and deployment possible.

Last Tuesday, we experienced a severe and lengthy downtime, during which our build queue was at a complete standstill. The entire company scrambled into firefighting mode to get the queue unlocked and customer builds moving again. Here’s what happened….
