Tagged in: Load Balancer
This morning around 7:20am Pacific time, several platform EC2 instances began failing and our load balancer began returning 503 errors. Ordinarily our scaling configuration would terminate and replace unhealthy instances, but for an as-yet-undetermined reason all instances became unhealthy and were not replaced. This caused a widespread outage that lasted for nearly two hours.
On August 25, 2014 there was an outage of all Stack Exchange sites (Q&A sites as well as Careers) from 7:26 pm to 7:32 pm UTC (approximately 6 minutes). The cause was an incorrect change to network firewall configuration – specifically, iptables running on our HAProxy load balancers.
Renaming this bug as it really has nothing to do with GMail specifically. To clarify: