On Monday 7 December 2015, Google Container Engine customers could not
create external load balancers for their services for 21 hours and 38
minutes. If your service or application was affected, we apologize. This is
not the level of quality and reliability we strive to offer you, and we have
taken and are taking immediate steps to improve the platform’s performance
and availability.
DETAILED DESCRIPTION OF IMPACT:
From Monday 7 December 2015 15:00 PST to Tuesday 8 December 2015 12:38 PST, Google Container Engine customers could not create external load balancers for their services. Affected customers saw HTTP 400 “invalid argument” errors when creating load balancers in their Container Engine clusters. 6.7% of clusters experienced API errors due to this issue.
The issue also affected customers who deployed Kubernetes clusters in the
Google Compute Engine environment.
The issue was confined to Google Container Engine and Kubernetes, with no
effect on users of any other resource based on Google Compute Engine.
ROOT CAUSE:
Google Container Engine uses the Google Compute Engine API to manage computational resources. At about 15:00 PST on Monday 7 December, a minor update to the Compute Engine API inadvertently changed the case-sensitivity of the “sessionAffinity” enum variable in the target pool definition, and this variation was not covered by testing. Google Container Engine was not aware of this change and sent requests with incompatible case, causing the Compute Engine API to return an error status.
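
The failure mode described above can be illustrated with a minimal sketch. It is not the actual Compute Engine or Container Engine code: the function names, the request shape, and the mixed-case value "None" are assumptions chosen for illustration, and the allowed values are used here only as examples. The sketch shows how a validator that stops normalizing case begins rejecting a request that an earlier, lenient validator accepted.

```python
# Hypothetical sketch only: function names, the request shape, and the
# mixed-case client value are illustrative assumptions, not Google's code.

ALLOWED_SESSION_AFFINITY = {"NONE", "CLIENT_IP", "CLIENT_IP_PROTO"}


def validate_lenient(value: str) -> str:
    """Accept any casing by normalizing to upper case before the lookup."""
    normalized = value.upper()
    if normalized not in ALLOWED_SESSION_AFFINITY:
        raise ValueError(f"invalid argument: sessionAffinity={value!r}")
    return normalized


def validate_strict(value: str) -> str:
    """Accept only the exact casing listed in the allowed set."""
    if value not in ALLOWED_SESSION_AFFINITY:
        raise ValueError(f"invalid argument: sessionAffinity={value!r}")
    return value


def build_target_pool_request(affinity: str) -> dict:
    """Build a request body the way a hypothetical client might."""
    return {"name": "example-pool", "sessionAffinity": affinity}


if __name__ == "__main__":
    # Assumed client behavior: the enum is sent in mixed case.
    request = build_target_pool_request("None")

    # The lenient validator accepts the request.
    print(validate_lenient(request["sessionAffinity"]))

    # The strict validator rejects the same request.
    try:
        validate_strict(request["sessionAffinity"])
    except ValueError as err:
        print(err)
```

A compatibility test that replays representative client requests against both the old and the new validator would flag this kind of difference before release.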
REMEDIATION AND PREVENTION:
Google engineers re-enabled load balancer creation by rolling back the
Google Compute Engine API to its previous version. The rollback was
complete by 12:38 PST on 8 December 2015.
At 10:00 PST on 8 December 2015, Google engineers committed a fix to the
Kubernetes public open source repository.
Google engineers will increase the coverage of the Container Engine
continuous integration system to detect compatibility issues of this kind.
In addition, Google engineers will change the release process of the
Compute Engine API to detect issues earlier to minimize potential negative