Network Issues within London Datacenter (Linode)

Incident Report

On January 31, 2015, from approximately 2:05 AM to 3:30 AM EST, a subset of Linodes in London experienced packet loss on all of their network traffic. Network and operations engineers were contacted immediately. Troubleshooting began approximately 20 minutes after the disruption started, once Linode’s engineers had been briefed on the symptoms and the facts known at the time.

Engineers identified an older-generation switch with a malfunctioning transceiver module. Under normal conditions, the full 1+1 hardware redundancy within the London network fabric would have isolated this failure without any functional impact. However, this transceiver module had not failed completely. Instead, it was experiencing severe voltage fluctuations that caused its link to ‘flap’ (go up and down) erratically. We believe this partial failure allowed the switch to confuse and bypass our network’s normal loop-prevention mechanism, the Rapid Spanning Tree Protocol (RSTP), which let a network loop form.

To fully mitigate the loop, engineers shut down the line card in the core router to which this switch port was connected and administratively disabled the malfunctioning switch port. Network conditions in the London datacenter recovered immediately afterward.

The malfunctioning transceiver module has been replaced, and full network connectivity has been restored. We apologize for any impact that this network disruption has caused.
