(London) Facebook said the global outage that took it and its other platforms offline earlier this week was caused by an error during routine maintenance.
Santosh Janardhan, Facebook’s vice president of infrastructure, said in a blog post that the outage of Facebook, Instagram and WhatsApp was “not caused by malicious activity, but by a mistake on our part.”
The problem arose as engineers were performing routine maintenance on Facebook’s backbone network: the computers, routers and software in its data centers around the world, along with the fiber-optic cables that connect them.
“In one of these routine maintenance tasks, a command was issued to assess the availability of global capacity, which inadvertently disrupted all connections in our backbone, effectively disconnecting Facebook data centers worldwide,” Janardhan said Tuesday.
Janardhan explained that Facebook’s systems are designed to catch such errors, but in this case a bug in the audit tool prevented it from properly stopping the command.
The change also triggered a second problem, which made matters worse by making Facebook’s servers impossible to reach even though they were still running.
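For readers who want a concrete picture of the audit step described above, the following is a minimal illustrative sketch in Python. It is not Facebook’s tool; the names `Command` and `audit` and the 10 percent threshold are hypothetical, chosen only to show the idea of blocking a command that would take down too many backbone links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A maintenance command and the backbone links it would disable (hypothetical model)."""
    name: str
    links_taken_down: frozenset


def audit(command, all_links, max_fraction=0.1):
    """Return True if the command may run, False if it must be blocked.

    The 10% default threshold is an assumption for illustration only.
    """
    if not all_links:
        # Nothing to compare against, so fail closed and block the command.
        return False
    affected = command.links_taken_down & all_links
    return len(affected) / len(all_links) <= max_fraction


# Hypothetical four-link backbone.
links = frozenset({"a-b", "b-c", "c-d", "d-a"})
safe = Command("capacity-check", frozenset())
risky = Command("capacity-check", links)

print(audit(safe, links))   # touches no links, so it is allowed
print(audit(risky, links))  # would disable every link, so it is blocked
```

In this sketch the check fails closed: if it cannot evaluate the impact, the command is blocked rather than allowed through.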
Engineers rushed to fix the problem on site, but it took some time because of additional security layers, Janardhan said. Data centers are “hard to reach, and once inside, the hardware and routers are designed to be difficult to change even when you have physical access.”
Once connectivity was restored, services were brought back gradually to avoid a surge in traffic that could have caused further disruptions.
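The gradual restart can be pictured as a staged ramp in which only a share of traffic is admitted at each step. The Python sketch below is an illustration under assumed numbers, not Facebook’s actual procedure: the function name `ramp_schedule`, the 5 percent starting share and the doubling factor are all hypothetical.

```python
def ramp_schedule(start_percent=5, factor=2, full_percent=100):
    """Return the share of traffic (in percent) to admit at each stage.

    Each stage multiplies the previous share by `factor`; the final
    stage is always capped at `full_percent`.
    """
    if start_percent <= 0 or start_percent > full_percent or factor <= 1:
        raise ValueError("need 0 < start_percent <= full_percent and factor > 1")
    stages = []
    level = start_percent
    while level < full_percent:
        stages.append(level)
        level *= factor
    stages.append(full_percent)
    return stages


print(ramp_schedule())  # [5, 10, 20, 40, 80, 100]
```

Raising the load in steps gives operators a chance to pause between stages if systems show signs of strain, rather than exposing them to full demand at once.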