AWS Frankfurt experiences major incident that staff couldn’t fix due to ‘environmental conditions’ on data centre floor
A single Availability Zone in Amazon Web Services’ EU-CENTRAL-1 Region has experienced a major incident.
The company’s status page says the incident began at 1:24PM PDT (8:24PM UTC) on June 10th and initially caused “connectivity issues for some EC2 instances”.
Half an hour later AWS reported “increased API error rates and latencies for the EC2 APIs and connectivity issues for instances … caused by an increase in ambient temperature within a subsection of the affected Availability Zone.”
By 2:36PM PDT AWS said temperatures were falling but that network connectivity remained down.
But an hour later, the cloud colossus offered the following rather unsettling assessment:
A 4:12PM update reported that staff were still unable to enter the site for safety reasons.
At 4:33PM network services were restored, an event AWS said should lead to swift resumption of EC2 instances. A 5:19PM update stated: “environmental conditions within the affected Availability Zone have now returned to normal level” and advised users that “The vast majority of affected EC2 instances have now fully recovered but we’re continuing to work through some EBS volumes that continue to experience degraded performance.”
Kinesis Data Streams, Kinesis Firehose, Amazon Relational Database Service, and AWS CloudFormation also wobbled.
AWS’ most recent status update concluded: “We will provide further details on the root cause in a subsequent post, but can confirm that there was no fire within the facility.”
Which leaves the question of just what made the data centre too dangerous to enter?
The whole point of hypoxic gas release into data centres is to deprive fires of oxygen. And as humans need oxygen, it can be a while before engineers can return to a data centre.
The Register mentions this as it fits both the facts offered in this incident and AWS’ language about “environmental conditions” preventing entry.
We will update this story if new information about this incident comes to hand. ®
via The Register https://ift.tt/3gv0n1Y
June 10, 2021 at 06:07PM