The biggest service outages of 2021

We never get tired of saying it: the advantages of the cloud are enormous... until something goes wrong. For years now, a service outage has no longer affected just one company. It also hits that company’s customers, whether they are end consumers or other businesses that, as a consequence, stop providing service to their own users.

And although contingency planning is making this type of event less frequent, a human error or a software update that does not work as it should can still leave “half the Internet” without service for hours. 2021 was no exception: over the last 12 months, companies such as Meta, Akamai, Microsoft and AWS have starred in some of the most notorious interruptions.

Meta (October 4 and April 8)

The honor for the biggest incident of 2021 goes to Facebook, which was down for more than seven hours on October 4.

The incident affected all of the multinational’s services, including Facebook, Instagram, WhatsApp and Oculus, as well as the companies that use Facebook Connect to authenticate users in their own services.

The issue occurred when a routine maintenance job went wrong and a number of servers were affected. The incident not only affected the company’s public services, but also the tools that employees use to manage those services. As a result, company workers had to physically enter the various data centers to manually reboot systems.

The company also suffered a shorter outage on April 8, which lasted 40 minutes.

AWS (December 7)

AWS is not only one of the public cloud services most trusted by businesses; it is also the foundation on which many other companies build the services they offer their customers.

So much so that the outage on December 7 disrupted devices from brands such as Roomba and Ring, as well as streaming services such as Disney+ and Netflix, because of the problems that the AWS EC2 service began to show in the data centers on the East Coast of the United States.

The outage highlighted the need for companies to monitor the health of all APIs that are part of their applications and help deliver services to their customers.
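As a rough illustration of what that kind of monitoring can look like, here is a minimal sketch that polls every API an application depends on and reports which ones are unhealthy. The function names, the `HealthResult` type and the idea of treating any non-2xx response as unhealthy are all assumptions made for this example; they are not taken from AWS or from any of the affected companies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import urllib.error
import urllib.request


@dataclass
class HealthResult:
    name: str      # hypothetical label for the dependency
    healthy: bool  # True only for a 2xx response
    detail: str    # short human-readable explanation


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep the status code.
        return err.code


def check_dependencies(
    endpoints: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> List[HealthResult]:
    """Probe each endpoint once and classify it as healthy or not."""
    results = []
    for name, url in endpoints.items():
        try:
            status = fetch(url)
            healthy = 200 <= status < 300
            detail = f"HTTP {status}"
        except Exception as exc:
            # Timeouts, DNS failures and refused connections end up here.
            healthy = False
            detail = f"unreachable: {exc}"
        results.append(HealthResult(name, healthy, detail))
    return results
```

Calling `check_dependencies({"payments": "https://payments.example/health"})` (a placeholder URL) returns one `HealthResult` per dependency. In practice a check like this would run on a schedule and feed an alerting system, so that a failure in a third-party API is noticed before customers report it.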

Fastly (June 8)

Although it holds a market share of only 4% (compared with more than 39% for Cloudflare and 24% for Amazon CloudFront), Fastly is one of the CDNs (content delivery networks) that are essential to the proper functioning of the Internet, bearing in mind that it serves more than 100,000 companies, including Reddit, The New York Times, eBay and even Amazon itself.

It is therefore not surprising that the outage this CDN suffered on June 8 made headlines around the world, with services unavailable for over an hour, even when users were redirected to alternate servers.

In a public statement, the company explained that a software deployment had introduced a bug that could be triggered by a specific customer configuration under certain circumstances. As bad luck would have it, earlier that day one of its customers unknowingly pushed a valid configuration change that met those exact circumstances, causing 85% of Fastly’s network to return errors.

Akamai (June 16 and July 22)

Akamai offers CDN services to thousands of companies around the world and, in terms of both market share and number of customers, is very similar to Fastly. In recent months, the company suffered two incidents worth noting.

The first occurred on June 16, when Prolexic Routed, the company’s DDoS mitigation service, was interrupted by a BGP routing problem that ended up affecting several of the company’s clients.

The second and more serious incident occurred a month later, on July 22, when the affected service was Akamai Edge DNS: the very DNS service that directs the company’s clients to its CDN was out of operation for more than an hour.

Among the companies affected were big names such as Steam, American Airlines, Fox News and HSBC.

Microsoft Azure Active Directory (December 15)

The last of the major service interruptions of 2021 had Microsoft as its protagonist, owing to problems with its Azure Active Directory service on December 15.

Active Directory was down for an hour and a half, preventing users from signing in to Microsoft services like Office 365.

