2021: The year the centralised cloud hit breaking point

CUDO Ventures

Dec 24, 2021, 8:51 AM

Hard as it may be to believe, the holiday season is already upon us. As eager as we might all be to see the back of 2021, it’s always helpful to pause and reflect on the year gone by. And when it comes to the digital world, in particular, there’s certainly a lot to think about.

2021 brought a great many exciting developments for those active in the crypto space. The technological innovations underpinning Bitcoin and other cryptocurrencies exploded into the public eye as harbingers of a true revolution in how we live, work, and interact.

Unfortunately, 2021 also gave some very clear indications that the current infrastructure of the web is simply not fit for purpose. Even setting aside all the controversies over mass data surveillance, environmental devastation, and political interference, we’ve all had to confront a fundamental issue: the internet just kept on breaking.

2021 was the year that the highly centralised and cloud-driven infrastructure of the web failed us when we needed it most. Hyper-scale cloud providers repeatedly took down essential services, often because of the tiniest and most innocuous of bugs. With many continuing to work remotely and staying socially distanced from family and friends, the pain of being cut off from key sites and applications was felt more keenly than ever.

But we are optimistic that 2021 will also be remembered as the year that an alternative took shape: a decentralised, blockchain-driven cloud computing solution powering an open and sustainable future for the web.

In this post, we’ll take a look back at the year’s most significant and disruptive cloud outages. We’ll try to gauge the impact these had on businesses and individuals, and look at some of the underlying causes. Finally, we’ll look forward to 2022, when Cudos will launch the mainnet of its decentralised cloud computing alternative with the aim of making such outages a thing of the past.

January 4th – Slack outage kicks off the working year

As the saying goes, “start as you mean to go on”. And what better way to kick off the year than with an hours-long outage for an essential workplace communication tool?

For many, January 4th was the first working day of the year – the perfect time to bring a fresh perspective to ongoing projects and make the most of all your newfound energy after some festive rest and recuperation. Unfortunately, the 12.5 million or so users relying on Slack to keep them connected while working remotely found their best intentions of little use. For more than three hours, users across the world were unable to connect to the service and send messages, preventing essential workplace interactions.

Interestingly, the cause of the outage was ultimately revealed to be a scaling problem with Slack’s cloud provider AWS, which wasn’t able to adequately respond to the sudden spike in demand caused by the return to work. Slack’s current deal with AWS was negotiated in 2018 and is due to last through 2023, committing them to an annual spend of at least $50 million. Amazon, for its part, offers Slack as an option for all its internal communications thanks to a partnership deal signed in mid-2020.

This leads to the obvious question: if one of Amazon’s biggest customers and closest partners can’t escape major outages, what chance does anyone else have?

March-April – Microsoft’s remote working solutions suffer repeated failures

Microsoft is AWS’s closest competitor in the cloud computing market – a market that, as we’ve discussed previously, is dominated by a small number of providers.

Part of Microsoft’s appeal as a provider has been its range of tools for remote working, including its cloud-based software-as-a-service (SaaS) packages under the Microsoft 365 banner and its workplace communication tool Teams. As with Slack, Microsoft saw usage of its SaaS offerings spike during the pandemic. In March 2021, they released an in-depth report reflecting on the previous twelve months, highlighting the massive increase in Teams and Office usage.

Unfortunately, that same month the limitations of its cloud offerings were plainly on display. On March 15th, a major incident rendered most of Microsoft’s cloud-based services inaccessible, including Outlook, Teams, and Office 365 – as well as the gaming service Xbox Live. The outage even managed to delay the highly anticipated virtual premiere of Zack Snyder’s Justice League.

The outage lasted around fourteen hours, meaning that for more than a full working day companies across the world were deprived of tools they had come to rely on more than ever. Needless to say, such outages have the potential to profoundly disrupt essential work, and the unpredictability only compounds the risks.

Nor was this the end of the disruptions for Microsoft customers – little more than two weeks later, on April 1st, services were once again disrupted for several hours. And the end of April saw yet more issues, with Teams suffering a three-hour global outage on the 27th.

Perhaps most concerningly, it was ultimately revealed that the March 15th outage was caused by a problem similar to that which had led to a massive outage in September 2020. That particular incident had taken down Microsoft services across the globe for more than five hours.

All this leads to the unavoidable question: is it sensible for remote working to be built around tools that are randomly rendered inoperable for hours at a time? With remote working set to become an essential part of how we organise our working lives, it’s a question we need to be taking seriously.

8th June – A single customer triggers massive Fastly outage

As the web has become fundamental to our everyday lives, the intricate architecture that keeps it operating has gradually slipped from view. The Web 2.0 era has prioritised user-friendliness and accessibility, enabling billions across the world to access everything they need on the internet without any technical know-how. The result is that many of us are unaware of the complex processes behind some of our most familiar and routine online activities.

But when some essential but obscure aspect of the web’s infrastructure suddenly breaks, the public is given a crash-course in just how intricate – and how fragile – the web really is.

In early June, this was exactly what happened when the infrastructure provider Fastly suffered a significant outage that rendered some of the most prominent websites inaccessible for nearly an hour. Amazon, Reddit, Twitch, Spotify, CNN, and the BBC were among those affected.

For many, of course, this was the first they’d heard of Fastly, which acts as a content delivery network (CDN) for many of the affected sites. Indeed, reports indicated that, as of June, Fastly was responsible for handling 10% of the world’s internet traffic.

It was ultimately discovered that this massively disruptive outage was caused by a single Fastly customer updating their settings, thereby triggering a previously undiscovered bug. This type of incident, where a minor issue leads to enormous consequences, is inherent to the highly centralised infrastructure of the web. It’s understandable that, following the Fastly outage, many analysts redoubled their warnings over the dangerous overreliance on a small number of infrastructure providers.

4th October – Facebook’s year of woe gets worse

Facebook haven’t had the most auspicious of years, to put it mildly. From being implicated in the spread of misinformation around the US election and the subsequent assault on the US Capitol to the extensive revelations of whistleblower Frances Haugen, you can’t blame them for trying to deflect attention with a major rebrand.

And on top of all their other headline-grabbing scandals, Facebook also managed to suffer one of the biggest and most disruptive outages of the year. What better way of drawing attention to our collective overreliance on their services at the worst possible time?

The outage took place on October 4th and lasted for around seven hours, rendering all Facebook’s services, including WhatsApp, Messenger, and Instagram, completely unusable. This was enormously disruptive for the billions of people across the world who rely on Facebook’s services to stay in touch with friends and family, as well as – controversially – for managing their business or governmental affairs.

The importance of Facebook is even more pronounced in the developing world. The company’s “Free Basics” program offers internet access in 65 countries across Asia, Africa, and Latin America, and that access was cut off completely during the outage. While for those in Europe and North America Facebook is just a social media platform, for others it’s their very means of access to the web. And given Facebook’s current reputation, this should be a cause for major concern.

The impact of the outage was certainly significant for Facebook, resulting in a 5% drop in share value and knocking $6 billion off Mark Zuckerberg’s net worth in a single day. But much more troubling is what the outage tells us about how much vast swathes of the global population have come to rely on the services of a company increasingly notorious for evading oversight and failing to take their responsibilities seriously.

7th December – AWS takes centre stage

Last but not least, we can turn to the overwhelming leader in the cloud computing market, AWS. With their 32% market share, it’s no surprise that issues affecting AWS can have a major impact. In fact, its outsized role in keeping the web running led Vice magazine to recently describe it as “the internet’s single biggest point of failure”.

This pointed criticism of AWS was motivated by a major outage on December 7th that took out vast swathes of the internet for multiple hours. Not only were services owned by Amazon affected, rendering customers unable to issue commands to Alexa or access their Ring doorbells, but a wide range of sites that rely on AWS for their computing needs were also impacted. The affected services ranged from major streaming platforms like Netflix and Disney+ to online games like Playerunknown’s Battlegrounds and League of Legends and apps like Venmo, Tinder, and Duolingo.

The issues caused by the AWS outage also revealed just how deeply enmeshed its services are with key parts of our everyday lives. Some US colleges had to postpone exams after realising that they could no longer access key content due to the outage. For owners of web-enabled smart tech, meanwhile, there was a whole range of unforeseen issues to contend with, from an app-controlled cat feeder that stopped dispensing food to Roomba vacuums that would no longer patrol for dirt.

And while those may seem like relatively minor issues, they offer a grim view of what the future may hold. With Amazon aggressively seeking to corner the smart home market, it’s easy to imagine a future AWS outage stopping you from opening your Amazon fridge or disabling your Astro robot’s “sentry” mode.

Of course, this was far from the first time AWS had caused major issues – in November 2020, for instance, an AWS outage took out a range of websites and apps, including image-hosting service Flickr and Adobe’s web and mobile design app Spark. Nor would it be the last – in the weeks since the December 7th outage, we’ve already seen another significant AWS issue disrupt web traffic across the eastern US.

Ultimately, the persistent issues with AWS are the clearest indication yet of the risks that the web’s highly centralised infrastructure poses. As ever more devices and services become reliant on the cloud to operate – including the smart home tech that increasing numbers of us are incorporating into our homes – the dangers of major outages will only continue to grow.

What catastrophes could we see if the much-anticipated metaverse remains reliant on the current cloud infrastructure, for instance?

The Cudos alternative: Decentralised cloud computing using blockchain tech

Thankfully, 2021 wasn’t just the year when persistent web outages finally caught the public eye. It was also the year when a more open, secure and sustainable future for the web began to take shape. Innovations from the crypto space began to diffuse into the mainstream and reveal their truly revolutionary impact – from NFTs shaking up the art world to DeFi opening up the exclusionary world of investing.

And the cloud computing market has been no exception. Here at Cudos, we have been working hard to build a decentralised, blockchain-driven alternative to the hyper-centralised cloud offerings of Amazon, Microsoft and Google. Not only is our alternative more sustainable, efficient, and secure, it also avoids the risks of the persistent, disruptive outages that we’ve seen throughout 2021. By allowing computing tasks to be distributed across a huge number of different nodes, we avoid the dangers of a single, central point of failure. If any compute providers drop off the network, others can seamlessly and smoothly take their place.
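To make the failover idea concrete, here is a minimal Python sketch of a task being handed to whichever compute provider is still reachable. It is purely illustrative and is not the Cudos scheduler: the `Node` class, the `dispatch` function, and the node names are all hypothetical, and a real network would also have to handle payment, verification, and partial failures.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical compute provider on the network."""
    name: str
    online: bool = True

    def run(self, task: str) -> str:
        # Simulate a provider that has dropped off the network.
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return f"{task} completed on {self.name}"


def dispatch(task, nodes, rng=None):
    """Try providers in random order until one completes the task."""
    rng = rng or random.Random()
    candidates = list(nodes)
    rng.shuffle(candidates)  # no single provider is always tried first
    for node in candidates:
        try:
            return node.run(task)
        except ConnectionError:
            continue  # this provider is gone; move on to the next one
    raise RuntimeError("no compute provider available")


if __name__ == "__main__":
    network = [Node("node-a", online=False), Node("node-b"), Node("node-c", online=False)]
    print(dispatch("render-job", network))
```

In this toy model the task only fails if every provider is offline at once, which is the property the paragraph above describes: losing any individual node does not take the service down.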

The Cudos testnet went live in three stages through the latter half of 2021 as part of Project Artemis, with the third stage concluding on December 20th. The fourth and final stage will launch on January 10th, with the mainnet launch following soon after on January 25th.

Our revolutionary decentralised cloud solution will soon be available for everyone. Read on below to learn how you can get involved and help make 2022 the year we break the centralised cloud monopoly.

Support our decentralised cloud solution

Building a decentralised future for the web – a future free of the kinds of outages that have plagued us all throughout the year – is a collective endeavour. The more people that take part, the better our chances of making the future of the web fairer, more open, and more secure.

To help build our decentralised cloud alternative, Cudos is in need of data centers and cloud service providers. If you can help us, please get in touch to see how we can work together.

If you’ve missed out on our latest announcements, here are some of our recent partnerships.

Lastly, if you already have your CUDOS tokens, you can make the most of them by staking them on our platform and helping to secure our network.

About CUDO Compute

CUDO Compute is a fairer cloud computing platform for everyone. It provides access to distributed resources by leveraging underutilised computing globally on idle data centre hardware. It allows users to deploy virtual machines on the world’s first democratised cloud platform, finding the optimal resources in the ideal location at the best price.

CUDO Compute aims to democratise the public cloud by delivering a more sustainable economic, environmental, and societal model for computing by empowering businesses and individuals to monetise unused resources.

Our platform allows organisations and developers to deploy, run and scale based on demands without the constraints of centralised cloud environments. As a result, we realise significant availability, proximity and cost benefits for customers by simplifying their access to a broader pool of high-powered computing and distributed resources at the edge.
