In early December 2021, customers all over the country began experiencing outages on Disney+, Venmo, and Amazon services.
This was especially concerning for Amazon. The outage left Amazon delivery drivers unable to access their routes, and Amazon warehouses were unable to process orders for several hours until the issue was resolved.
When we see outages like this, we often fear the worst, because a ransomware attack can often be behind them. Thankfully, that was not the case this time, and we can all breathe a sigh of relief. But it leaves a question: how were all these separate services affected at the same time?
Amazon Web Services (AWS) is a cloud computing platform that companies all over the world use to host their applications, data, and networks. Large services we use daily, such as Netflix, LinkedIn, Facebook, Twitch, and many more, rely on AWS. This leads us to early December 2021, when many of these services, including Amazon, suddenly went dark. AWS traced that outage to its Northern Virginia (US-East-1) region, the same region where a Kinesis failure caused an outage in November 2020. ZDNet explained that earlier incident this way:
- While dozens of AWS services were affected, AWS says the outage occurred in its Northern Virginia, US-East-1, region. It happened after a “small addition of capacity” to its front-end fleet of Kinesis servers.
Kinesis is used by developers, as well as other AWS services like CloudWatch and Cognito authentication, to capture data, video streams, and run them through AWS machine-learning platforms.
The Kinesis service’s front-end handles authentication, throttling, and distributes workloads to its back end “workhorse” cluster via a database mechanism called sharding.
As AWS notes in a lengthy summary of the outage, the addition of capacity triggered the outage but was not the root cause of it. AWS was adding capacity for an hour after 2:44am PST, and after that, all the servers in the Kinesis front-end fleet began to exceed the maximum number of threads allowed by its current operating system configuration.
- Tung, L. (2020, November 30). Amazon: Here is what caused the major AWS outage last week. ZDNet. Retrieved December 29, 2021, from https://www.zdnet.com/article/amazon-heres-what-caused-major-aws-outage-last-week-apologies/
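The quote says that adding capacity pushed every front-end server past its operating system's thread limit, but not why a small addition could do that. AWS's own post-event summary explained that each Kinesis front-end server created operating-system threads for each of the other servers in the fleet, so growing the fleet raised the thread count on every existing server at once. The sketch below models that arithmetic under a simplifying assumption of one thread per peer; `BASE_THREADS`, `OS_THREAD_LIMIT`, and the fleet sizes are made-up illustrative numbers, not AWS's real values.

```python
# Illustrative model of the Kinesis front-end failure mode quoted above.
# Simplifying assumption: each server runs one thread per *other* server in
# the fleet, plus a fixed number of threads for its own work.
# All numbers are hypothetical, chosen only to show the shape of the problem.

BASE_THREADS = 500        # hypothetical threads a server needs for its own work
OS_THREAD_LIMIT = 10_000  # hypothetical limit set by the OS configuration


def threads_per_server(fleet_size: int) -> int:
    """Threads on one server: one per peer, plus its baseline workload."""
    return BASE_THREADS + (fleet_size - 1)


def max_safe_fleet_size() -> int:
    """Largest fleet for which every server stays within the OS thread limit."""
    # Solve BASE_THREADS + (n - 1) <= OS_THREAD_LIMIT for n.
    return OS_THREAD_LIMIT - BASE_THREADS + 1


def check_capacity_change(current: int, added: int) -> None:
    """Report the per-server thread count before and after adding servers."""
    before = threads_per_server(current)
    after = threads_per_server(current + added)
    status = "OK" if after <= OS_THREAD_LIMIT else "EXCEEDS LIMIT"
    print(f"{current} -> {current + added} servers: "
          f"{before} -> {after} threads per server ({status})")


if __name__ == "__main__":
    print("Max safe fleet size:", max_safe_fleet_size())
    # A small addition of capacity is harmless while the fleet is far from the limit...
    check_capacity_change(current=5_000, added=100)
    # ...but the same small addition near the limit pushes *every* server over at once.
    check_capacity_change(current=9_450, added=100)
```

Because the thread count grows on every server rather than only on the new ones, the failure is fleet-wide, which matches AWS's point that the capacity addition was the trigger rather than the root cause.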
Although the outage was not a crippling ransomware attack, it still left many large services in the dark while AWS scrambled to fix the issue. Notably, this is not the first time AWS has suffered an outage like this.
Another AWS outage, in February 2017, left a substantial portion of the internet in the dark for a period. It struck the same region of the country, Northern Virginia, as this year's outage.
- Unless you were completely off the grid on February 28th, 2017, you likely noticed Amazon S3 suffered a big outage that affected pretty much all AWS services. This happened in its biggest region, N. Virginia, and it impacted a substantial portion of the internet.
- Takeaways from the S3 outage on February 28th, 2017. (n.d.). Concurrency Labs. Retrieved December 29, 2021, from https://www.concurrencylabs.com/blog/s3-outage-takeaways/
The November 2020 outage also left businesses unable to access their networks. One software company described the morning it began:
Early in the morning on November 25th, 2020 – the day before Thanksgiving – reports started circulating, claiming various issues with popular consumer applications and sites. It started off as intermittent availability problems. Before long, it escalated to full-blown unavailability. We turned to Amazon CloudWatch, a monitoring and management service that provides data and actionable insights for applications and resources. But CloudWatch was not loading. And then the realization hit. This was not an issue we could resolve.
- DiStasio, A. (2021, January 7). How to Protect Your Application Against an AWS Outage. Gavant Software. Retrieved December 30, 2021, from https://www.gavant.com/library/how-to-protect-your-application-against-an-aws-outage/
So many of our systems depend on cloud platforms, whether AWS or one like it, that we cannot avoid them; we need them to do business ourselves.
However, when one provider leads the pack with a large share of businesses under its belt, a failure in that platform can leave millions of businesses in the dark at once.
What Can We Do?
There are a few ways to make sure your business is covered if you rely on a large cloud platform. A platform like that is an incredible boost for your company, but you also want to be sure you can keep running your operations smoothly if a major outage ever occurs.
Multicloud refers to the use of more than one cloud from different vendors at the same time. A multicloud environment allows your clouds to be private, public, or a combination of both. The primary goal of multicloud is to give you flexibility to operate in the best environment for your specific needs.
For example, you can have customer data in a private data center to follow compliance rules while having your website and app on public clouds [from other vendors] to increase vendor flexibility and maintain good latency.
- What Is Multicloud? (n.d.). Google Cloud. Retrieved December 31, 2021, from https://cloud.google.com/learn/what-is-multicloud
Multicloud can be a healthy way to balance your systems. Splitting your data between two different providers means that if one suffers an outage, you will not lose functionality entirely.
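Splitting data between providers only helps if each provider holds a usable copy and your application knows when to switch. Below is a minimal sketch of client-side failover, assuming the same data has already been replicated to two vendors. The provider names and the `primary_cloud` and `secondary_cloud` functions are hypothetical stand-ins for real vendor SDK calls, and the outage is simulated.

```python
# Minimal client-side failover sketch across two cloud providers.
# Hypothetical: provider names and fetch functions stand in for calls to each
# vendor's SDK or to a service you host there. The outage below is simulated.

from typing import Callable, Dict, List, Tuple


class ProviderDown(Exception):
    """Raised by a provider stub when its service is unavailable."""


def fetch_with_failover(
    key: str,
    providers: List[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, value) from the first that answers."""
    errors: Dict[str, str] = {}
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except ProviderDown as exc:
            errors[name] = str(exc)  # remember why this provider failed, then try the next one
    raise RuntimeError(f"All providers failed: {errors}")


# --- Hypothetical providers holding replicated copies of the same data ---
REPLICA = {"order-1001": "shipped"}


def primary_cloud(key: str) -> str:
    # Simulate a regional outage at the primary vendor.
    raise ProviderDown("primary region unavailable")


def secondary_cloud(key: str) -> str:
    return REPLICA[key]


if __name__ == "__main__":
    used, value = fetch_with_failover(
        "order-1001",
        [("primary-cloud", primary_cloud), ("secondary-cloud", secondary_cloud)],
    )
    print(f"Served by {used}: {value}")  # Served by secondary-cloud: shipped
```

The sketch fails over only on a provider outage (`ProviderDown`), not on every error, so a record that is genuinely missing is not hidden by silently trying the other cloud.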
Keeping a local backup of your most vital data can help keep you from losing access to it. However, onsite backups run the risk of being more vulnerable. Your best course of action, then, would be to hire a Managed Service Provider (MSP).
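A backup that has never been checked may not restore when you need it. This minimal sketch, using only Python's standard library, copies a folder to a local backup location and compares SHA-256 hashes of every file. The folder and file names are hypothetical, and the sketch does nothing about the separate risk that an onsite copy can be lost along with the original.

```python
# Minimal sketch of a verified local backup: copy a folder, then confirm that
# each copied file's SHA-256 hash matches the original.
# Hypothetical: the folder and file names in the demo; point them at your own data.

import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, destination: Path) -> List[str]:
    """Copy source into destination; return the files whose copies are missing or differ."""
    shutil.copytree(source, destination, dirs_exist_ok=True)  # Python 3.8+
    failures = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = destination / original.relative_to(source)
            if not copy.is_file() or sha256_of(copy) != sha256_of(original):
                failures.append(str(original))
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "vital-data"    # hypothetical data folder
        dst = Path(tmp) / "local-backup"  # hypothetical backup location
        (src / "accounts").mkdir(parents=True)
        (src / "accounts" / "invoices.csv").write_text("id,amount\n1,100\n")
        problems = backup_and_verify(src, dst)
        print("Backup verified" if not problems else f"Mismatched files: {problems}")
```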
A Managed Service Provider is a company that manages IT infrastructure on behalf of businesses and organizations. MSPs manage firewalls, servers, and routers on a subscription basis. Pricing varies with your company's needs, since each business has a unique setup. Hiring an MSP is becoming more vital for businesses of all shapes and sizes as we navigate outages, hackers, ransomware, and data takeovers.
Preparing yourself and your business for breaches and outages is the most effective way to keep your company running. Are your most important logins backed up? Do you have a designated point person in case you have a family emergency and cannot answer your phone? Here are a few ways to create a plan.
Cyber Security Plan.
How many computers do you have? How many operating systems? How many phones are connected to your system? Do you have a physical security system? Looking at your infrastructure, do you have Wi-Fi, and is it secure? What network devices do you have? Bringing in a company that specializes in exactly this line of work is your best way to make sure your doors are locked tight.
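Answering the questions above starts with a written inventory. The sketch below shows one simple way to record devices and summarize them by type and operating system while flagging known gaps. The `Device` fields, device names, and issue labels are hypothetical examples, not a standard schema or any particular asset-management tool's format.

```python
# Minimal sketch of a device inventory for a small business.
# Hypothetical: the Device fields, device names, and issue labels are examples only.

from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    name: str
    kind: str                 # e.g. "computer", "phone", "network", "camera"
    operating_system: str
    issues: List[str] = field(default_factory=list)  # known gaps, e.g. "open Wi-Fi"


def summarize(devices: List[Device]) -> Dict[str, object]:
    """Count devices by type and OS, and map each flagged device to its known issues."""
    return {
        "by_kind": Counter(d.kind for d in devices),
        "by_os": Counter(d.operating_system for d in devices),
        "needs_review": {d.name: d.issues for d in devices if d.issues},
    }


if __name__ == "__main__":
    inventory = [
        Device("front-desk-pc", "computer", "Windows 11"),
        Device("owner-laptop", "computer", "macOS"),
        Device("sales-phone", "phone", "iOS"),
        Device("office-router", "network", "vendor firmware", ["open Wi-Fi"]),
    ]
    report = summarize(inventory)
    print("Devices by type:", dict(report["by_kind"]))
    print("Operating systems:", dict(report["by_os"]))
    print("Needs review:", report["needs_review"])
```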
Breach/Outage Plan.
What will you do in the event of a breach? You want to determine what was breached, what was done in the system during the breach, and whether any of your customers' systems were breached as well. Hackers often breach one company with the intent of reaching another: they may not have been after you, but after one of your customers.
A trustworthy MSP can help you put best practices in place so your company keeps moving forward even during an outage of a cloud-based platform. If you would like to learn more about Astoria and how we can help with your MSP needs, visit our website today at www.trustaastoria.com.