As reported by GeekWire, the digital snow day is over: Amazon Web Services has fixed the issues with its Simple Storage Service, or S3 for short, that crippled significant chunks of the internet Tuesday.
Beginning a little after 9:30 a.m. Pacific time Tuesday and lasting close to five hours, the S3 cloud storage service experienced “high error rates.”
The outage knocked out access to a litany of websites and apps that run on AWS, including but not limited to Expedia, Slack, Medium and the U.S. Securities and Exchange Commission.
The outage even temporarily affected the AWS service health dashboard, which displays outages and events.
Amazon has not fully detailed what caused the high error rates.
For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.
— Amazon Web Services (@awscloud) February 28, 2017
Nick Kephart, senior director of product marketing for San Francisco-based network intelligence company ThousandEyes, monitored the outage throughout the day.
He said information could get into Amazon’s overall network, but attempting to establish a network connection with the S3 servers was like hitting a wall.
It stopped all traffic dead in its tracks. So any site or app that hosted data, images or other information on S3 was affected (including Supply Chain 24/7).
Without having access to Amazon’s servers, Kephart couldn’t say why it became impossible to connect with the S3 servers.
He said it isn’t clear whether human error, an infrastructure failure, a configuration problem or an automation issue caused the outage.
But he theorized it was a pretty complicated malfunction given the proliferation of the outage.
“It wasn’t just the system completely misbehaving but something deeper in the infrastructure that caused these problems,” Kephart said.
ThousandEyes also produced this visualization to show the extent of the outage and all the interactions within the AWS network.
As to why the outage was so widespread, Amazon’s status as cloud king, with a market share of more than 40 percent, comes into play. Another factor, Kephart said, is the way AWS services are built on top of each other, meaning that S3 going down affects other services.
“Amazon Web Services builds many of their individual services on building blocks built on each other,” Kephart said.
“S3 is one of the very fundamental building blocks of AWS. When S3 fails, many, many, many other services fail alongside because they are all built on top of S3.”
Now that the issues have been worked out, the question turns to what can be learned from this outage. Several experts surveyed by GeekWire say the most important takeaway from this event is the necessity of redundancy in cloud storage.
Shawn Moore, CTO of Orlando-based web experience platform Solodev, said all technology fails at some point. Large swaths of the internet went down in Tuesday’s outage, but other sites and apps didn’t experience any disruption.
Those are the ones that had their data spread across multiple regions.
“The ones who have fully embraced Amazon’s design philosophy to have their website data distributed across multiple regions were prepared,” Moore said.
“This is a wakeup call for those hosted on AWS and other providers to take a deeper look at how their infrastructure is set up and emphasizes the need for redundancy - a capability that AWS offers, but it’s now being revealed how few were actually using.”
David Linthicum is senior vice president at Cloud Technology Partners, a company based in Boston that helps enterprises migrate their data to cloud storage providers like AWS, Microsoft Azure and Google Cloud. He said the outage seems like an isolated incident, something that is bound to happen occasionally.
“Systems fail, and from time to time clouds will fail,” he said. “Amazon’s ability to get things up and running quickly, and get back to business, will be the real test.”
Linthicum went on to say that he doesn’t think Tuesday’s outage will keep people from using cloud storage.
“Amazon Web Services, and the other public cloud providers, pretty much stay on top of their operations,” he said. “Certainly much better than enterprises do.”
In addition to pushing redundancy and hosting data at multiple centers in different regions, experts emphasized using multiple cloud providers to store data. Not only does that protect customers from a system-wide outage, it can also let users switch between providers as cost dictates.
Akash Nankani, a former lead program manager at Microsoft, founder of NanSoft Studios and creator of the government filing tracking site SECGems, said he tries to make his products “provider agnostic,” so that if an incident like Tuesday’s AWS outage went on for a long time, he could make a quick change to remove the AWS dependency.
“In my view, every business should ask this question to themselves: ‘If tomorrow, for whatever reason (valid or invalid), if Amazon (or any other provider that you depend on) decides to ban/blacklist my account or business, how will I deal with it? How soon before I can recover from it? And have I pro-actively tested this scenario before it occurs?’”
“While I have a great deal of respect for Amazon/Microsoft/Google/IBM Bluemix/OVH, etc. and have used/experimented with all of them, from a business continuity perspective, I think investing in ‘multi-provider’ support is more important than ‘multi-region.’ This also comes with the benefit of dynamically switching to lowest cost provider as well as dealing with provider/regional outage.”
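For readers who want a concrete picture of the “provider agnostic” approach Nankani describes, the following is a minimal Python sketch, not his implementation or any vendor’s API. It assumes every storage provider is wrapped in an adapter exposing the same two methods, so an application can write to several providers and read from whichever one is still reachable. All names here (`ObjectStore`, `InMemoryStore`, `RedundantStore`, `StoreUnavailable`) are hypothetical, and the in-memory class merely stands in for a real adapter around AWS, Azure, Google Cloud or another provider.

```python
from typing import Dict, List, Protocol


class StoreUnavailable(Exception):
    """Raised by an adapter when its provider cannot be reached."""


class ObjectStore(Protocol):
    """Hypothetical minimal interface each provider adapter implements."""

    name: str

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a real provider adapter; `available` simulates an outage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise StoreUnavailable(self.name)
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise StoreUnavailable(self.name)
        return self._objects[key]


class RedundantStore:
    """Writes to every provider and reads from the first one that answers."""

    def __init__(self, stores: List[ObjectStore]) -> None:
        if not stores:
            raise ValueError("at least one store is required")
        self.stores = stores

    def put(self, key: str, data: bytes) -> List[str]:
        """Store the object everywhere possible; return the providers that accepted it."""
        written = []
        for store in self.stores:
            try:
                store.put(key, data)
                written.append(store.name)
            except StoreUnavailable:
                continue  # skip the provider that is down, keep going
        if not written:
            raise StoreUnavailable("no provider accepted the write")
        return written

    def get(self, key: str) -> bytes:
        """Return the object from the first provider that is up and has it."""
        for store in self.stores:
            try:
                return store.get(key)
            except (StoreUnavailable, KeyError):
                continue  # fall through to the next provider
        raise StoreUnavailable(f"no provider could serve {key!r}")


if __name__ == "__main__":
    primary = InMemoryStore("provider-a")
    secondary = InMemoryStore("provider-b")
    store = RedundantStore([primary, secondary])

    store.put("logo.png", b"image-bytes")
    primary.available = False      # simulate an outage at the first provider
    print(store.get("logo.png"))   # still served, now by the second provider
```

The same wrapper covers the multi-region setup Moore describes if each adapter points at a different region of one provider rather than at a different provider. A production version would also need to handle objects written while one provider was down, which this sketch does not attempt.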
Source: GeekWire