AWS Outages Continue, Affecting Web-Based Services

On Monday, Amazon’s cloud computing arm, Amazon Web Services (AWS), reported a widespread outage that disrupted access to thousands of websites, including OpenAI’s ChatGPT, Reddit, and Lyft.

The problems persisted into the afternoon, disrupting businesses around the world, including food and beverage companies that rely on AWS.

Other affected services included Amazon.com, Prime Video, Alexa, the company’s ad servers, and some Google services, such as Gmail and YouTube. Downdetector, a website that tracks user reports of non-functioning sites and services, logged as many as 50,000 reports, likely related to the outage. Reports specific to AWS were closer to 9,600.

The problem began at around 3 a.m. EDT. Roughly 12 hours later, at 3:15 p.m. EDT, Amazon said that although service had not fully returned, it was observing “decreasing networking connectivity issues” across multiple regions.

The errors may have stemmed from a Domain Name System (DNS) problem. DNS translates human-readable domain names into the numerical IP addresses that computers use to reach a server, so when it fails, requests to connect to a specific service cannot be routed and return an error instead. Amazon has not yet specified the root cause.

TechRadar noted that an “operational issue” at one of Amazon’s biggest data centers, in Northern Virginia, could be the culprit.

Users continued to report problems with AWS services throughout the day after the initial outage. Mike Chapple, an IT professor at the University of Notre Dame, told Mashable that this was to be expected.

“While this is disruptive, it isn’t unusual. The process of fixing a serious IT infrastructure issue often creates new problems, and fixes often need to be rolled out across a large number of systems over time,” Chapple said. “As engineers work to steady the system, operations slowly stabilize and things return to normal. Think of it like a utility outage that occurs in a large city. The power might flicker on and off a few times as repair crews do their work.”

In a statement to CNN, Mehdi Daoudi, CEO of internet performance monitoring firm Catchpoint, estimated that the cost of the disruption could run into the “hundreds of billions.”

He pointed to lost productivity from millions of workers unable to do their jobs, along with delayed business operations, as the basis for the estimate.

“The incident highlights the complexity and fragility of the internet, as well as how much every aspect of our work depends on the internet to work,” Daoudi said.