This past January, the media content delivery service Netflix announced that they had finished migrating their entire video streaming service to Amazon Web Services' cloud. The massive, seven-year data migration had previously been scheduled to finish in the summer of 2015, but according to a report published on Ars Technica last February, Netflix decided to rebuild nearly all of its software from the ground up. The reason behind such a bold and labor-intensive undertaking was that, once Netflix had settled on the idea of cloud migration, they realized they needed to optimize their platform to “take advantage of a cloud network that ‘allows one to build highly reliable services out of fundamentally unreliable but redundant components.’”
The Netflix cloud migration stands as another example of converged infrastructure, an operational model that more and more companies are turning to as a way of minimizing disruption while storing, processing, and analyzing massive amounts of data on a daily basis. How much data are we talking about here? Sandvine’s most recent Global Internet Phenomena Report credits real-time entertainment streaming with 70 percent of North American downstream traffic during peak hours. Five years ago that number was 35 percent. Today, Netflix carries about half of that 70 percent reported by Sandvine. In other words, roughly 35 percent, or more than a third, of all North American fixed Internet traffic during peak hours comes from Netflix users.
Netflix launched its video streaming service in 2007. If you do the math, you might realize that the vision for cloud migration must have been born shortly after that. Then you might wonder how they could have been so prescient in their business decisions. In fact, it was an outage in their DVD mailing service that sparked the decision. According to the Ars Technica piece:
“For three days in August 2008, Netflix couldn’t ship DVDs to customers because of a major database corruption…Netflix knew it would be even worse if something like that happened to the streaming product.”
What Does This Mean for Other Businesses?
For one, it means businesses now have a major case study in cloud migration and converged infrastructure to pay attention to. Netflix’s seven-year project includes several systems for dealing with contingencies in the cloud, Chaos Monkey among them. Netflix’s move to AWS enables them to build huge clusters of servers and storage without operating their own data centers, but it doesn't insulate them from failure. This is a common concern with cloud migration. Operating services on somebody else’s hardware means having to think about what happens both when the rented services suffer outages and when your own software causes downtime. Netflix’s Chaos Monkey stages failure drills in which virtual machines are randomly taken offline to see how well the system stands up to failure. Another Netflix self-checking tool, Chaos Kong, simulates outages of entire Amazon regions to gauge how well the system responds when traffic has to shift to other regions.
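To make the idea concrete, here is a minimal toy sketch in Python of the two kinds of drill described above. It is not Netflix's code and does not use any Netflix or AWS API. The Instance and Cluster classes, the region names, and the min_healthy threshold are all made up for illustration. The sketch only shows the principle: knock out random machines, or a whole region, and check whether enough capacity is left to keep serving.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical virtual machine; not a real AWS object."""
    name: str
    region: str
    healthy: bool = True


@dataclass
class Cluster:
    """A toy cluster that counts as 'serving' while enough VMs are healthy."""
    instances: list
    min_healthy: int  # assumed capacity threshold, chosen for illustration

    def healthy_instances(self):
        return [i for i in self.instances if i.healthy]

    def is_serving(self):
        return len(self.healthy_instances()) >= self.min_healthy


def chaos_monkey(cluster, rng, kills=1):
    """Take up to `kills` randomly chosen healthy instances offline."""
    candidates = cluster.healthy_instances()
    victims = rng.sample(candidates, min(kills, len(candidates)))
    for victim in victims:
        victim.healthy = False
    return victims


def chaos_kong(cluster, region):
    """Take every healthy instance in one region offline."""
    victims = [i for i in cluster.instances if i.region == region and i.healthy]
    for victim in victims:
        victim.healthy = False
    return victims


def build_cluster():
    """Three made-up regions with three VMs each; four VMs are needed to serve."""
    regions = ["region-a", "region-b", "region-c"]
    instances = [Instance(f"{r}-vm{n}", r) for r in regions for n in range(3)]
    return Cluster(instances, min_healthy=4)


if __name__ == "__main__":
    cluster = build_cluster()
    rng = random.Random(0)  # seeded so a drill can be replayed

    killed = chaos_monkey(cluster, rng, kills=2)
    print("Random kills:", [v.name for v in killed])
    print("Still serving?", cluster.is_serving())

    chaos_kong(cluster, "region-a")
    print("After losing region-a, still serving?", cluster.is_serving())
```

In this toy setup the cluster survives two random kills and the loss of one region, but losing a second region would drop it below the assumed threshold. That is the kind of weakness a drill like this is meant to expose before a real outage does.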
The list goes on. There are contingencies for natural disasters and massive security breaches in the Amazon cloud. And the real boon to businesses in all this is that Netflix has released much of their new software as open source. There are already real-world examples of some of the hypothetical scenarios Netflix has planned for. In 2011, for example, Amazon customers like Reddit, Foursquare, and Quora suffered downtime due to an Amazon outage. Pinterest and Netflix have also suffered downtime related to other outages on AWS. Because so many businesses have at least some stake in developing reliable cloud networks, Netflix told Ars Technica that their decision to release their software as open source was driven by a shared interest in building up the cloud computing community.