March 15, 2010
Editor’s Note: Be warned, this article is a TOTAL GEEK FEST! Proceed at your own risk.
Now that that’s out of the way, let me tell you about a terrific session I attended early on Sunday morning titled “Beyond LAMP: Scaling Websites Past MySQL.” By trade, I am a systems engineer and do a lot of geeky computer stuff, so this panel was right up my alley. The panel was a great mix of engineers from some of the heavyweights, including Serkan Piantino of Facebook Inc., Alan Schaaf of Imgur LLC, Christopher Slowe of Reddit, and Jason Kincaid of TechCrunch.
As the SXSW website noted, the panel was all about this issue:
“Most startups begin with a basic LAMP stack (on PHP or Python) and then add database replication and memcache as they grow. But then what? There’s a big gap between these out-of-the-box solutions and what it takes to run something bigger.”
Like I said, I sit around with friends and discuss what Twitter, Facebook, and the like do on their backends, so this was geek heaven. Each panelist first talked a bit about their current architecture and the hurdles they have battled in trying to scale up to a massive level. Alan of imgur.com started out as a one-man show; his image service was hosted on a run-of-the-mill vanilla hosting platform that he was paying about $5 a month for, and then it took off. He is currently pushing 80TB of data through the pipes every month. When the service took off, he faced many challenges trying to scale up so quickly. In the end, he switched to a content delivery network to serve the images. On the server side, he is using nginx to serve the images over HTTP and Apache to handle the static HTML and PHP.
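To put that 80TB a month in perspective, here is a quick back-of-the-envelope conversion to an average line rate. It is only a sketch: it assumes decimal terabytes and a 30-day month (neither was specified on the panel), and real traffic is bursty, so peaks would be well above this average.

```python
# Rough conversion of monthly transfer volume to an average bit rate.
# Assumptions (not from the panel): 1 TB = 10**12 bytes, 30-day month.
TB = 10**12
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_mbit_per_second(terabytes_per_month):
    """Return the average throughput in megabits per second."""
    bits = terabytes_per_month * TB * 8
    return bits / SECONDS_PER_MONTH / 10**6


imgur_average = average_mbit_per_second(80)
print(f"{imgur_average:.1f} Mbit/s sustained, on average")
```

Under those assumptions that works out to roughly a quarter of a gigabit per second, around the clock.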
Reddit, I am happy to say, has switched to Amazon EC2 instances, with all of its load and computing handled by Amazon’s cloud platform. It is using 20 app servers and has switched to PostgreSQL, which it uses as basically a key-value store.
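For anyone wondering what using a relational database as a key-value store looks like, here is a minimal sketch. It is not Reddit's actual schema or code: the table name `kv`, the `KeyValueStore` class, and the sample key are all made up for illustration, and it uses Python's built-in SQLite module as a stand-in for PostgreSQL so that it runs anywhere.

```python
import json
import sqlite3


class KeyValueStore:
    """A two-column table used as a key-value store (hypothetical sketch).

    SQLite stands in for PostgreSQL here; the idea is the same: one
    primary-key column, one opaque serialized value column.
    """

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Serialize the value to JSON so the schema never has to change.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])


store = KeyValueStore(sqlite3.connect(":memory:"))
store.put("link:1", {"title": "Beyond LAMP", "score": 42})
print(store.get("link:1"))
```

The appeal of this approach is that you give up joins and rich queries in exchange for a schema that never needs migrating and lookups that are easy to cache and shard by key.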
Now for the BIG boys.
Kevin Weil explained how Twitter started as a pretty standard Ruby website. As it grew, the Twitter staff was forced to take measures that would let them scale more efficiently. They stripped out a lot of the unnecessary Ruby code and built more standard objects that interacted with one another more independently. This allowed them to upgrade disparate parts of the system individually. Currently Twitter handles 50 million tweets a day from over 70,000 third-party apps. Amazing.
Facebook, as opposed to Reddit, is building its own data centers to handle its growth. It is using standard MySQL for its key-value store, and like the others relies on memcache to keep cached user-facing data available. However, unlike the others, Facebook is caching an amazing 40TB of data at any one time. It receives over 100k page views per second! Now that’s big.
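The general pattern of putting a cache in front of a database is often called cache-aside: check the cache first, fall back to the database on a miss, then store the result for next time. Here is a minimal sketch of that idea. It is not Facebook's code; a plain dictionary stands in for memcache, another dictionary stands in for MySQL, and the `CacheAside` class and `user:1` key are invented for the example.

```python
class CacheAside:
    """Read-through cache in front of a slower data source (sketch only)."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # callable: key -> value
        self.cache = {}                   # stand-in for memcache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: go to the database, then remember the answer.
        self.misses += 1
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read fetches fresh data.
        self.cache.pop(key, None)


fake_db = {"user:1": {"name": "Ada"}}  # stand-in for the MySQL store
profiles = CacheAside(fake_db.get)

first = profiles.get("user:1")   # miss: loaded from the fake database
second = profiles.get("user:1")  # hit: served from the cache
print(profiles.hits, profiles.misses)
```

The hard part at real scale is not this lookup logic but invalidation and keeping terabytes of cache warm, which this sketch does not attempt to address.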
As for some takeaways from this session, all of the panelists agreed on the importance of monitoring and graphing to keep tabs on the system and to pre-empt any issues that might come up. Another interesting point made by both the Twitter and Facebook people was how they use custom-built BitTorrent deployment systems to update their servers. Twitter’s, which is called Murder, cut deployment times from 12 minutes to mere seconds and is due to be open-sourced in the near future.
Told you, total geek fest. There was so much more I could have written, but I fear most of you have either fallen asleep by now or run for the hills.