One of the questions I get asked a lot is which development platform is the most scalable. Some people seem to believe it’s Java or .NET; others feel it’s Python or PHP. There also seems to be a misconception that Ruby doesn’t scale well. The definitive answer to this trick question? They all scale, up to a point. Once your infrastructure is serving millions of users and transactions, you have to rely on in-memory databases, content delivery networks, caching servers, x64 servers, etc. to keep things snappy. To illustrate this, I’ll delve into the architecture of some of the most popular social networks out there. Let’s take a look at a few key examples:
Dodgeball (the precursor to Foursquare) started out as an ASP site running on an Access database. Founder Dennis Crowley (who was not a coder) created the site using a “Learn ASP in 30 days!” type of book. Later, Dodgeball was rewritten in PHP and MySQL. Foursquare.com was likewise originally written using PHP/MySQL on Apache. When Harry Heymann joined Foursquare (he was the last employee from Dodgeball after the Google acquisition), he helped the company scale up using Scala and the Lift Web Framework. The Scala code is compiled into Java bytecode and mostly runs on Jetty (Jetty project page). Foursquare also uses MongoDB for data storage (MongoDB project page). Here is a fantastic deck explaining the process of migrating Foursquare.com from PHP to Scala/Lift. On a side note, Foursquare also uses Apache Hadoop and Apache Hive in combination with a custom data server (built in Ruby), all running in Amazon EC2, to do analytics on their data (here is a Foursquare engineering blog post about it). I also had the privilege of seeing the system in action when I visited Foursquare’s Cooper Square HQ in 2011.
In 2006, when Facebook was opened to the public (anyone over 13 with a valid email address could join), the site was built using open-source software including PHP/MySQL on Apache (some of the front-end PHP code got leaked in 2007; you can get interesting insights here). Facebook used memcached (project site) to help the site stay responsive for 12M active users. Facebook employees also used a combination of Python, Perl, Java, g++ and Boost, with code managed using Subversion and git; you can read a detailed account on the Facebook site. Today, Facebook has over 800 million active users and, as you can imagine, the infrastructure needed to support that load has become far more complex. The front end uses PHP converted into C++ using HipHop, the business logic is exposed as services using Thrift, and persistence is managed using a combination of MySQL, Memcached, Cassandra and Hadoop. All of this is running on what’s estimated to be 60,000+ servers! You can learn more on this Quora thread.
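To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern that a cache server like memcached is typically used for. It is not Facebook’s code: `TTLCache` is a hypothetical in-process stand-in for a real cache client, and `get_profile` and `load_from_db` are names invented for this illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for a cache server such as memcached."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_profile(user_id: int, cache: TTLCache,
                load_from_db: Callable[[int], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    # Cache miss: do the expensive lookup, then remember the result.
    value = load_from_db(user_id)
    cache.set(key, value)
    return value
```

Repeated calls to `get_profile` for the same user only reach the database once per expiry window, which is why this pattern takes so much read load off a MySQL back end regardless of the language the site is written in.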
Twitter started as a hack project at a company called ODEO, which was initially focused on RSS-syndicated audio and video. Twitter founder Jack Dorsey (one of the ODEO engineers at the time) was really interested in status and tried to find some way to make it easier for people to share what they were doing. The Twitter project was initially written in Ruby on Rails (which was the backbone of ODEO) and evolved from there. Raffi Krikorian (Director of Twitter’s Application Service) has a really excellent OSCON talk which describes Twitter’s current infrastructure. According to Raffi, Twitter is “the largest Ruby on Rails website on the planet”. Twitter is currently moving its infrastructure to the Java Virtual Machine (JVM), using Scala, Thrift and Clojure. Here is a deck on Slideshare with the gory details on the migration. Raffi’s advice: “One thing we want to emphasize to start ups out there is that switching to Java doesn’t imply that we think that Ruby is a mistake. Ruby got us fundamentally where we are today. We are somewhere between the 9th and 5th largest site on the internet…we have some of the best world class product engineers who can write code in Ruby faster than anyone I’ve seen…And we think that’s really important…and pivotal.”
According to this documentation dating back to 2008 (and this YouTube architecture video from the 2007 Seattle Conference on Scalability), YouTube was originally built using a combination of Apache, Python, Linux (SuSE) and MySQL. The YouTube engineering team also used psyco (a dynamic Python->C compiler) and lighttpd (instead of Apache, for serving video). At the time, this architecture was supporting over 100 million video views per day. Few details are known about the back-end infrastructure today; however, you can use the YouTube Direct platform to host your own mini-customized version of YouTube on Google AppEngine (via the YouTube API).
Moral of the story
Java, PHP, Ruby or Python: they are all suitable languages and technology stacks for a startup, and each will scale far enough to make your product a success. The real question you should be asking is: which one of these platforms will make your developers as productive as possible?