Feb 24, 2010
Lessons Learned Building Reddit - Steve Huffman at FOWA Miami
After the break, Steve Huffman of Reddit came on stage at FOWA Miami 2010 to talk about the lessons his team learned building Reddit from an engineering point of view. Reddit is a social news site where users submit links and other items and vote them up or down. It was founded in June 2005, right after the founders got out of college, and was acquired by Condé Nast in October 2006. Steve left Reddit just this last fall. The following are the lessons he says he learned (the hard way) while building Reddit:
- Crash often - "When in doubt, let it die": restart, and then read the logs. They later started using daemontools' supervise, which would restart the app or kill certain scripts.
- Separation of services - going from one machine to two can often more than double performance. Group similar processes and similar types of data together. So, for instance, each database server handles one type of data and all its related items. Avoid using threads, since processes are easier to separate onto different machines later, which makes it easier to scale.
- Open schema - Early on they spent too much time thinking about the database, and every feature required a schema update. Schema updates can become very painful as you grow. Maintaining replication was difficult and deployments were complex. An "open schema" has two tables for each data type: a "thing" table and a data table. The data table holds the thing ID, the key (as in, what it is: title, URL, etc.) and the data. Now adding new features doesn't require schema changes. There are no joins in the database, which makes it easier to distribute within your architecture, but you must be careful about consistency since you aren't storing data relationally, as the databases are intended to be used.
- Keep it stateless - the goal is to allow any application server to handle any request. This means that server failure and restart is no big deal and scaling is straightforward (just add more servers). The caching layer must be independent from the application layer.
- Memcache everything - following on the previous point's suggestion to keep the caching and application layers separate, Steve recommends using memcached for everything, including database data, session data, rendered pages, rate limiting (for crawlers, for instance), and pre-computed (pre-rendered) listings and pages, with memcachedb for persistence.
- Store redundant data - the recipe for a slow application is to store everything as normalized data and then fetch and assemble it every time you need it. If data has multiple presentations, store it multiple times in multiple formats, because disk and memory are cheaper than customers lost to a slow site.
- Work offline - By this Steve means: do the minimum amount of work needed to end the request and do everything else offline. For example, you can update the cache and the master database for a vote, but then put a job in the queue to actually handle things like checking for spam, thumbnailing items, and updating "worker databases."
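The "open schema" point above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not Reddit's actual schema: the table names (link_thing, link_data), the column names, the vote counters on the thing table, and the helper functions are all hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

def create_schema(conn):
    # One "thing" table and one data table per data type (here: links).
    # The ups/downs columns on the thing table are an assumption; the talk
    # summary does not say what the thing table holds.
    conn.executescript("""
        CREATE TABLE link_thing (
            id    INTEGER PRIMARY KEY,
            ups   INTEGER NOT NULL DEFAULT 0,
            downs INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE link_data (
            thing_id   INTEGER NOT NULL,
            attr_key   TEXT NOT NULL,
            attr_value TEXT,
            PRIMARY KEY (thing_id, attr_key)
        );
    """)

def save_link(conn, **attrs):
    # Create the thing row, then store every attribute as a key/value row.
    cur = conn.execute("INSERT INTO link_thing DEFAULT VALUES")
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO link_data (thing_id, attr_key, attr_value) VALUES (?, ?, ?)",
        [(thing_id, key, str(value)) for key, value in attrs.items()],
    )
    return thing_id

def load_link(conn, thing_id):
    # Reassemble the attributes in application code; no joins are involved.
    rows = conn.execute(
        "SELECT attr_key, attr_value FROM link_data WHERE thing_id = ?",
        (thing_id,),
    ).fetchall()
    return dict(rows)
```

With this layout, a new feature such as thumbnails would just be a new key, for example save_link(conn, title="Hello", thumbnail="t.png"), with no schema change. The trade-off matches the one noted above: the database no longer enforces which keys exist or how they relate, so consistency becomes the application's job.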
An audience member asked if storing database data in both memcachedb and memcache is wasteful because it duplicates data. Steve said that, essentially, it is, but the benefit outweighs the burden.
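The duplication the audience member asked about is easiest to see in a read-through cache. The following is a minimal sketch, not Reddit's code: plain dicts stand in for both the database and memcached, and the CachedStore class, its method names, and the db_reads counter are hypothetical names introduced for illustration.

```python
import json

class CachedStore:
    """Read-through cache in front of a slower store (both are dicts here)."""

    def __init__(self, database, cache):
        self.database = database  # stand-in for the real database
        self.cache = cache        # stand-in for memcached
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            # Cache hit: the database is never touched.
            return json.loads(cached)
        # Cache miss: read from the database and keep a second copy in cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = json.dumps(value)
        return value

    def put(self, key, value):
        # Write to the database and refresh the cached copy at the same time.
        self.database[key] = value
        self.cache[key] = json.dumps(value)
```

Every value read at least once exists in two places, which is the "waste" in question. In exchange, repeated reads of the same key skip the database entirely, which is the benefit Steve argues is worth the cost.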
Comments
It is very interesting that he says to de-normalize your data. Thanks!
Another example of denormalization is Flickr. You can check out this article on O'Reilly Radar:
http://radar.oreilly.com/archives/2006/04/database-war-stories-3-flickr.html
What software do you use for job queues?
