Web scale for the rest of us

00:00:33.840 So there are three things you need to know about me and why I'm giving this talk. The first is, before I was a Ruby guy, I was a .NET guy, and I did a lot of the scaling work for mycareer.com.au, Australia's second biggest job board. The second thing is that I helped Envato grow to the size it is today. I joined when our CEO was doing his Tim Ferriss remote working thing. There were three developers in a co-working office.

00:00:46.800 I left just a little before Envato was big enough to afford the Ruby sponsorship level at this conference. So there's a pretty big gap between those two. Finally, I work for a startup now, which is quite a bit smaller than Envato and has quite a bit less money in the bank. So, think about it this way: this talk is titled "Web Scale for the Rest of Us." Envato taught me about web scale, and being poor again has taught me about the rest of us.

00:01:17.119 The key thing I want to talk about is that, as awesome as Paul's talk on the Braintree stack was, you don't want to build that for your startup, at least not immediately. That stuff is out-of-control crazy, and it's not really appropriate for your situation.

00:01:36.560 So, I find the topic of scaling Rails applications tricky to discuss. The longer I’ve been building Rails apps or any kind of software, the more the answer to every single question has become, 'It depends.' You know, should I do this? Well, it depends.

00:02:01.439 Thankfully, some kind of joke on the internet gave me an out. My first reaction to the "Rails is Omakase" blog post was that it's a pretentious rationalization for excluding or including RSpec; I can't remember which. After further reflection, I think it's a pretty handy analogy. As software developers, we rely pretty heavily on our metaphors.

00:02:21.599 So, if DHH wants to think he's this guy, fine. Does anyone know who this is? Cool, not enough of you. This is a fantastic documentary about the world's best sushi chef, Jiro. You can get it on Netflix. Watch it; it's really good.

00:02:43.640 So, I decided I'm going to use the "Rails is Omakase" analogy and beat it to death for this talk today. I wouldn't be a very good software developer if I didn't rely heavily on a leaky abstraction. If we look at DHH as our chef and Rails as our menu, then I consider that menu to cover the way we develop software. I want to discuss a bit about how you operate that software, so I want you to consider my perspective to be analogous to the wine list that goes with a pretentious sushi menu.

00:03:24.820 So, before I get any further, I want to define a few terms. In my mind, scaling is doing whatever makes your business more money than you spend doing it. The money that's left over is called profit. I've seen it before, but I haven't seen it recently.

00:03:55.120 The funny thing is, let's say you're hypothetically some kind of startup guy and you're sitting in a meeting with investors discussing scale. They're not going to ask you how many dynos you're running or how many Node.js and MongoDB servers are in your stack; they're more likely to ask how many sales you expect to close. What are your sales figures? Sales figures are the business person's version of DB benchmark figures: they get talked about a lot, but they have little impact on your day-to-day work.

00:05:15.319 So, anyway, back to terms: scaling means doing more. I'm mixing in a lot of concepts about high availability when I talk about scaling because in my mind the two things go hand in hand. Downtime is essentially anti-scaling. You're meant to go up and to the right, but instead, you're still going to the right while going down.

00:05:59.319 Usually, if you’re having scaling problems, it leads to availability problems, so these two concepts are very tightly linked. Therefore, I’m going to mix together these concepts about scaling up and not falling over as kind of the same idea.

00:06:37.560 Another important point to note about scaling is that it isn't just about serving web traffic, but also preserving the ability to evolve your application over time. Your business isn't just going to grow on its own; you're going to be changing your software ideally a few times a day, but let’s say daily, to meet the demands of new customers.

00:06:59.000 Your software will evolve as you get new customers, which will place more demands on your stack. So, these two concepts come together, and that’s what I mean by scaling. You need to be able to develop more software to serve more people cheaply, or hopefully cheaply enough to not affect your bottom line.

00:07:39.440 Returning to our sushi menu metaphor and the leaky abstraction, the menu is the kind of thing DHH likes: the aesthetic of certain technical choices, such as CoffeeScript or the use of Bundler. Things like TDD are also technical choices, but they are more about practices. The menu is a combination of the two: a statement on how to develop software using these practices and these tools.

00:08:00.840 The "wine list" that goes with that, in my mind, is to have at least two developers doing dev on the most boring version of the Rails stack you can imagine, across two web servers in front of two SQL databases. I've spoken on the specifics of this stack before and blogged about it, so I'm not going to get too much into the technicalities because this stuff changes.

00:08:42.960 However, the core idea of two and two and two remains: I think this is the best way to develop and deploy Rails applications. I feel comfortable saying that because I very strongly believe in the idea that it depends, and you should respond to your situation intelligently. Therefore, I encourage you to have a strong opinion and to express it.

00:09:15.180 The first key principle that guides my thinking on this, and Cory H's talk yesterday morning was really helpful here, is knowing the heritage of your software development practices and tools. You've got to understand the history.

00:09:58.440 So the first thing is that Rails and DevOps go hand in hand. You can have DevOps without Rails, but you can't sensibly have Rails in production without some kind of DevOps environment. I was just going to drop this slide and say, "QED," but the essence of this is that over time, RAM gets cheap, whereas programmers get expensive. If you're in a traditional organization with an ops team occupying one floor of a building and a development team on another, Rails doesn't make sense.

00:10:55.760 In contrast, if you have one person making decisions about how to allocate development time, operational time, and money, Rails makes perfect sense. You can make those trade-offs. However, when you have a separate person who only manages the budget for servers and RAM, they will point out that you need five more servers to serve the equivalent traffic of a .NET app because of the memory bloat.

00:11:39.560 So, they'll never approve it. I occasionally do a bit of consulting and have sat in rooms with traditional enterprise ops teams, attempting to convince them that it’s perfectly okay for a Rails app to consume ten times as much memory as a JVM. They didn’t believe me because they don't see the corresponding savings in the development team.

00:12:02.040 This is why DevOps and Rails pair well together; it’s all about that trade-off. DHH likes to share pictures on his blog of all the RAM he buys. The psychology of it drives the shape of your framework, so you may as well play to its strengths.
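
The trade-off described above can be written down as simple arithmetic. The following is a minimal Ruby sketch, not something from the talk: the method name `monthly_saving` and every figure passed to it are hypothetical placeholders, chosen only to show the shape of the comparison between extra server spend and developer time saved.

```ruby
# Hypothetical helper: net monthly saving from trading RAM for developer time.
# All names and numbers here are illustrative assumptions, not data from the talk.
def monthly_saving(extra_servers:, server_cost:, dev_hours_saved:, dev_hourly_rate:)
  developer_saving = dev_hours_saved * dev_hourly_rate
  hardware_cost = extra_servers * server_cost
  developer_saving - hardware_cost
end

# Example with made-up inputs: five extra servers against twenty saved hours.
saving = monthly_saving(
  extra_servers: 5,
  server_cost: 100,
  dev_hours_saved: 20,
  dev_hourly_rate: 80
)
puts saving
```

A positive result means the extra servers pay for themselves; someone who sees only the hardware line sees only the cost, which is the siloed-budget problem described above.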

00:12:55.800 The other reason I believe DevOps and Rails go hand in hand is that, compared to deployment options for something like the JVM, Rails options such as Unicorn or Passenger offer ops teams very few levers to pull. They can add another process, maybe remove it, or add some memory to a box. However, there's not much to do. Generally, if some kind of performance or scaling change is needed, you must go back to the development team to do the most efficient work to resolve that.

00:13:40.960 So yeah, if you've got heavily siloed dev and ops teams, Rails is not a very good framework because the team responsible for scaling the app doesn't have enough levers to effectively accomplish that. Always assume that you're going to work closely with your ops team, though you may not have one early on.

00:14:14.960 The next thing I think is really important when deploying Rails applications is to apply the Single Responsibility Principle. It's something we all use for our code, but not many people think it applies to servers. I'm paraphrasing a quote from Brian Kernighan, who said that debugging is twice as hard as writing a program. If you're as clever as you can be while writing it, how are you ever going to debug it?

00:14:40.880 Things like the Single Responsibility Principle and other design strategies serve to constrain our code, making it easier to maintain over the long term. This principle applies to servers as well; if you can tell what a server does from its name, it has a single responsibility, and you understand why it does what it does.

00:15:26.360 You can easily change it if you need to. Isolating components also helps when performance degrades over time, because if your database is on the same server as your Rails processes and memory starts running out, you have two suspects to investigate.

00:15:47.640 If your database is isolated from your app servers, you’ll receive an alert specifying that your database is running out of memory, effectively isolating one problem. Although it may not seem like the most practical solution in terms of dollars for a startup, the cost of troubleshooting can be significantly more expensive than the cost of that additional server. Therefore, as soon as you can afford it, I strongly advise you to separate your app processes from your database.

00:16:37.040 The most common violation of the Single Responsibility Principle I see is the utility server. Hands up if you've got a utility server! Surprisingly, that's not as common as I expected.

00:17:08.480 The Single Responsibility Principle is essential for driving down the troubleshooting cost of each component of your stack. First, developers aren't great at operations. Honestly, we tend to focus on how to create things rather than on the ways things break.

00:17:47.160 Moreover, many of us have been awoken at 3:00 AM to resolve an issue. Has anyone tried to do difficult work at that time? It doesn't work well! You really want to apply constraints to your stack so that, as a developer or an unskilled operations person, you're capable of effectively resolving problems, even at 3:00 AM.

00:18:17.799 The next principle that is crucial to deciding on your early stack is YAGNI (You Aren't Gonna Need It). YAGNI was the best thing I learned as an early programmer, as we've all been in situations where we code just a little extra because we anticipate needing it soon.

00:19:03.640 However, when the time comes, we find we were completely wrong, and end up tearing everything down to build the next iteration. The same occurs with operations: generally, one thing will bottleneck as your business grows. If you think you know what it is, chances are you're wrong.

00:19:56.520 So if you're preemptively building components of your stack to address potential future bottlenecks, you’ll just end up dealing with other bottlenecks that emerge, making it hard to identify the culprit. I do want to put a caveat on using YAGNI, though.

00:20:35.840 While nothing is forcing you to add another feature to your app, the reality of actual operations is that you do not have full control over what happens next. Your marketing department may decide to launch the perfect viral email, resulting in a growth spike that dictates an urgent need for adjustments to the stack.

00:21:41.560 So, you must keep some flexibility in mind. Think of your operations approach as akin to the character in Ronin. I want you to envision having a backup plan for your operations, much like that character. You don't need the elaborate plans of Ocean's Eleven; those are tailored to specific problems that you do not face.

00:22:13.119 The approach you take should instead be simple, practical, and adaptable. This also applies to how you structure your stack to avoid building things unnecessarily.

00:22:59.640 Which brings me back to the topics of caching and scaling decisions. Caching can serve both as a solution for performance spikes and as a temporary fix for underlying database issues. Building in caching early is beneficial, particularly for read-heavy news sites like The Conversation, where users are primarily consuming content.

00:23:47.680 Caching is very important for sites like that, but for applications featuring more user-specific content—like a social film review site—you may not want to apply caching too hastily and risk premature decisions. Still, you want to have caching ready for use at 3:00 AM, so that if a page gives you trouble, you can implement it and go back to sleep.
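
One way to read "have caching ready for use at 3:00 AM" is a cache that is wired in everywhere but switched off until a page misbehaves. Below is a minimal, self-contained Ruby sketch of that idea. The class `PageCache` and its methods are hypothetical names invented for illustration; this is not the Rails caching API and not the code used on the site described in the talk.

```ruby
# Hypothetical in-memory cache with a per-page on/off switch.
# PageCache, enable, disable and fetch are illustrative names, not a Rails API.
class PageCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @enabled = {} # page key => time-to-live in seconds
    @store = {}   # page key => Entry
  end

  # Turn caching on for one page, for example during an incident.
  def enable(key, ttl:)
    @enabled[key] = ttl
  end

  # Turn caching off again and drop any stored copy.
  def disable(key)
    @enabled.delete(key)
    @store.delete(key)
  end

  # Returns a fresh cached value when caching is enabled for the key;
  # otherwise runs the block (the slow render) every time.
  def fetch(key)
    ttl = @enabled[key]
    return yield unless ttl

    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + ttl)
    value
  end
end
```

With every slow page already wrapped in `fetch`, the middle-of-the-night fix is a single call such as `cache.enable("/films", ttl: 60)`, and the proper fix to the underlying query can wait until morning.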

00:24:39.840 It's a critical tool to have, but don't rely on it too heavily; instead, address the root of performance issues in due time. As your user base grows, be aware that they will provide you valuable feedback about the performance quality of the site.

00:25:25.080 This leads me to my position on MongoDB. I feel it's a social signal saying you and I will not be friends. However, I don't want to bash it too hard. MongoDB makes a fair case as a semi-durable data store in a cloud environment. Given that in a cloud deployment your disks aren't that reliable, you need a robust approach to data redundancy, even if it's just quirky reliability.

00:26:41.400 While MongoDB can be associated with bad design, it actually has its use cases when one designs to naturally account for server instability and unreliable durability. If you construct a system to accommodate cloud utilization, relying on the notion that some servers might come and go, then using MongoDB is plausible.

00:27:27.920 Nonetheless, in the context of building your startup on Rails, deploying on SQL databases makes sense. It aligns with Rails; they were built to go together. Regular SQL databases have robust systems in place, whereas MongoDB often lacks the same structural integrity.

00:28:15.920 SQL databases existed long before Rails, meaning that having a SQL database keeps your options open. You can vertically scale your database or utilize various tools to ensure good performance. It is essential to work with tools and environments you understand and that provide reliable support for scalability.

00:29:04.080 The golden rule to remember is: if you can't afford two, you can't afford one. This applies to servers, developers, and any critical infrastructure you rely on. You should avoid single points of failure; redundancy is key to keeping an application running smoothly.
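
The "two of everything" rule can be checked mechanically against a list of what you actually run. This is a small Ruby sketch under stated assumptions: the inventory hash, the role names, and the method `single_points_of_failure` are all hypothetical and exist only to illustrate the rule.

```ruby
# Hypothetical check for the "if you can't afford two, you can't afford one" rule.
# The inventory layout (role => list of instances) is an assumption for illustration.
def single_points_of_failure(inventory)
  inventory.select { |_role, members| members.size < 2 }.keys
end

stack = {
  developers: %w[dev-1 dev-2],
  web: %w[web-1 web-2],
  database: %w[db-1 db-2],
  utility: %w[util-1]
}

puts single_points_of_failure(stack).inspect # prints [:utility]
```

Any role the check returns is one departure or one dead disk away from an outage.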

00:29:45.760 Availability is crucial, as servers rarely live forever. Consider your development team as part of this failure model; if one key person leaves, it disrupts everything. Aim to have two developers and two servers. The aim is always redundancy, balance, and sustainability.

00:30:30.760 Having only one instance of everything inherently risks failure. During your early phases of development, prioritize redundancy, even at the cost of simplicity. It provides peace of mind, smooths operations, and allows for easier scaling.

00:31:07.840 I've been the second developer on a product three times in my career. Each time, I’ve told myself I wouldn’t do it again. It's essential to get a team as soon as you believe your solo project is viable. Invest in the second developer early on, as it fosters better practices and communication.

00:31:44.200 I want to thank Cory for introducing the idea I’m sharing today. While the entire idea sounds pretty straightforward, those small decisions can significantly impact the trajectory of a business. Gather resources and team members as soon as you see potential in the project.

00:32:25.760 Now, let’s talk about how I’ve applied these principles in my own company, Good Films. This startup comprises two products: the main site, a social film rating network, and a market research analytics tool for the film industry. We have two very different products that come with their unique constraints relating to development resources and budgets.

00:33:19.400 Good Films is hosted on Rackspace Cloud, behind a load balancer. We have two app servers, one utility server (which I'm not a huge fan of), and a database master/slave configuration. We run Unicorn behind Nginx. It's a very straightforward stack.

00:34:12.080 The choices were made because the social network is cost-sensitive. We want to maintain tight control and the ability to spin up or down servers as necessary. After receiving a large influx of traffic from Reddit, we scaled to five app servers, and as traffic declined, we killed the extras.
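
Scaling up for a traffic spike and back down afterwards comes down to a small calculation with a floor of two servers. The Ruby sketch below is illustrative only: the method name, the constant, and the capacity figure of 50 requests per second per server are assumptions, not measurements from the site described in the talk.

```ruby
# Hypothetical sizing helper: how many app servers to run for a given load.
# The floor of two reflects the redundancy rule; the capacity numbers are made up.
MIN_APP_SERVERS = 2

def app_servers_needed(requests_per_second, capacity_per_server)
  raise ArgumentError, "capacity must be positive" unless capacity_per_server.positive?

  needed = (requests_per_second.to_f / capacity_per_server).ceil
  [needed, MIN_APP_SERVERS].max
end

puts app_servers_needed(30, 50)  # quiet day: stays at the floor
puts app_servers_needed(230, 50) # spike: add servers, remove them when it passes
```

Never dropping below two keeps the scale-down step from quietly reintroducing a single point of failure.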

00:35:10.000 In contrast, our analytics tool, which serves a much narrower audience, is deployed on Heroku. Heroku offers operational simplicity, which makes sense for that product, even though it's more expensive. The trade-off works because the pricing for that product covers the associated costs.

00:35:54.000 Maintaining two stacks keeps us familiar with both sets of scaling options. You can treat the Heroku infrastructure as if you had several dynos on standby, and this mindset encourages scalability without the usual growing pains.

00:37:07.000 In essence, engage with the principle of redundancy at each level—whether that’s developers or operational capabilities—and foster the understanding within your teams. I hope you found this discussion useful, and I'm happy to take questions.

00:37:35.600 The question was whether I've ever had to fail over from the master to the slave. I've practiced it, but not during a real failure. It took about five minutes, mainly because I'd never failed one over in production before.

00:38:02.640 We now have Capistrano tasks to help complete that process. The rest is just understanding the replication and recovery processes, which can feel arduous.

00:38:32.760 The follow-up question mentioned my experiences with NoSQL solutions and the perception of databases versus programming languages. I don't genuinely hate MongoDB, but I find the constraints of relational databases often make more sense in a Rails context.

00:39:35.360 Ultimately, I want to stress that we must not allow the notion of NoSQL to find its way into our conventional Rails workings. Use regular SQL options; they are tried and tested. It's about leveraging shared practices and patterns recognized by the community.

00:40:14.720 When we look at emerging technologies, we may find interesting solutions, but we must be mindful of not straying too far from what has proven effective in our working context.

00:40:43.240 I appreciate when we can have discussions about alternatives, but one should also understand the community conventions that herald success.

00:41:08.560 Thank you for your time today. Do we have any more questions? If something piques your interest or you'd like further clarification, please feel free to ask.

00:41:31.560 I want to express my gratitude for the opportunity to share my thoughts with you. Thank you very much for your attention.