My Server for Aiur: How Starcraft Taught Me To Scale
Summarized using AI

by Richard Schneeman

In Richard Schneeman's talk at Aloha RubyConf 2012, titled "My Server for Aiur: How Starcraft Taught Me To Scale," he draws parallels between the strategy of the game Starcraft and the principles of web application scaling.

Main Topics:

  • Introduction to Scaling: Schneeman emphasizes the importance of balancing speed and throughput, translating gaming strategies to real-world web server optimization.
  • Speed and Throughput: He differentiates between speed (how fast requests are processed) and throughput (how many requests are handled at once). Lessons from Starcraft, such as proper resource utilization, are applied to improve web application performance.
  • Optimizing for Speed:
    • Utilizing efficient algorithms to minimize slow operations.
    • Implementing caching strategies to reduce the cost of expensive operations.
    • Importance of logging and monitoring to identify bottlenecks, akin to analyzing replays in Starcraft to assess performance.
  • Improving Throughput:
    • Strategies for adding server capacity through horizontal scaling (adding more servers) rather than vertical scaling (upgrading existing servers).
    • Implementing workers for non-blocking tasks using tools like Resque, allowing for a more responsive user experience.
  • Client-Side Performance: Emphasis on optimizing user experience through fast loading of assets, employing techniques like Gzip compression, CDNs, and browser caching.
  • Avoiding Premature Optimization: He warns against the trap of focusing too early on optimizations without understanding the actual needs, advocating for a more measured approach.

Key Takeaways:

  • Always measure performance and analyze logs to discover inefficiencies in your applications.
  • Use caching effectively while being cautious of the pitfalls of premature optimization.
  • Understand the balance between speed and throughput to enhance performance and scalability.
  • Implement best practices for server capacity and optimize client-side speed to ensure a robust web application.

Schneeman concludes with a reminder to continually assess performance and to explore the right strategies for scaling effectively, reiterating that both gaming and software development require discipline and informed strategic choices.

00:00:15.599 All right, can you all hear me in the back? Okay, thumbs up! That's a good sign.
00:00:21.920 Hello everyone! Welcome, and thank you very much for coming out. My name is Richard Schneeman or Schneems on the internet.
00:00:28.160 To get started, a little bit about me: I am actually a mechanical engineer from Georgia Tech, but I've been writing Ruby code for about five to six years.
00:00:34.160 I really enjoy it. You might recognize me from such gems as Sextant. If you've ever run rake routes on your console and it takes an eternity—like 20 seconds—you can add this gem to your Gemfile.
00:00:40.480 If you have your server running, you can just go to localhost:3000/rails/routes, and it will come up like that. You'll also be glad to know that this is a feature in Rails 4, although it has a different path: /rails/info/routes.
00:00:51.440 I also coded the Wicked gem, which was featured on a RailsCast with Mr. Bates. It handles step-by-step actions inside a controller in a RESTful way, if you're interested in that. I also work for Heroku, a small company that does a little bit with Ruby and Rails.
00:01:03.440 Furthermore, I am an adjunct professor at the University of Texas, where I teach Ruby and Rails. This is great news for anyone learning Ruby or Rails because I have all my content online—about 40 hours worth of lectures, presentations, quizzes, and exercises—all at schneems.com/ut-rails.
00:01:17.360 My last name, Schneeman, means snowman in German, and Schneems is just an abbreviated version of that.
00:01:22.400 You might be wondering why I'm here and why I'm talking about Starcraft.
00:01:27.840 Well, I freaking love Starcraft! I did a study abroad in China, where they had all these internet cafes during the days of Starcraft 1, and I got massively addicted to it. When Starcraft 2 came out, I was on the beta constantly. At one point, I was in 1v1 Platinum—before Diamond League, so that was a big deal.
00:01:41.360 Then I realized I needed to see the light of day, so I don't play as much anymore, but I'm incredibly excited about the expansion, Heart of the Swarm. Anyone here pre-order it?
00:01:48.400 All right, we've got some Zerg, Protoss, and Terrans in the crowd.
00:01:53.440 You'll also be happy to know that I'm presenting with my gaming mouse, a Razer Naga, which has buttons on the side. So, you know I'm legit.
00:02:04.479 So what does Starcraft have to do with scaling, you might ask? Well, it's a game of balance and precision.
00:02:11.599 In general, there is no one best strategy. Everyone thinks, "Oh man, Protoss is better than Terran," or, "Terran is better than Zerg." At the end of the day, it's incredibly balanced. There are different ways to get better and faster while playing.
00:02:24.239 A lot of it is about APM (Actions Per Minute), which is somewhat like servers—it's all about requests per minute, or requests per second.
00:02:31.680 However, what it really comes down to is that if you want to get good at a video game or scale your web server, it's going to take discipline and time. You have to learn the rules and apply them in the correct way.
00:02:58.080 I've recently been introduced to a book called "Playing to Win." I don't remember the author's name, but it was recommended to me by Steve Klabnik. The author is the premier Street Fighter player in the world, and he's written this book about strategies that apply to everyday life, not just video games.
00:03:11.840 I like to take lessons from different areas of my life, like Starcraft, and really internalize them. I ask myself, "Why is this a good strategy?" Is it a good strategy just because of the game mechanics, or can I apply these lessons to other areas of my life? That's kind of how the talk came about.
00:03:37.120 Now, whenever we're discussing scaling and speed, we're really talking about two different things: speed and throughput.
00:03:47.280 Speed is the most common factor you will encounter; it's just about things going faster. For instance, we're going to upgrade our Zerglings to Speedlings. Throughput, on the other hand, is literally about how many things we can get through at one time. Each Zergling individually doesn't do a lot of damage, but if you fill the whole screen with them, you can achieve more.
00:04:11.840 Both of these aspects are really important; you can’t focus just on one and completely ignore the other. We will explore two common patterns for achieving faster speeds and increasing throughput. First, we will optimize and cache for speed. We heard a bit about different aspects of caching in the keynote, so with optimization, we will search for areas in our program that are slow and make them fast.
00:04:52.279 It sounds simple, right? But the key takeaway is that you want to measure everything and use those measurements to make informed decisions about how to improve speed and increase throughput. So for optimizing, we will focus on minimizing slow operations and maximizing fast ones. It really is that easy.
00:05:24.799 The second pattern is going to be caching. We'll search for expensive operations; maybe we can't make them faster; perhaps we added indices, and our request is going as fast as it can.
00:06:01.919 So instead of incurring that cost repeatedly, we utilize caching. With caching, you're taking something expensive and making it cheaper. Finally, we aim to add capacity to our servers to get additional throughput.
00:06:35.679 This is correlated with speed: if we can serve more requests and each request takes less time, we can achieve more throughput. However, there's a reverse correlation; if you don't have enough throughput, that can start affecting individual page speed.
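The relationship between per-request speed and throughput can be put into rough numbers. The following is a minimal sketch, assuming a fixed pool of workers that each handle one request at a time; the `max_throughput` helper and all figures are illustrative assumptions, not measurements from the talk.

```ruby
# Upper bound on requests per second for a pool of workers, assuming each
# worker handles exactly one request at a time (hypothetical helper).
def max_throughput(workers:, ms_per_request:)
  workers * 1000.0 / ms_per_request
end

baseline     = max_throughput(workers: 4, ms_per_request: 200) # 20.0 req/s
faster       = max_throughput(workers: 4, ms_per_request: 100) # 40.0 req/s
more_workers = max_throughput(workers: 8, ms_per_request: 200) # 40.0 req/s

puts "baseline:     #{baseline} req/s"
puts "2x faster:    #{faster} req/s"
puts "2x capacity:  #{more_workers} req/s"
```

Under these assumptions, halving the response time and doubling the number of workers give the same ceiling, which is why speed and capacity are treated as two levers on the same problem.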
00:07:10.799 So that's kind of the introduction to the talk. Now, here's the actual talk. First, we're going to discuss speed. When talking about speed, there are two very critical things to consider: client-side speed and server-side speed.
00:07:20.560 Client-side speed refers to what the client sees, while server-side speed pertains to the request and response cycle on your server. This happens when a client types in a URL; it hits your Ruby server, does some processing, and comes back.
00:07:41.440 I like to correlate this to macro in Starcraft. Imagine a huge map with way more minerals than your enemy—that's one strategy we can use to win. We want to make sure we are utilizing our server resources fully. In Starcraft, this would mean having three workers on each set of minerals. You want to keep your money low and make the most of everything available.
00:08:02.400 Now, there are common causes of slowness in apps, especially Rails apps. A frequent issue arises from inefficient usage; in Starcraft, many people will queue units and waste resources because each time you click to build a unit, you commit those minerals that can't be spent elsewhere.
00:08:49.440 Instead of doing that, just create one unit when you need it. It's harder to manage but provides more capacity. Similarly, we shouldn't queue web requests in our apps.
00:09:02.080 A very common cause of slowness is database or I/O inefficiency, such as running expensive SQL queries, or time spent in garbage collection. The key is to uncover the reasons your application might be slow or simply inefficient. You should always look to know more than your opponent; in this case, the opponent is time.
00:10:03.600 I recommend measuring in production, similar to the adage ‘practice like you play.’ Keeping records of actions and their outcomes allows for valuable data analysis. In Starcraft, using a build order tester can be extremely beneficial. For example, a build order tester like YABOT lets you test specific strategies against a sophisticated AI.
00:10:29.440 After playing a game, you can review your replays to analyze what went wrong. If your enemy destroyed your base, you can review the match to determine why you were slow and figure out how to improve. On your web server, look for N+1 queries in your logs; logs are one of your best free resources.
00:11:25.360 For example, if you pull a bunch of products from a database and then pull users for each product, you may accidentally issue one extra SQL query per product. Eager loading can help resolve this; it’s an easy fix.
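The N+1 pattern and its eager-loading fix can be sketched without a real database. This is an illustrative simulation only: `FakeDB` and its methods are hypothetical stand-ins that count queries, not ActiveRecord. In a Rails app the equivalent fix is usually ActiveRecord's `includes`.

```ruby
# A hypothetical in-memory "database" that counts every query it receives.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @products = [
      { id: 1, user_id: 10 },
      { id: 2, user_id: 11 },
      { id: 3, user_id: 10 }
    ]
    @users = { 10 => "alice", 11 => "bob" }
  end

  def all_products
    @query_count += 1
    @products
  end

  def find_user(id)
    @query_count += 1
    @users[id]
  end

  def find_users(ids)
    @query_count += 1
    @users.select { |id, _name| ids.include?(id) }
  end
end

# N+1: one query for the products, then one more query per product.
def owner_names_n_plus_one(db)
  db.all_products.map { |product| db.find_user(product[:user_id]) }
end

# Eager loading: one query for the products, one for all users they need.
def owner_names_eager(db)
  products = db.all_products
  users = db.find_users(products.map { |product| product[:user_id] }.uniq)
  products.map { |product| users[product[:user_id]] }
end

slow_db = FakeDB.new
fast_db = FakeDB.new
slow_names = owner_names_n_plus_one(slow_db)
fast_names = owner_names_eager(fast_db)

puts "N+1 queries:   #{slow_db.query_count}" # 4 (1 + 3 products)
puts "eager queries: #{fast_db.query_count}" # 2
```

Both versions return the same names; only the number of round trips differs, and the gap grows with the number of products.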
00:11:50.560 Measuring these stats in logs allows you to refine your method. However, always make sure to benchmark your application if you do choose to optimize. Look for queries that are not using an index and are causing slowdowns. Set an auto-explain threshold in your config, and Rails will automatically run an explain on any slow queries.
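The idea of a slow-query threshold can be sketched with Ruby's standard `Benchmark` module. This is a minimal illustration, not Rails' own mechanism: the `timed` helper, the threshold value, and the log format are assumptions. The Rails setting referred to above is presumably `config.active_record.auto_explain_threshold_in_seconds` from Rails 3.2.

```ruby
require "benchmark"

SLOW_THRESHOLD_SECONDS = 0.05 # hypothetical threshold, tune per application

# Runs the block, and appends a line to `log` only when the block took
# longer than the threshold. Returns the block's own result.
def timed(label, log)
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  if elapsed > SLOW_THRESHOLD_SECONDS
    log << "SLOW #{label}: #{(elapsed * 1000).round} ms"
  end
  result
end

log = []
quick = timed("fast lookup", log) { 1 + 1 }
slow  = timed("slow report", log) { sleep 0.1; :done }

puts log
```

Only the operation that exceeds the threshold is recorded, which keeps the log focused on the requests worth investigating.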
00:12:33.440 So, logs and monitoring software are critical tools for analysis, akin to reviewing game replays.
00:13:14.640 Utilizing tools like New Relic or Scout can provide more granularity in your data, allowing you to visualize performance over time and identify which part of your request is causing delays.
00:13:42.160 Tuning your back-end and optimizing your data store is imperative, as well as caching expensive queries if they remain inefficient. If your database queries are fast but you still experience delays, then using Memcache is fantastic. It’s extremely simple to set up with Rails.
00:14:20.800 You can easily wrap existing queries within a cache block, and if the data is not present in your cache, Rails will query the database and insert new data into Memcache.
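The cache-block behavior described here (return the cached value, or run the query and store the result on a miss) can be sketched in plain Ruby. `TinyCache` is a hypothetical in-memory stand-in; in a Rails app the corresponding call is `Rails.cache.fetch` with a block, backed by a store such as Memcache.

```ruby
# A hypothetical in-memory cache with a read-through `fetch`.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value for `key`. On a miss, runs the block,
  # stores its result under `key`, and returns it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
db_hits = 0

# Stands in for an expensive database query.
expensive_query = lambda do
  db_hits += 1
  ["widget", "gadget"]
end

first  = cache.fetch("products/all") { expensive_query.call }
second = cache.fetch("products/all") { expensive_query.call }

puts "results equal: #{first == second}"
puts "database hits: #{db_hits}" # 1, the second call is served from cache
```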
00:14:50.159 Rails can also do view caching, including page and action caching, but fragment caching is ideal for caching small discrete units, like a sidebar, without caching the entire page.
00:15:09.679 This avoids complex expiration logic and keeps your caching efficient.
00:15:34.560 In short, measuring your logs is rule number one, and avoiding premature optimization is rule number two. Premature optimization is particularly common in software development.
00:16:22.559 For instance, someone might think, 'I will build 10 bases and crush my enemy with relentless mineral production,' only to lose to a few well-timed Zerglings.
00:16:42.720 So, build what you want, then return to measure what you should optimize. Focus on the essential features before expanding and avoid wasting resources on premature decisions.
00:17:23.120 Next, let's discuss throughput. Remember, if you're dealing with requests, throughput is how many we can manage at once.
00:17:45.920 With popular services like Wikipedia, as your service gains traction, you will see compounding traffic—more users lead to more page views and actions.
00:18:24.639 To deal with this increase, we want to add more capacity. Splitting your web servers and data store is vital for performance. You can decide to keep your database on another server so they run independently.
00:18:57.440 This will allow the web server to handle more requests without lagging. Also, separating the responsibilities allows each service to be scaled according to its own needs.
00:19:30.959 Utilizing workers for non-request processing with tools like Resque allows for tasks like sending emails to operate without blocking the flow of user experiences.
00:20:00.720 If, for example, a notification email doesn’t need to be sent immediately while a user waits for feedback, you can queue the task instead.
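The queue-and-worker idea can be sketched with Ruby's built-in `Queue` and a thread. This is an in-process stand-in only: Resque keeps its queue in Redis and runs workers as separate processes, and the `handle_signup` helper and job format below are hypothetical.

```ruby
jobs = Queue.new # thread-safe queue from Ruby's core library
sent = []

# Background worker: pulls jobs off the queue until told to shut down.
worker = Thread.new do
  while (job = jobs.pop) != :shutdown
    sent << "emailed #{job[:to]}" # stands in for a slow email delivery
  end
end

# Request handler: enqueues the email and responds immediately
# instead of making the user wait for delivery.
def handle_signup(email, jobs)
  jobs << { to: email }
  "Welcome, #{email}!"
end

response = handle_signup("user@example.com", jobs)
puts response

jobs << :shutdown
worker.join
puts sent
```

The response is built before the email is delivered; the worker drains the queue on its own schedule.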
00:20:43.200 Should you run out of capacity, the first traditional answer is to scale up, which means upgrading to a more powerful server. However, scaling out—adding more machines—is considered better in the long run.
00:21:24.559 Each additional server functions like an additional Starcraft unit, where numbers allow for massive scale. Scaling up has hard limits, but in theory you can keep adding servers as needed.
00:22:05.840 This could be accomplished through ephemeral machines and cloud services, where you can scale your applications seamlessly.
00:22:38.960 In AWS, you provision another machine, install dependencies, and set it to run, which can be repeated effortlessly.
00:23:02.799 Remember, you're copying code and not state; your server should not depend on the state stored on it. Utilizing tools like S3 or Memcache avoids potential problems with user sessions.
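Why server-local state breaks when you scale out can be shown with a small simulation. `AppServer` is a hypothetical class, and a plain Hash stands in for an external store such as Memcache.

```ruby
# A hypothetical app server that keeps sessions in whatever store it is given.
class AppServer
  def initialize(session_store)
    @sessions = session_store
  end

  def login(session_id, user)
    @sessions[session_id] = user
  end

  def current_user(session_id)
    @sessions[session_id]
  end
end

# Shared store: every server sees the same sessions.
shared_store = {}
server_a = AppServer.new(shared_store)
server_b = AppServer.new(shared_store)
server_a.login("abc123", "alice")
shared_result = server_b.current_user("abc123") # "alice"

# Server-local state: the second server has never seen this session.
local_a = AppServer.new({})
local_b = AppServer.new({})
local_a.login("abc123", "alice")
local_result = local_b.current_user("abc123") # nil

puts "shared store: #{shared_result.inspect}"
puts "local state:  #{local_result.inspect}"
```

With a shared store, a load balancer can send each request to any server; with local state, the user appears logged out whenever a request lands on a different machine.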
00:23:45.119 Next, focusing on data storage scaling is vital to maintain efficiency. You can use master/slave architectures to manage read-intensive apps or sharding to distribute data across servers.
00:24:51.360 Watch out for limitations: sharded data cannot join tables across servers but is beneficial for handling large datasets efficiently. Services like Heroku offer features called forks and followers to enhance scaling.
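Sharding can be sketched as a routing function from a record's key to a server. This is a minimal illustration assuming integer user IDs and simple modulo placement; the shard names and the `shard_for` helper are hypothetical.

```ruby
SHARDS = %w[db-shard-0 db-shard-1 db-shard-2].freeze # hypothetical servers

# Picks the shard that owns a given integer user ID.
def shard_for(user_id)
  SHARDS[user_id % SHARDS.size]
end

placement = (1..6).group_by { |user_id| shard_for(user_id) }

puts shard_for(7) # db-shard-1
placement.each { |shard, ids| puts "#{shard}: #{ids.inspect}" }
```

Each user's rows live on exactly one server, which is why a query that needs rows from several users may have to visit several shards and combine the results in application code rather than in a single SQL join. Modulo placement also means that changing the number of shards moves most keys, so production systems often use a more stable mapping.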
00:25:26.480 Let’s now shift focus to the client side. Understanding client-side speed and optimizing how users experience the app is just as important as the backend. The speed of requests is of utmost importance.
00:26:21.600 Loading assets like CSS and JavaScript can often slow down the experience, so it's important to optimize those loads. Gzip compression and utilizing a content delivery network (CDN) can help speed things up.
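The effect of Gzip on a text asset can be checked with Ruby's standard `zlib` library. The stylesheet content below is made up for illustration; in a Rack or Rails app, compression is typically applied by the web server or by middleware such as `Rack::Deflater` rather than by hand.

```ruby
require "zlib"

# A made-up, repetitive stylesheet; real CSS and JavaScript are also
# highly repetitive, which is why they compress well.
css = ".button { color: #336699; padding: 4px 8px; }\n" * 200

compressed = Zlib.gzip(css)
restored   = Zlib.gunzip(compressed)

puts "original:   #{css.bytesize} bytes"
puts "compressed: #{compressed.bytesize} bytes"
puts "round trip intact: #{restored == css}"
```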
00:26:51.920 Make sure your assets are nearer to users by leveraging CDNs to distribute them globally. This minimizes the distance requests must travel and enhances overall speed.
00:27:34.560 Browser caching is also crucial. By setting expires headers in your application’s configuration, you can ensure that public images are retained in users' browsers, preventing redundant requests.
00:28:19.920 Rails asset fingerprinting allows you to manage this process effectively. Whenever changes occur in your assets, the browser will see it as a completely new file due to a changed fingerprint.
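Fingerprinting can be sketched as embedding a digest of the file's contents in its name. This is an illustration of the idea rather than the asset pipeline's exact implementation; the `fingerprinted_name` helper and the choice of MD5 are assumptions.

```ruby
require "digest/md5"

# Builds a file name that changes whenever the file's contents change
# (hypothetical helper).
def fingerprinted_name(filename, content)
  ext  = File.extname(filename)
  base = File.basename(filename, ext)
  "#{base}-#{Digest::MD5.hexdigest(content)}#{ext}"
end

v1      = fingerprinted_name("application.css", "body { color: black; }")
v1_same = fingerprinted_name("application.css", "body { color: black; }")
v2      = fingerprinted_name("application.css", "body { color: navy; }")

puts v1
puts v2
puts "changed content gives a new name: #{v1 != v2}"
```

Because the name changes only when the contents change, each fingerprinted file can be served with a far-future expires header: browsers keep unchanged files cached indefinitely and fetch a new URL as soon as an asset is updated.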
00:29:24.640 After enabling your CDN, configuring expires headers, and activating asset fingerprints, your application should show a marked improvement in response times.
00:30:00.160 In summary, measuring everything you can is critical. Notably, use tools such as YSlow to identify and correct inefficiencies. Algorithms easily condense into one simple line of code, speeding up your website significantly!
00:30:48.480 Ensure that you compress assets using the asset pipeline, make use of CDNs, and implement an efficient front end that focuses on speed. Lastly, when in doubt: measure, memcache, and add more instances.
00:31:03.840 Thank you very much for your attention. Now, if you have any questions, feel free to ask!
00:31:50.240 Thank you very much.