Performance Optimization
`bundle install` Y U SO SLOW: Server Edition

Summarized using AI

`bundle install` Y U SO SLOW: Server Edition

Terence Lee • February 20, 2013 • Earth

The video titled "bundle install Y U SO SLOW: Server Edition" features Terrence Lee discussing the challenges and improvements associated with Bundler, a dependency manager for Ruby. Presented at RubyConf AU 2013, the talk dives into the need for a robust Dependency API due to performance issues encountered when RubyGems.org went down in 2012.

Main Points Discussed:
- Introduction to Bundler:
- Bundler simplifies the management of Ruby gems, much of which was cumbersome before its implementation, especially prior to Rails 3.
- It allows users to define gem dependencies in a Gemfile.

  • Performance Issues:

    • Bundler's initial dependency resolution was slow; installation times could reach up to 18 seconds due to how it fetched specifications from RubyGems.
    • Bugs and the need for multiple API calls contributed to this slowness.
  • The Need for API Improvements:

    • The spike in API usage highlighted scalability issues with RubyGems.org, culminating in its downtime on October 17, 2012, which resulted from heavy load from Bundler API requests.
    • The bundler API's implementation soon required a redesign to manage traffic effectively.
  • Creation of Bundler API Service:

    • A community-driven initiative led to the development of the Bundler API, separating it from RubyGems.org to reduce load and improve performance.
    • The prototype was established in under a week and provided a dedicated service to enhance response time and reliability.
  • Performance Enhancements:

    • The new syncing code significantly improved update times from 16 minutes to 2-3 minutes through the use of a consumer thread pool.
    • Future directions included implementing a webhook mechanism for instant updates when new gems are pushed.
  • Monitoring and Open Source Contributions:

    • Continuous improvements were tracked using logging and alerts for any service interruptions, showcasing a commitment to open-source principles.
    • The ongoing collaboration within the community plays a vital role in the Bundler API's evolution.
  • Future Prototypes and Features:

    • The talk concluded with a preview of a replay service intended to test performance against real-world traffic without impacting service stability.
    • Insights gathered would help optimize the infrastructure and provide a feedback loop for further enhancements.

Conclusion:
- The session emphasizes the importance of community involvement in technology evolution and the necessity for performance-focused solutions in software development.
- Lee encouraged attendees to consider further optimizations and improvements in the Bundler API while highlighting the efficacy of collaboration within the Ruby community.

`bundle install` Y U SO SLOW: Server Edition
Terence Lee • February 20, 2013 • Earth

RubyConf AU 2013: http://www.rubyconf.org.au

If you've ever done anything in ruby, you've probably used rubygems and to search or install your favorite gem. On October 17, 2012, rubygems.org went down. A Dependency API was built to be used by Bundler 1.1+ to speed up bundle install. Unfortunately, it was a bit too popular and the service caused too much load on the current infrasture. In order to get rubygems.org back up the API had to be disabled. You can watch the post-mortem for details.
Members in the community stepped up and built a compatible Dependency API service called the Bundler API. Working with the rubygems.org team, we were able to bring the API up for everyone within a week. In this talk, I will cover the process we went through of extracting the API out into a separate Sinatra app that now runs on Heroku. I'll go over how the API works and how Bundler itself uses it. Since we don't have direct access to their database, we needed to build a syncing service to keep our local cache up to date. We'll discuss the original sequential implementation of the sync code and how we improved performance through the use of a consumer thread pool. The sync time down was cut from 16 mins to 2-3 mins. Next, we'll talk about the productization steps we took for visibility to make sure the app is performing well.
We're prototyping a replay service, to replay production traffic to different Heroku apps. With this we can compare the performance between the current MRI app in production and the same app running on JRuby. This will also allow us to test any changes/features against real production load. We'll go over how to set this up.

RubyConf AU 2013

00:00:10.800 I highly recommend going to Pat's talk downstairs; it's by a good friend of mine.
00:00:16.039 If you're having trouble hearing, I definitely suggest checking it out.
00:00:22.119 I was speaking with Emma from Melbourne, and she mentioned that I was helping out with Rails Girls yesterday, trying to get Wi-Fi.
00:00:28.519 She mentioned that you all like to read things upside down, so I decided to do my presentation upside down.
00:00:34.160 This thing is really sensitive.
00:00:40.960 I'm Terrence, the guy in the blue hat. If you've seen any of my previous presentations, that's usually how people find me at conferences.
00:00:48.480 You can find me on Twitter as h02 or on GitHub as hone, or feel free to visit terrencehero.com.
00:00:54.719 Feel free to reach out to me regarding anything Ruby, Bundler, or routing.
00:01:01.239 I live in Austin, Texas, and I haven't been home since November.
00:01:08.759 Austin is a great place for tacos, so if you're ever in town, hit me up and I will take you out for tacos.
00:01:16.280 I think we have some of the best tacos in the world.
00:01:21.720 We also have a fantastic barbecue place called Rudy's.
00:01:27.360 The best part about Rudy's is that they have these awesome washing machines where you can wash your hands after eating a bunch of greasy meat.
00:01:32.640 And then you get one of these cool stickers that says, 'I have clean hands.'
00:01:38.479 So if you ever see Constantine's laptop, this is probably the best sticker on it.
00:01:45.280 For my next set of presentations, I'm planning to collect a bunch of these stickers and hand them out instead of Ruby stickers.
00:01:52.000 I work on a few community projects, like the Bundler API, which I'll discuss today, as well as helping out with Rails Girls. I was involved in the Rails Girls event yesterday.
00:02:06.159 If you've never participated in something like Rails Girls or RailsBridge or any of these beginner teaching workshops, I highly recommend it.
00:02:12.640 It offers a fantastic opportunity to see what it's like to be a beginner again and understand how challenging it can be.
00:02:18.800 It helps you realize how much we take for granted and gives you a fresh perspective on everything you already know.
00:02:24.239 Definitely, I recommend it if you haven't done it yet.
00:02:29.680 Based on Corey's keynote this morning, I draw a lot of inspiration from Aaron but in different ways.
00:02:36.360 One thing I admire about Aaron is that he started something called Friday Hugs.
00:02:42.480 If you aren't familiar, every Friday, he takes a picture of himself hugging a webcam, and sometimes his cat participates.
00:02:47.599 His cat's name is Garbage Chof Puff Puff Thunderhorse, which is a really awesome cat name.
00:02:52.680 Inspired by this, I went around, and I decided to do some Friday Hugs myself.
00:02:59.000 I attended Frozen Rails in Finland, did one in Singapore, and one in Amsterdam last year.
00:03:07.040 There's also a photo from Lyon, France, at RubyLnum, and Cascadia Ruby in Seattle.
00:03:13.040 I also have some other ones that I haven't posted yet.
00:03:20.040 Now, I'm calling myself a Friday Hug Evangelist.
00:03:28.599 Unfortunately, when I spoke to the organizers, they wouldn't give me a slot on Friday.
00:03:35.519 So, we decided to do a warm-up Friday Hug before the keynote.
00:03:40.640 Richard, my lovely co-worker, is going to come take a picture.
00:03:47.400 Everyone, please stand for the Thursday Hug!
00:03:52.560 Alright, thank you, everyone! Now, let's move on.
00:03:59.920 So, onto the actual talk: the agenda for today includes discussing Bundler.
00:04:06.560 I'm not sure if there are any newcomers in this room who haven't used it before, but I'll explain how Bundler interacts with the API.
00:04:12.480 Then, we'll cover Rubygems.org, the Bundler API service that emerged from it, and if time permits, I'll have a bonus round.
00:04:19.680 I would like to discuss some other prototype work we've been doing for the Bundler API.
00:04:25.000 So, let's dive into Bundler.
00:04:31.520 By a show of hands, how many of you have never heard of Bundler or used it before?
00:04:36.680 Thanks to Aaron, I see a few hands.
00:04:42.840 To quickly recap, Bundler is a dependency manager for Ruby.
00:04:48.720 You have a Gemfile where you list all your gem sources, usually just RubyGems, but you can also include private gems.
00:04:56.160 Then you simply list all your gems.
00:05:01.800 The main command to use is `bundle install`, which fetches all your dependencies.
00:05:06.720 It's designed to handle the resolution of dependencies, so you don't have to deal with that hassle.
00:05:12.479 Back in the days before Rails 3, managing dependencies was a significant pain.
00:05:18.680 I remember when I started my first full-time Ruby job; it took me a day just to get the application working.
00:05:24.680 You would usually have a list of gems in the README, and you had to manually install them one by one.
00:05:31.520 If you were lucky, you had all the configurations in Rails 2 projects, but not all required gems were listed.
00:05:37.600 It was a constant battle to figure out which gems to install, and you often didn't have correct version information.
00:05:44.880 Thanks to Col Yuda, Bundler became part of Rails 3, which simplified the process tremendously.
00:05:52.480 With `bundle install`, it automatically fetches the necessary versions and dependencies, which is fantastic.
00:06:01.800 For example, when installing a single gem like Sinatra, it used to take about 18 seconds from America, where the servers are located.
00:06:08.640 You might be wondering why this was so slow. Let's dig into how Bundler performs its dependency resolution.
00:06:14.360 Bundler maintains an index containing all the specs and metadata it requires.
00:06:20.720 This index class contains specs that are a hash keyed by the gem name, with values being where those gems can be downloaded.
00:06:27.200 Bundler fetches remote specs by calling RubyGems, which requires two calls due to an old bug.
00:06:33.280 Previously, there was a bug that caused the API to return the whole set of gems when listing specs.
00:06:39.760 As a result, it had to call the API twice: once for normal gems and once for pre-released specifications.
00:06:46.079 Looking at modern indexes, the specs file is only about 1.2 megabytes when you exclude pre-released specs.
00:06:54.680 RubyGems uses CloudFront for CDN, so it likely doesn't take long to download a file of this size.
00:07:01.600 However, many users complained about Bundler's performance leading them to believe the index was too large.
00:07:08.080 But as I showed in my slides, the uncompressed index is manageable.
00:07:14.480 We need this information to be able to resolve dependencies, especially since `gem spec` files contain more data.
00:07:21.760 In addition to the gem's basic name and version, they also contain dependencies.
00:07:28.360 Thus, whenever we fetch all the specs, we have to create remote specification objects from this information.
00:07:34.160 However, many complained about bundler being slow due to the index size, resulting in long fetching times.
00:07:41.840 People expected it to be quick because they assumed large indexes would lead to slow download times.
00:07:48.080 Most people experienced issues when RubyGems handled requests if they set up were callled before.
00:07:54.800 When Bundler was created, we noticed these issues could not persist.
00:08:02.320 We realized that we needed a better solution to deal with the increasing number of gems.
00:08:08.480 One prominent improvement that most users observed with Bundler 1.1 was hitting the API endpoint.
00:08:16.320 This integration allowed users to leverage API interactions and improve their speeds.
00:08:22.240 If we use the same Gemfile with Bundler 1.2, you would see the install time drop significantly.
00:08:28.720 It reduced from about 18 seconds to just under 3 seconds, indicating a massive speed improvement.
00:08:36.440 This works by hitting the RubyGems API v1 endpoint, API V1 dependencies, using a comma-separated value list.
00:08:43.680 For instance, you could request Sinatra, Rack, RSpec, or any other gem in your Gemfile.
00:08:49.760 It returns only the top-level dependencies, which is more efficient.
00:08:56.560 However, we don't usually provide version numbers while making this request, as it may require extra details.
00:09:01.600 We opt for simply specifying the gem name, which returns the necessary information without updating versions.
00:09:07.840 This is how we can build a recursive method to iterate through and effectively fetch dependencies.
00:09:14.320 When you run out of dependencies, you'll know you're done and can return the entire list back.
00:09:20.640 This creates endpoint specifications, incorporating dependency info.
00:09:25.600 Consequently, we can limit the gems we keep in memory, leading to better performance.
00:09:32.080 Bundler 1.1 implemented these changes which led to a greater efficiency in speed performance.
00:09:39.439 When this update was launched, many in the Ruby community were excited.
00:09:44.840 I recall receiving numerous positive tweets about it.
00:09:49.840 Now, let's discuss the server side of this story.
00:09:56.320 To understand the RubyGems.org infrastructure as it stood, we noticed a significant flaw in scalability.
00:10:02.880 Initially, they relied heavily on a single machine to manage all operations.
00:10:09.680 This machine operated the Rails app server, handled the PostgreSQL database, and catered to Redis.
00:10:16.760 On October 17, 2012, RubyGems experienced a critical failure.
00:10:23.200 Our post-mortem revealed that the Bundler API consumed 70 to 80% of the traffic going to that single machine.
00:10:30.080 This spike in usage could be traced back to Bundler 1.1 defaulting to the API.
00:10:36.400 Many Rails applications on Rails 3 and above executed a bundler install frequently.
00:10:43.760 Additionally, the server was only a four-core machine, yet it consumed 380 out of 400 CPU.
00:10:49.760 A significant factor contributing to this high load was due to marshalling.
00:10:55.920 Bundler cannot have its own dependencies, as it's solely a dependency resolver.
00:11:03.120 Because of this limitation, only a select few components like HTPersistent and Thor were allowed.
00:11:09.760 Marshalling for larger datasets became very CPU intensive.
00:11:14.560 Consequently, they had to disable the Bundler API and reset RubyGems.org.
00:11:22.400 This action restored basic functionalities like searching, uploading, and pushing gems.
00:11:29.239 The Bundler team felt frustrated by this, as the situation negatively affected deploy times at Heroku.
00:11:34.560 Not to mention, the community wasn't pleased either.
00:11:40.480 We proposed a solution to extract the API outside of RubyGems and run it as a separate service.
00:11:48.960 By offloading this task, we could alleviate CPU pressures on RubyGems.org.
00:11:55.920 We quickly built a prototype and had it running within a week.
00:12:01.920 This marked a significant milestone where a crucial part of RubyGems infrastructure was separated.
00:12:07.680 This change paves the way for a federated RubyGems, enabling others in the community to leverage data not constrained.
00:12:13.360 We structured API endpoints accordingly; however, since we didn't have access to RubyGems' database, we had to create our own.
00:12:20.160 To make this work seamlessly, we developed a polling code to sync data.
00:12:27.440 Initially, this process took a lengthy 16 minutes to refresh our data.
00:12:34.160 Having that long lag meant that new gems wouldn't be instantly available for users.
00:12:39.760 The basic function involved the addition of new gems and managing yanked gems.
00:12:47.600 By rewriting our syncing code, we introduced a more efficient threaded consumer pool.
00:12:54.400 This allowed us to decrease the sync delay to around 2 to 3 minutes, a drastic improvement.
00:13:01.920 While this was better than 16 minutes, we aimed to move towards a webhook mechanism.
00:13:09.440 The ultimate goal was to allow instant updates when a gem was pushed.
00:13:15.760 As for implementation, we began with a dump from RubyGems' PostgreSQL database.
00:13:22.160 This database contains around 50,000 gems and their various versions.
00:13:29.760 From here, we set off to build the necessary SQL queries to retrieve gem information.
00:13:35.360 This only took about half a day to get right, which was quite efficient.
00:13:42.960 One important nuance was to ensure that only runtime dependencies were covered.
00:13:49.760 Development dependencies could lead to cyclic dependency issues, resulting in major problems.
00:13:55.440 We built a simple Sinatra app to manage this CRUD operation.
00:14:02.960 We set the endpoint in such a way that it matched the structure in RubyGems.
00:14:10.840 In the process, we also improved response times by utilizing cached data.
00:14:17.200 This allowed us to reduce the processing delay significantly when potential lag times were concerned.
00:14:23.120 We streamlined everything to cut down lag to between one or two minutes.
00:14:29.760 As time went on, this allowed for smoother use and more significant performance improvements.
00:14:38.640 Additionally, we also integrated webhooks for when gems were pushed.
00:14:44.320 Registered webhooks enable notifications when gems are pushed, allowing us to fetch info promptly.
00:14:50.320 We included authentication tokens to ensure that received data is legitimate.
00:14:56.640 This setup means that whenever there is a gem push action, the webhook is triggered.
00:15:02.880 This helps manage smooth operations as requests are made for gem installations.
00:15:09.920 In return, we have our consumer pool to manage requests.
00:15:16.760 This consumer pool plays a significant role in maintaining efficiency.
00:15:22.240 When training in a multi-threaded manner, we ensured proper management of the process.
00:15:28.440 After syncing operations, we implemented various methods to improve our response times.
00:15:34.360 We adopted headers for cache control to enhance performance further.
00:15:40.320 Our work on the API aims to provide top-notch service to all community users.
00:15:46.440 We maintained efforts for improvements and ensured that we constantly followed up on service quality.
00:15:52.080 We receive alerts via PagerDuty to address any service interruptions.
00:15:59.520 It's discouraging to be on call at times, but monitoring is essential.
00:16:06.080 For monitoring, we implemented various logging methods.
00:16:12.320 This includes using Librato for graphing data and monitoring overall application health.
00:16:20.920 We ensure to keep all logs available for community access.
00:16:26.720 This serves to enhance transparency and maintain trust around how we run the service.
00:16:35.040 We constantly work on this project as a community initiative; all of our work is open source.
00:16:40.960 We're looking for continuous improvement through collaborative efforts.
00:16:47.040 For metrics, we consider measuring the client-side usage of Bundler as we have control over both sides now.
00:16:54.640 This means we can gather data on common usage patterns and payload sizes efficiently.
00:17:01.840 Our goal is to receive more insights into gem dependencies over time.
00:17:06.800 Currently, we can only view instances on a call-by-call basis.
00:17:15.840 We'll have to work together to enhance our platform infrastructure.
00:17:23.080 With that, I'd like to open up the floor for any questions besides these matters.
00:17:28.720 How often are we polling for updates?
00:17:35.600 We initially polled every minute, since data on S3 is static, but now it’s around every 30 minutes as we implemented webhooks.
00:17:42.880 This reduced unnecessary load and improved system performance overall.
00:17:50.240 For client-side caching, there are discussions about improving caching handling.
00:17:56.840 There are potential improvements in the gem index to enable incremental updates.
00:18:04.560 Reducing unnecessary SQL queries and enhancing performance is an ongoing focus.
00:18:10.880 Would anyone else like to share ideas on optimizing this aspect?
00:18:17.680 What have you chosen for testing your Bundler API work?
00:18:24.240 Every gem push results in some automation where the API endpoints are contacted automatically.
00:18:30.720 The RubyGems system has a built-in method for handling that utilizing webhooks.
00:18:36.360 It has been refined over the past several iterations to provide reliable service.
00:18:42.960 Now, let's walk through some more advanced features and utilize them efficiently together.
00:18:50.320 We've included additional functionality to ensure users experience an optimized service.
00:18:57.600 In the bonus round, I’ll talk about the Bundler API Replay project, focusing on production traffic.
00:19:04.880 The goal is to capture real production data without affecting performance.
00:19:11.920 We want to develop a way to test and replay based on real-world traffic metrics.
00:19:18.840 To this end, we set it up by utilizing Heroku logging capabilities.
00:19:27.280 This allows the combination of app and router logs to effectively capture necessary information.
00:19:35.200 Through aggregation, we capture traffic in an organized manner and can then replay it efficiently.
00:19:42.720 Our implementation uses existing data streams and finds a way to build infrastructure around them.
00:19:50.320 The goal is to improve the overall feedback loop around performance metrics.
00:19:57.200 With the use of follower databases, we're able to facilitate this step forward effectively.
00:20:03.920 This allows easy adaptations from production to staging environments.
00:20:10.720 From existing logs, we can assess real user behavior and track the system's response across time.
00:20:18.440 This reveals key insights that can help future optimizations.
00:20:26.560 Would anyone like to further delve into the topic of performance evaluations?
00:20:32.720 Research around distributed gems mirrors something like Python's Cheese Shop model.
00:20:39.720 There’s potential to explore establishing a well-distributed mirrored model.
00:20:46.480 If developers are interested, backing efforts to mirror RubyGems could be fruitful.
00:20:54.080 While RubyGems offers great uptime, building a distributed solution would ensure redundancy.
00:21:00.640 After passing through the session, does anyone have any remaining questions?
00:21:06.760 Thank you for this time, it has been a pleasure sharing and discussing these topics.
Explore all talks recorded at RubyConf AU 2013
+21