RubyConf AU 2013
bundle install Y U SO SLOW: Server Edition
00:00:10.800 I highly recommend going to Pat's talk downstairs; it's by a good friend of mine.
00:00:16.039 If you're having trouble hearing, I definitely suggest checking it out.
00:00:22.119 I was speaking with Emma from Melbourne yesterday, when I was helping out with Rails Girls and trying to get the Wi-Fi working.
00:00:28.519 She mentioned that you all like to read things upside down, so I decided to do my presentation upside down.
00:00:34.160 This thing is really sensitive.
00:00:40.960 I'm Terence, the guy in the blue hat. If you've seen any of my previous presentations, that's usually how people find me at conferences.
00:00:48.480 You can find me on Twitter as hone02 or on GitHub as hone, or feel free to visit
00:00:54.719 Feel free to reach out to me regarding anything Ruby, Bundler, or routing.
00:03:59.920 So, onto the actual talk: the agenda for today includes discussing Bundler.
00:04:06.560 I'm not sure if there are any newcomers in this room who haven't used it before, but I'll explain how Bundler interacts with the API.
00:04:12.480 Then we'll cover the Bundler API service that emerged from it, and if time permits, I'll have a bonus round.
00:04:19.680 I would like to discuss some other prototype work we've been doing for the Bundler API.
00:04:25.000 So, let's dive into Bundler.
00:04:31.520 By a show of hands, how many of you have never heard of Bundler or used it before?
00:04:36.680 Thanks to Aaron, I see a few hands.
00:04:42.840 To quickly recap, Bundler is a dependency manager for Ruby.
00:04:48.720 You have a Gemfile where you list all your gem sources, usually just RubyGems, but you can also include private gems.
00:04:56.160 Then you simply list all your gems.
00:05:01.800 The main command to use is `bundle install`, which fetches all your dependencies.
00:05:06.720 It's designed to handle the resolution of dependencies, so you don't have to deal with that hassle.
00:05:12.479 Back in the days before Rails 3, managing dependencies was a significant pain.
00:05:18.680 I remember when I started my first full-time Ruby job; it took me a day just to get the application working.
00:05:24.680 You would usually have a list of gems in the README, and you had to manually install them one by one.
00:05:31.520 If you were lucky, a Rails 2 project declared its gems in its configuration, but not every required gem was listed.
00:05:37.600 It was a constant battle to figure out which gems to install, and you often didn't have correct version information.
00:05:44.880 Thanks to Carl and Yehuda, Bundler became part of Rails 3, which simplified the process tremendously.
00:05:52.480 With `bundle install`, it automatically fetches the necessary versions and dependencies, which is fantastic.
00:06:01.800 For example, installing a Gemfile with a single gem like Sinatra used to take about 18 seconds, and that's from America, where the servers are located.
00:06:08.640 You might be wondering why this was so slow. Let's dig into how Bundler performs its dependency resolution.
00:06:14.360 Bundler maintains an index containing all the specs and metadata it requires.
00:06:20.720 The index class stores its specs in a hash keyed by gem name, with each value holding that gem's specs and where they can be downloaded from.
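As a rough illustration of that structure, here is a simplified, hypothetical index in Ruby. `TinyIndex` and `SpecEntry` are invented for this sketch and are not Bundler's actual classes; the only idea taken from the talk is a hash keyed by gem name whose values hold the specs and their download source.

```ruby
# Simplified, hypothetical stand-in for Bundler's index (not Bundler's real
# classes): a Hash keyed by gem name, each value listing that gem's specs.
SpecEntry = Struct.new(:name, :version, :source)

class TinyIndex
  def initialize
    # Missing keys get a fresh empty Array when written to.
    @specs = Hash.new { |hash, name| hash[name] = [] }
  end

  # Add a spec under its gem name; returns self so calls can be chained.
  def <<(spec)
    @specs[spec.name] << spec
    self
  end

  # All specs for a gem, newest first. Versions are compared numerically
  # segment by segment, which ignores prerelease tags for simplicity.
  def search(name)
    @specs.fetch(name, []).sort_by { |s| s.version.split(".").map(&:to_i) }.reverse
  end
end

index = TinyIndex.new
index << SpecEntry.new("rack", "1.4.5", "https://rubygems.org/")
index << SpecEntry.new("rack", "1.5.2", "https://rubygems.org/")
p index.search("rack").map(&:version) # prints ["1.5.2", "1.4.5"]
```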
00:06:27.200 Bundler fetches remote specs by calling RubyGems, which requires two calls due to an old bug.
00:06:33.280 Previously, there was a bug that caused the API to return the whole set of gems when listing specs.
00:06:39.760 As a result, it had to call the API twice: once for regular specs and once for prerelease specs.
00:06:46.079 Looking at the current index, the specs file is only about 1.2 megabytes when you exclude prerelease specs.
00:06:54.680 RubyGems uses CloudFront as its CDN, so a file of this size shouldn't take long to download.
00:07:01.600 However, many users complained about Bundler's performance and assumed the index was simply too large.
00:07:08.080 But as the slide showed, even the uncompressed index is manageable.
00:07:14.480 The index alone isn't enough to resolve dependencies, though: for that we need the full gemspecs, which contain more data.
00:07:21.760 In addition to the gem's name and version, they also list its dependencies.
00:07:28.360 Thus, whenever we fetch all the specs, we have to create remote specification objects from this information.
00:07:34.160 Even so, many people blamed Bundler's slowness on the size of the index and the time it took to fetch.
00:07:41.840 They assumed that a large index meant slow downloads.
00:07:48.080 Most of the problems people ran into came from how RubyGems handled those requests.
00:07:54.800 Once Bundler was widely used, it was clear these issues could not persist.
00:08:02.320 We realized that we needed a better solution to deal with the increasing number of gems.
00:08:08.480 The improvement most users noticed in Bundler 1.1 was that it started hitting an API endpoint.
00:08:16.320 Instead of downloading the whole index, Bundler asks the API only for the gems it needs, which made installs much faster.
00:08:22.240 If we use the same Gemfile with Bundler 1.2, the install time drops significantly:
00:08:28.720 from about 18 seconds to just under 3 seconds, a massive speed improvement.
00:08:36.440 This works by hitting the RubyGems API v1 dependencies endpoint, `/api/v1/dependencies`, with a comma-separated list of gem names.
00:08:43.680 For instance, you could request Sinatra, Rack, RSpec, or any other gem in your Gemfile.
00:08:49.760 It returns only the direct dependencies of each gem, which is far more efficient.
00:08:56.560 We don't pass version numbers in the request, though, since that would require extra details.
00:09:01.600 We just specify the gem names, and the response covers every version of each gem along with its dependencies.
00:09:07.840 From there, we can write a recursive method that fetches the dependencies of those dependencies in turn.
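A minimal sketch of the request and response, assuming the public `/api/v1/dependencies` format as we understand it: gem names joined with commas in the query, and a Marshal-encoded array with one hash per gem version (`:name`, `:number`, `:platform`, `:dependencies`). `dependencies_uri` is a hypothetical helper and the gems in the payload are made up.

```ruby
require "uri"

# Builds the URL for the dependency endpoint: gem names only, comma-separated,
# no version numbers. The default host is an assumption for illustration.
def dependencies_uri(names, host: "https://rubygems.org")
  URI("#{host}/api/v1/dependencies?gems=#{names.join(',')}")
end

# Illustrative response shape (made-up gems): one hash per version, each
# listing only that version's direct runtime dependencies.
payload = [
  { name: "example_app", number: "1.0.0", platform: "ruby",
    dependencies: [["example_lib", ">= 0.2"]] },
  { name: "example_app", number: "1.1.0", platform: "ruby",
    dependencies: [["example_lib", "~> 0.3"]] }
]

# The body uses Ruby's built-in Marshal format, so a client needs no
# extra gems to decode it.
body = Marshal.dump(payload)
p Marshal.load(body) == payload # prints true
puts dependencies_uri(%w[sinatra rack])
# prints https://rubygems.org/api/v1/dependencies?gems=sinatra,rack
```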
00:09:14.320 When you run out of dependencies, you'll know you're done and can return the entire list back.
00:09:20.640 From the results, Bundler creates endpoint specifications that carry the dependency information.
00:09:25.600 Because we only fetch the gems we actually need, far fewer specs are kept in memory, which improves performance.
00:09:32.080 Bundler 1.1 shipped these changes, and installs became much faster.
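The recursive walk can be sketched as follows. `FAKE_DEPENDENCY_API` is a hypothetical in-memory stand-in for the HTTP endpoint (a simplified view of Sinatra 1.x's runtime dependencies, versions omitted), and `fetch_all` is an illustration of the idea rather than Bundler's implementation, which also tracks versions and performs real resolution.

```ruby
require "set"

# Hypothetical stand-in for the API: gem name => names of its direct runtime
# dependencies. A real client would request these over HTTP in batches.
FAKE_DEPENDENCY_API = {
  "sinatra" => %w[rack rack-protection tilt],
  "rack-protection" => %w[rack],
  "rack" => [],
  "tilt" => []
}.freeze

# One batched "request": direct dependency names for every gem asked for.
def fetch_direct_dependencies(names)
  names.flat_map { |name| FAKE_DEPENDENCY_API.fetch(name, []) }
end

# Recursively fetch dependencies of dependencies. Each round only asks for
# names not seen yet; when a round turns up nothing new, we are done.
def fetch_all(names, seen = Set.new)
  pending = names.uniq.reject { |name| seen.include?(name) }
  return seen if pending.empty?

  seen.merge(pending)
  fetch_all(fetch_direct_dependencies(pending), seen)
end

p fetch_all(["sinatra"]).to_a.sort
# prints ["rack", "rack-protection", "sinatra", "tilt"]
```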
00:09:39.439 When this update was launched, many in the Ruby community were excited.
00:09:44.840 I recall receiving numerous positive tweets about it.
00:09:49.840 Now, let's discuss the server side of this story.
00:09:56.320 Looking at the infrastructure as it stood, there was a significant scalability flaw.
00:10:02.880 Everything ran on a single machine.
00:10:09.680 That machine ran the Rails app server, the PostgreSQL database, and Redis.
00:10:16.760 On October 17, 2012, RubyGems experienced a critical failure.
00:10:23.200 Our post-mortem revealed that the Bundler API accounted for 70 to 80% of the traffic going to that single machine.
00:10:30.080 This spike in usage could be traced back to Bundler 1.1 defaulting to the API.
00:10:36.400 Many applications on Rails 3 and above run `bundle install` frequently.
00:10:43.760 On top of that, the server was only a four-core machine, and it was using 380% out of a possible 400% CPU.
00:10:49.760 A significant factor in this load was marshalling.
00:10:55.920 Bundler can't have dependencies of its own, since it's the tool that resolves and installs them.
00:11:03.120 Because of that, only a select few libraries, like net-http-persistent and Thor, are vendored into it, so the API responses use Ruby's built-in Marshal format.
00:11:09.760 Marshalling large responses turned out to be very CPU intensive.
00:11:14.560 Consequently, they had to disable the Bundler API and reset
00:11:22.400 This action restored basic functionalities like searching, uploading, and pushing gems.
00:11:29.239 The Bundler team felt frustrated by this, as the situation negatively affected deploy times at Heroku.
00:11:34.560 Not to mention, the community wasn't pleased either.
00:11:40.480 We proposed a solution to extract the API outside of RubyGems and run it as a separate service.
00:11:48.960 By offloading this task, we could relieve the CPU pressure on
00:11:55.920 We quickly built a prototype and had it running within a week.
00:12:01.920 This was a significant milestone: a crucial part of the RubyGems infrastructure was split out into its own service.
00:12:07.680 This change paves the way for a federated RubyGems, letting others in the community build on the data without being tied to a single application.
00:12:13.360 We kept the same API endpoints, but since we didn't have access to the RubyGems database, we had to maintain our own.
00:12:20.160 To make this work, we wrote polling code to sync the data.
00:12:27.440 Initially, a full refresh of our data took 16 minutes.
00:12:34.160 That lag meant new gems weren't immediately available to users.
00:12:39.760 The sync had two basic jobs: adding new gems and removing yanked ones.
00:12:47.600 We rewrote the sync code around a more efficient threaded consumer pool.
00:12:54.400 This allowed us to decrease the sync delay to around 2 to 3 minutes, a drastic improvement.
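A threaded consumer pool like this can be sketched with Ruby's built-in `Queue` and `Thread`. The class name, the pool size, and the idea that each job is a gem name to sync are assumptions for illustration; the actual sync code isn't shown in the talk.

```ruby
# Minimal sketch of a threaded consumer pool (hypothetical names): workers
# pop jobs from a shared Queue until they receive a stop sentinel.
class ConsumerPool
  STOP = Object.new # sentinel telling a worker to exit

  def initialize(size, &process)
    @queue = Queue.new
    @process = process
    @threads = Array.new(size) do
      Thread.new do
        loop do
          job = @queue.pop # blocks until a job is available
          break if job.equal?(STOP)
          @process.call(job)
        end
      end
    end
  end

  def enqueue(job)
    @queue << job
  end

  # One sentinel per worker, queued after all real jobs, so every job is
  # processed before the workers exit; then wait for them to finish.
  def shutdown
    @threads.size.times { @queue << STOP }
    @threads.each(&:join)
  end
end

results = Queue.new
pool = ConsumerPool.new(4) { |name| results << "synced #{name}" }
%w[rack sinatra tilt].each { |name| pool.enqueue(name) }
pool.shutdown
puts results.size # prints 3
```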
00:13:01.920 While this was better than 16 minutes, we aimed to move towards a webhook mechanism.
00:13:09.440 The ultimate goal was to allow instant updates when a gem was pushed.
00:13:15.760 As for implementation, we began with a dump from RubyGems' PostgreSQL database.
00:13:22.160 This database contains around 50,000 gems and their various versions.
00:13:29.760 From here, we set off to build the necessary SQL queries to retrieve gem information.
00:13:35.360 This only took about half a day to get right, which was quite efficient.
00:13:42.960 One important nuance was to include only runtime dependencies.
00:13:49.760 Development dependencies can introduce dependency cycles, which would cause major problems.
00:13:55.440 We built a simple Sinatra app to serve this data.
00:14:02.960 We designed the endpoint to match the one on RubyGems.
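To see why the scope matters: if gem A lists gem B as a development dependency while B needs A at runtime, following both scopes produces the cycle A, B, A. The sketch below filters on a `scope` value; the `Dependency` struct and the idea of a `scope` column are assumptions about how such a table might look, not the actual RubyGems schema or query.

```ruby
# Hypothetical rows shaped like a dependencies table with a scope column
# ("runtime" or "development"); the column name is an assumption.
Dependency = Struct.new(:gem_name, :requirement, :scope)

# Keep only runtime dependencies, in the [name, requirement] pair shape
# that the API serves. Development dependencies are dropped entirely.
def runtime_dependencies(rows)
  rows.select { |row| row.scope == "runtime" }
      .map { |row| [row.gem_name, row.requirement] }
end

rows = [
  Dependency.new("rack", ">= 1.0", "runtime"),
  Dependency.new("rspec", "~> 2.0", "development")
]
p runtime_dependencies(rows) # prints [["rack", ">= 1.0"]]
```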
00:14:10.840 In the process, we also improved response times by utilizing cached data.
00:14:17.200 This significantly reduced the processing delay.
00:14:23.120 We also streamlined the sync to cut the lag down to one or two minutes.
00:14:29.760 Over time, this made the service smoother to use and brought further performance improvements.
00:14:38.640 We also integrated webhooks that fire when gems are pushed.
00:14:44.320 Once a webhook is registered, RubyGems notifies us whenever a gem is pushed, so we can fetch its information promptly.
00:14:50.320 Each notification carries an authentication token so we can verify that the data is legitimate.
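One way to cache responses, sketched under assumptions: memoize the Marshal-encoded body per normalized list of gem names so repeated requests skip the CPU-heavy marshalling, and clear the cache after each sync. `ResponseCache` is hypothetical; the talk doesn't describe how the service's cache is actually keyed or invalidated.

```ruby
# Hypothetical response cache: memoize the Marshal-encoded body for each
# normalized list of gem names, and clear everything after a sync.
class ResponseCache
  def initialize(&build)
    @build = build # computes the payload for an array of gem names
    @cache = {}
    @mutex = Mutex.new
  end

  # Order and duplicates don't matter: "thor,rack" and "rack,thor,rack"
  # share one entry. Building happens under the lock for simplicity.
  def fetch(names)
    key = names.map(&:to_s).uniq.sort.join(",")
    @mutex.synchronize do
      @cache[key] ||= Marshal.dump(@build.call(key.split(",")))
    end
  end

  # Call after each sync so stale responses are not served.
  def clear
    @mutex.synchronize { @cache.clear }
  end
end

builds = 0
cache = ResponseCache.new { |names| builds += 1; names }
cache.fetch(%w[thor rack])
cache.fetch(%w[rack thor])
puts builds # prints 1
```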
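A sketch of verifying such a token, assuming it is a SHA-256 hex digest of the gem name, version, and a shared API key (our understanding of how RubyGems signs webhook calls; the exact scheme should be confirmed against the RubyGems webhook documentation). The method names are hypothetical. The comparison runs in constant time so response timing doesn't reveal how much of a forged token matched.

```ruby
require "digest/sha2"

# Assumed signing scheme: SHA-256 hex digest of name + version + API key.
def expected_token(name, version, api_key)
  Digest::SHA256.hexdigest(name + version + api_key)
end

# Constant-time comparison: examine every byte instead of stopping at the
# first mismatch.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# True only if the received header matches the token we compute ourselves.
def authentic?(name, version, api_key, header)
  secure_compare(expected_token(name, version, api_key), header.to_s)
end

token = expected_token("rack", "1.5.2", "secret")
p authentic?("rack", "1.5.2", "secret", token) # prints true
```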
00:14:56.640 This setup means that whenever there is a gem push action, the webhook is triggered.
00:15:02.880 This keeps our data current as gem install requests come in.
00:15:09.920 The incoming notifications are then handled by our consumer pool.
00:15:16.760 This consumer pool plays a significant role in maintaining efficiency.
00:15:22.240 Because it runs multi-threaded, we had to make sure the work was coordinated properly.
00:15:28.440 After syncing operations, we implemented various methods to improve our response times.
00:15:34.360 We also set Cache-Control headers to improve performance further.
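As an illustration, a response helper might set `Cache-Control` together with an `ETag` so clients and proxies can reuse a response. The `max_age` value and the helper names are illustrative assumptions, not the service's real configuration.

```ruby
require "digest/md5"

# Hypothetical helper: caching headers for an API response body.
# max_age is an illustrative default, not the service's real setting.
def cache_headers(body, max_age: 60)
  {
    "Cache-Control" => "public, max-age=#{max_age}",
    # ETag derived from the body, so unchanged data keeps the same tag.
    "ETag" => %("#{Digest::MD5.hexdigest(body)}")
  }
end

# True when the client's If-None-Match matches, i.e. a 304 could be sent.
def not_modified?(if_none_match, body)
  if_none_match == cache_headers(body)["ETag"]
end

headers = cache_headers("abc")
puts headers["Cache-Control"] # prints public, max-age=60
```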
00:15:40.320 Our work on the API aims to provide top-notch service to all community users.
00:15:46.440 We keep working on improvements and follow up constantly on service quality.
00:15:52.080 We receive alerts via PagerDuty to address any service interruptions.
00:15:59.520 It's discouraging to be on call at times, but monitoring is essential.
00:16:06.080 For monitoring, we implemented various logging methods.
00:16:12.320 This includes using Librato for graphing data and monitoring overall application health.
00:16:20.920 We keep all the logs available for the community to access.
00:16:26.720 This serves to enhance transparency and maintain trust around how we run the service.
00:16:35.040 We constantly work on this project as a community initiative; all of our work is open source.
00:16:40.960 We're looking for continuous improvement through collaborative efforts.
00:16:47.040 For metrics, we're considering measuring Bundler's client-side usage, since we now control both the client and the server.
00:16:54.640 That would let us gather data on common usage patterns and payload sizes.
00:17:01.840 Our goal is to receive more insights into gem dependencies over time.
00:17:06.800 Currently, we can only see this on a call-by-call basis.
00:17:15.840 We'll have to work together to enhance our platform infrastructure.
00:17:23.080 With that, I'd like to open the floor to questions.
00:17:28.720 How often are we polling for updates?
00:17:35.600 We initially polled every minute, since the data on S3 is static, but now that we've implemented webhooks it's around every 30 minutes.
00:17:42.880 This reduced unnecessary load and improved system performance overall.
00:17:50.240 For client-side caching, there have been discussions about improving how caching is handled.
00:17:56.840 There are potential improvements in the gem index to enable incremental updates.
00:18:04.560 Reducing unnecessary SQL queries and enhancing performance is an ongoing focus.
00:18:10.880 Would anyone else like to share ideas on optimizing this aspect?
00:18:17.680 What have you chosen for testing your Bundler API work?
00:18:24.240 Every gem push results in some automation where the API endpoints are contacted automatically.
00:18:30.720 The RubyGems system has a built-in method for handling that utilizing webhooks.
00:18:36.360 It has been refined over the past several iterations to provide reliable service.
00:18:42.960 Now let's look at some more advanced features.
00:18:50.320 We've added functionality to make the service faster for users.
00:18:57.600 In the bonus round, I’ll talk about the Bundler API Replay project, focusing on production traffic.
00:19:04.880 The goal is to capture real production data without affecting performance.
00:19:11.920 We want to develop a way to test and replay based on real-world traffic metrics.
00:19:18.840 To do this, we use Heroku's logging.
00:19:27.280 Heroku combines the app and router logs into a single stream, which captures the information we need.
00:19:35.200 Through aggregation, we capture traffic in an organized manner and can then replay it efficiently.
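A sketch of the capture side, assuming router lines in Heroku's `key=value` format (values optionally double-quoted). The sample line is made up, and `parse_router_line` and `replay_list` are hypothetical names; a replayer could then issue these `[method, path]` pairs against a staging app.

```ruby
# Assumes key=value router lines; values may be double-quoted.
ROUTER_PAIR = /(\w+)=("[^"]*"|\S+)/

# Extract [method, path] from a router line, or nil for anything else.
def parse_router_line(line)
  return nil unless line.include?("heroku[router]")
  fields = line.scan(ROUTER_PAIR).map { |key, value| [key, value.delete('"')] }.to_h
  return nil unless fields["method"] && fields["path"]
  [fields["method"], fields["path"]]
end

# Turn a stream of mixed app and router log lines into requests to replay.
def replay_list(lines)
  lines.map { |line| parse_router_line(line) }.compact
end

sample = '2013-02-21T12:00:00+00:00 heroku[router]: at=info method=GET ' \
         'path=/api/v1/dependencies?gems=rack,thor host=example.com ' \
         'fwd="10.0.0.1" dyno=web.1 status=200 bytes=512'
p parse_router_line(sample)
# prints ["GET", "/api/v1/dependencies?gems=rack,thor"]
```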
00:19:42.720 Our implementation uses existing data streams and finds a way to build infrastructure around them.
00:19:50.320 The goal is to improve the overall feedback loop around performance metrics.
00:19:57.200 Follower databases help here, giving us a copy of the production data to work with.
00:20:03.920 This makes it easy to move from production to a staging environment.
00:20:10.720 From existing logs, we can assess real user behavior and track the system's response across time.
00:20:18.440 This reveals key insights that can help future optimizations.
00:20:26.560 Would anyone like to further delve into the topic of performance evaluations?
00:20:32.720 There's been research into distributed gem mirrors, along the lines of Python's Cheese Shop (PyPI).
00:20:39.720 There’s potential to explore establishing a well-distributed mirrored model.
00:20:46.480 If developers are interested, backing efforts to mirror RubyGems could be fruitful.
00:20:54.080 While RubyGems offers great uptime, building a distributed solution would ensure redundancy.
00:21:00.640 Before we wrap up, does anyone have any remaining questions?
00:21:06.760 Thank you for this time, it has been a pleasure sharing and discussing these topics.
