`bundle install` Y U SO SLOW: Server Edition

00:00:10.800 I highly recommend going to Pat's talk downstairs; it's by a good friend of mine.

00:00:16.039 If you're having trouble hearing, I definitely suggest checking it out.

00:00:22.119 I was speaking with Emma from Melbourne while I was helping out with Rails Girls yesterday and trying to get Wi-Fi.

00:00:28.519 She mentioned that you all like to read things upside down, so I decided to do my presentation upside down.

00:00:34.160 This thing is really sensitive.

00:00:40.960 I'm Terrence, the guy in the blue hat. If you've seen any of my previous presentations, that's usually how people find me at conferences.

00:00:48.480 You can find me on Twitter as hone02 or on GitHub as hone, or feel free to visit terrencehero.com.

00:00:54.719 Feel free to reach out to me regarding anything Ruby, Bundler, or routing.

00:01:01.239 I live in Austin, Texas, and I haven't been home since November.

00:01:08.759 Austin is a great place for tacos, so if you're ever in town, hit me up and I will take you out for tacos.

00:01:16.280 I think we have some of the best tacos in the world.

00:01:21.720 We also have a fantastic barbecue place called Rudy's.

00:01:27.360 The best part about Rudy's is that they have these awesome washing machines where you can wash your hands after eating a bunch of greasy meat.

00:01:32.640 And then you get one of these cool stickers that says, 'I have clean hands.'

00:01:38.479 So if you ever see Constantine's laptop, this is probably the best sticker on it.

00:01:45.280 For my next set of presentations, I'm planning to collect a bunch of these stickers and hand them out instead of Ruby stickers.

00:01:52.000 I work on a few community projects, like the Bundler API, which I'll discuss today, as well as helping out with Rails Girls. I was involved in the Rails Girls event yesterday.

00:02:06.159 If you've never participated in something like Rails Girls or RailsBridge or any of these beginner teaching workshops, I highly recommend it.

00:02:12.640 It offers a fantastic opportunity to see what it's like to be a beginner again and understand how challenging it can be.

00:02:18.800 It helps you realize how much we take for granted and gives you a fresh perspective on everything you already know.

00:02:24.239 I definitely recommend it if you haven't done it yet.

00:02:29.680 Based on Corey's keynote this morning, I draw a lot of inspiration from Aaron but in different ways.

00:02:36.360 One thing I admire about Aaron is that he started something called Friday Hugs.

00:02:42.480 If you aren't familiar, every Friday, he takes a picture of himself hugging a webcam, and sometimes his cat participates.

00:02:47.599 His cat's name is Gorbachev Puff Puff Thunderhorse, which is a really awesome cat name.

00:02:52.680 Inspired by this, I went around, and I decided to do some Friday Hugs myself.

00:02:59.000 I attended Frozen Rails in Finland, did one in Singapore, and one in Amsterdam last year.

00:03:07.040 There's also a photo from Lyon, France, at Ruby Lugdunum, and one from Cascadia Ruby in Seattle.

00:03:13.040 I also have some other ones that I haven't posted yet.

00:03:20.040 Now, I'm calling myself a Friday Hug Evangelist.

00:03:28.599 Unfortunately, when I spoke to the organizers, they wouldn't give me a slot on Friday.

00:03:35.519 So, we decided to do a warm-up Friday Hug before the keynote.

00:03:40.640 Richard, my lovely co-worker, is going to come take a picture.

00:03:47.400 Everyone, please stand for the Thursday Hug!

00:03:52.560 Alright, thank you, everyone! Now, let's move on.

00:03:59.920 So, onto the actual talk: the agenda for today includes discussing Bundler.

00:04:06.560 I'm not sure if there are any newcomers in this room who haven't used it before, but I'll explain how Bundler interacts with the API.

00:04:12.480 Then, we'll cover Rubygems.org, the Bundler API service that emerged from it, and if time permits, I'll have a bonus round.

00:04:19.680 I would like to discuss some other prototype work we've been doing for the Bundler API.

00:04:25.000 So, let's dive into Bundler.

00:04:31.520 By a show of hands, how many of you have never heard of Bundler or used it before?

00:04:36.680 Thanks to Aaron, I see a few hands.

00:04:42.840 To quickly recap, Bundler is a dependency manager for Ruby.

00:04:48.720 You have a Gemfile where you list all your gem sources, usually just RubyGems, but you can also include private gems.

00:04:56.160 Then you simply list all your gems.

00:05:01.800 The main command to use is `bundle install`, which fetches all your dependencies.

00:05:06.720 It's designed to handle the resolution of dependencies, so you don't have to deal with that hassle.

00:05:12.479 Back in the days before Rails 3, managing dependencies was a significant pain.

00:05:18.680 I remember when I started my first full-time Ruby job; it took me a day just to get the application working.

00:05:24.680 You would usually have a list of gems in the README, and you had to manually install them one by one.

00:05:31.520 If you were lucky, you had all the configurations in Rails 2 projects, but not all required gems were listed.

00:05:37.600 It was a constant battle to figure out which gems to install, and you often didn't have correct version information.

00:05:44.880 Thanks to Carl and Yehuda, Bundler became part of Rails 3, which simplified the process tremendously.

00:05:52.480 With `bundle install`, it automatically fetches the necessary versions and dependencies, which is fantastic.

00:06:01.800 For example, when installing a single gem like Sinatra, it used to take about 18 seconds from America, where the servers are located.

00:06:08.640 You might be wondering why this was so slow. Let's dig into how Bundler performs its dependency resolution.

00:06:14.360 Bundler maintains an index containing all the specs and metadata it requires.

00:06:20.720 Inside this index class, the specs are stored in a hash keyed by gem name, and each value records where that gem can be downloaded from.
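
As a minimal sketch of that structure, and not Bundler's actual code, an index like this could be modeled as follows. The names `SimpleIndex` and `RemoteSpec` and the sample versions are assumptions made up for illustration.

```ruby
# Hypothetical, simplified stand-in for a remote spec: a name, a version,
# and the source it can be downloaded from.
RemoteSpec = Struct.new(:name, :version, :source)

# A toy index: a hash keyed by gem name whose values are the specs
# (and therefore the download locations) known for that name.
class SimpleIndex
  def initialize
    @specs = Hash.new { |hash, name| hash[name] = [] }
  end

  # Add a spec under its gem name.
  def <<(spec)
    @specs[spec.name] << spec
    self
  end

  # All known specs for a gem name, or an empty array if there are none.
  def search(name)
    @specs.fetch(name, [])
  end
end

index = SimpleIndex.new
index << RemoteSpec.new("sinatra", "1.3.3", "https://rubygems.org/")
index << RemoteSpec.new("rack", "1.4.1", "https://rubygems.org/")

puts index.search("sinatra").map(&:source).inspect
```

Keying by name keeps lookups cheap during resolution, since the resolver repeatedly asks for every known version of a given gem.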

00:06:27.200 Bundler fetches remote specs by calling RubyGems, which requires two calls due to an old bug.

00:06:33.280 Previously, there was a bug that caused the API to return the whole set of gems when listing specs.

00:06:39.760 As a result, it had to call the API twice: once for normal gems and once for prerelease specs.

00:06:46.079 Looking at the index today, the specs file is only about 1.2 megabytes when you exclude prerelease specs.

00:06:54.680 RubyGems uses CloudFront for CDN, so it likely doesn't take long to download a file of this size.

00:07:01.600 However, many users complained about Bundler's performance and believed the index was simply too large.

00:07:08.080 But as I showed in my slides, the uncompressed index is manageable.

00:07:14.480 We need this information to be able to resolve dependencies, especially since gemspec files contain more data.

00:07:21.760 In addition to the gem's basic name and version, they also contain dependencies.

00:07:28.360 Thus, whenever we fetch all the specs, we have to create remote specification objects from this information.

00:07:34.160 Still, the common complaint was that Bundler was slow because of the index size and the resulting fetch times.

00:07:41.840 People assumed that a large index meant slow downloads.

00:07:48.080 Most people experienced these issues whenever RubyGems had to handle those requests.

00:07:54.800 As Bundler matured, we recognized that these issues could not be allowed to persist.

00:08:02.320 We realized that we needed a better solution to deal with the increasing number of gems.

00:08:08.480 The most prominent improvement that users saw in Bundler 1.1 came from hitting the new API endpoint.

00:08:16.320 Instead of downloading the full index, Bundler could ask the API about only the gems it needed, which improved install speeds.

00:08:22.240 If we use the same Gemfile with Bundler 1.1, you would see the install time drop significantly.

00:08:28.720 It reduced from about 18 seconds to just under 3 seconds, indicating a massive speed improvement.

00:08:36.440 This works by hitting the RubyGems API v1 endpoint, `/api/v1/dependencies`, with a comma-separated list of gem names.

00:08:43.680 For instance, you could request Sinatra, Rack, RSpec, or any other gem in your Gemfile.

00:08:49.760 It returns only the top-level dependencies, which is more efficient.
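
To make the request shape concrete, here is a small sketch that only builds the URL and does not call the network. The helper name `dependencies_uri` is hypothetical, and the `gems` query parameter is an assumption about how the comma-separated list is passed.

```ruby
require "uri"

# Build the dependencies URL for a list of gem names.
# Assumption: the names go in a `gems` query parameter, comma-separated.
def dependencies_uri(gem_names, base: "https://rubygems.org")
  uri = URI.join(base, "/api/v1/dependencies")
  uri.query = "gems=#{gem_names.join(',')}"
  uri
end

uri = dependencies_uri(%w[sinatra rack rspec])
puts uri
```

Batching several names into one request keeps the number of round trips low, which matters more than payload size for a command-line tool.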

00:08:56.560 However, we don't pass version numbers in this request, since that would require extra detail.

00:09:01.600 We simply specify the gem names, and the response contains the dependency information we need for them.

00:09:07.840 This is how we can build a recursive method to iterate through and effectively fetch dependencies.

00:09:14.320 When you run out of dependencies, you'll know you're done and can return the entire list back.
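
The recursion described here can be sketched as below. This is an illustration under stated assumptions, not Bundler's implementation: the lambda `fake_api` stands in for the HTTP call, and the dependency data is sample data.

```ruby
require "set"

# Recursively ask for the dependencies of gems not seen yet. When a round
# produces no new names, we are done and return everything collected.
def resolve_names(names, api, seen = Set.new)
  pending = names.reject { |name| seen.include?(name) }
  return seen if pending.empty?

  seen.merge(pending)
  next_names = pending.flat_map { |name| api.call(name) }.uniq
  resolve_names(next_names, api, seen)
end

# Sample data: gem name => names of its direct runtime dependencies.
deps = {
  "sinatra"         => %w[rack rack-protection tilt],
  "rack-protection" => %w[rack],
  "rack"            => [],
  "tilt"            => []
}

# Stand-in for the remote API; a real client would make one request per round.
fake_api = ->(name) { deps.fetch(name, []) }

all_names = resolve_names(["sinatra"], fake_api).to_a.sort
puts all_names.inspect
```

Tracking the `seen` set is what guarantees termination, since a gem that appears under several parents is only requested once.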

00:09:20.640 From that list, Bundler creates endpoint specifications that already incorporate the dependency info.

00:09:25.600 Consequently, we can limit the gems we keep in memory, leading to better performance.

00:09:32.080 Bundler 1.1 implemented these changes, which made installs noticeably faster.

00:09:39.439 When this update was launched, many in the Ruby community were excited.

00:09:44.840 I recall receiving numerous positive tweets about it.

00:09:49.840 Now, let's discuss the server side of this story.

00:09:56.320 Looking at the RubyGems.org infrastructure as it stood, we noticed a significant scalability flaw.

00:10:02.880 Initially, they relied heavily on a single machine to manage all operations.

00:10:09.680 This machine ran the Rails app server, the PostgreSQL database, and Redis.

00:10:16.760 On October 17, 2012, RubyGems experienced a critical failure.

00:10:23.200 Our post-mortem revealed that the Bundler API accounted for 70 to 80% of the traffic going to that single machine.

00:10:30.080 This spike in usage could be traced back to Bundler 1.1 defaulting to the API.

00:10:36.400 Many applications on Rails 3 and above ran `bundle install` frequently.

00:10:43.760 Additionally, the server was only a four-core machine, and it was using 380% out of 400% CPU.

00:10:49.760 A significant factor contributing to this high load was marshalling.

00:10:55.920 Bundler cannot have its own gem dependencies, since it is the dependency resolver itself.

00:11:03.120 Because of this limitation, only a select few libraries, like net-http-persistent and Thor, were included.

00:11:09.760 Marshalling for larger datasets became very CPU intensive.
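
For readers unfamiliar with marshalling, here is a minimal sketch of the round trip using Ruby's built-in `Marshal`. The payload shape and field names are assumptions for illustration and are not confirmed by the talk.

```ruby
# Illustrative payload only: one hash per gem version, with dependencies
# as [name, requirement] pairs. Field names are hypothetical.
payload = [
  { name: "sinatra", number: "1.3.3", platform: "ruby",
    dependencies: [["rack", "~> 1.3"], ["tilt", "~> 1.3"]] }
]

# Server side: serialize with Marshal, which ships with Ruby, so a client
# without extra libraries can still decode the response.
body = Marshal.dump(payload)

# Client side: turn the bytes back into Ruby objects.
# Marshal.load should only ever be used on data from a trusted source.
decoded = Marshal.load(body)

puts decoded.first[:dependencies].inspect
```

The server pays the serialization cost on every response, so large dependency lists translate directly into CPU time.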

00:11:14.560 Consequently, they had to disable the Bundler API and reset RubyGems.org.

00:11:22.400 This action restored basic functionalities like searching, uploading, and pushing gems.

00:11:29.239 The Bundler team felt frustrated by this, as the situation negatively affected deploy times at Heroku.

00:11:34.560 Not to mention, the community wasn't pleased either.

00:11:40.480 We proposed a solution to extract the API outside of RubyGems and run it as a separate service.

00:11:48.960 By offloading this task, we could alleviate CPU pressures on RubyGems.org.

00:11:55.920 We quickly built a prototype and had it running within a week.

00:12:01.920 This marked a significant milestone where a crucial part of RubyGems infrastructure was separated.

00:12:07.680 This change paved the way for a federated RubyGems, enabling others in the community to use the data without being constrained by the main application.

00:12:13.360 We structured API endpoints accordingly; however, since we didn't have access to RubyGems' database, we had to create our own.

00:12:20.160 To make this work seamlessly, we wrote polling code to sync the data.

00:12:27.440 Initially, this process took a lengthy 16 minutes to refresh our data.

00:12:34.160 Having that long lag meant that new gems wouldn't be instantly available for users.

00:12:39.760 The sync had two basic jobs: adding new gems and handling yanked gems.

00:12:47.600 By rewriting our syncing code, we introduced a more efficient threaded consumer pool.

00:12:54.400 This allowed us to decrease the sync delay to around 2 to 3 minutes, a drastic improvement.
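
A threaded consumer pool of this kind can be sketched as follows. This is a simplified illustration, not the service's code: the jobs, the pool size, and the string results are all assumptions.

```ruby
require "thread"

# Hypothetical sync jobs: each names a gem version to add or yank.
jobs = [
  [:add,  "rack-1.5.2"],
  [:add,  "sinatra-1.3.5"],
  [:yank, "badgem-0.0.1"]
]

queue   = Queue.new   # work to be done
results = Queue.new   # thread-safe collection of outcomes
jobs.each { |job| queue << job }

pool_size = 3
workers = Array.new(pool_size) do
  Thread.new do
    loop do
      action, gem_name = begin
        queue.pop(true)   # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break             # no work left for this worker
      end
      # A real worker would fetch the gemspec or update the database here.
      results << "#{action} #{gem_name}"
    end
  end
end

workers.each(&:join)
processed = Array.new(results.size) { results.pop }.sort
puts processed.inspect
```

Because most of the sync time is spent waiting on network and database calls, threads help even under a global interpreter lock.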

00:13:01.920 While this was better than 16 minutes, we aimed to move towards a webhook mechanism.

00:13:09.440 The ultimate goal was to allow instant updates when a gem was pushed.

00:13:15.760 As for implementation, we began with a dump from RubyGems' PostgreSQL database.

00:13:22.160 This database contains around 50,000 gems and their various versions.

00:13:29.760 From here, we set off to build the necessary SQL queries to retrieve gem information.

00:13:35.360 This only took about half a day to get right, which was quite efficient.

00:13:42.960 One important nuance was to ensure that only runtime dependencies were covered.

00:13:49.760 Development dependencies could lead to cyclic dependency issues, resulting in major problems.
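
As a small illustration of that filter, here is a sketch over in-memory rows. The row layout and the `scope` field are hypothetical; the real service did this in SQL.

```ruby
# Hypothetical dependency rows for one gem version.
rows = [
  { name: "rack", requirement: "~> 1.4", scope: "runtime" },
  { name: "tilt", requirement: "~> 1.3", scope: "runtime" },
  { name: "rake", requirement: ">= 0",   scope: "development" }
]

# Keep only what is needed to run the gem. Development dependencies are
# dropped so they cannot introduce cycles during resolution.
runtime_only  = rows.select { |row| row[:scope] == "runtime" }
runtime_names = runtime_only.map { |row| row[:name] }

puts runtime_names.inspect
```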

00:13:55.440 We built a simple Sinatra app to manage this CRUD operation.

00:14:02.960 We set the endpoint in such a way that it matched the structure in RubyGems.

00:14:10.840 In the process, we also improved response times by utilizing cached data.

00:14:17.200 This allowed us to reduce the processing delay significantly.

00:14:23.120 We streamlined everything to cut the lag down to between one and two minutes.

00:14:29.760 As time went on, this allowed for smoother use and more significant performance improvements.

00:14:38.640 Additionally, we also integrated webhooks for when gems were pushed.

00:14:44.320 Registered webhooks enable notifications when gems are pushed, allowing us to fetch info promptly.

00:14:50.320 We included authentication tokens to ensure that received data is legitimate.

00:14:56.640 This setup means that whenever there is a gem push action, the webhook is triggered.
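
One way such a token check could look is sketched below. The scheme shown, a SHA-256 digest over gem name, version, and a shared API key, is an assumption for illustration, and the helper names are hypothetical.

```ruby
require "digest/sha2"

# Expected token for a push notification under the assumed scheme:
# SHA-256 over name + version + shared secret.
def expected_token(name, version, api_key)
  Digest::SHA256.hexdigest(name + version + api_key)
end

# Compare without short-circuiting, so timing does not reveal how many
# leading characters matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# True only when the header matches the token we would have computed.
def legitimate_push?(header, name, version, api_key)
  secure_equal?(header.to_s, expected_token(name, version, api_key))
end

api_key = "example-shared-secret"   # placeholder value
header  = expected_token("rack", "1.5.2", api_key)
puts legitimate_push?(header, "rack", "1.5.2", api_key)
```

Rejecting unauthenticated notifications matters here because each accepted webhook triggers a fetch and a database write.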

00:15:02.880 This helps manage smooth operations as requests are made for gem installations.

00:15:09.920 In return, we have our consumer pool to manage requests.

00:15:16.760 This consumer pool plays a significant role in maintaining efficiency.

00:15:22.240 When running in a multi-threaded manner, we ensured proper management of the process.

00:15:28.440 After syncing operations, we implemented various methods to improve our response times.

00:15:34.360 We adopted headers for cache control to enhance performance further.

00:15:40.320 Our work on the API aims to provide top-notch service to all community users.

00:15:46.440 We maintained efforts for improvements and ensured that we constantly followed up on service quality.

00:15:52.080 We receive alerts via PagerDuty to address any service interruptions.

00:15:59.520 It's discouraging to be on call at times, but monitoring is essential.

00:16:06.080 For monitoring, we implemented various logging methods.

00:16:12.320 This includes using Librato for graphing data and monitoring overall application health.

00:16:20.920 We make sure all logs are available for community access.

00:16:26.720 This serves to enhance transparency and maintain trust around how we run the service.

00:16:35.040 We constantly work on this project as a community initiative; all of our work is open source.

00:16:40.960 We're looking for continuous improvement through collaborative efforts.

00:16:47.040 For metrics, we consider measuring the client-side usage of Bundler as we have control over both sides now.

00:16:54.640 This means we can gather data on common usage patterns and payload sizes efficiently.

00:17:01.840 Our goal is to receive more insights into gem dependencies over time.

00:17:06.800 Currently, we can only view instances on a call-by-call basis.

00:17:15.840 We'll have to work together to enhance our platform infrastructure.

00:17:23.080 With that, I'd like to open up the floor for any questions.

00:17:28.720 How often are we polling for updates?

00:17:35.600 We initially polled every minute, since data on S3 is static, but now it’s around every 30 minutes as we implemented webhooks.

00:17:42.880 This reduced unnecessary load and improved system performance overall.

00:17:50.240 For client-side caching, there are discussions about improving caching handling.

00:17:56.840 There are potential improvements in the gem index to enable incremental updates.

00:18:04.560 Reducing unnecessary SQL queries and enhancing performance is an ongoing focus.

00:18:10.880 Would anyone else like to share ideas on optimizing this aspect?

00:18:17.680 What have you chosen for testing your Bundler API work?

00:18:24.240 Every gem push results in some automation where the API endpoints are contacted automatically.

00:18:30.720 The RubyGems system has a built-in method for handling that utilizing webhooks.

00:18:36.360 It has been refined over the past several iterations to provide reliable service.

00:18:42.960 Now, let's walk through some more advanced features and utilize them efficiently together.

00:18:50.320 We've included additional functionality to ensure users experience an optimized service.

00:18:57.600 In the bonus round, I’ll talk about the Bundler API Replay project, focusing on production traffic.

00:19:04.880 The goal is to capture real production data without affecting performance.

00:19:11.920 We want to develop a way to test and replay based on real-world traffic metrics.

00:19:18.840 To this end, we set it up by utilizing Heroku logging capabilities.

00:19:27.280 This allows the combination of app and router logs to effectively capture necessary information.

00:19:35.200 Through aggregation, we capture traffic in an organized manner and can then replay it efficiently.
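
To give a feel for the capture step, here is a sketch that extracts a replayable request from one log line. The line imitates the general key=value style of router logs, with made-up values, and `parse_router_line` is a hypothetical helper.

```ruby
# Example line in a key=value style; host and values are made up.
line = 'at=info method=GET path="/api/v1/dependencies?gems=rack,sinatra" ' \
       'host=example.herokuapp.com dyno=web.1 service=18ms status=200'

# Split a log line into a hash of key => value, dropping optional quotes.
def parse_router_line(line)
  line.scan(/(\w+)=("[^"]*"|\S+)/).to_h do |key, value|
    [key, value.delete_prefix('"').delete_suffix('"')]
  end
end

request = parse_router_line(line)

# The minimum needed to replay the request against another environment.
replay = { method: request["method"], path: request["path"] }
puts replay.inspect
```

Working from logs means the production app does no extra work per request, which fits the goal of capturing traffic without affecting performance.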

00:19:42.720 Our implementation uses existing data streams and finds a way to build infrastructure around them.

00:19:50.320 The goal is to improve the overall feedback loop around performance metrics.

00:19:57.200 With the use of follower databases, we're able to facilitate this step forward effectively.

00:20:03.920 This allows easy adaptations from production to staging environments.

00:20:10.720 From existing logs, we can assess real user behavior and track the system's response across time.

00:20:18.440 This reveals key insights that can help future optimizations.

00:20:26.560 Would anyone like to further delve into the topic of performance evaluations?

00:20:32.720 Research around distributed gems mirrors something like Python's Cheese Shop model.

00:20:39.720 There’s potential to explore establishing a well-distributed mirrored model.

00:20:46.480 If developers are interested, backing efforts to mirror RubyGems could be fruitful.

00:20:54.080 While RubyGems offers great uptime, building a distributed solution would ensure redundancy.

00:21:00.640 Now that we've gone through the session, does anyone have any remaining questions?

00:21:06.760 Thank you for this time, it has been a pleasure sharing and discussing these topics.