A History of Bundles: 2010 to 2017

by André Arko

In this talk at RubyConf 2017, André Arko delves into the evolution of Bundler from its inception in 2010 to 2017, exploring the extensive development effort behind this essential tool for Ruby developers.

Key Points Discussed:

  • Introduction to Bundler:

    • Bundler initially came out in 2010, allowing developers to manage gem dependencies effectively in their applications. Despite the surface-level similarity of Bundler's function today, it has undergone significant changes owing to substantial development efforts.

  • Historical Context:

    • André introduces his background, emphasizing his role in Bundler's development and his contributions to Ruby open-source projects like Ruby Together. He reflects on the challenges of early Ruby dependency management prior to Bundler’s existence.
  • Launch of Bundler:

    • Bundler was born out of a necessity to handle gem dependencies for web applications, especially as the Ruby ecosystem evolved and applications began using a multitude of gems.
    • He outlines the fundamental functionalities launched, such as the installation-time dependency resolver and the creation of a lock file, which allowed developers to ensure that their applications would work consistently across different environments.
  • Growth and Adoption:

    • From its initial release with 8,000 downloads per day, Bundler's usage surged, reaching an average of 30,000 downloads daily by 2012, which inadvertently led to performance issues at RubyGems.org due to overwhelming traffic.
    • The Bundler team worked on enhancing installation speeds and tackling performance complaints arising from the increased number of gems and the need to download comprehensive data for each install.
  • New Features and Improvements:

    • New functionalities such as multi-threaded installations, Git integration, and the 'clean' command were introduced to improve user experience and address various operational challenges developers faced.
    • Changes were also made to support newer versions of Ruby and address security concerns regarding gem sourcing.
  • Community and Funding Impact:

    • The talk highlights the role of community funding models, such as Ruby Together and corporate sponsorship, in supporting ongoing improvements to Bundler and RubyGems.
  • Looking Forward to Bundler 2:

    • André shares insights about transitioning towards Bundler 2.0, focusing on ensuring backward compatibility while introducing new functionalities and optimizations.

Conclusions and Takeaways:

- The journey of Bundler illustrates the importance of community-driven development in the open-source landscape. André's talk offers practical tips for leveraging Bundler's capabilities, such as using ‘bundle binstubs,’ the ‘doctor’ command for diagnosing issues, and Ruby version locking. This historical retrospective encourages developers to appreciate the evolution of Bundler and understand how to optimize their workflows using this powerful gem management tool.

00:00:10.550 Hi everybody, let's get started. I mean, I really can't see any of you; this is a super glarey stage.
00:00:16.190 Okay, so this is my talk. When Bundler came out in 2010, it did something really cool: it installed your gems and let you use those gems in your application. It was great! Today, Bundler still installs your gems and lets you use them in your application, which is also pretty great.
00:00:31.140 But in the eight years since then, Bundler has required thousands and thousands of hours of development work. Most people might not clearly understand why that was necessary, especially if the starting point and the ending point are so similar. So, what exactly happened between then and now? This talk will uncover the changes that have transpired.
00:01:01.050 Before we dive into that, let me introduce myself. I'm André Arko, and I go by 'indirect' on the internet. That's the tiny picture of my head that shows up next to commits and GitHub issues. I work at Cloud City doing Ruby and Rails consulting.
00:01:12.210 This usually means that I join teams at other startups already working on web development. I participate as a senior developer, offering suggestions and architecting solutions. Feel free to hit me up if that's something your company would like.
00:01:36.270 I also wrote a book that I'm quite fond of. I keep mentioning it because it makes me really happy that I learned Ruby from "The Ruby Way" first edition in the early 2000s, and I got to update it for the modern Ruby world. The third edition covers up through Ruby 2.3, so it's relatively up to date, which is cool.
00:01:56.130 My other company is Ruby Together. Ruby Together takes funds from companies that use Ruby and uses those funds to pay for development work on Ruby open source, like Bundler and RubyGems. I'll talk a little more about that as we delve into this part of Ruby's history.
00:02:34.590 Of course, the reason you're here for this talk is Bundler. I've been on the Bundler team since approximately 2009 and have been the lead developer since around 2012. This talk is a historical retrospective and a walkthrough of Bundler's advanced features, which were mostly added after version 1.0 arrived, as we discovered what users needed and figured out cool things to do with Bundler.
00:02:49.230 By the end of this talk, my hope is that you'll have a better understanding of why Bundler has needed so much development over time, exactly what we've achieved with all that development, and how to use some of the cool Bundler features for more complex tasks.
00:03:20.760 When Ruby was first released, sharing code with other developers meant copying files that ended in .rb and then using 'require,' which often implied editing the load path variable. This process involved a lot of work without any concepts about versions, except maybe by putting a comment in a file stating its version.
00:03:31.050 It was not an optimal situation. Later on, Jim Weirich, Chad Fowler (who just gave the keynote last night), and Rich Kilmer collaborated to create RubyGems. With RubyGems, you could install things with just one command, which was a significant improvement. It was so much better than downloading tarballs from people's websites!
00:03:53.580 After a few years, we noticed that while installing gems was easy, changing gems after you'd written code using them turned out to be quite difficult. This was not apparent until you found yourself in a situation where you had installed a gem quickly three years prior, only to discover that the version now installed was the wrong one, leading to confusion in your application.
00:04:20.579 Suddenly and unexpectedly, installing gems could yield different code than you'd anticipated, causing significant pain. The most popular approach for managing dependencies briefly became to take all gems, place them into your application's Git repository, and commit them. Please, don't do this—it’s a very bad idea! But at the time, it seemed like the only way to deploy your app to another machine with confidence.
00:04:39.860 There is actually a lot more detail about this history in an entire talk I gave a couple of years ago at RubyConf, titled: 'How Does Bundler Work?' It's pretty remarkable to recognize the context that Bundler emerged into and the problems it solved, especially for developers who never had to endure those earlier struggles.
00:05:13.650 Despite being used in nearly every Ruby application and script today, Bundler was developed to address a very specific developer need related to web applications with numerous dependencies. When Bundler was first prototyped, it catered to web applications built on an old Ruby web framework called Merb.
00:05:32.430 Merb introduced the innovative idea of breaking a framework into many small gems so that developers weren't tied to the entire framework. At that point, Rails was still composed of only five gems total. If you had to manage twenty gems, Bundler became absolutely essential as it streamlined the process of getting them to work together.
00:05:54.169 As time went on, the Rails team recognized the need for Bundler, which eventually transitioned to serve Rails 3 applications. With Bundler in place, a Rails app could require dozens of gems out of the box instead of just a handful, which made life much simpler for developers.
00:06:10.600 There were two significant insights that generated the necessity for Bundler as a development tool. The first was the need for an install-time dependency resolver, which was previously nonexistent in Ruby, and the second was the requirement for the resolution process to produce a lock file that could be reused when installing that application on a new developer's environment or server.
00:06:34.759 So, what is a dependency resolver? It takes a list of gems you request, asks those gems for their dependencies, continues asking for the dependencies of dependencies, and eventually generates a comprehensive list of every gem you might need. Afterward, it checks to ensure that all versions fit together correctly.
00:07:01.010 For example, if one gem depends on rack version 1.0 or newer and another on rack less than 2.2, the resolver needs to find a version that satisfies both, such as rack 1.1.1. Performing this task manually across a whole application is quite complex.
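The rack example above can be sketched with the version classes that ship with RubyGems. This is only an illustration of intersecting constraints, not Bundler's actual resolver: the list of available versions and the `pick_version` helper are made up, and the first constraint is assumed to mean "1.0 or newer".

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical list of published rack versions, for illustration only.
AVAILABLE_RACK_VERSIONS = %w[0.9.1 1.0.0 1.1.1 2.2.0 2.2.4].map { |v| Gem::Version.new(v) }

# Return the newest version that satisfies every requirement, or nil
# when the requirements cannot all be met at once.
def pick_version(available, requirements)
  reqs = requirements.map { |r| Gem::Requirement.new(r) }
  available.select { |v| reqs.all? { |req| req.satisfied_by?(v) } }.max
end

# One gem wants rack >= 1.0, another wants rack < 2.2.
puts pick_version(AVAILABLE_RACK_VERSIONS, [">= 1.0", "< 2.2"])

# Conflicting requirements leave nothing to pick.
puts pick_version(AVAILABLE_RACK_VERSIONS, ["= 1.0.0", ">= 2.0"]).inspect
```

A real resolver also has to follow each candidate's own dependencies and backtrack when a choice turns out to be a dead end, which is where most of the complexity comes from.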
00:07:17.080 Resolving dependencies at install time is crucial because it guarantees you know in advance whether your application can start up successfully. If dependencies are resolved at runtime, the application may crash unexpectedly when it encounters an incompatible gem, which is far from ideal.
00:07:35.540 Identifying these issues before your application starts is a considerable improvement. Once a compatible list of gems and versions has been determined, it is written to the lock file. That list is the bundle, which is where Bundler gets its name.
00:08:01.030 Thus, the core rationale for Bundler's existence is to install and run Ruby software in a deterministic and repeatable manner across different machines. Much of the tooling in Bundler has remained recognizable throughout its ten-year history: you specify your gems in a Gemfile, run 'bundle install' to install them, and create a lock file to ensure you use the correct versions.
00:08:20.680 In its early days, a particularly lovely feature launched was the ability to use gems directly from Git repositories, which was previously impossible. Before Bundler allowed this, forking a gem to make changes was a cumbersome process; you had to fork the repository, make changes, build a new gem, and host it somewhere for installation.
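As a rough illustration of the Git feature, a Gemfile can point a gem at a repository instead of a released version. The account name and branch below are placeholders, not a real fork.

```ruby
# Gemfile
source "https://rubygems.org"

gem "rails"

# Hypothetical fork: the account name and branch are placeholders.
gem "rack", git: "https://github.com/your-account/rack.git", branch: "my-fix"
```

With this in place, there is no need to build and host a new gem just to try out a patch.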
00:08:39.670 This was a tedious and challenging endeavor, but Bundler significantly improved this functionality. Another essential feature introduced was the 'bundle gem' command, enabling the seamless creation of gems that interact within Git repositories.
00:09:01.840 At that time, the most popular gem generation tool was Jeweler, but Jeweler created gems that were incompatible with Git repositories. The Bundler team recognized the need to address this issue and streamline the process for developers.
00:09:17.050 Today, using Bundler for gem management may seem obvious and natural, but it was intensely controversial back then. Many experienced developers resisted adopting Bundler because they were accustomed to their less-efficient workarounds for managing gem installations.
00:09:36.340 The entire Bundler team dedicated months to passionately advocating online for the tool, emphasizing its advantages. When I presented at RailsConf in 2010, the first slide posed the question: 'Why Bundler?' We really had to convince many developers that Bundler was worth using.
00:09:58.980 Fortunately, within a few years, developers rallied around Bundler, realizing its benefits. It soon became widely accepted and popular among developers. However, that acceptance came with its own set of challenges, as criticisms began to surface regarding its performance.
00:10:22.680 Once it became indisputable that Bundler was beneficial, the most prevalent complaint shifted to its speed. The initial focus during the development of Bundler 1.0 was ensuring that it worked effectively. We completely rewrote the UI and internal implementation multiple times leading up to version 1.0.
00:10:45.300 When we finally released version 1.0, we were thrilled that it worked. However, at that time, there weren’t any major applications loaded with hundreds of gems using Bundler, as it was newly launched.
00:11:09.580 So, as Bundler gained popularity, applications either evolved over time to include a multitude of gems, or developers began retrofitting Bundler onto older applications that already included many gems. Suddenly, we found ourselves in a situation where we had to manage applications containing hundreds of gems.
00:11:39.440 It's worth noting that when we released Bundler 1.0, we could never have predicted that single applications would end up containing 600 gems, and sadly, this situation is not uncommon today. Interestingly, you might assume that if large applications were slow to install, small applications would be quicker. That turns out not to be the case, either.
00:12:05.140 Even if your Gemfile contained only one gem, Bundler was still required to download a list of every single gem in existence to ensure it knew about the gem you intended to install. As you can imagine, this downloading process took considerable time.
00:12:37.670 This situation resulted in both small and large Gemfiles being slow to install: small ones due to excessive data downloads, and large ones because of the unforeseen number of gems being installed in a single bundle. The Bundler team acknowledged the issue; we realized we were downloading a substantial amount of data on every install.
00:13:02.400 Then Nick Quaranto introduced a new API to RubyGems, allowing us to request only the data needed for installations. This change significantly improved Bundler’s performance. If you're interested in the details of this historic moment, Terence Lee and I gave a talk at RailsConf 2012 titled 'Bundle Install: Why are You So Slow?'
00:13:21.830 We discussed the technical aspects of the new API and its implications. The short version is, if you had a low-latency connection to the RubyGems servers, this new API was fantastic. You could ask for just the information you needed, receiving a significant speed boost during the installation process.
00:13:49.220 However, the caveat was that for slower connections, such as those outside the U.S., this optimization didn’t yield any noticeable speed benefits. In some cases, it could even appear slower due to the increased calls to the RubyGems API.
00:14:01.960 Many developers from other regions expressed incredible frustration with how slow their installs were. I’ve spoken to developers in South Africa whose typical routine was to start a Bundler install and then brew a cup of coffee while waiting.
00:14:25.689 The installation process itself could take several minutes, and with round trip times to RubyGems bordering on a second and a half, they found it exceedingly cumbersome. In response to these challenges, the team initiated work on a new index format aimed at reducing the need for excessive API calls.
00:14:43.549 Additionally, during this period, we continued augmenting Bundler and enhancing its functionality. Among the notable additions was the 'clean' command, permitting users to remove any unused gems, particularly useful in Continuous Integration environments or when deploying on platforms like Heroku.
00:15:05.760 Before the 'bundle clean' feature was implemented, CI and Heroku had to choose from slow options, either installing all gems every time a new version was pushed—which took a considerable amount of time—or utilizing limitless space for gems, which was untenable.
00:15:25.850 By introducing 'bundle clean,' teams could eliminate unnecessary gems and maintain project efficiency. We also added an 'outdated' command, enabling developers to easily view which gems were out of date without forcing updates every time.
00:15:49.590 We added support for caching Git repositories alongside regular gems, and we implemented local Git development functionality, allowing you to check out a Git repository, modify it, and seamlessly integrate changes with your application while Bundler tracked all changes in the lock file.
00:16:07.410 We added Ruby version support, where Gemfiles could specify the Ruby version being used. This feature significantly minimized confusion, ensuring developers wouldn't accidentally run applications on incompatible Ruby versions, avoiding frustrating bugs.
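A minimal sketch of the Ruby version directive in a Gemfile; the version number here is only an example.

```ruby
# Gemfile
source "https://rubygems.org"

# Bundler raises an error if the running Ruby does not match this version.
ruby "2.4.2"

gem "rack"
```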
00:16:25.860 Although this feature did eventually introduce some complications that will need addressing later, it was beneficial overall. Now, let's transition to the next segment of history and discover the ways we mistakenly caused ourselves challenges from some of the changes I just detailed.
00:16:39.480 The most significant development from 2012 to 2014 was the dramatic rise in Bundler's adoption. When Bundler 1.0 launched in 2010, there were approximately 8,000 downloads per day. When 1.1 was released a couple of years later, it skyrocketed to around 20,000 downloads daily. By August of 2012, it averaged around 30,000 downloads.
00:17:09.130 Unfortunately, that surge inadvertently resulted in us launching a distributed denial-of-service attack against RubyGems.org. To put it simply, the servers were overwhelmed by the volumes of Bundler users trying to install gems. Consequently, RubyGems.org suffered from server failures, preventing users from installing any gems, which left everyone disheartened.
00:17:29.750 As a temporary solution to restore access to gems, we had to disable the API that Bundler relied on to accelerate installations. Unfortunately, this reverted installs back to being noticeably slow, placing us right back where we started two years earlier.
00:17:46.950 At this juncture, a team of us, including myself, Terence Lee, and others, worked together to create a new standalone application for serving the API independently of RubyGems. We collaborated with the RubyGems team and received tremendous support from individuals like Evan Phoenix and David Radcliffe, which made it possible to maintain the old URLs alongside this new, scalable application.
00:18:09.880 While this new API was designed to help Bundler users, it introduced a completely new set of challenges. One of the main ones was that we ended up having a separate database and therefore needed to find out about any new gems pushed to RubyGems before they could be installed in a bundle.
00:18:37.100 We attempted several synchronization strategies: webhooks, database replication, scraping the API, and importing database dumps; all of which encountered various challenges. Young and naive at the time, we thought it would be easy to implement these connections—but they proved significantly more complicated than we'd anticipated.
00:18:58.180 A recurring issue was the propagation delay between pushing a gem and being able to install it. This lag was due to the need to wait for RubyGems to finish processing before sending that information to the Bundler API, which then required additional processing time before installation was possible.
00:19:25.320 Under certain conditions, particularly geographic disparities, updates would fail to propagate, causing installation errors. For example, Canadian developers faced unusual downtime on gem installations every fourth Wednesday, rendering them unable to install new gems for a few hours due to inconsistencies with CDN refresh rates.
00:19:44.580 In addition to these complications, the standalone API, built on Sinatra and Sequel, limited contributor availability. Because fewer people were familiar with these technologies compared to RubyGems, built on Rails with ActiveRecord, we attracted fewer contributors to the Bundler API project.
00:20:06.790 There have been entire talks surrounding the complexity of the overloaded RubyGems.org, the process of building a new API, and the optimization of that API to accommodate the growing number of Bundler users worldwide. In 2013, I gave a talk entitled 'Deathmatch: Bundler vs RubyGems' that delved deeper into these experiences.
00:20:32.469 Meanwhile, we continued to roll out exciting features in Bundler, including multi-threaded installations, allowing bundles to install without waiting for previous downloads to complete. This enhancement presented substantial improvements for applications with extensive dependencies.
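The idea behind parallel installation can be sketched as a small worker pool. This is not Bundler's installer: `install_gem` is a stand-in that only sleeps, and the sketch ignores the ordering constraints a real installer must respect when one gem depends on another.

```ruby
# Stand-in for the real work; a real installer would download and unpack.
def install_gem(name)
  sleep 0.01 # simulate network and disk time
  "#{name} installed"
end

# Each worker thread pulls gem names off a shared queue until it is empty.
def parallel_install(gem_names, worker_count: 4)
  queue = Queue.new
  gem_names.each { |name| queue << name }
  queue.close # pop returns nil once the closed queue is drained

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (name = queue.pop)
        results << install_gem(name)
      end
    end
  end
  workers.each(&:join)

  # Completion order is not deterministic, so sort for a stable result.
  Array.new(results.size) { results.pop }.sort
end

puts parallel_install(%w[rack rake thor json])
```

Because most of the time is spent waiting on the network, threads help here even on Ruby implementations with a global interpreter lock.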
00:20:51.890 We also transitioned to a non-recursive resolver, proposed by Smit Shah, who made significant contributions to Bundler's resolver. This transition removed the inherent issues that arose during installations on JRuby, where recursion would often lead to memory crashes.
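To show what "non-recursive" means in practice, here is a small sketch that walks a dependency graph with an explicit stack instead of recursive method calls. The graph is invented for illustration, and this is only a traversal; the real resolver also chooses versions and backtracks.

```ruby
# Hypothetical dependency graph: gem name => names it depends on.
DEPENDENCIES = {
  "rails"         => %w[actionpack activesupport],
  "actionpack"    => %w[rack activesupport],
  "activesupport" => %w[i18n],
  "rack"          => [],
  "i18n"          => []
}.freeze

# Collect every gem reachable from `roots` using an explicit stack,
# so the depth of the graph does not grow the Ruby call stack.
def all_dependencies(roots, graph)
  seen = []
  stack = roots.dup
  until stack.empty?
    name = stack.pop
    next if seen.include?(name)

    seen << name
    stack.concat(graph.fetch(name, []))
  end
  seen.sort
end

puts all_dependencies(["rails"], DEPENDENCIES)
```

The pending work lives in an ordinary array, so a deep graph costs heap memory rather than stack frames.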
00:21:09.570 This change drastically improved the stability of Bundler on JRuby, allowing installations to complete without the crashes that deep recursion had caused. Furthermore, we introduced HTTPS support, corresponding with GitHub’s support for secure connections.
00:21:29.040 This period also brought the first CVE security issue for Bundler. The summarized recommendation is to avoid putting 'source' into your Gemfile multiple times unless each extra source is tied directly to specific gems. With multiple top-level sources, Bundler could install a gem from a source you did not intend, so anyone could push a gem with the same name as one of your private gems to RubyGems.org and have it installed instead.
00:21:47.110 To mitigate this, use the block form of 'source', which lists explicitly the gems that come from that source. That is a simplification; more detailed explanations are available for anyone using multiple sources.
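A minimal sketch of the block form, assuming a private gem server; the server URL and the gem name inside the block are hypothetical.

```ruby
# Gemfile
source "https://rubygems.org"

gem "rails"

# Hypothetical private server; only gems listed in the block come from it.
source "https://gems.example.com" do
  gem "internal-billing"
end
```

Because each gem is tied to exactly one source, a same-named gem pushed to another server is never considered.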
00:22:09.790 At this point, managing the separate API became increasingly tedious. While the API was an optimization tool allowing for faster gem installation, we faced complaints about slow installations anytime it was down, leading to escalating panic each time it faced downtime.
00:22:27.940 As a result, many contributors, who were previously enthusiastic about working on Bundler and RubyGems, began to lose motivation. However, serendipitously, the Ruby community started implementing funding models for open-source development around this period.
00:22:47.370 Ruby Central, the organization that organizes this conference, regularly offers grants for Ruby open-source projects using leftover funds from their events. This financial support helped me and others on the Bundler and RubyGems teams maintain ongoing development.
00:23:09.110 Around the same time, Stripe launched a program for granting funds toward open source projects, which boosted projects like CocoaPods—an Objective-C equivalent of Bundler. They proposed developing a dependency resolver that would not only support CocoaPods but also benefit Bundler and RubyGems.
00:23:31.520 Today, we have a well-documented dependency resolver library shared among multiple projects, representing a significant improvement over Bundler's early resolver. It's also worth noting that around this period, Stripe and Engine Yard began funding the Bundler project directly.
00:23:53.480 These funds allowed us to formally incorporate Ruby Together, which now receives contributions from the community to support development efforts. Today, we use this funding model to pay for development related to Bundler, RubyGems, the RubyGems.org Rails application, and various other community initiatives.
00:24:18.790 We're even supporting Christoph, the developer of the Ruby Toolbox, in his work on the Ruby Toolbox 2.0 project. While Ruby Central provides operational grants and server expenses, they do not fund ongoing maintenance, making it critical to maintain our community's support of Ruby Together.
00:24:37.510 We hope Ruby Together can expand to fund more Ruby open-source development efforts, but we need community partners to join and assist in that goal. So, if your company values contributing to open-source projects, consider becoming a member of Ruby Together.
00:25:02.380 Receiving regular support meant that we could begin tackling longstanding issues that had remained stagnant for years. One of our major accomplishments included shifting every request to RubyGems.org behind the Fastly CDN, which significantly enhanced performance.
00:25:23.050 Previously, making a request to RubyGems would necessitate speaking with Fastly, which had to validate whether the gem was cached before reaching out to RubyGems when it wasn’t. This entire process was cumbersome and inefficient, but moving the Bundler API back into RubyGems.org improved responses.
00:25:43.180 Subsequently, the new RubyGems could support significantly increased traffic from the API without delays or interruptions, which was exciting. With these enhancements, we completed the compact index, a feature that allowed Bundler to tell users about gems without needing to conduct the previous recursive API calls.
00:25:59.590 This improvement meant users could locally store copies of the gem list rather than retrieve it every time, drastically reducing unnecessary bandwidth usage. For further insights into that process, you might check out my talk from RubyConf 2013 titled 'Extreme Makeover: RubyGems Edition.'
00:26:17.350 In this particular period, the speed of installations drastically increased. Other notable improvements included an optional new name for the Gemfile, gems.rb, granting more flexibility in how developers name their dependency files.
00:26:40.120 We likewise allowed developers to opt out of the .lock file name by using gems.locked instead. This ability was born out of feedback received immediately after the release of Bundler 1.0, which revealed that some developers disliked a Ruby file without an .rb extension and had reservations about the .lock extension.
00:27:00.260 We acknowledged that even while these options were introduced, they remained entirely optional—neither Gemfile nor Gemfile.lock were being removed. We also worked on Ruby version locking, which allowed specification of the exact Ruby version required for an application.
00:27:21.410 This locking feature was essential to avoid discrepancies regarding patches and allowed developers to modify version ranges efficiently. We also introduced a new command, bundle lock, allowing the addition of platforms, ensuring applications worked correctly across environments such as UNIX and JRuby.
00:27:42.220 The command enhances compatibility when deploying applications, accommodating various platforms without frequently changing the lock file. Additionally, we implemented a 'doctor' command to assist developers in diagnosing issues with compatibility in their repositories.
00:28:06.270 We also added a 'pristine' command, similar to 'gem pristine,' allowing users to revert local modifications to installed gems in the bundle. Updates to the 'bundle update' command added options to limit updates to patch, minor, or major versions.
00:28:26.990 New features also included mirror configurations, enabling the use of local gem servers, and checksum validation, verifying the integrity of each gem upon installation. The funding we received started to accelerate our development velocity, leading to the introduction of a plugin system to expand Bundler's capabilities.
00:28:51.510 This system allows developers to build command plugins for unique functionality, such as generating ctags for gems installed through Bundler, or source plugins for installing gems from Mercurial repositories, which was previously not an option.
00:29:12.920 As we approach the end of this segment on Bundler’s growth, we recently shipped Bundler 1.16 and are currently working toward 2.0, targeted for release around Christmas. While time is limited today, I invite you to watch Colby Swandale's earlier talk at this RubyConf for an in-depth overview of what's coming in Bundler 2.
00:29:33.020 In a nutshell, Bundler 2 aims to make breaking changes while keeping compatibility and stability a priority. We're committed to ensuring that applications using Bundler 1 and applications using Bundler 2 can both function seamlessly.
00:29:56.640 In the meantime, a few cool options get you close to 2.0, achievable with configuration settings available today. By enabling 'only update to newer versions,' the `bundle update` command won't revert to earlier versions of gems.
00:30:17.880 Setting the 'disable multi-source' option will prevent insecure Gemfile configurations, providing developers with immediate warnings regarding potential vulnerabilities. The 'specific platform' feature tells Bundler to respect the OS and CPU architecture while resolving dependencies.
00:30:39.100 Adding the global gem cache function means Bundler will only download gem files once, share them across applications on your machine, and avoid unnecessary duplication of operations or compiling native extensions. This embodies a significant optimization.
00:31:01.370 These options will become standard features in Bundler 2. Additionally, we once shipped Bundler with a GitHub option using HTTP connections, which could raise concerns about security. Future releases will default to HTTPS for every resource.
00:31:17.160 Having caught up on the journey of Bundler, I'm going to share some final practical tips before we run out of time. Instead of ‘bundle exec’, the command 'bundle binstubs [something]' creates a new executable that you can commit to your repository, tying each execution directly to the correct gem version.
00:31:33.350 This simplifies your workflow significantly, especially if you're working with cron jobs. Running `bundle viz` renders your dependencies as a graph image, although the result can be hard to read.
00:31:54.450 If you aim to switch platforms, for instance running your application on JRuby or Windows, you can use 'bundle lock --add-platform' to maintain a stable Gemfile.lock across the two. The local development feature lets you modify gems locally while synchronizing those changes back into your application automatically.
00:32:15.960 Additionally, you can write one-file scripts that require 'bundler/inline' and then declare their gems in a gemfile block, and Bundler will handle the installations accordingly. This is particularly handy for ad-hoc scripting.
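A minimal sketch of such a one-file script, using Bundler's inline mode. The choice of rack is arbitrary, and the script needs network access the first time it runs so that Bundler can install the gem.

```ruby
# Single-file script; needs network access the first time it runs.
require "bundler/inline"

# Passing true asks Bundler to install any gems that are missing.
gemfile(true) do
  source "https://rubygems.org"
  gem "rack" # example gem, chosen only for illustration
end

require "rack"
puts Rack.release
```

The whole thing lives in one file, so there is no separate Gemfile or lock file to keep next to the script.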
00:32:37.040 You can locate gem paths using the 'bundle show --paths' option and directly open gem files for debugging. Once you finish your modifications, you can run `bundle pristine` to revert any local changes.
00:32:53.710 Lastly, a small quality-of-life improvement: if you wish to silence the post-install messages, you can disable them so you won't be annoyed with reminders from HTTParty and whatnot ever again.