RubyConf 2023

Demystifying the Ruby package ecosystem

by Jenny Shen

Jenny Shen, a senior developer at Shopify and a maintainer of rubygems.org, presents a detailed exploration of the Ruby package ecosystem at RubyConf 2023 in San Diego. The talk aims to clarify the processes involved in installing and managing Ruby gems, allowing developers to focus on building applications rather than getting bogged down by dependency issues.

Key Points Covered:

  • Introduction to RubyGems and Bundler:

    • RubyGems is the package manager for Ruby, while Bundler helps standardize gem usage across different machines and environments.
    • RubyGems.org hosts over 190,000 gems and is critical for dependency management in Ruby projects.
  • How 'gem install' Works:

    • When a command like 'gem install rails' is executed, it goes through various stages: initial parsing, dependency resolution, and fetching appropriate gem versions.
    • The dependency graph is built using the Molinillo resolver, which ensures compatibility of gem versions with one another.
  • The Bundler Experience:

    • Bundler reads the Gemfile to establish dependencies and their sources, resolving them through a more efficient algorithm called PubGrub.
    • Users are encouraged to utilize commands like 'bundle exec' to ensure they are using the correct gem versions for a project.
  • Integration with Rails:

    • Rails leverages Bundler’s capabilities to manage gem dependencies effectively with features like binstubs and environment-specific gem handling.
    • Commands such as 'bundle show' and 'bundle open' help in debugging and modifying gems efficiently.
  • Security Considerations:

    • The speaker outlines potential pitfalls, such as installing the wrong gems due to typographical errors, which can introduce vulnerabilities into applications.
    • Important security practices include using multi-factor authentication (MFA) for gem accounts and being cautious about gem choices to avoid malicious code.
  • Conclusion:

    • Jenny emphasizes that while utilizing gems simplifies software development, careful selection and understanding of gem management tools like RubyGems and Bundler can prevent significant issues. She encourages developers to ask critical questions regarding gem reputations and security to maintain a safer coding environment.

The talk concludes with an invitation to connect and further discuss the intricacies of the Ruby package ecosystem, reinforcing the importance of understanding these tools in modern Ruby development.

00:00:18.039 Our next speaker in this room is Jenny Shen, who will be demystifying the Ruby package ecosystem for us. Jenny is a senior developer at Shopify and is based in Toronto, Canada. As a maintainer of rubygems.org, she works to help secure Ruby's dependency ecosystem and is passionate about open source.
00:00:30.679 Jenny is also one of the most energetic, enthusiastic, and intelligent people I know. It is my pleasure to invite her onto the stage.
00:00:57.719 Thanks for those amazing words! Before this talk, I was asked if the intro was good, and I was like, ‘Oh yeah, it’s good, but feel free to emphasize how great I am.’ But the introduction was very sincere, so thank you! Hello, San Diego! Welcome, welcome! I hope everyone’s having a great time at RubyConf so far. I am certainly having a blast here in this wonderful summer weather, which feels like an absolute treat compared to the dreary Canadian weather.
00:01:22.960 If you spoke to me during this conference, I have probably mentioned the weather to you, and I sound like a broken record at this point, but I can't help it! The Indeed editorial team says that a simple question, like discussing the weather we're having, can be a starting point for friendship and connection.
00:01:48.840 Funny enough, right after I included this in my talk, I met someone from Indeed, so the stars aligned! Furthermore, in social situations, it ranks as number four among the top eight small talk topics. It’s a safe topic to leverage, and as a bonus, if it goes well, it can transition into more interesting topics. So, what can be more interesting than talking about the weather? In case you haven't guessed, that topic is the title of my talk: Demystifying the Ruby Package Ecosystem.
00:02:10.640 Alright, let's get back to business. Hopefully, you're excited to learn a bit more about the foundation of what Ruby projects are made from—gems! And if you're not excited, I encourage you to go chat with someone about the weather. To get us started, raise your hand if you have ever run 'bundle install.'
00:02:25.640 It seems like a lot of you have! Then you’ve probably seen a long list of gems being fetched and installed on your machine. If you're running 'bundle install' in your Ruby project, you can boot it up, run the tests, and if it works all fine, yay! You can jump right into coding that new feature that will amaze your users.
00:02:43.920 But do you really know what’s going on when you run 'bundle install'? How can you use gems so easily in your Ruby application? These are questions I was personally wondering about and wanted to get to the bottom of. It’s a great opportunity for some talk-driven development, so hopefully, by the end of this talk, you'll also have a better understanding of the ins and outs of Ruby dependencies and how they work in our Ruby applications.
00:03:07.600 I'm Jenny, and I work at Shopify on the Ruby on Rails infrastructure team. I have been primarily focused on working at rubygems.org for the last couple of years to help add security features and policies to make our dependency ecosystem more secure, which I will touch upon later in this talk. Rubygems.org is the community gem hosting service that hosts over 190,000 gems. It is indeed a Rails application where you can view the available gems and their information, and it provides an API to manage gems.
00:03:37.120 The tool for managing these gems is RubyGems without the ‘.org.’ It is a package manager bundled with Ruby and used to manage gems using the 'gem' command. Bundler is also a gem that gets mentioned a lot with RubyGems. Its main purpose is to help resolve and standardize the gems used in a Ruby project across all machines and environments so that they all work together well.
00:03:55.000 Fun fact: Bundler and RubyGems live in the same GitHub repository but are released separately. Today, we will go through how gem installation works under the hood. We'll first look at how 'gem install' operates when installing a single gem, and then we will go through how Bundler tackles installing the correct dependencies for your Ruby application.
00:04:10.680 After that, we’ll see how dependencies work seamlessly in a type of Ruby project, specifically Rails. We will cover dependency groups, binstubs, and debugging a gem in a Rails application. During the last few minutes, I will also share some potential issues that could arise when installing gems.
00:04:30.199 If anyone wants a copy of the slides or some resources, feel free to scan this QR code. I will also have this up at the end of my talk, so if you miss it now, it's totally fine.
00:04:48.479 So, how does 'gem install' work? What happens when you run 'gem install rails'? When you run it, it shows that the most recent version of Rails and its dependencies are being installed. But what’s actually happening when the command is run? It accepts a gem name, like Rails, and its version requirements. This could be an exact version or a version range; by default, the requirement is only a lower bound that any version satisfies.
00:05:06.480 Each command that RubyGems supports has a corresponding file in the commands directory with an 'execute' method. The name and the version that gets passed into the command will go through the 'install' method. From there, the version will be parsed into a requirement object. This is where, if you specify an invalid requirement format, RubyGems will throw an error.
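The parsing step described above can be sketched with RubyGems' public Gem::Requirement and Gem::Version classes. This is a minimal illustration, not the install command's actual code path, and the requirement strings are chosen only for the example.

```ruby
require "rubygems"

# A pessimistic requirement: at least 7.0, but below 8.0.
requirement = Gem::Requirement.create("~> 7.0")
in_range = requirement.satisfied_by?(Gem::Version.new("7.1.2"))
out_of_range = requirement.satisfied_by?(Gem::Version.new("8.0.0"))

# With no version given, RubyGems falls back to a requirement any version meets.
default_requirement = Gem::Requirement.default.to_s

# A malformed requirement string is rejected while parsing.
parse_error = begin
  Gem::Requirement.create("not a version")
  nil
rescue Gem::Requirement::BadRequirementError => e
  e.class.name
end

puts in_range, out_of_range, default_requirement, parse_error
```

Rejecting the string at parse time means a typo in the version argument fails before any network request is made.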
00:05:31.440 It also initializes a dependency installer, which is responsible for installing a gem along with its dependencies. Calling 'resolve_dependencies' on the dependency installer with the name and version returns something called a request set. A request set represents a list of gem information, or activation requests, needed to determine how to download and install the gem.
00:05:48.360 When resolving dependencies, it turns the gem name and the version into an actual dependency object. Then a request set is initialized, and the dependency is passed into it to be resolved when 'resolve' is called. Within the 'resolve' method, it determines which versions of the dependencies should be installed.
00:06:05.040 This will return a request set with activation requests, which again represent the gems that need to be downloaded and resolved. RubyGems currently uses the Molinillo resolver to create a dependency graph that determines which versions of each dependency will work with other gems. In summary, given a gem, it fetches all of the possibilities available for installation based on the requirement.
00:06:30.079 It will choose the best possibility, which is the most recent version, and add it to the current state of the dependency graph. Then, it will find the possibilities for its dependencies. If there comes a time where no possibilities are left, it will rewind to a state where the conflict can be resolved or avoided, and select the next best version.
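The choose, descend, and rewind loop described above can be sketched as a small recursive backtracking resolver. This is a toy under stated assumptions, not Molinillo itself: the INDEX hash, the gem names, and the versions are all hypothetical, and the real resolver tracks far more state.

```ruby
require "rubygems"

# Hypothetical index: gem name => { version => { dependency name => requirement } }
INDEX = {
  "app"   => { "1.0.0" => { "cool" => ">= 1.0", "beans" => "< 2.0" } },
  "cool"  => { "1.2.0" => { "beans" => ">= 2.0" }, "1.1.0" => { "beans" => ">= 1.0" } },
  "beans" => { "2.0.1" => {}, "1.5.0" => {} },
}.freeze

# pending: list of [name, requirement] pairs still to satisfy.
# chosen:  versions selected so far. Returns a Hash, or nil on conflict.
def resolve(pending, chosen = {})
  return chosen if pending.empty?

  (name, requirement), *rest = pending
  req = Gem::Requirement.create(requirement)

  # Already selected: the earlier choice must also satisfy this requirement.
  if chosen.key?(name)
    return nil unless req.satisfied_by?(Gem::Version.new(chosen[name]))
    return resolve(rest, chosen)
  end

  # Try the possibilities from newest to oldest.
  candidates = INDEX.fetch(name).keys
                    .select { |v| req.satisfied_by?(Gem::Version.new(v)) }
                    .sort_by { |v| Gem::Version.new(v) }
                    .reverse

  candidates.each do |version|
    result = resolve(rest + INDEX[name][version].to_a, chosen.merge(name => version))
    return result if result # otherwise rewind and try the next best version
  end
  nil
end

solution = resolve([["app", ">= 0"]])
puts solution.inspect
```

In this made-up index, the newest 'cool' needs a 'beans' version that 'app' forbids, so the sketch rewinds and settles on the older 'cool'.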
00:06:55.040 To find version information, the fetcher retrieves the specs for Rails from the index, a separate instance of RubyGems.org that serves this information. It will parse each line, with the version and platform at the front, the dependencies of the gem in the middle, and the requirements at the end.
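A line of that shape can be parsed with a few string splits. The sketch below assumes a space after the version, a '|' between the dependencies and the requirements, and comma-separated 'name:value' pairs; the sample line, its checksum value, and the parse_info_line helper are hypothetical, not RubyGems' own parser.

```ruby
InfoLine = Struct.new(:version, :platform, :dependencies, :requirements, keyword_init: true)

# Splits "name:value&value,name:value" into { name => [values] }.
def parse_pairs(text)
  text.to_s.split(",").to_h do |pair|
    name, value = pair.split(":", 2)
    [name, value.to_s.split("&")]
  end
end

def parse_info_line(line)
  version_part, rest = line.strip.split(" ", 2)
  deps_part, reqs_part = rest.to_s.split("|", 2)
  version, platform = version_part.split("-", 2)

  InfoLine.new(
    version: version,
    platform: platform || "ruby", # no suffix means the plain Ruby platform
    dependencies: parse_pairs(deps_part),
    requirements: parse_pairs(reqs_part)
  )
end

# Hypothetical sample line with a placeholder checksum.
sample = "7.1.2 actionpack:= 7.1.2,activesupport:= 7.1.2|checksum:abc123,ruby:>= 2.7.0"
parsed = parse_info_line(sample)
puts parsed.version, parsed.dependencies.keys.inspect, parsed.requirements["ruby"].inspect
```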
00:07:12.920 We're back at the top level of the install command. We now have a request set returned from 'resolve_dependencies', and it’s time to actually download the gem by calling 'install' on the request set. In 'install', all the gems that aren't already cached on the machine are downloaded concurrently from the RubyGems.org S3 bucket, where each gem is stored as a .gem file.
00:07:27.680 When a gem maintainer wants to publish a new version of a gem, they would run 'rake release' with the gemspec if they have the Bundler gem tasks included. 'rake release' does two things: 'gem build' and 'gem push.' 'gem build' takes the gemspec of the gem that's going to be published and creates a tarball file with the .gem extension. 'gem push' takes the .gem file and posts it to the RubyGems.org gem creation endpoint.
00:07:54.560 If everything checks out and the user has the correct permissions, the file will be written to the S3 bucket. If we download a gem from RubyGems.org, we get that .gem file. We can untar it and get a folder with more compressed files. The checksums file provides hash values for the other files to signal that they have not been tampered with or corrupted, and there is also a compressed file of the metadata.
00:08:13.400 The data contains the actual gem contents, including the executables and libraries. You can run 'gem unpack' with the gem name to easily view the contents of a specific gem. For example, with ActiveSupport, once you run 'unpack', it will create a folder in the directory with the contents of the gem.
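To see that layout without downloading anything, the sketch below builds a throwaway gem in a temporary directory and lists the entries of the resulting tarball with RubyGems' own Gem::Package classes. The 'hello_sketch' gem and its contents are hypothetical, and validation is skipped only to keep the example short.

```ruby
require "rubygems/package"
require "fileutils"
require "tmpdir"

entry_names = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    FileUtils.mkdir_p("lib")
    File.write("lib/hello_sketch.rb", "module HelloSketch\nend\n")

    spec = Gem::Specification.new do |s|
      s.name = "hello_sketch"
      s.version = "0.1.0"
      s.summary = "Hypothetical gem used only for this sketch"
      s.authors = ["Example Author"]
      s.files = ["lib/hello_sketch.rb"]
    end

    # Second argument skips validation; a real release should not.
    gem_file = Gem::Package.build(spec, true)

    # A .gem file is a plain tar archive, so its entries can be listed directly.
    File.open(gem_file, "rb") do |io|
      Gem::Package::TarReader.new(io).map(&:full_name)
    end
  end
end

puts entry_names.sort
```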
00:08:32.400 When it receives the binary from the S3 bucket, it will untar the file and store the data on your machine under the gems folder of the specific Ruby version. It will also store the gem file in the cache in case you need to reinstall it at any time, along with the gem specifications in the specifications folder. It also installs the executable specified inside the bin directory, so you can easily run executables like 'rails new' and those kinds of commands.
00:08:48.560 To use the gem in your Ruby project, you can require the library using 'require.' This will add the gem path to the load path variable in Ruby so Ruby can use its code. For example, if we pull up an IRB session, we can't call ActiveSupport’s method 'blank' without first requiring it.
00:09:10.080 After we require it, we can see that ActiveSupport's gem path is now included in the load path variable. That’s a brief overview of how ‘gem install’ works. It's much more complex than that, but that captures the essence.
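The role of the load path can be sketched without installing anything. The example below writes a stand-in library to a temporary directory and adds that directory to $LOAD_PATH by hand, which is roughly the step RubyGems performs when a gem is activated. The 'hypothetical_sketch_gem' name is made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# A stand-in for an installed gem's lib directory.
lib_dir = File.join(Dir.mktmpdir, "lib")
FileUtils.mkdir_p(lib_dir)
File.write(
  File.join(lib_dir, "hypothetical_sketch_gem.rb"),
  "module HypotheticalSketchGem\n  VERSION = \"0.1.0\"\nend\n"
)

# Before the directory is on the load path, require cannot find the file.
found_before = begin
  require "hypothetical_sketch_gem"
  true
rescue LoadError
  false
end

# Once the directory is on the load path, the same require succeeds.
$LOAD_PATH.unshift(lib_dir)
require "hypothetical_sketch_gem"

puts found_before
puts $LOAD_PATH.include?(lib_dir)
puts HypotheticalSketchGem::VERSION
```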
00:09:27.440 Now that we've gone through how to install gems using the gem command, let’s explore how 'bundle install' works. Bundler ensures that the gems in a Ruby application remain consistent among all machines and does that through defining a Gemfile in a project.
00:09:48.000 When 'bundle install' is run, Bundler will build a definition object that represents the information in the application's Gemfile and Gemfile.lock file. It does this by reading the Gemfile and evaluating it just like regular Ruby code.
00:10:02.120 The Gemfile is a Ruby DSL, a programming construct used specifically for defining which gems to install. Evaluating it will create a DSL instance in the DSL class, which will call 'eval_gemfile'—resulting in a call to 'instance_eval' on the Gemfile contents.
00:10:25.360 So, a common line you'll see in a Gemfile is 'gem GemName,' followed by version requirements. This actually calls the gem method in the DSL when you run 'instance_eval.' It will take the name, optional version, and options as hash arguments and create dependency objects with them.
00:10:39.440 These dependency objects are then added to the dependencies list. Another method that is defined is the 'source' method, usually set to RubyGems.org at the top of the file. The DSL adds a string that represents the source as the global RubyGems source and will throw an error if you define more than one global source.
00:10:58.639 You can also have a source block if you want your gems to be installed from a different source. Within the context of this block, the source defined there overrides the global source. After the DSL object is built with the dependencies, sources, and all the other data, the 'to_definition' method is called, which will accept all these values and initialize a definition object.
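The 'instance_eval' approach described above can be sketched with a much smaller DSL class. MiniDsl, its Dependency struct, and the gems.example.com source are all hypothetical; Bundler's real DSL class handles many more options and edge cases.

```ruby
class MiniDsl
  Dependency = Struct.new(:name, :requirements, :scoped_source)

  attr_reader :dependencies, :global_source

  def self.evaluate(contents)
    dsl = new
    dsl.instance_eval(contents) # each Gemfile line becomes a method call on dsl
    dsl
  end

  def initialize
    @dependencies = []
    @global_source = nil
    @current_source = nil
  end

  # Without a block: sets the global source. With a block: scopes a source.
  def source(url)
    if block_given?
      previous = @current_source
      @current_source = url
      begin
        yield
      ensure
        @current_source = previous
      end
    else
      raise ArgumentError, "more than one global source" if @global_source
      @global_source = url
    end
  end

  def gem(name, *requirements, **_options)
    requirements = [">= 0"] if requirements.empty?
    @dependencies << Dependency.new(name, requirements, @current_source)
  end

  def source_for(dependency)
    dependency.scoped_source || @global_source
  end
end

gemfile = <<~GEMFILE
  source "https://rubygems.org"

  gem "rails", "~> 7.1"
  gem "puma"

  source "https://gems.example.com" do
    gem "internal_tools", ">= 1.0"
  end
GEMFILE

dsl = MiniDsl.evaluate(gemfile)
dsl.dependencies.each do |dep|
  puts "#{dep.name} #{dep.requirements.join(', ')} from #{dsl.source_for(dep)}"
end
```

Because the Gemfile is evaluated as ordinary Ruby, anything Ruby allows (conditionals, environment lookups) also works inside it.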
00:11:16.639 Now we are on part two, which is actually installing the definition. The resolution is handled by the PubGrub dependency resolver, which you may have heard about in Samuel’s talk earlier today. This algorithm was created by Natalie Weizenbaum originally for the Dart programming language and has been ported to Ruby by John Hawthorn.
00:11:39.760 PubGrub is faster than the previous resolver because it introduces something known as conflict-driven clause learning. Before, when there was a version conflict, the Molinillo resolver would backtrack and go up the path to find a dependency that wouldn't introduce a conflict. However, it wasn't great at remembering previous conflicts, so it could go down the same failing path multiple times during dependency resolution.
00:11:56.760 PubGrub introduces a concept called terms. A term is basically a version range of a gem that either works or isn't allowed. This can be used to determine incompatibilities. For example, if you need to install the gem 'cool' greater than version 1.10 and 'beans' at version 2.0.1, the initial run-through will determine that you can't install 'cool' below version 1.1 while having 'beans' greater than 2.0.1.
00:12:12.760 As resolving continues, incompatibilities are tracked so that versions known not to work will be avoided. For instance, if we find that the version of the 'cool' gem leads to requiring ‘beans’ above version 2.0, we can deduce that installing 'cool' above 1.2.0 would require a version of ‘beans’ that conflicts with our constraints.
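The idea of remembering conflicts can be sketched as a list of learned incompatibilities that every candidate selection is checked against. This is a heavy simplification of PubGrub (no negative terms, no unit propagation), and the Term and Incompatibility structs, the version numbers, and the allowed? helper are hypothetical.

```ruby
require "rubygems"

# A term: a gem name plus the version range it refers to.
Term = Struct.new(:name, :requirement) do
  def satisfied_by?(selection)
    version = selection[name]
    return false unless version # undecided gems cannot trigger a conflict yet

    Gem::Requirement.create(requirement).satisfied_by?(Gem::Version.new(version))
  end
end

# An incompatibility: a set of terms that must never all hold at once.
Incompatibility = Struct.new(:terms, :cause) do
  def violated_by?(selection)
    terms.all? { |term| term.satisfied_by?(selection) }
  end
end

# Learned once during resolution, then reused instead of being rediscovered.
LEARNED = [
  Incompatibility.new(
    [Term.new("cool", ">= 1.2.0"), Term.new("beans", "< 2.0")],
    "cool >= 1.2.0 depends on beans >= 2.0"
  ),
].freeze

def allowed?(selection, incompatibilities = LEARNED)
  incompatibilities.none? { |incompatibility| incompatibility.violated_by?(selection) }
end

puts allowed?({ "cool" => "1.2.0", "beans" => "1.5.0" })
puts allowed?({ "cool" => "1.1.0", "beans" => "1.5.0" })
```

Keeping the cause alongside each incompatibility is also what makes the more readable conflict messages possible.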
00:12:29.760 The compact index is used to retrieve version information. The versions endpoint will return the available versions for all gems, while the info endpoint will provide more in-depth information about each gem. These files are cached on your machine and updated if they’re outdated by checking the version of the files.
00:12:51.080 By tracking conflicts, we can give better error messages to help users know how to resolve them instead of simply providing a backtrace message. Once all of that is figured out, Bundler will download and install all of the resolved gems and generate a new lock file with specific versions of the gems and their dependencies.
00:13:11.920 This includes the originating source, the platforms it supports, and the Ruby and Bundler requirements. This allows all machines to install the same resolved gems if the Gemfile hasn't changed since the initial generation of the file. To use these resolved dependencies in your Ruby project, you can require the Bundler setup file; without it, the gems you require will be based on what’s installed on your machine, which can differ across machines and environments.
00:13:32.639 Essentially, it turns on the Bundler runtime, meaning it will utilize the definition created during installation and load all of the correct gem paths into the load path variable.
00:13:51.840 Now we know a bit about how bundle install works. How does it work in the context of a Rails application? There are nifty features in Rails that manage these dependencies smoothly. The first thing that comes to mind is how all of the requires work in a Rails application. In 'gem install,' you have to require a gem to add its path into the load paths, and in `application.rb`, the Bundler require method, which is similar to the setup method, will require all of the gems in the application's Gemfile based on their groups.
00:14:10.799 In the Gemfile, the default group is always included, and the test, staging, and production gems are included depending on the Rails environment variable value. Rails also has what are called binstubs. Binstubs are executables that help set up your environment to run the correct version of the gem executable.
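The group selection described above can be sketched in plain Ruby. A generated Rails application calls Bundler.require(*Rails.groups) in application.rb; the example below only imitates the filtering idea, and the Dependency struct, the gem list, and the gems_to_require helper are hypothetical.

```ruby
Dependency = Struct.new(:name, :groups)

# Hypothetical Gemfile contents, already evaluated into dependency objects.
DEPENDENCIES = [
  Dependency.new("rails", [:default]),
  Dependency.new("puma", [:default]),
  Dependency.new("debug", [:development, :test]),
  Dependency.new("capybara", [:test]),
].freeze

# The default group is always active, plus the group named after the environment.
def gems_to_require(dependencies, rails_env)
  active_groups = [:default, rails_env.to_sym]
  dependencies.select { |dep| (dep.groups & active_groups).any? }.map(&:name)
end

puts gems_to_require(DEPENDENCIES, "production").inspect
puts gems_to_require(DEPENDENCIES, "test").inspect
```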
00:14:33.600 To run a version of the gem specified in the Gemfile, you would need to run 'bundle exec.' Without 'bundle exec,' just running 'rails s' will most likely execute the most recent version installed on your machine, which is not necessarily the same version specified in your Gemfile.
00:14:52.160 That's why some people recommend running 'bin/rails' instead of just 'rails.' The custom binstubs for Rails will include the commands for the current version of Rails that is installed. You can generate binstubs with the --binstubs flag or use 'bundle binstubs.' 'bundle binstubs' will create a generic bin file ensuring that you run the gem specified in the Gemfile.
00:15:14.160 The last topic I’ll discuss is how someone can debug or work with gems in a Rails application. You can run 'bundle show' to see the path of the gem that was included, and you can run 'bundle open' with the code editor of your choice to open the gem code in the editor. This allows you to make edits and save, and your changes will be visible.
00:15:32.520 If your application uses Spring, however, the gem will probably be cached since Spring preloads the application for faster boot times. So, you’ll need to stop Spring in order for the Rails application to reload the files with your changes.
00:15:50.039 Modifying your gems directly is straightforward, but if you forget to undo those changes, it might inadvertently break your code in other areas or within other projects, leaving you to question why your application isn't working anymore.
00:16:08.560 The command 'gem pristine' can reset your gems by reinstalling them to their initial state. To avoid issues, I like to clone a version from GitHub and link the path of the clone in the Gemfile. This makes contributing to the gem easier since you can simply create a branch, push your changes, and create a pull request.
00:16:26.639 Now that we've become more familiar with installing and working with dependencies, it’s clear that Bundler makes using open-source libraries and other people's code incredibly easy. You can add a line into the Gemfile, run 'bundle install,' and just like that, you can run someone else's code.
00:16:53.760 However, it’s not all sunshine and rainbows. I will touch on some somewhat 'evil' things that can happen in the Ruby ecosystem. First, it’s very easy to install the wrong gem. Suppose you want to install Rails, but due to a single key slip, you run 'gem install rils.'
00:17:09.640 Oops! If you don't notice it and run something like 'rails new,' it will never work. Thank you! I was too lazy to record my computer's audio and thought singing was a better option. Seriously though, if you pull the 'rils' gem from RubyGems, you’ll get the Rick Astley hit 'Never Gonna Give You Up.'
00:17:26.400 While 'rils' is harmless fun, installing the wrong gem might introduce code that grabs application secrets, inserts backdoors, or anything else malicious you can think of. Unfortunately, someone can write such detrimental code using the Ruby ecosystem.
00:17:52.000 Fortunately, RubyGems has basic checks to mitigate this type of attack, known as typo-squatting. This is done by checking the names of published gems to determine if they are similar to other popular gems using the Levenshtein distance algorithm.
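A name-similarity check of this kind can be sketched with a standard Levenshtein distance function. The popular gem list and the distance threshold below are hypothetical, and the actual check on RubyGems.org may use different rules.

```ruby
# Classic dynamic-programming edit distance, keeping only two rows in memory.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

POPULAR_GEMS = %w[rails rake rack bundler].freeze # hypothetical list
MAX_DISTANCE = 1                                  # hypothetical threshold

# Returns the popular gems a new name is suspiciously close to.
def similar_popular_gems(name)
  POPULAR_GEMS.select do |popular|
    popular != name && levenshtein(name, popular) <= MAX_DISTANCE
  end
end

puts similar_popular_gems("rils").inspect
puts similar_popular_gems("sinatra").inspect
```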
00:18:11.360 Every package that gets released goes through dynamic and static analysis checks to determine if it's malicious, and if further investigation confirms a problem, it will likely be yanked from the system.
00:18:27.440 Gems are published by RubyGems accounts, and like any old account, these can be taken over. Think of it like my Neopets account from ten years ago, which I am still locked out of.
00:18:40.920 Getting hold of a RubyGems account can be more impactful, as that person can publish a malicious version of the gem. For instance, someone could take over a popular maintainer’s account and publish a harmful version of Rails. Anyone using Rails could inadvertently update to this malicious version, leading to compromised applications.
00:18:57.520 Thus, securing your account is crucial. And when I say keys, I mean multiple keys, specifically MFA (multi-factor authentication). We currently require the most popular gem maintainers to have MFA enabled.
00:19:17.720 If you would like to learn more about how this policy was established, I gave a talk at RubyConf Mini last year covering the RFC process and gaining consensus within the community.
00:19:42.239 We also introduced WebAuthn key support, which is more secure and convenient than traditional one-time password systems. You can use Touch ID or a security key on both the UI and command line interface, receiving a custom link to authenticate through your security key.
00:19:58.560 Popular accounts should not be the only ones with MFA; if you own an open-source gem, please enable MFA to enhance security in the Ruby community.
00:20:13.120 Furthermore, Samuel from RubyGems is working on a flow to publish gems through CI using OIDC, which is great for securely publishing gems via tools like GitHub Actions. This feature will soon be released to the general public.
00:20:31.240 So, the RubyGems team is actively working on ensuring that the RubyGems ecosystem is safe and stable for everyone. But what does this mean for you? How do you ensure that the gems you're installing are safe to use? We can never be 100% safe, but generally speaking, less is more when using gems in a project.
00:20:50.360 This doesn’t mean not using gems at all, but you shouldn’t have ten gems in your Gemfile that all accomplish the same thing. This approach reduces the entry points through which malicious code can penetrate your system, and it’s easier to maintain.
00:21:05.760 I know this might sound obvious, but some applications install a lot of unnecessary gems. When choosing a gem, ask yourself: Are the maintainers reputable? Do they have MFA enabled? And how many users are using this gem? I would rather choose a gem called Rails with almost half a billion downloads than something called 'rils' with only 14,000 downloads.
00:21:21.560 It's pretty comical just how many times someone has possibly been rickrolled by now. And yes, that’s all for now! Hopefully, you can take away at least one thing about how the gem installation process works, such as choosing what gem to install, how gems are packaged and used, and how they function in a Rails application.
00:21:38.920 I hope you also learned about selecting the right gems to use in your projects. Thank you for listening! Enjoy the rest of RubyConf! I'll be around afterward to answer any questions or just to chat.
00:22:40.560 Thank you!