Summarized using AI

Ruby at GitHub

by Brandon Keepers

In the talk titled "Ruby at GitHub," Brandon Keepers discusses the extensive use of Ruby at GitHub, emphasizing the company's deep roots in and significant contributions to the Ruby community. He outlines the programming languages used at GitHub, noting that approximately two-thirds of the active internal repositories have Ruby as their primary language. A deeper analysis shows that Ruby accounts for only about one-third of the codebase by bytes, which he suggests may reflect Ruby's concise, expressive nature or the many Ruby projects that contain little code. Key points include the role of Ruby at GitHub, where it is not used, why Ruby is preferred for many applications, the libraries used in their projects, and their approach to handling updates and migrations.

  • Languages Used at GitHub: Of 602 internal repositories, 228 were inactive and excluded; among the rest, Ruby is the primary language of about two-thirds of repositories but makes up only roughly one-third of the code by bytes.
  • Native Applications: GitHub's native applications are primarily developed in Objective-C for Mac and C# for Windows, alongside Android applications that lean heavily on open-source libraries.
  • Ruby’s Role: The choice of Ruby arises naturally through the founders' connections with the Ruby community, but GitHub maintains flexibility in language choice with a philosophy of empowerment and responsibility for technical decisions.
  • Libraries and Frameworks: GitHub employs multiple libraries, with Sinatra and Rails as the dominant frameworks, along with Redis, PostgreSQL, and MySQL for data management. Testing frameworks include RSpec and MiniTest, reflecting diverse preferences among developers.
  • Managing Updates: GitHub’s approach to updating features involves using feature flags to enable gradual rollout and testing before full deployment, which encourages a continuous integration process.
  • Migration Experiences: The talk highlights GitHub's challenges and successes in migrating code bases from Ruby 1.8 to 1.9 and 2.0 while managing dependencies and performance optimizations.

In conclusion, Keepers encapsulates GitHub's strategic use of Ruby while also acknowledging that they do not solely identify as a Rails shop. The culture encourages exploration of various technologies, underscoring a pragmatic approach to development while fostering Ruby's growth within their ecosystem.

As a takeaway, the culture at GitHub offers a blend of empowerment, responsibility, and strategic choice in technology, which has defined their successful web services development and infrastructure management.

00:00:20.560 Thank you to all of the organizers. This has been really awesome, and I want to give a special thanks to the audience. I know you guys have been a great audience, and it can be intimidating to be up here, so thank you first.
00:00:27.599 I want to talk about Ruby at GitHub. I'm Brandon Keepers. Because of my last name, people like to make jokes, saying things like, 'Haha, you're a keeper.' I naively chose my first initial and last name as my screen name, 'bkeepers,' so people read it as 'beekeepers' and often joke, 'Oh, you keep bees.' Just to clarify, I do not keep bees, in case there was any confusion.
00:00:40.239 I wonder if I should, though. We were joking yesterday at lunch about people whose names are ironic with what they do. Someone mentioned there was a dentist named Dr. Payne or a urologist named Dr. Chop. Anyway, enough about that.
00:00:58.080 A little bit about me, besides my name: I was the co-founder of a consulting company called Collective Idea, where I spent about five years. Then I joined Ordered List, where we built a couple of products that you may have heard of or used, such as Gauges, a web analytics app, and Speakerdeck.com, where many presenters here have posted their slides.
00:01:03.600 In late 2011, we had the opportunity to join GitHub along with our entire team, which we were thrilled to do. We've been at GitHub for about a year and a half now. While I have worked on a few things at GitHub, lately I spend most of my time working on Speakerdeck. We have a whole team of people working on it now, and we have some pretty exciting things to come.
00:01:53.719 It’s probably no secret that GitHub loves Ruby. Our founders were very active in the Ruby community early on. We've attended many conferences, and GitHub being somewhat admired in this community means that people often have a lot of questions. I thought it would be fun to do a talk addressing some of those questions and giving you all a chance for some extended time for any questions you may have.
00:02:15.239 To start, here are the topics I will cover: Which languages are used at GitHub, where is Ruby not used, why Ruby for everything else, which libraries do we use, and how do we handle updates. This includes releasing new features and handling upgrades to Ruby and Rails.
00:02:35.280 First, which languages are used at GitHub? I cloned all the repositories in our GitHub organization at github.com/github, and we have 602 repositories that are used internally in some form or another.
00:02:41.760 When looking through those, I found that 228 of them were inactive, meaning they hadn’t had a commit in six months. I assumed that if we are not maintaining them, they probably are not a core part of our infrastructure, so I ignored those. I also ignored 42 projects that were forks of open-source projects. Additionally, I disregarded seven projects that skewed the stats and didn’t really seem relevant, like some Cisco router configurations that were being detected as some language called Racket, which I had never heard of. I didn’t try to chase down every open-source library we utilize since GitHub uses a lot of open-source software and contributes to many as well.
00:03:28.120 With that said, here’s the language breakdown by primary language of all our internal projects: two-thirds of them are Ruby. There’s a lot of JavaScript, some Objective-C, Shell, CoffeeScript, C, and then other languages making up around 20 or 30. These may include a few libraries that had been vendored or similar. One surprising detail is that Java didn't show up initially.
00:04:07.200 After digging deeper, I realized the stats might be misleading since this is categorized by primary language, and many repositories contain multiple languages. I decided to analyze the breakdown by bytes instead. Surprisingly, now only a third of our codebase is Ruby, with 15% in C, followed by JavaScript, Objective-C, and CoffeeScript, and still about 6% in other languages. It’s interesting that we have a lot less Ruby if we compare by bytes, perhaps due to Ruby being more expressive and concise, or because we have many Ruby projects that do not contain a lot of code.
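The gap between the two breakdowns can be reproduced with a small sketch. The repository names and byte counts below are made up for illustration and are not GitHub's data; the point is only that one large repository in another language can dominate a by-bytes view while barely registering in a by-primary-language view.

```ruby
# Hypothetical per-repository byte counts by language (made-up numbers).
REPOS = {
  "web-app"    => { "Ruby" => 400, "JavaScript" => 300 },
  "cli-tool"   => { "Ruby" => 100 },
  "deploy-bot" => { "Ruby" => 150, "Shell" => 50 },
  "mac-client" => { "Objective-C" => 2000, "Ruby" => 50 },
}

# Share of repositories whose largest language (by bytes) is each language.
def share_by_primary_language(repos)
  counts = Hash.new(0)
  repos.each_value do |bytes_by_language|
    primary, _bytes = bytes_by_language.max_by { |_language, bytes| bytes }
    counts[primary] += 1
  end
  to_percentages(counts)
end

# Share of all bytes written in each language, summed across repositories.
def share_by_bytes(repos)
  totals = Hash.new(0)
  repos.each_value do |bytes_by_language|
    bytes_by_language.each { |language, bytes| totals[language] += bytes }
  end
  to_percentages(totals)
end

# Converts raw counts into percentages rounded to one decimal place.
def to_percentages(counts)
  total = counts.values.sum.to_f
  counts.map { |key, value| [key, (100 * value / total).round(1)] }.to_h
end

by_primary = share_by_primary_language(REPOS)
by_bytes   = share_by_bytes(REPOS)
puts "By primary language: #{by_primary}"
puts "By bytes:            #{by_bytes}"
```

With these sample numbers Ruby is the primary language of three of the four repositories, yet it makes up well under half of the total bytes, because the single Objective-C project is so much larger than the rest.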
00:05:07.920 On the flip side, we have only a few Objective-C applications that contain a significant amount of code. It could also suggest that some languages may not be as expressive as Ruby. The story told here is summed up by Ryan, one of our developers: at GitHub, we do not see ourselves as a Rails shop. Many in the community might see us that way, but we don’t even identify solely with Ruby. If we have any identity tied to technology, I think it’s Unix.
00:06:01.199 Our culture emphasizes finding small pieces that excel at one thing and combining them effectively. While we love Rails and Ruby, I feel our identity lies more with Unix. A side note to this is that at GitHub, pretty much everyone in the company has sudo access, metaphorically speaking. This means we all have complete autonomy, and if I want to advocate for something, I can, provided I can convince my co-workers.
00:06:49.800 Now, let’s look at where Ruby is not used. The biggest category is native applications. GitHub for Mac is written in Objective-C, which is typical for most Mac applications, while GitHub for Windows is written in C#, as many Windows applications are. We also have a couple of Android apps, encompassing both a generic GitHub app and one for Gauges, a web analytics application.
00:07:20.199 Reflecting back on the pie chart that indicated just 2% of our code being Java, I suspect that the small percentage is due to both of our Android apps being open-source, where our developer Kevin Sawicki is very intentional about extracting reusable pieces into open-source libraries or contributing back to existing ones. Therefore, the GitHub Android apps are relatively compact since they leverage so much library code.
00:08:27.680 We also have numerous iOS applications, including a couple of public ones in the App Store and several internal ones for staff use. Additionally, Hubot, which was discussed at DevOps Day by Jesse, is entirely written in CoffeeScript and runs on Node, which should please the Node advocates in the audience.
00:09:03.560 Now, turning to Unix utilities, we developed tools around Git and the processes that Git runs, and these are primarily written in C, along with a lot of shell scripts. When people ask about PhoneGap or similar wrapper frameworks for building native or mobile applications in non-native languages, I believe there are a couple of reasons why we don’t use them.
00:09:52.800 To start, we tend to hire individuals who already possess specific experience related to the job. When hiring an iPhone app developer, we look for candidates with prior experience creating iPhone apps, who are already part of the Objective-C community. Personally, I’ve never encountered a PhoneGap app that felt like a genuine native app.
00:10:33.360 I wouldn’t mind being proven wrong if somebody has developed one that functions like a true native app. In a similar vein, there's also RubyMotion, which compiles down through LLVM, much as Objective-C does. I think we have the same issue since we do not hire folks with Ruby development backgrounds to work on iOS, but instead recruit those with prior iOS experience.
00:11:04.799 As for my perspective, coming from a Ruby background, I’m not as interested in RubyMotion because I find that Objective-C isn’t particularly complex. It's learning everything else that poses challenges: the conventions, user interface guidelines, and iOS APIs. Just to mention, the numbers shown here are entirely fabricated.
00:12:16.480 So why Ruby for everything else? We don’t always receive this question at Ruby conferences, but it often comes up at Python events or similar. I don’t know if it was ever a deliberate choice. The founders of GitHub started employing Ruby, leading to a gathering of Ruby users. We don't have an official policy stating that when writing a web service, we have to use Ruby.
00:13:03.200 One of GitHub's core principles is the idea of no parents. What we mean by that is, when you arrive at GitHub, you are empowered to make your own decisions, but with that freedom comes the responsibility of managing the outcomes of those decisions. For example, I could opt to write crucial parts of GitHub using Go or Scala, but I would also be responsible for maintaining that and ensuring it’s stable without issues.
00:14:10.440 So while we remain open to utilizing various languages, we hold some responsibility for the outcomes of our choices. If you were to ask me personally why I use Ruby, I admit that this is a complex question. It’s somewhat similar to asking why I drive a Toyota. Recently, my wife and I bought a new car to replace our old Toyota Camry, test-driving essentially every model available but eventually settling on another Toyota.
00:15:38.120 I could easily compile a list of reasons why I like Toyotas since they are reliable and dependable, similar to how I could list reasons for preferring Ruby. Ultimately, I believe it comes down to personal taste. I initially attempted Python before learning Ruby, but for some reason, it just didn't resonate with me. It’s not that there is anything wrong with it; it just was not my preference.
00:17:16.880 Another aspect is practicality. If I am to invest 10,000 hours or four hours daily learning something, I prefer to choose something offering various applications. Ruby is one of those languages. It serves numerous purposes; it is not a specialized tool but is comparable to a practical sedan.
00:17:56.240 I can leverage it for scripting, web applications, and theoretically, if desired, for native applications just for exploration purposes. So, that summarizes why I prefer Ruby. Next, let’s discuss what libraries we use. This was interesting to analyze because we have 153 repositories with a Gemfile, which is the standard way to declare dependencies.
00:19:05.760 I parsed through these Gemfiles and aggregated the gems we declare dependencies on. Rather than examining the Gemfile.lock, which shows all the actual dependencies, my analysis focused solely on the declared dependencies. I faced challenges finding a way to visualize this, resulting in a word cloud, typical of presentations from 2001. Here, we can see the larger words indicate greater usage across our projects.
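The talk does not show how the Gemfiles were parsed, so the following is only a minimal sketch of one way to count declared dependencies. The repository names and Gemfile contents are hypothetical, and the regular expression is a simplification: it matches literal `gem "name"` lines and does not evaluate the Gemfile, so it misses anything declared through `gemspec` or dynamic Ruby.

```ruby
# Hypothetical Gemfile contents keyed by repository name (not real GitHub data).
GEMFILES = {
  "site" => <<~GEMFILE,
    source "https://rubygems.org"
    gem "rails"
    gem "mysql2"
    group :test do
      gem "rspec-rails"
    end
  GEMFILE
  "api" => <<~GEMFILE,
    source "https://rubygems.org"
    gem 'sinatra'
    gem 'redis'
    # gem 'sqlite3'
  GEMFILE
  "status" => <<~GEMFILE,
    gem "sinatra"
    gem "pg"
  GEMFILE
}

# Matches lines that begin with `gem "name"` or `gem 'name'`, ignoring
# leading indentation. Commented-out lines do not match.
GEM_LINE = /^\s*gem\s+["']([^"']+)["']/

# Names of the gems a single Gemfile declares directly.
def declared_gems(gemfile_source)
  gemfile_source.scan(GEM_LINE).flatten.uniq
end

# Number of Gemfiles declaring each gem, most widely used first.
def gem_usage(gemfiles)
  usage = Hash.new(0)
  gemfiles.each_value do |source|
    declared_gems(source).each { |name| usage[name] += 1 }
  end
  usage.sort_by { |name, count| [-count, name] }
end

gem_usage(GEMFILES).each { |name, count| puts "#{name}: #{count}" }
```

To run this against checkouts on disk, the hash could be built from something like `Dir.glob("*/Gemfile")` and `File.read` instead of inline strings.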
00:19:50.640 Examining this closely, I observe that the two main application frameworks are Sinatra and Rails. I’ll delve deeper into those shortly. Concerning databases, we frequently utilize Redis, PostgreSQL, and MySQL. It’s worth noting that SQLite 3 appears here oddly, prompting me to investigate further; I suspect these could be prototypes or experiments rather than being used actively in production environments.
00:20:30.920 Additionally, we have several small applications that use Riak. Interestingly, our Ops team has chosen to support MySQL officially, perhaps for operational capacity and sanity. However, it’s intriguing that we leverage PostgreSQL in more applications than anticipated, likely due to our extensive deployment on Heroku.
00:21:12.680 Next, let’s look at testing frameworks: we also employ a variety of libraries to assist with testing, including RSpec and rspec-rails. It’s important to note that my analysis only considered declared dependencies; some projects declared RSpec, while others declared rspec-rails. A few utilize MiniTest, while Test::Unit doesn't show up because it resides in the standard library. The testing libraries further include helpers such as WebMock, Rack::Test, and Capybara.
00:22:27.280 On the topic of Sinatra versus Rails, there was a time when someone at GitHub claimed to dislike Rails. Consequently, many now believe that everyone at GitHub holds that sentiment. In reality, we use both frameworks extensively and benefit greatly from both. In fact, we have 51 projects utilizing Sinatra, with nine that employ both frameworks within the same app and 30 that solely utilize Rails.
00:22:50.960 The common practice is to build the main parts of an application using Rails while specializing unique functions or APIs with Sinatra, subsequently mounting Sinatra as middleware within the Rails application. Therefore, the GitHub API operates similarly, incorporating both frameworks within the same code base.
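The idea of mounting one application inside another can be sketched without either framework, since both Rails and Sinatra applications are Rack applications. Everything below is a plain-Ruby stand-in: `MAIN_APP`, `API_APP`, and `PrefixMount` are hypothetical names, and this is not how GitHub's code is wired. In a real project, Rack's `Rack::URLMap` or the Rails 3 router's `mount` would typically do this job.

```ruby
# A Rack application is any object that responds to `call(env)` and returns
# [status, headers, body]. Both apps here are stand-ins for real frameworks.
MAIN_APP = lambda do |env|
  [200, { "content-type" => "text/html" }, ["<h1>Main site: #{env["PATH_INFO"]}</h1>"]]
end

API_APP = lambda do |env|
  [200, { "content-type" => "application/json" }, [%({"path":"#{env["PATH_INFO"]}"})]]
end

# Sends requests under `prefix` to `mounted_app`; all others go to `main_app`.
class PrefixMount
  def initialize(main_app, prefix, mounted_app)
    @main_app = main_app
    @prefix = prefix
    @mounted_app = mounted_app
  end

  def call(env)
    path = env["PATH_INFO"]
    if path == @prefix || path.start_with?("#{@prefix}/")
      # Strip the prefix so the mounted app sees paths relative to its own root.
      inner_env = env.merge(
        "SCRIPT_NAME" => env.fetch("SCRIPT_NAME", "") + @prefix,
        "PATH_INFO"   => path[@prefix.length..-1]
      )
      @mounted_app.call(inner_env)
    else
      @main_app.call(env)
    end
  end
end

APP = PrefixMount.new(MAIN_APP, "/api", API_APP)

_status, _headers, body = APP.call("PATH_INFO" => "/api/repos")
puts body.first
```

Because the small app only ever sees paths relative to its mount point, it can be developed and tested on its own and later attached to the larger application under any prefix.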
00:23:24.160 This observation made me curious about the sizes of the Sinatra projects. I compared their size in bytes, leading to the discovery that in our projects, we have 12 MB of Ruby code in Rails projects, 10 MB in projects with Sinatra dependencies, and 8 MB in projects that use both. This indicates a trend where we tend to leverage Rails for larger applications. Overall, many of the Sinatra applications haven’t reached sufficient complexity to warrant migrating toward Rails, which is a positive outcome.
00:24:09.280 In the realm of testing libraries, there’s an ongoing debate akin to tabs versus spaces, with preferences across GitHub varying widely, whether it be plain old Test::Unit or RSpec. There are also those who favor MiniTest, and I personally wish that the Test::Unit crowd would transition to MiniTest due to its superior quality. It seems being pragmatic about what works for our team is the best route.
00:26:49.920 Now, how do we manage updates? A significant aspect involves deploying new features. At GitHub, we constantly deploy and rarely implement a feature for several weeks or months without deployment. Our approach is to develop features until they are ready to be merged back into master and deployed with the feature turned off initially.
00:27:20.880 We define helper methods in our code that determine whether the current user can access this new feature. We'll typically have a module called feature flags that stipulates user permissions for preview features, which usually defaults to whether the user is a GitHub employee or similar. As we introduce new functionality, we add corresponding methods and flags for them. In each view, we implement a condition verifying if the time travel feature is enabled for the user before showing it on the page.
00:28:55.559 In the controller, we restrict access, ensuring users cannot execute any functions unless they possess that feature. This approach is crucial as it lets us deploy code as soon as it's ready, avoiding long-lived branches.
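The pattern described above, a feature flags module plus checks in the view and the controller, might look roughly like the following plain-Ruby sketch. The names `User`, `FeatureFlags`, `preview_features?`, `time_travel_enabled?`, and `TimeTravelController` are all hypothetical, chosen to mirror the talk's "time travel" example; the talk does not show GitHub's actual code.

```ruby
# Hypothetical user model; `staff?` stands in for "is a GitHub employee".
User = Struct.new(:login, :staff) do
  def staff?
    staff
  end
end

# One predicate per unreleased feature, all funneled through a shared default.
module FeatureFlags
  # Default gate for preview features: signed-in staff only.
  def preview_features?
    !current_user.nil? && current_user.staff?
  end

  # Flag for the hypothetical time travel feature.
  def time_travel_enabled?
    preview_features?
  end
end

class AccessDenied < StandardError; end

# Plain-Ruby stand-in for a controller and its view helper.
class TimeTravelController
  include FeatureFlags
  attr_reader :current_user

  def initialize(current_user)
    @current_user = current_user
  end

  # Controller action: refuse to run unless the user has the feature.
  def show
    raise AccessDenied, "time travel is not enabled" unless time_travel_enabled?
    "Welcome to 1985, #{current_user.login}"
  end

  # View logic: only render the link when the feature is enabled.
  def nav_links
    links = ["Code", "Issues"]
    links << "Time Travel" if time_travel_enabled?
    links
  end
end

staff_view = TimeTravelController.new(User.new("hubot", true))
guest_view = TimeTravelController.new(User.new("guest", false))
puts staff_view.nav_links.inspect
puts guest_view.nav_links.inspect
```

Under this design, launching a feature to everyone amounts to changing the body of its flag method, here `time_travel_enabled?`, to return `true`, after which the checks can be removed at leisure.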
00:29:04.560 For instance, I worked on a project involving new notifications which took approximately six months. Had I maintained it on a separate branch and not deployed it earlier, that would have created complications. By about three months in, we decided to allow staff to utilize these new notifications and thus could test their use, identify issues, and improve performance before the launch day.
00:29:32.080 As we approach the final launch of a feature, there's nothing more to do than adjust the feature flags module, replacing the condition check with a plain true. It’s an efficient means of releasing new features. If you haven’t experimented with the more dynamic feature-flag libraries, I highly recommend exploring them; libraries like Flipper and Rollout let you roll out features to specific user groups dynamically.
00:30:11.440 Migrating to Ruby 1.9 and now 2.0 has been an intriguing challenge. Many knew for a while that GitHub was still operating on Ruby 1.8 and Rails 2, which I will discuss next. The transition to 1.9 was lengthy—approximately six months worth of effort. Initially, we ran our continuous integration tests against both Ruby 1.8 and 1.9 whenever we committed code.
00:31:10.440 Predictably, all tests failed immediately, as we had dependencies tied exclusively to Ruby 1.8. However, eventually a breakthrough occurred, and we arrived at 1.9. The pivotal step was arguably the work done by tmm1 (Aman Gupta), who stands out as a hero to GitHub.
00:32:11.520 He diligently identified failures by running tests against both platforms, determining the failures, and rectifying them. For instance, when a gem proved incompatible with newer Ruby versions, it was removed from the Gemfile so it would not impact other dependencies. This continuous merging process made it feasible to maintain test functionality across both Ruby versions.
00:33:00.440 Our route toward migrating to Ruby 2.0 mirrors the journey we took to move to Ruby 1.9. We continue to run CI against multiple versions. For a time, GitHub Enterprise remained on Ruby 1.8, but they have recently transitioned to 1.9, allowing us to upgrade multiple projects to Ruby 2.0.
00:34:04.680 The positive aspect of this transition is the relative backward compatibility of Ruby 2.0 with 1.9, which simplifies ensuring that our tests pass across versions. We have observed performance improvements after switching to Ruby 1.9, with average CPU response time decreased by around 25 milliseconds.
00:35:59.440 I can present a graph highlighting our CPU response time fluctuation, initiating after we deployed Ruby 1.9, showing a marked dip followed by a return to usual times after addressing a Garbage Collector bug in Ruby 1.9 and pushing the fix upstream to the Ruby team. Based on these developments, we expect to evaluate performance benefits of Rails 3.
00:36:49.520 As for development on Rails 3, GitHub.com is still operating on Rails 2.3, and migrating is proving to be a challenge due to the declaration of gem dependencies. We maintain a Rails 3 branch and attempt to backport some APIs.
00:37:40.000 This allows us to keep working with both Rails 2 and 3, maintaining functionality while navigating disparities.
00:38:56.360 It's a difficult process contending with the changes in ActionMailer and ActiveRecord API to ensure they operate similarly across both versions.
00:39:14.260 The Rails 3 branch will eventually achieve passing tests, which will open the door to deploying it, though we must manage performance impacts based on the differences in Rails 3.
00:39:28.800 That covers most of my talk. If you have any further questions, feel free to ask.
00:40:00.560 Thank you very much for your attention.
Explore all talks recorded at MountainWest RubyConf 2013