RailsConf 2017

Open Sourcing: Real Talk

by Andrew Evans

In the talk "Open Sourcing: Real Talk," Andrew Evans discusses the experiences and insights Hired has gained from open-sourcing parts of their codebase. This session at RailsConf 2017 explores when, why, and how developers might consider pulling code out of their applications into open-source repositories, emphasizing both the challenges and benefits involved.

Key Points Discussed:
  • Introduction to Hired: Evans provides context about Hired's role in connecting tech candidates with companies and mentions their ongoing commitment to open-source projects, showcasing 46 public repositories on GitHub.

  • Reasons to Open Source:

    • Sharing common application logic can reduce duplication across different Rails applications.
    • Contributions from the community can provide bug fixes, feature additions, and documentation at no cost.
    • It can enhance the visibility and credibility of developers within the tech community.
  • Challenges Faced: Evans addresses the reality that while open-sourcing can lead to excitement, the actual traction for projects can be underwhelming. Metrics from Hired's own repositories reveal they aren't attracting vast engagement or recognition.

  • Case Studies of Open-Sourced Projects:

    • Reactor: A publish/subscribe system built to manage background processing efficiently. Evans discusses the technical benefits of separating this code from their main application, which helps maintain clean architecture and reduces inadvertent changes. Additionally, it encourages thorough documentation and testing because of public scrutiny.
    • Design Patterns: Evans highlights learning opportunities about design patterns through open-sourcing, such as the visitor pattern used for building query syntax.
    • Stretchy: A query builder for Elasticsearch intended to provide flexibility in search queries. Evans discusses the design choices made, the user experience, and the balance of keeping business logic contained within their application while allowing extensible flexibility in the open-sourced tool.
  • Conclusions and Takeaways:

    • Open-sourcing can significantly refine coding practices, enhance team conventions, and improve overall code quality, despite not generating the anticipated fame or recognition.
    • Developers are encouraged to take a thoughtful approach when separating code into open-source projects to foster clearer and simpler code structures.
    • Ultimately, the experience provides valuable learning about Ruby, RubyGems, and the integration of open-source projects within Ruby on Rails environments.

The overall message conveys that while the journey of open-sourcing may not always yield external fame, inwardly, it can cultivate improved practices and processes, ultimately benefiting both the developers and the organization.

00:00:13.080 We've got a lot to cover here, so I'm going to go ahead and get started. This is "Open Sourcing: Real Talk." So, who am I?
00:00:26.130 I'm Andrew Evans. I work for Hired, and I've been there for a little over two years. I've worked for all kinds of different startups for about nine to ten years, and I've been working with Ruby on Rails since around version 1.8.
00:00:39.750 If anybody remembers some fun Twitter-related arguments back then, good times! I've been writing bugs in PHP, QBasic, and all sorts of other languages for longer than I care to remember.
00:00:51.960 So, what is Hired? Hired is the best way to get a tech job. As a candidate, you create a profile, and we put you live in our two-sided marketplace. Then companies ask you to interview with them, and they have to disclose salary, equity, and everything else upfront.
00:01:03.809 You decide if it's worth responding and having that conversation. For companies, this is the best and fastest way to get great candidates into your pipeline. However, this is not a pitch talk, so I'm not going to focus on that. Instead, I'm going to talk about open sourcing.
00:01:18.659 This is something that Hired has been doing basically since the beginning. We have 46 public repositories on GitHub. Many of those are forks where we've made some modifications or thought about modifying things, but some of them are open-source projects that you can potentially use.
00:01:37.290 Here are just a couple of examples: we have a pub/sub layer over Sidekiq, a query builder for Elasticsearch, and Fortitude, which is our CSS framework designed to scale for teams. We even have smaller projects, such as a Rack middleware that logs stats from a Puma web server. This can be helpful for gathering more information.
00:01:58.560 We also have a tiny utility that changes the color of the lights in your office if you're using something like Philips Hue. For example, we turn our lights red when the master build fails—it's a good incentive to get on Slack and figure out what went wrong. I want to dive deeper into two specific projects to discuss what producing these projects was like.
00:02:29.970 I'll cover whether it would be worthwhile for you to do any open-source work, whether extracting anything from your app and making it public would be beneficial. I'll discuss some challenges we faced while doing this and how it affected team dynamics and collaboration.
00:02:53.910 Additionally, I'll try to get a little technical and talk about what we learned regarding Ruby, RubyGems, and Rails. Hopefully, we'll have some time for Q&A at the end.
00:03:06.870 So, why should you open source things? If you have some logic in your application, then clearly every other Rails app is doing pretty much the same thing. If we all shared our code, we could automate many processes. When you open source something, the community will help fix bugs, add features, and document your work—it's essentially free labor, which is amazing!
00:03:38.640 You'll gain developers' street cred. You'll receive respect and appreciation, gain followers on Twitter, and accumulate GitHub stars. However, I should clarify that much of this might not necessarily happen. You may have pieces of application code that everyone needs across Rails apps, but for the most part, you might be dealing with specialized logic.
00:03:45.750 Most of your app consists of business logic that stays within your application. I want to discuss some real talk about our projects in particular. I pulled data from bestgems.org, which scrapes data from rubygems.org daily, tracks total and average downloads, and ranks your gems among the more than 130,000 available on RubyGems.
00:04:08.220 I can see that ours are not quite in the top 100 or 1000, which means we're probably not going to be on the public leaderboards anytime soon. Our average daily downloads, which I've adjusted for weekends and holidays, are slightly over the average you'd expect based on our deployment frequency to Heroku. I also pulled some data off GitHub. The number of stars and followers we have is more encouraging, with likely some people outside of our company caring about these repositories.
00:04:51.419 However, we're still not getting many people opening issues just to say how great we are. I found some gems that rank close to ours on BestGems. For instance, there's a framework for building bots for Google Wave, which some of you might remember existed for a brief period. There's also a command-line interface for The Pirate Bay, which has been abandoned due to legal issues.
00:05:27.120 Moreover, there's one that combines Active Admin and Trailblazer—two frameworks with conflicting philosophies. Finally, there's a command-line app that provides random Mitch Hedberg quotes, which I used to enjoy.
00:05:59.730 What we found is that if you don't actively market your gem, people won't find it on their own. We haven't done much promotion for these; I don't think we've posted about them on our blog often. This may be the first talk where we've gone in-depth regarding our open-source endeavors at RailsConf. So was it worth it for us to open source the code from our application?
00:06:35.060 I've been pondering that, and I think we are seeing benefits.
00:06:42.090 Reactor is our enhanced interface to Sidekiq, the premier background processing library for Ruby and Rails applications, created by Mike Perham—who is awesome, by the way! Sidekiq has served us well at Hired, and its stability has been an asset. When we consider Reactor, we're looking at how we handle various background tasks whenever a candidate updates their profile.
00:07:18.060 You might want to notify the talent advocate working with them, bust a cache, or update saved searches. Using Sidekiq, you create worker classes for each of those tasks, performing actions asynchronously outside of your web requests. However, if you have multiple tasks to perform, you'll need to call various workers, which can lead to messy code.
00:07:59.040 You could either copy-paste calls to worker classes in multiple places or design one worker that triggers others. This isn't a huge issue, but if you're forgetful like me, you might introduce cycles that spawn numerous unnecessary jobs, leading to potential headaches as you try to shut down Sidekiq and untangle the mess.
00:08:42.670 So, we ended up creating a publish and subscribe system with Reactor. You publish an event, which has a name (a symbol) and can pass any arbitrary data to it. Additionally, we included an extension to allow Active Record objects to publish events, which are serialized and deserialized for use within your event code. This approach lets us trigger a single Sidekiq worker that locates all subscribers to a specific event name and enqueues jobs for them.
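As an illustration of the flow described here, a minimal sketch in plain Ruby follows. It is not Reactor's code: the `TinyPubSub` module, its method names, and the in-memory `QUEUE` standing in for Sidekiq are all assumptions made for the example.

```ruby
# Hypothetical, in-memory sketch of a publish/subscribe layer.
# None of these names are taken from the Reactor gem.
module TinyPubSub
  # Maps an event name (Symbol) to the handlers subscribed to it.
  SUBSCRIBERS = Hash.new { |hash, name| hash[name] = [] }
  # Stand-in for a background job queue such as Sidekiq.
  QUEUE = []

  def self.subscribe(name, &handler)
    SUBSCRIBERS[name] << handler
  end

  # Publishing does no work itself: it enqueues one job per subscriber.
  def self.publish(name, data = {})
    SUBSCRIBERS[name].each do |handler|
      QUEUE << [handler, { name: name, data: data }]
    end
  end

  # Stand-in for worker processes draining the queue.
  def self.drain
    until QUEUE.empty?
      handler, event = QUEUE.shift
      handler.call(event)
    end
  end
end

log = []
TinyPubSub.subscribe(:profile_updated) do |event|
  log << "bust cache for #{event[:data][:candidate_id]}"
end
TinyPubSub.subscribe(:profile_updated) do |event|
  log << "notify advocate for #{event[:data][:candidate_id]}"
end

# The publisher only names the event; it does not know who is listening.
TinyPubSub.publish(:profile_updated, candidate_id: 42)
TinyPubSub.drain
```

The point of the design is that the code updating a profile makes a single `publish` call, and adding a new side effect means adding a subscriber rather than editing every call site.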
00:09:45.850 In our implementation, we created a convenience method for controllers called 'action_event.' This will publish an event with the current user, blending in relevant parameters and useful data to help with tracing and analytics.
00:10:03.420 What do subscribers look like? You include a subscribable module in a class, which gives you a class-level method for subscribing to events. You indicate the event name and provide a block for the actions to take when that event occurs. For instance, we can pull out the actor who published the event, get their ID, and bust a cache related to a specific record.
00:10:53.040 You can also set wildcard handlers that respond to all events, which we use for logging site activity to Postgres or any other destination we want to capture analytics data.
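A subscriber module along these lines could be sketched as below. The names `Subscribable`, `on_event`, and the `:*` wildcard key are assumptions chosen for illustration, and handlers run inline here rather than as background jobs.

```ruby
# Hypothetical sketch; module and method names are illustrative assumptions.
module Subscribable
  # Shared registry: event name => handlers. :* is the wildcard key.
  REGISTRY = Hash.new { |hash, name| hash[name] = [] }

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level macro: register a block for a named event.
    def on_event(name, &block)
      REGISTRY[name] << block
    end
  end

  # Run handlers for the specific name, then any wildcard handlers.
  def self.dispatch(name, data = {})
    event = { name: name, data: data }
    (REGISTRY[name] + REGISTRY[:*]).each { |handler| handler.call(event) }
  end
end

BUSTED_CACHES = []
ACTIVITY_LOG = []

class CacheBuster
  include Subscribable
  # Pull the actor's ID out of the event and record the cache bust.
  on_event(:profile_updated) { |event| BUSTED_CACHES << event[:data][:actor_id] }
end

class ActivityLogger
  include Subscribable
  # Wildcard handler: sees every event, which suits analytics logging.
  on_event(:*) { |event| ACTIVITY_LOG << event[:name] }
end

Subscribable.dispatch(:profile_updated, actor_id: 7)
Subscribable.dispatch(:interview_requested, actor_id: 9)
```

After these two dispatches, only the first event reaches `CacheBuster`, while `ActivityLogger` records both event names.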
00:11:11.600 Instead of passing a block, you can specify a method name, which is easier to manage if your objects have more complicated functionality. You can also add options, such as a delay, so that the handler runs some time after the event occurs. For example, once an interview request is submitted by a subscriber, we can schedule an update two days later, conditional logic included.
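A delayed, conditional subscription might look something like the following sketch. The `Subscription` class and the `:delay` and `:if` option names are assumptions for the example, and the two-day follow-up is modelled as a computed `run_at` time rather than a real scheduled job.

```ruby
# Hypothetical sketch: the :delay and :if option names are illustrative
# assumptions, not a documented API.
class Subscription
  attr_reader :event_name, :method_name, :delay

  def initialize(event_name, method_name, delay: 0, **options)
    @event_name = event_name
    @method_name = method_name
    @delay = delay
    @condition = options[:if]
  end

  # Returns a job description, or nil when the condition rejects the event.
  def job_for(event, now:)
    return nil if @condition && !@condition.call(event)

    { method_name: method_name, run_at: now + delay, event: event }
  end
end

class InterviewFollowUp
  REMINDED = []

  # Named handler method, used instead of an inline block.
  def self.remind_candidate(event)
    REMINDED << event[:candidate_id]
  end
end

TWO_DAYS = 2 * 24 * 60 * 60 # seconds

subscription = Subscription.new(
  :interview_requested,
  :remind_candidate,
  delay: TWO_DAYS,
  if: ->(event) { !event[:responded] }
)

now = Time.utc(2017, 4, 25, 12, 0, 0)
job = subscription.job_for({ candidate_id: 42, responded: false }, now: now)
skipped = subscription.job_for({ candidate_id: 43, responded: true }, now: now)

# A worker would later look up the handler by name and call it.
InterviewFollowUp.public_send(job[:method_name], job[:event])
```

Referring to the handler by name keeps the subscription declaration to one line while the logic lives in an ordinary, testable method.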
00:12:06.990 We also built a system for creating model-based subscribers. For instance, the admin notifier subscriber lives in our Postgres database, and we define the actions that should be taken when events fire. These subscribers can respond to multiple events, making it flexible.
00:12:34.680 Additionally, we designed the system so that records can publish events based on data within those records. If an interview has a start and end time, we can tell it to publish events at those times. If a model is updated, the associated events will automatically reschedule.
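One way to picture the rescheduling behaviour is a schedule keyed by record and event name, so that publishing again after an update replaces the earlier entry. This is a sketch under that assumption; `EventSchedule`, `Interview`, and the event names are hypothetical and no database or job queue is involved.

```ruby
# Hypothetical sketch: jobs are keyed by [record id, event name], so
# re-publishing after an update overwrites the earlier entry.
class EventSchedule
  def initialize
    @jobs = {}
  end

  def schedule(record_id, event_name, run_at)
    @jobs[[record_id, event_name]] = run_at
  end

  def run_at(record_id, event_name)
    @jobs[[record_id, event_name]]
  end

  def size
    @jobs.size
  end
end

class Interview
  # Timestamp attribute => event published at that time.
  TIMED_EVENTS = { starts_at: :interview_started, ends_at: :interview_ended }.freeze

  attr_reader :id, :starts_at, :ends_at

  def initialize(id, starts_at:, ends_at:, schedule:)
    @id = id
    @starts_at = starts_at
    @ends_at = ends_at
    @schedule = schedule
    publish_timed_events
  end

  # Updating the record re-publishes its timed events.
  def reschedule(starts_at:, ends_at:)
    @starts_at = starts_at
    @ends_at = ends_at
    publish_timed_events
  end

  private

  def publish_timed_events
    TIMED_EVENTS.each do |attribute, event_name|
      @schedule.schedule(id, event_name, public_send(attribute))
    end
  end
end

schedule = EventSchedule.new
interview = Interview.new(
  1,
  starts_at: Time.utc(2017, 4, 26, 15, 0, 0),
  ends_at: Time.utc(2017, 4, 26, 16, 0, 0),
  schedule: schedule
)

# Moving the interview by a day replaces both scheduled events.
interview.reschedule(
  starts_at: Time.utc(2017, 4, 27, 15, 0, 0),
  ends_at: Time.utc(2017, 4, 27, 16, 0, 0)
)
```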
00:13:03.640 Was it worthwhile to take the code for Reactor and put it into an open-source repo? Well, when you do that, you get a fresh greenfield project. You can start developing without legacy code, keeping your codebase separate from your application, which may introduce some friction, but it's generally a positive friction. This separation encourages code updates and improvements.
00:13:56.390 The Reactor code is slightly out of sight and out of mind for our day-to-day development. Consequently, it discourages arbitrary small changes, as any modification requires context switching, publishing to RubyGems, and other steps that add friction. This means that the code quality remains stable and the patterns and conventions clear.
00:14:45.640 The 'action_event' method works consistently throughout our application, and we don't have to change it too often. We've made some updates for Rails 5 and Sidekiq 5, but its general consistency makes it easy to maintain.
00:15:12.389 When you have a public repository instead of just a private one, your reputation is on the line. This subtle pressure encourages developers to do things right, such as documenting methods meticulously and maintaining solid test coverage. It motivates us to ensure our code quality remains high.
00:15:51.090 When publishing through RubyGems, you need to be careful with your versioning. For each change, you have to consider whether it breaks compatibility or adds a new feature. This forces structured thinking about what an open-source library requires.
00:16:25.750 Requiring specific versions of Rails with your gem can be trickier than it seems. I noticed that libraries like Devise, Paperclip, and Sidekiq do it differently, and this inconsistency made me rethink how I structure dependencies.
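To make the dependency question concrete, the sketch below compares two common ways of constraining a Rails dependency, using the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. The version numbers are illustrative assumptions, not the constraints used by Reactor or by any of the gems named above.

```ruby
require "rubygems"

# Pessimistic constraint: ">= 4.2" and "< 5.0".
pessimistic = Gem::Requirement.new("~> 4.2")

# Explicit range: allows the 5.x series but stops before 6.0.
range = Gem::Requirement.new(">= 4.2", "< 6.0")

versions = %w[4.1.16 4.2.8 5.0.2 5.1.0 6.0.0].map { |v| Gem::Version.new(v) }

pessimistic_ok = versions.select { |v| pessimistic.satisfied_by?(v) }.map(&:to_s)
range_ok = versions.select { |v| range.satisfied_by?(v) }.map(&:to_s)

puts "~> 4.2        allows: #{pessimistic_ok.join(', ')}"
puts ">= 4.2, < 6.0 allows: #{range_ok.join(', ')}"
```

The tighter constraint blocks users from upgrading Rails until the gem is re-released, while the wider range promises compatibility with versions the gem has to be tested against.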
00:16:55.310 This challenge led me to think about what core pattern I needed. When designing a gem from the ground up, I had to reflect on naming conventions and patterns such as the Visitor pattern, which can help manage complexity in queries.
00:17:34.800 For example, with Reactor, we were able to create channels where events pass through. Each event carries arbitrary data, which can be confusing but ultimately proves beneficial. You can look at Action Cable for a well-documented example of subscribers and message buses.
00:18:18.270 However, publishing the Reactor gem didn't achieve the rockstar status or validation we desired. Yet it helped establish conventions and functionality for our team. It remains usable and has led us to a code structure that's more organized and decoupled.
00:18:56.640 An additional project I want to discuss is Stretchy, a composable query builder for Elasticsearch. Elasticsearch is an exceptional technology for full-text searches, and we spent considerable time thinking about its naming because no gem gains traction without a clever pun.
00:19:34.210 In the initial version of Stretchy, I aimed to implement Active Record-style query syntax for Elasticsearch, ensuring immutable query objects for easier response handling. My goal was to create a query builder that sufficiently indexes our models.
00:20:05.260 By keeping the design focused on constructing queries in a familiar way, we avoided unnecessary dependencies on Rails. Our final goal was to produce composable query scopes similar to what you can use with Active Record.
00:20:40.640 For instance, here's an example of a query that retrieves a candidate based on a specific criterion. The 'where' method operates similarly to how you would expect from Active Record, while also incorporating Elasticsearch-specific functionality like a match query for full-text relevance.
00:21:35.430 The query can include various flexible conditions based on the context of an Elasticsearch search. In addition, it allows for combining filters and implementing boosts to enhance ranking algorithms.
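The idea of immutable, chainable query objects can be sketched in a few lines of plain Ruby. This is not Stretchy's implementation: the `SearchQuery` class and its methods are assumptions for illustration, boosts are left out, and the output uses the `bool`, `term`, and `match` clauses of the Elasticsearch query DSL.

```ruby
# Hypothetical sketch of an immutable, chainable query builder.
class SearchQuery
  attr_reader :filters, :matches

  def initialize(filters: [], matches: [])
    @filters = filters.freeze
    @matches = matches.freeze
    freeze
  end

  # Exact-value filters, in the spirit of Active Record's `where`.
  def where(conditions)
    terms = conditions.map { |field, value| { term: { field => value } } }
    self.class.new(filters: filters + terms, matches: matches)
  end

  # Full-text clauses that contribute to relevance scoring.
  def match(conditions)
    clauses = conditions.map { |field, text| { match: { field => text } } }
    self.class.new(filters: filters, matches: matches + clauses)
  end

  # Hash in the shape of an Elasticsearch search request body.
  def to_search
    { query: { bool: { must: matches, filter: filters } } }
  end
end

base = SearchQuery.new
remote = base.where(remote: true)
ruby_remote = remote.match(skills: "ruby rails")

puts ruby_remote.to_search.inspect
```

Because every call returns a new frozen object, `base` and `remote` can be reused as scopes without one caller's conditions leaking into another's query.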
00:22:30.730 If Stretchy doesn’t meet your needs, you can still write raw JSON queries and integrate them directly into the final query sent to Elasticsearch. It maximizes flexibility and keeps options open for developers.
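As a rough picture of that escape hatch, the sketch below appends a hand-written JSON clause to a query hash that a builder has already produced. The helper name `with_raw_filter` and the sample fields are assumptions, and this does not show how Stretchy itself accepts raw JSON.

```ruby
require "json"

# A query hash as a builder might have produced it.
built = { query: { bool: { filter: [{ term: { remote: true } }] } } }

# A clause written by hand in raw Elasticsearch JSON.
raw_json = '{"range": {"salary": {"gte": 100000}}}'
raw_clause = JSON.parse(raw_json, symbolize_names: true)

# Hypothetical helper: returns a new hash instead of mutating the input.
def with_raw_filter(search, clause)
  bool = search[:query][:bool]
  merged = bool.merge(filter: bool.fetch(:filter, []) + [clause])
  { query: { bool: merged } }
end

final = with_raw_filter(built, raw_clause)
puts JSON.generate(final)
```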
00:23:04.760 As we developed Stretchy, challenges arose since it needed to exist independently of our main repo. Developers initially found it confusing, especially since other gems often integrate seamlessly with Active Model.
00:23:57.410 Many libraries hide complexities, which can mislead users—making it difficult to understand what features are available. Ultimately, we didn't want our open-source library to block future developments of our own application.
00:24:34.920 In response, we made it more extensible and customizable within our application, optimizing our queries without making it difficult for developers to implement changes.
00:25:35.700 We continuously refined it, which resulted in clearer coding decisions and a better understanding of the building blocks that underlie our overall processes. I even graphed changes to the Stretchy repository, observing that while we began with more lines of code, over time we were able to streamline.
00:26:32.580 Through this process, making an open-source gem allowed us to apply the visitor pattern and enhance our understanding of Ruby and query construction. It streamlined our internal code while keeping advanced logic where it belongs—in the application.
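For readers unfamiliar with the visitor pattern in this setting, here is a small sketch. The node and visitor classes are hypothetical and are not taken from Stretchy; the generated hashes follow the `term`, `match`, and `bool` clauses of the Elasticsearch query DSL.

```ruby
# Hypothetical sketch of the visitor pattern for query building.
# Nodes describe the query; visitors decide what to produce from it.
class TermNode
  attr_reader :field, :value

  def initialize(field, value)
    @field = field
    @value = value
  end

  def accept(visitor)
    visitor.visit_term(self)
  end
end

class MatchNode
  attr_reader :field, :text

  def initialize(field, text)
    @field = field
    @text = text
  end

  def accept(visitor)
    visitor.visit_match(self)
  end
end

class AndNode
  attr_reader :children

  def initialize(children)
    @children = children
  end

  def accept(visitor)
    visitor.visit_and(self)
  end
end

# Turns the node tree into an Elasticsearch-style query hash.
class ElasticsearchVisitor
  def visit_term(node)
    { term: { node.field => node.value } }
  end

  def visit_match(node)
    { match: { node.field => node.text } }
  end

  def visit_and(node)
    { bool: { must: node.children.map { |child| child.accept(self) } } }
  end
end

# A second visitor over the same tree: lists the fields a query touches.
class FieldCollector
  def visit_term(node)
    [node.field]
  end

  def visit_match(node)
    [node.field]
  end

  def visit_and(node)
    node.children.flat_map { |child| child.accept(self) }
  end
end

query = AndNode.new([TermNode.new(:remote, true), MatchNode.new(:skills, "ruby")])
json_ready = query.accept(ElasticsearchVisitor.new)
fields = query.accept(FieldCollector.new)
```

Keeping the output format in the visitor means a change in the target query syntax touches one class, while the node classes and the code that builds them stay the same.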
00:27:21.880 In conclusion, open sourcing has offered significant benefits, even if we haven’t become rockstars overnight. It might not provide instant recognition, but it still contributes positively to your organization. Even without free labor, it fosters internal improvements while creating a bit of friction.
00:27:55.910 Considering ideas outside of Ruby on Rails, along with shedding strict conventions, can simplify your code, leading to a clearer understanding of its purpose. Developing open source enriches our knowledge of different systems, such as Ruby, RubyGems, Bundler, and the intricacies of integration with our applications.
00:29:51.820 A greenfield project presents a chance to design code thoughtfully. I promised time for Q&A, and here we are. Any questions about these open-source projects or Hired?
00:30:49.750 So the question was about Elasticsearch version changes. Recently, we went from version 2.x to 5.0, and this led to substantial adjustments in how our query DSL operates.
00:31:06.210 We spent approximately 3-4 months upgrading internally and, surprisingly, updating the open-source gem was straightforward. The way we structured the visitor pattern made it easy for us to adapt our internal changes.
00:31:58.560 Thank you all for being here today! I'm Andrew on Twitter, and my slides are available on my website. If you're looking for new opportunities, consider hiring us or using our services!
00:32:54.500 It's a pretty cool company. Thank you, everyone!