Paris.rb Conf 2020

Kiba ETL: Feedback on OSS Open Core Sustainability for a Ruby Gem

00:00:16.110 Hello, thank you for your patience. I'm Thibaut from a place in France you may have never heard of. First, I want to thank the organizers for the rich diversity we have at this event. Even though I'm just a gray-haired, 40-something white Caucasian male, I appreciate their effort, and I think they deserve applause.
00:00:30.580 Today, I want to talk about sustainability in the context of a Ruby project. I'm an independent consultant, and five years ago I started a project called Kiba, which is now at version 3. I'll explain more about it shortly. That's pretty much the background.
00:01:03.219 You may ask, what is ETL exactly? It's a kind of arcane word that stands for Extract, Transform, Load. This is an old term used to describe a form of data pipeline. Typically, you have a structured process with a source responsible for extracting the data. The data then flows through a transformation process that you create to adapt it to your requirements before reaching the destination.
00:01:30.000 Kiba, the project I am presenting here, embodies this concept. Being influenced by Ruby, I gave it a Domain-Specific Language (DSL) with its own syntax. I want to share this approach and show you how you can write data pipelines too.
00:01:56.000 With Kiba, you have a simple syntax to describe what you see in a data pipeline: the source, transformation, and destination. This is all done using plain Ruby objects, which you can test separately.
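A minimal sketch of that source, transform, destination shape, written in plain Ruby without the Kiba gem itself. The class names and the `run_pipeline` helper are hypothetical stand-ins for illustration, not Kiba's API; only the method names `each`, `process`, and `write` follow the conventions Kiba documents for its components.

```ruby
# Source: any object whose #each yields rows one at a time.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield row }
  end
end

# Transform: #process receives a row and returns the row to pass on.
# Returning nil drops the row.
class ParseAmount
  def process(row)
    return nil if row[:amount].to_s.strip.empty?

    row.merge(amount: Float(row[:amount]))
  end
end

# Destination: #write receives each row that survived the transforms.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end
end

# Hypothetical stand-in for the job runner: pushes every source row
# through the transforms in order, then hands it to the destination.
def run_pipeline(source:, transforms:, destination:)
  source.each do |row|
    result = transforms.reduce(row) do |current, transform|
      break nil if current.nil? # an earlier transform dropped the row

      transform.process(current)
    end
    destination.write(result) unless result.nil?
  end
  destination
end

output = run_pipeline(
  source: ArraySource.new([{ name: "a", amount: "12.50" }, { name: "b", amount: "" }]),
  transforms: [ParseAmount.new],
  destination: ArrayDestination.new
)
```

Because each piece is an ordinary object, it can be tested on its own, for example by calling `ParseAmount.new.process(...)` directly with a sample row.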
00:02:11.920 Sustainability, in this context, means being able to maintain something over the long run. You may need motivation, funding, or other forms of support, as there are many different situations. My first piece of feedback is to avoid thinking in place of others. Your situation or current self has specific needs that you need to address.
00:02:37.000 Your circumstances may vary, so follow your context to determine what is sustainable. I have two kids and a family to support, and I work remotely. Balancing these responsibilities is important. Don't compare your progress to others, especially in open-source projects, where you may see people producing features at a rapid pace.
00:03:01.120 Becoming overwhelmed can lead to burnout, which is why I emphasize sustainability. I often think back to when my parents gave me a computer with a mere 48 kilobytes of RAM. I had a lot of time to learn and explore, which was essential for my development.
00:03:30.000 Fast forward ten years, and that space to learn paid off when I was able to implement real-time 3D graphics without a Floating Point Unit. This experience emphasized the significance of having quality time and support during your journey.
00:03:55.500 I must thank my parents for their support, even if my dad would casually question whether my skills would bring in income. Ultimately, they did. It's about securing funding and support to foster innovation, which can lead to substantial developments over time.
00:04:23.500 Ten years later, I was implementing data warehouses using Ruby for business intelligence, moving data around in a fresh manner compared to the commercial tools available. I stumbled upon a gem called ActiveWarehouse-ETL, which was open-source and featured a declarative DSL for defining data pipelines.
00:04:58.000 ActiveWarehouse-ETL had loads of features for manipulating data and working with databases. It was great, but less pluggable than Kiba is today. In this gem, you could declare sources, like CSV files or databases, gathering all your data in one place.
00:05:23.180 I used it on many projects and was very happy with its capabilities. However, at some point, the author stopped maintenance because he had no need for it anymore. I thought it was a shame to let it go to waste, so I took over its maintenance.
00:05:46.790 This transition came with a lot of challenges, including continuous integration issues, feature requests, and managing emails. Eventually, I realized there was an imbalance between the value I derived from the tool and the support cost associated with it.
00:06:14.000 As an indie developer, the cost of maintaining OSS without compensation was significant. I felt that the effort required to keep it alive wasn't sustainable and there wasn't enough applicability to justify my efforts.
00:06:41.700 In dealing with the pressures of burnout, I also learned about the sustainability issues many creators face. For example, how many of you have heard of Sonic Pi, the music environment in Ruby? The author shared yesterday on Twitter that he had experienced burnout.
00:07:14.000 The reality is, if you don't have money or support, you might find yourself in a vulnerable position. You can't help others if you’re in a difficult situation yourself, so it’s essential to think about your well-being.
00:07:40.000 Despite these challenges, my passion for Ruby persisted. I took time to contemplate and made three decisions that would guide my future in open-source projects.
00:08:01.000 First, I realized that we all tend to want many features and to connect to every database in the world, but I had to focus and reduce the scope. A reduced scope meant fewer continuous integration problems, emails, pull requests, and bug triage, all of which reduced the support cost.
00:08:41.000 The second decision was to increase the tool's applicability by making it more generic. This means being open to various databases and not tying the tool too strongly to specific technologies.
00:09:22.000 The third decision was about time boxing my activities. Time boxing constrains the amount of time I allocate to a specific task, like learning or developing. This practice helps keep me focused and prevents burnout.
00:09:49.000 Using time tracking can also offer insights into how much time you devote to open-source work, which is often overlooked.
00:10:01.000 With those three decisions and around 30 hours of coding, I was able to publish the first version of my library five years ago under the LGPL license. I envisioned it as a solution that I could employ in various scenarios.
00:10:41.000 I often think of Sidekiq as the example that led the way here. I wanted to have a pro version that would at least implement the features that are too costly for an open-source project to bear.
00:11:26.000 I wanted to ensure that the software I developed was not limited or crippled while maintaining a fully open-source foundation. This internal battle confirmed that I should not shy away from monetization, which is often not accepted in the open-source community.
00:12:21.000 Eventually, I developed a solution that met both my consulting needs and allowed me to use the library internally. This balanced approach helped me gain valuable insights and innovate on the project.
00:12:46.000 For instance, I've used Kiba to extract data from large enterprise systems and establish real-time synchronization with Rails applications. This integration is especially appealing since enterprise-level data management can be quite cumbersome.
00:13:25.000 Another application of Kiba lies in aggregating different data systems from numerous vendors. By retrieving data through various means, like SMTP emails or S3 storage, I can seamlessly transform these formats into a common structure for my Rails application database.
00:13:52.000 Additionally, Kiba proves beneficial for database migrations, where, for example, a client wishes to transition from a legacy PHP/MySQL platform to a more modern Rails/PostgreSQL setup.
00:14:28.000 In summary, I realized that with my three guiding principles, the support costs for open-source became manageable, and I could apply my work across more scenarios without risking burnout.
00:14:50.000 These principles, alongside a sustainable funding model and boundaries regarding my time commitment, have helped shape my journey in open source.
00:15:12.000 It's crucial to maintain a balance between innovation in open source and managing personal life responsibilities. Finding enjoyment outside of work is essential for long-term sustainability.
00:15:31.000 I encourage everyone involved in open-source communities to ensure they reserve quality time for their contributions rather than relegating maintenance to weekends or off hours. This should not be a second-class endeavor.
00:15:56.000 Addressing the intellectual property aspect of this journey, I recognized the need to navigate contractual obligations carefully. In consulting, I often faced exclusivity clauses, which placed limits on my ability to innovate further.
00:16:26.000 I collaborated with a lawyer to develop non-exclusive rights for the components I created, allowing my clients to use them while retaining the flexibility to innovate on my end.
00:16:53.000 Thanks to this framework, I can now utilize both open-source and premium versions of my work without compromising on my responsibilities. I encourage others to find a balance that works for them as well.
00:17:26.000 Through these efforts, I've enhanced the open-source version, improved the documentation, and started a low-maintenance sister project called Kiba Common, which makes the library easier to use.
00:17:58.000 Currently, I’m focusing on launching a pro version that offers fast SQL bulk insertion and lookup, allowing efficient data synchronization without tedious row-by-row operations.
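To illustrate the difference between row-by-row and bulk insertion, here is a rough sketch of a destination that buffers rows and builds one multi-row `INSERT` per batch. It is not the pro version's implementation or API: the `BulkInsertDestination` class, its parameters, and the `statements` array (which stands in for a real database connection) are all assumptions made for the example.

```ruby
# Hypothetical destination: collects rows and emits one multi-row
# INSERT statement per batch instead of one statement per row.
class BulkInsertDestination
  attr_reader :statements

  def initialize(table:, columns:, batch_size: 1000)
    @table = table
    @columns = columns
    @batch_size = batch_size
    @buffer = []
    @statements = [] # stand-in for a database connection
  end

  # Called once per row; only triggers SQL when the batch is full.
  def write(row)
    @buffer << row
    flush if @buffer.size >= @batch_size
  end

  # Called at the end of the run to send the last, partial batch.
  def close
    flush
  end

  private

  def flush
    return if @buffer.empty?

    # One "(?, ?)" group per buffered row, with values bound separately.
    row_placeholder = "(" + (["?"] * @columns.size).join(", ") + ")"
    placeholders = ([row_placeholder] * @buffer.size).join(", ")
    sql = "INSERT INTO #{@table} (#{@columns.join(', ')}) VALUES #{placeholders}"
    binds = @buffer.flat_map { |row| @columns.map { |column| row.fetch(column) } }

    @statements << [sql, binds]
    @buffer.clear
  end
end

destination = BulkInsertDestination.new(table: "orders", columns: [:id, :total], batch_size: 2)
3.times { |i| destination.write({ id: i, total: i * 10 }) }
destination.close
```

With three rows and a batch size of two, this produces two statements rather than three, and the saving grows with the batch size.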
00:18:48.000 This ongoing evolution has led me to a state of sustainability, where I can balance my passion for open-source development with personal stability.
00:19:07.000 Thank you.
00:19:58.000 Audience question.
00:20:18.000 What do you consider offensive support?
00:20:21.000 Maintaining a community or simply answering questions?
00:20:39.000 Some people tell me, "I want this feature," and I ask them what is actually taking their time.
00:21:01.000 Open support means thinking about how to ensure that people won't need me, effectively removing myself from the equation as much as possible.
00:21:44.000 For example, I redirect basic questions to Stack Overflow with a specific tag, allowing me to respond when necessary and create a knowledge-sharing environment.
00:22:18.000 Most of my time is spent addressing inquiries that could lead to paid consulting opportunities, where I can provide comprehensive answers or point users to valuable resources.
00:22:55.000 I constantly work to keep the project’s scope aggressively reduced, acknowledging that users may request features that are not essential.
00:23:24.000 For instance, people may suggest adding a component registry, but I encourage external features that can be developed without modifying the core.
00:23:30.000 These interactions reflect my main goal: to validate the necessity of any new feature.
00:23:51.000 Regarding non-exclusive rights in clients' agreements, the focus is on balancing their demands and the services I provide.
00:24:02.000 I’ve been fortunate in being able to negotiate terms that assure clients of the dedication I bring to the table while allowing me the freedom to innovate.
00:24:35.000 While my agreements cover all code, the key factor remains the potential for reusable components, which are crucial for ongoing development.
00:24:58.000 Thank you for your time.