Plugin-based software design with Ruby and RubyGems

http://rubykaigi.org/2015/presentations/frsyuki

Plugin architecture is known as a technique that brings extensibility to a program. Ruby has good language features for plugins. RubyGems.org is an excellent platform for plugin distribution. However, creating a plugin architecture is not as easy as writing code without one: it requires a plugin loader, packaging, a loosely coupled API, and attention to performance. Loading two versions of a gem is an unsolved challenge in Ruby, whereas the equivalent problem has been solved in Java.
I have designed some open-source software such as Fluentd and Embulk. They provide most of their functions through plugins. I will talk about their plugin-based architecture.

RubyKaigi 2015

00:00:01.439 Thank you for coming. I'll present on the topic of plugin-based software design with Ruby and RubyGems. My name is Sadayuki Furuhashi. I was born in Japan and moved to the US four years ago. I currently live in Mountain View, California, and I am a founder of a company called Treasure Data.
00:00:14.080 We provide a cloud-based data management service where you can collect data, store it, and query it. Even if the data is large, such as hundreds of gigabytes per day, you can perform queries smoothly. For data collection, we have open-sourced two projects: Fluentd and Embulk. I designed both of these projects and also created another project called MessagePack, which is a serialization library similar to JSON. However, because it is binary-based, it is faster and more compact. I encourage you to check it out since it is open source.
00:00:47.239 Today's topic is plugin architecture. You're likely familiar with plugins from various applications. For example, browsers have add-ons to extend their functionalities. Eclipse has its own marketplace for plugins, and WordPress has thousands of open-source plugins available for download, allowing you to enhance its capabilities. A plugin architecture allows for similar enhancements in applications. With something like Fluentd, you can add extensions that enhance its features, such as integrating with PostgreSQL. By doing so, you can run queries seamlessly across different databases.
00:01:37.680 Fluentd has over 300 open-source plugins available. The benefits of a plugin architecture are evident, as it allows a wide range of features provided by a community of developers. You can continuously add new features while keeping the core application simple. Since the plugins are isolated, it's easier to test them individually. This leads to a more active developer community, which is crucial for an open-source project. However, the success of this architecture largely depends on its design. If it is not designed well, we may encounter significant downsides. Therefore, today I will touch on how to design a robust plugin architecture.
00:02:23.920 To begin, I'll discuss some design patterns crucial for plugin architecture. There are two main approaches: traditional extensible software architecture and pure plugin-based software architecture. Traditional extensible software typically features a host application that provides extension points where plugins can connect. While this is straightforward, adding more flexibility increases the complexity of the host application. On the other hand, a purely plugin-based architecture consists of a thin core with most functionalities implemented using plugins. This structure simplifies the core, as it does not become more complex with additional features, but it requires a well-thought-out modular design.
00:03:38.920 We can implement plugin-based architectures with two key techniques: dependency injection and dynamic plugin loading. Dependency injection is a well-known programming technique in which a component declares the interfaces it depends on and receives concrete implementations from outside instead of creating them itself. It is popular in Java, where interfaces and classes are clearly separated. By configuring the dependency injection container, you can bind each interface to a different implementation, including one supplied by a plugin. This also makes unit testing effective, because real implementations can be replaced with dummy ones to isolate the functionality under test.
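
A minimal Ruby sketch of the dependency injection idea described above. It uses plain constructor injection rather than a container, and every class name here (`StdoutOutput`, `MemoryOutput`, `Pipeline`) is hypothetical, not taken from Fluentd or Embulk.

```ruby
# Hypothetical names throughout; this is not code from Fluentd or Embulk.

# Real implementation: writes each event to standard output.
class StdoutOutput
  def write(event)
    puts event.inspect
  end
end

# Dummy implementation for tests: records events in memory.
class MemoryOutput
  attr_reader :events

  def initialize
    @events = []
  end

  def write(event)
    @events << event
  end
end

# The core depends only on the `write` interface, not on a concrete class.
class Pipeline
  def initialize(output:)
    @output = output
  end

  def emit(tag, record)
    @output.write({ tag: tag, record: record })
  end
end

# Production code would inject StdoutOutput.new; a test injects the dummy.
dummy = MemoryOutput.new
Pipeline.new(output: dummy).emit("app.access", { "path" => "/" })
```

Because `Pipeline` never names a concrete output class, swapping the implementation requires no change to the core.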
00:04:54.800 Another approach is dynamic plugin loading, in which the core application asks a plugin loader for a plugin by name at runtime and then executes it. This creates a network of plugins that together compose the functionality of the application. Adding a new plugin can enhance existing capabilities without modifying the core. Combining dependency injection with dynamic plugin loading can yield a robust and flexible framework.
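
The lookup-by-name step can be sketched in Ruby as a small registry. This is an illustration under assumed names (`PluginRegistry`, `UpcaseFilter`), not the loader used by Fluentd or Embulk.

```ruby
# Hypothetical registry: maps a (kind, name) pair to a plugin class.
module PluginRegistry
  @plugins = {}

  # Plugins call this when their file is loaded.
  def self.register(kind, name, klass)
    @plugins[[kind, name]] = klass
  end

  # The core calls this with names read from a configuration file.
  def self.create(kind, name)
    klass = @plugins.fetch([kind, name]) do
      raise ArgumentError, "unknown #{kind} plugin: #{name}"
    end
    klass.new
  end
end

# A hypothetical plugin that upcases every string value in a record.
class UpcaseFilter
  def call(record)
    record.transform_values { |v| v.is_a?(String) ? v.upcase : v }
  end
end
PluginRegistry.register(:filter, "upcase", UpcaseFilter)

# The core knows only the kind and the name, never the class itself.
filter = PluginRegistry.create(:filter, "upcase")
result = filter.call({ "msg" => "hello", "n" => 1 })
```

The core only ever handles strings such as `"upcase"`, so new plugins can be added without touching it.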
00:06:01.039 However, as we implement these designs, we also need to consider the trade-offs. Converting an existing monolithic application to a more flexible architecture can be challenging. Dependency injection and dynamic plugin loading are best suited to new developments, while existing applications may still benefit from the traditional extensible architecture, which is often sufficient. Moving on, I'll discuss actual implementations of plugin architectures, specifically Fluentd and Embulk.
00:07:18.200 Fluentd is an event collector designed to aggregate data from various sources and transport it to multiple outputs. It resembles a traditional logging system but is more structured and programmable, allowing the addition of plugins to modify its behavior. Fluentd is written mostly in Ruby, with performance-critical parts implemented in C, and its plugins are written in Ruby. This keeps plugins easy to write while leaving the core fast enough for production use.
00:08:07.040 One problem Fluentd addresses is the need to write a separate script for each input-output combination, which quickly becomes a maze that is hard to manage. With a plugin system, a newly added input can immediately send its data to every existing output without any per-combination script. This configuration flexibility is essential for managing log data efficiently and adapting to user requirements.
00:09:50.080 I will now show an example of a configuration file showcasing how plugins interact within Fluentd. Each type in the configuration file defines the plugin name, such as 'tail' for reading files. The core architecture remains minimal, focusing only on routing events, while other functionalities are executed via plugins. For instance, you can use a meta-plugin called 'copy' to duplicate data and send it to multiple outputs, like Elasticsearch for real-time querying. This modular framework emphasizes flexibility, allowing users to choose appropriate plugins depending on their requirements.
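
The fan-out behavior of a meta-plugin like 'copy' can be sketched in Ruby as follows. The classes `ListOutput` and `CopyOutput` and their `emit` method are hypothetical and do not reproduce Fluentd's actual plugin API.

```ruby
# Hypothetical output plugin that just remembers what it received.
class ListOutput
  attr_reader :received

  def initialize
    @received = []
  end

  def emit(tag, record)
    @received << [tag, record]
  end
end

# Hypothetical meta-plugin: it is itself an output, and it forwards
# every event to each of the outputs it wraps.
class CopyOutput
  def initialize(outputs)
    @outputs = outputs
  end

  def emit(tag, record)
    # Give each output its own shallow copy so one output cannot
    # affect another by adding or removing keys.
    @outputs.each { |out| out.emit(tag, record.dup) }
  end
end

search = ListOutput.new   # stands in for a real-time search store
archive = ListOutput.new  # stands in for long-term storage
copy = CopyOutput.new([search, archive])
copy.emit("web.access", { "status" => 200 })
```

Since the meta-plugin exposes the same interface as any other output, the core can route to it without knowing that it fans out.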
00:12:17.680 Fluentd fits naturally into the RubyGems ecosystem, so users can install plugins as ordinary gems. The core relies on RubyGems to locate and load the required plugins at runtime, which keeps the loading mechanism simple. By reusing existing Ruby tools, developers can add the functionality they need and quickly extend their data processing capabilities. Over 300 plugins are available for Fluentd on RubyGems, which reflects its wide acceptance and utility in the community.
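
One way such runtime loading can work is a naming convention plus `require`, which under RubyGems also searches installed gems. The Ruby sketch below assumes a hypothetical application `HypoApp` and a hypothetical path convention `hypo_app/plugin/out_<type>.rb`, and it simulates an installed plugin gem with a temporary directory on `$LOAD_PATH`. It is not Fluentd's actual loader.

```ruby
require "tmpdir"
require "fileutils"

# Simulate an installed plugin gem: a directory on $LOAD_PATH holding
# a file at the path the (hypothetical) convention expects.
plugin_root = Dir.mktmpdir
at_exit { FileUtils.remove_entry(plugin_root) }
FileUtils.mkdir_p(File.join(plugin_root, "hypo_app", "plugin"))
File.write(File.join(plugin_root, "hypo_app", "plugin", "out_null.rb"), <<~RUBY)
  module HypoApp
    module Plugin
      class OutNull
        def emit(tag, record)
          :discarded
        end
      end
    end
  end
RUBY
$LOAD_PATH.unshift(plugin_root)

module HypoApp
  module Plugin
    # Turn a type name from the configuration into a file path and a
    # class name, load the file, and instantiate the class.
    def self.load_output(type)
      require "hypo_app/plugin/out_#{type}" # raises LoadError if missing
      const_get("Out#{type.capitalize}").new
    end
  end
end

plugin = HypoApp::Plugin.load_output("null")
outcome = plugin.emit("test", {})
```

With this convention, publishing a plugin amounts to shipping a gem that contains a file at the expected path.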
00:13:43.640 Next, I will discuss another project called Embulk, an open-source bulk data loader that supports both Java and Ruby. Embulk is designed for efficient data transfers, particularly for batch operations. Much like Fluentd, it utilizes a plugin architecture, allowing developers to create input and output plugins to handle data seamlessly. Vendors can create their custom plugins, enabling them to adapt to specific use cases without the need for extensive core modifications.
00:16:02.760 When using Embulk, you define a configuration file specifying data input and output types, while the core handles the data processing efficiently. Embulk's flexible plugin system ensures that operations can be performed across multiple data sources, optimizing data loading times. This adaptability is critical for handling large datasets often encountered in real-world applications.
00:17:40.520 Both Fluentd and Embulk illustrate the power of plugin architectures in simplifying complex data movement tasks. With various community-contributed plugins available for both tools, users have the freedom to leverage existing features or create new functionalities tailored to their specific needs. The ongoing evolution of these projects highlights the importance of modular design and encourages active community participation. Now, I will focus on some challenges faced in the plugin architecture to conclude this talk.
00:20:29.760 One challenge is managing library conflicts when multiple plugins depend on different versions of the same library. In such cases, isolating each plugin's dependencies in its own class loader can help, but it doesn't completely eliminate conflicts at runtime. The community continues to share solutions and approaches for navigating these complexities as they arise. Keeping plugins and their dependencies updated is another aspect that requires diligence from both the developers and the users.
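
The conflict can be illustrated in Ruby with the `Gem::Requirement` and `Gem::Version` classes from RubyGems. The plugin names, the version constraints, and the list of available versions are all invented for this sketch.

```ruby
require "rubygems"

# Hypothetical plugins and their constraints on one shared library.
requirements = {
  "plugin-a" => Gem::Requirement.new("~> 1.4"),  # >= 1.4 and < 2.0
  "plugin-b" => Gem::Requirement.new(">= 2.0"),
}

# Hypothetical released versions of that library.
available = %w[1.4.0 1.9.2 2.0.0 2.3.1].map { |v| Gem::Version.new(v) }

# Versions each plugin would accept on its own.
per_plugin = requirements.transform_values do |req|
  available.select { |v| req.satisfied_by?(v) }.map(&:to_s)
end

# Versions acceptable to every plugin at once. Only one version of a
# gem can be active in a Ruby process, so this is what matters.
compatible = available.select do |version|
  requirements.values.all? { |req| req.satisfied_by?(version) }
end

puts "per plugin: #{per_plugin.inspect}"
puts "compatible with all: #{compatible.map(&:to_s).inspect}"
```

Each plugin is satisfiable alone, but no single version satisfies both, so the two plugins cannot be loaded into the same process.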
00:22:32.080 In closing, I encourage everyone to explore plugin architectures in their applications. This design approach can significantly enhance extensibility while allowing for efficient data handling. Please feel free to reach out if you have insights or questions regarding plugin implementations, as I am continuously looking to learn from the community. Thank you very much for your attention.