Franck Verrot

Ruby and PostgreSQL, a love story

PostgreSQL 9.1 introduced Foreign Data Wrappers, as an implementation of SQL/MED foreign tables, to provide transparent access to external data. Restricted to being written in C, writing FDWs can be a hard task.

In this talk we will learn just enough of mruby's (the ISO-compliant version of Ruby) internals to understand how one can embed mruby in an external program (like PostgreSQL), and start writing Foreign Data Wrappers in Ruby.

RubyKaigi 2015

00:00:00.000 Thank you for joining me today. I'm going to talk about Ruby and PostgreSQL and how the two can fit together in a project.
00:00:04.680 A little about myself: I’m Franck, as mentioned. I’m French and moved to the Bay Area a month ago for a new job. My Japanese is embarrassing; I can hardly understand anything except the words used in judo and jujitsu, as I hold a brown belt. I'm a senior engineer at Omega Health, a company helping people at risk for chronic diseases by enrolling them in a 16-week program designed to change their behavior. I’d like to thank them for bringing me here. Japan is amazing, and this is my first time being this far east. I’m discovering many things, and living here seems easy.
00:00:22.949 I think getting around here is quite simple. The food is amazing—I had some nice sushi in the Bay Area, but we don't have great sushi there. Here, it's awesome! My hotel room is also nice, and I have a TV, so everything is fine. A quick word about my family: I love my children, and my seven-year-old son has recently completed an Hour of Code contest. He received a certificate, so congratulations to him!
00:01:11.520 Now, let's dive into today's topic. I will start by discussing SQL of course, focusing on Ruby, and just enough about PostgreSQL’s API to help you understand how to integrate foreign languages and foreign technology. I've been using PostgreSQL as a user since 2009. Let's do a quick poll: how many of you use PostgreSQL? Almost everyone—great to see!
00:01:30.820 PostgreSQL is a fully featured relational database. It has a clean yet extensive code base, and it's very active in terms of community. If you're interested in contributing, it's quite easy to get started. When would you prefer to use raw SQL over Active Record or other gems that abstract SQL? Essentially, it's when you need speed. While Active Record could be slow for operations involving billions of records, raw SQL makes those tasks easier.
00:01:51.370 You might also want to utilize raw SQL when managing business logic purely in Ruby isn’t ideal. For instance, if all your data is in the database and the Ruby client retrieves it, you might find it more efficient to implement business logic in another language like Go or Rust without duplicating it in Ruby.
00:02:04.400 I love Ruby and have been using it since 2006. I appreciate its efficiency and the ease it brings to design and usage. The most well-known Domain Specific Languages (DSL) in Ruby encompass a wide array, such as ETL tools like ‘Key Belgium’ that simplify data extraction and transformation. You can define many other DSLs for workflow management, state machines, generating fake data using Ruby on Rails, and much more.
00:02:42.310 For instance, generating fake data in Ruby is straightforward. You would set up a factory to define your data and create new records easily. In SQL, you need a custom script to generate the data or use external tools. You can also use PL/pgSQL for on-the-fly data generation with functions. Furthermore, you can anonymize sensitive data, such as PCI information or protected health information, by duplicating safe content into another table.
00:03:07.680 When managing sensitive data, it is crucial to ensure that the sensitive information does not remain in memory too long. You can also utilize views in SQL, combining views and functions to manage data access and permissions effectively. Creating a materialized view can simplify data retrieval but requires a method for refreshing the stored data, which can be resource-intensive.
00:03:28.520 In cases of mission-critical applications requiring monitoring, be cautious with the usage of SQL, especially when using extensive foreign data access methods. As I explored this, I found the gems written in Ruby but wished to inject them into PostgreSQL to use them directly. To this end, I researched PostgreSQL documentation to find ways to leverage Ruby within it.
00:03:52.540 However, I discovered that many of those links from 2013 were outdated and not functioning anymore. I attempted to find relevant projects in RubyGems but was often faced with insufficiently documented or outdated projects. Fortunately, I encountered indications that integrating Ruby via procedural language handlers and Foreign Data Wrappers was plausible.
00:04:14.060 Among the mechanisms in PostgreSQL for integrating foreign technologies are procedural language handlers and Foreign Data Wrappers. These allow for the embedding of new languages, including Ruby, through defined handlers that facilitate data access methods. The first step is creating a handler—a function where data can be passed, allowing you to integrate any C program.
00:04:35.890 Now, PL/pgSQL can be implemented to create custom index access methods. The custom index access methods are APIs that you can use to define and operate various indexing techniques. For example, if you want a full-text search or another non-traditional data retrieval method, successfully implementing using a specific index type is crucial for efficient data management.
00:05:14.430 Choosing to define your custom index types requires more robust integration and understanding of your database operations. Make sure to select the appropriate methods to utilize powerful indexing techniques to optimize your database performance. For example, some companies integrate Elasticsearch with PostgreSQL to enhance full-text searching and aggregation capabilities.
00:05:52.070 The culmination of these techniques facilitates the creation of Foreign Data Wrappers. Essentially, these allow PostgreSQL to interact with external data sources—not just data within its servers. For instance, you can check emails, make HTTP calls, or interact with NoSQL databases and retrieve data seamlessly. These capabilities are invaluable as they provide a conduit for diverse data management strategies.
00:06:32.490 It’s important to understand how local tables, foreign tables, and foreign data wrappers interact with each other to allow for seamless connectivity. This relationship enables PostgreSQL to perform joining operations across different tables from external sources seamlessly. Various existing foreign data wrappers capable of integration across different databases enhance the flexibility of your data architecture.
00:07:12.440 Each foreign data wrapper comes with its own set of functionalities; thus, understanding their performance during usage is critical. Additionally, predicate pushdowns optimize data retrieval processes by evaluating filter criteria closer to the source data rather than pulling vast amounts of data into PostgreSQL for processing.
00:08:00.940 To maximize performance, correctly utilizing those techniques and understanding their underlying principles will significantly enhance data management operations. Importantly, foreign data wrappers must be fine-tuned to ensure smooth collaboration with database systems. Also, understanding where the bottlenecks may arise allows businesses to strategize and optimize accordingly.
00:08:37.920 As I investigated deeper, I envisioned a general-purpose foreign data wrapper that supports Ruby. It was critical to find an efficient method of embedding Ruby in this context. The concept involves integrating the Ruby VM into PostgreSQL processes, allowing Ruby functions to operate seamlessly within database operations.
00:09:24.470 I have started developing a workable model of this integration. This endeavor, which I call PL-MRuby, though still in the early iterations, shows promise. This framework would allow users to execute Ruby code between specific delimiters while concurrently benefiting from PostgreSQL's robust features to fetch data dynamically.
00:10:15.100 Another option for integrating foreign technology in PostgreSQL involves index access methods. PostgreSQL's strength lies in its efficient handling of index operations, and developers can define custom index types for effective data management. By analyzing your data retrieval methods and creating an index that optimally accesses your data, you can significantly reduce the time needed for those operations.
00:11:06.620 Creating effective indices and understanding when to utilize specific types of indices is essential for enhancing database interaction speeds. The use of appropriate index access methods can dramatically alter the performance of data queries, resulting in responsive database interactions.
00:11:54.500 Through the use of these methods and technologies, PostgreSQL becomes a more powerful and flexible tool for developers integrating various languages, technologies, and methodologies into their data architecture. This versatility reflects the broader trend in database management systems towards fluid integration of diverse data sources.
00:13:12.690 As we progress towards integrating Ruby and PostgreSQL more effectively, these strategies pave the way for a more integrated future. It’s an exciting time, and as these tools develop, we can envision a landscape rich in data accessibility and interconnectivity.
00:14:01.200 To conclude, using Ruby with PostgreSQL opens doors to numerous possibilities for data processing and integration strategies. Today's discussion has provided an overview of how these tools work together, enhancing the potential for your projects and applications. I look forward to seeing how this intersection of technologies evolves in the future. Thank you for your attention.
00:14:58.130 I would like to open the floor to questions.
00:15:00.000 I appreciate your questions and am thrilled to explain how predicate pushdown can enhance foreign data access.