RailsConf 2022

Call me back, Postgres

by Ifat Ribon

The video titled "Call me back, Postgres" features Ifat Ribon discussing the use of PostgreSQL's trigger functionality as an alternative to Rails' Active Record callbacks for managing data synchronization processes.

Key Points Discussed:

  • Introduction to Postgres Triggers: The talk starts with an overview of PostgreSQL's native functionalities, particularly focusing on triggers and the listen/notify mechanism, which serves as a pub-sub system for database changes.
  • Case Study Background: Ifat shares her experience working on a project for a large international client using Salesforce, which required migrating data from a MySQL database to PostgreSQL.
  • Integration with Salesforce: The project used Heroku Connect as a data sync layer. Because the legacy PHP application committed directly to the database, Active Record callbacks could never observe its changes.
  • Approaches Explored:
    • Rails Callbacks: Although they provide a user-friendly interface for syncing data, they proved ineffective since the legacy app operated outside the Rails API.
    • Polling Scheme: Running cron jobs that continuously query the database for changes would work in principle, but it was inefficient, error-prone to hand-code, and expensive against large tables.
  • Postgres Triggers and Listen/Notify: Triggers listen for changes at the database level, so writes from applications that never pass through Rails are no longer missed. A PostgreSQL trigger executes a SQL function in response to data changes (insert, update, delete) and can emit notifications that are then handled in Ruby.
  • Implementation Details: Ifat provides an overview of how to define PostgreSQL functions and triggers, explaining the nuances of writing SQL, the importance of maintaining database connections, and how to handle potential issues like continuous loops caused by trigger activations.
  • Considerations for Data Syncing: Several best practices are highlighted, including logging changes to ensure nothing is missed, maintaining separate connectivity for listeners, and ensuring the reliability of processes even amid connection interruptions. She emphasizes the importance of tests to verify attribute mapping and integration flows throughout the data sync process.

Conclusion:

The video concludes by stressing the utility of PostgreSQL's trigger functionality as a robust solution for implementing data synchronization without overly relying on Rails Active Record callbacks. This approach improves the flexibility and scalability of systems that require intricate data relationships across various applications.

00:00:12.179 Hello everyone, welcome to the talk titled 'Call Me Back, Postgres.' Today, we're going to discuss some native functions that Postgres offers in terms of triggers and their listen/notify capabilities.
00:00:23.400 A little bit about me: my name is Ifat Ribon. I work for a digital product agency based in Chicago, where I also live. I am a principal architect there, and we are hiring, so if you are interested, let me know later. I enjoy running and have played lacrosse for most of my life, participating in many international tournaments.
00:00:50.340 You can find my social media and developer content at the bottom of the screen. With that introduction, this talk will take the form of a case study. As an agency, we work on many client projects. One project in particular led me down the path to learn about Postgres functions and how to use them for a data synchronization service.
00:01:09.720 I'll provide some background to clarify what I'm discussing, including the different sync approaches I explored. We'll then dive into some core concepts I've been hinting at and wrap up with a system design overview. I will also share some resources at the end.
00:01:27.479 To kick things off, the client project I was working on involved a Salesforce integration. We frequently work on Salesforce integration projects, and this particular client was hosting on Heroku. Don’t worry if you aren’t familiar with those tools; I will provide some background on them.
00:01:44.220 This client is a large international company that has been around for a while. They approached us for several engagements, one of which was to digitize some paper products they had and consolidate their data. They had a core legacy app, a PHP application with a MySQL database. Therefore, one aspect of our project was migrating from MySQL to Postgres.
00:02:09.119 Another aspect involved working with their existing Salesforce instance, which they wanted to improve with our assistance. They also used a package on Heroku called Heroku Connect, which essentially acts as a data sync layer. It's an add-on that automatically creates a schema in your database under the namespace of Salesforce.
00:02:22.260 As many of you may know, a Rails app's tables live in the public schema by default; Heroku Connect instantiates a separate salesforce schema and creates migrations for the objects you want to sync. This is a high-level overview, but keep it in mind, as I will come back to it.
00:02:45.660 The two main players in this project, once my team was engaged, were the core Postgres database that we had migrated from MySQL, as well as a Rails API we built around that database. This API enabled other integrations to work seamlessly with the data.
00:03:05.099 So what did all the integrations look like? As I mentioned, we've got Salesforce, where they manage customer interactions and products. Heroku Connect provides a syncing layer for the Postgres database, while we developed various applications that interact with the Rails API.
00:03:10.500 Keep this diagram in mind, as we will revisit it later to see how the architecture expands. When we engaged with this project, some of you may think that since Heroku Connect already provides a sync layer, our work was done. However, not so fast.
00:03:39.120 The core database I referred to contained a lot of legacy data that didn't exist in Salesforce. Some of this data needed to be synced to Salesforce while some did not. As a result, Heroku Connect wasn’t going to cover all the cases. Additionally, the legacy app I mentioned—the PHP one—was a direct consumer of the database and did not interact with our Rails API at all. This is crucial to understand as I discuss the approaches I explored.
00:04:22.019 The first approach I tried involved using Rails callbacks to understand if any data was changing and then propagate those changes. Rails callbacks provide an effective DSL for triggering other services or logic when something happens in the database.
00:04:43.800 Common callbacks you might recognize include after_create (triggering actions after a record is created), after_update, after_destroy, and several others. This flexibility allows you to choose to trigger actions before or after the actual transaction or commit.
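For instance, a model hooking sync logic into a commit callback might look like this (a minimal sketch; SyncToSalesforceJob is a hypothetical job):

```ruby
class User < ApplicationRecord
  # Fires only when the write goes through Active Record -- direct
  # database commits (e.g. from the PHP app) never reach this code.
  after_create_commit :enqueue_sync

  private

  def enqueue_sync
    SyncToSalesforceJob.perform_later(id) # hypothetical job
  end
end
```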
00:05:09.120 As I mentioned, I’m working with a Rails API, so I initially wanted to implement the syncing layer in Ruby and Rails. However, I quickly realized that this approach was problematic because the PHP app as a direct database consumer did not go through our Rails API nor Active Record, meaning these callbacks were never triggered.
00:05:34.020 Furthermore, Heroku Connect operates similarly, making direct commits to the database without going through Active Record or Rails logic. Consequently, since my Rails callbacks weren't hit, I needed a new way to detect changes in the database, especially for instance when a user was created in Salesforce.
00:06:02.759 The next approach I experimented with involved creating a polling mechanism. What I mean by polling, at a high level, is setting up a cron job that checks the database periodically for changes using various Ruby gems such as the clock gem.
00:06:22.139 This approach involved writing scripts to check the database at intervals to see if anything had been created, using timestamps, counts, or unique indexes to track changes. However, this method seemed dangerous and unreliable. I realized it would require significant resources and could prove costly, as I didn’t want to query a user table with millions of records just to check for new inserts.
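A sketch of what that polling looked like conceptually, using the clockwork gem (job and model names are illustrative):

```ruby
require "clockwork"
require_relative "config/environment" # boot Rails so models are available

module Clockwork
  every(5.minutes, "poll.users") do
    # Scan for recently changed rows -- expensive on large tables,
    # which is exactly why this approach was abandoned.
    User.where("updated_at > ?", 5.minutes.ago).find_each do |user|
      SyncToSalesforceJob.perform_later(user.id) # hypothetical job
    end
  end
end
```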
00:07:04.680 While refining my polling infrastructure, I came to the conclusion that I needed both a listener for database changes and an efficient notification system, akin to a pub/sub mechanism. With some assistance from my teammates, we discovered Postgres triggers with listen/notify functionality. These are database-level callbacks that work beautifully for tracking changes.
00:07:53.639 It turns out that Heroku Connect also utilizes this mechanism under the hood, leveraging Salesforce's streaming API and creating triggers for listening to changes. Because these triggers operate at the database level, they can respond to any consumer interacting with the database, regardless of whether they are using Rails or not. This was exactly what I was looking for.
00:08:47.760 Using triggers meant I could dictate exactly when something in the database changes without needing to poll for information. It would also decouple the database change notifications from the sync logic, allowing all synchronization processes to remain in Ruby and Rails.
00:09:32.760 Generally speaking, implementing triggers requires writing raw SQL, although there are gems available to help streamline this, such as HairTrigger, which offers a Ruby DSL for using triggers. I attempted to use that gem, but it wasn’t flexible enough for my needs, so I ultimately chose to avoid it.
00:09:46.320 So, what do these triggers look like? Postgres triggers function similarly to Rails callbacks, but they operate directly within the database. You define a function in SQL that executes around specific operations, such as inserts, updates, or deletes, on a designated table.
00:10:50.640 When a trigger fires and its function executes, you receive metadata about the operation: the schema involved, the affected table, and the new and old record states. This is particularly useful for determining what changed during updates or deletions.
00:11:27.180 Creating a function involves defining its name and the logic it runs; I recommend keeping function names consistent for better organization. Inside the function body, the TG_OP variable tells you which operation fired the trigger (INSERT, UPDATE, or DELETE), so you can branch with conditionals as needed.
00:11:51.420 This simple example illustrates that a function can execute arbitrary database operations. For instance, if a user is deleted, you might want to log the record's ID for audit purposes, preserving historical data. Conversely, if the record is created or updated, you can use the NEW variable to access the updated values.
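A minimal sketch of such a function, run from Ruby over a raw connection (the deleted_records audit table is an assumption for illustration):

```ruby
ActiveRecord::Base.connection.execute(<<~SQL)
  CREATE OR REPLACE FUNCTION log_user_change() RETURNS trigger AS $$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      -- Preserve the deleted record's id for auditing.
      INSERT INTO deleted_records (table_name, record_id, deleted_at)
      VALUES (TG_TABLE_NAME, OLD.id, now());
      RETURN OLD;
    ELSE
      -- On INSERT or UPDATE, NEW carries the current column values.
      RETURN NEW;
    END IF;
  END;
  $$ LANGUAGE plpgsql;
SQL
```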
00:12:21.600 Now let's talk about the trigger itself. To apply a trigger, you give it a name, specify the table it fires on and which operations (insert, update, delete) activate it, and point it at the function to execute.
00:12:58.020 You can choose whether the trigger executes for each row or for each statement executed on the table. In this example, I focused on individual records, which is why I chose the 'for each row' option.
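Attaching the function from the previous sketch to the users table might look like this (PostgreSQL 11+ syntax; older versions use EXECUTE PROCEDURE instead of EXECUTE FUNCTION):

```ruby
ActiveRecord::Base.connection.execute(<<~SQL)
  CREATE TRIGGER users_change_trigger
  AFTER INSERT OR UPDATE OR DELETE ON users
  FOR EACH ROW                        -- fire once per affected row,
  EXECUTE FUNCTION log_user_change(); -- not once per statement
SQL
```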
00:13:38.040 Next, let's discuss the listen/notify mechanism in Postgres, which operates like a pub/sub framework. Essentially, Postgres has a 'notify' function that can broadcast notifications to a specified channel, which you define along with a payload.
00:14:04.500 On the other hand, the 'listen' keyword allows your application to listen for notifications on that channel. You'll indicate that you are listening for messages from this particular notification channel, which then allows you to trigger actions in your application when notifications are received.
00:14:52.920 Here's a brief example: the pg_notify function takes a channel name and a string payload, letting you broadcast messages. On the listening side, the listener detects incoming messages and performs designated actions based on the payload.
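As a concrete sketch, the broadcast side is a single function call (the channel name is illustrative):

```ruby
# pg_notify(channel, payload) -- the payload must be a string.
ActiveRecord::Base.connection.execute(
  "SELECT pg_notify('record_changes', 'users:42')"
)
```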
00:15:49.680 To implement this in Ruby, you open a direct connection to Postgres. Many of you may have done this in other areas of an application, but there are caveats to be aware of: the listener runs in an infinite loop, which requires careful management to prevent unintended consequences.
00:16:27.120 The listener must operate within a connection block, which provides access to the notification metadata. That connection needs to be sturdy: if it times out or encounters an error, the listener must be properly disconnected and re-established to maintain consistent communication with the database.
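A minimal listener sketch using the pg gem's wait_for_notify (the channel name and the 60-second timeout are assumptions; production code would also handle reconnection):

```ruby
# Check out a dedicated connection so the blocking loop doesn't
# starve the rest of the app's connection pool.
conn = ActiveRecord::Base.connection_pool.checkout.raw_connection
begin
  conn.async_exec("LISTEN record_changes")
  loop do
    # Blocks for up to 60 seconds; yields when a NOTIFY arrives.
    conn.wait_for_notify(60) do |channel, _pid, payload|
      puts "#{channel}: #{payload}" # hand off to sync logic here
    end
  end
ensure
  conn.async_exec("UNLISTEN *")
end
```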
00:17:20.040 When we set this up, you'll want to establish your payload notifications logically. In this example, I set up the listener process on the right and create a user in the Rails console on the left side. When the user is created, a notification broadcast is triggered, which can be viewed in real-time.
00:18:06.840 Putting all this together, consider how you might implement the 'notify' function within the trigger function. While writing your triggers in SQL, you can access all the relevant metadata that Postgres provides, making it straightforward to handle the data appropriately.
00:18:35.040 Using string interpolation, you can construct payloads dynamically while keeping track of relevant identifiers, such as table names and IDs, ensuring efficient tracking of updates through your application.
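Putting the two together, the trigger function can build a JSON payload from that metadata (a sketch using json_build_object rather than manual string interpolation; NOTIFY payloads are capped at about 8 KB, so send identifiers rather than whole records):

```ruby
ActiveRecord::Base.connection.execute(<<~SQL)
  CREATE OR REPLACE FUNCTION notify_record_change() RETURNS trigger AS $$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      PERFORM pg_notify('record_changes', json_build_object(
        'table', TG_TABLE_NAME, 'operation', TG_OP, 'id', OLD.id)::text);
      RETURN OLD;
    ELSE
      PERFORM pg_notify('record_changes', json_build_object(
        'table', TG_TABLE_NAME, 'operation', TG_OP, 'id', NEW.id)::text);
      RETURN NEW;
    END IF;
  END;
  $$ LANGUAGE plpgsql;
SQL
```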
00:19:20.040 To implement these configurations in your database, you can utilize migrations within Rails. However, writing raw SQL in migrations can pose challenges for those not familiar with SQL syntax, so ensure you familiarize yourself with these concepts before diving in.
00:19:54.840 The migration process should include functionality for both adding and removing triggers. It is common to separate the trigger creation from the underlying function for clarity; you can handle this with straightforward SQL commands to set everything up efficiently.
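A sketch of such a migration, assuming the notify_record_change() function from the earlier sketch was created in a previous migration:

```ruby
class CreateUsersChangeTrigger < ActiveRecord::Migration[7.0]
  def up
    # Drop-and-recreate keeps the migration safe to re-run.
    execute "DROP TRIGGER IF EXISTS users_change_trigger ON users"
    execute <<~SQL
      CREATE TRIGGER users_change_trigger
      AFTER INSERT OR UPDATE OR DELETE ON users
      FOR EACH ROW EXECUTE FUNCTION notify_record_change();
    SQL
  end

  def down
    execute "DROP TRIGGER IF EXISTS users_change_trigger ON users"
  end
end
```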
00:20:53.700 I want to reiterate that triggers won't appear in schema.rb. If you switch the schema format to SQL, however, the generated structure.sql captures the raw SQL from all migrations, including triggers and functions, which improves your feedback loop when applying them.
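That switch is a one-line change in config/application.rb:

```ruby
# Dump db/structure.sql instead of db/schema.rb so that triggers and
# functions created in migrations are captured in version control.
config.active_record.schema_format = :sql
```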
00:21:22.320 Returning to our earlier diagram, here is the data flow in our architecture: several applications interact with the Rails API and the Postgres database, and each one's data must stay in sync. Maintaining connections and logging actions are essential parts of that.
00:22:22.740 While building this data synchronization layer, consider maintaining a separate connection for listener services, which lets you scale your configuration if needed. This means that if high traffic occurs, your workers won't interfere with your listener functionality.
00:23:09.740 Another key aspect to consider is logging. By keeping a log of trigger notifications, you can track any that were missed and retry them. For instance, you could implement a simple model called DatabaseSyncLog that records each notification payload and its sync status.
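A sketch of that logging idea (the model name comes from the talk; the columns and SyncJob are assumptions):

```ruby
# Assumed columns: payload (jsonb), status (string), synced_at (datetime).
class DatabaseSyncLog < ApplicationRecord
  scope :pending, -> { where(status: "pending") }
end

# In the listener callback: persist first, then process, so a crash
# between the two steps leaves a pending row that can be retried.
def handle_notification(payload)
  log = DatabaseSyncLog.create!(payload: JSON.parse(payload), status: "pending")
  SyncJob.perform_later(log.id) # hypothetical job that marks the row synced or failed
end
```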
00:23:56.340 However, be cautious about triggers firing in an infinite loop, particularly with sync processes. Develop a mechanism to disable triggers when the sync process itself writes to the table, so a sync-originated update doesn't fire the trigger and start another cycle.
00:24:34.140 Keep in mind, any process that modifies the table fires its triggers again. To bypass this, create a method like 'commit_without_trigger' that temporarily disables the trigger while executing your updates, avoiding unwanted loops.
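One possible shape for that method (a sketch; ALTER TABLE ... DISABLE TRIGGER requires table-owner privileges, takes a lock, and affects all sessions until the transaction commits):

```ruby
def commit_without_trigger(table, trigger)
  conn = ActiveRecord::Base.connection
  conn.transaction do
    # ALTER TABLE is transactional in Postgres: if the block raises,
    # the rollback re-enables the trigger automatically.
    conn.execute("ALTER TABLE #{table} DISABLE TRIGGER #{trigger}")
    yield
    conn.execute("ALTER TABLE #{table} ENABLE TRIGGER #{trigger}")
  end
end

# Usage: apply a sync-originated update without re-firing the trigger.
commit_without_trigger("users", "users_change_trigger") do
  user.update!(salesforce_id: sf_id)
end
```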
00:25:56.340 Finally, as I conclude, it’s essential to identify dependencies and potential conflicts regarding data syncing processes. Always account for potential race conditions and retry behaviors during attempts to sync your data and ensure system integrity.
00:26:42.019 Testing your setup of triggers and syncing processes is crucial. Consider leveraging unit tests to confirm attribute mapping, as well as writing integration tests that initiate actions from the very start and verify that data flows correctly through the system.
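An integration-style test of the trigger itself might look like this (RSpec assumed; it must run outside a wrapping database transaction, since NOTIFY is only delivered on commit):

```ruby
RSpec.describe "users change trigger" do
  it "broadcasts a notification when a user is inserted" do
    conn = ActiveRecord::Base.connection.raw_connection
    conn.async_exec("LISTEN record_changes")

    User.create!(email: "test@example.com") # must actually commit

    payload = nil
    conn.wait_for_notify(2) { |_channel, _pid, p| payload = p }
    expect(JSON.parse(payload)).to include("table" => "users")
  end
end
```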
00:27:29.899 I have compiled several resources that could be helpful. Thank you all for joining me today. If anyone has questions, I believe we have about 30 seconds to address them.
00:28:03.960 Yes, the question was whether I had to use a significant amount of raw SQL while developing this data synchronization layer. I primarily needed it to write the functions, while various gems helped streamline the process. Nevertheless, the main sync logic, which I didn't discuss, was all written in Ruby.