00:00:23.630
All right, I'm Steve Hackley, and I'm here to talk to you today about keeping data and integrations in sync.
00:00:26.640
What I'm specifically discussing is integrated environments. When we've got a production platform integrated with other applications, the challenge arises when we want to take a snapshot of that database and bring it down to our staging or development environments while keeping some amount of table data intact from the prior database after the production restore.
00:00:50.910
We all wish we could have production-like data in our development environments. Unfortunately, sometimes we lack the necessary data, forcing us to go into production and rely on tools like the Rails console to fetch it. So it is worth asking why we need this data.
00:01:03.989
Integrated systems can be hard to debug. As developers, we want production data in our environments to help us debug. The challenge is that when we pull production data down, the pointers, such as foreign keys to other systems, stop working because they are environment-specific.
00:01:26.640
So, what I propose is to have an external database that holds these pointers as key-value pairs, linking to third-party databases. This way, we can take a production snapshot, bring it down, and sync those pointers back to re-establish the connections, allowing the production cut to work correctly.
00:01:47.219
A modern integrated system updates various platforms: we generate keys in our payment processing system, content management systems, and e-commerce platforms. However, when we pull production data, we risk losing these pointers and integrations, which can lead to broken connections.
00:02:06.180
So, why would we want production data in our staging or development environments? When building new functionality, having historical data helps in creating new reports, workflows, and fixing issues with a myriad of edge cases. Production systems create massive amounts of data that can significantly benefit developers.
00:02:36.990
However, when we take a production snapshot and lay it down, those pointers generally become invalid. Our proposed solution is essentially a key-value sync database. This data store operates outside of your existing Rails application, where we can sync production values with customer IDs and other data back into staging and development.
00:02:56.130
This integration can be used beyond just refreshing foreign keys or pointers; it can also scrub data, sync passwords, and serve as middleware for real-time integrations between production and staging environments.
00:03:15.090
For instance, take an integration with a payment provider such as Stripe, to illustrate what one of these keys might look like. When I created a customer in Stripe, I received a unique identifier for that customer.
00:03:29.600
In our development environment, I may create a customer with a different identifier because each environment operates independently. Thus, we have multiple customer IDs representing the same entity across different environments.
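To make the mismatch concrete, here is a small sketch of one customer as seen from two environments. The email address, the IDs, and the `stripe_customer_id` column name are invented for illustration and do not come from the talk.

```ruby
# Hypothetical rows for the same person in two environments.
# The IDs and the column name stripe_customer_id are made up for this sketch.
production_customer = { email: "pat@example.com", stripe_customer_id: "cus_PROD_123" }
staging_customer    = { email: "pat@example.com", stripe_customer_id: "cus_STAGE_456" }

# The natural key (email) matches, but the pointer into the payment system does not.
same_person  = production_customer[:email] == staging_customer[:email]
same_pointer = production_customer[:stripe_customer_id] == staging_customer[:stripe_customer_id]

puts "same person: #{same_person}, same pointer: #{same_pointer}"
```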
00:03:47.760
If we want to link these IDs, we need to implement a method to connect the different instances. This process starts with a backup of production data followed by exporting the key-value pairs from staging.
00:04:06.730
Currently, we use a Rake task to accomplish this. It exports key-value pairs across multiple systems, and once the export is complete, we restore the production data to our staging setup after scrubbing any sensitive information.
00:04:30.490
This process refreshes our environments, ensuring that our staging remains current and ready for development.
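The refresh order described here (export the staging pointers, restore the production snapshot, write the pointers back) might be sketched as follows. This is only an in-memory illustration: the method names `export_pointers` and `restore_pointers`, the sample rows, and the use of email as the matching key are assumptions, and the scrubbing step is left out.

```ruby
# In-memory stand-ins for the databases; a real refresh would work on database dumps.
# All names here (export_pointers, restore_pointers, :stripe_customer_id) are hypothetical.

# Step 1: before the restore, remember each staging record's pointer, keyed by email.
def export_pointers(rows, column)
  rows.each_with_object({}) { |row, saved| saved[row[:email]] = row[column] }
end

# Step 3: after the restore, write the saved pointers back onto matching records.
def restore_pointers(rows, column, saved)
  rows.map do |row|
    saved.key?(row[:email]) ? row.merge(column => saved[row[:email]]) : row
  end
end

staging = [{ email: "pat@example.com", stripe_customer_id: "cus_STAGE_456" }]
production_snapshot = [
  { email: "pat@example.com", stripe_customer_id: "cus_PROD_123" },
  { email: "new@example.com", stripe_customer_id: "cus_PROD_789" }
]

saved = export_pointers(staging, :stripe_customer_id)

# Step 2: the production snapshot replaces the staging data.
staging = production_snapshot.map(&:dup)

staging = restore_pointers(staging, :stripe_customer_id, saved)
```

In this sketch, a record that never existed in staging keeps its production pointer; a real refresh would need to decide whether to clear or recreate such pointers.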
00:04:49.639
In the key-value sync database, we maintain a connection table that holds unique connections per instance, linking to key-value entries that represent the information we want to track.
00:05:03.090
In this model, we define each key-value pair, including table names, key columns, and unique identifiers, allowing us to create a functional mapping of data across environments.
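One possible shape for these two records, written as plain Ruby structs rather than database tables. The field names are guesses based on the description (table name, key column, identifier) and are not the actual schema.

```ruby
# Hypothetical field names, loosely following the description above.
# Connection: one row per environment or instance.
Connection = Struct.new(:id, :name, keyword_init: true)

# KeyValue: one pointer, tied to a connection, a table, and a lookup key.
KeyValue = Struct.new(:connection_id, :table_name, :key_column, :key_value,
                      :value_column, :value, keyword_init: true)

staging = Connection.new(id: 1, name: "staging")

entry = KeyValue.new(
  connection_id: staging.id,
  table_name:    "customers",
  key_column:    "email",
  key_value:     "pat@example.com",
  value_column:  "stripe_customer_id",
  value:         "cus_STAGE_456"
)

puts entry.to_h
```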
00:05:21.170
One challenge we face is connecting to an external database. Using Active Record, we can establish an alternative connection, enabling us to interact with a second data source.
00:05:40.370
We’ve created a base class called KeyValueSyncBase that helps establish connections to external data sources easily. Other classes can inherit from this to connect to various databases without affecting our primary Rails database.
00:05:55.690
When calling this method, it can either connect to the existing Rails environment or accept a source database name to establish connections for specific tasks.
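The choice between the current environment and an explicit source name could look roughly like this. The `connection_config` helper and the database names are hypothetical; in a Rails application the resulting hash is the kind of value that could be handed to Active Record's `establish_connection`.

```ruby
# Hypothetical configuration; in a Rails app this would normally come from
# config/database.yml. The database names are invented for this sketch.
databases = {
  "development"    => { adapter: "postgresql", database: "app_development" },
  "staging"        => { adapter: "postgresql", database: "app_staging" },
  "key_value_sync" => { adapter: "postgresql", database: "key_value_sync" }
}

# Pick a configuration: an explicit source name wins, otherwise fall back to
# the current environment (RAILS_ENV, defaulting to "development").
def connection_config(databases, source = nil, env: ENV.fetch("RAILS_ENV", "development"))
  name = (source || env).to_s
  databases.fetch(name) { raise ArgumentError, "unknown database: #{name}" }
end

sync_config    = connection_config(databases, "key_value_sync")
staging_config = connection_config(databases, env: "staging")
```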
00:06:21.380
Once connected, we can query the specific key-value pairs needed, making it straightforward to fetch relevant data for further operations.
00:06:39.160
For example, we can pull customer data by searching for specific identifiers such as email addresses, enabling efficient access to existing records.
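A lookup by email against the stored pairs might be sketched like this, using an in-memory array in place of the key-value database. The `find_pointer` helper and the sample entries are assumptions made for illustration.

```ruby
# Hypothetical in-memory stand-in for the key-value table.
entries = [
  { connection: "staging",     table_name: "customers", key_value: "pat@example.com", value: "cus_STAGE_456" },
  { connection: "development", table_name: "customers", key_value: "pat@example.com", value: "cus_DEV_789" }
]

# Return the stored pointer for one environment, table, and email,
# or nil when no matching entry exists.
def find_pointer(entries, connection:, table_name:, key_value:)
  match = entries.find do |entry|
    entry[:connection] == connection &&
      entry[:table_name] == table_name &&
      entry[:key_value] == key_value
  end
  match && match[:value]
end

staging_id = find_pointer(entries, connection: "staging", table_name: "customers",
                                   key_value: "pat@example.com")
missing_id = find_pointer(entries, connection: "staging", table_name: "customers",
                                   key_value: "nobody@example.com")
```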
00:06:55.920
This allows us to maintain current data flow even across different instances of these external systems, ensuring that transactions and subscriptions remain synchronized.
00:07:16.200
To summarize the process, we create key-value pairs utilizing the necessary connections and establish how each key is associated with data across various systems.
00:07:30.640
After setting up the connection and the necessary attributes, we can save our key-value pairs into the database, where they can be accessed as needed.
00:07:49.510
Once stored, we can issue queries back to the key-value database during necessary data fetch operations. The integration of this schema ensures that we can quickly pinpoint records as needed without extensive overhead.
00:08:08.100
Furthermore, we can conduct these processes on a scheduled basis, ensuring all integrations are current and beneficial at all times.
00:08:27.070
As a result, we've built an efficient workflow that minimizes the challenges faced while troubleshooting and developing within integrated systems.
00:08:43.220
One prevalent issue is the need to scrub production data before moving it into staging, to ensure we don’t introduce sensitive information. Our implementation regularly encrypts or anonymizes data, removing any unneeded identifiers.
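A scrubbing step of this kind could be sketched as below. The rules are assumptions, not the actual implementation: emails are replaced with a deterministic hashed placeholder so the same input always scrubs to the same value, and other listed columns are blanked. A real setup would likely add a secret salt, since unsalted hashes of email addresses can be guessed.

```ruby
require "digest"

# Hypothetical scrub rule: replace an email with a stable placeholder
# derived from a hash of the lowercased address.
def scrub_email(email)
  digest = Digest::SHA256.hexdigest(email.downcase)[0, 12]
  "user_#{digest}@example.com"
end

# Scrub one row: hash the email, blank the listed sensitive columns,
# and leave every other column untouched.
def scrub_row(row, sensitive_columns)
  row.each_with_object({}) do |(column, value), clean|
    clean[column] = if column == :email
                      scrub_email(value)
                    elsif sensitive_columns.include?(column)
                      nil
                    else
                      value
                    end
  end
end

scrubbed = scrub_row({ email: "Pat@Example.com", phone: "555-0100", plan: "pro" }, [:phone])
```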
00:09:04.360
The syncing process takes about 20 minutes for us, involving around nine fields across three to four million records, ensuring that we can keep pace with our workflows and respective integrations.
00:09:27.210
Through this presentation, I hope to leave you with a better understanding of how maintaining data accuracy and synchronization across integrated systems can lead to more efficient debugging and quicker development cycles.
00:09:43.490
Thank you very much for your attention. I'd be happy to take any questions regarding the process, challenges, or insights on best practices.
00:10:12.390
As for scrubbing production data, we use a Rake task to carry out this process. Predetermined key values are either encrypted or replaced to ensure that no sensitive information carries over into staging environments.
00:10:39.480
Questions regarding data handling practices or the specifics of how our syncing process operates? I appreciate your inquiries.
00:10:51.420
Thank you once again for your engagement and support.