RubyNation 2017

Summarized using AI

Keeping Data and Integrations in Sync

Steve Hackley • June 16, 2017 • Earth

In Steve Hackley's presentation titled 'Keeping Data and Integrations in Sync,' delivered at RubyNation 2017, he addresses the challenges associated with synchronizing multiple application environments—such as development, staging, and UAT—with production database data while retaining some table data from the previous database state. Hackley emphasizes the importance of having production-like data in development environments to aid in debugging and development processes.

Key points discussed in the presentation include:

  • Integration Challenges: When pulling production data into staging or development environments, critical pointers and foreign keys often become environment-specific and may get lost during the restore process.
  • Proposed Solutions: Hackley proposes utilizing an external database that stores key-value pairs to reconnect these pointers after restoring a production snapshot, maintaining the integrity of integrations (see the sketch after this list).
  • Maintaining Historical Data: Access to production data allows developers to handle historical data effectively, which is beneficial for creating reports, workflows, and troubleshooting edge cases.
  • Working with Payment Systems: He illustrates the challenges of linking customer identifiers across different environments, using the example of integrating with payment providers like Stripe. Each environment generates a distinct customer ID for the same entity, necessitating a method to link them.
  • Implementation Process: Hackley explains their current implementation, which uses Ruby scripts to import and sync key-value pairs across environments, emphasizing the need to scrub sensitive information before moving production data to staging.
  • Data Scrubbing: The process involves ensuring that sensitive information is encrypted or anonymized to prevent exposure in less secure environments.
  • Scheduling Syncs: They establish a routine process for syncing data, which helps maintain data integrity and relevance across environments for effective workflows.
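As a rough illustration of the key-value idea above, a single mapping entry might look like the following Ruby hash. The field names and IDs are assumptions for illustration, not the speaker's actual schema:

```ruby
# Hypothetical mapping entry linking one Stripe customer across
# environments. All names and IDs are placeholders.
{
  table_name: "customers",        # table the pointer lives in
  key_column: "stripe_id",        # column holding the external ID
  record_key: "user@example.com", # stable identifier shared by all environments
  production_value: "cus_9s6XKzkNRiz8i3",
  staging_value:    "cus_4f8YLmwPQjx2k7"
}
```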

In conclusion, Hackley underscores that maintaining data accuracy and synchronizations between integrated systems not only enhances debugging but also accelerates development cycles. He encourages further conversations regarding data handling practices and the syncing process, emphasizing the importance of effective data management in modern web development environments.

Keeping Data and Integrations in Sync by Steve Hackley

This presentation will discuss the challenges and potential solutions for refreshing multiple application environments (Development/Staging/UAT/etc.) with data from a Production database, while keeping some amount of table data intact from the prior database after the Production restore.

When not whiteboarding solutions with his team, Steve Hackley can be found in his hiking boots traversing the switchbacks of the Appalachian Trail, or cooking out on the deck at home. With almost 20 years of web development and management experience in all areas including pre-sales, strategy, consulting, operations, and development, Steve has been responsible for assembling and leading several development teams implementing various technologies (.NET stack, BI technologies, Ruby on Rails) for clients who believe in the notion 'work smarter, not harder'.

RubyNation 2017

00:00:23.630 All right, I'm Steve Hackley, and I'm here to talk to you today about keeping data and integrations in sync.
00:00:26.640 What I'm specifically discussing is integrated environments. When we've got a production platform integrated with other applications, the challenge arises when we want to take a snapshot of that database and bring it down to our staging or development environments while keeping some amount of table data intact from the prior database after the production restore.
00:00:50.910 We all wish we could have production-like data in our development environments. Unfortunately, sometimes we lack the necessary data, forcing us to go into production and rely on tools like the Rails console to fetch it. So first, let's look at why we need this data.
00:01:03.989 Integrated systems can be hard to debug. As developers, we desire production data in our environments to help in our debugging processes. The challenge is that when we pull production data down, the pointers—such as foreign keys to other systems—get lost because they become environment-specific.
00:01:26.640 So, what I propose is to have an external database that holds these pointers as key-value pairs, linking to third-party databases. This way, we can take a production snapshot, bring it down, and sync those pointers back to re-establish the connections, allowing the production cut to work correctly.
00:01:47.219 A modern integrated system updates various platforms, where we generate keys into our payment processing system, content management systems, and e-commerce platforms. However, when we pull production data, we risk losing these pointers and integrations, which can lead to broken connections.
00:02:06.180 So, why would we want production data in our staging or development environments? When building new functionality, having historical data helps in creating new reports, workflows, and fixing issues with a myriad of edge cases. Production systems create massive amounts of data that can significantly benefit developers.
00:02:36.990 However, when we take a production snapshot and lay it down, those pointers generally become invalid. Our proposed solution is essentially a key-value sync database. This data store operates outside of your existing Rails application, where we can sync production values with customer IDs and other data back into staging and development.
00:02:56.130 This integration can be used beyond just refreshing foreign keys or pointers; it can also scrub data, sync passwords, and serve as middleware for real-time integrations between production and staging environments.
00:03:15.090 For instance, after integrating with a payment provider, such as Stripe, we can illustrate what one of the keys might look like. For example, when I created a customer in Stripe, I received a unique identifier for that customer.
00:03:29.600 In our development environment, I may create a customer with a different identifier because each environment operates independently. Thus, we have multiple customer IDs representing the same entity across different environments.
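As a sketch of the problem using the Stripe Ruby gem (the API keys and IDs below are placeholders, not from the talk):

```ruby
require "stripe"

# Each environment talks to its own Stripe account, so creating the
# "same" customer twice yields two unrelated identifiers.
Stripe.api_key = ENV["STRIPE_PRODUCTION_KEY"]
production_customer = Stripe::Customer.create(email: "user@example.com")
production_customer.id # => "cus_9s6XKzkNRiz8i3" (example)

Stripe.api_key = ENV["STRIPE_DEV_KEY"]
development_customer = Stripe::Customer.create(email: "user@example.com")
development_customer.id # => "cus_4f8YLmwPQjx2k7" (example): same person, different ID
```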
00:03:47.760 If we want to link these IDs, we need to implement a method to connect the different instances. This process starts with a backup of production data followed by exporting the key-value pairs from staging.
00:04:06.730 Currently, we utilize a Ruby rake task to accomplish this. It imports key-value pairs across multiple systems, and once we complete this export, we restore the production data to our staging setup after scrubbing any sensitive information.
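A minimal sketch of what such a rake task pair could look like, assuming a hypothetical Customer model with email and stripe_id columns plus the Connection/KeyValue models sketched a little further on; this is an illustration, not the speaker's actual code:

```ruby
# lib/tasks/key_value_sync.rake
namespace :key_value_sync do
  desc "Export staging's environment-specific Stripe pointers before a restore"
  task export: :environment do
    conn = Connection.find_or_create_by!(environment: "staging", system: "stripe")
    Customer.find_each do |customer|
      KeyValue.create!(
        connection: conn,
        table_name: "customers",
        key_column: "stripe_id",
        record_key: customer.email,    # stable across environments
        value:      customer.stripe_id # environment-specific pointer
      )
    end
  end

  desc "Re-apply the saved pointers after restoring the production snapshot"
  task import: :environment do
    KeyValue.joins(:connection)
            .where(connections: { environment: "staging" })
            .find_each do |kv|
      Customer.find_by(email: kv.record_key)&.update!(kv.key_column => kv.value)
    end
  end
end
```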
00:04:30.490 This process refreshes our environments, ensuring that our staging remains current and ready for development.
00:04:49.639 In the key-value sync database, we maintain a connection table that holds unique connections per instance, linking to key-value entries that represent the information we want to track.
00:05:03.090 In this model, we define each key-value pair, including table names, key columns, and unique identifiers, allowing us to create a functional mapping of data across environments.
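A minimal sketch of such a schema, with table and column names assumed from the description rather than taken from the talk:

```ruby
class CreateKeyValueSync < ActiveRecord::Migration[5.1]
  def change
    # One row per environment/system pairing, e.g. staging + Stripe.
    create_table :connections do |t|
      t.string :environment, null: false
      t.string :system,      null: false
      t.timestamps
    end

    # One row per tracked pointer.
    create_table :key_values do |t|
      t.references :connection, null: false
      t.string :table_name # table the pointer belongs to
      t.string :key_column # column holding the external ID
      t.string :record_key # stable identifier shared across environments
      t.string :value      # the environment-specific external ID
      t.timestamps
    end
  end
end
```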
00:05:21.170 One challenge we face is connecting to an external database. Using Active Record, we can establish an alternative connection property, enabling us to interact with a second data source.
00:05:40.370 We’ve created a base class called KeyValueSyncBase that helps establish connections to external data sources easily. Other classes can inherit from this to connect to various databases without affecting our primary Rails database.
00:05:55.690 When calling this method, it can either connect to the existing Rails environment or accept a source database name to establish connections for specific tasks.
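A sketch of what a base class along these lines might look like; the database.yml entry names are assumptions:

```ruby
class KeyValueSyncBase < ActiveRecord::Base
  self.abstract_class = true

  # With no argument, connect to the sync database configured for the
  # current Rails environment; otherwise connect to the named source.
  def self.connect_to(source = "key_value_sync_#{Rails.env}")
    establish_connection(source.to_sym)
  end
end

# Subclasses inherit the external connection and never touch the
# primary Rails database:
class Connection < KeyValueSyncBase
  has_many :key_values
end

class KeyValue < KeyValueSyncBase
  belongs_to :connection
end

KeyValueSyncBase.connect_to                           # current environment
KeyValueSyncBase.connect_to("key_value_sync_staging") # or a named source
```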
00:06:21.380 Once connected, we can query the specific key-value pairs needed, making it straightforward to fetch relevant data for further operations.
00:06:39.160 For example, we can pull customer data by searching for specific identifiers such as email addresses, enabling efficient access to existing records.
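Against the schema sketched above, such a lookup might read (names remain assumptions):

```ruby
# Fetch the staging-specific Stripe ID for a customer, keyed by email.
mapping = KeyValue.joins(:connection)
                  .where(connections: { environment: "staging", system: "stripe" })
                  .find_by(record_key: "user@example.com")
mapping&.value # => "cus_4f8YLmwPQjx2k7" (example)
```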
00:06:55.920 This allows us to maintain current data flow even across different instances of these external systems, ensuring that transactions and subscriptions remain synchronized.
00:07:16.200 To summarize the process, we create key-value pairs utilizing the necessary connections and establish how each key is associated with data across various systems.
00:07:30.640 After setting up the connection and the necessary attributes, we can save our key-value pairs into the database, where they can be accessed as needed.
00:07:49.510 Once stored, we can issue queries back to the key-value database during necessary data fetch operations. The integration of this schema ensures that we can quickly pinpoint records as needed without extensive overhead.
00:08:08.100 Furthermore, we can conduct these processes on a scheduled basis, ensuring all integrations are current and beneficial at all times.
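One way to run the cycle on a schedule is the `whenever` gem; the gem choice, timing, and task names here are assumptions for illustration:

```ruby
# config/schedule.rb
every :sunday, at: "2:00 am" do
  rake "key_value_sync:export" # save staging's environment-specific pointers
  rake "db:restore_and_scrub"  # hypothetical restore-plus-scrub task
  rake "key_value_sync:import" # reconnect the pointers afterwards
end
```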
00:08:27.070 As a result, we've built an efficient workflow that minimizes the challenges faced while troubleshooting and developing within integrated systems.
00:08:43.220 One prevalent issue is the need to scrub production data before moving it into staging, to ensure we don’t introduce sensitive information. Our implementation regularly encrypts or anonymizes data, removing any unneeded identifiers.
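A minimal sketch of that kind of scrubbing pass, run against the restored staging copy; the Customer model and the field list are illustrative assumptions:

```ruby
Customer.find_each do |customer|
  # update_columns skips validations and callbacks, which suits a
  # one-off bulk scrub of restored data.
  customer.update_columns(
    email: "user#{customer.id}@example.test", # anonymized but still unique
    name:  "Customer #{customer.id}",
    phone: nil                                # drop data staging does not need
  )
end
```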
00:09:04.360 The syncing process takes about 20 minutes for us, involving around nine fields across three to four million records, ensuring that we can keep pace with our workflows and respective integrations.
00:09:27.210 Through this presentation, I hope to leave you with a better understanding of how maintaining data accuracy and synchronization across integrated systems can lead to more efficient debugging and quicker development cycles.
00:09:43.490 Thank you very much for your attention. I'd be happy to take any questions regarding the process, challenges, or insights on best practices.
00:10:12.390 As for scrubbing production data, we utilize a rake task that handles this process. Predetermined key values are either encrypted or replaced to ensure that no sensitive information carries over into staging environments.
00:10:39.480 Are there any questions regarding data handling practices or the specifics of how our syncing process operates? I appreciate your inquiries.
00:10:51.420 Thank you once again for your engagement and support.