Seeding in a Microservice Architecture

by Darius Murawski

In this talk at Ruby Unconf 2018, Darius Murawski discusses the challenges and solutions related to seeding data within a microservice architecture. The presentation begins by sharing Darius's background as a developer at Velocity, a search engine company, outlining his experiences transitioning from a monolithic application to a microservices model.

Key Points Discussed:
- Initial Architecture: The talk begins with the initial monolithic structure, which consisted of an SAP system and a core application. Darius explains how data was seeded from large JSON files. This process was both time-consuming and error-prone because of issues such as corrupted files.
- Transition to Microservices: As the company moved toward a microservices architecture, they pulled search logic from the old system into a dedicated Search app. This created a need for efficient data seeding since the new app lacked initial data.
- Cumbersome Processes: Developers had to trigger seeding manually from the old application. This complicated onboarding for new team members and led to inconsistent seed data across the applications.
- Global Seeding Process: As the architecture grew to 13 applications, a global seeding approach became necessary to maintain data integrity and make seeding simpler for developers. The approach had to preserve associations between data types while keeping seeding straightforward.
- Development of 'vl-seeds': To address these challenges, Darius and his team developed an internal gem called 'vl-seeds'. It builds on Virtus to provide a clean interface for declaring data attributes and their associations, which gives seed data a consistent structure.
- JSON File Generation: The team then had their CI server generate a comprehensive JSON file of seed data. They also built a client that fetches this file and maps the data onto each application's structure, which reduced the manual overhead required before.
- Flexibility and Mapping: The talk highlights the flexibility of the solution where different applications can structure the same underlying data differently, allowing seamless integration.

Conclusions and Takeaways:

Darius concludes the presentation by emphasizing how they navigated through the hurdles of building a complex microservice architecture, ultimately creating a comprehensive and effective seeding process. He invites questions from the audience to clarify any further points. Overall, the talk reflects on the importance of evolving tools and methods to manage data effectively in a microservices environment.

00:00:15.560 Hello, everyone. As you can see on the slides, I'm a little bit excited to share my insights with you.

00:00:18.630 I hope you enjoy the talk and feel free to give me feedback later on what I can improve.

00:00:28.080 Today, I will be talking about seeding in a microservice architecture, so let's jump right in.

00:00:31.820 To give you a bit of background about me, I've been with a company called Velocity, which is a search engine, since September 2014.

00:00:37.440 Our main office is in Hamburg, and I've worked as a developer there.

00:00:45.809 When it comes to seeding, we usually use the seed command in our Rails applications. It's essential for managing our data effectively.

00:00:54.840 This is how we initially structured our architecture. We started with a monolithic application, which consisted of two main systems: an SAP system and what we now call our core application.

00:01:06.690 In our old monolithic application, we would extract a huge JSON file from our database. Then, we would read this JSON file in the Rails application, import the data into the database, and update our search index each time we did this.

00:01:16.320 This process was time-consuming and error-prone. We frequently encountered issues with corrupted JSON files, which resulted in various errors.
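The following minimal Ruby sketch makes the failure mode concrete. It assumes a dump shaped as an array of records with "id" and "name" keys. The in-memory hashes stand in for the real database and search index, and the class name `JsonDumpImporter` is hypothetical. The whole file is parsed before anything is written, so a corrupted dump is rejected up front and does not leave a half-imported database.

```ruby
require "json"

# Hypothetical sketch of the monolith-era import: read one large JSON dump,
# write the records into a store, then rebuild the search index.
# The in-memory hashes stand in for the real database and search engine.
class JsonDumpImporter
  ImportError = Class.new(StandardError)

  attr_reader :db, :index

  def initialize
    @db = {}    # id => record hash (stand-in for the database)
    @index = {} # lowercase term => ids (stand-in for the search index)
  end

  # Returns the number of imported records.
  def import(json_string)
    records = JSON.parse(json_string)
    raise ImportError, "expected a JSON array" unless records.is_a?(Array)

    records.each { |record| db[record.fetch("id")] = record }
    rebuild_index
    records.size
  rescue JSON::ParserError => e
    # A truncated or corrupted dump fails while parsing, before anything is written.
    raise ImportError, "corrupt seed file: #{e.message}"
  end

  private

  # Rebuilds the whole index after every import, as the old process did.
  def rebuild_index
    index.clear
    db.each do |id, record|
      record.fetch("name", "").downcase.split.each do |term|
        (index[term] ||= []) << id
      end
    end
  end
end
```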

00:01:23.240 However, seeding itself was relatively easy since you could simply write seed scripts that worked as intended.

00:01:31.979 As we transitioned towards microservices, the first step involved extracting the search logic from the monolithic application. We subsequently developed a separate application, which we now refer to as the Search app.

00:01:39.830 Changes made in the old monolith were published using RabbitMQ, and the Search app listened for these updates, fetching the relevant data into its own database and updating the search index.

00:01:53.680 Unfortunately, the Search app didn't initially have any data, so to seed it with data, developers had to manually trigger the seeding process in the old application, which then published the data consumed by the Search app.
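The sketch below models this flow, with a tiny in-process broker standing in for RabbitMQ. The monolith publishes each change, and seeding a fresh Search app means asking the monolith to publish everything again. The class names, the routing key `product.updated` and the `republish_all` method are illustrative assumptions. They are not the actual implementation.

```ruby
# In-process stand-in for RabbitMQ: routing key => subscriber blocks.
class InMemoryBroker
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(routing_key, &handler)
    @subscribers[routing_key] << handler
  end

  def publish(routing_key, payload)
    @subscribers[routing_key].each { |handler| handler.call(payload) }
  end
end

# Monolith side: publishes every change. `republish_all` (hypothetical) is the
# manual "seeding" trigger a developer had to run to fill a fresh Search app.
class Monolith
  def initialize(broker)
    @broker = broker
    @products = {}
  end

  def save_product(product)
    @products[product[:id]] = product
    @broker.publish("product.updated", product)
  end

  def republish_all
    @products.each_value { |product| @broker.publish("product.updated", product) }
  end
end

# Search app side: consumes messages into its own store (index update omitted).
class SearchApp
  attr_reader :products

  def initialize(broker)
    @products = {}
    broker.subscribe("product.updated") { |product| @products[product[:id]] = product }
  end
end
```

A Search app started after the data already exists receives nothing until someone calls `republish_all`. That manual step is the one new developers had to learn.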

00:02:06.780 This method was cumbersome and complicated, particularly for new developers who needed to understand the process and how to operate the system effectively.

00:02:20.920 With the increasing complexity of our architecture, we also started to extract additional applications—for handling company data, product data, and media management, such as image processing.

00:02:37.170 Each of these applications produced RabbitMQ messages, which were then consumed by the Search app to provide relevant search results.
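With several producers, the Search app has to build each search document from messages sent by different services. The following sketch assumes hypothetical routing keys and payload fields. If a product arrives before its company, the company name is simply missing. This hints at the consistency problems described next.

```ruby
# Hypothetical Search app consumer that builds search documents from
# messages emitted by separate company, product and media services.
# Routing keys and payload fields are illustrative assumptions.
class SearchDocumentBuilder
  def initialize
    @companies = {}
    @products = {}
    @images = Hash.new { |hash, key| hash[key] = [] }
  end

  # Dispatch on the routing key, as a consumer bound to several keys might.
  def handle(routing_key, payload)
    case routing_key
    when "company.updated" then @companies[payload[:id]] = payload
    when "product.updated" then @products[payload[:id]] = payload
    when "media.image_processed" then @images[payload[:product_id]] << payload[:url]
    else raise ArgumentError, "unknown routing key: #{routing_key}"
    end
  end

  # Combine the latest state from all services into one document per product.
  def document_for(product_id)
    product = @products.fetch(product_id)
    company = @companies[product[:company_id]]
    {
      id: product_id,
      name: product[:name],
      company_name: company && company[:name], # nil if the company never arrived
      image_urls: @images[product_id].dup
    }
  end
end
```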

00:02:51.560 As you can imagine, with the architecture expanding to 13 different applications, managing the seeding process became increasingly challenging.

00:03:04.400 We soon faced issues with inconsistent data during the seeding processes across various applications, which prompted us to rethink our approach.

00:03:26.260 From this challenge, we defined several requirements for a global seeding process that would ensure data consistency across all applications.

00:03:39.460 One of our key goals was to safeguard the integrity of associations—for example, ensuring that product data always linked correctly to the corresponding company data.

00:03:53.710 Additionally, we sought to simplify the seeding process so that developers could easily seed data without being dependent on background workers.
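The following plain Ruby sketch shows one way to meet both requirements: intact associations and seeding without background workers. The `SeedSet` struct, the hash-based store and the method names are assumptions for illustration. The check rejects any product whose company is not part of the seed set. Seeding then writes companies before products, synchronously.

```ruby
require "set"

# Hypothetical seed set: plain hashes grouped by type.
SeedSet = Struct.new(:companies, :products)

# Returns products that reference a company missing from the seed set.
def dangling_company_references(seed_set)
  company_ids = seed_set.companies.map { |company| company[:id] }.to_set
  seed_set.products.reject { |product| company_ids.include?(product[:company_id]) }
end

# Seeds synchronously, in dependency order, instead of enqueueing background jobs.
def seed!(seed_set, store)
  dangling = dangling_company_references(seed_set)
  unless dangling.empty?
    ids = dangling.map { |product| product[:id] }.join(", ")
    raise ArgumentError, "products without company: #{ids}"
  end

  seed_set.companies.each { |company| store[:companies][company[:id]] = company }
  seed_set.products.each { |product| store[:products][product[:id]] = product }
  store
end
```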

00:04:06.880 To achieve this, we developed an internal gem called 'vl-seeds', built around a base model that simplifies seeding.

00:04:18.000 The gem builds on Virtus, which provides a clean interface for declaring data attributes and associations.

00:04:37.460 For example, when we define a product model, we can easily set up all necessary attributes along with their types, making the handling of data much more straightforward.
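In Virtus, a model declares typed attributes after `include Virtus.model`, for example `attribute :name, String`. To keep the example free of dependencies, the sketch below imitates that style with a small hand-written `attribute` macro. `SeedModel`, the coercion table and the product attributes are hypothetical. They are not the vl-seeds API.

```ruby
# Minimal stand-in for a Virtus-style base model (not the vl-seeds API):
# each declared attribute gets a reader, a writer and a type coercion.
class SeedModel
  COERCIONS = {
    String => ->(value) { value.to_s },
    Integer => ->(value) { Integer(value) },
    Float => ->(value) { Float(value) },
    Array => ->(value) { Array(value) }
  }.freeze

  def self.attribute(name, type)
    coerce = COERCIONS.fetch(type)
    attribute_names << name
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}=") do |value|
      instance_variable_set("@#{name}", value.nil? ? nil : coerce.call(value))
    end
  end

  # Attribute names declared on this class, in declaration order.
  def self.attribute_names
    @attribute_names ||= []
  end

  # Accepts string or symbol keys (e.g. parsed JSON); unknown keys are ignored.
  def initialize(attributes = {})
    attributes.each do |key, value|
      setter = "#{key}="
      public_send(setter, value) if respond_to?(setter)
    end
  end

  def to_h
    self.class.attribute_names.map { |name| [name, public_send(name)] }.to_h
  end
end

# Hypothetical product model; attribute names are illustrative.
class Product < SeedModel
  attribute :id, Integer
  attribute :name, String
  attribute :price, Float
  attribute :company_id, Integer
  attribute :tags, Array
end
```

For example, `Product.new({ "id" => "42", "price" => "9.5" })` yields `id == 42` and `price == 9.5`. Seed data can therefore stay in a loosely typed format like JSON.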

00:04:56.360 We also implemented methods for retrieving and organizing the data, so that it fits the structure each application needs.
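The talk does not show these retrieval methods, so the following is only a guess at their shape. It uses a hypothetical `SeedRegistry` that stores seeded records per model. The registry offers `all`, `find` and `where`, plus a foreign-key lookup for associations such as product to company.

```ruby
# Hypothetical retrieval helpers on top of a seed data set: records are
# registered per model, and associations are resolved by foreign key.
class SeedRegistry
  def initialize
    @records = Hash.new { |hash, model| hash[model] = {} }
  end

  def add(model, record)
    @records[model][record.fetch(:id)] = record
  end

  def all(model)
    @records[model].values
  end

  def find(model, id)
    @records[model].fetch(id) { raise KeyError, "#{model} #{id} not seeded" }
  end

  # Returns records whose fields equal all given conditions.
  def where(model, **conditions)
    all(model).select { |record| conditions.all? { |key, value| record[key] == value } }
  end

  # Follows a belongs-to style association, e.g. product -> company.
  def associated(record, model, foreign_key)
    find(model, record.fetch(foreign_key))
  end
end
```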

00:05:14.270 Although we made significant progress, challenges remained, such as having to update the gem every time we introduced new seed data.

00:05:30.710 This was cumbersome and created maintenance overhead, especially when multiple branches required different seed data.

00:05:45.280 In response, we moved the seed data into a comprehensive JSON file that our CI server produces, and we developed a client to consume it.

00:06:09.710 This client fetches the data from the CI server and maps it appropriately to the application's structure, allowing for greater flexibility.

00:06:25.620 For instance, when requesting product data, the client can efficiently retrieve and populate all necessary attributes into the models.
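A client along these lines might look like the sketch below. Several details are assumptions: the base URL, the `branch:` parameter (one seed file per branch, echoing the branch problem mentioned earlier) and the JSON layout with one top-level key per record type. The fetcher is injectable, so the client can be exercised without a CI server.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical seed client: downloads the seed JSON that CI builds and hands
# out records per type. URL layout and JSON shape ({"products": [...]}) are assumptions.
class SeedClient
  DEFAULT_BASE_URL = "https://ci.example.com/seeds" # placeholder, not a real endpoint

  def initialize(branch: "master", fetcher: nil)
    @branch = branch
    @fetcher = fetcher || ->(url) { Net::HTTP.get(URI(url)) }
  end

  def url
    "#{DEFAULT_BASE_URL}/#{@branch}.json"
  end

  # Downloads and parses the document once, then reuses it.
  def data
    @data ||= JSON.parse(@fetcher.call(url))
  end

  # Returns raw records for one type, or model instances when a model class
  # (anything responding to .new(hash)) is given.
  def fetch(type, model: nil)
    records = data.fetch(type.to_s, [])
    model ? records.map { |record| model.new(record) } : records
  end
end
```

Because the file is built per branch by CI, adding seed data no longer requires releasing a new gem version.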

00:06:38.040 Even with the same underlying data, different applications may require different structuring, which necessitates additional mapping for seamless integration.
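The sketch below illustrates that mapping step by reshaping one hypothetical product seed record in two ways. The Search app gets a flat, denormalized document. A product app gets a normalized row that keeps only a `company_id`. The record shape and the `SeedMappers` module are assumptions.

```ruby
# Hypothetical seed record as it might appear in the shared JSON file.
PRODUCT_SEED = {
  "id" => 1,
  "name" => "Valve",
  "company" => { "id" => 10, "name" => "ACME GmbH" },
  "images" => [{ "url" => "https://example.com/valve.png", "width" => 800 }]
}.freeze

# Hypothetical per-application mappers over the same underlying data.
module SeedMappers
  module_function

  # Search app: one flat, denormalized document per product.
  def search(seed)
    {
      id: seed["id"],
      name: seed["name"],
      company_name: seed.dig("company", "name"),
      image_urls: seed.fetch("images", []).map { |image| image["url"] }
    }
  end

  # Product app: normalized row that keeps only the foreign key.
  def product(seed)
    { id: seed["id"], name: seed["name"], company_id: seed.dig("company", "id") }
  end
end

def map_seed(app, seed)
  SeedMappers.public_send(app, seed)
end
```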

00:06:49.990 This flexibility proves beneficial, streamlining the data seeding process and reducing complexities inherent in the microservices architecture.

00:07:04.690 Ultimately, after navigating through the challenges, we managed to create a comprehensive and efficient seeding process.

00:07:15.760 Thank you all for listening, and I'm happy to answer any questions you may have.

See the slides on 2018.rubyunconf.eu