Darius Murawski

Seeding in a Microservice Architecture

Ruby Unconf 2018

00:00:15.560 Hello, everyone. As you can see on the slides, I'm a little bit excited to share my insights with you.
00:00:18.630 I hope you enjoy the talk and feel free to give me feedback later on what I can improve.
00:00:28.080 Today, I will be talking about seeding in a microservice architecture, so let's jump right in.
00:00:31.820 To give you a bit of background about me, I've been with a company called Velocity, which is a search engine, since September 2014.
00:00:37.440 Our main office is in Hamburg, and I've worked as a developer there.
00:00:45.809 When it comes to seeding, we often use the db:seed task in our Rails applications. It's essential for getting usable data into our databases.
00:00:54.840 This is how we initially structured our architecture. We started with a monolithic application, which consisted of two main systems: an SAP system and what we now call our core application.
00:01:06.690 In our old monolithic application, we would extract a huge JSON file from our database. Then, we would read this JSON file in the Rails application, import the data into the database, and update our search index each time we did this.
00:01:16.320 This process was time-consuming and error-prone. We frequently encountered issues with corrupted JSON files, which resulted in various errors.
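To show why a single corrupted file was so costly, here is a minimal sketch of that old import step. It assumes a hypothetical export format with a top-level "products" array. It also uses plain hashes as stand-ins for the database and the search index. The method name import_export is illustrative and does not come from the talk. Because JSON.parse raises on malformed input, one broken file aborts the whole import.

```ruby
require "json"

# Hypothetical version of the old import: parse the exported file, store every
# record, then update the search index. `db` and `index` are plain hashes
# standing in for the real database and search index.
def import_export(json_text, db, index)
  products = JSON.parse(json_text).fetch("products") # raises on a corrupted file
  products.each { |attrs| db[attrs.fetch("id")] = attrs }

  # Update the index after every import (here simply rebuilt in full).
  index.clear
  db.each_value { |attrs| index[attrs.fetch("name").downcase] = attrs.fetch("id") }
  products.size
rescue JSON::ParserError => e
  warn "import aborted: #{e.message}"
  nil
end
```

Given a truncated file such as '{"products":[{"id":2,', the parse fails, the method returns nil, and nothing is imported.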
00:01:23.240 However, seeding itself was relatively easy since you could simply write seed scripts that worked as intended.
00:01:31.979 As we transitioned towards microservices, our first step was to extract the search logic from the monolith into a separate application, which we now refer to as the Search app.
00:01:39.830 Changes made in the old monolith were published using RabbitMQ, and the Search app listened for these updates, fetching the relevant data into its own database and updating the search index.
00:01:53.680 Unfortunately, the Search app didn't initially have any data, so to seed it with data, developers had to manually trigger the seeding process in the old application, which then published the data consumed by the Search app.
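The following minimal sketch illustrates that flow. It uses an in-memory queue as a stand-in for RabbitMQ, so no broker or client library is needed. The event name "product.updated" and the names FakeQueue, seed_and_publish, and SearchApp are hypothetical. The Search app receives data only as a side effect of the monolith publishing it. An empty Search app therefore stays empty until someone runs the seed task on the other side.

```ruby
require "json"

# In-memory stand-in for a RabbitMQ queue; a real setup would use a broker
# and a client library, which this sketch deliberately avoids.
class FakeQueue
  def initialize
    @messages = []
  end

  def publish(payload)
    @messages << JSON.generate(payload)
  end

  # Deliver every pending message to the block and remove it from the queue.
  def drain
    @messages.shift(@messages.size).each { |msg| yield JSON.parse(msg) }
  end
end

# Monolith side: the manually triggered seed task republishes every record.
def seed_and_publish(records, queue)
  records.each { |record| queue.publish({ "event" => "product.updated", "data" => record }) }
end

# Search-app side: consume each update into the app's own store.
class SearchApp
  attr_reader :store

  def initialize
    @store = {}
  end

  def consume(queue)
    queue.drain do |message|
      data = message.fetch("data")
      @store[data["id"]] = data
    end
  end
end
```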
00:02:06.780 This method was cumbersome and complicated, particularly for new developers who needed to understand the process and how to operate the system effectively.
00:02:20.920 Over time, we also extracted additional applications for company data, product data, and media management, such as image processing.
00:02:37.170 Each of these applications produced RabbitMQ messages, which were then consumed by the Search app to provide relevant search results.
00:02:51.560 As you can imagine, with the architecture expanding to 13 different applications, managing the seeding process became increasingly challenging.
00:03:04.400 We soon faced issues with inconsistent data during the seeding processes across various applications, which prompted us to rethink our approach.
00:03:26.260 From this challenge, we defined several requirements for a global seeding process that would ensure data consistency across all applications.
00:03:39.460 One of our key goals was to safeguard the integrity of associations: for example, product data must always link correctly to the corresponding company data.
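One way to make that requirement checkable is to validate the whole seed set before it is loaded anywhere. The following minimal sketch assumes hypothetical Company and Product records linked by a company_id field. It is not the implementation described in the talk. It reports every product that points at a company missing from the seeds.

```ruby
require "set"

# Hypothetical seed records; product data refers to company data by id.
Company = Struct.new(:id, :name, keyword_init: true)
Product = Struct.new(:id, :name, :company_id, keyword_init: true)

# Return every product whose company_id has no matching company seed.
def dangling_products(companies, products)
  known_ids = companies.map(&:id).to_set
  products.reject { |product| known_ids.include?(product.company_id) }
end
```

Running such a check before seeding would stop a product from being loaded without its company.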
00:03:53.710 Additionally, we sought to simplify the seeding process so that developers could easily seed data without being dependent on background workers.
00:04:06.880 To achieve this, we developed an internal gem called 'vl-seeds', built around a base model that simplifies defining seed data.
00:04:18.000 The base model builds on Virtus, a Ruby library for declaring typed attributes, which gives us a clean interface for attributes and associations.
00:04:37.460 For example, a product model simply declares each of its attributes together with its type, which makes handling the data much more straightforward.
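The following minimal sketch shows the idea of a base model with typed attribute declarations. It is written in plain Ruby and is only a stand-in: it imitates the style of a Virtus-like DSL but does not reproduce Virtus's actual API or the vl-seeds base class. SeedModel, its COERCIONS table, and the price attribute are hypothetical. Each declared attribute gets a reader and a writer, and the writer coerces incoming values to the declared type.

```ruby
# Hypothetical base class imitating a typed-attribute DSL in plain Ruby.
class SeedModel
  # How raw seed values are converted for each supported type.
  COERCIONS = {
    Integer => ->(v) { Integer(v) },
    String  => ->(v) { v.to_s },
    Float   => ->(v) { Float(v) }
  }.freeze

  # Declared attributes of the subclass, in declaration order.
  def self.attributes
    @attributes ||= {}
  end

  # Declare an attribute with a type; values are coerced on assignment.
  def self.attribute(name, type)
    attributes[name] = type
    define_method(name) { @values[name] }
    define_method("#{name}=") do |value|
      @values[name] = value.nil? ? nil : COERCIONS.fetch(type).call(value)
    end
  end

  # Accepts string or symbol keys; unknown keys raise NoMethodError.
  def initialize(attrs = {})
    @values = {}
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  def to_h
    @values.dup
  end
end

class Product < SeedModel
  attribute :id, Integer
  attribute :name, String
  attribute :price, Float
end
```

With this sketch, Product.new("id" => "42", "price" => "9.5") yields an id of 42 and a price of 9.5, so seed files can stay loosely typed.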
00:04:56.360 We also implemented methods for retrieving and organizing the seed data, so that it fits the structure each application expects.
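The talk does not list those methods. The following minimal sketch shows what retrieval helpers on a seed base class could look like, under that caveat. SeedRecord, seed, all, find, where, and the company association reader are all hypothetical names chosen for illustration. Records are kept in memory per class, and find raises if an id is missing, so a broken association fails loudly.

```ruby
# Hypothetical retrieval helpers for seed records held in memory.
class SeedRecord
  # Each subclass keeps its own list of records.
  def self.records
    @records ||= []
  end

  def self.seed(rows)
    rows.each { |row| records << new(row) }
  end

  def self.all
    records.dup
  end

  def self.find(id)
    records.find { |record| record.id == id } || raise(KeyError, "#{name} #{id} not found")
  end

  # Records whose attributes equal every given condition.
  def self.where(**conditions)
    records.select { |record| conditions.all? { |key, value| record.public_send(key) == value } }
  end
end

class Company < SeedRecord
  attr_reader :id, :name

  def initialize(row)
    @id = row.fetch(:id)
    @name = row.fetch(:name)
  end
end

class Product < SeedRecord
  attr_reader :id, :name, :company_id

  def initialize(row)
    @id = row.fetch(:id)
    @name = row.fetch(:name)
    @company_id = row.fetch(:company_id)
  end

  # Association reader: resolve the linked company seed.
  def company
    Company.find(company_id)
  end
end
```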
00:05:14.270 Although we made significant progress, there were still challenges, such as having to update the gem every time we introduced new seed data.
00:05:30.710 This was cumbersome and created maintenance overhead, especially when multiple branches needed different seed data.
00:05:45.280 In response, we had our CI server generate a single comprehensive JSON file containing all the necessary seed data, and we developed a client to consume it.
00:06:09.710 This client fetches the data from the CI server and maps it onto the structure of the application that uses it, which gives us more flexibility.
00:06:25.620 For instance, when an application requests product data, the client retrieves the product records and populates their attributes into that application's models.
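Here is a minimal sketch of such a client. Several details are assumptions rather than part of the talk: the artifact URL, the JSON layout ({"branch": ..., "seeds": {"products": [...]}}), and the names SeedClient and records_for. By default it downloads the artifact with Net::HTTP.get. The fetcher is injectable, so the client can also read a local file or a canned string.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical seed client: fetches the CI-built JSON artifact once and
# turns the records for one model into objects of that model.
class SeedClient
  def initialize(url, fetcher: ->(uri) { Net::HTTP.get(uri) })
    @uri = URI(url)
    @fetcher = fetcher
  end

  # One model instance per record, e.g. records_for("products", Product).
  def records_for(key, model)
    seeds.fetch(key).map { |attrs| model.new(**attrs.transform_keys(&:to_sym)) }
  end

  private

  # Fetch and parse the artifact on first use, then reuse it.
  def seeds
    @seeds ||= JSON.parse(@fetcher.call(@uri)).fetch("seeds")
  end
end

# Hypothetical model in the consuming application.
Product = Struct.new(:id, :name, :company_id, keyword_init: true)
```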
00:06:38.040 Even when the underlying data is the same, different applications may need it structured differently, so each one needs its own mapping.
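In the following sketch, one product seed is mapped two ways. The field names and the per-application mappers are hypothetical and chosen for illustration only. A search application might want a title and a denormalized supplier name. A product application might keep the foreign key instead. Keeping the mappers separate lets each application change its shape without touching the shared seed data.

```ruby
# The same hypothetical product seed, as it might appear in the CI artifact.
PRODUCT_SEED = {
  "id" => 10, "name" => "Cordless Drill", "company_id" => 1, "company_name" => "Acme"
}.freeze

# Hypothetical per-application mappers: each app keeps only the fields it
# needs, under its own names.
MAPPERS = {
  search:  ->(seed) { { id: seed["id"], title: seed["name"], supplier: seed["company_name"] } },
  product: ->(seed) { { id: seed["id"], name: seed["name"], company_id: seed["company_id"] } }
}.freeze

# Raises KeyError for an application without a registered mapper.
def map_seed(app, seed)
  MAPPERS.fetch(app).call(seed)
end
```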
00:06:49.990 This flexibility paid off: it streamlined seeding and reduced some of the complexity that comes with a microservice architecture.
00:07:04.690 In the end, after working through these challenges, we arrived at a seeding process that is comprehensive and efficient.
00:07:15.760 Thank you all for listening, and I'm happy to answer any questions you may have.