Euruko 2023

How To Safely Split Your Multi Tenant Application

by Miron Marczuk

In this engaging presentation from the Euruko 2023 conference, Miron Marczuk discusses strategies for safely splitting a multi-tenant application to accommodate growing business demands. The main theme revolves around the challenges and solutions of transitioning from a single-region to a multi-region setup, particularly in terms of data compliance and client preferences.

Key points discussed in the presentation include:

  • Introduction and Background: Miron opens with gratitude towards the audience and acknowledges the importance of supporting the community in Ukraine. He introduces the idea of a multi-tenant application for SaaS and outlines the motivation for the discussion.
  • Case Study Example: Miron provides a real-world example from his work at Apply For, a SaaS platform for securing permits in the film and event industry, illustrating the need for a multi-region setup as clients expressed preferences for local data storage due to compliance issues.
  • Data Storage Concerns: The presentation discusses how the application's growth leads to increased demand for data locality, highlighting that clients want their data stored in their respective regions to ensure compliance.
  • Two Approaches to Splitting Applications: Miron outlines two strategies:
    • Keeping one application while separating the data layer, routing requests for different tenants to the appropriate data.
    • Completely separating the applications into distinct entities, the option his team ultimately chose.
  • Migration Process: He emphasizes a careful migration approach and proposes a two-stage process: first redirect users, then split the data. The first stage moves users to new, region-specific URLs without altering any data.
  • Technical Strategies for Data Separation: Miron explains how a 'bucket' column is used to categorize records and how distinct databases are built from them. He compares the process to recycling: each record is sorted into the bin for the region it belongs to.
  • Implementation Strategies: He offers advice on testing and going live with the new setup, stressing the importance of clear communication with clients and proper timing of the migration to minimize disruptions.
  • Final Thoughts: Miron concludes by acknowledging his team and the successful transition of data, positing that with thorough planning and execution, companies can achieve effective multi-tenant, multi-regional application management.

In summary, the key takeaway from this video is the importance of careful planning when transitioning a multi-tenant application to a multi-region setup, utilizing proper data handling techniques and strategies to ensure a seamless experience for clients.

00:00:10.800 In a second, we'll go for a pub crawl and see what Vilnius has to offer us. I’ll try to make this presentation entertaining so we have a good first beer of knowledge before we go out. So let's get started.
00:00:30.539 First of all, thanks a lot for voting. I want to use this opportunity, while I'm here on this stage, to share our support with the whole community in Ukraine, especially the Ruby community. It's very important to keep that support coming.
00:00:51.960 Now, back to the topic: I will talk about splitting your multi-tenant application. It’s actually a story of business success, because the problem we’re going to address today only appears once your application grows and succeeds.
00:01:03.720 Let's start with an example. Imagine an entrepreneur (there are some entrepreneurs in the audience) who has an idea for a SaaS application. For the sake of this example, let’s say it is a platform through which shops sell to their clients: shops can set up their stores and operate through the platform.
00:01:35.880 So, this is the idea, a business concept, and the entrepreneur must take that idea and bring it to the public by creating that application. The process starts with a seed; the idea is first in the entrepreneur's head, then it goes into the computer. In the Ruby community, we have fantastic tools to quickly develop new applications that provide business value to our clients.
00:01:57.540 Depending on what you use—Rails, Sinatra, Hanami, or whatever works—it’s just amazing because it’s Ruby, right? I was expecting a great conference, and here I am, excited to share.
00:02:08.520 Our entrepreneur sets up the application, and there are actually people who want to use the system. A shop comes to the platform and brings its first users, people who buy products through our platform. The idea is a good one, and more shops begin coming onto the platform. This growth happens organically, all from that one seed.
00:02:37.560 As the business grows, there aren’t enough clients left in the original market where we started, so we need to expand globally and find clients in other countries. However, our growth remains organic, and all our data is still in the original place where the application was first hosted.
00:03:07.740 Then a prospective client comes in and says, "I want to use your platform, but I don't want my data stored in another region; I want it stored where I’m based." Data compliance is what matters in this scenario, and this is a real case of what can happen.
00:03:40.860 The question is, how do you move from having data in a single region to a multi-region setup? This is based on a real example: I work for a company called Apply For, a SaaS platform that allows the film and event industry to secure permits, and we faced exactly the problem I am describing here. Our clients are not shops; they are authorities that grant permits to users across the USA, England, Canada, and New Zealand. This is a B2G platform, meaning our clients are governmental authorities.
00:04:18.720 In our case, the shops in the previous example are our authorities, while the users are actual event and film producers who want to have their events organized. This example illustrates our need for data to be managed correctly. The request was real: clients from the USA indicated that they didn't want their data stored in the UK where our platform started; instead, they needed their data stored in the US to continue using the platform.
00:04:54.240 That brings us to the journey of creating a multi-region setup. What's the goal of this transition? We start with a setup where all data is stored in the original place where we launched our application. Our aim is to end up with each client's data stored only in the country that client comes from.
00:05:29.700 We are discussing a multi-tenant, multi-region setup—multi-tenant because we serve many clients from one Rails application, and multi-region because those clients are based in different places around the globe. When I say move to a multi-tenant multi-region setup, that's what I am referring to.
00:05:49.260 Note that throughout the presentation I will refer to certain colors, like yellow and violet. Each color indicates which region or country a specific element belongs to, so you can spot at a glance where it lives.
00:06:15.720 To clarify, the data we talk about includes both databases and files: the database records and the files users upload when they use our application. Our technology stack is relatively simple, so there is no need to dive into infrastructure details just yet. The data persistence layer consists of databases, Redis, and file storage, while the application layer consists of the web server and our Rails application.
00:06:47.700 Now, what are your options for implementing this requirement? I want to discuss two options. The first is keeping one application while separating the data layer. This means you still have one Rails application but conditionally route requests for specific tenants—shops or authorities—to the correct data in the appropriate region. In this setup, the logic for connecting to the right database resides in the application itself.
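To make the first option concrete, here is a minimal sketch in plain Ruby, so it runs without Rails: a single application looks up each tenant's region and picks that region's data store. The `TENANT_REGIONS` and `REGION_DATABASES` tables, the hostnames, and the `database_for` helper are all hypothetical. In a real Rails app, horizontal sharding (`connects_to shards:` together with `ActiveRecord::Base.connected_to(shard: ...)`, available since Rails 6.1) is one mechanism that could play this role.

```ruby
# Hypothetical mapping from tenant (shop or authority) to the region that must hold its data.
TENANT_REGIONS = {
  "london-council" => :uk,
  "nyc-film-office" => :us
}.freeze

# Hypothetical per-region connection settings; in Rails these would live in config/database.yml.
REGION_DATABASES = {
  uk: { host: "db.uk.example.internal", database: "app_uk" },
  us: { host: "db.us.example.internal", database: "app_us" }
}.freeze

class UnknownTenantError < StandardError; end

# Resolve the connection settings a request for this tenant must use.
def database_for(tenant_slug)
  # Fail loudly rather than silently defaulting to the original region.
  region = TENANT_REGIONS.fetch(tenant_slug) { raise UnknownTenantError, "no region configured for #{tenant_slug.inspect}" }
  REGION_DATABASES.fetch(region)
end

puts database_for("nyc-film-office")[:database] # prints app_us
```

The routing logic lives inside the application, which is exactly the property that distinguishes this option from running fully separate applications.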
00:07:20.340 The second approach is to completely separate the applications. Instead of one application, you manage two separate ones, and each serves its own set of data. Neither knows about the other; they exist completely separately. Which option works best depends on your application's specific characteristics.
00:07:50.580 You should consider a few questions. Can tenants move across regions? In our case, moving a city authority to another country isn’t a realistic scenario. How much data, if any, is shared across regions? Finally, can users operate in different regions? For example, can they move from one shop to another while keeping the same user profile? In our case, the answers led us to the second option: completely separating the systems.
00:08:31.740 That's where the fun begins. Starting with a single system and ending up with two separate systems requires careful planning. When I considered how to achieve this in one go, I imagined taking a leap of faith, similar to jumping into a haystack from a height. Personally, I would prefer to take the stairs, gradually working through the migration step by step.
00:09:01.440 This approach is safer and less risky. That’s what we did: we implemented the migration in two stages, where the first stage redirects users and the second splits the data. In the redirecting stage, we deliberately leave the data untouched. We start with a single application and end up with two separate containers with distinct URLs.
00:09:39.480 The data, however, remains unchanged, and we don’t alter the data persistence layer. We create two new applications while keeping the data intact. The business requirement for this stage is to seamlessly move users from one URL to two separate URLs. Users should notice no difference, apart from being redirected to the correct region.
00:10:09.660 You might wonder why I use generic names like App One and App Two. They keep the example simple, but in practice it makes sense to give the applications names that reflect their countries. For this task, we can use DNS and load balancers to route and redirect requests.
00:10:45.900 Imagine this scenario: you arrive at a university for a conference and need to find out the location of the talk. You might ask someone who directs you to the right room. However, if you are mistakenly directed to the wrong location, say you end up in a different conference room, you will have to backtrack and find the correct one. That’s precisely the functionality we want to achieve with the DNS to guide users to their respective regions.
00:11:25.380 DNS can route requests based on the user's geographic location, directing them to the correct load balancer and hence the right application. If a user still ends up in the wrong region, the application can detect this and redirect them appropriately. We considered a 301 (permanent) redirect, but browsers cache it, which could cause problems, so we opted for a 302 (temporary) redirect instead. Also follow proper SEO practices, such as setting alternate links from App One to App Two and vice versa, to protect search visibility.
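The "wrong region" fallback can be sketched as a Rack-compatible middleware. This is a minimal plain-Ruby sketch, not the talk's actual implementation: it assumes the tenant slug is the first path segment, and the `RegionRedirect` class, hostnames, and tenant mapping are hypothetical. It answers with a 302 for the caching reason given above.

```ruby
# Rack-compatible middleware in plain Ruby (no gems needed): if a request for a tenant
# reaches the wrong regional app, send it to the right one with a temporary redirect.
class RegionRedirect
  def initialize(app, current_region:, region_hosts:, tenant_regions:)
    @app = app                       # the wrapped Rack app
    @current_region = current_region # region this deployment serves, e.g. :uk
    @region_hosts = region_hosts     # region => public hostname (hypothetical)
    @tenant_regions = tenant_regions # tenant slug => region (hypothetical)
  end

  def call(env)
    # Assumption: the tenant slug is the first path segment, e.g. /nyc-film-office/permits.
    path = env["PATH_INFO"].to_s
    tenant = path.split("/").reject(&:empty?).first
    region = @tenant_regions[tenant]

    if region && region != @current_region
      location = "https://#{@region_hosts.fetch(region)}#{path}"
      query = env["QUERY_STRING"].to_s
      location += "?#{query}" unless query.empty?
      # 302 (temporary), not 301 (permanent): browsers cache 301s, so a wrong
      # mapping would keep sending users to the wrong region.
      return [302, { "location" => location }, []]
    end

    @app.call(env)
  end
end

inner_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
uk_app = RegionRedirect.new(
  inner_app,
  current_region: :uk,
  region_hosts: { uk: "app-one.example.com", us: "app-two.example.com" },
  tenant_regions: { "london-council" => :uk, "nyc-film-office" => :us }
)
status, headers, _body = uk_app.call("PATH_INFO" => "/nyc-film-office/permits", "QUERY_STRING" => "page=2")
# status              => 302
# headers["location"] => "https://app-two.example.com/nyc-film-office/permits?page=2"
```

Requests for unknown tenants fall through to the application unchanged, so the middleware only ever corrects a known mismatch.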
00:12:06.579 Once users are redirected correctly, we make significant progress. We end up with two separate URLs, two applications, and data that is just waiting for us to split and place accurately.
00:12:45.540 And that’s what we will do now: in Stage Two, we take the existing setup with two applications and move everything, data included, into two separate regions. In our case, we kept the existing applications in the UK while splitting the data, and we want to replicate everything in both regions.
00:13:29.520 This means creating a second instance of every infrastructure element, including the data persistence layer and the applications. It is crucial that login data is never mixed up: logging in to either region must not expose data from the other, which would be a data breach.
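One way to keep logins from crossing regions is to give each regional deployment its own signing secret, so a session token issued in one region simply does not verify in the other. In Rails, giving each deployment its own `secret_key_base` has a similar effect on signed and encrypted cookies. The sketch below is a hypothetical illustration with Ruby's standard `openssl` library; the token format, secrets, and `RegionalSessionSigner` class are assumptions, and `OpenSSL.secure_compare` needs the openssl gem 2.2+ (bundled since Ruby 3.0).

```ruby
require "openssl"

# Each regional deployment signs session tokens with its own secret (placeholders here),
# so a token issued by one region is rejected by the other.
class RegionalSessionSigner
  def initialize(region:, secret:)
    @region = region.to_s
    @secret = secret
  end

  # Hypothetical token format: "<region>:<user_id>:<hex HMAC-SHA256 signature>".
  def issue(user_id)
    payload = "#{@region}:#{user_id}"
    "#{payload}:#{sign(payload)}"
  end

  # Returns the user id if this region issued the token, otherwise nil.
  def verify(token)
    region, user_id, signature = token.to_s.split(":", 3)
    return nil unless region == @region && user_id && signature

    expected = sign("#{region}:#{user_id}")
    OpenSSL.secure_compare(expected, signature) ? user_id : nil
  end

  private

  def sign(payload)
    OpenSSL::HMAC.hexdigest("SHA256", @secret, payload)
  end
end

uk = RegionalSessionSigner.new(region: :uk, secret: "uk-secret-placeholder")
us = RegionalSessionSigner.new(region: :us, secret: "us-secret-placeholder")
token = uk.issue(42)
uk.verify(token) # => "42"
us.verify(token) # => nil
```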
00:14:15.300 To effectively separate databases—which is often the biggest question—you should analyze your tables and records. Determine which contain data that needs to be categorized into different regions. The idea is to create two distinct databases while carefully selecting the appropriate data for each one.
00:15:02.160 When separating records, you can use a strategy called bucketing. It’s somewhat similar to recycling. You take your garbage—or in this case, your data—and sort it into the correct bins, aligning each record according to the region it should belong to.
00:15:38.280 Add a column called 'bucket' to each table in your database to categorize its records. Then write code, linked to your models, that records which region each record belongs to. It’s a straightforward but effective process that makes separating data by region simple.
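Continuing the recycling analogy, once the `bucket` column is populated, a pre-flight check can sort records into bins and refuse to continue if anything is left unsorted. This plain-Ruby sketch stands in for database rows with a `Struct`; the `Record` struct, the `KNOWN_BUCKETS` values, and `sort_into_buckets` are hypothetical names.

```ruby
# Stand-in for a database row that has the added 'bucket' column.
Record = Struct.new(:table, :id, :bucket, keyword_init: true)

# Hypothetical bucket values, one per target region.
KNOWN_BUCKETS = %w[uk us].freeze

# Sort records into bins by bucket; raise if any record has a missing or unknown bucket.
def sort_into_buckets(records)
  bins = records.group_by(&:bucket)
  unsorted = bins.keys - KNOWN_BUCKETS
  unless unsorted.empty?
    offenders = unsorted.flat_map { |b| bins[b] }.map { |r| "#{r.table}##{r.id}" }
    raise ArgumentError, "records without a valid bucket: #{offenders.join(", ")}"
  end
  bins
end

records = [
  Record.new(table: "permits", id: 1, bucket: "uk"),
  Record.new(table: "permits", id: 2, bucket: "us"),
  Record.new(table: "users", id: 3, bucket: "uk")
]
bins = sort_into_buckets(records)
bins["uk"].map(&:id) # => [1, 3]
```

Failing on a `nil` bucket matters more than it looks: an unsorted record would otherwise silently be left out of both regions.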
00:16:17.580 To decide which region a record belongs to, you can adopt one of two strategies: a bottom-up approach, where you start from the leaf records and work your way up, or a top-down approach, where you start at the authority level and work downwards. We found the top-down approach more efficient and simpler to implement.
00:17:03.300 The resulting code scales well and is easy to manage: starting from each authority, you iterate through all of its associations and assign the authority's bucket value to the associated records.
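The top-down pass can be sketched as a graph walk from each authority. This is a plain-Ruby sketch with an in-memory `Node` struct standing in for models; in Rails, `Model.reflect_on_all_associations` is a real API that could enumerate the associations to follow. The `Node`, `SharedRecordError`, and `assign_buckets` names are hypothetical, and raising on a record that two regions both claim is a design choice of this sketch, not something the talk specifies.

```ruby
class SharedRecordError < StandardError; end

# In-memory stand-in for models: each node is a record with its child associations.
Node = Struct.new(:name, :children, :bucket) do
  def initialize(name, children = [])
    super(name, children, nil)
  end
end

# Top-down: start from an authority (the tenant root) and push its bucket down
# through every association, using an explicit stack instead of recursion.
def assign_buckets(authority, bucket)
  stack = [authority]
  until stack.empty?
    node = stack.pop
    next if node.bucket == bucket # already visited via another path

    unless node.bucket.nil?
      # Reachable from tenants in two regions: shared data needs a human decision.
      raise SharedRecordError, "#{node.name} is already in #{node.bucket}, not moving it to #{bucket}"
    end

    node.bucket = bucket
    stack.concat(node.children)
  end
end

permit = Node.new("permit-1", [Node.new("document-1"), Node.new("payment-1")])
authority = Node.new("nyc-film-office", [permit])
assign_buckets(authority, "us")
permit.children.map(&:bucket) # => ["us", "us"]
```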
00:17:43.260 Once every record has its bucket, you can use a default scope in Rails to preview how the data will look once it is divided. This lets you simulate the final situation before going live. As for speed, when splitting the databases, copy only the records each region requires and leave out any unnecessary data.
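In Rails the preview could be as small as `default_scope { where(bucket: "us") }` on each model. The plain-Ruby sketch below imitates that filtering and also copies only the rows one region needs; `Row`, `BucketScopedTable`, and `copy_to` are hypothetical names. At the database level, if you happen to run PostgreSQL, `COPY (SELECT ... WHERE bucket = ...) TO ...` is one way to export just the required rows.

```ruby
# Stand-in for a table row with a bucket column.
Row = Struct.new(:id, :bucket, :payload, keyword_init: true)

# Plain-Ruby stand-in for a table whose reads are filtered the way a Rails
# `default_scope { where(bucket: ...) }` would filter them.
class BucketScopedTable
  def initialize(rows, current_bucket)
    @rows = rows
    @current_bucket = current_bucket
  end

  # What the application "sees" when the split is simulated.
  def all
    @rows.select { |row| row.bucket == @current_bucket }
  end

  # Copy only the rows the target region needs; other rows are never copied.
  def copy_to(destination)
    all.each { |row| destination << row.dup }
    destination
  end
end

rows = [
  Row.new(id: 1, bucket: "uk", payload: "permit A"),
  Row.new(id: 2, bucket: "us", payload: "permit B")
]
us_view = BucketScopedTable.new(rows, "us")
us_view.all.map(&:id) # => [2]
```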
00:18:30.840 For the files, the idea is to associate each one with its record to work out where it should live: a file's region follows from the associated record's bucket. You can then mark where each file belongs with, for example, a folder name or a bucket identifier.
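A file's destination can be derived entirely from the record it is attached to. In this sketch, "storage bucket" means an object-storage container, which is a different thing from the `bucket` column on the record; the `Attachment` and `OwnerRecord` structs, the `STORAGE_BUCKETS` names, and the folder-prefix layout are hypothetical.

```ruby
# Stand-ins for an uploaded file and the record it is attached to.
Attachment = Struct.new(:key, :record, keyword_init: true)
OwnerRecord = Struct.new(:id, :bucket, keyword_init: true)

# Hypothetical region (record bucket) -> object-storage bucket names.
STORAGE_BUCKETS = { "uk" => "files-uk", "us" => "files-us" }.freeze

# Derive where a file must live from its owning record's bucket.
def destination_for(attachment)
  bucket = attachment.record&.bucket
  raise ArgumentError, "orphaned file #{attachment.key}" if bucket.nil?

  # The folder prefix keeps the region visible in the path as well.
  { storage: STORAGE_BUCKETS.fetch(bucket), key: "#{bucket}/#{attachment.key}" }
end

file = Attachment.new(key: "permits/1/plan.pdf", record: OwnerRecord.new(id: 1, bucket: "us"))
destination_for(file)[:key] # => "us/permits/1/plan.pdf"
```

Files whose record has no bucket are reported rather than guessed, for the same reason as with database rows.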
00:19:36.060 For testing, I recommend creating empty copies of the files, which streamlines the process and avoids unnecessary downloads from the existing file storage. The old structure stays in place while new uploads use the revised mechanism, and the old files are then transitioned to the new setup.
00:20:14.160 When the moment comes to go live in both regions, make sure to inform your clients and involve them in the process so the transition is seamless. Choose a low-usage period, like very early in the morning, to make the change. Create a checklist of the procedures to follow during the live migration, and assign each team member the commands they are responsible for executing.
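A go-live checklist with owners can be kept as data and run in order, stopping at the first failure. This is a hypothetical sketch: the step descriptions, owner roles, and the `Step` and `run_checklist` names are illustrative, and each action is a stub that would wrap a real command. The DNS change is placed last, in line with keeping new infrastructure out of DNS until everything else checks out.

```ruby
# One checklist entry: what to do, who runs it, and a callable returning true on success.
Step = Struct.new(:description, :owner, :action, keyword_init: true)

# Run steps in order, logging each result; halt at the first failing step.
def run_checklist(steps, log: [])
  steps.each_with_index do |step, index|
    ok = step.action.call
    log << format("%2d. [%s] %s: %s", index + 1, step.owner, step.description, ok ? "done" : "FAILED")
    return [false, log] unless ok
  end
  [true, log]
end

# Hypothetical steps; each stub would wrap a real, rehearsed command.
steps = [
  Step.new(description: "Confirm clients were notified", owner: "comms lead", action: -> { true }),
  Step.new(description: "Enable maintenance mode", owner: "app lead", action: -> { true }),
  Step.new(description: "Copy US bucket into US database", owner: "db lead", action: -> { true }),
  Step.new(description: "Point US DNS at US load balancer", owner: "infra lead", action: -> { true })
]
ok, log = run_checklist(steps)
puts log
```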
00:21:00.480 The more you rehearse the procedure, the more confident you will be during the actual go-live. DNS is the crucial step, because it is the moment your changes are exposed to the outside world. You can deploy the new infrastructure but keep it out of DNS until you're confident everything runs smoothly.
00:21:45.720 Once you have moved, get rid of the old infrastructure. Remember to take leave on the go-live day, as well, to avoid any stressful interruptions. This leads us to the happy ending: your data is properly separated, your clients are satisfied, and revenue flows into your system.
00:22:29.520 I want to express my gratitude to my entire team who helped me create this presentation and facilitate the system changes. Thank you all, and enjoy the pub crawl and the rest of the Euruko 2023 conference!