Bending Time with Crystal: 6 hours to 15 minutes

by Paul Hoffer

In the presentation titled "Bending Time with Crystal: 6 hours to 15 minutes," Paul Hoffer discusses creative problem-solving using the Crystal programming language, particularly in relation to enhancing performance at The RealReal, an e-commerce platform.

Hoffer emphasizes the potential for combining existing Ruby knowledge with Crystal to build efficient tools. He shares insights into Crystal’s architecture, highlighting its Ruby-like syntax paired with the speed of C, which makes it a powerful option for Ruby developers.

Key Points Discussed:

  • Introduction to Crystal:
    • Crystal is designed for both human readability and computer efficiency, making it accessible for Ruby developers.
  • Sitemap Generation Challenge:
    • The RealReal generates over 18 million sitemap links. The existing Ruby process takes up to six hours, which is too slow for a sitemap that has to be refreshed every day.
  • Proposal for Improvement:
    • By transitioning sitemap generation to Crystal, Hoffer aims to significantly reduce the processing time and memory usage, facilitating more frequent updates.
  • Code Compatibility:
    • Hoffer demonstrates how Crystal code can resemble Ruby code, allowing for easier porting of existing Ruby functionality into Crystal with minimal modifications.
  • Implementation Example:
    • He showcases a simple Ruby sitemap generator that accumulates links, and illustrates how similar functionality is achievable in Crystal using the 'site_mapper' shard.
  • Performance Results:
    • The prototype established in Crystal processes the sitemap in approximately 15 minutes compared to the original six hours, revealing a substantial performance improvement.
  • Further Optimization:
    • Both the Ruby gem and the Crystal shard accumulated every link in memory until the end of the run. Hoffer addressed this by contributing a feature to 'site_mapper' that writes sitemap files out incrementally.

Conclusion:

Hoffer concludes that this transition not only optimizes the sitemap generation process but also allows The RealReal to streamline its codebase by removing unnecessary dependencies on Rails. He encourages others to explore the Crystal programming language as a potential solution for performance-related challenges. The presentation also seeks to inspire attendees to tackle creative problem-solving in their own projects, leveraging the strengths of both Ruby and Crystal.

00:00:00 Ready for takeoff.
00:00:17 Hello everyone, I'm Paul Hoffer, and today I'm going to talk about creative problem solving using the Crystal programming language.
00:00:22 A little bit about my background: I work full time with Rails for a company called The RealReal. I typically focus on performance and architecture.
00:00:29 I do have some prior experience with Crystal, including connecting it with Ruby and converting Ruby code to Crystal.
00:00:41 If you have ever researched using Crystal to write native extensions in Ruby, you've probably seen one of my older projects. The main thing is that I enjoy experimenting with big problems to see if we can find solutions.
00:00:53 Now, let me talk about The RealReal for a moment. I'm curious, who has ever heard of it as a consumer outside of the tech world? A few hands? Okay, I was hoping for at least a couple.
00:01:06 We do have commercials, but it depends on what television you watch; you may see a lot or maybe none at all.
00:01:18 We are an e-commerce platform specializing in luxury consignment, selling designer bags, designer watches, clothing, and many more categories. Most of our products come from consumers who are reselling their items, while a smaller amount comes directly from retailers and designers.
00:01:37 Our architecture consists of a Rails monolith and numerous Phoenix front-end apps that consume data from that Rails app. We are slowly working to extract services from the Rails app, which gives us the freedom to explore other technologies when we need a creative solution.
00:01:55 Now, I'm going to give a quick introduction to Crystal. There was a talk at RubyConf about Crystal, and I’m really excited to see that when the videos come out.
00:02:20 For Crystal, this quote is straight from their website: 'A language for humans and computers.' I think that’s actually a really cool way to think about it.
00:02:37 For humans, we benefit from Ruby's efficiency in writing code because Crystal has a Ruby-like syntax. For computers, we gain efficiency from C's speed for running the code because Crystal is compiled with LLVM.
00:02:50 Rather than just talking about it, let's take a look at some Crystal code. If we look at this, it probably looks very familiar.
00:03:02 The code starts with a range from one to nine that we iterate through. We check to see if the number is even, and if it is, we print that; otherwise, we note that it's odd.
00:03:16 After that, we run a block three times, taking a random number up to 20, dividing it by four, and checking if the remainder is zero. If it is, we print that out; otherwise, we check if it’s a single-digit number.
00:03:33 Here’s the output; we see the numbers one through nine and then three random numbers.
00:03:40 But guess what? We can run this exact same code as Ruby code, and it will give us the exact same output.
00:03:52 Now, I want to be clear: Crystal isn't intended to be perfectly compatible with Ruby. In a larger program, you can't just copy and paste Ruby code and expect it to work perfectly, but the language is very similar and forms a powerful foundation for us.
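The slide itself is not part of the transcript, so the following is only a reconstruction of the kind of snippet described above: a range from one to nine with an even/odd check, then a block run three times on a random number. The method names `parity_label` and `describe` and the message wording are assumptions. It is written in Ruby and deliberately stays within syntax that Crystal shares, so it should also compile as Crystal.

```ruby
# Label a number as even or odd.
def parity_label(n)
  n.even? ? "#{n} is even" : "#{n} is odd"
end

# Describe a number: divisible by four, a single digit, or neither.
def describe(n)
  if n % 4 == 0
    "#{n} is divisible by 4"
  elsif n < 10
    "#{n} is a single-digit number"
  else
    "#{n} is a larger number"
  end
end

# Iterate through the range one to nine.
(1..9).each do |n|
  puts parity_label(n)
end

# Run a block three times, each time with a random number below 20.
3.times do
  puts describe(rand(20))
end
```

Because the last three lines depend on random numbers, they differ from run to run in either language.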
00:04:05 Let's talk about the Crystal ecosystem as a whole. The first thing to know about is 'shards.' Shards are the Crystal equivalent of Ruby gems.
00:04:20 However, the tooling for shards also includes functionality similar to Bundler, effectively combining RubyGems and Bundler.
00:04:29 There is an awesome Crystal list that contains a wide variety of shards available, and it’s pretty fun to explore.
00:04:41 There are shards for web frameworks similar to Rails and Sinatra, as well as database tools akin to ActiveRecord. For those who use Sidekiq, there is even a full port of it.
00:05:01 There are tools for mailers and solutions for most common problems. Sometimes shards can even be ported from existing Ruby gems.
00:05:18 A few years ago, I created a shard that was a port of ActiveSupport's inflector module. This module handles making words plural or singular, converting to snake case or camel case, and various other string manipulations.
00:05:32 It was surprisingly easy to complete, with a large amount of code that didn’t need to be modified. I think about 80% of the code could be directly copied, while the other 20% needed adjustments to work with Crystal.
00:05:45 What does that mean for us? It means that Crystal code can be very easy to understand and feel very familiar to write.
00:06:00 There’s likely an existing library for our specific use cases, and contributing to existing projects can be relatively straightforward. Overall, I think Crystal is a great tool for Rubyists to explore.
00:06:15 Now let's look at a current problem we are facing at The RealReal, specifically sitemap generation.
00:06:28 We generate links for all our shopping pages to be indexed in search engines. This includes every sale, product category, designer, promotion, and every product that’s currently for sale.
00:06:44 We also include some business-related links such as 'About Us' and 'Press Pages'. Essentially, we want everything indexed in search engines.
00:07:00 What makes this process challenging for us? We generate over 18 million links, almost all of which are products currently for sale. This process takes about six hours to run with the existing Ruby code.
00:07:18 Because we add new products every day, we need to update our sitemaps daily. Due to the lengthy process, we can only run this operation overnight. That’s the only time it will work within our infrastructure.
00:07:36 Additionally, the process is very memory intensive. The tool we use for generation accumulates all the links in an array until the end of processing. Only then does it generate all the sitemaps and clear everything from memory.
00:07:51 We are also loading the entire objects from our database instead of just the necessary fields for sitemap generation. Changing that would improve memory usage slightly, but it still doesn't solve the problem.
00:08:07 We're still accumulating the links in an array—18 million of those links.
00:08:23 One final note regarding our process is that our shopping front end for customers isn't delivered by the Rails monolith anymore; it's delivered by one of those Elixir apps.
00:08:38 This means that some of our Rails code isn't necessary anymore.
00:08:47 It's only used in sitemap generation, and if we move sitemap generation out of Rails, we can clean out a significant amount of dead code.
00:09:03 We're also not adding maintenance complexity when we switch to a new service, as that complexity has already been distributed among different services.
00:09:17 In essence, there's no reason that Rails needs to handle sitemap generation. There's not much business logic involved, making it a perfect prototype to see what we can do separately from Rails.
00:09:35 So, is sitemap generation really that complex? It sounds like it, but the answer is no. It’s actually quite simple.
00:09:47 It’s simple enough that we can fit an example on a single slide, which I believe is readable for everyone.
00:10:01 This is what it looks like in Ruby. Starting from the beginning, the gem is called 'sitemap_generator.' The class we use to generate sitemaps is the Sitemap class.
00:10:15 We call the 'create' method on that class and pass it a block. This is typical Ruby DSL.
00:10:30 Inside that block, there's one main method used called 'add,' which adds links to the list stored for later. It accepts options for page change frequency and last modification time, which are heavily utilized by search engines.
00:10:47 We then iterate through various models. I've shown only two loops here, but we actually have five models that we loop through, creating a link object for each.
00:11:04 We also include some business-related links, as mentioned earlier. When this block ends, that’s when all the sitemap data is generated.
00:11:19 As it runs, it accumulates those links, totaling over 18 million.
00:11:35 As noted, we have more going into this, but it's pretty straightforward: you add a link, loop through some products, and add links for each. It's quite simple.
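The pattern described above, a `create` block whose `add` calls only store links until the block ends, can be sketched in plain Ruby. `MiniSitemap` is a hypothetical stand-in written for illustration, not the 'sitemap_generator' gem; the record type, paths, and option defaults are likewise assumptions.

```ruby
# MiniSitemap is a hypothetical stand-in for a sitemap library. It exists
# only to show the accumulate-then-write pattern; it is not the real gem.
class MiniSitemap
  Link = Struct.new(:path, :changefreq, :lastmod)

  attr_reader :links

  def initialize
    @links = []
  end

  # Evaluate the block against a fresh instance so that `add` can be called
  # without a receiver, then return the instance once the block has ended.
  def self.create(&block)
    sitemap = new
    sitemap.instance_eval(&block)
    sitemap
  end

  # Store the link for later; nothing is written while the block is running.
  def add(path, changefreq: "weekly", lastmod: nil)
    @links << Link.new(path, changefreq, lastmod)
  end
end

# Hypothetical records standing in for database models.
CatalogItem = Struct.new(:permalink, :updated_at)
CATALOG = [
  CatalogItem.new("silk-scarf", Time.at(0)),
  CatalogItem.new("leather-tote", Time.at(86_400)),
]

SITEMAP = MiniSitemap.create do
  add "/about"
  CATALOG.each do |item|
    add "/products/#{item.permalink}", changefreq: "daily", lastmod: item.updated_at
  end
end

puts SITEMAP.links.size # prints 3
```

With two records the array is tiny, but every link stays in `links` until the block returns, which is the behavior that becomes expensive at 18 million links.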
00:11:54 Now that we’ve seen how simple the code can be, we can consider whether it's feasible to do this in Crystal and replace our existing Ruby infrastructure.
00:12:04 First, let's determine what we aim to achieve. We want to make this process faster, which is essential since it currently takes six hours. We need to manage scheduling better.
00:12:29 If we could improve this, we might run it multiple times per day, especially right after product launches, which usually happen in the morning and afternoon.
00:12:46 We can also reduce memory usage, which would lower our server requirements for processing and assist with overall system flexibility.
00:13:05 There are intangible benefits as well. The most significant is that we can enhance long-term sustainability.
00:13:22 Recurring tasks grow over time; thus, as our business expands, this task will become more significant.
00:13:38 So the question is, do we tackle this now while it's manageable, or wait for it to become an emergency?
00:13:54 Lastly, if we can eradicate unused code, that will contribute to the maintainability of our Rails app, mainly in the routing layer.
00:14:09 This includes removing some helper logic for product links, controllers, and specs that have remained due to the inability to pull everything out.
00:14:22 Removing these things would greatly assist with our cognitive load.
00:14:38 Now that we've established what we want to achieve, we can explore the feasibility of using Crystal for the job.
00:14:53 First, we need to evaluate the scope of the problem and the necessary tools to address it. We need to access the database.
00:15:07 We also require some path helpers for routing and, of course, the actual sitemap creation.
00:15:20 Next, we look for tools available in the Crystal ecosystem to assist with building this.
00:15:38 First, we need to find a tool for sitemap generation because without it, the project scope will increase significantly.
00:15:53 We wanted something straightforward to prototype and test quickly.
00:16:09 Fortunately, there is a tool called 'site_mapper,' and it's fully featured.
00:16:22 Regarding database access, plenty of tools are available; however, one specific tool implements the active record pattern, so it feels familiar to anyone used to working with ActiveRecord.
00:16:35 This tool is called Jennifer. Lastly, concerning our path helpers, there are only five of them.
00:16:50 Since they are no longer used by Rails, we can probably just implement them manually.
00:17:06 The main question now becomes how difficult it would be to port the sitemap generation logic to Crystal.
00:17:18 Here’s a reminder of how the sitemap generation code looks in Ruby. It’s a large block, with the most important method being 'add,' which adds a link to the sitemap.
00:17:33 Inside that block, we iterate through various products, sales, designers, etc.
00:17:47 Now, this is what it would look like if we did it in Crystal. The green highlights indicate the differences between Ruby and Crystal.
00:18:05 We have to change the constant, because it comes from a different library. We also add a block variable that we call 'builder'.
00:18:20 Now, that 'add' method is a method on the builder object rather than a global method, but besides that, everything remains unchanged.
00:18:38 The code for iterating through models and reading attributes will be the same in Crystal because we are using that library called Jennifer, which is akin to ActiveRecord.
00:18:55 We will have to set that up, but we will get to it later.
00:19:08 The last thing to consider is the 'add' method because if that were different, we would also need to update it.
00:19:23 Fortunately, it takes the same options as the Ruby version does, so those options are passed directly to the generated output.
00:19:38 Based on this, we appear to have everything we need to move forward. It looks feasible with minimal changes needed to the existing code.
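The main structural difference mentioned above is that `add` becomes a method on a block variable instead of being callable without a receiver. A rough Ruby sketch of that style follows. `MiniMapper` and its `generate` method are hypothetical names for illustration and should not be read as the 'site_mapper' shard's actual API.

```ruby
# MiniMapper is a hypothetical stand-in, not the site_mapper shard.
class MiniMapper
  Entry = Struct.new(:path, :options)

  # The object handed to the block. `add` lives here, so callers write
  # `builder.add(...)` instead of a bare `add(...)`.
  class Builder
    attr_reader :entries

    def initialize
      @entries = []
    end

    # Keep the path together with whatever options were given.
    def add(path, **options)
      @entries << Entry.new(path, options)
    end
  end

  # Yield a builder to the block and return what it collected.
  def self.generate
    builder = Builder.new
    yield builder
    builder.entries
  end
end

ENTRIES = MiniMapper.generate do |builder|
  builder.add "/about"
  builder.add "/sales/spring-event", changefreq: "daily"
end

ENTRIES.each { |entry| puts entry.path }
```

Compared with a receiver-less DSL, the only change at each call site is the `builder.` prefix, which matches the small set of differences described in the talk.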
00:19:52 Now we just have to build a prototype.
00:20:08 We will start with database modeling using the Crystal shard Jennifer, which has a query API similar to ActiveRecord.
00:20:22 It includes scopes and associations.
00:20:37 Our goal is to minimize changes to the sitemap generation code, so we'll set up our data models similar to the Rails setup.
00:20:56 Initially, the class definition will look quite similar to ActiveRecord models.
00:21:08 We inherit from a base class that Jennifer provides, but since Crystal is statically typed and compiled, we need to specify type information.
00:21:20 We declare fields for timestamps, designer ID, taxon ID, and the primary key, which is just the ID.
00:21:35 We then provide information about the associations through the 'belongs_to' keyword for the designer and taxon, which is similar to Rails.
00:21:49 We also set up a single scope that we will use later.
00:22:03 Now that we've set up our database models, let’s look at how we would interact with them.
00:22:19 Again, this will look quite familiar because it aligns with ActiveRecord.
00:22:35 In our first example, we have a class called LandingPage, which has a scope called 'has_designer' that we just defined.
00:22:49 We instruct it to eager load the associations for designer and taxon.
00:23:03 The second example shows a Spree product model, on which we call a scope named 'available.'
00:23:15 The third example highlights how we can iterate through the data and access the attributes, just like in Ruby.
00:23:29 We have 'sale' and call the scope 'active' on it, and we use the 'find_each' method to iterate gracefully through large datasets.
00:23:44 The 'find_each' works just like ActiveRecord, where it loads a limited number of records and provides them one by one for us to work through.
00:23:58 We access the attributes the same way we do in Ruby using 'sale.id' and 'sale.perm_link.'
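To make the batching idea behind `find_each` concrete, here is a minimal plain-Ruby sketch that loads a limited number of records at a time and yields them one by one. `FakeTable`, `SaleRow`, and the `permalink` attribute are hypothetical stand-ins for a database table and model. The sketch pages by primary key, which is how ActiveRecord batches; whether Jennifer does the same internally is not stated in the talk.

```ruby
# Hypothetical record type standing in for a database row.
SaleRow = Struct.new(:id, :permalink)

# FakeTable is a hypothetical in-memory stand-in for a database table.
class FakeTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows.sort_by(&:id)
    @queries = 0
  end

  # Simulates "WHERE id > last_id ORDER BY id LIMIT limit".
  def fetch_after(last_id, limit)
    @queries += 1
    @rows.select { |row| row.id > last_id }.first(limit)
  end

  # Load at most `batch_size` rows at a time and yield them one by one.
  def find_each(batch_size: 1000)
    last_id = 0
    loop do
      batch = fetch_after(last_id, batch_size)
      break if batch.empty?
      batch.each { |row| yield row }
      break if batch.size < batch_size # a short batch means we reached the end
      last_id = batch.last.id
    end
  end
end

SALES = FakeTable.new((1..7).map { |i| SaleRow.new(i, "sale-#{i}") })
SEEN = []
SALES.find_each(batch_size: 3) { |sale| SEEN << "/sales/#{sale.permalink}" }

puts SEEN.size     # prints 7
puts SALES.queries # prints 3
```

Only one batch is held at a time, so memory use depends on the batch size and not on the size of the table.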
00:24:14 Now that we've figured out how to model our data, let’s examine how we handle sitemap generation.
00:24:32 For this, we use the Crystal shard 'site_mapper,' which has a similar API to the Ruby gem we've been using, called 'sitemap_generator.'
00:24:47 It offers the same configuration options and functionalities, which mainly involve data compression.
00:25:01 It can also upload to S3 and ping search engines to notify them of updated sitemaps.
00:25:16 Our goal remains to minimize code changes for the sitemap generation code.
00:25:32 Referencing back to the earlier slide, the changes are minimal: we change the constant, add the block variable, and then call 'add' on that variable.
00:25:46 However, there’s also support code involved. I’ve highlighted three methods we haven’t seen yet.
00:26:05 'fetch_products' handles a few different scopes determining what products we want to pull and generate links for.
00:26:19 We also have 'product_path' and 'flash_sale_path' in Rails, which are merely the path helpers used for routing.
00:26:34 In our case, we can create these manually. Since the routing is no longer managed by Rails, we don’t need to uphold the same flexibility.
00:26:49 We can hard-code the paths since any change would require updating everywhere anyway.
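Hand-written path helpers of the kind described above can be very small. The URL shapes and the `permalink` attribute in this Ruby sketch are assumptions, since the talk does not show the real routes; only the helper names `product_path` and `flash_sale_path` come from the talk.

```ruby
# Hypothetical record types standing in for the real models.
ProductRecord = Struct.new(:permalink)
FlashSaleRecord = Struct.new(:permalink)

# Hard-coded replacement for a Rails route helper. The "/products/" prefix
# is an assumed URL shape, not the site's actual route.
def product_path(product)
  "/products/#{product.permalink}"
end

# Same idea for flash sales; the "/sales/" prefix is also assumed.
def flash_sale_path(sale)
  "/sales/#{sale.permalink}"
end

puts product_path(ProductRecord.new("leather-tote"))  # prints /products/leather-tote
puts flash_sale_path(FlashSaleRecord.new("spring-event")) # prints /sales/spring-event
```

The trade-off is the one stated in the talk: the paths are no longer derived from a routing table, so a URL change has to be made here by hand.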
00:27:02 'fetch_products' resembles typical data loading in Rails.
00:27:15 So, returning to the generation code, it looks ready to implement.
00:27:28 As I previously mentioned, there are some additional classes to iterate through and a few more static links. Otherwise, it looks similar.
00:27:43 Ultimately, does this implementation actually work? Yes, it does, albeit with a caveat.
00:27:56 The first finding is that processing with those same 18 million records finishes in about 15 minutes.
00:28:11 This is an enormous improvement from the original six hours, indicating that this concept is viable for further development.
00:28:27 However, it does suffer from the same memory leak as the Ruby version. The Crystal library also accumulates all the links until the end of the processing block.
00:28:43 It's this observation that led me to identify the issue and seek a solution within the Ruby gem.
00:29:01 Fortunately, sitemap files can only contain 50,000 links, requiring us to split them into multiple files when exceeding that limit.
00:29:15 This could allow us to write out the files as we go, thus minimizing memory utilization.
00:29:31 I explored the Crystal code for 'site_mapper' and successfully implemented the feature to write files and reset the links as it processes.
00:29:46 After a PR submission, with some discussions and updates with the maintainer, we merged the changes.
00:30:02 With this functionality operational, we can rerun the generator without the memory leak.
00:30:17 The memory usage is significantly lower and remains stable throughout processing. This is a considerable achievement for us.
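The write-as-you-go fix can be sketched as a builder that flushes its buffer to a new file each time it reaches the per-file limit. This is a simplified Ruby illustration under stated assumptions, not the code merged into 'site_mapper': `IncrementalSitemap` and `demo_incremental` are hypothetical names, and the files are plain text with one path per line where a real sitemap would be XML.

```ruby
require "tmpdir"

# IncrementalSitemap is a hypothetical sketch, not the site_mapper shard's code.
class IncrementalSitemap
  MAX_LINKS_PER_FILE = 50_000 # per-file limit mentioned in the talk

  attr_reader :files

  def initialize(dir, max_links: MAX_LINKS_PER_FILE)
    @dir = dir
    @max_links = max_links
    @links = []
    @files = []
  end

  # Buffer a link and write a file out as soon as the buffer is full.
  def add(path)
    @links << path
    flush if @links.size >= @max_links
  end

  # Write whatever is left once all links have been added.
  def finish
    flush unless @links.empty?
    @files
  end

  # Number of links currently held in memory.
  def buffered
    @links.size
  end

  private

  # Write the buffered links to the next numbered file and reset the buffer.
  def flush
    name = File.join(@dir, "sitemap#{@files.size + 1}.txt")
    File.write(name, @links.join("\n") + "\n")
    @files << name
    @links = []
  end
end

# Add `total` links with a small per-file limit and report what happened.
def demo_incremental(total, max_links)
  Dir.mktmpdir do |dir|
    sitemap = IncrementalSitemap.new(dir, max_links: max_links)
    peak = 0
    total.times do |i|
      sitemap.add("/products/#{i}")
      peak = [peak, sitemap.buffered].max
    end
    files = sitemap.finish
    { files: files.size,
      lines: files.map { |f| File.readlines(f).size },
      peak_after_add: peak }
  end
end

RESULT = demo_incremental(25, 10)
puts RESULT[:files]          # prints 3
puts RESULT[:lines].inspect  # prints [10, 10, 5]
```

The buffer never holds more than one file's worth of links, so memory stays bounded by the per-file limit no matter how many links are added in total.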
00:30:33 Previously, we couldn't run sitemap generation on our developer machines with a production-sized dataset, but now we can do it in 15 minutes.
00:30:55 This is with 18 million products, along with a few classes I previously mentioned. The generation time was 14 minutes and 52 seconds, compared to the former six hours.
00:31:09 This test was conducted on a developer machine, suggesting we might observe an even more significant boost in production.
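As a rough check on the figures quoted above, the arithmetic can be worked through in a few lines of Ruby. The inputs are the talk's rounded numbers ("about six hours", "over 18 million" links, 14 minutes 52 seconds, 50,000 links per file), so the results are approximations; the constant names are chosen for this sketch.

```ruby
RUBY_SECONDS = 6 * 60 * 60       # six hours, in seconds
CRYSTAL_SECONDS = 14 * 60 + 52   # 14 minutes 52 seconds, in seconds
LINK_COUNT = 18_000_000          # "over 18 million", rounded down
LINKS_PER_FILE = 50_000          # per-file limit mentioned in the talk

# How many times faster the Crystal run was.
SPEEDUP = (RUBY_SECONDS.to_f / CRYSTAL_SECONDS).round(1)

# Approximate throughput of each version, in links per second.
RUBY_RATE = LINK_COUNT / RUBY_SECONDS
CRYSTAL_RATE = LINK_COUNT / CRYSTAL_SECONDS

# Minimum number of sitemap files needed at the per-file limit.
MIN_FILES = (LINK_COUNT.to_f / LINKS_PER_FILE).ceil

puts SPEEDUP      # prints 24.2
puts RUBY_RATE    # prints 833
puts CRYSTAL_RATE # prints 20179
puts MIN_FILES    # prints 360
```

On these numbers the prototype is roughly 24 times faster, and the output spans at least 360 sitemap files.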
00:31:24 Let’s look at an overview of the final solution. It’s genuinely impressive.
00:31:39 The prototype took about one day to develop, meaning we could run the generation and compare the output files to what Rails produces.
00:31:54 It then took another day to fix the memory leak and optimize some code for Crystal. Completing this took just one PR.
00:32:09 Overall, I spent less time on this project than I did preparing for this talk, which is quite exciting.
00:32:23 In total, the code is incredibly minimal, with only two main files comprising around 185 lines.
00:32:38 There are about 90 lines of code for the generation process, which is nearly identical to the existing Ruby code.
00:32:50 The remaining approximately 95 lines pertain to model definitions.
00:33:04 This entire solution, with database access, can run independently of the rest of our infrastructure—all in 185 lines.
00:33:18 Therefore, we don't require all Rails infrastructure anymore.
00:33:29 I would like to take a moment to recap the creative process involved.
00:33:42 First, we examine what the problem is, recognizing that it's loosely coupled to Rails, which implies it can potentially be extracted into another service.
00:34:03 Next, we explore how to address the issue, utilizing our knowledge of Ruby and Crystal, along with similar tooling available.
00:34:16 Finally, when necessary, we can leverage our Ruby knowledge to contribute back to Crystal.
00:34:28 This process is enjoyable; there's excitement in pursuing challenging problems and discovering solutions.
00:34:44 In conclusion, I want to thank everyone, especially RubyConf, for the opportunity to present.
00:35:00 This is my first time speaking at a conference, and I truly appreciate all of you being here today.
00:35:12 Also, thank you to The RealReal for supporting my participation. We are actively hiring, even in the current climate.
00:35:26 If solving hard problems sounds exciting to you, please come speak with me.
00:35:40 Additionally, there are ample resources available for learning Crystal. Their official website provides extensive documentation and resources.
00:36:00 There's also an interactive Crystal interpreter online, where you can input code and see the results.
00:36:16 Moreover, 'Crystal for Rubyists' offers a variety of resources to learn about Crystal, including a page listing popular Ruby gems and their Crystal equivalents.
00:36:31 That concludes my presentation! Are there any questions?