
Bending Time with Crystal: 6 hours to 15 minutes

In software, we often encounter problems that we accept as "just how things are." But sometimes those problems are opportunities for creative, out-of-the-box solutions. One idea is to combine the power of Crystal with our existing Ruby knowledge to create effective tools with a minimal learning curve and little cognitive overhead. I'll demonstrate how easily Ruby code can be ported to Crystal, how it can benefit us, and how to identify these opportunities.

RubyConf 2022

00:00:00.000 ready for takeoff
00:00:17.119 hello everyone I'm Paul Hoffer and today I'm going to talk about creative problem
00:00:22.320 solving using the Crystal programming language a little bit about my background I work
00:00:29.460 full-time with Rails for a company called The RealReal I typically focus on performance and
00:00:36.360 architecture I do have some prior experience with Crystal
00:00:41.420 including connecting it with Ruby and also converting Ruby code to Crystal
00:00:47.460 if you have ever researched using Crystal to write native extensions in Ruby you've actually probably seen one
00:00:53.879 of my older projects and the main thing is I just like to experiment with big problems and see if
00:01:00.780 we can find solutions to them now let me talk about The RealReal for
00:01:06.000 a moment I'm just kind of curious who has ever heard of it as a consumer separate from
00:01:12.060 being in the tech world before a few hands okay that's what I was hoping for
00:01:18.659 at least a couple we do have commercials it depends on
00:01:24.060 what television you watch you may see a lot or you may see none of them so we are an e-commerce platform we do
00:01:31.920 luxury consignment so we sell designer bags designer watches clothing and many
00:01:37.920 more categories most of our products come from other consumers who are reselling their items
00:01:44.340 while a small amount comes from retailers and designers directly
00:01:49.680 our architecture consists of a Rails monolith and numerous Phoenix front-end
00:01:55.259 apps which consume data from that Rails app we are slowly working to extract
00:02:00.960 services from the Rails app which kind of gives us the freedom to look at other technologies when we need a creative
00:02:07.740 solution to something so now I'm going to give just a quick
00:02:13.379 introduction to Crystal I think there was another talk at RubyConf about Crystal and I'm really
00:02:20.280 excited to see that when the videos come out but for Crystal this quote is straight
00:02:25.680 from their website a language for humans and computers I think that's actually a really cool
00:02:31.920 way of thinking about it for humans we get Ruby's efficiency for
00:02:37.379 writing code because Crystal has a Ruby-like syntax and for computers we get C's efficiency
00:02:44.580 for running the code because Crystal is compiled with LLVM
00:02:50.760 but rather than talking about it let's take a look at some Crystal code if we look at this it probably looks
00:02:57.120 very familiar it looks just like Ruby it starts off with a range from one to
00:03:02.400 nine that we iterate through we check to see if the number is even and if it is we print that or we print
00:03:09.720 that it's odd after that we run a block three times we take a random number up to 20.
00:03:16.920 we divide it by four and see if the remainder is zero if so we print that
00:03:22.440 out otherwise we check to see if it's a single digit number or not
00:03:27.540 and then we can see the output of this right here we see the numbers one through nine and then we see the three
00:03:33.360 random numbers but guess what we can run this exact same code as Ruby
00:03:40.140 code and it will give us the exact same output now there's one thing I do want to be
00:03:45.900 clear on Crystal isn't intended to be perfectly compatible with Ruby and in a bigger program you can't just
00:03:52.799 copy and paste Ruby code and have it work perfectly but the language is very similar and it
00:03:58.500 forms a very powerful foundation for us further let's talk about the Crystal
00:04:05.580 ecosystem as a whole for a moment the first thing to think about is shards
00:04:11.580 shards are the Crystal equivalent of Ruby gems however the tooling for shards also
00:04:17.880 includes functionality similar to Bundler so it's really kind of RubyGems plus
00:04:22.979 Bundler together there is an Awesome Crystal list that contains a wide variety of shards that
00:04:29.580 are available and it's pretty fun to go through there are shards for web frameworks
00:04:35.100 similar to Rails and Sinatra there are database tools similar to ActiveRecord
00:04:40.520 and also similar to Ecto for the Elixir fans in here there is a full port of Sidekiq there's
00:04:47.280 tools for mailers and there's tools for most common problems sometimes shards can even be ported from
00:04:53.759 an existing Ruby gem a few years ago I created a shard that was a port of ActiveSupport's Inflector
00:05:01.020 module this module is what handles making words plural or singular snake case or camel
00:05:08.220 case and a variety of other string manipulations that ActiveSupport gives us
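As a hypothetical illustration of the kind of code such a port carries over, here are two simplified inflector-style helpers in Ruby, loosely modeled on the classic ActiveSupport `underscore` regexes rather than copied from the current Rails source. The comments flag the small edits a Crystal port would need: single quotes denote a `Char` in Crystal, so `'\1_\2'` would become `"\\1_\\2"`, and the `&:capitalize` shorthand would become `&.capitalize`.

```ruby
# Simplified, hypothetical inflector helpers, loosely modeled on the classic
# ActiveSupport regexes (not copied from the current Rails source).
def underscore(word)
  result = word.gsub("::", "/")
  # "HTTPServer" -> "HTTP_Server"
  # Crystal port: single quotes are Char literals there, so '\1_\2'
  # would need to become "\\1_\\2".
  result = result.gsub(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
  # "ActiveModel" -> "Active_Model"
  result = result.gsub(/([a-z\d])([A-Z])/, '\1_\2')
  result.tr("-", "_").downcase
end

def camelize(term)
  # Crystal port: `&:capitalize` would become `&.capitalize`.
  term.split("/").map { |segment| segment.split("_").map(&:capitalize).join }.join("::")
end

puts underscore("ActiveModel::Errors")
puts camelize("active_model/errors")
```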
00:05:13.320 it was surprisingly easy to complete with a large amount of code that did not
00:05:18.600 need to be modified I think about 80 percent of the code could be directly copied and then the rest the
00:05:25.560 other 20 had to be updated to work with Crystal so what does that mean for us
00:05:32.400 well it means that Crystal code can be very easy to understand and it can feel very familiar to write
00:05:38.220 there's likely an existing library for our specific use cases contributing to existing projects can be
00:05:45.660 relatively straightforward and all in all I think Crystal is a great tool for Rubyists to look into
00:05:53.639 so let's take a look at our current problem that we're dealing with at The RealReal well one of them which is
00:06:00.240 sitemap generation we generate links for all of our
00:06:05.820 shopping pages to be indexed in search engines this includes every sale every
00:06:11.280 product category designer promotion and every product that's currently for sale
00:06:16.800 we also include some business-related links such as about us and press pages
00:06:22.039 shipping and return info pretty much all the standard things that you would want to
00:06:27.180 be indexed in search engines so what makes that so difficult for us
00:06:32.220 well we generate over 18 million links almost all of which are products currently for sale
00:06:38.220 this process takes about six hours to run on the existing Ruby code to do this
00:06:44.759 and because we are adding new products every day to our site we have to update our sitemaps daily
00:06:50.880 and because this takes so long we have to run it overnight and that's the only time that it will work in our infrastructure
00:06:57.360 Additionally the process is very memory hungry the tool that we use for generation actually accumulates all the
00:07:04.020 links in an array until the end of processing and that's when it generates all the sitemaps and
00:07:11.220 clears everything out of memory we're also loading the entire objects out of our database instead of just the
00:07:18.300 fields that we need for sitemap generation changing that would improve memory usage a little but it still doesn't solve it
00:07:24.720 because we're still accumulating links in an array 18 million of those links
00:07:32.280 and one last note regarding our process is that our
00:07:38.160 shopping front end for customers isn't even delivered by the Rails monolith anymore it's delivered by one of those Elixir
00:07:45.300 apps that means that some of our Rails code isn't necessary anymore
00:07:50.400 it's only used in sitemap generation and if we remove sitemap generation we
00:07:55.680 can go clean out a decent amount of dead code we also aren't adding maintenance
00:08:02.039 complexity when we switch this to a new service because that complexity has
00:08:07.800 already been distributed among different services so in essence there's no reason that
00:08:14.160 Rails has to do sitemap generation there's not really much business logic and it makes it a perfect prototype to
00:08:21.419 see what we can do separately from Rails so the question is is sitemap generation really
00:08:29.220 that complex like it kind of sounds like it but the answer is no it's really not
00:08:35.039 it's actually incredibly simple and it's simple enough that we can fit an example on a single slide that I believe is
00:08:42.060 pretty readable to everyone this is what it looks like in Ruby
00:08:47.180 starting from the beginning the gem is called sitemap_generator that's the
00:08:52.500 gem and the class that we use to generate sitemaps is the Sitemap class
00:08:57.839 we call the create method on that class and we pass it a block this is pretty typical Ruby DSL that
00:09:04.740 we're looking at so far inside that block there's one main method that gets used called add and
00:09:11.100 that's to add the links to the list that gets stored for later it accepts some options and we use
00:09:17.220 options for page change frequency and last modification time those are things that are heavily used by search engines
00:09:26.100 we then iterate through various models I only show two loops on here but we
00:09:31.620 have five models that we Loop through and we create a link object for each one
00:09:36.959 and then we also have some business related links like I mentioned earlier
00:09:42.180 when this block ends that's when it generates all the sitemap data so as it
00:09:47.519 goes it accumulates all those links up to over 18 million
00:09:54.959 and like I said we have a little bit more that goes into this but this is pretty simple like it's
00:10:01.920 you add a link you loop through some products you add links for each of them that's pretty straightforward
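One common way a Ruby DSL lets `add` be called with no receiver is to evaluate the block with `instance_eval`. The sketch below uses a hypothetical `MiniSitemap` class (not the sitemap_generator gem's internals) to show that shape, and it also reproduces the accumulation problem described earlier: every link stays in `@links` until the block finishes.

```ruby
# Hypothetical, minimal DSL in the style of the Ruby sitemap code described
# above. Not the sitemap_generator gem's implementation.
class MiniSitemap
  Link = Struct.new(:path, :changefreq, :lastmod)

  attr_reader :links

  def initialize
    @links = []
  end

  # The block is evaluated with self set to the sitemap, which is why
  # `add` can be called inside it without a receiver.
  def self.create(&block)
    sitemap = new
    sitemap.instance_eval(&block)
    sitemap.finalize
  end

  def add(path, changefreq: nil, lastmod: nil)
    @links << Link.new(path, changefreq, lastmod)
  end

  # Only reached after the whole block has run, so every link is still
  # held in memory at this point.
  def finalize
    @links
  end
end

links = MiniSitemap.create do
  add "/about-us", changefreq: "monthly"
  [1, 2, 3].each { |id| add "/products/#{id}", changefreq: "daily" }
end
```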
00:10:09.180 so now that we've seen how simple the code is now we can kind of consider
00:10:14.700 whether it's feasible to actually do this in Crystal and replace our existing Ruby infrastructure for it
00:10:22.440 so the first thing that we had to consider is what do we want to achieve well we want to make it fast
00:10:29.580 that's the obvious thing if it takes six hours and we have to do this at certain times because of that that's something
00:10:36.779 that can be a problem if we could improve that we can improve our scheduling we could run it multiple
00:10:43.080 times a day we could even run it immediately following product launches we tend to launch products in the
00:10:49.800 morning and the afternoon and we can just run it immediately following that
00:10:55.380 we can also reduce memory usage this would allow us to lower the requirements for our server infrastructure that does
00:11:02.459 this processing which would also help with our overall system flexibility
00:11:07.740 there are a couple of intangible benefits too the biggest one is that we can improve
00:11:12.959 long-term sustainability recurring tasks that will continue to grow over time will eventually become
00:11:19.680 problematic and as our business grows this task will grow also
00:11:24.959 so the question becomes do we deal with this now when it's still manageable or
00:11:30.180 do we deal with it when it becomes an emergency in the future and lastly
00:11:36.120 if we can remove code that isn't used anywhere that's going to help with
00:11:41.820 the maintainability of our Rails app itself and that's going to be primarily the
00:11:47.399 routing layer but it also is going to include some helper Logic for product links some controllers and specs that
00:11:53.820 have just kind of been left because we can't pull everything out and
00:12:00.120 it would be really nice for our cognitive load
00:12:06.720 if we could remove those things so now that we've established what we
00:12:12.899 want to achieve now we can consider whether Crystal would be feasible to achieve it
00:12:20.399 and the first things to examine are the scope of the problem and what tools are
00:12:25.560 necessary to solve it first we need to be able to access the database
00:12:31.200 we have all that data that we've talked about we need to be able to get to it
00:12:37.019 we also have some path helpers for routing and then obviously we have the actual
00:12:43.380 sitemap creation so then we take a look to see what tools
00:12:48.600 are available in the Crystal ecosystem to help build this the first thing is to find tooling for
00:12:55.440 sitemap generation because if there isn't any this project is going to become a lot larger than we're hoping
00:13:01.440 for and we wanted something straightforward that we can prototype and test quickly
00:13:07.500 luckily for us there is a tool called Sitemapper and it's fully featured as
00:13:12.540 well secondly while there are plenty of tools to access databases there is one
00:13:19.139 specific tool that implements the active record pattern and it feels very familiar for how we are used to interacting with
00:13:26.399 ActiveRecord this one is called Jennifer and lastly revisiting those path helpers
00:13:33.300 well there's only five of them and since they're not used by Rails anymore we can probably just implement them manually
00:13:41.639 so the biggest question becomes how difficult would it be to Port this over to Crystal
00:13:48.120 specifically with the sitemap generation logic
00:13:53.339 here's a reminder of what the sitemap generation code looks like in Ruby it's that big block with the most
00:14:00.060 important method being add adding a link to the sitemap list inside that block we
00:14:06.060 iterate through products sales designers etc.
00:14:12.660 well this is what it would look like if we did it in Crystal the green highlights here are the diff
00:14:18.480 between Ruby and Crystal we have to change the constant because we're using a different library with a
00:14:24.120 different name and now we add a block variable that we'll call builder
00:14:29.279 and now that add method is a method on the builder object and not a global method
00:14:35.279 but that's it all the code to iterate through models and read attributes is going to be the
00:14:40.560 same in Crystal as it is in Ruby that's because we're using that library
00:14:45.959 called Jennifer which is similar to ActiveRecord now we'll have to set that up but we'll
00:14:51.180 get there later and the last thing to look at is that add method because if that was different
00:14:57.839 then we would also have to update that but it takes the same options as the Ruby version does because those options
00:15:04.500 are passed directly to the generated output so just by looking at this it seems like
00:15:11.459 we have everything that we need to move forward it looks feasible it looks like we can do it with minimal changes to the
00:15:19.019 existing code now we just have to build a prototype
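The Crystal version described above passes an explicit builder into the block instead of calling `add` with no receiver. The same shape is easy to express in Ruby with `yield`. `MiniBuilder`, its `build` and `url_xml` class methods, the example host, and the date are hypothetical and are not Sitemapper's actual API. `url_xml` shows why the options can stay unchanged: `changefreq` and `lastmod` are element names from the sitemaps.org protocol, so they pass straight through to the output.

```ruby
# Hypothetical builder; not Sitemapper's real API. The block receives an
# explicit builder, mirroring the Crystal diff described in the talk.
class MiniBuilder
  Link = Struct.new(:loc, :changefreq, :lastmod)

  attr_reader :links

  def initialize
    @links = []
  end

  def add(loc, changefreq: nil, lastmod: nil)
    @links << Link.new(loc, changefreq, lastmod)
  end

  # Yields a fresh builder and returns the links it collected.
  def self.build
    builder = new
    yield builder
    builder.links
  end

  # changefreq and lastmod are sitemaps.org element names, so the options
  # map directly onto <url> children (XML escaping omitted for brevity).
  def self.url_xml(link)
    xml = "<url><loc>#{link.loc}</loc>"
    xml += "<changefreq>#{link.changefreq}</changefreq>" if link.changefreq
    xml += "<lastmod>#{link.lastmod}</lastmod>" if link.lastmod
    xml + "</url>"
  end
end

# Placeholder host and date.
links = MiniBuilder.build do |builder|
  builder.add "https://www.example.com/about-us", changefreq: "monthly"
  [1, 2].each do |id|
    builder.add "https://www.example.com/products/#{id}", changefreq: "daily", lastmod: "2022-11-01"
  end
end
```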
00:15:24.060 so let's build it well the first thing that we're going to look at is the database modeling we will
00:15:31.260 be using the Crystal shard Jennifer to accomplish this it has a similar query API to
00:15:37.440 ActiveRecord it includes scopes and associations
00:15:44.279 and our goal is to minimize changes to the sitemap generation code so we will set up our data models
00:15:50.519 similarly to how they have been in Rails to accomplish that so let's take a look at how we would do
00:15:55.620 that at the very beginning the class definition looks similar to
00:16:01.620 ActiveRecord models we inherit from a base class that Jennifer provides
00:16:07.199 however because Crystal is statically typed and compiled we need to provide
00:16:12.420 some type information for it so we tell it that our data has timestamps
00:16:18.000 and then we also provide a designer_id a taxon_id and then the
00:16:23.579 primary key which is just id
00:16:29.660 we then provide information about the associations through belongs_to designer and belongs_to taxon this is
00:16:36.779 just like we do in Rails and we also set up a single scope that we're going to use later
00:16:42.779 so now that we've set up our database models let's take a look at how we would interact with them
00:16:49.440 just like earlier this is going to look pretty familiar because it's the same as
00:16:54.480 ActiveRecord for the first example we have a LandingPage class with a has_designer scope
00:17:02.040 which is what we just had in the previous slide and we're going to tell it to eager load the associations for
00:17:07.980 designer and taxon in the second example we have Spree::Product and we're going to call a scope
00:17:14.760 called available on that and in the third example we see how we can iterate through the data and access
00:17:21.179 the attributes it's just the same as we do in Ruby we have sale we call a scope
00:17:27.240 called active on it and then we use a find_each method to gracefully iterate through
00:17:33.600 large data sets that find_each is going to be the same as the one ActiveRecord has where it's going to load a thousand
00:17:39.539 records and provide them one by one for us to work through we're going to access the attributes the
00:17:46.500 same that we do sale.id and sale.permalink
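As a rough illustration of what `find_each` does, here is an in-memory Ruby sketch of keyset batching: fetch up to `batch_size` rows whose id is greater than the last one seen, yield them one by one, and repeat until a batch comes back empty. ActiveRecord's `find_each` defaults to batches of 1,000 ordered by primary key; the `Sale` struct and `fetch_batch` helper here are hypothetical stand-ins for a real table and query, and this is not Jennifer's implementation.

```ruby
Sale = Struct.new(:id, :permalink)

# Hypothetical stand-in for a query like
#   SELECT * FROM sales WHERE id > ? ORDER BY id LIMIT ?
def fetch_batch(rows, after_id, limit)
  rows.select { |row| row.id > after_id }.sort_by(&:id).first(limit)
end

# Yields records one at a time while holding only one batch in memory.
# Assumes positive integer primary keys, so starting after 0 covers every row.
def find_each(rows, batch_size: 1000)
  last_id = 0
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    break if batch.empty?
    batch.each { |record| yield record }
    last_id = batch.last.id
  end
end

sales = (1..2_500).map { |i| Sale.new(i, "sale-#{i}") }
seen = []
find_each(sales) { |sale| seen << sale.permalink }
```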
00:17:52.919 so now that we've figured out how to do our data modeling let's take a look at how we handle sitemap generation
00:17:59.039 again the Crystal shard for this is Sitemapper it has a similar API to the Ruby
00:18:04.080 gem that we have been using called sitemap_generator it has the same configuration options
00:18:10.980 and it has the same functionality which is mostly compression of the output data
00:18:17.039 and also to upload to S3 it can also ping search engines to tell them that
00:18:22.860 we've updated our sitemaps and again our goal is to
00:18:28.500 minimize code changes this time it's for the sitemap generation code
00:18:35.580 so showing this slide from earlier those are very minimal changes that you're going to have to make
00:18:41.340 we have to change the constant we add the block variable and then we call add on that block
00:18:47.280 variable but there's a little bit of supporting code too which I kind of touched on
00:18:52.980 earlier so let's take a peek at that so I've highlighted three methods that
00:18:58.140 we haven't seen the source for yet there's fetch_products that's a method that already exists in
00:19:04.799 our current sitemap generation and it just handles a few different scopes for
00:19:09.960 what products we want to pull up and generate links for and then we also have product_path and
00:19:17.039 flash_sale_path in Rails those are just the path helpers that we get for routing
00:19:22.500 in here we can just create them manually and this is what it's going to look like
00:19:29.820 since the delivery of content isn't handled by
00:19:35.220 Rails we don't need to maintain the same flexibility that we have using the
00:19:40.380 routing helpers we can just hard code this in because if it did change it would need to be changed everywhere
00:19:45.660 anyways and like I said there's fetch_products which looks just like typical loading
00:19:51.600 data in Rails so going back to the generation code it
00:19:56.820 looks like it's just about ready to go like I mentioned we have a few more classes that we iterate through and a
00:20:03.000 few more static links but otherwise it looks just like this
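Since Rails no longer serves these pages, the talk replaces the route helpers with hand-written methods. A minimal Ruby sketch of that idea follows. The `/products/...` and `/sales/...` URL formats, the struct fields, `BASE_URL`, and the single filter inside `fetch_products` are all hypothetical placeholders, because the real routes and scopes aren't shown.

```ruby
# Hypothetical record shapes; the real models carry many more columns.
Product = Struct.new(:id, :slug, :available)
FlashSale = Struct.new(:id, :permalink)

BASE_URL = "https://www.example.com" # placeholder host, not the real site

# Hand-written replacements for Rails route helpers.
# The URL formats are illustrative assumptions.
def product_path(product)
  "/products/#{product.slug}"
end

def flash_sale_path(sale)
  "/sales/#{sale.permalink}"
end

# Hypothetical stand-in for fetch_products: it only keeps available
# products, whereas the real method applies several scopes.
def fetch_products(products)
  products.select(&:available)
end

products = [Product.new(1, "leather-tote-1", true), Product.new(2, "silk-scarf-2", false)]
urls = fetch_products(products).map { |product| BASE_URL + product_path(product) }
```

Hard-coding is reasonable here because, as noted above, a URL change would have to be made everywhere anyway.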
00:20:09.000 so after all of this does it actually work well yes it does there is a caveat
00:20:15.900 though but let's look at the results the first thing that we discover is it's
00:20:21.660 incredibly fast working with those same 18 million records it finishes in about 15 minutes
00:20:28.380 this is a huge improvement from the six hours that we're used to this means that this is a viable idea
00:20:34.799 that we can continue to work on and fine-tune however it does suffer from the same
00:20:40.799 memory leak as the Ruby version the Crystal library also accumulates all the links until the end of the processing
00:20:47.280 block and in fact it was finding this here that made me realize it and go find it
00:20:53.880 in the Ruby gem itself but maybe we can fix this like I said earlier contributing to
00:21:00.660 Crystal code can be very straightforward because it's so similar to Ruby and that's what we're familiar with
00:21:06.780 so as I'm thinking through what can we do to make this a little better I remember sitemap files can only
00:21:14.100 contain 50,000 links and you have to split them into more files when you go
00:21:19.260 past that so maybe we can write out the files as we go and then we don't need to keep all
00:21:25.380 of the links in memory the entire time so I dove into the Crystal code for
00:21:31.860 Sitemapper and eventually I got it working to write the files and reset the links as it went through processing
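A minimal Ruby sketch of the idea behind that change, not the code from the actual PR: buffer `<url>` entries, and whenever the buffer reaches the 50,000-URL per-file limit from the sitemaps.org protocol, write that file and clear the buffer. With just over 18 million links that means at least 360 files (18,000,000 / 50,000), but only one file's worth of links is held in memory at a time. The class name, the `sink` callback (standing in for writing to disk or uploading to S3), and the `sitemapN.xml` naming are assumptions; gzip compression and the sitemap index file are left out.

```ruby
# Hypothetical sketch: flush a sitemap file every max_links links instead of
# holding every link until the end of processing.
class StreamingSitemapWriter
  MAX_LINKS_PER_FILE = 50_000 # per-file URL limit in the sitemaps.org protocol
  XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

  attr_reader :files

  # `sink` is a hypothetical callback receiving (filename, xml).
  def initialize(max_links: MAX_LINKS_PER_FILE, &sink)
    @max_links = max_links
    @sink = sink
    @buffer = []
    @files = []
  end

  def add(loc, changefreq: nil, lastmod: nil)
    @buffer << url_entry(loc, changefreq, lastmod)
    flush if @buffer.size >= @max_links
  end

  # Writes any remaining links and returns the generated file names.
  def finish
    flush unless @buffer.empty?
    @files
  end

  private

  def flush
    name = "sitemap#{@files.size + 1}.xml"
    xml = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
          %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n) +
          @buffer.join +
          "</urlset>\n"
    @sink.call(name, xml)
    @files << name
    @buffer.clear # drop flushed links so memory stays roughly flat
  end

  def url_entry(loc, changefreq, lastmod)
    escaped = loc.gsub(/[&<>"']/, XML_ESCAPES)
    entry = "  <url><loc>#{escaped}</loc>"
    entry += "<changefreq>#{changefreq}</changefreq>" if changefreq
    entry += "<lastmod>#{lastmod}</lastmod>" if lastmod
    entry + "</url>\n"
  end
end

# Usage with a tiny limit so the splitting is visible.
written = {}
writer = StreamingSitemapWriter.new(max_links: 2) { |name, xml| written[name] = xml }
5.times { |i| writer.add("https://www.example.com/products/#{i}", changefreq: "daily") }
files = writer.finish
```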
00:21:38.159 I submitted a PR and after some conversation and updates with the maintainer
00:21:43.440 we merged it and then with this functionality working we can rerun the generator
00:21:49.440 and we don't have the memory leak anymore the memory usage is so much lower
00:21:54.539 and it's stable throughout the entire processing this is a huge win for us in fact we
00:22:02.340 couldn't previously run sitemap generation on our developer machines with a production-size data set and now we can
00:22:08.520 do it in 15 minutes this is with 18 million products and a
00:22:13.860 few other classes that I've mentioned before and for that generation time that 892 seconds is
00:22:19.740 14 minutes and 52 seconds and again this is coming from six hours
00:22:25.799 previously in production and this example is running on a developer
00:22:31.679 machine so we might even get a bigger boost in production
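A quick back-of-the-envelope check of the numbers quoted above, using only figures from the talk (six hours, 892 seconds, just over 18 million links). Because the 892 seconds came from a developer machine while the six hours came from production, the ratio is a rough comparison rather than a controlled benchmark, and the throughput is a lower bound since the link count is "over" 18 million.

```latex
\[
892\,\text{s} = 14 \times 60\,\text{s} + 52\,\text{s} = 14\,\text{min}\ 52\,\text{s}
\]
\[
\frac{6 \times 3600\,\text{s}}{892\,\text{s}} = \frac{21\,600}{892} \approx 24.2
\qquad
\frac{18 \times 10^{6}\ \text{links}}{892\,\text{s}} \approx 2.0 \times 10^{4}\ \text{links/s}
\]
```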
00:22:37.559 so let's have an overview of the final solution because it's actually really cool
00:22:43.860 first what went into this well it took about one day just to get a functional prototype
00:22:49.740 a functional prototype meaning we could run the generation it would complete and
00:22:55.200 the output files matched what rails was generating already then it took another day to fix the
00:23:02.039 memory leak and optimize some code for Crystal and it took one PR to fix that
00:23:08.159 and all in all it took less time working on this than I spent preparing for this talk
00:23:13.919 which is the really exciting thing and also in the end the code is
00:23:20.340 incredibly minimal too there's two main files with code and all combined it's about 185 lines
00:23:28.200 so small that I can fit it on there albeit in very microscopic print
00:23:33.960 there are 90 lines for this generation process which is nearly identical to the existing Ruby code
00:23:41.760 and there's about 95 lines for the model
00:23:46.919 definitions and this is for the entire solution as long as it has database access this
00:23:54.240 can run completely independently of the rest of our infrastructure all in 185 lines
00:24:00.840 we don't need all of the Rails infrastructure now
00:24:06.419 and I just want to take a moment to recap the creative process involved here
00:24:14.159 the first thing to do is examine what
00:24:20.039 the problem is and realize that it's Loosely coupled to rails and that means it can potentially
00:24:27.120 be extracted into another service then we have to consider how to solve it
00:24:32.400 we're already familiar with Ruby and Crystal and we know that similar Crystal tooling exists that would allow us to
00:24:39.600 test this without wasting a lot of time like we saw earlier and lastly when needed we can utilize
00:24:46.860 our existing Ruby knowledge to contribute back to Crystal and the last thing is just a general
00:24:53.400 sense of fun chasing down problems there's something really exciting about taking difficult problems and finding a
00:24:59.820 solution for them so the last thing is I just want to take
00:25:05.580 a moment to thank everyone thank you to RubyConf for giving me the opportunity to present thank you to all
00:25:11.760 of you this is my first time speaking at a conference and I really appreciate all of you being here with me today
00:25:17.159 and thank you to The RealReal for supporting me speaking here and I do have to say I don't have to but
00:25:24.059 I'm going to we are actively hiring even in the current times so come talk
00:25:29.940 to me if solving hard problems sounds exciting to you
00:25:35.460 there are plenty of resources out there for Crystal and here are just a few of them
00:25:41.360 Crystal's website crystal-lang.org is really good they have a ton of
00:25:46.440 documentation lots of resources for people that second link is actually an
00:25:54.260 interactive Crystal interpreter online that you can punch code into and run it
00:25:59.520 and see what happens there's a Crystal for Rubyists website with a variety of resources for
00:26:06.419 learning about Crystal and then there's even a page in Crystal's docs that will list popular
00:26:12.240 Ruby gems and their Crystal equivalent and so with that are there any questions
00:26:22.320 okay so the first question is why is Crystal so much faster than Ruby and the second
00:26:28.620 question is why does it look so similar um the first question is because it's a
00:26:35.640 compiled language and it's designed to run at the speed of C
00:26:41.640 as for why it looks so similar to Ruby it basically started out as
00:26:49.559 you know it was created by people who were Ruby developers similar to how Elixir was founded by former Rails
00:26:56.640 Developers and it's basically designed to be
00:27:02.520 familiar to humans and easy to write just like Ruby is giving us all those benefits while being highly performant
00:27:09.840 like C are there any others
00:27:15.000 yeah so the question is I mentioned how I made changes to the Crystal library to
00:27:20.580 write out the files as we were working through them and whether I went back to the Ruby
00:27:27.600 library to try that I didn't go back and try that I basically tried to run the
00:27:33.179 Ruby version on my development machine and it was so slow with the large data
00:27:38.760 sets that even if I could fix that it was still going to take hours to run on my
00:27:45.240 machine so that would be something that would be good for that Library most
00:27:50.400 likely but it still wouldn't have solved our problems
00:27:55.620 and then if no one else has one I'll get back to you so his question was is there any
00:28:02.400 precedent for triggering Crystal from Ruby
00:28:07.860 kind of a few years ago there was a lot of work about writing native
00:28:13.679 extensions for Ruby with Crystal I did a lot of work on it and a couple of the
00:28:19.799 Crystal developers were experimenting with it too
00:28:25.200 the way that the Crystal language kind of changed as it got closer to 1.0 made that more
00:28:32.159 difficult and nobody's really worked on that in the last few years I do think it
00:28:38.700 is a really cool idea because we do have some math intensive or data
00:28:44.700 processing intensive things in Ruby that could get a huge speedup but at this current time there's
00:28:51.240 nothing for that right now but it's certainly been looked at in the past
00:28:56.460 oh so the question was if I have a production environment benchmark for the Crystal builds I don't
00:29:05.159 have that but I do have some numbers from the Ruby version on the production data set on a developer
00:29:11.340 machine for a rough comparison and in one hour
00:29:16.380 on a developer machine I think it got through like 1 million records
00:29:22.140 so we're looking at 18 hours on a development machine unfortunately with a data set this big
00:29:29.779 our machines end up killing the process before it really gets far enough to have a huge benchmark
00:29:39.120 so that's why I'm optimistic that it might be even faster than 15 minutes in production
00:29:49.320 anyone else am I missing anyone okay well again thank you all very much