00:00:00.000
ready for takeoff
00:00:17.119
hello everyone I'm Paul Hoffer and today I'm going to talk about creative problem
00:00:22.320
solving using the crystal programming language a little bit about my background I work
00:00:29.460
full time with Rails for a company called The RealReal I typically focus on performance and
00:00:36.360
architecture I do have some prior experience with Crystal
00:00:41.420
including connecting it with Ruby and also converting Ruby code to Crystal
00:00:47.460
if you have ever researched using Crystal to write native extensions in Ruby you've actually probably seen one
00:00:53.879
of my older projects and the main thing is I just like to experiment with big problems and see if
00:01:00.780
we can find solutions to them now let me talk about The RealReal for
00:01:06.000
a moment I'm just kind of curious who has ever heard of it as a consumer separate from
00:01:12.060
being in the tech world before a few hands okay that's what I was hoping for
00:01:18.659
at least a couple we do have commercials it depends on
00:01:24.060
what television you watch you may see a lot or you may see none of them so we are an e-commerce platform we do
00:01:31.920
luxury consignment so we sell designer bags designer watches clothing and many
00:01:37.920
more categories most of our products come from other consumers who are reselling their items
00:01:44.340
while a small amount comes from retailers and designers directly
00:01:49.680
our architecture consists of a rails monolith and numerous Phoenix front-end
00:01:55.259
apps which consume data from that rails app we are slowly working to extract
00:02:00.960
services from the rails app which kind of gives us the freedom to look at other Technologies when we need a creative
00:02:07.740
solution to something so now I'm going to give just a quick
00:02:13.379
introduction to Crystal I think that there was a talk at RubyConf Mini about Crystal and I'm really
00:02:20.280
excited to see that when the videos come out but for Crystal this quote is straight
00:02:25.680
from their website a language for humans and computers I think that's actually a really cool
00:02:31.920
way of thinking about it for humans we get Ruby's efficiency for
00:02:37.379
writing code because Crystal has a ruby-like syntax and for computers we get C's efficiency
00:02:44.580
for running the code because crystal is compiled with llvm
00:02:50.760
but rather than talking about it let's take a look at some Crystal code if we look at this it probably looks
00:02:57.120
very familiar it looks just like Ruby it starts off with a range from one to
00:03:02.400
nine that we iterate through we check to see if the number is even and if it is we print that or we print
00:03:09.720
that it's odd after that we run a block three times we take a random number up to 20
00:03:16.920
we divide it by four and see if the remainder is zero if so we print that
00:03:22.440
out otherwise we check to see if it's a single digit number or not
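
The slide itself is not part of this transcript, so the following is only a reconstruction of the kind of sample being described. The method names and the printed messages are assumptions. It is written as Ruby and sticks to features that Crystal shares, so it should also compile as Crystal, although that has not been checked here.

```ruby
# Reconstruction of the sample described above; names and messages are assumed.

# Labels a number as even or odd.
def parity(number)
  number.even? ? "even" : "odd"
end

# Checks divisibility by four first, then falls back to a digit-count check.
def classify(number)
  if number % 4 == 0
    "#{number} is divisible by 4"
  elsif number < 10
    "#{number} is a single-digit number"
  else
    "#{number} is a double-digit number"
  end
end

# Iterate through a range from one to nine.
(1..9).each do |i|
  puts "#{i} is #{parity(i)}"
end

# Run a block three times with a random number below 20.
3.times do
  puts classify(rand(20))
end
```

The last three lines of output differ from run to run because of rand.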
00:03:27.540
and then we can see the output of this right here we see the numbers one through nine and then we see the three
00:03:33.360
random numbers but guess what we can run this exact same code as Ruby
00:03:40.140
code and it will give us the exact same output now there's one thing I do want to be
00:03:45.900
clear on Crystal isn't intended to be perfectly compatible with Ruby and in a bigger program you can't just
00:03:52.799
copy and paste Ruby code and have it work perfectly but the language is very similar and it
00:03:58.500
forms a very powerful foundation for us further let's talk about the crystal
00:04:05.580
ecosystem as a whole for a moment the first thing to think about are called shards
00:04:11.580
shards are the crystal equivalent of ruby gems however the tooling for shards also
00:04:17.880
includes functionality similar to bundler so it's really kind of rubygems plus
00:04:22.979
bundler together there is an awesome Crystal list that contains a wide variety of shards that
00:04:29.580
are available and it's pretty fun to go through there are shards for web Frameworks
00:04:35.100
similar to rails and Sinatra there are database tools similar to activerecord
00:04:40.520
and also similar to Ecto for the Elixir fans in here there is a full port of Sidekiq there's
00:04:47.280
tools for mailers and there's tools for most common problems sometimes shards can even be ported from
00:04:53.759
an existing Ruby gem a few years ago I created a shard that was a port of Active Support's inflector
00:05:01.020
module this module is what handles making words plural or singular snake case or camel
00:05:08.220
case and a variety of other string manipulations that active support gives us
00:05:13.320
it was surprisingly easy to complete with a large amount of code that did not
00:05:18.600
need to be modified I think about 80 percent of code could be directly copied and then the rest the
00:05:25.560
other 20 percent had to be updated to work with Crystal so what does that mean for us
00:05:32.400
well it means that Crystal code can be very easy to understand and it can feel very familiar to write
00:05:38.220
there's likely an existing library for our specific use cases contributing to existing projects can be
00:05:45.660
relatively straightforward and all in all I think Crystal is a great tool for Rubyists to look into
00:05:53.639
so let's take a look at our current problem that we're dealing with at The RealReal well one of them which is
00:06:00.240
sitemap generation we generate links for all of our
00:06:05.820
shopping pages to be indexed in search engines this includes every sale every
00:06:11.280
product category designer promotion and every product that's currently for sale
00:06:16.800
we also include some business related links such as about us press Pages
00:06:22.039
shipping return info pretty much all the standard things that you would want to
00:06:27.180
be indexed in search engines so what makes that so difficult for us
00:06:32.220
well we generate over 18 million links almost all of which are products currently for sale
00:06:38.220
this process takes about six hours to run on the existing Ruby code to do this
00:06:44.759
and because we are adding new products every day to our site we have to update our sitemaps daily
00:06:50.880
and because this takes so long we have to run it overnight and that's the only time that it will work in our infrastructure
00:06:57.360
Additionally the process is very memory hungry the tool that we use for Generation actually accumulates all the
00:07:04.020
links in an array until the end of processing and that's when it generates all the sitemaps and clears
00:07:11.220
everything out of memory we're also loading the entire objects out of our database instead of just the
00:07:18.300
fields that we need for sitemap generation changing that would improve memory usage a little but it still doesn't solve it
00:07:24.720
because we're still accumulating links in an array 18 million of those links
00:07:32.280
and one last note regarding our process is that our
00:07:38.160
shopping front end for customers isn't even delivered by the rails monolith anymore it's delivered by one of those Elixir
00:07:45.300
apps that means that some of our rails code isn't necessary anymore
00:07:50.400
it's only used in sitemap generation and if we remove sitemap generation we
00:07:55.680
can go clean out a decent amount of dead code we also aren't adding maintenance
00:08:02.039
complexity when we switch this to a new service because that complexity has
00:08:07.800
already been distributed among different services so in essence there's no reason that
00:08:14.160
Rails has to do sitemap generation there's not really much business logic and it makes it a perfect prototype to
00:08:21.419
see what we can do separate from Rails so the question is is sitemap really
00:08:29.220
that complex like it kind of sounds like it but the answer is no it's really not
00:08:35.039
it's actually incredibly simple and it's simple enough that we can fit an example on a single slide that I believe is
00:08:42.060
pretty readable to everyone this is what it looks like in Ruby
00:08:47.180
starting from the beginning the gem is called sitemap_generator that's the
00:08:52.500
gem the class that we use to generate sitemaps is the Sitemap class
00:08:57.839
we call the create method on that class and we pass it a block this is pretty typical Ruby DSL that
00:09:04.740
we're looking at so far inside that block there's one main method that gets used called add and
00:09:11.100
that's to add the links to the list that gets stored for later it accepts some options and we use
00:09:17.220
options for page change frequency and last modification time those are things that are heavily used by search engines
00:09:26.100
we then iterate through various models I only show two loops on here but we
00:09:31.620
have five models that we Loop through and we create a link object for each one
00:09:36.959
and then we also have some business related links like I mentioned earlier
00:09:42.180
when this block ends that's when it generates all the sitemap data so as it
00:09:47.519
goes it accumulates all those links up to over 18 million
00:09:54.959
and like I said we have a little bit more that goes into this but this is pretty simple like it's
00:10:01.920
you add a link you loop through some products you add links for each of them that's pretty straightforward
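
The slide is not in the transcript, so here is a small self-contained sketch of the shape just described: a create call that takes a block, an add method that is callable inside it, and links that accumulate until the block ends. ToySitemap, Product, build_sitemap and the URL paths are hypothetical stand-ins and are not the sitemap_generator gem or the company's code.

```ruby
# Hypothetical stand-in for a block-based sitemap DSL (not the real gem).
class ToySitemap
  attr_reader :links

  def self.create(&block)
    sitemap = new
    sitemap.instance_eval(&block) # lets the block call `add` without a receiver
    sitemap.finalize              # output only happens after the block ends
    sitemap
  end

  def initialize
    @links = []
  end

  # Stores the link for later; every link stays in memory until finalize.
  def add(path, changefreq: "weekly", lastmod: nil)
    @links << { path: path, changefreq: changefreq, lastmod: lastmod }
  end

  def finalize
    puts "writing #{@links.size} links"
  end
end

# Stand-in for a database model.
Product = Struct.new(:id, :updated_at)

def build_sitemap(products)
  ToySitemap.create do
    add "/about", changefreq: "monthly"
    products.each do |product|
      add "/products/#{product.id}", changefreq: "daily", lastmod: product.updated_at
    end
  end
end

build_sitemap([Product.new(1, Time.at(0)), Product.new(2, Time.at(60))])
```

With two products and one static link the array holds three entries when finalize runs; with millions of products the same array is what drives the memory usage described earlier.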
00:10:09.180
so now that we've seen how simple code can be now we can kind of consider
00:10:14.700
whether it's feasible to actually do this in Crystal and replace our existing Ruby infrastructure for it
00:10:22.440
so the first thing that we had to consider is what do we want to achieve well we want to make it fast
00:10:29.580
that's the obvious thing if it takes six hours and we have to do this at certain times because of that that's something
00:10:36.779
that can be a problem if we could improve that we can improve our scheduling we could run it multiple
00:10:43.080
times a day we could even run it immediately following product launches we tend to launch products in the
00:10:49.800
morning and the afternoon and we can just run it immediately following that
00:10:55.380
we can also reduce memory usage this would allow us to lower the requirements for our server infrastructure that does
00:11:02.459
this processing which would also help with our overall system flexibility
00:11:07.740
there are a couple intangible benefits though too the biggest one is that we can improve
00:11:12.959
long-term sustainability recurring tasks that will continue to grow over time will eventually become
00:11:19.680
problematic and as our business grows this task will grow also
00:11:24.959
so the question becomes do we deal with this now when it's still manageable or
00:11:30.180
do we deal with it when it becomes an emergency in the future and lastly
00:11:36.120
if we can remove code that isn't used anywhere that's going to help with
00:11:41.820
that maintainability of our rails app itself and that's going to be primarily the
00:11:47.399
routing layer but it also is going to include some helper logic for product links some controllers and specs that
00:11:53.820
have just kind of been left because we can't pull everything out and
00:12:00.120
it would be really nice for you know our cognitive load
00:12:06.720
if we could remove those things so now that we've established what we
00:12:12.899
want to achieve now we can consider whether Crystal would be feasible to achieve it
00:12:20.399
and the first things to examine are the scope of the problem and what tools are
00:12:25.560
necessary to solve it first we need to be able to access the database
00:12:31.200
we have all that data that we've talked about we need to be able to get to it
00:12:37.019
we also have some path helpers for routing and then obviously we have the actual
00:12:43.380
sitemap creation so then we take a look to see what tools
00:12:48.600
are available in the crystal ecosystem to help build this the first thing is to find tooling for
00:12:55.440
sitemap generation because if there isn't that this project is going to become a lot larger than we're hoping
00:13:01.440
for and we wanted something straightforward that we can prototype and test quickly
00:13:07.500
luckily for us there is a tool called Sitemapper and it's fully featured as
00:13:12.540
well secondly while there are plenty of tools to access databases there is one
00:13:19.139
specific tool that implements the Active Record pattern and it feels very familiar for how we are used to interacting with
00:13:26.399
activerecord this one is called Jennifer and lastly revisiting those path helpers
00:13:33.300
well there's only five of them and since they're not used by rails anymore we can probably just Implement them manually
00:13:41.639
so the biggest question becomes how difficult would it be to Port this over to Crystal
00:13:48.120
specifically with the sitemap generation logic
00:13:53.339
here's a reminder of what the sitemap generation code looks like in Ruby it's that big block with the most
00:14:00.060
important method being add adding a link to the sitemap list inside that block we
00:14:06.060
iterate through products sales designers Etc
00:14:12.660
well this is what it would look like if we did it in Crystal the green highlights here are the diff
00:14:18.480
between Ruby and Crystal we have to change the constant because we're using a different library with a
00:14:24.120
different name and now we add a block variable that we'll call builder
00:14:29.279
and now that add method is a method on the builder object and not a global method
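
To make that diff concrete, here is a sketch of the second shape, where the block receives a builder and add is called on it. It is written in Ruby with hypothetical stand-ins (ToySitemapper, ToyBuilder, SaleRecord, and the /sales/ path); it is not the Sitemapper shard's actual API, which is not shown in this transcript.

```ruby
# Hypothetical stand-ins illustrating the builder-style block (not the real shard).
class ToyBuilder
  attr_reader :links

  def initialize
    @links = []
  end

  def add(path, changefreq: "weekly", lastmod: nil)
    @links << { path: path, changefreq: changefreq, lastmod: lastmod }
  end
end

module ToySitemapper
  # Yields a builder instead of evaluating the block on a hidden receiver.
  def self.build
    builder = ToyBuilder.new
    yield builder
    builder
  end
end

# Stand-in for a database model.
SaleRecord = Struct.new(:id, :permalink, :updated_at)

def build_with_builder(sales)
  ToySitemapper.build do |builder|
    builder.add "/about", changefreq: "monthly"
    sales.each do |sale|
      builder.add "/sales/#{sale.permalink}", changefreq: "daily", lastmod: sale.updated_at
    end
  end
end

result = build_with_builder([SaleRecord.new(1, "summer", Time.at(0))])
puts result.links.map { |link| link[:path] }.inspect
```

The loop body and the options passed to add are unchanged compared with the receiver-less form; only the constant, the block parameter, and the explicit receiver differ.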
00:14:35.279
but that's it all the code to iterate through models and read attributes is going to be the
00:14:40.560
same in Crystal as it is in Ruby that's because we're using that Library
00:14:45.959
called Jennifer which is similar to activerecord now we'll have to set that up but we'll
00:14:51.180
get there later and the last thing to look at is that add method because if that was different
00:14:57.839
then we would also have to update that but it takes the same options as the Ruby version does because those options
00:15:04.500
are passed directly to the generated output so just by looking at this it seems like
00:15:11.459
we have everything that we need to move forward it looks feasible it looks like we can do it with minimal changes to the
00:15:19.019
existing code now we just have to build a prototype
00:15:24.060
so let's build it well the first thing that we're going to look at is the database modeling we will
00:15:31.260
be using the Crystal shard Jennifer to accomplish this it has a similar query API to
00:15:37.440
activerecord it includes Scopes and associations
00:15:44.279
and our goal is to minimize changes to the sitemap generation code so we will set up our data models
00:15:50.519
similarly to how they have been in rails to accomplish that so let's take a look at how we would do
00:15:55.620
that at the very beginning the class definition looks similar to active
00:16:01.620
record models we inherit from a base class that Jennifer provides
00:16:07.199
however because crystal is strongly typed and compiled we need to provide
00:16:12.420
some type information for it so we tell it that our data has time stamps
00:16:18.000
and then we also provide a designer id taxon id and then the
00:16:23.579
primary key for just ID
00:16:29.660
we then provide information about the associations through belongs_to designer and belongs_to taxon this is
00:16:36.779
just like we do in rails and we also set up a single scope that we're going to use later
00:16:42.779
so now that we've set up our database models let's take a look at how we would interact with them
00:16:49.440
just like earlier this is going to look pretty familiar because it's the same as
00:16:54.480
Active Record for the first example we have a class LandingPage it has a scope has_designer
00:17:02.040
which is what we just had in the previous slide and we're going to tell it to eager load the associations for
00:17:07.980
designer and taxon in the second example we have a spree product and we're going to call a scope
00:17:14.760
called available on that and in the third example we see how we can iterate through the data and access
00:17:21.179
the attributes it's just the same as we do in Ruby we have sale we call a scope
00:17:27.240
called active on it and then we use a find_each method to gracefully iterate through
00:17:33.600
large data sets that find_each is going to be the same that Active Record has where it's going to load a thousand
00:17:39.539
records and provide them one by one for us to work through we're going to access the attributes the
00:17:46.500
same way that we do sale.id and sale.permalink
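
The batching idea behind find_each can be sketched in a few lines of plain Ruby. ToyTable and Sale are hypothetical in-memory stand-ins, and slicing by offset is a simplification; Active Record's find_each walks the table by primary key instead, and Jennifer's internals are not covered in this talk.

```ruby
# Hypothetical in-memory stand-in for a table that is read in batches.
class ToyTable
  attr_reader :batches_loaded

  def initialize(rows)
    @rows = rows
    @batches_loaded = 0
  end

  # Loads up to batch_size rows at a time and yields them one by one,
  # so the caller never holds more than one batch at once.
  def find_each(batch_size: 1000)
    offset = 0
    loop do
      batch = @rows[offset, batch_size] || []
      break if batch.empty?
      @batches_loaded += 1
      batch.each { |row| yield row }
      offset += batch_size
    end
  end
end

Sale = Struct.new(:id, :permalink)

sales = ToyTable.new((1..2500).map { |id| Sale.new(id, "sale-#{id}") })
count = 0
sales.find_each { |sale| count += 1 if sale.permalink }
puts "#{count} rows in #{sales.batches_loaded} batches" # 2500 rows in 3 batches
```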
00:17:52.919
so now that we've figured out how to do our data modeling let's take a look at how we handle site map generation
00:17:59.039
again the Crystal shard for this is Sitemapper it has a similar API to the Ruby
00:18:04.080
gem that we have been using called sitemap_generator it has the same configuration options
00:18:10.980
and it has the same functionality which is mostly compression of the output data
00:18:17.039
and also to upload to S3 it can also ping search engines to tell them that
00:18:22.860
we've updated our sitemaps and again our goal is to
00:18:28.500
minimize code changes this time it's for the sitemap generation code
00:18:35.580
so showing this slide from earlier that's very minimal changes that you're going to have to see
00:18:41.340
we have to change the constant we add the block variable and then we call add on that block
00:18:47.280
variable but there's a little bit of supporting code too which I kind of touched on
00:18:52.980
earlier so let's take a peek at that so I've highlighted three methods that
00:18:58.140
we haven't seen the source for yet there's fetch_products that's a method that already exists in
00:19:04.799
our current sitemap generation and it just handles a few different scopes for
00:19:09.960
what products we want to pull up and generate links for and then we also have product_path and
00:19:17.039
flash_sale_path in Rails those are just the path helpers that we get for routing
00:19:22.500
in here we can just create them manually and this is what it's going to look like
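
The slide is not in the transcript, so the following only sketches what hand-written path helpers could look like. The URL shapes and the Item struct are assumptions; the real routes are not given in the talk.

```ruby
# Hand-written replacements for routing helpers. URL shapes are assumed.
Item = Struct.new(:id, :permalink)

def product_path(product)
  "/products/#{product.permalink}"
end

def flash_sale_path(sale)
  "/sales/#{sale.permalink}"
end

puts product_path(Item.new(1, "vintage-watch"))
puts flash_sale_path(Item.new(2, "summer-event"))
```

Plain string interpolation like this reads the same in Ruby and Crystal, which is part of why only five helpers were cheap to reimplement.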
00:19:29.820
since the delivery of content isn't handled by
00:19:35.220
rails we don't need to maintain the same flexibility that we have using the
00:19:40.380
routing helpers we can just hard code this in because if it did change it would need to be changed everywhere
00:19:45.660
anyways and like I said there's fetch_products which looks just like typical loading
00:19:51.600
data in rails so going back to the generation code it
00:19:56.820
looks like it's just about ready to go like I mentioned we have a few more classes that we iterate through and a
00:20:03.000
few more static links but otherwise it looks just like this
00:20:09.000
so after all of this does it actually work well yes it does there is a caveat
00:20:15.900
though but let's look at the results the first thing that we discover is it's
00:20:21.660
incredibly fast working with those same 18 million records it finishes in about 15 minutes
00:20:28.380
this is a huge improvement from the six hours that we're used to this means that this is a viable idea
00:20:34.799
that we can continue to work on and fine-tune however it does suffer from the same
00:20:40.799
memory leak as the Ruby version the Crystal library also accumulates all the links until the end of the processing
00:20:47.280
block and in fact it's finding this here that made me realize it and go find it
00:20:53.880
in the ruby gem itself but maybe we can fix this like I said earlier contributing to
00:21:00.660
Crystal code can be very straightforward because it's so similar to Ruby and that's what we're familiar with
00:21:06.780
so as I'm thinking through what can we do to make this a little better I remember sitemap files can only
00:21:14.100
contain 50,000 links and you have to split them into more files when you go
00:21:19.260
past that so maybe we can write out the files as we go and then we don't need to keep all
00:21:25.380
of the links in memory the entire time so I dove into the Crystal code for
00:21:31.860
Sitemapper and eventually I got it working to write the files and reset the links as it went through processing
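
The write-as-you-go idea can be sketched as follows. This is not the code from the merged PR, which is not shown in the talk; StreamingSitemap, the file naming, and the example.com host are assumptions, and XML escaping is left out to keep the sketch short.

```ruby
require "tmpdir"

# Hypothetical sketch: flush links to a file whenever the per-file limit is hit,
# so memory holds at most one file's worth of links.
class StreamingSitemap
  MAX_LINKS_PER_FILE = 50_000 # limit from the sitemap protocol

  attr_reader :files

  def initialize(dir, max_links: MAX_LINKS_PER_FILE)
    @dir = dir
    @max_links = max_links
    @links = []
    @files = []
  end

  def add(path)
    @links << path
    flush if @links.size >= @max_links
  end

  # Writes whatever is left over and returns the list of files written.
  def finish
    flush unless @links.empty?
    @files
  end

  private

  def flush
    file = File.join(@dir, "sitemap#{@files.size + 1}.xml")
    File.open(file, "w") do |io|
      io.puts %(<?xml version="1.0" encoding="UTF-8"?>)
      io.puts %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">)
      @links.each { |path| io.puts "  <url><loc>https://example.com#{path}</loc></url>" }
      io.puts "</urlset>"
    end
    @files << file
    @links.clear # reset the buffer so memory use stays flat
  end
end

Dir.mktmpdir do |dir|
  sitemap = StreamingSitemap.new(dir, max_links: 3)
  (1..7).each { |id| sitemap.add("/products/#{id}") }
  puts sitemap.finish.map { |f| File.basename(f) }.inspect
  # ["sitemap1.xml", "sitemap2.xml", "sitemap3.xml"]
end
```

At the real limit, 18 million links would be spread over 360 files of 50,000 links each, with only the current file's links in memory at any point.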
00:21:38.159
I submitted a PR and after some conversation and updates with the maintainer
00:21:43.440
we merged it and then with this functionality working we can rerun the generator
00:21:49.440
and we don't have the memory leak anymore the memory usage is so much lower
00:21:54.539
and it's stable throughout the entire processing this is a huge win for us in fact we
00:22:02.340
couldn't previously run sitemap generation on our developer machines with a production size data set and now we can
00:22:08.520
do it in 15 minutes this is with 18 million products and a
00:22:13.860
few other classes that I've mentioned before and for that generation time that 892 seconds is
00:22:19.740
14 minutes and 52 seconds and again this is coming from six hours
00:22:25.799
previously in production and this example is running on a developer
00:22:31.679
machine so we might even get a bigger boost in production
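
For reference, the arithmetic behind those figures can be checked in a few lines. The method names are made up for this sketch, and the ratio compares a developer-machine run against a production run, so it is only a rough indication.

```ruby
# Converts a number of seconds into minutes and seconds.
def format_duration(total_seconds)
  minutes, seconds = total_seconds.divmod(60)
  "#{minutes}m #{seconds}s"
end

# Ratio of the old run time to the new one, rounded to one decimal place.
def speedup(before_seconds, after_seconds)
  (before_seconds.to_f / after_seconds).round(1)
end

puts format_duration(892)      # 14m 52s
puts speedup(6 * 60 * 60, 892) # 24.2
```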
00:22:37.559
so let's have an overview of the final solution because it's actually really cool
00:22:43.860
first what went into this well it took about one day just to get a functional prototype
00:22:49.740
a functional prototype meaning we could run the generation it would complete and
00:22:55.200
the output files matched what rails was generating already then it took another day to fix the
00:23:02.039
memory leak and optimize some code for Crystal it took one PR to fix that
00:23:08.159
and all in all it took less time working on this than I spent preparing for this talk
00:23:13.919
which is the really exciting thing and also in the end the code is
00:23:20.340
incredibly minimal too there's two main files with code and all combined it's about 185 lines
00:23:28.200
so small that I can fit it on there albeit in very microscopic print
00:23:33.960
there are 90 lines for this generation process which is identical to the existing Ruby code
00:23:41.760
and there's about 95 lines for the model
00:23:46.919
definitions and this is for the entire solution as long as it has database access this
00:23:54.240
can run completely independently of the rest of our infrastructure all in 185 lines
00:24:00.840
we don't need all of the Rails infrastructure now
00:24:06.419
and I just want to take a moment to recap the creative process involved here
00:24:14.159
the first thing to do is examine what
00:24:20.039
the problem is and realize that it's loosely coupled to Rails and that means it can potentially
00:24:27.120
be extracted into another service then we have to consider how to solve it
00:24:32.400
we're already familiar with Ruby and Crystal and we know that similar Crystal tooling exists that would allow us to
00:24:39.600
test this without wasting a lot of time like we saw earlier and lastly when needed we can utilize
00:24:46.860
our existing Ruby knowledge to contribute back to Crystal and the last thing is just a general
00:24:53.400
sense of fun chasing down problems there's something really exciting about taking difficult problems and finding a
00:24:59.820
solution for them so the last thing is I just want to take
00:25:05.580
a moment to thank everyone thank you to RubyConf for giving me the opportunity to present thank you to all
00:25:11.760
of you this is my first time speaking at a conference and I really appreciate all of you being here with me today
00:25:17.159
and thank you to The RealReal for supporting me speaking here and I do have to say I don't have to but
00:25:24.059
I'm going to we are actively hiring even in the current times so come talk
00:25:29.940
to me if solving hard problems sounds exciting to you
00:25:35.460
there's plenty of resources out there for Crystal and here's just a few of them
00:25:41.360
Crystal Lang's website is really good they have a ton of
00:25:46.440
documentation lots of resources for people that second link is actually an
00:25:54.260
interactive Crystal interpreter online that you can punch code in and run it
00:25:59.520
and see what happens there's a Crystal for Rubyists website with a variety of resources for
00:26:06.419
learning about Crystal and then there's even a page in Crystal's docs that will list popular
00:26:12.240
Ruby gems and their Crystal equivalent and so with that are there any questions
00:26:22.320
okay so the first question is why is Crystal so much faster than Ruby and the second
00:26:28.620
question is why does it look so similar the first question is because it's a
00:26:35.640
compiled language and it's designed to run at the speed of C
00:26:41.640
as for why it looks so similar to Ruby it basically started out as
00:26:49.559
you know it was created by people who were Ruby developers similar to how Elixir was founded by former Rails
00:26:56.640
developers and it's basically designed to be
00:27:02.520
familiar to humans and easy to write just like Ruby is give us all those benefits while being highly performant
00:27:09.840
like C are there any others
00:27:15.000
yeah so the question is I mentioned how I made changes to the Crystal library to
00:27:20.580
write out the files as we were working through them and whether I went back to the Ruby
00:27:27.600
library to try that I didn't go back and try that I basically tried to run the
00:27:33.179
Ruby version on my development machine and it was so slow with the large data
00:27:38.760
sets that even if I could fix that it was still going to take hours to run on my
00:27:45.240
machine so that would be something that would be good for that Library most
00:27:50.400
likely but it still wouldn't have solved our problems
00:27:55.620
and then if no one else has one I'll get back to you so his question was is there any
00:28:02.400
precedent for triggering Crystal from Ruby
00:28:07.860
kind of a few years ago there was a lot of work about writing native
00:28:13.679
extensions for Ruby with Crystal I did a lot of work on it and a couple of the
00:28:19.799
Crystal developers were experimenting with it too
00:28:25.200
the way that the Crystal language kind of changed as it got closer to 1.0 made that more
00:28:32.159
difficult and nobody's really worked on that in the last few years I do think it
00:28:38.700
is a really cool idea because we do have some math intensive or data
00:28:44.700
processing intensive things in Ruby that could take a huge speed up but at this current time
00:28:51.240
there's nothing for that right now but it's certainly been looked at in the past
00:28:56.460
so the question was if I have a production environment benchmark for the Crystal builds I don't
00:29:05.159
have that but I do have some ideas on the production data set on a developer
00:29:11.340
machine for a slight comparison and in one hour
00:29:16.380
on a developer machine I think the Ruby version got through like 1 million records
00:29:22.140
so we're looking at 18 hours on a development machine unfortunately with a data set this big
00:29:29.779
our machines end up killing the process before it really gets far enough to have a huge benchmark
00:29:39.120
so that's where I see optimism that it might be even faster than 15 minutes in production
00:29:49.320
anyone else am I missing anyone okay well again thank you all very much