Summarized using AI

Speed up Rails, Speed up Your Code

Aaron Patterson • June 27, 2014 • Singapore • Talk

In the talk titled Speed up Rails, Speed up Your Code, Aaron Patterson discusses various methods for enhancing the performance of Rails applications. He begins by expressing excitement for the event and proceeds to share insights from his three years of work on performance improvements in Active Record and view rendering in Rails. The presentation specifically focuses on performance trade-offs, emphasizing the balance between speed and memory usage.

Key points discussed include:

  • Performance Tools: Patterson highlights tools like Benchmark IPS for measuring code performance, showcasing how to compare different code implementations effectively.
  • Benchmarking Techniques: He discusses various benchmarking methods, including raw performance comparison and black box testing to understand Rails’s internals better.
  • Active Record Improvements: Patterson outlines the improvements he’s made in Active Record, including the introduction of bind parameters and caching mechanisms that help optimize SQL query handling by reducing unnecessary computations.
  • Code Refactoring: He emphasizes reducing complexity in the codebase to improve performance, like reimplementing the has_and_belongs_to_many association in terms of has_many :through associations to enhance efficiency.
  • Memory Management: Patterson underlines the importance of memory in performance optimizations, discussing how Active Record now allocates fewer objects per request and thereby executes faster.
  • View Rendering Enhancements: He shares improvements made to Rails helpers that reduce the number of object allocations during view rendering, showcasing how to make HTML sanitization more efficient without unnecessarily creating multiple string objects.
  • Community Involvement: Patterson encourages the audience to report performance issues and share benchmarks to drive effective developments in Rails, urging that collective reporting can lead to better performance outcomes.
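The view-rendering point above can be illustrated with a small, hypothetical Ruby sketch (these helper names are invented for illustration and are not Rails APIs): building a string through repeated `+` allocates an intermediate string at every step, while appending into one buffer with `<<` mutates a single object.

```ruby
# Hypothetical illustration (not actual Rails helper code): the same tag
# built two ways, one allocating several intermediate strings, one
# appending into a single buffer.
def tag_with_many_allocations(name, body)
  # each + produces a brand-new intermediate string
  "<" + name + ">" + body + "</" + name + ">"
end

def tag_with_few_allocations(name, body)
  buf = String.new("<")                              # one buffer...
  buf << name << ">" << body << "</" << name << ">"  # ...mutated in place
end

tag_with_few_allocations("p", "hi")  # => "<p>hi</p>"
```

Both return the same markup; the second simply churns out fewer short-lived string objects per call, which is the kind of change the talk describes making in Rails's helpers.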

In conclusion, Patterson's talk provides valuable insights into how developers can approach performance optimization in Rails applications through concrete tools and techniques, stressing the importance of ongoing measurement and improvements. He assures that Rails 4.2 will be the fastest version yet, inviting further discussions on performance at the conference.

Speed up Rails, Speed up Your Code
Aaron Patterson • June 27, 2014 • Singapore • Talk

Let's talk about speed! In this talk, we'll examine ways that we've been speeding up Rails for the next release. We'll look at techniques used for speeding up database interaction as well as view processing. Techniques for finding bottlenecks and eliminating them will be presented, and we'll talk about how these techniques work with regard to Rails applications themselves. The presenter will also be discussing ways to speak in first person.


Red Dot Ruby Conference 2014

00:00:22.800 all right oh are we on we're on we're on okay hello welcome hello hello hello
00:00:30.880 I'm really glad I can be behind this uh podium here because I am not wearing any
00:00:37.160 pants no I'm not I'm wearing shorts right sorry anyway welcome uh I'm really
00:00:44.600 honored to be here I'm happy to be the first person to welcome you to Red Dot
00:00:50.440 Ruby um I want to say thank you to everybody for coming thanks to the organizers thank thank you Winston thank
00:00:57.320 you all of you please give yourselves a round of applause very happy to be
00:01:03.199 here um one interesting thing about being the last speaker is that
00:01:16.400 me so I was thinking that I would just do everything uh very as slow as
00:01:26.079 long long as
00:01:38.320 transitions Keynote will do I was also thinking maybe we'd just stand around and watch animated gifs
00:01:45.119 all day I like this one it's pretty funny he's standing there dancing
00:01:57.000 yeah when is that party I'm sorry uh my name is Aaron Patterson
00:02:04.320 um I've come to you from the United States to bring
00:02:11.640 freedom whether you like it or not uh do we have any Germans here
00:02:18.760 tonight anyone from Germany yeah I heard we
00:02:27.920 won freedom yeah I couldn't figure out how to make
00:02:33.000 any like jets fly across that or anything anyway I'm on the I'm on the
00:02:38.200 Ruby core team I'm also on the rails core team um and this is my first time
00:02:43.680 to give a talk uh I'm just kidding uh so you can find
00:02:50.519 me on Twitter as tenderlove I'm also on GitHub as tenderlove uh Instagram is
00:02:56.280 tenderlove and also on Yo as tenderlove uh so you can yo me there um I
00:03:03.720 think about weird stuff all the time for some reason I don't I really don't know why just like just weird stuff always
00:03:09.640 comes to my mind like for example people always say like separation of concerns like I I heard that a lot at this
00:03:15.799 conference I hear it a lot all the time as a programmer and the only thing that I can think of is basically this
00:03:24.239 just that's all I can think of I can't take it seriously I'm like okay
00:03:31.400 uh anyway I recently became the um number one committer on Rails I'm the
00:03:36.760 top I'm on the top a ton of internet points so many
00:03:43.000 internet points uh I think this is out of date I I think I actually have more commits than that now but let me tell
00:03:49.200 you I'm going to give you all the secret to getting internet points uh I'm going to give you this is the secret this is
00:03:55.920 my secret I'm giving it away here is that um revert commits count
00:04:03.840 too so more mistakes equals more points so don't be afraid to make
00:04:11.120 mistakes you get more internet points for it uh I'm a short stack
00:04:16.759 engineer uh I think that's actually a full stack but I I don't know I'm I'm a short stack
00:04:23.520 engineer I enjoy I enjoy pair programming that's me pair programming uh
00:04:32.960 the interface is a little sticky on that uh getting the TTY to work was difficult
00:04:38.800 I I guess I'm also a dad joke programmer too I tell a lot of bad dad jokes anyway
00:04:44.759 I have a cat uh this is my cat his name is Gorbachev Puff Puff Thunderhorse
00:04:50.759 um actually I have two cats this is my other cat uh her name is SeaTac Airport
00:04:55.960 YouTube uh Facebook too is her name also that that's that's her uh actually yeah
00:05:04.320 my wife told me the reason we got the other cat was so that or the reason we got two cats is so that I would stop
00:05:11.199 looking at pictures of cats online and I was like like that's that's
00:05:17.039 not how this works you see now we have two cats and I look at cat pictures
00:05:22.639 online anyway I'm I'm also a very shy person um I you might you might think
00:05:29.120 hey and how are you getting up in front of all these people and talking to them if you're very shy well it's actually because I'm really excited about my work
00:05:35.840 I love my job and I also love like I want people to know what I do so I use that to like um give myself the courage
00:05:43.960 to be up here but also like uh I brought so I brought a whole bunch of stickers of my cat and I like I am really
00:05:51.840 terrible at starting conversations I'm very shy and awkward uh so if you want to come talk to me come up and say hey
00:05:58.800 Aaron I would like a sticker and then I'll say here's a sticker of my cat and then we can talk about cats or
00:06:03.960 programming or whatever uh so it's like little ice breaker there um anyway so I
00:06:09.039 was thinking about this thinking about this conference and I looked at like so last year I thought about the stuff that I did here and uh Red Dot Ruby conf last
00:06:17.199 year last year I went to I went to Hawker centers um and there's amazing food I
00:06:23.000 ate some really really awesome food this is one of the things I ate um I also
00:06:28.360 thought about like so last last year I thought about this is uh so this is a map of the world I came from here that's
00:06:35.840 I'm from Seattle in the United States there uh and we're down here in
00:06:41.479 Singapore um and the problem is that like I love food a lot I love food a lot
00:06:48.000 I eat all the time and it's very very bad for my weight but I was thinking that well so I'm very far north
00:06:56.039 typically right and we are now pretty far south South and if we take a look at
00:07:01.520 the Earth from the top like there on the right that's where I am in Seattle and down there on the on the bottom right
00:07:07.680 that's in that's us in Singapore and actually the Earth is you know the Earth
00:07:13.240 has the same we're rotating faster down here than we are up in Seattle so I should be being flung away from the
00:07:20.400 earth faster like if we move that up there right so that means that down here in Singapore I weigh less uh because I'm
00:07:28.520 being pushed away from the earth right so I was trying to calculate this I was trying to figure this out and I thought
00:07:34.319 okay there's there's all that and I'm like okay let's plug in some numbers here and I plug in these numbers and I'm
00:07:41.160 plugging them in and trying to figure it out and then I'm like I have no idea what I'm doing uh I couldn't figure this
00:07:46.520 out but I like this is all I was trying to figure this out last year too I I couldn't figure it out last year and
00:07:52.800 actually like one of my fond memories of this conference from last year was um
00:07:57.919 talking to Jim Weirich about this uh and he actually calculated all this for me and
00:08:03.159 presented it in his slides and told me exactly how much weight I lost when I was down here I was digging through
00:08:08.319 digging through um old photos that I took from last year and I found this one
00:08:13.879 where he was comparing himself to my cat so anyway I just like I wanted to
00:08:20.199 have a slide in here that just said we miss you Jim um he
00:08:25.520 is so he is a huge a huge inspiration to me
00:08:31.680 and like one of the reasons I try to do my best presenting is cuz I know that he enjoyed watching me speak so I'm doing
00:08:37.599 my best all the time now to try and remember him uh the other thing another thing that I did last year is I ate
00:08:43.120 durian fruit for the first time uh and this is this is picture of the durian
00:08:48.959 fruit that I ate and I actually really liked it I enjoyed it at first I thought it smelled incredibly bad uh and then I
00:08:55.000 tried it and I got it and I thought it was great it was great that's me the first time I had it uh and my wife came
00:09:00.160 with me too and she tried it as well and so I was like taking some reaction shot photos so I've got like this is her
00:09:07.120 before before eating durian fruit and then after so she didn't really enjoy it as
00:09:14.440 much as I did but uh I really like it anyway I've also been I've been studying Node.js a lot lately
00:09:22.000 uh yeah and the reason I'm doing the reason I'm studying Node is so that I can get a lot closer to the metal uh and I I
00:09:29.920 think that I've accomplished that like pretty close pretty close to the metal now
00:09:37.839 anyway so I'm going to talk my talk is called speed up Rails speed up your code um but I guess really it should be
00:09:43.920 called rails.inspect um or maybe this is not
00:09:49.519 magenta something so uh I'm going to talk a lot
00:09:54.680 about I'm going to talk a lot about performance in rails and um performance in your own code like basically
00:10:00.920 benchmarking stuff how to benchmark stuff how to measure performance of your applications um and I I have an ulterior
00:10:07.560 motive for teaching you how to measure measure performance of your applications and I will talk about that at the very
00:10:13.279 very end of this talk um but basically we're going to look at how to measure performance of code and then how I
00:10:20.040 measure performance of rails and how we've increased uh improved performance of rails uh and hopefully we can take
00:10:26.160 that you'll be able to take these tools that I show you home and apply them to your code at home be it rails or not uh
00:10:33.120 so the first thing I want to talk about is performance tradeoffs and typically when we're talking about performance we have to make tradeoffs we have to think
00:10:39.680 about we have to think about two things speed versus memory um we typically care
00:10:45.240 about like I typically care about runtime speed like how fast my program runs versus memory because it's pretty
00:10:51.360 easy for me to buy more RAM and throw it in a machine it's getting cheap cheaper uh RAM is pretty cheap I think uh we
00:10:58.600 also have to to talk about time versus Space so when I say speed or memory I'm actually talking about time and space
00:11:04.160 which I think is really interesting terms because I just think about Star Trek then like so we have Q representing
00:11:10.480 time and Picard representing space for some reason I don't know I just whatever
00:11:16.079 anyway so the point is that space is not free because you know we may we may be
00:11:22.079 able to add more memory to our machines but it's still not free it costs something it costs some Ram but time is
00:11:27.240 also not free because maybe we want to serve up requests quickly and really what the thing the the point of this is
00:11:33.279 is that you know nothing is actually free we have to make a decision when we're trying to improve the performance
00:11:38.959 of our systems whether we choose we choose Q or whether we choose
00:11:44.680 Picard or there's actually one there's one other thing that we can do we can find a better algorithm sometimes there's
00:11:51.639 certain cases where we're able to improve both of these but uh I find that to be pretty rare we'll find there's
00:11:57.079 going to be an example of that in my slides we'll find we'll find a mystical unicorn in these slides but I think that
00:12:03.639 this is a pretty rare uh rare thing that we can do so most of the time we're making tradeoffs most of this talk will
00:12:10.600 be making tradeoffs and most of the time we're going to talk about giving up RAM for improving the speed of our system uh
00:12:26.399 Heroku uh so we're going to focus on runtime speed we're going to focus on speed at the cost of RAM and the point
00:12:32.320 that I'm trying to make here is that time and space are related we can typically trade one for the other when
00:12:37.800 we're talking about performance of our applications uh so the first thing I'm going to look at is Performance
00:12:44.120 Tools so uh this this is basically the advertising section of my presentation
00:12:50.079 where I'm advertising other people's tools because I have no idea how to build this stuff I am just using it to
00:12:56.279 make our code better so yeah yes advertisement time so one of the tools
00:13:01.680 that I like to look at for raw performance like first we're going to look at Raw performance this is when we have like two two bits of code um and we
00:13:10.279 just want to know how they compare to each other with speed like which one of these is faster and my go-to tool for
00:13:16.480 this is a gem called Benchmark IPS and I'm going to show you show you how to use that with your code but first we're
00:13:23.160 going to compare that to Benchmark that comes with Ruby standard library this is a benchmarking tool that comes in Ruby
00:13:29.040 standard library it looks exactly like this you say like create a new Benchmark and run some test n times right you
00:13:36.360 may have written you may have written a benchmark like this before uh now the problem is like if if you've written a
00:13:42.240 benchmark like this before you might notice one of the problems is well how big do I make n you'll sit there and
00:13:48.480 you'll like say oh okay well maybe I need to make it 10 or maybe I need to make it a thousand I'm not exactly sure
00:13:53.880 so you run it and this may look pretty familiar to you you run it and it's like wow that's super fast took zero time
00:14:00.560 well obviously it did not take zero time uh it probably took more than that and your end was not large enough for you to
00:14:06.199 study the performance of this um study the performance of this method the other
00:14:11.839 problem is that we have to deal with noise on our machines like you might be sitting there running your benchmarks
00:14:17.240 but you've also got iTunes going and Twitter running and that YouTube video as well like and you have the yangang
00:14:23.480 cats going trying to get that high score and this is causing a lot of noise with your benchmarks so it would be nice if
00:14:29.759 you had some sort of standard deviation to tell you like hey this is it took at long at longest this amount of time and
00:14:35.959 at least this amount of time so we can say it's kind of in the middle here and this is where Benchmark IPS comes in so
00:14:42.959 what Benchmark IPS says is like okay let's figure out the iterations per second that's where IPS comes from it
00:14:48.279 says like okay how many how fast can I run this code in 5 seconds so we have an
00:14:53.360 example here where we're comparing uh set time how fast can we access a set versus how fast can we access an array
00:14:59.959 like is this a member of an array or a member member of a set and if we compare those two we'll see that uh accessing
00:15:07.000 the set is much faster over there on the right right hand side we have iterations per second so we can say like the number
00:15:13.240 the higher this number is the better it is how fast can we do that per second so a set include is about I don't know some
00:15:19.839 big number I'm not sure 30 bajillion and then is that a number I don't know and
00:15:25.480 then array.include? is like this which is smaller I can tell it's smaller cuz it's
00:15:32.560 shorter fortunately I'm using the same size font uh so since the set the set
00:15:38.880 include is higher that means that using a set in this particular example is better than using an array so the point
00:15:44.680 to take home from this is for iterations per second higher is better just remember that higher is better the other
00:15:51.959 important thing that this provides is a standard deviation so we can say like well if we if we uh uh this at the
00:16:00.279 slowest this could be 12% slower or 12% faster on either side of this particular
00:16:05.800 number and the reason that this is important is maybe you've implemented some algorithm uh two different ways and
00:16:12.160 you're not sure if one is faster than the other maybe you run it one way and it seems faster but if you don't have
00:16:18.120 this standard deviation it could just be noise in your system like maybe this time you're not watching you know
00:16:24.560 YouTube videos and listening to iTunes when you run this benchmark
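The iterations-per-second idea described above can be sketched with nothing but the standard library. This is a toy stand-in for benchmark-ips, which additionally warms the code up, takes multiple samples, and reports the standard deviation the talk emphasizes:

```ruby
require "set"

# Toy iterations-per-second loop: run the block repeatedly for a fixed
# window and count completed iterations. benchmark-ips does this far
# more carefully (warmup, multiple samples, standard deviation).
def rough_ips(label, window = 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + window
  count = 0
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  puts format("%-18s %12.0f i/s", label, count / window)
  count / window
end

array = (0...1_000).to_a
set   = array.to_set

array_rate = rough_ips("Array#include?") { array.include?(999) }
set_rate   = rough_ips("Set#include?")   { set.include?(999) }

# Higher is better: set membership is a hash lookup, so it should come
# out well ahead of scanning the whole array for the last element.
```

With the real gem the same comparison is `Benchmark.ips { |x| x.report(...) ... }`, and the reported `± N%` is what lets you tell a genuine speedup from background noise on your machine.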
00:16:30.000 so if we compare this like I want to drive this home by comparing this Benchmark using the Benchmark library
00:16:37.000 from Ruby standard lib this is the same same uh Benchmark as like well except
00:16:42.880 excuse me not the same Benchmark we're doing set inclusion versus hash inclusion and comparing those two they should be roughly the same we would
00:16:49.399 think that it's roughly the same and if we run this with standard Libs Benchmark we might get an output like this and
00:16:55.240 you'll see that they look almost exactly the same or that set access is faster
00:17:01.040 and you say well okay set access is faster than hash access how can that be possible because if you look under the
00:17:06.640 hood set is actually wrapping up a hash so you should have a little bit of overhead in there if we run this with
00:17:13.039 Benchmark IPS we'll see this is the same Benchmark using uh Benchmark IPS we'll see the output looks like this um and
00:17:21.000 the hash access is actually faster for this particular Benchmark so we get a better idea of what our numbers are
00:17:26.959 using this and we can graph this and see like okay well set access is up there at
00:17:32.960 whatever number hash access is even higher so we know iterations per second higher is better we plot our standard
00:17:39.280 deviation the standard Devi the bottom of the standard deviation on the right side doesn't even touch the one on the
00:17:44.360 left side so we know for sure that the hash access is faster in this particular case uh the other thing I like to use
00:17:50.160 these benchmarking tools for is blackbox testing so many times when I'm dealing with rails source code like I don't know
00:17:56.080 how something is implemented I have no idea uh this might be a surprise to you but
00:18:01.159 even when I'm looking at rails core I'm I'm on the rails core team but much of the time I have no idea what's going on
00:18:07.200 I really don't uh and I try to figure this out and one of the tools I use for figuring out what is going on is
00:18:13.120 benchmarking tools I'm going to show you a short example of this is let's say we have two cache implementations cache one
00:18:20.039 and cache two and we're just measuring how fast it is to access access uh an
00:18:26.559 element from our cache uh so what we can do is say well I want to I want to study how one cache or I want to study how
00:18:34.440 these caches perform as the size of the cache increases what we can do is say okay let's let's look at how big the
00:18:40.799 cache is at 10 100 a thousand and then 100,000 depending on those particular sizes and we can actually collect a
00:18:47.679 report uh IPS will return to us a report object that takes that collects this particular information so I can take
00:18:54.440 this information and compile it down so this is a code to compile that down what I do is I say okay well I want to know
00:19:01.120 uh how fast it is for me to access this particular cache 10,000 times so I have a
00:19:07.400 fixed number I want to say how fast does it take for me to get to do a get 10,000 times Well IPS gives us iterations per
00:19:15.880 second so we say one over iterations per second to convert that to seconds per iteration then we multiply by 10,000 to
00:19:23.039 say Okay I want to know how fast this is for 10,000 and then we plot that we say okay for each of those particular tests
00:19:30.559 how fast is this so we plot that out and we say okay well our cache one as we grow
00:19:37.320 our cache one seems to stay the same the same the entire time it's constant time
00:19:42.400 where the other one seems to be growing and it's it may be maybe linear growth maybe exponential growth we're not sure
00:19:48.760 exactly but one of them is definitely constant time the other one is growing and if we look at the cache
00:19:53.960 implementations it's pretty obvious the reason it's pretty obvious why we get these performance uh differences it's
00:20:00.559 because one of them is using a hash as the internal data structure where the other one is using an array and we know
00:20:05.840 that with an array it's going to take a linear amount of time to scan that array so that's why we see those numbers grow
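A stdlib-only sketch of that black-box experiment (the cache classes here are invented stand-ins, not the talk's code): time a fixed number of gets at growing cache sizes and watch which implementation stays flat.

```ruby
require "benchmark"

# Invented stand-in caches: one backed by a Hash, one by an Array of
# [key, value] pairs that must be scanned linearly.
class HashCache
  def initialize(pairs)
    @store = pairs.to_h
  end

  def get(key)
    @store[key]
  end
end

class ArrayCache
  def initialize(pairs)
    @store = pairs
  end

  def get(key)
    pair = @store.find { |k, _| k == key }
    pair && pair.last
  end
end

CALLS = 500 # fixed number of gets, mirroring the "seconds per N calls" idea

def seconds_per_calls(cache, key)
  Benchmark.realtime { CALLS.times { cache.get(key) } }
end

results = [10, 100, 1_000, 10_000].map do |size|
  pairs = (0...size).map { |i| [i, i] }
  key   = size - 1 # worst case for the linear scan
  [size,
   seconds_per_calls(HashCache.new(pairs), key),
   seconds_per_calls(ArrayCache.new(pairs), key)]
end

results.each do |size, hash_t, array_t|
  puts format("size=%6d  hash: %.6fs  array: %.6fs", size, hash_t, array_t)
end
# The hash column should stay roughly flat while the array column grows
# with the cache size: constant time versus linear time.
```

The talk gets the same per-N numbers out of benchmark-ips reports by converting: 1 / (iterations per second) gives seconds per iteration, multiplied by N gives seconds per N calls.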
00:20:11.159 like that so I'm going to show you a real world example of using this when I'm trying to figure out what is going
00:20:17.159 on with rails um one of the things that I was trying to study is like well how is the
00:20:22.919 performance of the routing system like what is what is the routing system like like how fast does it
00:20:28.799 let's study how fast it takes toate a link to So when you say link to how fast how long does it to generate an a tag so
00:20:36.760 what I did is I said okay let's create a small a small route set here we'll add one resource to it uh and then we'll
00:20:43.640 time how fast it takes to generate an a tag then we'll do that again for 10 routes so we'll add 10 routes to this
00:20:50.200 this is actually 10 resources then we'll do it again for 100 routes and the reason I said resources is because
00:20:55.440 resources actually adds like for a million routes I'm not sure how many it adds more than one um and then we'll do
00:21:01.520 it for a thousand and then we'll plot that and see like how does that like how does this change how many iterations per
00:21:07.840 second can I do as we grow the number of resources in our system so if we plot that out it looks like this so along the
00:21:15.159 x-axis there is the number of resources that we have in the system and along the y-axis is the number of seconds per
00:21:21.640 100,000 calls and if we plot this as we grow the number of routes it stays about linear it's a linear performance this
00:21:28.480 doesn't actually grow as we as we uh get larger now the next thing I said is like
00:21:34.640 okay so we know we know that the rails router it doesn't matter how many routes we add to the router it'll always take
00:21:40.440 the same amount of time to generate an a tag well what about the size of the URL
00:21:47.080 like how how long can the URL be so I wrote another Benchmark here that said okay well let's let's say like I want to
00:22:00.240 generate uh slash a slash id I think is what this this is doing uh or no this is
00:22:07.840 just slash a and then I want to say like okay so this one says slash a and then this one says slash a a a a a like 10
00:22:07.840 A's then the next one is 100 A's and we keep growing that and we say like okay how long does it take to generate a URL
00:22:14.480 that's you know 10 of length 10 100 a thousand etc and then we plot that using our um iterations per second calculating
00:22:22.200 how many seconds does it take to do 100 100,000 calls and we see like this along
00:22:27.320 along the uh x-axis again is the number how long that URL is how many segments
00:22:32.880 are in that URL on the very right is the 24 segments and then along the y-axis is
00:22:38.559 the number of seconds per 100,000 calls and we see the longer that the URL gets the longer it takes to generate the
00:22:45.919 a tag so now we kind of know what the algorithms are going ins going on inside
00:22:51.000 the inside the routing system we probably have some sort of hash table that keeps track of all of the uh that
00:22:56.640 keeps track of the particular routes in the system so that we can look them up and generate a tags and we have maybe
00:23:01.720 some sort of array that's actually calculating this a tag or calculating the URL part or the href part of the a tag
00:23:09.240 so we kind of know what what's from a high level what's going on under the hood the next thing I want to know is
00:23:14.480 like where is time spent exactly so this is great we're able to use IPS to figure out like how fast two particular
00:23:20.799 algorithms are but what about like where is the time actually being spent and for that I like to use this tool called
00:23:26.840 stackprof you should check this out this is a profiling or this gives you it's a sampling profiler so it says like
00:23:33.640 okay uh every clock tick every constant tick we say what what is our stack frame
00:23:41.080 so the idea is that um functions that are slower have a higher probability of
00:23:47.440 being there when we hit that clock tick right so if you have a method that's
00:23:53.159 called the other implication of this is that let's say you have a method that's called 100 times it may not show up in your profile 100 times it'll show up uh
00:24:01.080 less it'll show up the number of times it was actually sampled so if we run profiling for url_for
00:24:07.200 similar to the link_to we can see the results like this dumps out that profile to a file called url_for.dump and we can
00:24:14.400 actually view those results using the stackprof uh stackprof binary on our in the terminal and say like just give
00:24:20.919 me a report of this this is what the report looks like uh you don't need to read that too closely but what it is is
00:24:26.679 it's saying like at the top there we're spending about 26% of our time in this method called url_for so that's where we
00:24:33.159 want to focus our attention when we're trying to improve performance is we have this one function that takes 26% of our
00:24:39.200 time let's go look at that function and focus on that to in order to make our code faster the next thing I like to
00:24:45.520 look at is uh GC.stat I think Koichi wrote this Koichi did you write this or
00:24:51.039 was it no okay Koichi wrote this he should advertise this more it's really awesome uh I like to use this this
00:24:57.919 method gives you statistics about the garbage collector so one thing that I need to know when I'm profiling rails or
00:25:04.200 making it faster is how many objects do we create so GC.stat will say like okay
00:25:09.919 I can ask it how many how many objects have we allocated in this process so this says how many objects have we
00:25:16.039 allocated so far now and this is a total so this number is always incrementing if
00:25:22.240 you allocate a new object this will go up by one it's always incrementing so what I can do is I can say okay
00:25:28.919 uh I'm going to say how many objects are in the system now I'm going to run some code I'll ask how many objects were
00:25:35.200 allocated again later and then I can just subtract those two to calculate how many objects a particular chunk of code
00:25:42.039 allocated and we I'm doing that in this particular example like I wanted to find out how many objects does it do we use
00:25:48.320 for allocating uh active record objects or how many how many Ruby objects do we allocate so the first thing I did is I
00:25:54.760 warmed up the system and the reason I did this is because we have a bunch of caches so we don't want to take these
00:26:00.039 caches into account uh so I warm up the system then I count the number of count the number of objects the total number
00:26:05.919 of objects that have been allocated in the system then we run a benchmark and say like okay n times let's look up
00:26:11.720 this person object n times then count the number of objects in the system again and then subtract and divide by n
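The count-run-subtract-divide recipe just described, sketched in plain Ruby (the block being measured here is an arbitrary stand-in; in the talk it is an Active Record lookup):

```ruby
# GC.stat(:total_allocated_objects) only ever goes up, so the difference
# across a chunk of code is the number of objects that chunk allocated.
def allocations_per_call(n = 1_000)
  before = GC.stat(:total_allocated_objects)
  n.times { yield }
  after = GC.stat(:total_allocated_objects)
  (after - before) / n
end

# Warm up once first, as the talk does, so one-time cache fills and
# lazy initialization don't get counted against the code under test.
allocations_per_call { "foo" + "bar" }

allocs = allocations_per_call { "foo" + "bar" }
puts "#{allocs} allocation(s) per call"
```

Dividing by n averages out any stray allocations from the surrounding machinery, which is why the benchmark runs the block many times rather than once.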
00:26:18.799 and then we know the number of object allocations it took per call so a real world example of this is
00:26:24.520 for benchmarking views I was very interested in how many many objects were allocated every time we calculated a
00:26:30.320 view so this is The Benchmark I used for that and I said okay we're we're creating a new uh rails application what
00:26:37.399 I did here is uh this particular Benchmark benchmarks the application but cuts out the uh middleware excuse me not
00:26:44.840 the middleware the web server uh and the reason I'm doing this and not using just a rack middleware or one of the um oh
00:26:53.880 what is it the performance stuff that's included with rails is basically because that stuff is lying to you um to be
00:27:00.720 frank uh the way that I'm doing this Benchmark is I said okay I'm going to create a real application I'm going to
00:27:06.799 send one request through that and I'm going to record the environment the rack Nash and I'm going to later Replay that
00:27:12.960 the reason I'm saying the other ones are lying to you is because they're not actually instantiating all of the rack
00:27:18.200 middle Wares that are in the rail stack also they're mocking out certain things like the session so you're testing
00:27:24.120 you're testing the performance of a mock session rather than your real session you don't know for sure what your real numbers are you know in this thing that
00:27:30.640 maybe kind of acts like your acts like your real system but doesn't I wanted to get something that was as close to a
00:27:37.399 real system as possible and the reason I'm cutting out the web server is because I'm not responsible for the web
00:27:43.039 server I can't improve the performance of the web server I'm improving the performance of rails and also maybe Puma
00:27:48.720 performs better than unicorn or doesn't you can choose a different web server and plug it in depending on which one
00:27:53.960 performs better so what this benchmark does is it benchmarks books#new (this app just has a books resource);
00:28:00.720 what we do is we first instantiate the application, then run one
00:28:07.480 request through it in order to warm up any caches, then we run our actual benchmark, which
00:28:14.000 is very similar to what we were looking at previously, only this is with a Rails application, and then we plot that.
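The replay idea, capturing one request env and calling the app with it repeatedly while skipping the web server, can be sketched against any Rack-compatible callable. The lambda below is a stand-in for a real Rails app, and the env hash is trimmed to two keys for illustration:

```ruby
# Any Rack app is just an object responding to #call(env) -> [status, headers, body].
app = lambda do |env|
  [200, { "Content-Type" => "text/html" }, ["books#new for #{env["PATH_INFO"]}"]]
end

# A recorded env hash from one real request (heavily trimmed here).
recorded_env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/books/new" }

# Replaying exercises the full middleware-to-controller stack, but no web server.
def replay(app, env, n)
  n.times.map { app.call(env.dup) }
end

responses = replay(app, recorded_env, 3)
```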
00:28:20.679 this says like how many objects does it allocate per request and what I want to show you is how we've improved over time
00:28:27.080 or how Rails has improved over time; if we look at this graph, this is a graph of the objects allocated per request:
00:28:33.159 the objects are along the y-axis, and along the x-axis is the branch, so on the left is 4-0-stable, then 4-1-stable, then
00:28:40.399 master, and you can see that we're dropping there; one thing that I want to point out about this graph is that we're actually starting at 2,000 right
00:28:48.000 there uh the other thing that I want to teach you in this talk is how to lie with
00:28:54.519 graphs so if we actually make the very bottom zero, the graph looks like this, and you're
00:29:01.960 like, that doesn't feel very good... but the good news is that we
00:29:08.279 have about a 19% reduction in objects since 4-0-stable and a 14%
00:29:14.039 reduction in objects since 4-1-stable; later in the talk we're going to look at why that is, or how we got those
00:29:20.519 um how we got those performance improvements and we're also going to talk about why um well it does matter
00:29:27.760 but it doesn't actually matter uh it does matter but Al there are many
00:29:32.799 caveats we'll go over some of those the next thing the next tool that I like to use is this gem called allocation Tracer
00:29:39.039 also written by kichi uh this is this is a tool for uh finding where objects are
00:29:45.799 allocated in your system so you can use this gem: you give it a chunk of code and you can say, tell me
00:29:52.559 what type of objects have been allocated, where they have been allocated, how many times they've been allocated, and how many were GCed as
00:29:58.919 well uh so here's a very short example: we give the allocation
00:30:04.440 tracer a block of code and run some stuff in there; we allocate an
00:30:09.640 array a thousand times, a string, and a hash, and then we say okay, tell me the total number
00:30:15.679 of allocations that happened inside that block and give me a breakdown by type of object so if we run this code we'll see
00:30:21.919 the output looks like this I've greatly reduced the amount of output it's actually a lot more than this but it
00:30:27.600 really wouldn't fit on a slide you can see like we have a thousand strings a thousand arrays and a thousand hashes
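The allocation_tracer gem may not be installed everywhere; Ruby's bundled objspace extension gives a similar, if less aggregated, view of where an object was allocated (a sketch of the stdlib approach, not the gem's API):

```ruby
require "objspace"

allocated = nil
# While tracing is enabled, every new object remembers its birthplace.
ObjectSpace.trace_object_allocations do
  allocated = ["some", "objects"]
end

# Ask the traced object where it was created.
file = ObjectSpace.allocation_sourcefile(allocated)
line = ObjectSpace.allocation_sourceline(allocated)
puts "#{file}:#{line}"
```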
00:30:32.960 like you would expect from that code so the next thing I want to talk about: we've looked at a
00:30:38.960 bunch of benchmarking tools, and these are most of the tools that I use for benchmarking Rails and Rails
00:30:47.240 applications and these are also all the tools that I use for making the performance improvements that we're going to talk about uh and the first one
00:30:54.279 I want to talk about is speeding up Active Record so we've done some work...
00:30:59.600 excuse
00:31:07.760 me, I have done a lot of work to speed up Active Record uh the stuff I'm going to talk to
00:31:13.440 you about is actually about three years of my work, so I spent about three years on this
00:31:20.159 particular project; all the stuff that I'm going to talk to you about today is three years' worth of work and
00:31:25.480 it actually sucks because I have about 30 minutes left and I'm trying to compress three years of my blood, sweat, and tears
00:31:31.399 down into less than 30 minutes uh so first, in order to talk to you about these performance improvements,
00:31:37.200 we need to talk about how Active Record works so this is, from a high level, how Active Record works: let's say
00:31:44.080 you're doing a Post.find; what we do is we take that Post.find and we
00:31:49.519 turn that into a tree of objects called Active Record relations; this has nothing to do with Arel, except that we take
00:31:57.000 those Active Record relations and then we turn them again into a different tree, this time using Arel; Arel is actually a
00:32:03.919 representation of the SQL statement that we're going to execute; it's an
00:32:09.039 AST of the SQL statement so we take those Active Record relations, construct an AST from that, and compile it down into a
00:32:16.159 string; that string is a SELECT statement that actually gets sent down to the database; the database responds with some records,
00:32:22.320 we turn those into ActiveRecord::Base instances and then return those back to you; so this is, from a high level,
00:32:28.200 how Active Record works uh so part one of these performance improvements was basically bind
00:32:34.279 parameter introduction, and you may have noticed this, I think in 3.1 or 3.2:
00:32:40.240 before 3.1 or 3.2 you'd see stuff like this in your logs, so if you did Post.find(1) you'd see a SQL statement that
00:32:47.000 looked like this um and what's interesting about these statements is that the only thing that changes in
00:32:53.760 these SQL statements is actually these numbers, the IDs that were passed in; now in later versions of Rails, I think
00:33:00.919 3.1 or 3.2 and up, you'd see something that looks like this, where we'd say okay, SELECT * FROM posts WHERE id =
00:33:07.880 ? and you'd see some square-bracketed ID and some number; what this was is: we were
00:33:13.799 introducing bind parameters; the point of introducing bind parameters was to separate static and
00:33:20.120 dynamic content so we have this SQL statement that's mostly static and then we have a little bit of dynamic content,
00:33:27.480 the theory being that, well, if we look at this view
00:33:33.159 of how Active Record works, these things generate the SQL statement: every time we're
00:33:39.799 executing this Active Record relation, these Arel statements all compile down to the SQL statement; if
00:33:45.679 that SQL statement is always the same, if all of those calculations are always the same, then we should be able to cache
00:33:52.240 all of that; if the SQL statement is the same every single time we run this, why are we doing the calculations over and
00:33:58.519 over again? we should be able to cache that particular computation so step two...
00:34:03.960 that was a theory, it did not exist in part one, it was just a sparkle in my eye
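The caching theory can be sketched as a memoized compilation step (hypothetical names, not Rails internals): the relation-to-SQL work runs once, and later calls reuse the string with fresh bind values.

```ruby
# TinyStatementCache memoizes an "expensive" SQL compilation; only bind
# values vary between executions.
class TinyStatementCache
  def initialize(&compile)
    @compile = compile
    @sql = nil
  end

  def sql
    @sql ||= @compile.call # the compilation block runs at most once
  end

  def execute(*binds)
    # a real adapter would send `sql` plus `binds` to the database here
    [sql, binds]
  end
end

find_by_id = TinyStatementCache.new { "SELECT * FROM posts WHERE id = ? LIMIT 1" }
find_by_id.execute(1)  # compiles on first use
find_by_id.execute(42) # reuses the cached SQL string
```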
00:34:09.720 um so part two is about code decoupling unfortunately we have very complex code
00:34:16.919 I'm going to show you some code and I want you to read this code example very closely, it's very important
00:34:25.879 here; it's all readable, you can read that in the back, right? so obviously don't read
00:34:32.480 this; what's important here is that this is one method,
00:34:38.240 one method, and what those arrows are pointing at: those arrows are embedding dynamic content into our SQL
00:34:45.320 statement so we wanted to get rid of that dynamic content; if we ever hope to cache these calculations we need to get
00:34:51.960 rid of that dynamic content so the only way that we could do this is by refactoring... why do I keep saying
00:34:58.480 "we"? the only way to do this was to
00:35:04.400 refactor this so that we could extract those dynamic values, and every time I've stared at this
00:35:10.880 function I just thought of Lord of the Rings: one method to rule them all, and in legacy code bind them
00:35:16.560 so the way to reduce the complexity of this particular method: what I did is I removed has_and_belongs_to_many; I am
00:35:23.320 sorry, it is gone... but it's not actually gone, don't worry; if you upgrade your Rails applications, the method
00:35:29.280 has_and_belongs_to_many still exists, but what we do is we translate it under the hood to has_many :through;
00:35:35.599 the reason we could get away with this is because has_and_belongs_to_many actually is has_many :through if you think about it: has_many :through has
00:35:42.839 three tables, but so does has_and_belongs_to_many; it's just that in the has_and_belongs_to_many case we don't
00:35:48.440 have a middle model there so what we do now is whenever you call has_and_belongs_to_many we generate an anonymous middle
00:35:54.800 model uh so if you dig around in the core of Rails and you find that middle model, don't touch it um we generate that
00:36:03.000 model and then your has_and_belongs_to_many associations are now translated into has_many :through and actually uh I'm
00:36:09.960 just showing you this because it's one of my favorite diffs um it's just all red, we're deleting stuff, deleting stuff uh
00:36:17.000 it just made me super duper happy the reason it made me really, really happy is because we're deleting a whole bunch of
00:36:22.560 conditionals; these conditionals just went away; now we don't have to think about has_and_belongs_to_many anymore in our code, we're able to
00:36:29.720 get rid of that, and it made extracting that dynamic content much easier.
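The rewrite can be illustrated with a tiny stub DSL standing in for ActiveRecord, so it runs without Rails; real Rails generates an anonymous ActiveRecord subclass as the join model, which this sketch only hints at with a naming convention:

```ruby
# Minimal association DSL: records declarations instead of touching a database.
class TinyModel
  def self.associations
    @associations ||= []
  end

  def self.has_many(name, through: nil)
    associations << [:has_many, name, through]
  end

  # habtm is rewritten into has_many :through with a generated join model,
  # mirroring (in spirit) what modern Rails does under the hood
  def self.has_and_belongs_to_many(name)
    join = :"habtm_#{name}"
    has_many(join)
    has_many(name, through: join)
  end
end

class Developer < TinyModel
  has_and_belongs_to_many :projects
end
```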
00:36:34.880 so part three of this was: now that we've got that static and dynamic content separated, we can introduce a
00:36:41.880 cache so I'm going to show you an example of this; this is our cache code
00:36:49.880 example; what it is is we generate an Active Record relation object here, so we say okay, Person.where(name: ...) with some
00:36:57.000 bind parameter, we limit it to one; this creates an Active Record relation object; what we do down
00:37:02.560 here is we're able to execute that same relation object multiple times but using different values,
00:37:09.240 so we're able to execute it with "Aaron" and with another name; it'll go against the database, but it actually only executes
00:37:15.640 that block once; it calculates that relation only once.
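The execute-with-different-values behavior can be sketched without Rails; the point is that the block building the "relation" runs exactly once, and each execution only substitutes a bind value (a hypothetical class, not ActiveRecord::StatementCache's real API):

```ruby
# The block that builds the relation runs once; execute substitutes binds.
class TinyRelationCache
  BindParam = Class.new # placeholder standing in for a bind parameter

  def self.create
    param = BindParam.new
    relation = yield param # built exactly once, at cache-creation time
    new(relation, param)
  end

  def initialize(relation, param)
    @relation = relation
    @param = param
  end

  def execute(value)
    # swap the placeholder for the actual value on each execution
    @relation.map { |part| part.equal?(@param) ? value : part }
  end
end

cache = TinyRelationCache.create do |name|
  ["SELECT * FROM people WHERE name =", name, "LIMIT 1"]
end

cache.execute("Aaron") # => ["SELECT * FROM people WHERE name =", "Aaron", "LIMIT 1"]
```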
00:37:21.440 so we generate a cache object: we calculated a cache object from this relation; I'm going to show you what the cache object internals are, because this is fairly important to
00:37:29.200 people upgrading, to people who are watching their memory, and also to us on the Rails team uh the cache object
00:37:34.800 internals look like this uh we keep a list of the bind parameters; the reason we keep a list of these bind
00:37:40.560 parameters along with columns is that we need to be able to typecast things so let's say, when you do Person.find,
00:37:48.200 you'll typically pass in params[:id], right? and that ID, when it's coming in from the
00:37:53.280 URL, is actually a string, and we need to know how to typecast it before we send it off to the database so we
00:37:59.200 look up the ID column and we say, ah, the ID column is an integer, and you've bound this particular place to an
00:38:05.000 integer, so whenever you pass in an object in that particular place we're going to cast it
00:38:10.359 to an integer before we send it to the database; the other thing that we cache is the compiled SQL, the actual
00:38:16.480 SQL statement that's sent to the database; and that's it, it's just a string.
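The typecasting step can be sketched like this (illustrative only): the cache remembers each bind's column type, so a string coming in from the URL becomes an integer before it reaches the database.

```ruby
# Map column types to casting procs; real Rails consults column metadata.
CASTS = {
  integer: ->(v) { Integer(v) },
  string:  ->(v) { v.to_s }
}

def cast_binds(types, values)
  types.zip(values).map { |type, value| CASTS.fetch(type).call(value) }
end

cast_binds([:integer], ["42"]) # the "42" from params[:id] becomes 42
```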
00:38:23.960 so we take this cache object; we have a higher-level cache uh and by the way, that code that I showed you on the previous slide, this cache
00:38:29.359 stuff does exist internally in Rails; don't touch it; you can use it if you want to uh but it's absolutely not
00:38:36.480 supported, we may change the API at any moment; but you can do exactly the same things I'm showing you uh but don't tell
00:38:43.960 anybody I said you could do that, please... this isn't being recorded, is
00:38:49.319 it? all right, so what we did next is I updated the internals
00:38:55.960 to cache these particular relation objects so where can we use this cache?
00:39:02.839 let's look at things that use relation objects: Post.find, the dynamic finders like
00:39:08.480 Post.find_by_name, has_many, has_many :through, has_and_belongs_to_many, belongs_to... actually I'm missing one from the list: Post.find_by
00:39:14.760 where you pass in a hash; that syntax as well uses relations; I omitted that from the list
00:39:21.079 for some reason um but all of these are cacheable so I'm
00:39:26.800 going to show you an example implementation; this is very similar to what we have inside of Rails so for
00:39:32.400 Person.find, this is part of the implementation; if you look at this, we'll see right
00:39:39.160 there in the middle we have a relation object, and you can see we're saying where the primary key is equal to
00:39:45.400 some bind parameter, and we limit that to one so this code looks almost exactly like what you would write in your
00:39:51.000 controller; you might say Post.where(id: ...).limit(1); we're actually using this in
00:39:56.240 our internals so then the next part that you'll see down here is the ID parameter; this is our dynamic parameter,
00:40:03.160 this is the ID that you passed into Post.find so we pass that in and we
00:40:08.200 execute it and just as a fun side note uh there are many different
00:40:13.760 ways that you can call Post.find, and I'm going to show you the many different ways: you can call it with an integer,
00:40:19.560 you can call it with an array, with an array of arrays, with a hash of integers, you can call it with
00:40:24.800 another Active Record object, you can call it with an integer with a block, you can call it with a scoping inside of a block uh and those last two, you can call
00:40:32.960 them with any combination of the ones above as well; I
00:40:38.400 just didn't want to put that many bullet points in my slides and the fun thing is that only that top one is cacheable,
00:40:45.280 that's the only one that's cacheable; the rest of these, I have no idea what
00:40:51.119 they're even used for; I don't even know what they mean anyway so what's very
00:40:57.680 annoying about this is that if you go look at the implementation of find today in Active Record, you'll find this
00:41:04.200 giant conditional that's like: okay, if we got a block, if we got an array, if we got all this other stuff, then we
00:41:09.960 can't cache it; if we slimmed down our API and got rid of those things that I don't
00:41:15.119 even know what they're for, we could just delete this code, it would just go away, it would
00:41:20.560 be gone, and we could just have caching, all the things, all the time so let's look at the performance of Active
00:41:26.520 Record now, after these changes have been applied; these changes are in master today uh and they'll be coming out with
00:41:32.560 Rails 4.2, whenever we release that, which I think is this year sometime uh so let's look at Post.find and
00:41:39.720 Post.find_by_name; I benchmarked these two together, and the reason I did that is
00:41:45.480 because they're essentially the same thing: one is finding by ID, one is finding by name, the queries should be mostly the same uh and I also chose
00:41:54.319 these particular forms because these two lines actually work all the way back to Rails 2.3, so I was able to
00:42:01.640 take this code and benchmark it against Rails 2.3 through every single version of Rails all the way until today uh and
00:42:09.000 this is what it looked like so if we go from 2-3-stable all the way over to the right-hand side, which is the
00:42:14.520 experimental branch, you'll see that for Post.find the number of calls per second actually decreased from
00:42:21.280 2-3-stable; this is iterations per second, like we talked about earlier, so higher is better; we started out higher with
00:42:27.640 2.3, we actually went down through the 3.x series, we came up a little bit through 4-1-stable, and then we
00:42:33.280 increased by a lot here on the experimental branch; and if we look at the same thing for Post.find_by_name, if we
00:42:40.240 look at the calls per second there, we'll see that it goes way down, way down there in the 3.x series, comes back
00:42:45.599 up a little bit, and then on master we just increase a lot uh and one thing that I thought was actually very
00:42:51.720 interesting is that if we look at calls per second with MySQL, so I'm calling out MySQL in particular um there
00:42:58.200 was a strong bias towards MySQL in the Rails 2.3 days; MySQL was much
00:43:03.720 faster than the other databases, and I think this is just because most people used MySQL back then; but now Postgres
00:43:09.800 is a million times better uh use Postgres please, because it is awesome uh so if we look at MySQL you
00:43:16.720 can see way down here, like, super sadness in the 3.1 and 3.2 series, it's just super
00:43:22.160 duper sad uh I'm really sorry
00:43:28.440 seriously, like, what? anyway, so we went way down and then came way back up here,
00:43:34.520 and you'll see that it's actually faster today on master than it was in 2.3, but only slightly faster; it is
00:43:40.400 faster, but slightly faster; and if we look at the percent faster than 4-1-stable, this is what it looks
00:43:45.960 like, the percentage faster than 4-1-stable uh so you can see we're faster
00:43:52.079 across the board, for every single database, than 4-1-stable; if we look again against 2-3-stable,
00:43:57.520 again we're faster across the board, but not much faster for MySQL; the other thing that we looked at is object
00:44:03.760 allocations per call so if we look at object allocations per call, you'll see they went way up in the 3.1 series
00:44:10.800 and we can't say necessarily that allocating more objects caused the run
00:44:15.920 time to go down, it doesn't have a one-to-one mapping, because maybe the garbage collector is faster; we can't say
00:44:22.680 that these are absolutely correlated, but it does cause your process to do more work so you can see these
00:44:29.240 objects allocated went way up and then came way back down; the same thing with find_by_name; and when we look
00:44:35.599 at these numbers we find that Rails master today actually creates 70% fewer
00:44:41.319 objects than 4-1-stable when you do a Post.find, and 55% fewer objects than
00:44:46.400 2-3-stable so I think that this is a really good achievement uh the
00:44:52.960 next thing I want to look at is belongs_to; we can see exactly the same performance with belongs_to: it goes down
00:44:59.240 for 3-0-stable and then spikes way up; and then if we look at belongs_to, percentage faster than 2.3 and
00:45:07.240 4-1-stable, we're just faster across the board uh and there's no number for mysql2 because mysql2
00:45:13.319 didn't exist on 2-3-stable; again, we'll look at has_many and has_many :through, exactly the same things here, where we've
00:45:20.040 improved so much uh this is has_many call speed over time; again, the same
00:45:26.160 sort of graphs with has_many :through, it looks nearly the same uh percent faster than 2-3-stable, again much faster,
00:45:33.359 much faster than 4-1-stable; the next thing I really wanted to look at is has_many :through
00:45:38.680 growth so what I mean by has_many :through growth is: let's say we have a has_many :through
00:45:45.440 relationship here, right? what happens if we add another
00:45:50.680 has_many :through relationship, and then another, and another one, and another one, and we measure that
00:45:57.280 performance as the number of has_many :through relationships grows; what does that look
00:46:02.559 like? so if we look at the time it takes to make 100,000 calls to a particular
00:46:08.240 has_many :through relationship and we plot that, we'll see that it looks like this: 4-1-stable is
00:46:15.520 linearly increasing, whereas the bottom one, master, is just constant time; this makes me extremely happy
00:46:21.720 because it's my golden unicorn of constant-time improvements so I'm
00:46:27.400 really excited about this; the TL;DR to take home from this is that we're about 100% faster...
00:46:36.760 over 9,000... 9,001% better
00:46:44.160 so the challenge for this was that we had to trade memory for
00:46:50.040 speed; it's pretty clear, because we are keeping these cache objects around; and so I wanted to think about how
00:46:56.280 much memory we are trading so I'm going to give you a very simple formula for calculating how much memory we're trading; unfortunately I can't tell
00:47:02.760 you exactly, but this is the formula: we can estimate our total cache size to be about the
00:47:09.559 number of find_by calls that you do in your system, or find_by with a hash,
00:47:16.079 the number of those multiplied by the size of the cache object; I can't tell you exactly what the size of the cache
00:47:21.640 object is, because it depends on the size of your SQL query uh and I can't tell you how many find_bys you have in your
00:47:27.359 system, but the important thing is that it should be bounded; if it's not bounded, then you
00:47:32.960 have a problem; not bounded would mean that a user could query
00:47:38.480 against a user-defined column, or that somebody could pass in a column from the URL, and
00:47:44.240 that might be bad; you probably don't want users to query by whatever they want;
00:47:50.920 this cache size should be bounded now the next question is: can we cache raw
00:47:56.000 relations? so lots of people ask me, hey Aaron, you were able to make Post.find fast and find_by_id fast and
00:48:03.400 all that fast; why can't you just make Post.where.where.where.where fast? and my question is:
00:48:11.280 why don't you make it faster? uh anyway, so let's talk about that a
00:48:19.280 little bit; let's look at an example controller uh here's an example controller where we say Person.where(name: ...) and then get back a list of
00:48:26.520 people by name so why can't we make this faster? people say, we want to
00:48:33.680 know, can we make this faster, can we cache this particular thing? and the answer is: we can cache this, we can
00:48:40.720 make that faster so I put together an experimental branch that attempts
00:48:45.920 this, and I'm going to show you the performance difference between the two and talk about the complexities with
00:48:50.960 both of them so let's say we have this experimental branch uh you can see
00:48:56.079 this is again iterations per second, so higher is faster; the experimental branch is faster; we cache that, it's
00:49:04.240 faster than master; we're happy, it's about 30% faster okay, the problem is:
00:49:09.880 problem is each time you call a method on a relation object that impacts the cache key so if you say where do select
00:49:17.200 do include do whatever do whatever do whatever we have to take all of those particular values into account when we
00:49:23.079 calculate the cach key so let's look at the code on the experimental Branch this is up there in blue is the cash key
00:49:30.319 calculation and down there in red is the actual query execution so what I found is that
00:49:36.720 there's about over 11 variables impact the cach key there's 11 different ways
00:49:42.799 that you can change that you can change a relation object and I'm being conservative I'm not 100% sure that
00:49:48.000 that's correct so that code that I showed you that code only handles two types that
00:49:53.440 code only handles two of those variables imagine if it handled all 11 of those it may be that calculating the cash key
00:50:00.040 might be way more expensive than actually than bothering with it I mean why bother if it's so expensive to
00:50:05.640 calculate the cache key? so I did one more experiment, a finder
00:50:11.559 comparison so what this is is a benchmark that runs where(name: ...)
00:50:16.880 versus find_by_name uh and this is against the experimental branch that has
00:50:21.920 both the performance improvements that we talked about earlier, along with the
00:50:28.079 Person.where performance improvements; I compared those two, and if we look at that, this
00:50:33.720 is again iterations per second, so higher is better; on the right side is find_by_name, on the
00:50:40.200 left side is doing Person.where; this is even with our new caching code put into place, and still find_by_name is three
00:50:47.480 times faster than the where version so the reason that it's so much
00:50:53.079 faster is because, even though we could cache that stuff, every time we execute that particular block we have to create
00:50:59.319 a new Active Record relation object; we always have to do that so the question is: should we
00:51:06.200 even cache these relations, should we bother with this code? and I'm not sure; personally, I
00:51:13.040 don't think we should, I don't think the complexity is worth it; what I would prefer is that we introduce a new API, and the new
00:51:19.319 API would look something like this, where we say okay, let's have a cached query:
00:51:25.880 we'll create a relation object inside that, we'll cache it, and then we can execute it multiple times; many of you
00:51:31.559 might be looking at this and saying, hey, that looks very similar to a scope, so why don't we just use scopes? and we
00:51:37.400 can't do that, because people expect that scopes are executed multiple times, whereas this block is only executed once; for
00:51:44.640 example, you might have a scope that's like, give me all of the comments for today uh well, you probably wouldn't be
00:51:51.200 happy if "today" was the same day every single day; that wouldn't be very fun; the other thing is that the
00:51:57.559 cache key is extremely easy; so I think if we introduce a new API like that, the cache key is extremely easy, plus you get
00:52:04.280 to keep using the nice relation API; the relation API is maintained.
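The scope-versus-cached-query distinction boils down to when the block runs; a hedged, Rails-free sketch (lambdas stand in for the real APIs):

```ruby
require "date"

# A scope's block is re-evaluated on every call, so Date.today stays fresh.
todays_comments_scope = -> { "SELECT * FROM comments WHERE posted_on = '#{Date.today}'" }

# The proposed cached query would evaluate its block exactly once; the
# result is frozen at cache-creation time.
cached_query = todays_comments_scope.call

fresh = todays_comments_scope.call # recomputed now, with today's date
stale = cached_query               # whatever "today" was when we cached
```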
00:52:10.559 so the next thing I want to talk about is speeding up helpers; that was all Active Record stuff um and
00:52:17.000 after three years of doing that stuff I'm pretty burnt out on Active Record uh so I'm
00:52:22.839 looking at speeding up helpers, more on the view side; this is stuff I've been working on very
00:52:28.720 recently um so one of the things that I wanted to do is reduce object
00:52:34.480 allocations so I was profiling request and response time like
00:52:39.640 I was showing earlier, and I want to look at a little bit of that; what I did is I said okay, here's our test code
00:52:45.720 again: we're creating a new application here, running one request through it, running
00:52:52.599 StackProf against that (StackProf is the tool that we looked at earlier); I wanted to
00:52:58.079 look at bottlenecks in our rendering system so what I found is our
00:53:04.319 results look like this: at the very top there we have this method,
00:53:09.680 ActiveSupport::SafeBuffer#initialize; that is the very top line, it's chewing up about 9% of our processing time, so
00:53:16.319 we're calling initialize a lot so I said okay, where are we calling ActiveSupport::SafeBuffer#initialize? I need to be
00:53:22.640 able to find this; how did I find it? so one thing I use is a tool called TracePoint, that's in Ruby 2.1; I'm not
00:53:31.079 sure when it was introduced, but it's in later versions of Ruby, and I use this code to say okay, every time
00:53:38.280 we get a call, I check to see if the call is on the ActiveSupport::SafeBuffer class and if the method is
00:53:44.720 initialize; if it's that class and initialize, I'm going to say: give me the call stack, tell me what the call stack
00:53:51.200 is; I get the binding and eval "caller"; I'm sure Koichi is cringing at my code,
00:53:56.760 it's really super terrible, but I say okay, give me the caller; and down here below, those two lines at the very
00:54:02.160 bottom are our tests, so we can see what the actual output is from this and if we look at the output we'll
00:54:07.720 see okay uh we have two calls: one is inside the html_safe method, and
00:54:13.960 the other one is actually inside our test script on line 13.
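The TracePoint trick can be sketched like this, watching a stand-in `Target#initialize` instead of `ActiveSupport::SafeBuffer#initialize`:

```ruby
class Target
  def initialize; end
end

call_sites = []

# Fire on every Ruby-level :call event; record the backtrace when it's the
# class/method pair we care about.
trace = TracePoint.new(:call) do |tp|
  if tp.defined_class == Target && tp.method_id == :initialize
    call_sites << tp.binding.eval("caller") # who allocated this object?
  end
end

trace.enable { Target.new }
puts call_sites.first.first # the frame that allocated the Target
```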
00:54:19.839 so, inside of Rails, I found them inside this method
00:54:25.880 called tag_options; what the tag_options method is, is a
00:54:31.640 helper method that generates the attributes inside your tags,
00:54:36.720 any of the attributes that are inside your tags; and it actually comes from inside this ERB::Util.h
00:54:43.280 method; in order to talk about this... I should hurry along, I have a million billion slides left okay so
00:54:50.520 let's talk about HTML sanitization in Rails very quickly um HTML sanitization works with an
00:54:57.000 ActiveSupport::SafeBuffer so if we look at an ordinary string in Rails, we check the string
00:55:03.079 type, we ask if it's html_safe?, it says false; if we use a safe
00:55:09.040 buffer, we say okay, give me an HTML-safe version of this string, we get back a safe buffer, we ask is that html_safe?,
00:55:15.280 it says true so an important thing to note here is that html_safe just tags the string; if you say
00:55:22.559 string.html_safe, what you are saying is: I am okay for these bytes to be
00:55:28.119 output across the wire without being escaped okay, it does not mean that it is
00:55:35.880 safe; despite the name, it does not mean that it is safe, it just means that Rails will not escape it when
00:55:42.160 we send it out across the wire so what does ERB::Util.h do? what that does
00:55:48.280 is it says okay if the string is not HTML safe then will G sub it will say
00:55:54.119 like okay Let's Escape ape everything and then we'll call HTML safe on that and return to you an active support safe
00:56:00.440 buffer and what happens here is we're actually creating two different strings a safe buffer is a subass of a ruby
00:56:06.480 strings so we say okay we do a g sub that creates the string a new string object with the escaped code in it and
00:56:13.359 then we create an active support safe buffer here so it actually creates two strings so how does this relate to tag
00:56:20.240 options if we go back and look at TAG options we assign the return value of that to this we say Okay Escape it we
00:56:26.799 get it back here into value and then we actually interpolate it down here into another string so what's happening is
00:56:33.960 we're taking a string we're generating a string we're generating a safe buffer then we're generating another string
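To make that allocation chain concrete, here is a toy model of the behavior in plain Ruby. ToySafeBuffer and toy_html_escape are made-up names standing in for ActiveSupport::SafeBuffer and ERB::Util.h; this is a sketch of the idea, not the real Rails code:

```ruby
require 'cgi'

# Stand-in for ActiveSupport::SafeBuffer: a String subclass that is
# merely *tagged* as "do not escape me again".
class ToySafeBuffer < String
  def html_safe?
    true
  end
end

# Stand-in for ERB::Util.h: escape, then wrap in a safe buffer.
def toy_html_escape(value)
  escaped = CGI.escapeHTML(value)  # allocation 1: the escaped String
  ToySafeBuffer.new(escaped)       # allocation 2: the SafeBuffer copy
end

buf = toy_html_escape(%(<a href="x">))
buf.html_safe?     # => true

# Interpolating it into attribute markup allocates yet another plain
# String, and the safe-buffer tag is thrown away in the process:
attr_string = %( href="#{buf}")
attr_string.class  # => String
```

The last line is the point of the talk's complaint: the SafeBuffer wrapper is allocated only to be flattened right back into an ordinary string.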
00:56:39.440 It's like, so many objects everywhere. So I was wondering to myself, what is the
00:56:45.400 point of generating the safe buffer if we're just going to interpolate it back into a straight-up Ruby string? We can
00:56:51.280 remove that safe buffer completely. So we ended up extracting that and saying, okay, let's
00:56:57.440 just go string to string to string. I guess it's better; it's one
00:57:05.520 better. Anyway, we extracted a method, unwrapped_html_escape: we
00:57:11.480 extracted the inside of html_escape into a method that escapes the string but doesn't
00:57:17.599 return a safe buffer; it just does the gsub. And we're still backwards compatible: the original method
00:57:23.720 calls unwrapped_html_escape and then calls html_safe on it. Then we updated our callers to just call the
00:57:30.400 unwrapped version: the unwrapped version gets assigned to value, value gets
00:57:35.960 interpolated, and now we're just going string to string to string. So what happened was we were
00:57:42.520 looking at about 200 string allocations per request;
00:57:48.559 this change saved about 200 allocations per request for books#new. What this is is just a scaffold page: what I'm testing
00:57:54.520 here is a scaffold, as if you had generated a book scaffold and you're looking at that particular form. So that
00:58:00.119 alone saved 200 allocations per request. And our request benchmark
00:58:05.400 looks like this: the way that I benchmarked this is we do exactly the same warm-up, we're
00:58:11.760 requesting books#new, with a warm-up, using allocation_tracer to get the number
00:58:17.440 of allocations per request. And if we break this down by object type, we can see this is what
00:58:23.720 our object-type allocations look like across versions: blue is 4.0-stable, green is 4.1-stable, and whatever that orangey
00:58:30.920 thing is is master. You can see our number-one problem is T_STRING objects, and we're reducing
00:58:37.359 these objects as time goes on, so we're improving Rails here. This graph doesn't look super
00:58:44.720 impressive, but remember it's a 90% reduction since 4.0-stable, which is very large, and a 14% reduction
00:58:52.280 since 4.1-stable. I'm very happy about this. Now, unfortunately, with this particular change your mileage
00:58:59.400 may vary. What we were talking about here is string buffer allocations inside of tag_options, and
00:59:07.280 obviously this depends on what your HTML looks like. So I can tell you that this
00:59:12.359 saved us 19% of the strings for this particular page, but what it
00:59:18.920 does for your application I'm not sure: if you're just generating JSON this will save you nothing, but if you're
00:59:24.920 actually generating an HTML page it may save you a lot, depending on what your HTML looks
00:59:30.119 like. So the next thing I want to talk about is string object reduction: what happened between 4.0 and 4.1? We
00:59:35.839 didn't talk about that. Man, so many slides. Let's go through this quickly. All
00:59:42.000 right, so I get extremely nervous before my talks, and I just think, okay, if I
00:59:50.400 add more slides everything will be okay.
00:59:55.440 I'm extremely paranoid I'll get up in front of everybody, take like 10 minutes, and be like, I'm
01:00:00.640 sorry. But now I have millions of slides. Okay, so in Ruby we have mutable strings, and what that means is that every time
01:00:06.799 we evaluate this block we get a new object allocated. This is very apparent: look at these object IDs, they're
01:00:12.079 different every time that block executes. It shouldn't be a surprise to anybody. But what's really cool is in later
01:00:19.160 versions of Ruby (I think it was introduced in 2.1) there's now a compiler trick that says if we freeze
01:00:25.280 the strings we actually get just one object back. You can't mutate
01:00:30.760 that string anyway, so what we said is, okay, every time the
01:00:36.720 compiler comes across a string literal with .freeze on it, we're just going to return the same object to you:
01:00:42.599 we won't allocate a new object every time it's evaluated, and you can see the object IDs are the same every time. So the way we took advantage of
01:00:49.839 this in Rails is in ERB templates. This is an example ERB template, and here is the compiled
01:00:55.720 version of the template, which I want you to read extremely closely (that is a joke). So this is the
01:01:04.839 compiled version of the template: we took the ERB and compiled it down into Ruby code. If we zoom
01:01:10.839 in on it we'll see something that looks like this: we have a string literal there, that's the actual HTML, that TR or that
01:01:16.839 TD that we had in the ERB, it's the HTML literal. And what's interesting about
01:01:22.559 this is you can't mutate it; it's never mutated. Template literals
01:01:28.039 can't change. So what we did for 4.1 (well, what I did; I'm not sure if anybody
01:01:35.240 noticed, but I noticed) was we added freeze in there, so these
01:01:41.359 string literals are now frozen. The HTML literal is frozen, and
01:01:46.440 that's where these savings came from, from 4.0 to 4.1; that's how we reduced that. The next thing
01:01:53.160 that I'm looking at is speeding up output, and I want to warn you this is a work in progress. The
01:01:59.640 code I'm going to talk about does not exist on master today; it exists on my machine, so hopefully my machine does not
01:02:05.680 die. Or maybe I should push a branch somewhere, I'm not sure. Anyway, I'm
01:02:10.920 going to talk about how the Law of Demeter is helping us speed up our code. Really, I think this
01:02:17.079 is just a suggestion of Demeter; I don't like that it's called the Law of Demeter, because it's like, if you screw that up,
01:02:22.760 is somebody going to come along and arrest you for breaking the law? Well, it turns out in the United
01:02:28.599 States they will. And then I wondered to myself, if you get
01:02:35.480 arrested for violating the Law of Demeter, does that mean you're doing Arrested Development?
01:02:48.279 I don't know exactly what the definition of the Law of Demeter is; I'm totally not sure about it. I know it's
01:02:54.680 something like, you don't want to have too many dots, like if you're calling too many methods down then
01:03:00.359 you're doing something wrong, which is probably true. But the way I look at it, it's not about the dots, it's about
01:03:05.880 the types that your function handles. What I think is that the
01:03:12.440 fewer types your function actually handles, the faster and easier your code is going to end up being. What I
01:03:18.440 mean by that is, we'll look at this compiled template again, which you can totally read very well, and we see this HTML literal here again.
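You can get a feel for what a compiled template looks like with the stdlib ERB compiler. Rails compiles views with its own ERB implementation, so the exact output differs, but the shape is the same: the static HTML becomes string literals appended to an output buffer.

```ruby
require 'erb'

template = ERB.new("<td><%= name %></td>")

# .src is the generated Ruby code for this template; the <td> and </td>
# markup shows up in it as plain string literals appended to a buffer.
puts template.src
```

This is exactly the kind of generated code the talk is zooming in on: the HTML literal is right there in the compiled Ruby source.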
01:03:25.799 what's interesting about this is if we go look at the safe_append= method (and why there
01:03:31.319 is an equals sign there I have no idea; it's super stupid, we should probably
01:03:37.039 remove that), it has a line there. Look at that line: it says return self if value.
01:03:44.400 nil?. Why? Guess what: value can never ever
01:03:49.839 be nil. The ERB compiler guarantees that this will never be nil. So why are we doing this? Why is that
01:03:56.960 there? We don't need to handle nil; this is a type that we don't need to deal with. So we just say, I don't care
01:04:02.920 about nil, I don't handle it; if you pass a nil to me, whatever, I don't care. So we
01:04:08.680 just remove that, we don't handle nils, just delete it. Why were we doing this runtime check? We were checking, every
01:04:15.520 single time, in your ERB templates, every time you have that HTML literal, whether or not it's nil. What
01:04:20.839 a waste. It's a huge waste. So I have to admit that probably wasn't
01:04:26.039 actually the Law of Demeter; I think that was actually more defensive programming. But my joke would not
01:04:32.000 have worked with defensive programming, so I used the Law of Demeter. Now, the
01:04:37.079 other thing that we're working on with this is the output buffer. This thing that it's appending to happens to be a
01:04:42.319 subclass of, ah God, this kills me, happens to be a
01:04:47.799 subclass of the SafeBuffer that we were looking at earlier, which is also a subclass of String. Now if you look at safe_append=
01:04:53.160 in the superclass, it's saying: hey, if this safe buffer is not HTML-safe, then do
01:04:59.599 something special. But this conditional only returns true if the safe buffer has ever
01:05:07.160 been mutated. Well, it turns out that you can't actually get access to the output buffer; we don't give you access to that,
01:05:14.559 which means that you can never mutate the output buffer, and this conditional will never return true:
01:05:22.000 it'll always be HTML-safe. So why do we have that? It kills me. I look at this, I'm
01:05:28.319 like, how can you even get to the output buffer? I look through Rails, like, how
01:05:33.520 can you get access, who mutates it? And I think the answer is no one. So we should be able to just reduce this
01:05:39.839 conditional to that. So if I could sum this up, what I
01:05:45.559 think you should be doing in your applications in order to increase speed: cache invariants. Any time you can
01:05:52.680 find a calculation that you're doing over and over again and it always ends up being the same, you should cache it.
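As a minimal sketch of caching an invariant (the class and method names here are hypothetical, not from Rails): if an object's inputs can't change after construction, a value derived from them can be computed once and memoized.

```ruby
class TableQuery
  def initialize(table)
    @table = table.dup.freeze  # input never changes after construction
  end

  # The SQL string is an invariant of @table, so compute it once and
  # memoize it instead of rebuilding the same string on every call.
  def select_sql
    @select_sql ||= "SELECT * FROM #{@table}"
  end
end

q = TableQuery.new("books")
q.select_sql.equal?(q.select_sql)  # => true (same object, no re-allocation)
```

This is the same shape as the bind-parameter and SQL caches discussed earlier in the talk: pay the construction cost once, then hand back the cached result.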
01:05:58.680 Eliminate objects: where this comes in is, no code is faster than no code. If we can
01:06:05.039 delete code, it's going to be faster. Limit types: limit the types that your
01:06:11.880 functions handle. If you can reduce the number of types of objects that your functions handle, you'll have less code, and just as
01:06:19.160 we learned a few slides earlier, less code equals faster code. And the other thing that I want you to do, now
01:06:24.559 that I've taught you about all of these performance tools, all these tools that you can use to test the performance
01:06:30.079 of your Rails application: report performance issues to us, please. That dip,
01:06:36.599 when we were looking at those graphs, that dip from 2.3 down to 3.0 and 3.1,
01:06:42.000 should never have happened, in my opinion, and I think the reason is that nobody
01:06:48.400 was measuring their applications. Nobody was saying, hey. People were like, ah, Rails seems slow,
01:06:55.240 it's gotten slower. And I hate it when people do that. The reason I hate it is because saying something like
01:07:01.680 "Rails is slower" is not helpful to me. You say to me, Aaron, Rails is slower. I'm like,
01:07:10.160 cool, I'll just go speed up Rails.
01:07:22.119 So if you can tell me what is slower, we can make it faster. I actually heard
01:07:28.240 this, and it warmed my heart: I heard yesterday that GitHub was finally upgrading to Rails 3.0, and I know
01:07:34.240 they had tried this a long time ago. They're upgrading to Rails 3, and I talked to my friends that work there and I said, well, why couldn't you upgrade?
01:07:40.359 And they said, well, it's slower. And I'm like, what's slower? No response. Thanks, I'll
01:07:47.200 get right on that. Anyway, Rails 4.2 will be the fastest
01:07:52.520 Rails ever. I think this will be the fastest Rails ever; I'm confident saying
01:08:00.760 that. So thank you. If you have any questions about this stuff, I don't
01:08:06.920 know if we have time, I'm a little bit over an hour; otherwise I'm going to be at the party later, so please come say
01:08:12.680 hello to me, I will give you a sticker, and we can talk about performance issues or whatever.
01:08:31.920 Thanks, thanks Aaron. Any questions for him? Don't be
01:08:45.080 shy. Questions? Yes, can I talk? Oh yeah. So you showed a few things you think are
01:08:50.759 bad DSL or API design choices in Rails. So if, even in Rails 5, you could
01:08:56.679 pull your veto card and get rid of something, what would it be, do you think? I
01:09:02.080 have a veto card? Yeah, I don't have that. If I could pull something from Rails 5, what
01:09:08.880 would it be? Oh, I have no idea, I can't even tell
01:09:15.839 you. Well, actually, some of the stuff, like, we have some methods for doing introspection into Active Record. I
01:09:23.400 wish we could remove those, because some of those things just don't make sense. For example, one
01:09:30.040 of the things I broke is that somebody was iterating through all the reflections on Active Record looking for
01:09:35.719 has_and_belongs_to_many ones, and now that has_and_belongs_to_many is implemented in terms of has_many :through, those didn't
01:09:41.640 show up anymore, so I broke their code. If I could remove some of this introspection stuff, I think that might
01:09:47.239 help. The other thing I'd like to do, honestly,
01:09:53.319 the things that I really want to get rid of are things that impact thread safety. I can't give you anything super
01:09:59.360 specific now, but anything that impacts thread safety I'd like to remove, so
01:10:04.600 globals. I know that's not super specific, but I hope that answers your
01:10:12.239 question. I had a question: so on those performance graphs, was that all
01:10:17.600 with the same version of Ruby? Yes, it was. Have you tried it with a newer version
01:10:24.120 of Ruby, just to see what the garbage collector does? That was with the newest, that was a trunk Ruby.
01:10:30.719 All those tests were performed with trunk Ruby. The graphs should
01:10:38.920 probably be similar on any particular version of Ruby, maybe not exactly the same numbers, but still the same
01:10:44.480 percentage increase. Hi. Hey, all your slides spoke
01:10:51.360 about performance improvements, but you also mentioned AdequateRecord. Yes. Talk a little more about
01:10:56.880 that? So I originally named this project AdequateRecord after my consulting company, Adequate: we do
01:11:03.800 everything adequately. We have just enough
01:11:10.040 clients, so don't ask me to do work. No,
01:11:15.080 we named it, I named it after that. And now that it's
01:11:22.600 been merged back into master, it's dead. All those numbers are from
01:11:28.360 master. So most of the performance improvements are actually in AdequateRecord? Is that a separate
01:11:35.560 gem anymore? No, no, no. Adequate is just a branch name; all of it
01:11:40.920 was just merged back into master, so it's all there. There's no such thing as
01:11:46.000 adequate anymore. It's not adequate, it's just Rails. And
01:11:52.639 this is the first time I'm seeing you informally dressed. What happened to your suit? What happened to my suit? I'll
01:11:58.120 tell you what happened to my suit: extremely hot temperatures and high humidity is what happened to my
01:12:04.639 suit. I would die up here if I was wearing that.
01:12:13.800 Thanks. So does the Rails core team have, is there any discussion around,
01:12:18.880 automated performance regression tests? Yeah, we've
01:12:25.239 talked about that, but there are some problems. Theoretically we have
01:12:32.159 performance tests, we do have performance tests, but if I remember correctly they only test Active
01:12:38.080 Record stuff, and we never look at the output, so
01:12:44.639 it doesn't super help very much. And we don't have historical data for them either, so
01:12:51.239 we don't know across particular branches. As far as I know, the stuff that I've showed you all here is
01:12:56.880 the most comprehensive performance testing backwards in time. We've talked about it;
01:13:04.600 another problem with doing performance tests
01:13:10.000 like that is API stability. So,
01:13:15.400 when I was showing you controller performance tests for 4.0, 4.1, and master, I actually couldn't show them
01:13:22.920 to you for 3.2 and 3.1, because the API changes in Rails made it such that I
01:13:28.280 would have to do major work on my benchmarks in order to show you those tests over time.
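For what it's worth, allocations per unit of work is one metric that is cheap to track over time, because it needs no timing harness. A minimal stdlib sketch using ObjectSpace, in the spirit of the measurements shown in the talk (render_row here is a hypothetical stand-in for whatever code path you care about):

```ruby
require 'objspace'

# Hypothetical workload standing in for e.g. rendering one row of a view.
def render_row(values)
  values.map { |v| "<td>#{v}</td>" }.join
end

GC.disable  # keep the counts stable while we measure
before = ObjectSpace.count_objects[:T_STRING]
render_row([1, 2, 3])
after = ObjectSpace.count_objects[:T_STRING]
GC.enable

puts "T_STRING allocations: #{after - before}"
```

Numbers like this, attached to a bug report, are exactly the kind of "what is slower" detail the talk is asking for.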
01:13:34.639 So the answer is yes, we've talked about it, and it is hard, but if someone
01:13:42.520 can come up with a good idea, please, we'd love to have that. I saw a talk
01:13:50.159 of yours at some point and you were talking about submitting prepared statements so that the database
01:13:57.199 could cache the query plan, and I was wondering, I think at the time you
01:14:02.600 guys decided not to do that in Active Record? That is not true, we do that. Okay. Yes, we do
01:14:09.040 that, we do prepared statements, and you can shut them off if you don't want to use them, but we do
01:14:15.280 them by default, except on MySQL. Yeah, we can talk about that, we
01:14:21.639 can talk about how awesome MySQL is at the party.
01:14:35.400 Can we give it up for Aaron again? Thank
01:14:52.280 you.
Explore all talks recorded at Red Dot Ruby Conference 2014