00:00:19.320
Good morning. I just wanted to say thanks
00:00:24.359
to Winston and to Anna and Alan and JY and everybody for organizing this. It's
00:00:30.240
been an excellent conference, I'm having a great time. This talk is about what we do at
00:00:39.200
Discourse that is kind of moving away from the Rails conventions, so I called
00:00:45.320
it Off the Rails. There are also some crazy things here, so the name is pretty
00:00:54.280
apt. A little bit about myself: I've been doing open source
00:00:59.600
development full-time for the last three years. I co-founded Discourse with
00:01:06.159
Robin and Jeff Atwood. I've worked remote for the last six or seven years; I haven't gone
00:01:13.640
into an office. I'm fascinated with performance problems and squeezing the
00:01:20.680
last little millisecond out of the app. In a previous life I worked at
00:01:27.000
Stack Overflow, and my Twitter account is Sam Saffron. So, a little bit about this
00:01:34.119
Discourse thing that I just mentioned. You can see here, this is the picture of
00:01:39.200
forums today; this is one of the most popular forums in the world. This is what everybody tends to
00:01:46.200
use. I'm sure you've all seen this before: Google Groups. And we have
00:01:53.479
mailing lists, which are very hard to navigate, and Discourse came out to try
00:01:58.920
and reboot this space and create something modern and nice that is
00:02:07.360
very easy to use and has all of those features that we'd expect an app built today to
00:02:14.319
have. We're strictly open source, unlike a whole bunch of other things;
00:02:21.519
there's only one version of Discourse, the awesome version, and you can fork it and work
00:02:28.480
on it if you want. And this actually leads to this: we have a happy Richard Stallman, unlike the sad Richard Stallman
00:02:36.360
that we had yesterday.
00:02:41.400
So I wanted to talk a little bit about something: a few years
00:02:46.920
ago DHH had this blog post talking about Rails as omakase. He chooses all of the
00:02:53.280
wonderful dishes, he's the sushi chef, and if you don't like the unagi
00:02:58.560
then don't have the unagi, but you have to think more or less in
00:03:04.720
those terms, that he's choosing the things for you. Where I struggle with that analogy is that it's
00:03:11.080
very hard to explain what's going on in terms of it. For example, we
00:03:16.640
have to monkey patch all sorts of parts of Rails to make it a little bit faster, and how do I explain that in terms of
00:03:23.239
unagi? It's very hard for me to do that. So what we do is we bolt on all sorts
00:03:31.799
of pieces to make our development life much more fun. We bolt on all sorts of pieces
00:03:38.879
that are missing from Rails that give us a better caching story, a better profiling story. We've got all sorts of gems
00:03:45.640
that we install that give us completely free performance benefits without paying any kind of cost, and they're not
00:03:52.920
included in Rails in general because they are MRI specific, and Rails in
00:03:59.400
general will not take on MRI dependencies. But there's no
00:04:04.560
reason to be spending 2% of your time in the app in this .blank? call;
00:04:11.519
literally we've seen those kinds of numbers. So we install these gems that just give us absolutely free
00:04:17.639
performance. We've got a big, big piece that we add to Rails, which is
00:04:22.680
messaging: we have a reliable message bus so we can talk between our processes and from the processes to the
00:04:28.600
client. We've got a very interesting logging UI, and we're
00:04:35.120
an Ember app, so we don't use Turbolinks at Discourse, we use
00:04:42.840
Ember instead. And we think of our app as a complete app, so we use Docker to
00:04:48.720
deploy it, and our app is the NGINX config and all of the little bits
00:04:54.800
and pieces that glue together the deployment story as well as the actual
00:05:01.560
development. In general, when you think about Rails apps, it stops at the point you run unicorn start,
00:05:09.280
and all of the rest is the ops problem. But for us it's a complete app, from
00:05:14.639
all the way down, including NGINX optimizations and monitoring NGINX and dealing with logs
00:05:22.400
and whatnot. That's what makes Discourse, and the only
00:05:27.680
way that we'll support the open source installs out there is if they're using a Docker based
00:05:34.360
install. So I'd like to give you a little bit of a demo, in production, of how this
00:05:42.479
thing looks.
00:05:47.520
So this is meta.discourse.org in
00:05:53.720
production; this is my view. So one thing that you'll first notice
00:06:01.000
is that I get these little tidbits at the side of the page, so I'm constantly aware of how performance is doing on the
00:06:07.599
website as I'm using it, and that means I can't get away with stuff: if
00:06:13.520
somebody has an N+1 query or something like that, it'll automatically be in my
00:06:18.599
face in production. This is obviously something that only admins get, but
00:06:23.960
it is incredibly valuable, and I can just break it down and look at the actual SQL that ran
00:06:31.840
inside the app. I'm going to have to remember to log out, because you probably saw my auth token and that's probably
00:06:38.319
not a good idea, but let's ignore that for a moment. So yeah, with the SQL you
00:06:45.080
can actually get the backtraces of each SQL statement, so debugging why somebody is running this SQL is very easy,
00:06:53.120
and you get timing for each SQL statement as well. And this is the
00:07:01.280
rack-mini-profiler gem. The rack-mini-profiler gem comes with the
00:07:07.960
ninja switch, which is putting ?pp=help at the end of the URL; that's
00:07:13.680
the ninja switch for all of the magical stuff you can do with it. So for example, say I want to see a flame graph;
00:07:20.319
this is all in production now, by the way, just running on the meta instance. I want
00:07:25.520
to see a flame graph of this page, so I'll run this thing, and then I can see
00:07:32.400
these backtraces that are side by side, laid out across
00:07:39.960
a timeline, so little snippets of what is happening in the app, and I can zoom in and see where
00:07:48.240
various stuff is running. I can see where Active Record is spending its time; this,
00:07:53.800
for example, would be where we're executing some SQL, and this would be Active Record overhead
00:08:00.000
before the execution of the SQL. So I can look and zoom in and get a good
00:08:05.039
analysis of what's going on inside as well.
00:08:14.639
An interesting thing is that we've got integrated memory profiling as well,
00:08:21.879
so I can run this thing, which I didn't test running, but who knows. Anyway, I
00:08:28.080
can see for this particular request how many objects were allocated and how many were retained; I can
00:08:34.919
get a breakdown of where they are; I can even see the locations where certain
00:08:40.880
memory was allocated, so if output_safety caused like 387k of
00:08:48.440
allocation and so on, I can get the actual line where everything is happening and debug into
00:08:56.120
it. So this is a nice big detailed report; I can see which
00:09:01.880
types of objects were allocated, and I'll go
00:09:08.040
through memory_profiler in a bit more detail later. One last thing that is very cool about us
00:09:13.600
is that we're living in the future, and in the future, if you need to
00:09:19.000
debug your JavaScript application in production, you'll have source maps. So I
00:09:26.120
can actually go to the user card
00:09:31.839
view and I can say: okay, when hide is called, let's just drop a breakpoint here,
00:09:38.680
and then I'll just bring up a user card. Ah, hide got called and I'm inside
00:09:44.320
here. All of our JavaScript is obviously minified and bundled together,
00:09:50.079
but I'm able to see the actual file where this happened, and debugging in production becomes much, much simpler
00:09:56.760
when you have source maps. And I think that's about it for
00:10:02.560
the production demo, and I'll just go back to the slides. Okay, that worked.
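What the demo showed is roughly what rack-mini-profiler provides out of the box. A minimal sketch of wiring it into a Rails app, based on the gem's documented defaults; the admin-check hook at the end is a hypothetical illustration, not Discourse's actual code:

```ruby
# Gemfile
gem 'rack-mini-profiler'   # speed badge, SQL timings and backtraces, ?pp=help
gem 'memory_profiler'      # enables the ?pp=profile-memory report
gem 'flamegraph'           # enables ?pp=flamegraph

# config/initializers/mini_profiler.rb
# In production the profiler stays off unless the request is authorized,
# which is how only admins see the badge.
Rack::MiniProfiler.config.authorization_mode = :allow_authorized

# Then, in a before_action for admin users only (hypothetical hook):
# Rack::MiniProfiler.authorize_request if current_user&.admin?
```

With that in place, appending ?pp=help to any URL lists the rest of the "ninja switch" commands.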
00:10:09.800
So the first piece I'm going to spend some time talking about is this message_bus gem. This gem gives us a
00:10:18.320
feature that there's some discussion of getting into Rails 5 with
00:10:25.240
Action Cable, but we've had this now for over two years. This is how it
00:10:30.839
looks on the server: at any point in time we can publish messages to a
00:10:36.839
channel, and then on any of the other processes that are running
00:10:43.399
Discourse we can subscribe to the channel and print the message. So you can
00:10:50.240
basically communicate between all of your processes at any point in time and deliver messages
00:10:55.839
anywhere. Additionally, you can deliver messages to the clients, to the
00:11:01.360
JavaScript app: at the top you've got the Ruby code that's publishing a message, and at the bottom
00:11:07.200
you've got the JavaScript code that's actually listening for the message and acting on it. So
00:11:12.600
conceptually you can send out messages and then receive them on the client, and
00:11:18.279
actually restrict it so that only a certain user, or a certain group, or a certain client can
00:11:24.240
get access. And you're guaranteed that when you post a message it'll be received in a reliable
00:11:31.000
ordering, and it will recover on the client: if
00:11:36.480
the laptop lid is shut, it'll eventually catch up and get the old messages. The big things about message
00:11:44.920
bus that are a bit different from what I've heard about Action Cable are
00:11:49.959
that it has multiple backends for the transport: we support polling and long polling
00:11:56.040
out of the box. Long polling is what is known as Comet, where you make a
00:12:01.560
request to the web server and the web server just grabs onto it and keeps it ready there
00:12:09.600
until it's got data, and then it'll ship it back to the client. It has security built in; reliable playback, which I
00:12:18.560
mentioned; very, very efficient transport, so you can have like 100 different channels but it's all
00:12:25.040
multiplexed inside Redis, which means the transport is very, very
00:12:30.600
efficient; and multisite support, which is very important for Discourse and is another kind of off-the-rails
00:12:38.079
thing: we have to be able to host multiple sites on one process, and
00:12:43.680
that means they all get their own database, so all of our components need to be able to
00:12:49.680
work in that mode where there are multiple databases going
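The publish/subscribe and catch-up behaviour described here can be sketched in plain Ruby. This is a toy, in-memory stand-in for the message_bus gem, not its implementation: the real gem multiplexes channels through Redis, works across processes and to browsers, and adds the security and multisite support just mentioned.

```ruby
# A toy bus: each channel keeps an ordered backlog of [id, data] pairs,
# so a client that was away (laptop lid shut) can replay everything
# after the last id it saw -- the "reliable ordering" idea.
class ToyBus
  def initialize
    @channels = Hash.new { |h, k| h[k] = [] }
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  def publish(channel, data)
    id = @channels[channel].size + 1   # ids are dense and increasing per channel
    @channels[channel] << [id, data]
    @subscribers[channel].each { |blk| blk.call(id, data) }
    id
  end

  def subscribe(channel, &blk)
    @subscribers[channel] << blk
  end

  # Replay every message on the channel with an id greater than last_id.
  def backlog(channel, last_id = 0)
    @channels[channel].select { |id, _| id > last_id }
  end
end

bus = ToyBus.new
seen = []
bus.subscribe("/topic/5") { |id, data| seen << data }
bus.publish("/topic/5", "post created")
bus.publish("/topic/5", "post edited")

seen                       # => ["post created", "post edited"]
bus.backlog("/topic/5", 1) # => [[2, "post edited"]]
```

A reconnecting client just asks for the backlog after its last seen id, which is how missed messages are recovered.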
00:12:55.440
on. So how is this magic done, and how are WebSockets done in Rails? This is all
00:13:01.920
the code you really need to know to understand how this works. A feature that was added to
00:13:09.360
Rack a few years ago is something called rack.hijack. This is supported by every
00:13:15.079
big web application server out there: Unicorn has it, Puma has
00:13:21.680
it, and Passenger has it. Thin does not, because Thin did its own
00:13:27.800
thing a few years earlier and hasn't moved to this, but
00:13:33.079
message_bus also supports Thin. So basically, what this little code example is
00:13:40.120
doing is a tiny Rack app: whenever it's called, it starts a new thread, the
00:13:46.839
thread sleeps for a second, and then it returns something to the client. The one very important thing
00:13:53.920
to notice is this Connection: close bit,
00:14:00.000
so, let's see... anyway, it writes out
00:14:07.000
Connection: close, which is critical to do when you're dealing with rack.hijack,
00:14:12.320
because if you keep the connection open, then the next request you get may not be for a hijacked endpoint; it might
00:14:19.680
have been meant for the main web server, and you have no way of returning a connection back to the web server, so
00:14:26.680
you always have to close connections. And you can see the numbers are
00:14:33.839
pretty amazing for what you get: 100 concurrent connections, so 100 different clients connecting at the same
00:14:39.880
time to the server, they all sleep for one second, and within 1.8 seconds they all return. So
00:14:49.600
there's no blocking; if there was blocking, you'd have to wait, you know, 100 seconds to serve all of this. Now,
00:14:56.680
you'd never do this stuff with threads, that's way too expensive, and message bus does not do that; this is just to get
00:15:03.000
the concept of what rack hijack is across. Threads are expensive; you don't want to spawn them like
00:15:10.399
this. This has been used in production now
00:15:16.120
for multiple years and has almost no dependencies; the
00:15:21.279
only gem dependency we have is on Redis, and even this dependency
00:15:27.360
can go: it's isolated to one file, and it's just for the transport, so
00:15:32.480
potentially you could implement the transport on top of PG, which gives you pub/sub via LISTEN/NOTIFY, or you could implement a memory
00:15:39.079
transport if you've got an app that's running as a single process. And this is the really cool
00:15:47.560
thing about it: all you have to do is include this gem in your Rails app, and then you've got this feature that
00:15:54.959
you can build stuff on. It doesn't run another process; it's all running in the
00:16:00.519
same single Rails process that you already
00:16:06.440
have. And then, you know, the big question is: what about Action Cable? Why do I need
00:16:12.319
to look at this, shouldn't I wait for Action Cable? With Action Cable there are a lot of questions that are
00:16:18.480
unanswered at the moment. One big thing: is it WebSockets only?
00:16:24.600
And I'll go through some of the issues with WebSockets. What about reliable message ordering? What if I
00:16:30.399
close the laptop and open it, am I going to be able to catch up on all the old messages I missed? What kind
00:16:36.959
of other dependencies am I going to have; am I going to have to bring in EventMachine or
00:16:42.319
Celluloid? So those are the kinds of worries I have when I'm
00:16:48.519
thinking about this; but really, we don't know, we have no idea what it's going to be. Message bus, though, has been used
00:16:55.360
in production for multiple years now, so there is that. And one interesting
00:17:00.839
thing I noticed the other day: there are these new performance tools in Firefox Developer Edition, and at the
00:17:07.760
bottom of the page announcing them, the first comment goes: "WebSocket traffic
00:17:12.880
monitoring in the network tab, please". Because one thing about WebSockets that
00:17:18.000
they haven't told you is that debugging these things is a huge pain, and all
00:17:24.039
of the tooling to debug them is also very, very painful. We
00:17:29.440
initially did support WebSockets, but there were a few issues: the
00:17:34.520
HTTPS requirement was a very big one, and the thing is, if you're running
00:17:40.080
WebSockets over HTTP and not HTTPS, weird things can happen. And by weird
00:17:46.880
things, I mean you can have all sorts of proxies that rewrite traffic and mess stuff up for you. For example, in
00:17:54.880
Australia, the Telstra network decided they don't like WebSockets, but they like them enough to let you
00:18:02.240
connect; then they'll just store all the data, so you have a WebSocket that's open and you think you're meant to
00:18:08.840
be getting data from it, but the whole pipe is just clogged; they shut the pipe down. So you need to be able to
00:18:16.120
recover from all of these kinds of situations with WebSockets and fall back. Anyway, the idea with message bus is
00:18:23.799
that we first solve the base problem of having reliable messaging, and then
00:18:30.039
transport is added on top of that; you can have whatever transport protocol you really want, and it's not starting
00:18:38.000
at WebSockets. I'm not against adding something like WebSockets to message bus
00:18:43.200
if we had to, but that is a value-add as opposed to the first thing it does; WebSockets is just another protocol
00:18:50.240
for transport. Now, if you have any
00:18:57.480
questions: one thing I noticed here is that people are a bit shy to ask questions, so I made a site for you to
00:19:03.480
ask questions on, which I'll go through at the end of the talk. You can all go there; what it is is a little chat
00:19:11.159
app written on top of message bus that you can all just go in and
00:19:17.799
chat on, and hopefully it doesn't crash. It's about 120 lines of code to get
00:19:23.520
a very rudimentary chat app that gives you presence, so you can see who's online, and lets you send
00:19:30.400
messages, and you can play around with it. So if you have any questions during the talk, go there and just put
00:19:36.600
the question in; at the end of the talk I'll go to that site, quickly triage it, and try to answer any
00:19:42.919
questions you have. Okay, the next gem I'm going to talk
00:19:48.840
about is LruRedux, which is a piece I know Aaron wants to put into Rails as well; it's a least recently used cache.
00:19:57.240
Now, one cool thing about Ruby 1.9 is that the Hash class is actually ordered.
00:20:02.799
Back in 1.8, when Matz made the Hash class, it wasn't ordered, but it
00:20:08.240
became ordered in 1.9, and we can depend on that feature to build a very,
00:20:13.720
very efficient least recently used cache, and that construct becomes incredibly
00:20:19.600
useful. So at the base, what LruRedux gives you is
00:20:25.320
this; and the thing about it is that it is ultra efficient, about as efficient
00:20:30.799
as it gets. When I designed this gem, I looked at all the other gems out there doing the same thing,
00:20:37.280
and the main criterion was to stay as close to the Hash API as possible with the fastest
00:20:43.280
performance. So this will perform probably about, I don't know, 30% better than the next best one out there
00:20:50.600
that is also using pretty good practices, and we keep trying to squeeze any extra performance we can out of it.
00:20:57.960
Having a cache like this is very cool: we have a cache, and we also have a TTL
00:21:03.640
cache, so you can have a cache with a time to live, and we have thread safe
00:21:09.679
versions. And yeah, the semantics mean this is Ruby 1.9 and up, but hopefully
00:21:15.840
none of you are using Ruby 1.8. Is anybody using 1.8? Nobody? This is awesome.
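The trick being described relies on that insertion-ordered Hash: delete a key and re-insert it to move it to the back, and evict from the front when full. A minimal sketch of the idea; LruRedux itself is more careful and faster, and ships the thread-safe and TTL variants mentioned above:

```ruby
# A least-recently-used cache built on Ruby's (1.9+) ordered Hash.
# Reading a key re-inserts it, moving it to the "most recent" end;
# when the cache is full, the first key in the Hash is the oldest.
class TinyLru
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end

  def [](key)
    return nil unless @data.key?(key)
    value = @data.delete(key)   # remove...
    @data[key] = value          # ...and re-insert at the back (most recent)
  end

  def []=(key, value)
    @data.delete(key)
    @data[key] = value
    @data.delete(@data.first[0]) if @data.size > @max_size  # evict oldest
    value
  end

  def keys
    @data.keys  # oldest first
  end
end

cache = TinyLru.new(2)
cache[:a] = 1
cache[:b] = 2
cache[:a]        # touch :a, so :b is now the oldest
cache[:c] = 3    # evicts :b
cache.keys       # => [:a, :c]
```

Every operation is a couple of Hash lookups, which is why this construct can be made so fast.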
00:21:24.960
Hooray, hooray for no more Ruby 1.8! Sorry, Matz, we have to move
00:21:34.039
forward. So now we have these two building
00:21:39.200
blocks, LruRedux and the message bus, and what happens if we marry these two things together is one example
00:21:46.039
of what we do at Discourse: you get a distributed cache. Now, that is fantastic, right?
00:21:53.919
Suddenly you can set something in a hash in one process and read it out in
00:21:59.600
another process, and it's a cache, so if it fills up it'll start dropping the
00:22:05.960
old things. All of that is done in about 50 lines of code, but once you have
00:22:11.960
these building blocks, all sorts of other new magic things can be built on top of them, and it is incredibly useful
00:22:19.760
for us to have a very lightweight cache like this that we can use across
00:22:25.240
processes. And for us, you know, we have to
00:22:32.320
optimize for adoption of Discourse, and to get adoption we need to be able to install Discourse on very, very
00:22:39.600
cheap hardware, so we're constantly looking at ways of squeezing more grunt into less
00:22:47.320
memory, which means we always have to be profiling everything. And that's why that demo that I
00:22:54.200
had at the beginning, showing all of the things that we do, is so
00:23:00.159
interesting: we have to do this to compete with these PHP
00:23:05.360
applications that take nothing. Even though we provide a thousand other features, people still
00:23:10.480
want to be able to install this on a $10 server. So I'll talk about this memory
00:23:17.120
profiler gem, which uses new Ruby 2.1 APIs that Koichi and I built
00:23:23.720
into MRI, and give you a little crash course on it. This is something you
00:23:30.240
could all do right now for your Rails app if you wanted to; there's not a lot
00:23:35.279
of code here, and I'll make sure I publish these slides so you can cut and paste this and run it yourself.
00:23:41.159
What this does is measure how many objects you're
00:23:46.440
allocating on boot of your Rails app, and how many are being retained in memory after
00:23:52.679
boot. And the answer is: a lot.
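You can get a crude version of this measurement with nothing but the stdlib; the memory_profiler gem is what produces the rich per-gem, per-file, per-string reports shown next. A sketch of the raw mechanism, where the 10,000-string workload is just a stand-in for boot work:

```ruby
# Crude allocation accounting around a block using GC.stat.
# memory_profiler does far more (per-file, per-gem, per-string reports);
# this only shows the underlying idea: diff the VM's counters around a block.
def rough_allocations
  GC.disable                     # keep the GC from muddying the counters
  before = GC.stat(:total_allocated_objects)
  yield
  allocated = GC.stat(:total_allocated_objects) - before
  GC.enable
  allocated
end

allocated = rough_allocations do
  10_000.times.map { "line\n" }  # e.g. boot work that allocates strings
end
puts allocated                    # at least 10,000: one String per iteration, plus overhead
```

The same counter-diffing idea, combined with Ruby 2.1's allocation tracing, is what lets memory_profiler attribute each allocation to a file and line.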
00:23:58.919
So it breaks stuff into two types of reports: you get an allocation report,
00:24:05.279
which is the amount of objects and memory that was allocated within the
00:24:10.640
block. That may or may not have a large impact on the process memory, but
00:24:17.120
it has a direct impact on performance: the more you're allocating, the more work you're doing, the slower stuff is;
00:24:23.760
that's pretty simple. If you allocate less stuff, you're going to be doing less work, and then it's going to be
00:24:31.279
faster. So when we look at the Discourse boot allocations, we can see,
00:24:37.600
you know, the staggering number of bytes that we allocate, and which files are
00:24:43.600
responsible for allocating them. active_support/dependencies likes to allocate a
00:24:49.120
lot of memory when it's loading stuff up. An interesting one there,
00:24:55.440
at number two, is rubygems' Gem::Specification: why are we allocating so much memory in
00:25:03.600
specifications? It seems a little bit odd; do we really need this? And the mime-types
00:25:08.840
gem, which is wonderful at allocating a lot of memory. We can then break it down and look at the
00:25:15.399
strings that actually got allocated, and we can see that we love allocating newlines: on boot, Discourse will allocate
00:25:22.399
the newline string 40,000 times. And we also love Ruby, so we
00:25:30.159
allocate the string "Ruby" 3,157 times, and it's the same string, right? So
00:25:37.880
that's part of the reason I was asking Matz questions yesterday: why can't the actual
00:25:44.600
programming language solve this for us? But we can solve it ourselves, because these libraries are all open source; we
00:25:51.320
can contribute, we can fix this, and we can follow new patterns to make sure these kinds of problems don't happen.
00:25:58.480
And these are very actionable reports that we can run again afterwards and see: oh wow, I just fixed that, I removed
00:26:05.399
those 1,000 allocations from that one line, or 20,000 allocations from that one
00:26:11.399
line. So we can zero in on where it actually happens. Now, retention is
00:26:17.159
something that relates directly to the long-term performance and memory of your app: the more objects in the
00:26:22.919
heap, the more memory you're going to take, the slower your GC is going to be, and so on. So for us,
00:26:30.840
having low memory in our application is paramount, because without it people
00:26:36.000
can't install on a one gigabyte DigitalOcean slice.
00:26:41.240
So, looking at retention, you can see where, after the block runs,
00:26:48.360
memory is stuck, and you can see we've got a bunch of threads, and each thread is holding a meg; they're stuck
00:26:57.320
with a bit of memory allocation. So we have to be careful not to run too
00:27:02.720
many threads, because every thread that we run is going to be very expensive. We can see that the mime-types
00:27:10.120
gem is retaining that amount of memory, like two megs of memory, so
00:27:17.200
your process would be two megs lighter if it didn't have the mime-types gem; actually a little bit more, because
00:27:23.039
there's fragmentation and all sorts of other stuff. But that's how we can look at all of these various gems,
00:27:29.520
and you can find a gem that's really consuming large amounts of memory. We've removed most of them from
00:27:35.279
Discourse, except for the ones that we have to keep, so there's not much for you to see here; but I'm sure if you run your
00:27:40.399
app you'll see a bunch of gems where it'll be very strange to you that they're holding 10 megs or
00:27:48.279
whatever. We can also look and see what types of classes are retained, and memory
00:27:54.640
retained by gem is very interesting, to see which gems are the most offensive when it comes
00:28:01.120
to memory retention. And now I'll talk through a
00:28:06.840
little optimization I did at Discourse using memory_profiler and techniques like that, to give us a faster feature
00:28:15.320
in Rails. I'm going to be talking about the pluck feature. If you're not familiar with Rails, what pluck does is
00:28:22.279
select these two columns into an array; you want to get an
00:28:28.919
array back with the data, you don't care about anything else, just an array of
00:28:34.159
arrays. Pretty simple. So I want an array of arrays that has two columns and 10
00:28:41.559
rows, and to get that from Active Record I have to allocate 286 objects and I
00:28:49.320
have to allocate 25k of RAM. Now, is that high? I don't know,
00:28:56.240
let's check. So I look inside memory_profiler and I see that the string "product" is allocated 28 times, when I was
00:29:04.640
not even selecting it; then I see the string "price" is allocated three times;
00:29:10.279
and I see that ActiveRecord::Result is allocating this amount of memory. So I go: okay, what if I had no
00:29:17.799
Active Record? If you notice the object count, it went from 286 down to
00:29:25.720
44, so that's an enormous number of objects fewer, and look at the memory: it went
00:29:31.840
down from 25k to 3.7k. But still, there's something odd there: there are 21 strings allocated, and I
00:29:38.840
didn't even select a string.
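Those 21 strings are the tell: a driver without a type map hands every value back as text, and the client casts it afterwards. A toy illustration of the difference, with made-up data; this is not the pg gem's code:

```ruby
# On the wire, Postgres results are text: every cell arrives as a String.
raw_rows = [["pants", "12"], ["shirt", "34"]]

# Client-side casting: each price was allocated as a String first and
# then converted to an Integer -- two objects per numeric cell.
client_cast = raw_rows.map { |name, price| [name, Integer(price)] }
client_cast  # => [["pants", 12], ["shirt", 34]]

# With a native type map (what pg 0.18+ can do inside the gem, in C),
# the driver hands Ruby the Integers directly, and those intermediate
# Strings are never allocated as Ruby objects at all.
```

That intermediate-String overhead is exactly what the next step removes.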
00:29:44.080
So you have a little think about that and you go: ah, yeah, well, you're
00:29:49.120
selecting a price, but really the pg gem used to return the price as a string, and then you'd convert it on your side to an int,
00:29:56.840
which is kind of silly. So Lars Kanis, in
00:30:03.320
pg 0.18, added a feature that lets you do this mapping inside the gem itself,
00:30:09.399
which means all of this casting is done natively, and then we're able to only
00:30:15.200
allocate 13 objects and 1.1k, down from 25k of allocation. So that's an enormous
00:30:22.360
difference from where we were when we started.
00:30:29.200
So I decided to backport this new type mapper into Active Record and use it at
00:30:35.519
Discourse. The history was: we have this forum of trolls that like trolling us,
00:30:40.640
and one of the things they did was create a topic with 40,000 posts,
00:30:47.960
and we have a pretty silly implementation in one place that selects the post number for every post
00:30:54.440
in that topic. So every time you go to this page, you're selecting 40,000 numbers
00:30:59.840
out of the database. Now, obviously we have to fix that, but I wanted something
00:31:06.200
that gives me a bit of a stopgap and just improves performance for all of these kinds of cases. So I built what I call fast pluck,
00:31:12.760
which is completely backwards compatible with Rails 4.1 and 4.2 and uses the new PG
00:31:19.120
type map. It reduced the allocations from 286 down to 198 for that block, and the
00:31:25.960
memory allocation went down. And it will not be backported into Rails 4.2, so you
00:31:32.240
can forget about it; the only way you can get these kinds of things is to monkey patch. The reason it won't be is that
00:31:38.159
Rails is a big, mature project, and proving that something like this will work in every single case takes a long
00:31:45.519
time. So that is one of the reasons that we monkey patch. We're working
00:31:51.559
with Sean and Aaron to try and get these fixes in; Sean has a patch
00:31:56.880
using the type map that is going to come into Rails 5, but we need to keep looking into things,
00:32:03.200
and I wanted something fixed today. So literally, I came up with the idea and two days later it was in production;
00:32:10.399
we can do that because we have very good test coverage, so we caught all the edge cases. So you can borrow that if
00:32:16.559
you want. You can see a graph, and this is where it gets very interesting: on the left-hand side you can see pluck, and
00:32:22.720
on the right-hand side you can see fast pluck, which is compatible. At the beginning it's performing quite close,
00:32:28.639
just a little bit faster, but the more data you select, the more significant the difference is. So at some
00:32:36.600
point, at 1,000 rows, it becomes two times faster, and at 10,000 rows it becomes four times faster; and these are
00:32:43.320
pretty narrow rows, so wider rows would mean even more. Which brings me to
00:32:52.559
this: why do we love Active Record so much? And this is why we love Active Record so much; this is the paramount
00:32:59.799
example that explains why we love it. We love it because we can compose stuff:
00:33:05.360
if the color is green, then add this to the where clause; if the max speed is that, add that to the where clause;
00:33:12.559
pass these little things around and build them up. That is the big sell of Active Record, and it's
00:33:21.919
simple, elegant, flexible. It's not SQL, so if you love SQL
00:33:28.039
this is kind of a bit weird to you; and it is sadly a bit inefficient, so we need to work on making it more
00:33:36.440
efficient. But we can do this by hand; we did this by hand for many years, and this is how it
00:33:43.080
looks, and this is terrible. We hate this code; this code is
00:33:49.120
very, very horrible: raw variables and all sorts of ifs that are
00:33:55.000
impossible to understand. It's incomprehensible. It's fast, risky, and yes,
00:34:05.000
sequel so for a discourse sometimes we need to run SQL so we built an
00:34:10.280
alternative to do things that where we need to run SQL but we want it to be
00:34:15.320
nice and we want to be able to use it in a in a clean way so I built this SQL Builder object it's only like a 100
00:34:22.720
lines of code or something it's not uh talking a large amount of code here and
00:34:28.200
it allows us to compose queries as well like we were able to in active record
00:34:33.679
and it allows us to um get objects out of it like we can at active record so I
00:34:39.720
can get car objects out of it instead of getting rows that I need to cast and move around and
00:34:46.240
whatnot and it's yes SQL it's fast it's sane it's tiny and it's in our source
00:34:52.359
code base and when I say how fast is this so I was running a comparison in
00:34:58.040
here this is uh thousands of operations per second uh two columns per row of
00:35:04.839
basic object mapping so we're just selecting objects out of the database so when you're selecting one object out of
00:35:10.520
the database you can see that it's only, you know, 40 or so percent faster, so there's definitely room to
00:35:18.400
improve it compared to the blue line which is raw performance but then active record is like you know less than a half
00:35:25.320
the speed of what it would be in raw. At 100 rows it's starting to get closer to the speed of doing it, and at a thousand
00:35:32.880
rows we're actually beating coding it by hand, and the
00:35:38.520
reason we beat coding it by hand is because one demo was using the new type mapper and the other demo
00:35:43.880
wasn't using the type mapper, so because we started using the type mapper we're catching up on performance. But the
00:35:50.920
thing there is that it's not really a bug: that gray line at the bottom, at
00:35:56.000
1,000 rows is the numbers for active record once you're selecting a large amount of
00:36:04.000
data. And I'll tell you a bit of a story: when I was working at Stack Overflow four years
00:36:09.760
ago, um, we had this problem that stuff was slow and we had to optimize
00:36:18.560
performance, and there was this quote from my blog post that is still relevant today. So I started measuring
00:36:26.920
and I saw that the query is taking 12 milliseconds and the profiling showed
00:36:32.079
the block is taking 90 milliseconds so there was all this time lost between what SQL was doing and what the app was
00:36:38.400
doing and this is what we did at stack Overflow we built this uh lightweight
00:36:44.680
object mapper called Dapper, and it had one goal, and it was to be the fastest
00:36:51.280
object mapper that the .NET framework had, and that meant that we measured every
00:36:58.240
single other framework out there, we compared it to every single framework out there, and its goal was to be fastest
00:37:03.839
and use every single trick that it could come with to be fastest at this particular problem which is taking SQL
00:37:10.319
queries and giving you back objects that's it and it just looks like this you do
00:37:18.880
connection.Query&lt;Dog&gt;, and you give it some SQL and it gives you back a list of
00:37:24.520
dogs, basically. And I believe that Active Record needs
00:37:31.359
something like this at the bottom to do all of the work that it's doing with
00:37:37.079
SQL. At the moment there is no blessed way of running SQL statements and getting back Active Record objects that
00:37:44.520
is efficient, that is isolated from the rest of the jungle that is Active
00:37:50.560
Record, and that is standalone and reusable elsewhere. And if we had what I like to
00:37:58.319
call Active SQL and you had active record built on top of active SQL you
00:38:03.920
could do stuff like this you could select 10 car objects very efficiently
00:38:09.480
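For instance, the hypothetical "Active SQL" call might look roughly like this. Car, the column names, and the stubbed rows are all invented for illustration; a real version would execute the SQL against the connection rather than take rows as an argument:

```ruby
# Hypothetical "Active SQL" sketch: materialize plain model objects straight
# from driver rows. The rows are stubbed in here so the sketch runs standalone.
Car = Struct.new(:id, :color, keyword_init: true)

module ActiveSQL
  # A real implementation would run `sql` and stream rows from the driver.
  def self.query(klass, sql, rows)
    rows.map { |row| klass.new(**row) }
  end
end

rows = [{ id: 1, color: "green" }, { id: 2, color: "red" }]
cars = ActiveSQL.query(Car, "SELECT id, color FROM cars LIMIT 10", rows)
puts cars.first.color # => green
```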
and then go to active record and save them so that that is where I guess if
00:38:14.640
I'm looking at the future of where I'd like to see active record I'd like to see it in this kind of state where we
00:38:20.560
take the problem of object mapping and pull it out of active record into its own entity and make that really really
00:38:28.040
really fast okay I'm going to move on to a
00:38:33.440
different topic altogether, which is job scheduling. I know it's a bit of a segue from all of this Active Record stuff,
00:38:41.200
but at Discourse we've got this problem of job scheduling: we've got jobs that need to run daily, hourly, minutely,
00:38:48.560
and we've got jobs that we just need to queue to happen later, like
00:38:54.319
send an email to this person send an email to that person so what we have is a little bit different than what in
00:39:01.280
general people would would have in their rails app and I think these concepts are very strong we have three different ways
00:39:07.160
of scheduling jobs we have regular jobs we have lightweight jobs and we have
00:39:12.800
background jobs. So background jobs you all know about, it's Sidekiq, you're all using it. Regular jobs, some of you may
00:39:19.079
use it, and you'd be using stuff like Sidetiq, but we wrote our own, which is ultra efficient and doesn't have a
00:39:26.160
lot of the issues that a lot of these other gems have. We tried the other gems first but they just did not
00:39:32.680
answer our problem, so you could look at extracting that from Discourse if you're looking for an adventure. And we have lightweight
00:39:40.440
jobs which very few of you have heard about and I think that's very cool so I'll expand a bit on it so lightweight
00:39:47.359
jobs, we're able to extend Unicorn and tell Unicorn to run something after it's
00:39:52.440
finished with the request so a great example of this would
00:39:57.640
be: I need to track a view for the page. So what I do is I tell it, look, defer, run this later, and then once I'm done
00:40:04.440
and I've given the request back to the client I can run that little piece of code and that piece of code is a piece
00:40:11.040
of code that I don't need to run within my request, so it becomes a lot more efficient. And similarly there
00:40:17.440
are a bunch of stuff that we try to do between requests which is um one area
00:40:22.960
where we're struggling a bit to upgrade to Ruby 2.2: we run the GC between
00:40:28.560
requests for example so the GC work does not impact your requests in general and
00:40:36.280
it's all amortized and put between requests. Now Ruby 2.2 does it a lot
00:40:42.359
differently in that you get a lot shorter pauses but it's uh spread out
00:40:47.520
through the life cycle of your app so in general that's going to be a lot better for you unless you're used to doing
00:40:53.839
stuff like this where you're just pulling all of this work and trying to run it between requests where it is not
00:40:59.319
necessarily better for you so we're still struggling with how to kind of uh
00:41:04.599
marry these two things together um another thing that you uh
00:41:12.119
you noticed that I was able to debug JavaScript in our big JavaScript app. So we have source maps in production. We use
00:41:20.560
sprockets so we're still using that but we patch it to bits and get it to
00:41:26.359
generate source maps for us, and that's how I was able to show you that demo at the beginning, that was debugging through
00:41:32.520
JavaScript code that means that anywhere that has discourse running I can just debug the JavaScript which is really
00:41:38.440
really really handy um another thing that we have
00:41:43.800
which I think is a building block that is a little bit missing is the idea of an anonymous cache so we've got the
00:41:49.480
front page of discourse it's 99% of the time it's accessed by Anonymous and then 1% of the time it's accessed by a logged
00:41:56.400
on user, so caching becomes very, very complicated. How are you going to cache this thing? So we have rack middleware
00:42:05.400
that sits at the beginning of our rack stack that takes care of doing that and makes a decision of whether we need to
00:42:10.640
cache it or not, and then uses Redis as the backend to do that. And the performance difference when we
00:42:16.400
implemented this was enormous. Like, if you get anonymous requests we can serve them, you know, more than an
00:42:23.839
order of magnitude faster like 50 times faster in some
00:42:31.040
cases. So the logic is tricky, which is why we didn't implement this in nginx
00:42:36.119
at the moment you could potentially do that but it was fast enough to do in rails and it's rack middleware and uh
00:42:43.280
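A minimal sketch of that anonymous-cache idea: Discourse's real middleware is more involved (it keys the cache in Redis and respects TTLs and request details), and the cookie name here is a hypothetical stand-in for whatever marks a logged-on session:

```ruby
# Toy anonymous cache: serve cached responses to anonymous users, pass
# logged-on users straight through. Uses an in-memory Hash instead of Redis
# so the sketch stays self-contained.
class AnonymousCache
  def initialize(app, store = {})
    @app = app
    @store = store
  end

  def call(env)
    # Treat any "_t=" auth cookie as a logged-on user: never serve them the cache
    return @app.call(env) if env["HTTP_COOKIE"].to_s.include?("_t=")

    key = "#{env["REQUEST_METHOD"]} #{env["PATH_INFO"]}"
    @store[key] ||= @app.call(env) # naive: real code also handles expiry, params, etc.
  end
end

app    = ->(env) { [200, { "Content-Type" => "text/html" }, ["<h1>Latest</h1>"]] }
cached = AnonymousCache.new(app)
status, = cached.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/", "HTTP_COOKIE" => "")
puts status # => 200
```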
it's available you can pull these things out of discourse and see how it's done and another piece that I don't have
00:42:50.920
too much time for, but I'll go through, is that we're optimizing for development. The development story at Discourse
00:42:57.200
is as important for me as the production story in some cases like if I'm hitting the wall while developing and not having
00:43:04.160
a fun time then I can't do work so I built things to make my work a lot more
00:43:12.200
fun, and that is rack-mini-profiler. We use better_errors by Charlie
00:43:17.319
Somerville, which is really awesome to look at errors when they happen, Logster to look at the logs, and we have fast
00:43:25.640
browser reload times, which I'll talk about, and rake autospec, which I'll talk about, and live CSS refresh,
00:43:33.599
thanks to message bus, because we have this new building block. So when you look at rake
00:43:40.559
autospec, some of you may be familiar with Guard, but this is a little bit
00:43:46.200
different and I'll show you why it's different because I think everybody should see this because it's
00:43:54.280
cool okay here we go
00:44:05.280
so we have 5,000 specs at Discourse and they take 5 minutes to
00:44:12.359
run and I want to start working on something so I don't
00:44:19.760
know, I want to work on the user spec, and what I really want to do to the
00:44:26.599
user spec is I'd like to break this
00:44:33.400
here. Ah, so my spec runner figured out that I saved that file and now it's running the specs for that file. I
00:44:40.680
didn't need to tell it to do anything, it just did it, and now it's running all the specs of the
00:44:46.720
file. Oops, something failed. Ah okay, I know, it was a different file that I
00:44:53.880
edited. So maybe I'll go to the user controller
00:45:00.839
file and I'll try and edit it here and I'll fix it here but this remembers now
00:45:06.680
that I had a failed spec from before, and it'll try and run it to see if my change on this file fixed
00:45:13.640
that file and uh okay now it's time to continue let's not do this
00:45:20.839
anymore, and now it'll run it and it'll continue. So that's how I live with specs, I don't have to run any commands to
00:45:27.920
run my specs and tests they're constantly running I can use uh puts debugging if I need to so I can just um
00:45:35.520
at any point in time figure out what's going on output variables uh did I I didn't undo
00:45:42.480
it anyway um
00:45:48.960
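The behavior in that demo can be sketched roughly like this, using only the standard library. Discourse's real autospec is fancier (it remembers failing specs and re-runs those first), and the app-to-spec path convention here is an assumption:

```ruby
# Sketch of a spec auto-runner: poll file mtimes, and when a source file
# changes, run the spec that corresponds to it by convention.
def spec_for(path)
  # app/models/user.rb -> spec/models/user_spec.rb
  path.sub(%r{\Aapp/}, "spec/").sub(/\.rb\z/, "_spec.rb")
end

def changed_files(paths, seen)
  # First sighting of a file counts as a change
  changed = paths.select { |p| File.exist?(p) && seen[p] != File.mtime(p) }
  changed.each { |p| seen[p] = File.mtime(p) }
  changed
end

# The real loop would be something like:
#   seen = {}
#   loop do
#     changed_files(Dir["app/**/*.rb"], seen).each do |path|
#       system("bundle exec rspec #{spec_for(path)}")
#     end
#     sleep 1
#   end
puts spec_for("app/models/user.rb") # => spec/models/user_spec.rb
```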
so it becomes much, much simpler to
00:45:54.559
debug and another thing that I've got during uh
00:46:00.119
development and production for that matter I don't know if I showed you but
00:46:06.440
we've got this application called Logster, so I can see at any point in time, this is rack middleware as well that you can
00:46:12.800
include I can look at all of the errors that I've seen and get um back traces I
00:46:18.480
can see environment as well of you know who made this request and this is also
00:46:27.599
useful in development, because then while I'm developing I can just have another browser window that shows all of the
00:46:34.240
stuff that Rails is running, and the big advantage over the
00:46:41.000
uh console window that you have now is that I get back traces for everything so
00:46:48.680
um so I can see a back trace of every statement as it's happening so for debugging that makes it much much
00:46:55.160
simpler and uh another thing you may have
00:47:00.319
noticed in development: Discourse is a huge,
00:47:05.480
huge JavaScript application uh we've got 900 JavaScript
00:47:11.559
files but I just pressed reload and it was real fast so what
00:47:18.240
you're probably thinking is ah yeah you're just cheating you can't debug this thing how can you debug this thing
00:47:25.079
you've probably got it all in one file and that's why it's fast which is true
00:47:31.559
but we can still debug this thing so I can go to the user
00:47:37.319
card template or I can go
00:47:43.400
to anywhere I want
00:47:48.559
anyway uh let's I can go to this user action file
00:47:54.440
so I've got all of these files that I can see. But how did we pull this little magic trick? I'll go through
00:48:01.559
that in a second. Additionally I've got mini profiler in development, I can see why stuff is taking so
00:48:09.200
long, and I'll go back to the presentation and show you some of the tricks that make that
00:48:21.880
happen Okay oops I think I
00:48:28.359
yeah I'll start it again yeah
00:48:34.720
cool um so this trick of the fast debugging is something you all if
00:48:39.920
anybody's got a big JavaScript app, raise your
00:48:46.160
hand and who has two second reloads in
00:48:51.680
development? It's awesome, a few of you. And who's able to debug in those two seconds?
00:48:57.720
Who has one giant file? Everybody that has two-second development. So this is
00:49:03.119
how we do this trick. We have 915 files, so that's more than probably all of you, which is a lot,
00:49:11.599
and now if we didn't have our hacks you can see it would have taken us 30 seconds to load that
00:49:17.240
page now that's not really acceptable that's why I've got that emoji which we
00:49:23.200
need we need this Emoji um just
00:49:29.400
saying. So the general thing we do is we take debug and we turn it from true to false
00:49:34.680
and everybody's happy and it's fast but we can't debug but here's the magic trick that we
00:49:40.240
use there is this feature in JavaScript that allows us to eval a statement and
00:49:45.640
then at the end we give it a little comment saying what its logical name is
00:49:50.760
so we go and wrap every single snippet of JavaScript in an eval and that's how the wonderful debugging works there so
00:49:59.359
this is supported everywhere, like if you've got IE11, Firefox or Chrome you've got this, so you can debug your JavaScript
00:50:05.680
today. Um, it's very easily backported into the asset pipeline, so I hope we can
00:50:10.799
include this in rails fairly soon and it's fast you saw how fast and
00:50:16.520
now Robin built this so that is a big thanks to Robin for building
00:50:22.839
it so I guess the conclusion here that I wanted to give for the talk is
00:50:28.640
that you can choose your own adventure. We can make Rails faster, we
00:50:34.040
can make Ruby faster you can experiment you can add gems you can bolt on stuff
00:50:39.720
that will make your application work a lot better hopefully you learned a few interesting Concepts and you can take
00:50:45.160
them and uh extract them from discourse or use them yourself um and it's very important to
00:50:52.720
keep having fun and to experiment while we're doing all of this
00:50:58.440
and with the things that I showed, nobody's going to come out with this
00:51:04.440
magical version of Ruby that is suddenly going to erase all of the inefficiency from the
00:51:10.359
libraries. I'm afraid that's not going to happen. So you could be dreaming, and
00:51:15.559
Koichi can go and make Ruby two times faster, but he's not going to make it 50 times faster. The libraries are what need
00:51:22.440
love and the libraries are things that we can work on today we don't need a new version of Ruby for that so we can make
00:51:30.119
our application faster we can make our Frameworks consume less
00:51:35.799
memory and we have the tooling to do it and uh I'm just going to go to
00:51:41.000
questions now which means I'll go to the chat and
00:51:46.440
see if there were any questions there so let's see let's see if this
00:51:53.160
still works I'll go in
00:51:58.760
oh wow there are a lot of people in the chat room I don't
00:52:05.319
think nice how do you check for online offline State okay so I'll just go through
00:52:13.200
this. How about server-sent events? People just
00:52:18.559
keep putting questions here I can't read stop
00:52:25.040
it okay um testing will it crash will it crash did it
00:52:32.400
crash. Having a not so lightweight job running in a lightweight queue, okay, this is a
00:52:37.720
good question uh I'll repeat it uh what would the potential consequences of
00:52:42.960
having a not so lightweight job running in a lightweight queue how do you define it as lightweight is it as long as it
00:52:50.000
doesn't impact the result of the request so um we have to be totally
00:52:57.359
careful if you if you put a real long running job in a lightweight queue you're going to mess stuff up big time
00:53:03.599
because you're going to choke your unicorn worker and it's not going to be able to process stuff and the Unicorn
00:53:09.319
master will go and terminate it and havoc will ensue. So what we decide
00:53:15.240
is like, there's a very strong performance criterion for what makes a lightweight job. Ah cool, it's
00:53:23.160
moving um how to be more
00:53:32.799
handsome. Eval is dangerous. Can Passenger run lightweight jobs? That is something
00:53:38.799
that we've mentioned with Hongli Lai, I've talked about that before. He doesn't have a proper
00:53:45.280
story yet for dealing with forking of lightweight jobs, but I think
00:53:51.680
he'd be open to adding something like that my recommendation would be to to open a discussion with him directly on
00:53:56.920
his mailing list
00:54:02.599
and Max on the
00:54:08.520
um okay is there any reason to keep
00:54:15.040
Node.js, and are you looking at Puma instead of Unicorn? Okay, two questions, I'll try and
00:54:21.280
answer these two questions. So do I need to keep Node.js? We actually ship
00:54:27.400
Discourse with Node.js, and we ship it with Node.js because we use UglifyJS via Node.js to compile
00:54:35.599
our assets, because it's a lot faster than doing it via the Ruby process. But I love the idea of having one process
00:54:42.680
that can do everything because when you have one process that can do everything it's very very easy to deploy uh if I
00:54:48.680
need to scale the amount of message bus stuff then I can just easily just deploy
00:54:53.760
the same app on another five machines and put an HAProxy rule that just farms the message bus traffic there, and it's
00:55:00.359
all done I don't need to think about uh deploying multiple different types of apps I've only got one type of app to
00:55:06.240
think about so I think that with stuff like message bus definitely if you're if
00:55:12.160
you're using it for this there's no reason to bring another tool into the stack that you need to learn you can use
00:55:17.920
Ruby to do these things. How about server-sent events
00:55:24.240
instead of message bus? Sorry, it's supposed to be instead of WebSockets, not instead of message bus. So server
00:55:30.960
side events can work but it's just sugar on top of what long polling is anyway so
00:55:37.319
at the bottom you're just running long polling, so you're not really going to win much. You
00:55:44.039
could potentially use server side events in message bus you'd need to implement an extra backend for it I'd be
00:55:49.160
completely open for a pull request that adds that as long as uh you measure that it's actually
00:55:55.079
helping, and I'm not confident that it would help that much, because at the end it's just an HTTP request that's
00:56:01.640
hijacked just like the long poll.
00:56:06.839
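For reference, that hijack-based long poll can be sketched like this. It's a toy version of what message_bus does for real; the env and socket are faked with StringIO so the sketch runs standalone:

```ruby
require "json"
require "stringio"
require "timeout"

# Toy long poll: take the socket away from the server via rack.hijack and
# hold it until a message arrives (or a timeout elapses), then respond.
def long_poll(env, queue, wait: 25)
  io = env["rack.hijack"].call # full hijack: we now own the raw socket
  msg = begin
    Timeout.timeout(wait) { queue.pop }
  rescue Timeout::Error
    nil
  end
  io.write("HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n")
  io.write((msg || []).to_json)
  io.close
end

# Simulate one hijacked request with a message already waiting
socket = StringIO.new
queue  = Queue.new
queue << [{ "channel" => "/latest", "data" => "new topic" }]
long_poll({ "rack.hijack" => -> { socket } }, queue, wait: 1)
puts socket.string.lines.first # the status line, "HTTP/1.1 200 OK"
```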
Um, how does the distributed cache compare to Redis caching? And the difference between
00:56:13.359
them is that you don't have to go to Redis. When I have the distributed cache and message bus I am not going to another
00:56:20.920
service to figure out what the data is I'm going local in my process so it is
00:56:26.319
much, much, much faster, there's no IO when I go to look stuff up in the
00:56:31.920
cache and that's what makes it magical that you know you've got these high performance structures without needing
00:56:37.319
to go to Redis
00:56:42.920
immediately to get that data uh do you have any recommendations
00:56:48.880
for profiling rails boot time my app currently boots in 30 seconds and I would like to understand why and I'll
00:56:55.640
publish my slides. And my recommendation would be: run memory_profiler, because
00:57:01.440
that will first of all give you a good picture of where stuff is allocated and
00:57:06.680
the other thing that you should run is stackprof, which Aman built, to see
00:57:11.720
where time is being spent when you're loading your app the I guess the key is
00:57:17.520
that little baby script that was in my thing you can use that just to boot and then you can measure that little chunk
00:57:23.640
of code so it's only like four lines of code to boot a rails app and then you just wrap it in whatever you need to
00:57:29.520
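That measurement wrapper might look roughly like this (a sketch: the Rails boot lines are commented out with assumed paths, and the demo measures a cheap block instead):

```ruby
# Measure wall time and object allocations for any chunk of code -- e.g. the
# few lines that boot a Rails app (commented out below).
def measure(label)
  GC.start
  before_objects = GC.stat(:total_allocated_objects)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  { label:   label,
    seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0,
    objects: GC.stat(:total_allocated_objects) - before_objects }
end

# stats = measure("rails boot") do
#   ENV["RAILS_ENV"] ||= "production"
#   require_relative "config/environment"   # boots the whole app
# end

stats = measure("demo") { 10_000.times.map { |i| "row #{i}" } }
puts stats[:objects] > 10_000 # => true
```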
figure out what it's doing now there are lots of tricks you can do like you can just uh do a binary
00:57:36.039
search to see which gems are causing the problem if you need to uh all sorts of
00:57:41.839
tricks like that. DHH, how do you think... I don't think
00:57:47.000
DHH is on this list, so I'm not
00:57:52.480
sure. The flame graph view you showed is impressive, is there something similar, graphically speaking, for object
00:57:59.880
allocations um interesting I need to talk to you but I'm not sure I'm not
00:58:07.240
like how would you present that information in object allocations is a
00:58:14.160
hard thing like what what it does is it displays Stacks side by side so if you
00:58:19.680
can give this script the stack side by side it can render it for you so as long
00:58:24.880
as you can give it that information I think it would be able to be used for memory if you can represent it
00:58:30.599
conceptually like that. Um, I have a Backbone.js app wrapped
00:58:36.839
in a rails app to run it every time I deploy it takes forever to compile assets what can I do to speed this up
00:58:42.960
look at our rake task that we have for asset precompile, it uses Node.js directly
00:58:51.200
instead of going through Ruby. What we found was that UglifyJS via Ruby was six times
00:58:56.319
slower than doing it
00:59:01.920
via node. We don't know why. There's a Rails Google Summer of Code project working
00:59:07.640
on this problem, and hopefully we'll get to the bottom of it. In the meantime you can use node to do that, and you can
00:59:14.920
use the discourse source code to figure out how to Port that into your rake task
00:59:20.520
so you can do that. Will the Logster gem be able to detect an error contained
00:59:26.720
within a begin block? And... I don't understand that
00:59:33.000
question so I'm not sure um oh
00:59:41.119
cool. Anyone suffers from PG duplicated statements when provisioning extra dynos
00:59:46.520
on Heroku? I'm skipping that question. Why did you use Rails for
00:59:51.680
Discourse? Oh, this is a good question, especially with half the functionality in the browser. So when we started Discourse the
00:59:58.720
two contenders were node and uh rails and we went with rails uh we went with
01:00:05.200
Ruby because Ruby wasn't cool which is it's cool not to be cool like because
01:00:11.160
node was cool then, but we didn't know where it would be in a year or two years, and node had 50
01:00:17.160
frameworks to choose from and we didn't know which one to choose. Express was kind of winning but we weren't sure
01:00:22.599
about that. And on the other hand, Rails was well established, it's a
01:00:28.319
framework that everybody uses we know it's going to be around uh personally I love Ruby more than any other
01:00:34.799
programming language so it just fit very well for us uh and that's why we went
01:00:41.240
with it. Can we invite DHH to RDRC? I will make
01:00:48.319
sure Winston invites DHH to RDRC 2016. Are there any thoughts about
01:00:54.880
shipping Discourse metrics via StatsD, and generally how do you feel about it? We... I didn't mention this in this
01:01:02.000
talk, but we extensively use StatsD, and I recommend, if any of you haven't
01:01:07.559
looked at StatsD, to have a look at StatsD. And we ship metrics from all sorts of spots into StatsD, and then we
01:01:14.839
graph them with a tool called Grafana, which is some nice UI sugar on top of
01:01:21.039
Graphite. And I think that is critical when you have an app, doing stuff like graphing memory over
01:01:28.480
time. Like how do you know that you've got a memory leak? And the shape of the memory leak makes a big
01:01:35.440
difference, like if your memory is just going up in a slow trend then it means one thing, if it's spiking every 30
01:01:42.039
minutes it means another thing so by having these long-term graphs you're actually able to figure out what it is
01:01:48.480
that is making your app leak memory, and it's just that visual
01:01:55.279
representation is really important so I think graphite and having long-term graphs of stats is very very critical
01:02:01.839
and I actually created a Docker container that you could all use to just play around with Graphite, which you can
01:02:06.920
just launch with docker run, if you're familiar with Docker, and just start sending metrics to and graph. So I
01:02:13.079
took all of the little bits and pieces that you need and put them in a
01:02:19.400
container and I think we're out of time so I'm going to say a big thank you and
01:02:24.960
if anybody has any questions hit me up and I'll try and answer. I'll leave the
01:02:30.039
chat on as well, so I'll check it again in a few hours, and if there are any questions I'll try to answer them. Cool, thank you.