00:00:03.530
I don't know if it's just me but it seems
00:00:09.290
like there are far fewer people here this morning than there were yesterday maybe everybody had too
00:00:14.840
much fun last night hi my name is Emily Stolfo I work for
00:00:19.850
MongoDB if you've ever used Ruby with MongoDB you've probably used a couple
00:00:26.539
lines of my code through the gems bson, bson_ext, origin or
00:00:33.400
different versions of Kerberos if you ever feel like doing authentication using Kerberos I think
00:00:40.820
all the downloads I have on that one are me just testing it in some environment so
00:00:46.550
I'm coming from Berlin but I come from New York I've been in Berlin now for three years and I've been working there
00:00:54.170
because originally I went there to work with the other person who built the Ruby driver and before I start I want to
00:01:01.219
thank the organizers for having me I've never been to Singapore before I arrived a week ago and I think I've sweat my
00:01:09.260
weight in water every single day it's kind of like Bikram yoga I feel really good at the end of the day really cleansed and I brought omiyage for
00:01:17.330
the organizers but I also brought omiyage for you too thank you for coming
00:01:22.729
here at 10:00 a.m. on the second conference day so I can't bring
00:01:27.979
something for everybody but I brought these amazing mustards from Germany and anybody who's German is
00:01:34.909
disqualified from this competition by the way so these are little mustards
00:01:40.130
from a little man who has a mustard shop next to my apartment and I've hidden
00:01:45.860
three Sandi Metz quotes in my talk and if you can email me with one of those
00:01:51.020
three quotes Emily at MongoDB I'll give you a little mustard
00:01:56.180
so the first three people basically to trick you into paying attention and also so they can feel someone for a quote
00:02:08.930
and they're all different types so okay so this talk is called refactoring
00:02:16.340
Humpty Dumpty back together again so because it's 10 a.m. and there's no
00:02:21.620
better time to talk about physics I'm going to start with the second law of thermodynamics specifically the second
00:02:29.720
law of thermodynamics accounts for the direction of natural processes we've all heard of this right ok no well good
00:02:37.880
thing I'm telling you about it the law says that it's highly unlikely though not impossible to restore a system to
00:02:44.570
a previous state it accounts for the asymmetry between past and future in
00:02:50.300
modern times this law is defined in terms of entropy we've all heard of entropy right yeah more so than the
00:02:57.230
second law of thermodynamics it's kind of abstract but it basically is the measure of the number of ways in which a
00:03:03.590
system can be arranged entropy is taken to be the measure of disorder of a system the
00:03:09.470
higher the entropy the higher the disorder and usually it's depicted like this where it requires a certain amount
00:03:15.440
of work to take something that's in a high level of disorder and make it orderly or restore order so once upon a
00:03:24.050
time there was this egg named Humpty Dumpty and his story was told in this
00:03:29.750
nursery rhyme Humpty Dumpty sat on a wall Humpty Dumpty had a great fall all the king's horses and all the king's
00:03:35.959
women couldn't put Humpty Dumpty back together again has anybody heard of this I bet a lot of British people have heard
00:03:43.040
of this so this particular Nursery Rhyme is the most well-known Nursery Rhyme in
00:03:48.230
the English language and references to it can be found in many works of literature and frequently in popular
00:03:53.269
culture I think there's a Humpty Dumpty character in one of the Shrek like number 15 movies that
00:03:59.900
they've had but the first recorded version dates from late 18th century
00:04:05.900
England like many traditional stories or poems it's pretty much impossible to pinpoint what the original version was
00:04:12.769
what Humpty Dumpty actually was or to take the poem literally
00:04:19.209
for example we have other versions of the poem this is the actual first recorded version published in 1797 but
00:04:26.750
we have no idea if this existed way before 1797 or people just learned how to write in 1797 Humpty Dumpty sat on a
00:04:34.340
wall Humpty Dumpty had a great fall four score men and four score more couldn't make Humpty Dumpty what he was
00:04:39.650
before so Humpty Dumpty there are clearly many other versions throughout popular
00:04:46.910
culture throughout history but what we can't ignore is that Humpty Dumpty is always depicted as an egg despite the fact
00:04:54.410
that there's nothing indicating in the poem that he actually was an egg my favorite is that woman dressed as an egg
00:05:00.169
who's really chic sitting on a wall in the corner it's likely the rhyme was
00:05:05.479
originally a riddle that could have exploited a well-known meaning of the term Humpty Dumpty at the time for
00:05:11.479
example the Oxford English Dictionary says that the term Humpty Dumpty refers to a drink of brandy boiled with ale
00:05:17.000
and I don't know about you but when I drink my brandy boiled with ale something magical happens and I start
00:05:22.970
seeing eggs perhaps the rhyme was the 17th century's equivalent of don't drink and drive
00:05:28.789
propaganda warning you about sitting on a wall after a few drinks but still why an egg perhaps it was meant to convey that
00:05:36.020
whatever it was that sat on that wall it was extremely fragile and virtually impossible to put back together so as I
00:05:44.180
said there have been many other theories many other versions and one of the ones that I find kind of funny or absurd
00:05:50.060
is one that was put forth by this scholar I don't know what he's a scholar of but I
00:05:57.020
guess he spent his time trying to figure out what Humpty Dumpty was in the 50s and he said that Humpty Dumpty was in
00:06:03.199
fact a tortoise siege engine which is this kind of machine battering ram that was invented by the Romans and used
00:06:09.530
unsuccessfully in the English Civil War in the sixteen hundreds and apparently
00:06:15.020
it was used and the thing broke without breaking the thing it was trying to break and so they wrote a poem about it
00:06:20.240
I don't know about you but that sounds really silly to me I think the
00:06:25.370
idea of an egg is better this theory was eventually determined to be totally ridiculous but the
00:06:31.289
idea was incorporated into a children's opera called All the King's Men so it's just as true according to popular
00:06:36.719
culture as the other theories so whichever
00:06:42.240
form Humpty Dumpty takes what can't be ignored is that he's a fragile guy he's actually become a sort of symbol for the
00:06:48.990
second law of thermodynamics Humpty Dumpty fell from the wall and subsequently ended up in pieces as we've
00:06:55.409
discussed the law says that it's highly unlikely though not impossible to restore him to his
00:07:00.779
exact state before the fall and this is what the poem also emphasizes as we
00:07:07.169
discussed the second law of thermodynamics' modern definition is in terms of entropy the measure of the
00:07:12.809
number of ways in which an isolated system can be arranged specifically assuming for simplicity that each of the
00:07:18.629
microscopic configurations is equally probable the entropy of the system is the
00:07:23.669
natural logarithm of the number of configurations multiplied by the Boltzmann constant kB this is theoretically how we can measure entropy
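For reference, the relation read aloud here is usually written in the Boltzmann form below. The symbol Ω for the number of equally probable microscopic configurations is notation assumed for this note, not something taken from the slide.

```latex
S = k_B \ln \Omega
```

Here S is the entropy and k_B is the Boltzmann constant, so a system with more possible arrangements has a higher entropy.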
00:07:30.599
but nothing is ever like that you can't have a system where all the arrangements are equally probable so this is highly
00:07:36.719
theoretical we can also find some examples of things that were broken and
00:07:43.050
that have been returned to their original states with help the Beauvais Cathedral which is located in Beauvais France 60
00:07:49.319
kilometers north of Paris is a symbol of the ambition of gothic architects the pet project of a wealthy and disaffected
00:07:55.649
Bishop Milo of Nanteuil the construction of the cathedral may have been partly intended as an act of defiance against
00:08:01.889
the French crown so basically the bishop was a punk and he wanted to prove that he was better and more powerful than the
00:08:07.529
crown by building this massive building and you'll see that it was a total disaster the whole project was extremely
00:08:13.199
unrealistic and the cathedral was never finished construction was started in 1225 and it was meant to be the greatest
00:08:19.740
church in the kingdom but centuries of construction were marked by structural problems and collapses the nave which
00:08:25.979
is the main body of the church cathedrals are normally shaped like a cross so the nave is the main body and all that was actually
00:08:32.459
constructed is a tiny portion at the top like the head of the cross so the nave hasn't been constructed the plans for the
00:08:39.389
Cathedral were such that it would have been the tallest building of its time the foundations in order to
00:08:45.030
support this massive structure were in some places 10 meters deep even so in 1284 part of the choir collapsed
00:08:51.990
that's like the front of the cathedral that was actually constructed then the transept I actually don't know what part
00:08:57.270
of the cathedral that is I forgot to put it up this other part of the cathedral was started 150 years later and was
00:09:03.300
completed in 1548 then shortly afterwards a spire and half of the bell tower collapsed on Ascension Day during a
00:09:09.000
service and apparently nobody was hurt in 1600 construction of the nave so that main body of the cathedral began again
00:09:15.210
but only the first arch was erected and they gave up in the 1990s because this
00:09:20.940
became such a similar look into like the buildings that exist today from
00:09:26.850
this time that were great engineering feats by definition were great engineering feats because they're still
00:09:33.270
around today but this one's a look into how these projects can be started and fail because of ineptitude or over
00:09:40.770
ambitious people so in the 1990s people were like we really want to preserve this building and in the 1990s it was determined to be so
00:09:48.410
immensely unstable because the pillars had been measured to have moved 30 centimeters and they wanted to do
00:09:55.740
something about it so this building could still stand so why is it so unstable why's it so weak and why was
00:10:01.980
this project so difficult to be realized the building is a perfect storm of poor architectural plans different architects
00:10:09.150
hacking on the same building no real ownership of the projects architects coming and going over the centuries
00:10:15.480
which by the way means they have much different styles and the fierce gale winds gale force winds that
00:10:22.440
come from the English Channel that are less than 100 miles away so basically the cathedral might as well have been
00:10:27.750
made out of papier-mâché it's on the World Monuments Fund list of 100 most endangered sites but today the Cathedral
00:10:35.370
is more stable than it has ever been thanks to a team of researchers from Columbia University so what did they do
00:10:42.300
they did what you would expect someone to do who needs to repair a weak structure they studied the structure so in
00:10:49.260
2001 a team from Columbia University went to Beauvais to acquire 3D range scans and imagery of
00:10:55.950
the Cathedral the goal was to create a 3D model of the cathedral to assist historic preservation efforts including
00:11:03.079
structural analysis of the Cathedral so for 10 days they roamed around the Cathedral using instruments to record
00:11:09.629
digital images of its facade and interior by bouncing laser beams off its surface they returned to New York City
00:11:15.660
with 75 of these scans each one containing more than a million data points and remember this is 2001 so 16
00:11:22.800
years ago and at the time like we could probably do that with our iPhones now but at the time this was the largest
00:11:29.220
structure to ever be scanned and it yielded the largest amount of data and this is a combination of all those scans
00:11:37.189
from the data that they collected so here's the flyover of the Cathedral this
00:11:42.629
is what the image that they were able to collect looks like and as you can see it's only a small portion of what the
00:11:48.990
original Cathedral was meant to be but the structure is really large and
00:11:54.329
complex and has a lot of cavities it's not just like a block you know like there's a lot going on in this Cathedral and
00:12:00.899
then this is the inside so I did my undergraduate education in art history and computer science and I actually took
00:12:06.600
this professor's class and he showed us this and I was like super excited because I was like this is why I'm doing
00:12:13.050
both of these fields you can do things like this and preserve cultural heritage and so just as an aside the
00:12:22.470
reason this Cathedral was meant to be so large or like what motivated that was Gothic architecture part of its
00:12:28.980
principle especially with cathedrals was to elongate the structures so they felt closer to God we
00:12:34.860
had this sense of being in this infinite space and so that's why the bishop was particularly hubristic in doing this
00:12:41.339
because he was trying to bring himself too close to God he was flying too close to the Sun because of the model that the
00:12:48.509
team of researchers was able to create support beams could be installed in the right places restoring
00:12:54.569
stability to the cathedral and allowing visitors to appreciate the ambition and engineering of the Gothic
00:12:59.790
builders 700 years ago and also for academics to study how this project was
00:13:05.399
started and failed so what do the Beauvais Cathedral and humpty-dumpty have in common
00:13:11.350
both were in need of being put back together for stability to be re-established so this system in
00:13:18.100
particular has been restored to better order and stability because as we said it's improbable not impossible
00:13:24.960
furthermore what if we aren't interested in restoring the system to its original state what if we want to alter it
00:13:31.120
rearranging the pieces to make it even better what if breaking something allows you to rearrange the pieces so that it can be even
00:13:37.390
more structurally sound does this sound familiar to you well it certainly sounds
00:13:42.640
familiar to me because otherwise I wouldn't be doing this talk and it's something I've had to think a lot about
00:13:48.340
lately particularly with Mongoid so recently I had to study the structure of
00:13:54.370
this project break it a little and then rearrange the pieces so that it was inherently stronger I'd even argue that I defied
00:14:01.630
the second law of thermodynamics and the entropy has been decreased in this system who would disagree that their
00:14:09.940
project's entropy increases over time so who thinks their project's entropy decreases over time with no work right
00:14:18.720
so I maintain the Active Record replacement for using MongoDB with Rails it's called
00:14:25.330
Mongoid it's actually 10 years old which is basically 700 years in cathedral years the first version of
00:14:33.010
Mongoid version 0.2.5 was released by someone who's now my colleague Durran Jordan he's the
00:14:40.870
original author and by the way on the original documentation site of Mongoid he said Mongoid was conceived one
00:14:48.940
late night in February somewhere in Florida after five glasses of whiskey
00:14:56.380
and that's pretty much what it looks like how Mongoid was built just like someone on whiskey I mean I
00:15:03.010
love Durran and he's amazing but we're talking about Durran 10 years ago version 0.2.5 was
00:15:08.890
released by Durran on October 1st 2009 version 0.2.6 was released on October 1st 2009 version
00:15:17.020
0.8.1 was released on October 1st 2009 does this sound like any cathedral you know
00:15:23.420
the MongoDB server version at that time was less than 1.2.0 I actually don't know what version it was because
00:15:28.970
in our project tracking tool the earliest version recorded is
00:15:35.450
post Mongoid's first release and so for reference the MongoDB server's
00:15:40.550
current version is 3.4 so back in 1.2.0 it was still in this phase where we had this feature that it
00:15:46.190
dropped your data anyway Mongoid continued to be developed by
00:15:51.710
Durran and also by the way MongoDB doesn't drop your data I don't know if you've read anything in the last five years but we've solved that problem
00:15:59.380
anyway Mongoid continued to be developed by Durran who was working at SoundCloud in Berlin in his free time it was a true
00:16:05.390
open source project for many years in that many people contributed many pull requests were opened and merged many
00:16:11.270
discussions were had in the GitHub issues list many people solved approximate problems but nobody had the
00:16:17.060
big picture it was built when the MongoDB server was quite simple compared to what it is now there weren't many features or even
00:16:23.720
replica sets at the time so the history
00:16:29.690
of this project and the complexity of the ecosystem built around Mongoid and
00:16:35.810
how it fit into rails and how to use the driver is really complex and it might sound familiar to you if you're working
00:16:41.570
on open source the first version of Mongoid so like following along with this diagram anything gray is not developed
00:16:47.990
by MongoDB Inc the company that I work for anything in color is so the first versions of Mongoid used MongoDB
00:16:55.370
Inc's Ruby driver the 1.x series this is the driver that I was hired to work on five years ago at the time I
00:17:01.370
joined the company Durran had just built his own driver called Moped because the official MongoDB driver hadn't developed some features he was hoping to have you
00:17:08.120
know some back and forth and some friction so because the server was kind of simple at that time he was like okay I'm just going to build my own
00:17:13.370
driver so I don't need to like have this extra level of diplomacy to get
00:17:23.240
changes to move forward with Mongoid so at that time the Ruby
00:17:30.560
offering if you're using MongoDB with Rails was developed entirely outside
00:17:36.419
of MongoDB Inc and wasn't developed by anybody who was actually paid money to do
00:17:42.870
it so at MongoDB at the time we knew how important Mongoid was to the Ruby community basically if anybody wanted to
00:17:48.960
use MongoDB with Rails they went through Mongoid and basically anybody wanting to program in Ruby was
00:17:54.299
unfortunately as someone said yesterday using Rails so by the transitive property anybody wanting to use
00:18:01.200
Ruby with MongoDB would have to go through code that wasn't actually developed by the company does that
00:18:07.230
make sense the company was growing as were the features of MongoDB and the sophistication and opacity of its behavior so
00:18:15.330
it was really difficult for someone in the open-source community to keep up with what the server was doing because they didn't have that insight that I have
00:18:23.190
working for the company where there's like insider knowledge where you know what the roadmap is you know what the
00:18:28.289
internal issues are what the priorities are you can walk over to our server engineer's desk and ask them about something specifically because MongoDB
00:18:35.820
has a lot of quirks between server versions the implementation of certain features can differ wildly so
00:18:44.010
sure enough at that time Mongoid's issues grew and the project started to lose traction and trust in the community
00:18:49.830
because it just couldn't react fast enough in 2014
00:18:55.139
the 1.x driver needed a rewrite and so we saw a great opportunity to approach Durran and say hey do you want to
00:19:00.330
come work at MongoDB we can build a new driver and then we can port Mongoid to use that new driver taking that burden off your side because we
00:19:08.220
can maintain the driver so he was up for it in 2014 he joined us and he and I
00:19:14.490
worked together to build a new driver which was kind of my way to get to Berlin and it was the mongo gem version 2.0 and
00:19:21.779
then in doing that we decided that we'd bring Mongoid in
00:19:27.240
house as well so then Mongoid became an official project since then Durran has moved on to work on another team at
00:19:33.510
MongoDB Compass if you're familiar with our products it's a GUI for navigating your data in collections and
00:19:40.370
I've taken over Mongoid and the driver and just a little aside just to like show
00:19:45.690
you how this is actually a simplified version of the story there's also a gem called origin which is the DSL query
00:19:52.950
language for querying MongoDB that was a separate gem but in Mongoid version 6.0 I
00:19:58.830
brought it into the codebase because I realized not a lot of people were using it independently so that's
00:20:04.350
super-complicated also so like for example if I need to fix a bug in Mongoid 6 I can do it in Mongoid's
00:20:10.470
codebase and then if I want to backport it I have to go and release a separate version of origin so now that Mongoid
00:20:18.780
and the driver are back together again they're getting along quite well except for the occasional bickering over who
00:20:23.970
does the dishes the work is done the relationships going well but a lot of
00:20:30.210
baggage has been brought back into the relationship by Mongoid so at first I
00:20:35.340
was excited about all this everything seemed so clean and centralized and I was excited to start working on
00:20:41.100
Mongoid and the driver and that Durran would be moving on to another team so I'd have more responsibility but I quickly
00:20:46.559
realized that I inherited a ton of work namely there were 199 problems and
00:20:52.440
they were all issues we imported the GitHub issues list for Mongoid into Jira it was a
00:20:59.010
disaster I almost had a heart attack there are a ton of issues and I didn't think I would ever get through them I
00:21:04.500
think there actually were 199 a lot was broken in the project in some cases the
00:21:09.630
community was fragmented how could I bring the project back into good standing with its users restore trust
00:21:15.179
and communication how could I restore its structure and reduce entropy hopefully restoring entropy to its original state
00:21:22.020
was it possible to make Mongoid even better than it was before so I did what
00:21:27.059
the king's men and women tried to do for Humpty Dumpty I did what France the World Monuments Fund and the Columbia computer science team tried to do I
00:21:33.809
studied the structure identified the pieces the weaknesses and I tried and I kind of
00:21:39.600
succeeded in putting Mongoid back together again so how did I do this I'm going to spend
00:21:46.559
a little bit of time talking to you about how you can take an existing project because I'm sure you all have them that is in dire need of a refresh
00:21:54.030
and put it back together there are many presentations and books on how to refactor the problem is solved
00:22:00.419
and no need to reinvent the wheel or retell you a lot of the things that you can just look up or watch other
00:22:05.480
presentations on every type of code smell is identified and recipes are
00:22:10.610
given for refactoring the definitions can be overwhelming but who can really apply them perfectly like I read the
00:22:16.520
definitions too and I would like identify some of those things in my code bases it's kind of like this
00:22:21.760
equation on the second law of thermodynamics it's a guide for how to understand the concept but it can't actually be applied in practice so I'm
00:22:29.450
going to tell you a much more human story of how I refactored Mongoid and put it back together again because it's a very real project with
00:22:35.450
very real problems I'm going to share with you some tricks and things that I did that I applied to my process we'll look
00:22:43.910
at how I studied the structure then we'll talk about refactoring and there's definitely a wrong way to refactor and many
00:22:49.610
better ways to refactor finally we'll talk about how to avoid letting a project slip into this state in the
00:22:55.040
future so regardless of whether you're an open source project maintainer I think you'll find that a lot of what I'm
00:23:00.980
about to say can be applied to your own projects we're all maintainers of some legacy codebase some pre-existing
00:23:06.260
project I bet you agree that the entropy and disorder of your own code base increases over time but I do think that
00:23:14.300
we can pause repair and restructure our code bases to actually be stronger than they were before we started again the
00:23:20.930
second law of thermodynamics says it's improbable but not impossible to restore a system to its original state
00:23:26.120
we're engineers and if we put our minds to something we can make it happen so one structural analysis I spent
00:23:33.170
a while addressing bugs in Mongoid one by one going through those 199 problems because I didn't have a good
00:23:38.600
sense of how everything worked and at the time that Durran built Mongoid metaprogramming was really popular so he did
00:23:44.900
things like yeah I don't wanted fascist sorry we can talk about that later I
00:23:52.040
mean as I said this is Durran ten years ago so I'm glad he doesn't really watch
00:23:57.140
conference talks that much but I knew in the back of my mind I had to build up a familiarity with the structure of the
00:24:03.470
code base so I took notes in the code and in a notebook like literally with a pencil on how everything worked together
00:24:09.200
I drew diagrams like an architect I stepped through the code with pry and wrote down the call stack
00:24:14.419
as I said before many solutions were applied that approximately solve problems but because not many people had
00:24:20.480
the full picture they were like typical cases of pull requests fixing something very specific it's really
00:24:26.659
important to have a mental model of how a codebase works in order to make high-quality changes luckily Durran also
00:24:32.389
had my back in this case as I said he was still at the company so me trying to figure out why something was changed
00:24:38.720
when it wasn't good enough to look at git blame I could look at git blame and say like hey Durran why did you do this and
00:24:44.480
he'd give me this like whole story and luckily he had a good memory and a lot of stories and so that was I recognized
00:24:51.080
that was something that not everybody has like that resource but it was also really good for helping me understand
00:24:56.779
the history of this project so the like one thing that made this refactoring
00:25:03.379
possible for me was grouping my issues into
00:25:08.690
categories if you categorize the issues you can see where the hot spots are and
00:25:13.850
focus on them when rebuilding repairing the structure so a 3D model of the cathedral was necessary for the same
00:25:19.460
exact reason in particular with Mongoid I realized most of our issues had to do with the behavior of related objects so
00:25:25.070
I created an epic in Jira to track all those issues related to relations bugs and so when I say relations object it's
00:25:31.519
when you define a model and you say like a book has one author there's a
00:25:37.070
macro that runs and it creates this object called relation and it saves it into this global variable on the book
00:25:43.489
class and that object itself is what caused a lot of problems and I tried to
00:25:49.279
cluster and categorize my issues around that one thing so that when I focused on refactoring it I knew what its needs
00:25:56.359
were stepping through major code paths and taking notes that's really important
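When stepping through code paths in a codebase that is assembled from many modules, Ruby's own reflection can report where a method comes from, which makes note-taking easier. The sketch below is only an illustration: the `Attributes` and `Relations` modules and the `Book` class are hypothetical stand-ins, not Mongoid's real modules.

```ruby
# Hypothetical stand-ins for a model class assembled from several modules.
module Attributes
  def title
    @title
  end
end

module Relations
  def author
    @author
  end
end

class Book
  include Attributes
  include Relations
end

book = Book.new

# Method#owner names the module or class that defines the method.
title_owner  = book.method(:title).owner
author_owner = book.method(:author).owner

# Method#source_location returns the file and line for methods written in Ruby.
file, line = book.method(:author).source_location

puts "title is defined in #{title_owner}"
puts "author is defined in #{author_owner} (#{file}:#{line})"
```

The same calls work inside a pry session, so the answer can be copied straight into a comment next to the call site.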
00:26:01.879
also choose code paths that you don't understand and step through them with pry I know they're scary and it's really really tedious but it's really helpful
00:26:08.749
to do that and as I said there's a lot of metaprogramming so that means it's really opaque really difficult but I
00:26:14.330
took notes in the code with comments as well if something was for example an attribute accessor in one
00:26:19.340
file Mongoid and its structure is made up of behaviors in different modules so
00:26:24.889
there were like a lot of different files that define a lot of different things about this one document class and so I
00:26:30.530
peppered the codebase with a lot of notes so if I was following a code path and
00:26:35.840
then I saw a variable I would say like this is defined in X module and that really helped me to understand the shape
00:26:41.360
of the code base and then lastly draw
00:26:46.460
diagrams yourself like literally with a pencil like an architect it was really helpful to do this as well and seeing
00:26:52.130
the structure visually helps you I mean again coming back to art history it's like a sculpture you really like
00:26:58.460
there is a shape to your codebase and you want to understand it so after I did all of that
00:27:03.860
what I identified was a weakness so I'm going to give you a concrete example like making that relations issue that I
00:27:10.070
built that epoch around like more concrete so you can follow along with it and see how I focused on one element of
00:27:16.700
the codebase that was the weakest and on which I spent the most time refactoring after I refactored this one
00:27:23.180
thing I was able to close about 40 issues which at the time that I was doing this was 50% of our issues so I
00:27:29.000
was really happy about that I identified that we had one object that contained all the information about the
00:27:34.430
relationship between two models in Mongoid it was called Metadata and it
00:27:40.190
inherited from Hash so essentially it was a hash it's basically like the laziest class you could ever have because it's
00:27:46.220
just keys and values with no specific logic or behavior so like a nightmare it
00:27:51.560
was an object created when the model was loaded so like when you write that actual relation in the model
00:27:57.200
class it would create this metadata class which was just a hash so like
00:28:02.690
writing book has one author would use a macro to create this metadata object and stick it onto the book class and that's
00:28:07.850
what it used throughout all of the code to determine what behavior an instance of a book should have or even the class
00:28:14.690
itself if you're querying or whatever so in code smell terms this is a classic
00:28:20.210
bloater smell this class knew and did way too much I'm sure there are tons of other code smell terms you can apply to this as well so this is the Metadata class
00:28:28.970
definition does anybody notice something alarming about this comment the Grand
00:28:34.220
Poobah of information about any relation is this class it contains everything you could ever possibly want to know
00:28:40.460
and by the way possibly was spelled wrong which goes back to what I was saying about this being a whiskey
00:28:46.790
project port niranda as you can see um
00:28:52.820
it was basically like a Magic 8-Ball like you just ask it anything and it can give you the answer and it's totally random
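
To make the shape of that object concrete, here is a minimal sketch, assuming a class that subclasses Hash and adds nothing of its own. The class body, the keys, and the book/author/chapters names are illustrative assumptions, not Mongoid's actual source.

```ruby
# Hypothetical sketch, not Mongoid's actual source: a "relation metadata"
# object that is really just a Hash with no behavior of its own.
class Metadata < Hash
end

# A macro such as `has_one :author` would build one of these and attach it
# to the model class (keys below are illustrative only).
book_author = Metadata.new.merge!(
  name: :author, relation: :has_one, foreign_key: "book_id"
)

# An embedded relation is the very same class with different keys, so
# nothing stops a caller from asking it for a foreign key it never has.
chapters = Metadata.new.merge!(name: :chapters, relation: :embeds_many)

puts book_author[:foreign_key].inspect
puts chapters[:foreign_key].inspect
```

Because both relations share one class, every caller has to know which keys are meaningful for which kind of relation.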
00:28:59.440
writing simple code is important but let's define simplicity does simplicity mean
00:29:04.640
that we should have the least number of classes does it mean that we should favor one basic object over
00:29:10.160
multiple smaller different objects is having one metadata object saving all
00:29:15.260
information about every type of relation the simplest and thus best design design decisions I understand do involve
00:29:21.680
trade-offs we so frequently chant DRY don't repeat yourself but sometimes we need to introduce a little
00:29:28.010
bit of duplication in order to have a simpler design prefer duplication over the wrong abstraction I'll give
00:29:34.940
some examples of how the metadata object was used so you'll see how it became obvious what rearranging had to be done
00:29:41.090
even without understanding anything about Mongoid I think you'll recognize what patterns should kind
00:29:48.020
of come out of this code and what needed to be done to restore structural stability to Mongoid's codebase but
00:29:55.160
before I do that I just want to say briefly that Mongoid has two main types of relations because MongoDB is a
00:30:01.550
document database it has reference relations which is what you would recognize from active record
00:30:06.620
so it's straight up like reference relations or IDs foreign keys saved on
00:30:11.630
objects has and belongs to many is has many through but there's no join table because MongoDB has a field that
00:30:17.600
can be an array so it's just saving arrays of the related objects on either end and it's kept in sync and then
00:30:23.600
embedded which is pretty self-explanatory you can have embedded documents in MongoDB so you have these types which implement that
00:30:30.430
relationship between embedded and parent documents so this is one example
00:30:35.570
instance method it's called determine foreign key and actually when I was reviewing my slides this morning I
00:30:41.200
didn't even see the first line says determine the value for the relations foreign key performance improvement what
00:30:47.000
I don't that something would have to add some of that at first for the life of me
00:30:52.490
I could not understand what was going on basically it's a no-op if the relation is embedded but embedded
00:30:58.879
relations don't save foreign keys because they're embedded they don't need them so like why would it return a foreign key in its options why would
00:31:04.999
even allow an option of a foreign key if it was an embedded relation um yeah so
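
The puzzle described here can be sketched as follows. This is a hypothetical reconstruction of the method's shape from the spoken description only; the `embedded?` helper, the option keys, and the default key name are assumptions and not Mongoid's actual source.

```ruby
# Hypothetical reconstruction from the talk's description, not Mongoid's code.
class Metadata < Hash
  def embedded?
    [:embeds_one, :embeds_many, :embedded_in].include?(self[:relation])
  end

  # An explicit :foreign_key option is returned before anything else is
  # checked; otherwise an embedded relation yields nil.
  def determine_foreign_key
    return self[:foreign_key].to_s if self[:foreign_key]
    return nil if embedded?
    "#{self[:name]}_id" # assumed default, for illustration only
  end
end

embedded_with_option = Metadata.new.merge!(
  name: :chapters, relation: :embeds_many, foreign_key: :book_id
)
embedded_plain = Metadata.new.merge!(name: :chapters, relation: :embeds_many)

puts embedded_with_option.determine_foreign_key.inspect
puts embedded_plain.determine_foreign_key.inspect
```

In this sketch an embedded relation that happens to carry a foreign key option still hands the key back, which is the behavior being questioned.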
00:31:12.499
this doesn't count as a Sandi Metz quote by the way but Sandi Metz says this thing where like if you squint at the code you can kind of see the
00:31:18.529
structure and the shape will come out at you and so I kept squinting my eyes at this thinking like maybe something
00:31:23.720
would come out of it but I didn't really see much else besides what was there but
00:31:28.789
I did notice that basically there are a couple of things like when you're doing a lot of refactoring you get pretty good at recognizing these hidden patterns and
00:31:35.299
so when I came to this one thing I noticed was in my refactoring mindset is
00:31:40.639
that like first of all if the foreign key option is set it's returned nothing
00:31:46.249
else is done if the object is embedded it returns nil and it should probably be before it checks if there's an option of
00:31:53.599
a foreign key set and the last thing is the relation object knows something the metadata object doesn't so it uses the
00:31:59.840
metadata data or the meta metadata metadata but and also the other thing
00:32:06.559
which I want to add to this list is that this metadata object is supposed to be
00:32:12.409
the relation but there's also a relation object saved on the metadata class so why aren't so it's conflated like so
00:32:18.679
there were actually these relation objects that had different behavior but I didn't see why we needed to have this
00:32:24.139
metadata class when we could just have these objects that have their own behavior so here's another method just
00:32:30.769
to give you a sense of how sticky this code base was it's used to get the names
00:32:36.109
of an inverse relation given a certain relation the first method checks if the type is polymorphic I know it's kind of long but
00:32:43.369
if it's itself polymorphic lookup inverses and then otherwise it's determine inverses and when I looked at
00:32:50.330
those two methods they had a lot of overlap in logic so it was really difficult to determine what logic should
00:32:57.349
be extracted if some of the checks were repeated after they've been branched
00:33:03.320
but basically the point of showing you this is to show you that it's pretty clear that the metadata object was
00:33:09.110
begging to be refactored into smaller object oriented objects the entropy or disorder of the system was way too high
00:33:15.559
any bugs having to do with this code were virtually impossible to fix and the structure was weak and what was obviously
00:33:23.179
needed was for there to be a referenced and embedded namespace with objects that knew that they were referenced or embedded
00:33:28.250
and had their own behavior so I embarked on a journey to refactor the metadata object into different objects
00:33:35.059
under the namespaces referenced and embedded how did I do this did I do it all at once did I read a lot of books and learn
00:33:43.880
about how to do this perfectly and then apply those practices so I had a couple of false starts I bought Martin Fowler's
00:33:49.610
Refactoring book but I honestly didn't really get through much of it I kind of wanted to learn as I was doing it I
00:33:56.390
talked to my manager a lot had some nervous breakdowns but I learned there are a lot of wrong ways to refactor and
00:34:02.330
a couple of right ways or better ways so it's really important to do proper
00:34:08.690
refactoring not random refactoring I like to think of it in terms of the health of a
00:34:14.570
project this is something very similar to the way there's something very similar between the way I refactor and
00:34:20.540
work on my code bases and how I design my weekly exercise regime I always ask myself when making changes to a codebase
00:34:27.109
is this a healthy change is this a quick fix like a piece of candy or a bag of chips that has a short-term payoff like
00:34:32.960
it's really yummy right now but I know in the long term this probably isn't good for me we all have to refactor at some point
00:34:39.830
it's really important to have a plan designed for what you're going to do refactoring should require the same
00:34:45.409
effort and process that you apply to building something from scratch I think sometimes we forget that just plowing
00:34:52.010
through and trying to fix everything you can along the way is definitely one of the wrong ways to refactor so at this
00:34:58.250
stage in the repair of Mongoid I had done the structural analysis and identified the weaknesses the next step was to
00:35:03.380
refactor with a plan over the course of my refactoring of Mongoid I learned a lot
00:35:09.950
and I'm going to share some of the highlighted steps with you again I'm not going to go through like recipes or theories because you can read about that
00:35:15.500
and it's something we've talked about a lot and because it's something we do a lot but I'm going to show you a couple of things
00:35:20.840
that I did using this metadata object as an example because I think it's like the classic case of something begging to be
00:35:26.540
refactored as I said we can yeah read
00:35:32.210
about this but I also watched a lot of talks along the way for guidance I'm not
00:35:37.520
saying that you should just dispose of all of the theory I think it's really good to know it but I wanted to like share
00:35:42.770
something that's kind of not really something things that I read or heard about so the things that I learned were
00:35:51.860
one refactor one piece at a time use tests at every step and don't fix
00:35:59.030
bugs this one was really important so Martin Fowler we know this we can probably recite it in our sleep
00:36:05.000
defines refactoring as the process of changing a software system in such a way that does not alter the external
00:36:10.730
behavior of the code yet improves the internal structure so if we rearrange the system so the external behavior
00:36:16.760
doesn't change it doesn't matter if we rearranged one corner of the system and then another corner of the system and do
00:36:22.280
it piece by piece because the outside behavior is not going to change so what
00:36:27.320
I did was first define a namespace called referenced and create a class called belongs to which seems like a
00:36:33.620
really obvious way to refactor this I returned this object when a model was defined with a belongs to relationship
00:36:39.890
and made sure all the tests passed before moving on to create another object the largest benefit of rearranging the
00:36:45.410
system piece-by-piece is you can test out different designs and not waste too much time overhauling everything only to realize
00:36:51.710
your new design doesn't work agile principles aren't only for building new
00:36:57.620
things so as I said you should apply the same practices to refactoring as you do to building something from scratch
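
The piece-by-piece step described above can be sketched like this. It is a minimal illustration, assuming a hypothetical `build_relation` factory, a `LegacyMetadata` stand-in for the old hash-based object, and an assumed default key name; none of these names are taken from Mongoid's source.

```ruby
# Hypothetical sketch of migrating one relation type at a time; the names
# are illustrative and are not Mongoid's actual API.
class LegacyMetadata < Hash; end

module Referenced
  class BelongsTo
    attr_reader :name

    def initialize(name, options = {})
      @name = name
      @options = options
    end

    def embedded?
      false
    end

    def foreign_key
      (@options[:foreign_key] || "#{name}_id").to_s
    end
  end
end

# Only belongs_to has been migrated; every other macro still gets the old
# object, so behavior elsewhere stays untouched while the tests are rerun.
def build_relation(macro, name, options = {})
  case macro
  when :belongs_to then Referenced::BelongsTo.new(name, options)
  else LegacyMetadata.new.merge!(options.merge(name: name, relation: macro))
  end
end

author = build_relation(:belongs_to, :author)
chapters = build_relation(:embeds_many, :chapters)
puts author.class
puts author.foreign_key
puts chapters.class
```

Keeping the fallback branch means a design that does not work out can be backed out by deleting one class and one `when` line.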
00:37:02.990
I iterated over my refactor design I tried out different hierarchies I tried creating classes for things like a
00:37:08.930
builder so a builder is something like if you have a book and you build author there's a builder thing that did that for
00:37:14.780
you if you bind something it would be book author equals another author and so those things were objects originally and
00:37:21.320
I tried out having them be objects but I thought it would be much better if they were modules because they're behavior and
00:37:27.860
things that do something once like there's no need to save instance variables on that builder because it's created as a side
00:37:36.020
effect or byproduct of building an object or binding it to another one
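
A minimal sketch of that design choice, assuming a builder written as a stateless module function. The `Builders::HasOne` module and the `Book` and `Author` structs are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch: a builder expressed as a stateless module function
# rather than an object that holds instance variables.
Author = Struct.new(:name, :book)
Book = Struct.new(:title, :author)

module Builders
  module HasOne
    module_function

    # Builds the related object and links both sides; nothing is stored on
    # the builder itself because it is only needed for this one call.
    def build(base, klass, attributes = {})
      related = klass.new(attributes[:name], base)
      base.author = related
      related
    end
  end
end

book = Book.new("Practical Refactoring")
author = Builders::HasOne.build(book, Author, name: "A. Writer")
puts book.author.name
puts author.book.title
```

Since the module keeps no state, there is nothing to allocate or clean up each time a relation is built or bound.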
00:37:42.430
secondly before you begin to refactor make sure you have a solid suite of tests tests are the wall at your back
00:37:49.030
refactor your tests simultaneously with your code I can't emphasize this enough you won't be able to figure out what went
00:37:55.460
wrong with your design if you do all of your refactoring then run the test and realize they're not passing so this is
00:38:00.770
just an example of that same look up inverses that I showed before I
00:38:06.110
decided that I wanted each object to know what their complements were and so this is just one example
00:38:11.930
like you have to port your existing tests but you also have to write new tests and this was a new test I had
00:38:18.230
to write because when I refactored each relation object had to be able to ask
00:38:23.420
another relation if it was a complement of itself so that's the test that I wrote for that so it's really important
00:38:29.870
to add those tests in as well and the last thing is don't fix bugs so along
00:38:35.120
the way I was really familiar with this list of bugs in JIRA and when I
00:38:42.290
was refactoring the code I would sometimes find the places or the sources of these bugs and I was really
00:38:47.750
excited to find these places and I really wanted to fix them but I had to be really self-disciplined about not
00:38:52.910
fixing them and saving them for later so this is just one example when you have a list of embedded documents and a parent
00:38:58.880
document Mongoid would allow you to append that same document with the same ID onto that list and it's a pretty
00:39:06.080
simple bug nothing really that exciting but when I was working on this code this is the the binder object and it would
00:39:15.140
allow you to append that embedded document twice to a list and when I was refactoring around this line I was
00:39:20.840
like wow that's where this is happening but I was like I'm not going to do it I'm going to do it later but I would
00:39:25.880
note it down in the JIRA ticket like where to go to fix it and also this idea
00:39:32.600
became crystallized for me by my manager who maintains the Java driver and so
00:39:38.030
he has a different way of thinking than I do he has this like really booming
00:39:43.230
godlike voice that makes anything he says sound extremely significant but it is
00:39:49.920
significant and in this case we have these one-on-ones every other week and one time he knew how much work I was
00:39:57.480
putting into refactoring this project and he also knew how many bugs I had to get through in the bug list and so
00:40:04.440
this one time he was like so Emily you're fixing you're doing all this
00:40:09.480
refactoring I recognize it's a lot of work are you fixing bugs as well and I was like and I got really defensive I was
00:40:15.480
like no I'm not fixing bugs like I don't have time for it I really want to finish the refactoring before I do that he was like good you should never fix bugs
00:40:22.140
while you're refactoring so I was like okay test passed he likes to ask these
00:40:28.380
like trick questions and make me think really deeply and like he's amazing and great so this crucial point was
00:40:34.980
perhaps the one that took the most self-discipline as I said I I really had
00:40:41.550
to like tie my hands behind my back when I was doing a lot of it so the last
00:40:47.369
thing that was part of this restoration is not always discussed
00:40:53.520
it's not always discussed in books or refactoring presentations nor is it specific to an open-source project it's
00:40:59.820
that of responsible maintenance and restoring user trust so this is kind of
00:41:05.160
the same idea of like sustainable farming or responsible farming where farmers don't use chemicals or things
00:41:12.570
that harm the environment for short-term benefit and financial payoff like like
00:41:18.630
redder tomatoes that are at the expense of the soil and the long-term the
00:41:24.630
longevity of their land so it's kind of the same idea with your project you want to make sure that everything you do is with
00:41:30.750
the long-term and the health in mind so it's just like starting a new exercise
00:41:36.030
or eating regime improving Mongoid didn't mean applying this quick fix like running for a couple days and then like
00:41:42.530
calling that my exercise regime and stopping for a while I had to establish healthy habits going forward for the
00:41:49.410
codebase this meant properly categorizing issues as they were opened in the JIRA project it
00:41:54.570
meant responding to users right away in order to get the most relevant information even if I couldn't fix the issue right away so that I could
00:42:01.260
reproduce it because that's part of the problem with all these issues that were imported into JIRA from github there's a
00:42:06.330
lot of people like kink even code anymore it's like going back to 2014 2013 so I knew these problems existed
00:42:12.300
but I didn't really know how they got themselves into said hole and so release
00:42:19.200
notes documentation basically any interface with the community had to be kept up to date so that people knew that Mongoid was alive so for example I
00:42:28.290
made sure our API docs were linked from our main documentation because a lot of people also were getting confused
00:42:35.490
with the old documentation which was still around I had to make sure the documentation was really specialized and
00:42:41.310
obvious I release new versions regularly I make sure that I'm always in step with
00:42:46.650
rails and I respond to relative movement in our march forward
00:42:53.460
I follow semantic versioning closely I make sure to tweet and I send out
00:42:59.610
announcements on Google Groups so that people know the project is always moving forward and that they can trust that
00:43:04.650
Mongoid is active that there's someone working on it and that it's alive as I said
00:43:09.840
and the benefit of having worked on something working on something that was
00:43:15.380
quiet and passive for so long is that people don't realize I'm paid to work on
00:43:21.360
Mongoid and so when I respond to them right away they're like so happy to get a response and like thanks so much
00:43:27.450
for your work and I'm like I'm paid to do this you know but it's great like I think people I
00:43:32.580
can tell people are really happy that finally like the project is treated like it's that it it's being responsibly
00:43:40.500
maintained so after all this work had I succeeded in reducing the entropy of the
00:43:46.440
project how did it compare to the entropy of the project before these changes entropy is pretty abstract
00:43:53.070
concept as we've seen and measuring it in the context of code bases seems even more intangible as we heard in the
00:43:58.830
beginning of the presentation entropy is measured in terms of the number of ways in which a system can be arranged we
00:44:04.620
can't quite measure this in a code base but I did however need to prove somehow to myself my community
00:44:09.650
my manager that my time spent was time well spent so I measure entropy in these
00:44:15.109
three ways and also other ways I think about this constantly to help myself like make sure that Mongoid is always kept
00:44:23.119
at a stable state so how difficult is it to make changes when you have a bug sure
00:44:28.490
it's difficult to find the source of the bug but how difficult is it to actually fix that bug do you have to fix it in five
00:44:33.619
different files do you fix it in one place and then run the tests and cross your fingers and if they pass move
00:44:38.869
on do you understand the structure of the codebase enough to be confident that one fix is the only place you need to make
00:44:46.609
that fix and then the other thing is can you explain the design so this was something I had to do again involving my
00:44:52.970
manager we were looking for someone internally from MongoDB to join the
00:44:58.849
Ruby team and someone who's a little bit newer to coding and to the company and so
00:45:04.700
he said to me he was like in preparation for talking to this person he said like why don't you write down everything you need to know everything you know working
00:45:12.770
on this project in something and everything someone should know joining this project and so I was like it's a
00:45:18.950
lot but okay I met with the guy who is going to join the team and I spent about an hour just explaining the
00:45:25.930
complexity of the gem dependencies and what projects I was maintaining and what
00:45:31.160
had where the tentacles were between the gems and and I so I did that and I was
00:45:38.869
like I don't know if I can write down everything I know about working on Ruby but explaining the Mongoid codebase
00:45:45.680
I can definitely do that now and I couldn't do it before because I I both didn't really have that mental model and
00:45:50.839
I also didn't think there was much structure and then the last thing is how is performance
00:45:56.390
slow performance is another indicator of high entropy in your system the messier and more inefficient the code paths the
00:46:02.809
worse the performance will be before doing this refactor our test suite would take between three and four
00:46:08.150
minutes after this refactor it was taking eight so I was freaked out for a little bit
00:46:13.490
and I was like wow that was a lot of work for nothing but it's because I introduced a thousand new tests and they
00:46:19.369
all dealt with creating classes and creating relationships which is actually quite code heavy and involves creating classes
00:46:27.079
so it was understandable the test suite got slower but it made me realize that I needed to do some benchmarking and
00:46:32.809
luckily we had a pretty rigorous benchmarking suite and I was able to use it to confirm that I actually made the
00:46:38.420
performance a little increase performance slightly so that's really important also like sure you can do this
00:46:44.839
refactoring but like make sure you always benchmark before and after so in
00:46:49.969
the end I'm able to say with confidence that I reduced entropy in Mongoid and I'm particularly happy that it allows other
00:46:55.459
engineers to easily join the Ruby project and potentially open pull requests and makes the codebase less opaque this work
00:47:02.839
has also shifted my perspective and I think differently about my projects so again my manager likes to ask me
00:47:08.839
trick questions and or impart wisdom on me and about a month and a half ago
00:47:14.239
again in our one-on-one meeting he was like um he so I come into our meetings
00:47:21.199
like with the things that I've been working on to tell him an update he's from the Java driver so he doesn't really track like
00:47:27.799
every commit I do and every JIRA ticket update that I do so he comes into our
00:47:33.589
meeting he's like so Emily what did you do today to make Mongoid better and I
00:47:39.259
was like I have my list did any of these make Mongoid better so now I always think about that like I go into my
00:47:44.539
office in the morning and I ask myself like what am I going to do today even if it's just a little thing to make Mongoid
00:47:49.880
better and then when I leave I'm like did I do anything to make Mongoid better so I encourage you to ask yourselves when you
00:47:55.999
go to work on Monday before you start coding what am I going to do today to make my codebase better so on that note
00:48:07.329
I'm going to remind you for all the people who came in late you might have
00:48:13.880
disqualified yourselves or at least made it so you can't email me but email me with any Metz quotes to get mustard and
00:48:20.959
the other thing is if I could hijack part of my question session and ask you
00:48:29.869
a question um can everybody take out their phones you got your phone and put it in the air
00:48:38.220
and turn it on and sing along one no just kidding
00:48:43.650
can you can you open your email clients on your phone and write Emily at MongoDB
00:48:53.760
as the receiver you don't actually have to do this if you don't want to but so
00:48:59.820
right I'm doing this also so that I know how long it takes to okay so write me as the receiver and then write to
00:49:09.150
me one sentence that like one sentence that sums up if you used MongoDB and
00:49:15.660
you no longer use MongoDB why if you use MongoDB and you still use MongoDB why
00:49:21.270
and if you've never used MongoDB and you don't want to use MongoDB why just one sentence and that would help me
00:49:28.470
so much because so on this note of making Mongoid better I have been
00:49:34.860
tasked now that it's kind of it's easier to fix bugs and it's a healthier project and other people can work on it with me
00:49:41.600
it's not so much of a black box anymore I I know that in the Ruby community
00:49:47.160
MongoDB is not super popular it's not the default database and I attribute it
00:49:52.590
largely to the mindset that we're all in when we write web apps that is created for
00:49:59.100
us or at least imparted on us or taught to us by using Rails we all think
00:50:04.590
relationally for the most part and MongoDB challenges that and makes it kind of confusing because it's kind of
00:50:11.580
relational in some ways and kind of not and so I know that there's a learning curve and Mongoid has always
00:50:18.780
followed for the last 10 years this philosophy of following exactly what active record does to reduce the
00:50:25.680
friction and the the learning curve if you're going from a relational database
00:50:31.020
to MongoDB but I'm now I'm pretty torn because I think that I can tell from all
00:50:38.280
of the issues that are logged and the way people ask me questions that having that be the philosophy of Mongoid makes it so
00:50:44.820
that people don't really learn MongoDB and how to use it properly and so they end up building relational schemas with MongoDB which is
00:50:52.950
not always the best solution and you end up doing many more requests to the database because we don't have joins when you want to do and that you
00:50:59.760
wouldn't do if you had a Postgres backend for example so that's one philosophy that's what
00:51:06.270
it's always been doing and I know that the switch to MongoDB is not that bad and if we follow this path but now
00:51:13.230
there's this other path where I think that maybe I can either build a new ODM for Rails or for Ruby or adapt
00:51:21.900
Mongoid to make it more modular so that you're not imposed you're not
00:51:27.830
being put into this relational mindset automatically so I don't know if I should make it like totally its own
00:51:33.990
thing and have an recognize that's a
00:51:39.420
learning curve but then be higher because like it's a trade-off like I might have fewer users in that way because the path of entry is much harder
00:51:47.160
so I I can't determine that just from because by definition the only people I
00:51:53.370
hear from are people using Mongoid so I don't really know why other people who aren't looking at MongoDB as a perfectly
00:52:01.200
acceptable option of database for Rails why they're not using it I can't determine
00:52:07.020
that without you all telling me why so please email me and I have a thick skin
00:52:12.240
like I can handle it you can be as brutal as you want I wouldn't be working
00:52:18.060
for MongoDB or standing on the stage if I took everything personally I've been working
00:52:23.580
for MongoDB for five years remember what everybody said about MongoDB five years ago so please be honest with me
00:52:29.460
it'll help me a lot and will help me make Mongoid better that's my question for you I don't know if we have time I
00:52:35.640
wasn't really paying attention to time for questions yeah so we're running a bit late oh okay
00:52:41.160
we can still take some questions if anyone has them although I'm happy to take questions later I know um if I'm
00:52:49.980
know how humans Minds work you guys are doing the single thing where you're replying to the emails right now so I
00:52:55.530
don't know if you can think of any questions but you have an email address so the questions can go by the email
00:53:02.140
yes gonna do the question of blame I think yeah okay so thank you very much