Summarized using AI

Keynote: Refactoring Humpty Dumpty back together again

Emily Stolfo • June 23, 2017 • Singapore • Keynote

In her keynote presentation titled "Refactoring Humpty Dumpty Back Together Again" at the Red Dot Ruby Conference 2017, Emily Stolfo explores the complexities of refactoring legacy code while drawing parallels between the nursery rhyme character Humpty Dumpty and software development. Stolfo, who works for MongoDB, begins her talk by discussing the second law of thermodynamics and entropy, emphasizing how both Humpty Dumpty—who represents fragility and disorder—and software systems can fall apart and require significant effort to restore order.

Key points discussed throughout the presentation include:

- Entropy and Order: The second law of thermodynamics states that systems tend toward disorder (high entropy), and restoring them to their previous state is unlikely but not impossible.

- Case Study - Beauvais Cathedral: Stolfo uses this example to illustrate how long-term projects can face structural failures but can be brought back to stability through careful analysis and restoration, much like refactoring code.

- Refactoring Process: She shares her journey with a project called 'Mongoid', explaining the importance of understanding the structure of the existing codebase, identifying weak points, and categorizing issues for targeted improvements.

- Approach to Refactoring: The presentation highlights several key strategies:

  - Conduct thorough structural analysis before making changes.

  - Organize issues into manageable categories.

  - Avoid quick fixes that provide temporary relief rather than long-term stability.

  - Implement changes incrementally to maintain the integrity of the system's functionality.

- Communication and Maintenance: Stolfo emphasizes the value of clear documentation and effective communication with the user community to foster trust and improve the project’s health.

In conclusion, Emily Stolfo encourages developers to view refactoring as a necessary process to enhance codebases, reduce entropy, and make systems more robust. She advocates for maintaining a long-term vision in software projects, akin to sustainable farming practices, to ensure lasting quality and stability. She underscores that while restoring a codebase can be challenging, it ultimately leads to healthier, more manageable projects.


Speaker: Emily Stolfo

Event Page: http://www.reddotrubyconf.com/

Produced by Engineers.SG


Red Dot Ruby Conference 2017

00:00:03.530 I don't know if it's just me but seems
00:00:09.290 like there are far fewer people here this morning than there were yesterday maybe everybody had too
00:00:14.840 much fun last night hi my name is Emily Stolfo I work for
00:00:19.850 MongoDB if you've ever used Ruby with MongoDB you've probably used a couple
00:00:26.539 lines of my code through the gems bson, bson_ext, origin of
00:00:33.400 different versions of Kerberos if you ever feel like doing authentication using Kerberos I think
00:00:40.820 all the downloads I have on that gem are me just testing it in different environments so
00:00:46.550 I'm coming from Berlin but I come from New York I've been in Berlin now for three years and I've been working there
00:00:54.170 because originally I went there to work with the other person who built the Ruby driver and before I start I want to
00:01:01.219 thank the organizers for having me I've never been to Singapore before I arrived a week ago and I think I've sweat my
00:01:09.260 weight in water every single day it's kind of like Bikram yoga I feel really good at the end of the day really cleansed and I brought omiyage for
00:01:17.330 the organizers but I also brought omiyage for you too thank you for coming
00:01:22.729 here at 10:00 a.m. on the second conference day so I can't bring
00:01:27.979 something for everybody but I brought these amazing accusal mustard from Germany and anybody who's German is
00:01:34.909 disqualified from this competition by the way so these are little mustards
00:01:40.130 from a little man who has a mustard shop next to my apartment and I've hidden
00:01:45.860 three Sandi Metz quotes in my talk and if you can email me with one of those
00:01:51.020 three quotes Emily at MongoDB I'll give you a little mustard
00:01:56.180 so the first three people basically to trick you into paying attention and also so they can feel someone for a quote
00:02:08.930 and they're all different types so okay so this talk is called refactoring
00:02:16.340 Humpty Dumpty back together again so because it's 10 a.m. and there's no
00:02:21.620 better time to talk about physics I'm going to start with the second law of thermodynamics specifically the second
00:02:29.720 law of thermodynamics accounts for the direction of natural processes we've all heard of this right ok no well good
00:02:37.880 thing I'm telling you about it the law says that it's highly unlikely though not impossible to restore a system to
00:02:44.570 a previous state it accounts for the asymmetry between past and future in
00:02:50.300 modern times this law is defined in terms of entropy we've all heard of entropy right yeah more so than the
00:02:57.230 second law of thermodynamics it's kind of abstract but it basically is the measure of the number of ways in which a
00:03:03.590 system can be arranged entropy is taken to be the measure of disorder of a system the
00:03:09.470 higher the entropy the higher the disorder and usually it's depicted like this where it requires a certain amount
00:03:15.440 of work to take something that's in a high level of disorder and make it orderly or restore order so once upon a
00:03:24.050 time there was this egg named Humpty Dumpty and his story was told in this
00:03:29.750 nursery rhyme Humpty Dumpty sat on a wall Humpty Dumpty had a great fall all the king's horses and all the king's
00:03:35.959 women couldn't put Humpty Dumpty back together again has anybody heard of this but a lot of British people have heard
00:03:43.040 of this so this particular Nursery Rhyme is the most well-known Nursery Rhyme in
00:03:48.230 the English language and references to it can be found in many works of literature and frequently in popular
00:03:53.269 culture I think there's a character in Shrek Humpty Dumpty is in one of the Shrek like number 15 movies that
00:03:59.900 they've had and but the first recorded version dates from the late 18th century
00:04:05.900 England like many traditional stories or poems it's pretty much impossible to pinpoint what the original version was
00:04:12.769 what Humpty Dumpty actually was or to take the poem literally
00:04:19.209 for example we have other versions of the poem this is the actual first recorded version published in 1797 but
00:04:26.750 we have no idea if this existed way before 1797 or people just learned how to write in 1797 Humpty Dumpty sat on a
00:04:34.340 wall Humpty Dumpty had a great fall four score men and four score more couldn't make Humpty Dumpty what he was
00:04:39.650 before so Humpty Dumpty there are clearly many other versions throughout popular
00:04:46.910 culture throughout history but what we can't ignore is that Humpty Dumpty is always pictured as an egg despite the fact
00:04:54.410 that there's nothing indicating in the poem that he actually was an egg my favorite is that woman dressed as an egg
00:05:00.169 was really chic sitting on a wall in the corner it's likely the rhyme was
00:05:05.479 originally a riddle that could have exploited a well-known meaning of the term Humpty Dumpty at the time for
00:05:11.479 example the Oxford English Dictionary says that the term Humpty Dumpty refers to a drink of brandy boiled with ale
00:05:17.000 and I don't know about you but when I drink my brandy boiled with ale something magical happens and I start
00:05:22.970 seeing eggs perhaps the rhyme was the 17th century's equivalent of don't drink and drive
00:05:28.789 propaganda warning you about sitting on walls if you drink but still why an egg perhaps it was meant to convey that
00:05:36.020 whatever it was that sat on that wall it was extremely fragile and virtually impossible to put back together so as I
00:05:44.180 said there have been many other theories many other versions and one of the ones that I find kind of funny or absurd
00:05:50.060 is one that was put forth by this scholar I don't know what he's a scholar of but I
00:05:57.020 guess he spent his time trying to figure out what Humpty Dumpty was in the 50s and he said that Humpty Dumpty was in
00:06:03.199 fact a tortoise siege engine which is this kind of machine battering ram that was invented by the Romans and used
00:06:09.530 unsuccessfully in the English Civil War in the sixteen hundreds and apparently
00:06:15.020 was used and the thing broke without breaking the thing it was trying to break and so they wrote a poem about it
00:06:20.240 I don't know about you that sounds really silly to me I think I think the
00:06:25.370 idea of an egg is better this theory was eventually determined to be totally ridiculous but the
00:06:31.289 idea was incorporated into a children's opera called all the king's men so it's just as true according to popular
00:06:36.719 culture as the other theories so whichever
00:06:42.240 form Humpty Dumpty takes what can't be ignored is that he's a fragile guy he's actually become a sort of symbol for the
00:06:48.990 second law of thermodynamics Humpty Dumpty fell from the wall and subsequently ended up in pieces as we've
00:06:55.409 discussed the law says that it's highly unlikely though not impossible to restore him to his
00:07:00.779 exact state before the fall and this is what the poem also emphasizes as we
00:07:07.169 also discussed the second law of thermodynamics' modern definition is in terms of entropy the measure of the
00:07:12.809 number of ways in which an isolated system can be arranged specifically assuming for simplicity that each of the
00:07:18.629 microscopic configurations is equally probable entropy of the system is the
00:07:23.669 natural logarithm of the number of configurations multiplied by the Boltzmann constant k_B this is theoretically how we can measure entropy
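The definition given here, under the stated simplification that every microscopic configuration is equally probable, is Boltzmann's entropy formula. Writing Ω for the number of configurations (a symbol assumed here, since the talk does not name it), it can be written as:

```latex
S = k_B \ln \Omega, \qquad k_B = 1.380649 \times 10^{-23}\ \mathrm{J/K}
```

As a small worked case, a system with only two equally likely arrangements ($\Omega = 2$) has $S = k_B \ln 2 \approx 9.57 \times 10^{-24}$ J/K, while a perfectly ordered system with a single possible arrangement has $S = k_B \ln 1 = 0$.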
00:07:30.599 but nothing ever is like you can't have a system where all the arrangements are equally probable so this is highly
00:07:36.719 theoretical we can also find some examples of things that were broken and
00:07:43.050 that had been returned to their original states with help the Beauvais Cathedral which is located in Beauvais France 60
00:07:49.319 kilometers north of Paris is a symbol of the ambition of gothic architects the pet project of a wealthy and disaffected
00:07:55.649 bishop Milo of Nanteuil the construction of the cathedral may have been partly intended as an act of defiance against
00:08:01.889 the French crown so basically the bishop was a punk and he wanted to prove that he was better and more powerful than the
00:08:07.529 crown by building this massive building and you'll see that it was a total disaster the whole project was extremely
00:08:13.199 unrealistic and the cathedral was never finished construction was started in 1225 and it was meant to be the greatest
00:08:19.740 church in the kingdom but centuries of construction were marked by structural problems and collapses the nave which
00:08:25.979 is the main body of the church, cathedrals are normally shaped like a cross so the nave is the main body and all that was actually
00:08:32.459 constructed is a tiny portion at the top like the head of the cross so the nave was never constructed the plans for the
00:08:39.389 cathedral were such that it would have been the tallest building of its time the foundations in order to
00:08:45.030 support this massive structure were in some places 10 meters deep even so in 1284 part of the choir collapsed
00:08:51.990 just like the front of the cathedral that was actually constructed then the transepts actually don't know what part
00:08:57.270 of the cathedral that is I forgot to put it up this other part of the cathedral was started 150 years later and was
00:09:03.300 completed in 1548 then shortly afterwards a spire and half of the bell tower collapsed on Ascension Day during a
00:09:09.000 service and apparently nobody was hurt in 1600 construction of the nave so that main body of the cathedral began again
00:09:15.210 but only the first arch was erected and they gave up in the 1990s because this
00:09:20.940 became such a similar look into like the buildings that that exist today from
00:09:26.850 this time that were great engineering feats by definition were great engineering feats because they're still
00:09:33.270 around today but this one's a look into how these projects can be started and fail because of ineptitude or overambitious
00:09:40.770 people so in the 1990s people were like we really want to preserve this building and in the 1990s it was determined to be so
00:09:48.410 immensely unstable because the pillars had been measured to have moved 30 centimeters and they wanted to do
00:09:55.740 something about it so this building could still stand so why is it so unstable why's it so weak and why was
00:10:01.980 this project so difficult to be realized the building is a perfect storm of poor architectural plans different architects
00:10:09.150 hacking on the same building no real ownership of the projects architects coming and going over the centuries
00:10:15.480 which by the way means they have much different styles and the fierce gale-force winds that
00:10:22.440 come from the English Channel which is less than 100 miles away so basically the cathedral might as well have been
00:10:27.750 made out of papier-mâché it's on the World Monuments Fund list of 100 most endangered sites but today the Cathedral
00:10:35.370 is more stable than it has ever been thanks to a team of researchers from Columbia University so what did they do
00:10:42.300 they did what you would expect someone to do who needs to repair a weak structure they studied the structure so in
00:10:49.260 2001 a team from Columbia University went to Beauvais to acquire 3D range scans and imagery of
00:10:55.950 the Cathedral the goal was to create a 3D model of the cathedral to assist historic preservation efforts including
00:11:03.079 structural analysis of the Cathedral so for 10 days they roamed around the Cathedral using instruments to record
00:11:09.629 digital images of its facade and interior by bouncing laser beams off its surface they returned to New York City
00:11:15.660 with 75 of these scans each one containing more than a million data points and remember this is 2001 so 16
00:11:22.800 years ago and at the time like we could probably do that with our iPhones now but at the time this was the largest
00:11:29.220 structure to ever be scanned and that yielded the largest amount of data and this is a combination of all those scans
00:11:37.189 from the data that they collected so here's the flyover of the Cathedral this
00:11:42.629 is what the image that they were able to collect looks like and as you can see it's only a small portion of what the
00:11:48.990 original Cathedral was meant to be but the structure is really large and
00:11:54.329 complex and has a lot of cavities it's not just like a block you know like there's a lot going on in this Cathedral and
00:12:00.899 then this is the inside so I did my undergraduate education in art history and computer science and actually took
00:12:06.600 this professor's class and he showed us this and I was like super excited because I was like this is why I'm doing
00:12:13.050 both of these fields you can do things like this and preserve cultural heritage and so um that just as an aside the the
00:12:22.470 reason this Cathedral was meant to be so large or like what motivated that was Gothic architecture part of its
00:12:28.980 principle especially with cathedrals was to elongate the structures so people felt closer to God and
00:12:34.860 had this sense of being in this infinite space and so that's why the bishop was particularly hubristic in doing this
00:12:41.339 because he was trying to bring himself too close to God he was flying too close to the Sun because of the model that the
00:12:48.509 team of researchers was able to create the support beams have been able to be installed in the right places restoring
00:12:54.569 stability to this cathedral and allowing visitors to appreciate the ambition and engineering of the Gothic
00:12:59.790 builders 700 years ago and also for academics to study how this project was
00:13:05.399 started and failed so what do the Beauvais Cathedral and Humpty Dumpty have in common
00:13:11.350 both were in need of being put back together for stability to be re-established so this system in
00:13:18.100 particular has been restored to better order and stability because as we said it's improbable but not impossible
00:13:24.960 furthermore what if we aren't interested in restoring the system to its original state what if we want to alter it
00:13:31.120 arranging the pieces to make it even better what if breaking something allows you to rearrange the pieces so that it can be even
00:13:37.390 more structurally sound does this sound familiar to you well it certainly sounds
00:13:42.640 familiar to me because otherwise we wouldn't be doing this talk and it's something I've had to think a lot about
00:13:48.340 lately particularly with Mongoid so recently I had to study the structure of
00:13:54.370 this project break it a little and then rearrange the pieces so that it was inherently stronger I'd even argue that I defied
00:14:01.630 the second law of thermodynamics and the entropy has been decreased in this system who would disagree that their
00:14:09.940 project's entropy increases over time so who thinks their project's entropy decreases over time with no work right
00:14:18.720 so I maintain Active Record's replacement for using MongoDB with Rails it's called
00:14:25.330 Mongoid it's actually 10 years old which is basically 700 years in Cathedral years the first version of Mongoid
00:14:33.010 version 0.2.5 was released by someone who's now my colleague Durran Jordan he's the
00:14:40.870 original author and by the way on the original documentation site of Mongoid he said Mongoid was conceived one
00:14:48.940 late night in February in somewhere in Florida after five glasses of whiskey
00:14:56.380 and it looks like that's pretty much how Mongoid was built just like someone on whiskey I mean I
00:15:03.010 love Durran and he's amazing but we're talking about Durran 10 years ago version 0.2.5 was
00:15:08.890 released by Durran on October 1st 2009 version 0.2.6 was released on October 1st 2009 version
00:15:17.020 0.8.1 was released on October 1st 2009 this sounds like any Cathedral Xena
00:15:23.420 the MongoDB server version at that time was less than 1.2.0 I actually don't know what version it was because
00:15:28.970 in our project tracking tool the earliest version recorded is
00:15:35.450 post Mongoid's first release and so for reference MongoDB server version the
00:15:40.550 current version is 3.4 so back then at 1.2.0 it was still in this phase where we had this feature that it
00:15:46.190 dropped your data anyway Mongoid continued to be developed by
00:15:51.710 Durran and also by the way MongoDB doesn't drop your data I don't know if you've read anything in the last five years but we've solved that problem
00:15:59.380 anyway Mongoid continued to be developed by Durran who was working at SoundCloud in Berlin in his free time it was a true
00:16:05.390 open source project for many years in that many people contributed many pull requests were opened and merged many
00:16:11.270 discussions were had in the github issues list many people solved approximate problems but nobody had the
00:16:17.060 big picture it was built when the MongoDB server was quite simple compared to what it is now there weren't many features or even
00:16:23.720 replica sets at the time so the history
00:16:29.690 of this this project and the complexity of the ecosystem built around Mongoid and
00:16:35.810 how it fit into Rails and how to use the driver is really complex and it might sound familiar to you if you're working
00:16:41.570 on open source the first version of Mongoid so like following along with this diagram anything gray is not developed
00:16:47.990 by MongoDB Inc the company that I work for anything in color is so the first versions of Mongoid used MongoDB
00:16:55.370 Inc's Ruby driver the 1.x series this is the driver that I was hired to work on five years ago at the time I
00:17:01.370 joined the company Durran had just built his own driver called Moped because the official MongoDB driver hadn't developed some features he was hoping to have you
00:17:08.120 know sent back and forth and some friction so he was because the server was kind of simple at that time he was like okay I'm just going to build my own
00:17:13.370 driver so I don't need to like have this extra level of of diplomacy to to get
00:17:23.240 changes to move forward with Mongoid so at that time the Ruby
00:17:30.560 offering if you're using MongoDB with Rails was developed entirely outside
00:17:36.419 of MongoDB Inc and wasn't developed by anybody who's actually paid money to do
00:17:42.870 it so at MongoDB at the time we knew how important Mongoid was to the Ruby community basically if anybody wanted to
00:17:48.960 use MongoDB with Rails they went through Mongoid and basically anybody wanting to program in Ruby was
00:17:54.299 unfortunately as someone said yesterday using Rails so by the transitive property anybody wanting to use
00:18:01.200 Ruby with MongoDB would have to go through code that wasn't actually developed by the company that
00:18:07.230 make sense the company was growing as were the features of MongoDB and the sophistication and opacity of behavior so
00:18:15.330 it's really difficult for someone in the open-source community to keep up with what the server was doing because they didn't have that insight that I have
00:18:23.190 working for the company where there's like insider knowledge where you know what the roadmap is you know what the
00:18:28.289 internal issues are what the priorities are you can walk over to our server engineer's desk and ask them about something specifically because MongoDB
00:18:35.820 has a lot of quirks a lot of between server versions the implementation of certain features can differ wildly so
00:18:44.010 sure enough at that time Mongoid's issues grew and the project started to lose traction and trust in the community
00:18:49.830 because it just couldn't react fast enough in 2014
00:18:55.139 the 1.x driver needed a rewrite and so we saw a great opportunity to approach Durran and say hey do you want to
00:19:00.330 come work at MongoDB we can build a new driver and then we can port Mongoid to use that new driver taking that burden off your side because we
00:19:08.220 can maintain the driver so he was up for it in 2014 he joined us and he and I
00:19:14.490 worked together to build a new driver which is kind of my way to get to Berlin and it was the gem version 2.0 and
00:19:21.779 then in doing that we decided that we'd bring Mongoid in
00:19:27.240 house as well so then Mongoid became an official project since then Durran has moved on to work on another team at
00:19:33.510 MongoDB, Compass if you're familiar with our products it's a GUI for navigating your data in collections and
00:19:40.370 I've taken over Mongoid and the driver and just a little aside just to like show
00:19:45.690 you how this is actually a simplified version of the story there's also a gem called origin which is the DSL query
00:19:52.950 language for querying MongoDB that was a separate gem but in version 6.0 I
00:19:58.830 brought it into the codebase because I realized not a lot of people were using it independently so that's
00:20:04.350 super-complicated also so like for example if I need to fix a bug in Mongoid 6 I can do it in Mongoid's
00:20:10.470 codebase and then if I want to backport it I have to go and release a separate version of origin so now that Mongoid
00:20:18.780 and the driver are back together again they're getting along quite well except for the occasional bickering over who
00:20:23.970 does the dishes the work is done the relationship's going well but a lot of
00:20:30.210 baggage has been brought back into the relationship by Mongoid so at first I
00:20:35.340 was excited about all this everything seemed so clean and centralized and I was excited to start working on Mongoid
00:20:41.100 and the driver and that Durran would be moving on to another team so I'd have more responsibility but I quickly
00:20:46.559 realized that I inherited a ton of work namely there were 199 problems and
00:20:52.440 they were all issues we imported the github issues list for Mongoid into JIRA it was a
00:20:59.010 disaster I almost had a heart attack there are a ton of issues and I didn't think I would ever get through them I
00:21:04.500 think there actually were 199 a lot was broken the project with some cases the
00:21:09.630 community was fragmented how could I bring the project back into good standing with its users restore trust
00:21:15.179 and communication how could I restore its structure and reduce entropy hopefully restoring entropy to its original state
00:21:22.020 was it possible to make Mongoid even better than it was before so I did what
00:21:27.059 the king's men and women tried to do for Humpty Dumpty I did what France the World Monuments Fund and the Columbia computer science team tried to do I
00:21:33.809 studied the structure identified the pieces the weaknesses and I tried and I kind of
00:21:39.600 succeeded to put Mongoid back together again so how did I do this I'm going to spend
00:21:46.559 a little bit of time talking to you about how you can take existing projects because I'm sure you all have them that are in dire need of a refresh
00:21:54.030 and put them back together there are many presentations and books on how to refactor the problem is solved
00:22:00.419 and no need to reinvent the wheel or retell you a lot of the things that you can just look up or watch other
00:22:05.480 presentations on every type of code smell is identified and recipes are
00:22:10.610 given for refactoring the definitions can be overwhelming but who can really apply them perfectly like I read the
00:22:16.520 definitions too and I would like identify some of those things in my code bases but it's kind of like this
00:22:21.760 equation on the second law of thermodynamics it's a guide for how to understand the concept but it can't actually be applied in practice so I'm
00:22:29.450 going to tell you a much more human story of how I refactored Mongoid and put it back together again because it's a very real project with
00:22:35.450 very real problems I'm going to share with you some tricks and things that I did that I applied to my process we'll look
00:22:43.910 at how I studied the structure then we'll talk about refactoring and there's definitely a way to refactor and many
00:22:49.610 better ways to refactor finally we'll talk about how to avoid letting a project slip into this state in the
00:22:55.040 future so regardless of whether you're an open source project maintainer I think you'll find that a lot of what I'm
00:23:00.980 about to say can be applied to your own projects we're all maintainers of some legacy codebase some pre-existing
00:23:06.260 project I bet you agree that the entropy and disorder of your code base increases over time but I do think that
00:23:14.300 we can pause repair and restructure our code bases to actually be stronger than they were before we started again the
00:23:20.930 second law of thermodynamics says it's improbable but not impossible to restore a system to its original state
00:23:26.120 we're engineers and if we put our minds to something we can make it happen so first structural analysis I spent
00:23:33.170 a while addressing bugs in Mongoid one-by-one going through those 199 problems because I didn't have a good
00:23:38.600 sense of how everything worked and at the time that Durran built Mongoid metaprogramming was really popular so he did
00:23:44.900 things like yeah I don't wanted fascist sorry we can talk about that later I
00:23:52.040 mean as I said this was Durran ten years ago so I'm glad he doesn't really watch
00:23:57.140 conference talks that much but I knew in the back of my mind I had to build up a familiarity with the structure of the
00:24:03.470 code base so I took notes in the code and in a notebook like literally with a pencil on how everything worked together
00:24:09.200 I drew diagrams like an architect I stepped through the code with pry and wrote down the call stack
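The practice described here, stepping through a code path and writing down the call stack, was done interactively with pry (typically by placing `binding.pry` at the point of interest). As a complementary sketch that needs no gems, Ruby's built-in `TracePoint` can record which methods a code path enters. The `Author` and `Book` classes below are toy stand-ins invented for illustration, not Mongoid code, and this is not presented as the speaker's own tooling.

```ruby
# Toy classes standing in for a code path you want to understand.
class Author
  def initialize(name)
    @name = name
  end

  def display_name
    @name.upcase
  end
end

class Book
  def initialize(author)
    @author = author
  end

  def byline
    "by #{@author.display_name}"
  end
end

calls = []  # one entry per Ruby method entered, indented by nesting depth
depth = 0

# :call fires when a Ruby-defined method is entered, :return when it exits.
tracer = TracePoint.new(:call, :return) do |tp|
  if tp.event == :call
    calls << "#{'  ' * depth}#{tp.defined_class}##{tp.method_id}"
    depth += 1
  else
    depth -= 1
  end
end

book = Book.new(Author.new("ada"))

# Trace only while the block runs, so unrelated code is not recorded.
tracer.enable { book.byline }

puts calls
```

The recorded list is the kind of note one might otherwise write by hand in a notebook: an indented outline of which method called which.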
00:24:14.419 as I said before many solutions were applied that approximately solved problems because not many people had
00:24:20.480 the full picture so like typical case obvious cases of pull requests fixing something very specific it's really
00:24:26.659 important to have a mental model of how a codebase works in order to make high-quality changes luckily Durran also
00:24:32.389 had my back in this case as I said he was still at the company so for me trying to figure out why something was changed
00:24:38.720 it wasn't just good enough to look at git blame I could look at git blame and say like hey Durran why did you do this and
00:24:44.480 he'd give me this like whole story and luckily he had a good memory and a lot of stories and so I recognize
00:24:51.080 that was something that not everybody has like that resource but it was also really good for helping me understand
00:24:56.779 the history of this project so one thing that made this refactoring
00:25:03.379 possible for me was grouping my issues into
00:25:08.690 categories if you categorize the issues you can see where the hot spots are and
00:25:13.850 focus on them when rebuilding repairing the structure the 3D model of the cathedral was necessary for the same
00:25:19.460 exact reason in particular with Mongoid I realized most of our issues had to do with the behavior of related objects so
00:25:25.070 I created an epic in JIRA to track all those issues related to relations bugs and so when I say relations object it's
00:25:31.519 when you define a model and you say like a book has_one author there's a
00:25:37.070 macro that runs and it creates this object called relation and it saves it into this global variable on the book
00:25:43.489 class and that object itself is what caused a lot of problems and I tried to
00:25:49.279 cluster and categorize my issues around that one thing so that when I focus on refactoring it I knew what its needs
00:25:56.359 were stepping through major code paths and taking notes that's really important
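The relation macro described above can be sketched in plain Ruby. This is a minimal illustration and not Mongoid's implementation: the `Relations` module, the `Relation` struct, and the `relations` registry are hypothetical names, and the registry is a class-level instance variable standing in for the "global variable on the book class" mentioned in the talk.

```ruby
# Hypothetical relation object; the talk only says the macro creates
# "an object called relation" and stores it on the model class.
Relation = Struct.new(:kind, :name, :owner)

module Relations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level registry of declared relations, keyed by relation name.
    def relations
      @relations ||= {}
    end

    # The macro: it runs once, while the class body is being evaluated.
    def has_one(name)
      relations[name] = Relation.new(:has_one, name, self)
      attr_accessor name # simple reader/writer for the related object
    end
  end
end

class Book
  include Relations
  has_one :author
end

relation = Book.relations[:author]
puts "#{relation.owner} #{relation.kind} #{relation.name}"
```

Because every later lookup goes through the stored relation object, a weakness in that one object shows up across many unrelated-looking bugs, which is consistent with grouping those issues under a single category.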
00:26:01.879 also choose code paths that you don't understand and step through them with pry I know they're scary and it's really really tedious but it's really helpful
00:26:08.749 to do that and as I said there's a lot of metaprogramming so that means it's really opaque really difficult but I
00:26:14.330 took notes in the code with with comments as well if something was for example an attribute accessor in one
00:26:19.340 file, Mongoid's structure is made up of behaviors and different modules so
00:26:24.889 there were like a lot of different files that define a lot of different things about this one document class and so I
00:26:30.530 peppered the codebase with a lot of notes so if I was following a code path and
00:26:35.840 then I saw a variable I would say like this is defined in X module and that really helped me to understand the shape
00:26:41.360 of the code base and then lastly draw
00:26:46.460 diagrams yourself like literally with a pencil like an architect it was really helpful to do this as well and seeing
00:26:52.130 the structure visually helps you I mean again by coming back to art history it sounds like a sculpture you really like
00:26:58.460 there is a shape to your codebase and you want to understand it so after I did all of that
00:27:03.860 what I identified was a weakness so I'm going to give you a concrete example like making that relations issue that I
00:27:10.070 built that epic around like more concrete so you can follow along with it and see how I focused on one element of
00:27:16.700 the codebase that was the weakest and on which I spent some time refactoring after I refactored this one
00:27:23.180 thing I was able to close about 40 issues which at the time that I was doing this was 50% of our issues so I
00:27:29.000 was really happy about that I identified that we had one object that contained all the information about the
00:27:34.430 relationship between two models in Mongoid it was called metadata and it was
00:27:40.190 inherited from a hash so essentially it was a hash it is basically like the laziest class you could ever have because it's
00:27:46.220 just keys and values with no specific logic or behavior so like a nightmare it
00:27:51.560 was an object created when the model was loaded so like when you write that actual relation in the model
00:27:57.200 class it would create this metadata class which was just a hash so like
00:28:02.690 writing book has one author would use a macro to create this metadata object stick it onto the book class and that's
00:28:07.850 what it used throughout all of the code to determine what behavior an instance of a book should have or even the class
00:28:14.690 itself if you're querying or whatever so in code smell terms this is a classic
00:28:20.210 bloated smell this class knew and did way too much I'm sure there are tons of other code smell terms you can apply to this as well so this is the metadata class
00:28:28.970 definition does anybody notice something alarming about this comment the Grand
00:28:34.220 Poobah of information about any relation in this class it contains everything you could ever possibly want to know
00:28:40.460 and by the way possibly was spelled wrong which goes back to what I was saying about this being a whiskey
00:28:46.790 project port niranda as you can see um
00:28:52.820 it was basically like a Magic 8 Ball like you just ask it anything and it can give you the answer and it's totally random
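A minimal Ruby sketch of the shape described here, not Mongoid's actual source: a `has_one` style macro that stores a Hash-derived `Metadata` object on the model class. The module name `Relations`, the `relations` registry, and the option keys are assumptions made for illustration.

```ruby
# Hypothetical sketch; not Mongoid's real implementation.
# Metadata is "just a hash": keys and values, no behavior of its own.
class Metadata < Hash
end

module Relations
  # Class-level macro in the spirit of `has_one :author`.
  def has_one(name, options = {})
    metadata = Metadata.new
    metadata.merge!(options)
    metadata[:name] = name
    metadata[:macro] = :has_one
    relations[name] = metadata # stored on the model class itself
  end

  # Registry of relation metadata, keyed by relation name.
  def relations
    @relations ||= {}
  end
end

class Book
  extend Relations
  has_one :author, foreign_key: :book_id
end

meta = Book.relations[:author]
meta[:macro]       # the kind of relation
meta[:foreign_key] # any option a caller happened to pass in
meta[:anything]    # nil: any key can be asked for, whether or not it means something
```

Because every caller reads raw keys, nothing in the class itself says which keys are valid for which kind of relation.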
00:28:59.440 writing simple code is important but let's define simplicity does simplicity mean
00:29:04.640 that we should have the least number of classes does it mean that we should favor one basic object over
00:29:10.160 multiple smaller different objects is having one metadata object saving all
00:29:15.260 information about every type of relation the simplest and thus best design design decisions I understand do involve
00:29:21.680 trade-offs we so frequently chant DRY don't repeat yourself but sometimes we need to introduce a little
00:29:28.010 bit of duplication in order to have a simpler design prefer duplication over the wrong abstraction I'll give
00:29:34.940 some examples of how the metadata object was used so you'll see how it became obvious what rearranging had to be done
00:29:41.090 even without understanding anything about Mongoid I think you'll recognize what patterns should kind
00:29:48.020 of come out of this code and what needed to be done to restore structural stability to Mongoid's codebase but
00:29:55.160 before I do that I just want to say briefly that Mongoid has two main types of relations because MongoDB is a
00:30:01.550 document database it has referenced relations which is what you would recognize from active record
00:30:06.620 so it's straight up like referenced relations or IDs foreign keys saved on
00:30:11.630 objects has and belongs to many is like has many through but there's no join table because MongoDB has a field that
00:30:17.600 can be an array so it's just saving arrays of the related objects on either end and it's kept in sync and then
00:30:23.600 embedded which is pretty self-explanatory you can have embedded documents in MongoDB so you have these types which implement that
00:30:30.430 relationship between embedded and parent documents so this is one example
00:30:35.570 instance method it's called determine foreign key and actually when I was reviewing my slides this morning I
00:30:41.200 didn't even see the first line says determine the value for the relation's foreign key performance improvement what
00:30:47.000 I don't that something would have to add some of that at first for the life of me
00:30:52.490 I could not understand what was going on basically it's a no-op if the relation is embedded but embedded
00:30:58.879 relations don't save foreign keys because they're embedded they don't need them so like why would it return a foreign key in its options why would
00:31:04.999 even allow an option of a foreign key if it was an embedded relation um yeah so
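The slide is not part of the transcript, so the following is only a reconstruction of the method's shape from the speaker's spoken description. The stand-in modules `EmbeddedOne` and `ReferencedOne`, the `:relation` key, and `foreign_key_default` are hypothetical names, not Mongoid's API.

```ruby
# Hypothetical reconstruction from the spoken description; not Mongoid's source.

# Stand-ins for the relation objects the metadata delegates to (assumed names).
module EmbeddedOne
  def self.embedded?
    true
  end
end

module ReferencedOne
  def self.embedded?
    false
  end

  # Only the relation object knows the default key name.
  def self.foreign_key_default(name)
    "#{name}_id"
  end
end

class Metadata < Hash
  def determine_foreign_key
    # 1. If a :foreign_key option was passed, return it; nothing else is checked.
    return self[:foreign_key].to_s if self[:foreign_key]
    # 2. Embedded relations store no foreign key, so there is nothing to determine.
    return nil if self[:relation].embedded?
    # 3. Otherwise ask the relation object, which knows something Metadata does not.
    self[:relation].foreign_key_default(self[:name])
  end
end

referenced = Metadata.new.merge!({ name: :author, relation: ReferencedOne })
embedded   = Metadata.new.merge!({ name: :chapter, relation: EmbeddedOne })
confused   = Metadata.new.merge!({ name: :chapter, relation: EmbeddedOne,
                                   foreign_key: :chapter_id })

referenced.determine_foreign_key # "author_id"
embedded.determine_foreign_key   # nil
confused.determine_foreign_key   # "chapter_id", even though the relation is embedded
```

The last call shows the oddity being questioned: an embedded relation can still hand back a foreign key it has no use for, because the option check runs first.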
00:31:12.499 the this doesn't count as the Sandi Metz quote by the way but Sandi Metz says this thing where like if you squint at the code you can kind of see the
00:31:18.529 structure and the shape will come out at you and so I kept squinting my eyes at this thinking like maybe something
00:31:23.720 would come out of it but I didn't really see much else besides what was there but
00:31:28.789 I did notice that basically there are a couple of things like when you're doing a lot of refactoring you get pretty good at recognizing these hidden patterns and
00:31:35.299 so when I came to this one thing I noticed was in my refactoring mindset is
00:31:40.639 that like first of all if the foreign key option is set it's returned nothing
00:31:46.249 else is done if the object is embedded it returns nil and it should probably be before it checks if there's an option of
00:31:53.599 a foreign key set and the last thing is the relation object knows something the meta data object doesn't so it uses the
00:31:59.840 metadata data or the meta metadata metadata but and also the other thing
00:32:06.559 which I want to add to this list is that this metadata object is supposed to be
00:32:12.409 the relation but there's also a relation object saved on the metadata class so why aren't so it's conflated like so
00:32:18.679 there were actually these relation objects that had different behavior but I didn't see why we needed to have this
00:32:24.139 metadata class when we could just have these objects that have their own behavior so here's another method just
00:32:30.769 to give you a sense of how sticky this code base was it's used to get the names
00:32:36.109 of an inverse relation given a certain relation the first method checks if the type is polymorphic I know it's kind of long but
00:32:43.369 if it's polymorphic it calls lookup inverses and then otherwise it calls determine inverses and when I looked at
00:32:50.330 those two methods they had a lot of overlap in logic so it was really difficult to determine what logic should
00:32:57.349 be extracted if some of the checks were repeated after they've been branched
00:33:03.320 but basically the point of showing you this is to show you that it's pretty clear that the metadata object was
00:33:09.110 begging to be refactored into smaller object oriented objects the entropy or disorder of the system was way too high
00:33:15.559 any bugs having to do with this code were virtually impossible to fix and the structure was weak and what it obviously
00:33:23.179 needed was for there to be a referenced and embedded namespace with objects that knew that they were referenced or embedded
00:33:28.250 and had their own behavior so I embarked on a journey to refactor the metadata object into different objects
00:33:35.059 under the namespaces referenced and embedded how did I do this did I do it all at once did I read a lot of books and learn
00:33:43.880 about how to do this perfectly and then apply those practices so I had a couple of false starts I bought Martin Fowler's
00:33:49.610 refactoring book but I honestly didn't really get through much of it I kind of wanted to learn as I was doing it I
00:33:56.390 talked to my manager a lot had some nervous breakdowns but I learned there are a lot of wrong ways to refactor and
00:34:02.330 a couple of right ways or better ways so it's really important to do proper
00:34:08.690 refactoring not random refactoring I like to think of it in terms of the health of a
00:34:14.570 project there's something very similar between the way I refactor and
00:34:20.540 work on my code bases and how I design my weekly exercise regime I always ask myself when making changes to a codebase
00:34:27.109 is this a healthy change is this a quick fix like a piece of candy or a bag of chips that has a short-term payoff like
00:34:32.960 it's really yummy right now but I know in the long term this probably isn't good for me we all have to refactor at some point
00:34:39.830 it's really important to have a plan designed for what you're going to do refactoring should require the same
00:34:45.409 effort and process that you apply to building something from scratch I think sometimes we forget that just plowing
00:34:52.010 through and trying to fix everything you can along the way is definitely one of the wrong ways to refactor so at this
00:34:58.250 stage in the repair of Mongoid I had done the structural analysis and identified the weaknesses the next step was to
00:35:03.380 refactor with a plan over the course of my refactoring of Mongoid I learned a lot
00:35:09.950 and I'm going to share some of the highlighted steps with you again I'm not going to go through like recipes or theories because you can read about that
00:35:15.500 and it's something we talk about a lot and something we do a lot but I'm going to show you a couple of things
00:35:20.840 that I did using this metadata object as an example because I think it's like the classic case of something begging to be
00:35:26.540 refactored as I said we can yeah read
00:35:32.210 about this but I also watched a lot of talks along the way for guidance I'm not
00:35:37.520 saying that you should just dispose of all of the theory I think it's really good to know it but I want to share
00:35:42.770 some things that aren't really things that I read or heard about so the things that I learned were
00:35:51.860 one refactor one piece at a time use tests at every step and don't fix
00:35:59.030 bugs this one was really important so Martin Fowler we know this we can probably recite it in our sleep
00:36:05.000 defines refactoring as the process of changing a software system in such a way that it does not alter the external
00:36:10.730 behavior of the code yet improves the internal structure so if we rearrange the system so the external behavior
00:36:16.760 doesn't change it doesn't matter if we rearranged one corner of the system and then another corner of the system and do
00:36:22.280 it piece by piece because the outside behavior is not going to change so what
00:36:27.320 I did was first define a namespace called referenced and create a class called belongs to which seems like a
00:36:33.620 really obvious way to refactor this I returned this object when a model was defined with a belongs to relationship
00:36:39.890 and made sure all the tests passed before moving on to create another object the largest benefit of rearranging the
00:36:45.410 system piece-by-piece is you can test out different designs and not waste too much time overhauling everything only to realize
00:36:51.710 your new design doesn't work agile principles aren't only for building new
00:36:57.620 things so as I said you should apply the same practices to refactoring as you do to building something from scratch
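The first step described above might look roughly like the following sketch: one small object under a `Referenced` namespace, returned by the `belongs_to` macro, with the rest of the system left alone until the tests pass. The method names and the `Relations` module are assumptions for illustration, not the code that was actually written.

```ruby
# Hypothetical sketch of a single refactoring step; names are illustrative.
module Referenced
  # A small object that knows it is a referenced belongs_to relation.
  class BelongsTo
    attr_reader :name, :options

    def initialize(name, options = {})
      @name = name
      @options = options
    end

    def embedded?
      false
    end

    # A referenced relation always has a foreign key, so no nil branch is needed.
    def foreign_key
      (options[:foreign_key] || "#{name}_id").to_s
    end
  end
end

module Relations
  # Only this one macro returns the new object; other relation types
  # would be migrated in later steps, each followed by a full test run.
  def belongs_to(name, options = {})
    relations[name] = Referenced::BelongsTo.new(name, options)
  end

  def relations
    @relations ||= {}
  end
end

class Book
  extend Relations
  belongs_to :author
end

Book.relations[:author].foreign_key # "author_id"
Book.relations[:author].embedded?   # false
```

Keeping the step this small means a design that turns out not to work costs one class, not a rewrite.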
00:37:02.990 I iterated over my refactor design I tried out different hierarchies I tried creating classes for things like a
00:37:08.930 builder so a builder is something like if you have a book and you build author there's a builder thing that did that for
00:37:14.780 you if you bind something it would be book author equals another author and so those things were objects originally and
00:37:21.320 I tried out having them be objects but I thought it would be much better if they were modules because they're behavior and
00:37:27.860 things that do something once like there's no need to save instance variables on that builder because it's created as a side
00:37:36.020 effect or byproduct of building an object or binding it to another one
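A small sketch of the design choice just described, under assumed names: since a builder holds no state between calls, it can be a module of functions instead of a class that gets instantiated. `Builder.build`, `Book`, and `Author` here are hypothetical and only illustrate the idea.

```ruby
# Hypothetical sketch; Author, Book, and Builder are illustrative names.
Author = Struct.new(:name)

class Book
  attr_accessor :author
end

# Stateless behavior: nothing has to be remembered between calls,
# so module functions are enough and no builder instance is allocated.
module Builder
  module_function

  # Builds a related object from attributes and binds it to its owner.
  def build(owner, relation_name, klass, attributes = {})
    target = klass.new(*attributes.values_at(*klass.members))
    owner.public_send("#{relation_name}=", target)
    target
  end
end

book = Book.new
Builder.build(book, :author, Author, name: "Emily")
book.author.name # "Emily"
```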
00:37:42.430 secondly before you begin to refactor make sure you have a solid suite of tests tests are the wallet or back
00:37:49.030 refactor your tests simultaneously with your code I can't emphasize this enough you won't be able to figure out what went
00:37:55.460 wrong with your design if you do all of your refactoring then run the tests and realize they're not passing so this is
00:38:00.770 just an example of that same lookup inverses that I showed before I
00:38:06.110 decided that I wanted each object to know what their complements were and so this is just one example
00:38:11.930 like you have to port your existing tests but you also have to write new tests and this was a new test I had
00:38:18.230 to write because when I refactored each relation object had to be able to ask
00:38:23.420 another relation if it was a complement of itself so that's the test that I wrote for that so it's really important
00:38:29.870 to add those tests in as well and the last thing is don't fix bugs so along
00:38:35.120 the way I was really familiar with this list of bugs in JIRA and when I
00:38:42.290 was refactoring the code I would sometimes find the places or the sources of these bugs and I was really
00:38:47.750 excited to find these places and I really wanted to fix them but I had to be really self-disciplined about not
00:38:52.910 fixing them and saving them for later so this is just one example when you have a list of embedded documents and a parent
00:38:58.880 document Mongoid would allow you to append that same document with the same ID onto that list and it's a pretty
00:39:06.080 simple bug nothing really that exciting but when I was working on this code this is the the binder object and it would
00:39:15.140 allow you to append that embedded document twice to a list and when I was refactoring around this line I was
00:39:20.840 like wow that's where this is happening but I was like I'm not going to do it I'm going to do it later but I would
00:39:25.880 note it down in the JIRA ticket like where to go to fix it and also this idea
00:39:32.600 became crystallized for me by my manager he maintains the Java driver and so
00:39:38.030 he has a different way of thinking than I do he has this like really booming
00:39:43.230 godlike voice that makes anything he says sound extremely significant but it is
00:39:49.920 significant and in this case we have these one-on-ones every other week and one time he knew how much work I was
00:39:57.480 putting into refactoring this project and he also knew how many bugs I had to get through in the bug list and so
00:40:04.440 this one time he was like so Emily you're doing all this
00:40:09.480 refactoring I recognize it's a lot of work are you fixing bugs as well and I got really defensive I was
00:40:15.480 like no I'm not fixing bugs like I don't have time for it I really want to finish the refactoring before I do that he was like good you should never fix bugs
00:40:22.140 while you're refactoring so I was like okay test passed he likes to ask these
00:40:28.380 like trick questions and make me think really deeply and like he's amazing and great so this crucial point was
00:40:34.980 perhaps the one that took the most self-discipline as I said I I really had
00:40:41.550 to like tie my hands behind my back when I was doing a lot of it so the last
00:40:47.369 thing that was part of this restoration is something that's not always discussed in
00:40:53.520 books or refactoring presentations nor is it specific to an open-source project it's
00:40:59.820 that of responsible maintenance and restoring user trust so this is kind of
00:41:05.160 the same idea of like sustainable farming or responsible farming where farmers don't use chemicals or things
00:41:12.570 that harm the environment for short-term benefit and financial payoff like like
00:41:18.630 redder tomatoes that are at the expense of the soil and the long-term the
00:41:24.630 longevity of their land so it's kind of the same idea with your project you want to make sure that everything you do is with
00:41:30.750 the long-term and the health in mind so it's just like starting a new exercise
00:41:36.030 or eating regime improving Mongoid didn't mean applying a quick fix like running for a couple days and then like
00:41:42.530 calling that my exercise regime and stopping for a while I had to establish healthy habits going forward for the
00:41:49.410 codebase this meant properly categorizing issues as they were opened in the JIRA project it
00:41:54.570 meant responding to users right away in order to get the most relevant information even if I couldn't fix the issue right away so that I could
00:42:01.260 reproduce it because that's part of the problem with all these issues that were imported into JIRA from github there's a
00:42:06.330 lot of people like kink even code anymore it's like going back to 2014 2013 so I knew these problems existed
00:42:12.300 but I didn't really know how they got themselves into said hole and so release
00:42:19.200 notes documentation basically any interface with the community had to be kept up to date so that people knew that Mongoid was alive so for example I
00:42:28.290 made sure our API docs were linked from the main documentation because a lot of people also were getting confused
00:42:35.490 with the old documentation which was still around I had to make sure the documentation was really specialized and
00:42:41.310 obvious I release new versions regularly I make sure that I'm always in step with
00:42:46.650 rails and I respond to relative movement in our march forward
00:42:53.460 I follow semantic versioning closely I make sure to tweet and I send out
00:42:59.610 announcements on Google Groups so that people know the project is always moving forward and that they can trust that
00:43:04.650 Mongoid is active that there's someone working on it and that it's alive as I said
00:43:09.840 and the benefit of having worked on something working on something that was
00:43:15.380 quiet and passive for so long is that people don't realize I'm paid to work on
00:43:21.360 Mongoid and so when I respond to them right away they're like so happy to get a response and like thanks so much
00:43:27.450 for your work and I'm like I'm paid to do this you know but it's great like I think people I
00:43:32.580 can tell people are really happy that finally like the project is treated like it's being responsibly
00:43:40.500 maintained so after all this work had I succeeded in reducing the entropy of the
00:43:46.440 project how did it compare to the entropy of the project before these changes entropy is pretty abstract
00:43:53.070 concept as we've seen and measuring it in the context of code bases seems even more intangible as we heard in the
00:43:58.830 beginning of the presentation entropy is measured in terms of the number of ways in which a system can be arranged we
00:44:04.620 can't quite measure this in a code base but I did however need to prove somehow to myself my community
00:44:09.650 my manager that my time spent was time well spent I measure entropy in these
00:44:15.109 three ways and also other ways I think about this constantly to help myself like make sure that Mongoid is always kept
00:44:23.119 at a stable state so how difficult is it to make changes when you have a bug sure
00:44:28.490 it's difficult to find the source of the bug but how difficult is it to actually fix that bug do you have to fix it in five
00:44:33.619 different files do you fix it in one place and then run the tests and cross your fingers and if it passes move
00:44:38.869 on do you understand the structure of the codebase enough to be confident that one fix is the only place you need to make
00:44:46.609 that fix and then the other thing is can you explain the design so this was something I had to do again involving my
00:44:52.970 manager we were looking for someone from internally at MongoDB to join the
00:44:58.849 Ruby team and someone who's a little bit newer to coding and to the company and so
00:45:04.700 he said to me in preparation for talking to this person why don't you write down everything you know about working
00:45:12.770 on this project and everything someone should know joining this project and so I was like it's a
00:45:18.950 lot but okay I met with the guy who is going to join the team and I spent about an hour just explaining the
00:45:25.930 complexity of the gem dependencies and what projects I was maintaining and what
00:45:31.160 where the tentacles were between the gems and so I did that and I was
00:45:38.869 like I don't know if I can write down everything I know about working on Ruby but explaining the Mongoid codebase
00:45:45.680 I can definitely do that now and I couldn't do it before because I I both didn't really have that mental model and
00:45:50.839 I also didn't think there was much structure and then the last thing is how is performance
00:45:56.390 slow performance is another indicator of high entropy in your system the messier and more inefficient the code paths the
00:46:02.809 worse performance will be previous to doing this refactor our test suite would take between three and four
00:46:08.150 minutes after this refactor it was taking eight so I was freaked out for a little bit
00:46:13.490 and I was like wow that was a lot of work for nothing but it's because I introduced a thousand new tests and they
00:46:19.369 all dealt with creating classes and creating relationships which is actually quite code heavy and involves creating classes
00:46:27.079 so it was understandable the test suite got slower but it made me realize that I needed to do some benchmarking and
00:46:32.809 luckily we had a pretty rigorous benchmarking suite and I was able to use it to confirm that I actually
00:46:38.420 increased performance slightly so that's really important also like sure you can do this
00:46:44.839 refactoring but like make sure you always benchmark before and after so in
00:46:49.969 the end I'm able to say with confidence that I reduced entropy in Mongoid and I'm particularly happy that it allows other
00:46:55.459 engineers to easily join the Ruby project and potentially open pull requests and makes the codebase less opaque this work
00:47:02.839 has also shifted my perspective and I think differently about my projects so again my manager likes to ask
00:47:08.839 trick questions or impart wisdom on me and about a month and a half ago
00:47:14.239 again in our one-on-one meeting he was like um he so I come into our meetings
00:47:21.199 like with the things that I've been working on to tell him an update he works on the Java driver so he doesn't really track like
00:47:27.799 every commit I do and every JIRA ticket update that I do so he comes into our
00:47:33.589 meeting he's like so Emily what did you do today to make Mongoid better and I
00:47:39.259 was like I have my list do any of these make Mongoid better so now I always think about that like I go into my
00:47:44.539 office in the morning and I ask myself like what am I going to do today even if it's just a little thing to make Mongoid
00:47:49.880 better and then when I leave I'm like did I do anything to make Mongoid better so I encourage you to ask yourselves when you
00:47:55.999 go to work on Monday before you start coding what am I going to do today to make my codebase better so on that note
00:48:07.329 I'm going to remind you the people who came in late you might have
00:48:13.880 disqualified yourselves or at least made it so you can't email me but email me with any Metz quotes to get mustard and
00:48:20.959 the other thing is if I could hijack part of my question session and ask you
00:48:29.869 a question um can everybody take out their phones you got your phone and put it in the air
00:48:38.220 and turn it on and sing along one no just kidding
00:48:43.650 can you open your email clients on your phone and write Emily at MongoDB
00:48:53.760 as the receiver you don't actually have to do this if you don't want to but so
00:48:59.820 why I'm doing this is also so that I know how long it takes to okay so write me as the receiver and then write to
00:49:09.150 me one sentence that sums up if you used MongoDB and
00:49:15.660 you no longer use MongoDB why if you use MongoDB and you still use MongoDB why
00:49:21.270 and if you've never used MongoDB and you don't want to use MongoDB why just one sentence and that would help me
00:49:28.470 so much because so on this note of making Mongoid better I have been
00:49:34.860 tasked now that it's kind of it's easier to fix bugs and it's a healthier project and other people can work on it with me
00:49:41.600 it's not so much of a black box anymore I I know that in the Ruby community
00:49:47.160 MongoDB is not super popular it's not the default database and I attribute it
00:49:52.590 largely to the mindset that we're all in when we write web apps that is created for
00:49:59.100 us or at least imparted on us or taught to us by using rails we all think
00:50:04.590 relationally for the most part and MongoDB challenges that and makes it kind of confusing because it's kind of
00:50:11.580 relational in some ways and kind of not and so I know that there's a learning curve and Mongoid has always
00:50:18.780 followed for the last 10 years this philosophy of following exactly what active record does to reduce the
00:50:25.680 friction and the the learning curve if you're going from a relational database
00:50:31.020 to MongoDB but I'm now I'm pretty torn because I think that I can tell from all
00:50:38.280 of the issues that are logged and the way people ask me questions that having that be the philosophy of Mongoid makes it so
00:50:44.820 that people don't really learn MongoDB and how to use it properly and so they end up building relational schemas with MongoDB which is
00:50:52.950 not always the best solution and you end up doing many more requests to the database because we don't have joins requests that you
00:50:59.760 wouldn't do if you had a Postgres backend for example so that's one philosophy that's what
00:51:06.270 it's always been doing and I know that the switch to MongoDB is not that bad if we follow this path but now
00:51:13.230 there's this other path where I think that maybe I can either build a new audience for rails or for Ruby or adapt
00:51:21.900 Mongoid to make it more modular so that you're not imposed you're not
00:51:27.830 being put into this relational mindset automatically so I don't know if I should make it like totally its own
00:51:33.990 thing and recognize that the
00:51:39.420 learning curve would then be higher because like it's a trade-off like I might have fewer users in that way because the path of entry is much harder
00:51:47.160 so I I can't determine that just from because by definition the only people I
00:51:53.370 hear from are people using Mongoid so I don't really know why other people who aren't looking at MongoDB as a perfectly
00:52:01.200 acceptable option of database for rails why they're not using it I can't determine
00:52:07.020 that without you all telling me why so please email me and I have a thick skin
00:52:12.240 like I can handle it you can be as brutal as you want I wouldn't be working
00:52:18.060 for MongoDB or standing on the stage if I took everything personally I've been working
00:52:23.580 for MongoDB for five years remember what everybody said about MongoDB five years ago so please be honest with me
00:52:29.460 it'll help me a lot and will help me make Mongoid better that's my question for you I don't know if we have time I
00:52:35.640 wasn't really paying attention to time for questions yeah so we're running a bit late oh okay
00:52:41.160 we can still take some questions if anyone has them although I'm happy to take questions later I know um if I
00:52:49.980 know how humans' minds work you guys are doing the single thing where you're replying to the emails right now so I
00:52:55.530 don't know if you can think of any questions but you have an email address so the questions can go by the email
00:53:02.140 yes gonna do the question of blame I think yeah okay so thank you very much