Indexing
Voila, Indexes! A Look at Some Simple Preventative Magick

Summarized using AI

Voila, Indexes! A Look at Some Simple Preventative Magick

Jamis Buck • May 20, 2015 • Atlanta, GA

In the talk titled "Voila, Indexes! A Look at Some Simple Preventative Magick" presented by Jamis Buck at RailsConf 2015, the importance of database indexing in schema design is examined through a whimsical story involving a wizard named Basil and his manservant, Fabian. The core theme emphasizes the pitfalls of neglecting indexes in database design and practical strategies to address this oversight.

Key points discussed include:
- Schema Design Errors: Basil, excited about his new library, hastily designs a database schema without including indexes, resulting in significant performance issues as the number of scrolls rapidly increases.
- Performance Implications: As the scrolls accumulate, the absence of indexes leads to inefficient queries, effectively forcing Basil and Fabian to sift through every record, thereby degrading the library's performance and causing patron dissatisfaction.
- Indexing Solutions: The narrative highlights the critical need for indexes to facilitate quick access to data. Basil learns that appropriate indexing can drastically improve query performance, which he initially overlooked.
- The Role of Foreign Keys: Buck introduces the concept of foreign keys as a foundation for creating effective indexes. These keys help establish relationships between tables, providing a systematic approach to indexing based on how data is accessed.
- Avoiding Over-Indexing: The story further explains that while it's necessary to index important fields, over-indexing can unnecessarily burden the database, potentially leading to more issues.
- Using EXPLAIN Command: The talk concludes by introducing the EXPLAIN command, which allows developers to analyze query performance and verify whether indexes are utilized effectively in database queries.

Significant examples, such as Basil's development and subsequent destruction of his library due to indexing mistakes, reinforce the importance of thoughtful database schema design. Buck encourages attendees to implement indexing as soon as possible to avoid high costs and operational delays later on.

The video serves as an engaging yet educational reminder for developers to prioritize indexing in their database applications, illuminating how often overlooked practices can lead to disasters and inefficiencies.

Voila, Indexes! A Look at Some Simple Preventative Magick
Jamis Buck • May 20, 2015 • Atlanta, GA

By Jamis Buck
A gentleman wizard and his sarcastic manservant examine a common anti-pattern in schema design, in which indexes are “left for later”. The pitfalls and dangers of this approach are set forth. Right incantations (which is to say, scenarios and sample code) for battling this devious tendency will be presented, with all magic (that is, “buzz”) words thoroughly demystified and clearly explained. Walk away with a new understanding of why your application tables deserve indexes from day one, and how to make sure you’ve got them covered.

RailsConf 2015

00:00:12.160 I'm going to tell you a story today about a wizard named Basil Smok Whitener. He styles himself a gentleman as well, but his manservant kind of takes issue sometimes. A little quirky, this is Basil on the left. Ignore the similarity to Mark Twain; the artist has issues. That's Fabian there on the left. They're currently vacationing in the Sahara, trekking across the beautiful, rippling golden sands or dune after dune. Basil's having a blast, but Fabian's not so happy about being the one to carry the luggage. But when you're the manservant, that's kind of how it works out. They're doing all the touristy things. They're there to see Basil's old school friend, Ptolemy, and they’re spending some time doing the stuff tourists do.
00:01:00.160 They visit the pyramids and get their portraits painted. They've visited all the good restaurants and sampled the local cuisine. They’ve explored all the flea markets and found all the touristy trinkets. Somewhere along the way, Basil discovers an old brass oil lamp, you know the type—super tarnished, really ugly, battered and beat up. But you buff it up a little bit, and it brings out a nice shine. Not only that, it brings out a nice genie.
00:01:13.600 Now that’s a little bit cliché, but a cliché that offers you three wishes; like, who's going to turn that down, right? Basil certainly wasn't. But when you're a wizard, you've already got a lot of power. What is a wizard going to wish for that he can't already conjure? So, you know, Basil thinks about this and it’s a tough decision. He spends a lot of time contemplating it. He finally remembers, though, that when he was a kid, before he started dabbling in all this magic stuff, there was a time when he was passionate about librarians. He knew all of the great librarians by name. In fact, he had all of the trading cards. His Eratosthenes card was even signed. He was that hardcore.
00:01:36.960 Now, the problem is that back then, there weren't many libraries; they were few and far between. So what do you mean if you want to be a librarian? What are your options? You've got to wait for a librarian to die, or you could off them yourself to make room, but you've got to be pretty bloodthirsty. I mean, honestly, let's face it: librarians are a cutthroat bunch. Basil was neither patient enough to wait for one to die nor bloodthirsty enough to kill one. So reluctantly, he followed his father's advice: he traded in his dream and his collection of cards for a career in magic.
00:02:02.400 But now it all comes rushing back as he's staring this genie in the eyes, and he realizes that the thing he wants more than anything else is a library. So he wishes for one. Not only that, being a wizard, he thinks big. He's like, 'I don’t just want any library; I want the biggest, the most magnificent, like world wonder library that there is.' It's going to have more content than anyone else's. Eratosthenes, his idol from childhood, will envy this thing.
00:02:20.000 To make sure he has more content than any other library, he stipulates a very specific condition to the genie. He says, 'Genie, I want this library, but for every minute that the sun is up, I want one new scroll to appear in the library.' So every minute during the day, a new scroll appears. The genie is like, 'Alright, whatever,' snaps his fingers, and poof, there's a library. Basil is stoked! He is so excited about this! Fabian has never seen him like this before. Basil is seriously jumping up and down and running in circles. He's so excited!
00:02:39.120 He runs inside the library and immediately begins writing out a schema to describe how the data is all going to relate, like how the scrolls are going to relate to each other. And he’s really excited! Now, because he's excited and in a hurry, he leaves out a few, you know, minor details. We've all done it, right? Things that you know you can take care of later, it’s not a big deal. But he rushes through it and gets the schema ready, and this is what he comes up with. He's got a bunch of tables up there: languages, nations, people, scrolls, obviously, and how they relate to subjects, and the materials for each scroll.
00:03:03.680 He's really serious about this. Obviously, he's spent a lot of time thinking about libraries, but he's left something off. You caught it—no indexes! But who's going to need them, right? Not a big deal in his little world of imagination, here he can go and grab a scroll really easily off his shelf. It’s not a big deal. Now, how many of us have ever shipped a production table without putting indexes on it? Yeah, I mean, we do it; we're not proud of it, but it happens. How many of you have lived to tell the tale? I mean, obviously, if you’re here, those of you who aren’t here obviously did not live to tell the tale. You are the fortunate ones.
00:03:30.000 So Basil, to his chagrin, is going to learn soon why you don’t do this. Because though it starts small, the first couple of minutes, it's a few scrolls—big deal! But by lunchtime, he's got more than 200 scrolls on his shelves. By the time the sun sets that night, he's got more than 700 scrolls. 700 scrolls! I mean, that's a lot of scrolls, let’s face it. And the only way to satisfy the requests that are coming in, because people have already heard about this world-renowned library, even in the first few minutes, they are coming in and getting in line saying, 'I want this scroll! I want Aristotle! I want Plato! I want this, I want that!'
00:04:23.000 And so Basil and Fabian both are starting to have to answer these queries, and the only way to do it is to go scroll by scroll. Is it this one? No. Is it this one? No. Not a good situation to be in because by the end of the next day, there are more than 1,500 scrolls. By the end of the week, we’re talking 5,000 or 6,000 scrolls on the shelves, and he has to scan the entire thing. Now this is exactly what happens in our databases when we are foolish enough to leave off an index. Our poor beleaguered database has to go row by row.
00:05:04.000 When we say, 'Give me any scroll by Aristotle,' that poor database has to go, 'Okay, is it this row? Is it this row? Is it this row? This row?' Which is not a big deal when you’re in development and you have 10 rows in your database. And when you first deploy your database to production before your users get their grubby little fingers all over your clean, pristine data, things run really fast too. But as soon as those users are in, man, things go south in a hurry, as you all I'm sure know. One day you’re sitting there doing just fine; you start getting alerts that your system is slowing down.
00:05:50.000 You look at it, you're like, your database is really working hard. What in the world is going on? And people start to panic, and the requests come in, and you’re working your brains out, and it’s just not enough. And pretty soon, the whole system goes down in a blaze of panic and flame. Right? Ugly situation! This was the case here; it was just too big of a job for Basil and Fabian to handle alone. They couldn't do it; they couldn't keep up! Patrons were queuing up in line, and they were starting to get restless. I know—I mean, you think librarians are cutthroat? They’re cutthroat for a reason! Their users are merciless.
00:06:27.000 These patrons are lining up, the line goes out the door, down the street, and over the next two sand dunes! And no one wants to wait in line that long, least of all the patrons. So they've had enough—they torched the library. Basil and Fabian barely escape with their lives. Now you have to give it to Basil; you can't keep that guy down, especially when a genie owes him two more wishes. So singed a little, fire-shy, he's like, 'Okay, what did we do wrong?' Well obviously, we forgot indexes, right? If we had indexes, it would have all been okay because we would have been able to look up stuff.
00:07:06.000 And so Basil, he’s taking no chances. He indexes all the things! Right? If no indexes are bad, all the indexes have got to be better. So I mean, he's got indexes on things like the ink color, and he's got indexes on the word count of the scrolls. He's taking no chances. He jumps into his new library! Now, how many of you remember card catalogs—the legacy of other days? I'm glad to see so many hands going up; it makes me feel less old. Basically, these card catalogs are these great big enormous cabinets with these little tiny drawers filled with thousands and thousands of cards.
00:07:36.000 You'd go to the author cabinet, find if you're looking for Aristotle, you’d pull open the 'A' drawer and you'd thumb through till you find Aristotle in alphabetical order in the cards. Then it would tell you all the different books that Aristotle had written that existed in your collection. Pretty straightforward, right? Same for title; if you wanted to find a book by title, you'd go to the title collection. For subject, you’d go to the subject collection. Really, not much to it—but this is exactly what an index is! It's a sorted list of data that points to where something exists. That's all an index is!
00:08:06.000 Basil's indexes were implemented as these card catalogs, and they were enormous because remember, he had to account for a new scroll coming in every minute and he was planning to be in it for the long run. Here’s an example of what Basil’s title index might look like: when a user came in, you know, they would say, 'I’m looking to kill Medusa. I need something that lets me kill Medusa in 21 steps.' So Basil says, 'Okay, I have that one right here,' and he pulls it off and the query is answered in seconds. Super easy!
00:08:37.920 In fact, Basil and Fabian together have lots of downtime in the mornings, especially when things are slow. Scrolls come in, and they have to fill out a few dozen slips to file in all the different cabinets. But when it’s slow, it’s not a big deal; they're able to answer these queries so fast. Basil's like, 'Yes, totally nailed it this time!' He has this spare wish in the background and he's like, 'I think about what to do with that later.' Sadly, load is not always constant. In the mornings, it’s really slow. When people come in for lunch, though, people want to check out a book and do some light reading during lunch.
00:09:06.000 After work, there's a huge spike because people come in by the droves trying to get something to read for the evening. And when you've got this huge rush that lasts for an hour, and you're working hard to answer those, even when it's a fast query, it still takes time. And people are lining up—they don’t have time to deal with these scrolls that are coming in; that’s lower priority. We're serving requests right now! And so these scrolls are piling up. At the end of an hour, there are 60 scrolls that have piled up, and each one has dozens of slips that need to be filled out for it.
00:09:43.440 So suddenly they finish that rush. Now they've got this huge backlog of work, and they’re working through it. Patrons are still coming in at a lower rate, but those patrons are now having to wait while things are indexed, and it's just not pretty. On top of that, patrons do ruin scrolls. I mean, sometimes they'll fall asleep in a carol and drool all over a scroll, and the ink runs, and Basil has to throw the scroll out. It’s really tragic, and he's upset about it. But that’s not even the worst part. What drives Fabian nuts is how do you know what indexes to use? Oh my gosh!
00:10:30.400 I mean something like, 'I need a book named X!' Well, okay, that’s easy. You go to the title index and you’ve solved it. But what if someone comes up and says, 'I'd like a book by a Greek author about linguistics published more than 200 years ago in blue ink on yellow parchment?' Fabian would just get the deer-in-the-headlights look; he's like, 'I have no idea where to even begin.' And Basil would say, 'It’s okay, you go handle the other ones; I’ll take care of this one.' He’s chugging through it, trying to find the indexes—that's a lot of indexes to choose from. You know, they did their best, things were backing up.
00:11:13.040 Basil did his best to work through the backlog. He summoned a small army of homunculi, these cute little things up to about your knee that would work like mad and could get a lot of stuff done. But holy cow, they have an appetite, and they were eating Basil out of house and home! They were breaking the bank. He finally had to dismiss them before they totally ruined everything, but that bought them a little bit of time. It was just a band-aid in the end; they did not have enough power to process everything. Patrons were lining up again, down the road, up the hill, over the dunes, and again, they torched the library!
00:11:55.040 Now Basil's getting a little despairing at this point; you can't blame him. Right? What am I doing wrong? Now Fabian—Fabian's not despairing so much; he’s used to Basil's escapades. This is par for the course. Pretty much being almost burned to death twice in a row is kind of a bummer, but he’s like, 'You know, whatever, Basil; you figure it out and let me know what we’re doing next.' Basil’s racking his brain. Okay, no indexes are bad. All the indexes are bad. What do we do? I mean, you think about it: if this is your database and you have indexes on everything, your database has to work really hard too!
00:12:45.680 Every time something comes in, your database has to go through and file something in every single one of those indexes that correspond to it. If there's a deletion, it has to go through and pull them out. If there’s an update, if the title of a scroll changes, your database has to go in and update every index that is affected by that. And every bit—I mean, work is a finite resource, right? We all know that. Every bit of work that your database is doing not serving queries is work that it can't spend serving queries. That sounds obvious, but it's an important point. The more work your database has to do that’s not related to serving information, the less work it can do serving information.
00:13:35.680 Basil learns this to his chagrin as he’s staring at the pile of ashes that was once his pride and joy. He realizes, 'Okay, I have this list—about four days' worth, you know, two days here, two days there, four days' worth of queries from previous patrons.' He can analyze to figure out, you know, maybe all the indexes were a bad idea. Let’s index just the things we need! So he’s like, 'Okay, I can do that by looking through this data.' But this is a small snapshot of data. Like he didn’t have the library open for very long before everything came crashing down.
00:14:21.760 How do you know the window, the snapshot you have of your queries is sufficient to say what indexes you really need? Who's to say that the next day a patron wouldn’t have come in and queried something totally new that wasn’t in there? I mean, obviously, you’ve got 'Let’s look at my title, let’s look up by author, let’s look up by nationality.' You know, these different things. But what if there’s something else? How do you know? This is the crux of the problem? How do you know what indexes your application needs? It’s not this magic wand kind of a problem.
00:15:01.760 Basil had to rack his brain, and it turns out that the key is a foreign one. No, that’s bad—foreign keys! That’s better! Foreign keys! What are foreign keys? Well, a foreign key is a column—a field in a database that refers to a field in another table. Excuse me, that refers to a record in another table. In this case, we have scrolls that have an author ID, and we have people that are referenced by this ID. So the scroll has an author that points to the other table; it’s like foreign, right? External; outside of that table.
00:15:29.760 Foreign keys are a great place to start with your indexes. Now this makes sense when you think of it in terms of Rails. Because with Rails, 'belongs to' and 'has many' correspond to these foreign keys. You put 'belongs to' on the child table, the one that has the foreign key in it, and you put 'has many' on the parent table, the one that is referenced by it. So a scroll belongs to an author, and a person has many scrolls. It’s an easy reciprocal relationship, and bingo! There’s a foreign key for you. You can just look at all of your 'belongs to' and all of your 'has many' queries and be able to infer a bunch of your foreign keys, and that is a great place to start for your indexes.
00:15:56.800 But should you index every foreign key? Well, it depends. Let’s look at this example where we have a person and a nation—two models, right? A person belongs to a nation, and a nation has many people. Easy enough! person.nation—that's not really using a foreign key, right? It's taking the value in the nation ID and then doing a lookup by ID in the nations table. Since we get primary key indexes for free, right? An index on the ID—that's not really helping us a whole lot. We’ve already got that one down.
00:16:26.320 But the other one, if you’re saying nation.people, 'Give me all of the people in Germany! Give me all of the people in China!' Yeah, you know that's going to go through and look at the nation ID column in the people table. But is that a useful query? What are you going to do with millions of rows from a single query? You're going to crash, is what you're going to do! That’s not a very helpful thing, so maybe you don’t need to index nation ID in that case. Maybe, okay, we'll see; we'll see in a bit. Now, maybe that's not the case, but consider the direction of your queries in order to determine which foreign keys you really need to index, because you might not need to index them.
00:17:05.760 Now here’s another one that Rails does for you: 'has many through' also has 'and belongs to many,' right? This is the type of query that gives you two models with a join table in the middle. It lets you do the many-to-many relationships where a subject has many scrolls, and a scroll has many subjects—all through that intermediate join table. Now, you look at this and you’re saying, 'Okay, so where is the foreign key in this one?' With 'belongs to' and 'has many,' the foreign key existed on the child table. But where's the child table in this case, right? It’s not scroll, and it’s not subject; it’s that join table in the middle!
00:17:44.640 It has both of the foreign keys on it—it has the scroll ID and it has the subject ID on it. Alright, so you index both? Again, that depends. Let’s take a look. A scroll has many classifications—that’s that central table that connects subjects to scrolls. A scroll has many classifications, and it has many subjects through that classifications table. If you’re going to say scroll.subjects, it’s going to go through that classifications table and find every classification where the scroll ID matches the ID of the scroll.
00:18:23.680 So that foreign key, that scroll ID, is the one you'd want to index for a query going in this direction. And going the other direction, it’s the other one, right? This time we have a scroll that has many classifications and a subject with many classifications and many scrolls through that classifications table. So if you’re going to find classifications by subject, now you’re looking at that subject ID column— that's the foreign key you're going to want to index that time.
00:18:50.720 Now in Basil's case, he's looking at it and he's like, 'Yeah, people are going to want to know what subject the scroll has, and people are going to want to know which scrolls belong to a particular subject.' So he’ll index both of them—totally reasonable, right? Aha! Fabian now thinks he’s caught his master out! This never happens, so Fabian’s pretty chuffed about it! He’s thinking, 'Okay, I finally have one-upped this guy.' He says, 'Master, you know, you say we need to cut down the number of indexes, and then you go ahead and create two indexes! Can’t you just do one with two columns? Isn’t that the same thing? Can't you just put an index on scroll ID and subject ID and get the same result?'
00:19:30.000 And here we have to disappoint Fabian, because no! That’s not the same thing at all! Remember, an index is a sorted list of data. When you have more than one column, that data is sorted first by the first column, then by the second column, and then the third and fourth and however many columns you have in your index. So in this case, if we had scroll ID and subject ID, you could totally look something up by scroll ID here, right? Because it’s sorted! You could go in and say, 'Okay, scroll number 1027—bam!' You can find that in a hurry and you can find exactly which subjects correspond to it.
00:20:19.840 But if some patron were to come along and say, 'Okay, I need everything corresponding to subject number seven,' that second column is not sorted by subject ID, and so it's not going to help you one bit! You’re going to have to resort to a full table scan to answer that without a specific column. So sorry, Fabian, you have to index both of them if you want the two-way query support! So Fabian’s like, 'Alright then! Well, what is the point of a multi-column index, right? If it’s not saving you anything, why use it?'
00:21:13.040 I’m so glad you asked, Fabian, because that’s the very next slide! Let’s say—remember I said that 'nation.people' was a bad idea because it could return millions and millions of rows? Well, you combine that index with another column and suddenly it becomes much more useful! The second column acts to refine the query. So you say, 'Nation first' is going to be sorted by nation; you say, 'Okay, I want to find all the Greek authors named Aristotle!' Great! The database says, 'Okay, here we go!’ Greece, Greece! Because there it is! Greece, Aristotle—every style comes back with however many happen to be in there. It’s a very fast query!
00:21:55.040 So that index that maybe wasn’t useful by itself now suddenly is useful when we combine it with another column. Does the order matter? Does it matter that we put nation first and full first name second? Not particularly, but if we were to swap those, we might be able to get two useful indexes for the price of one! Because if we swap it, now we can answer questions like, 'I need all of the people named Plato'—and that’s Plato the philosopher and not the toy, right? We can find all of them just like that regardless of where they’re from! Or we can say, 'I need all the people named Plato from Greece,' and we can answer that query too! So with a little bit of thought, you can engineer your indexes to satisfy multiple types of queries.
00:22:58.080 This kind of multi-column solution is especially useful when you’re dealing with data that is continuous as opposed to discrete. Discrete, like IDs—1, 2, 3, 4, 5—right? Versus continuous, like a time value. You’re not going to go in and say, 'Okay, if we had a published-at column on our scrolls, we’re not going to—you’re never going to have someone come in and say, 'I need every scroll that was published on the 23rd of April 38 BC at 9:42 AM.' You’re never going to have that because it’s too specific, and it’s not the kind of information people keep in their heads about the kind of scroll they’re interested in! However, you add author ID to that.
00:23:44.640 Sort things by author ID first and then publish that, and you can answer queries like, 'I need every scroll published by this author in order of publication date!' Your index will give that to you because your indexes are just sorted lists. In fact, when it uses an index like that many times, I think it will return the data in that order, whether or not you provide an ORDER BY, which is a nice interesting side effect. Another example of this: here we have a little query that’s going to say, 'Give me all the 10 most recently added scrolls!' The scrolls that were added most recently to the collection. Notice, there’s no criteria here. There’s no WHERE clause; we’re not saying, 'Give me all the scrolls where this is the case.' We’re just saying, 'Give me the 10 most recent scrolls!'
00:24:33.920 Now, without an index on 'added,' what is your database going to have to do? It’s going to have to—without that index, it’s going to have to basically reorder the entire table just so it can return the ten rows that correspond to the most recent. Not a happy thing to do when you’ve got a million rows in your database! But if you add an index on that, suddenly that query becomes trivial because all the database has to do— in this case, we’re sorting descending—so all the database has to do is go to the very end of the index and return the ten rows at the bottom, and you’ve got the ten most recently added scrolls, just like that! So please, don’t do like Basil did at first and forget your indexes! Because indexes on sorted columns are as important as pretty much anything, too.
00:25:29.200 Make sure that if you’re doing sorts in the database that you're indexing those columns! So Fabian’s like, 'Okay, one more question!' And Basil's like, 'Dang it! We’re so close! Stop interrupting me!' Fabian’s like, 'Well, how do you know that these indexes are actually going to be used? How can you know that the queries you’re going to be getting are actually going to be satisfied by these indexes?' That’s a head scratcher! Basil thinks about that for a minute, and he’s like, 'You know, let me check my spellbook.' And so he flips through it a little bit, looks at the index in the back, and he stumbles upon one—a special little incantation called EXPLAIN.
00:26:03.920 And EXPLAIN is an incantation everyone in this room can and should learn! It’s really not hard to master! Let’s take a look at how it works. Here we’ve got a query that says, 'Give me all of the scrolls by Greek authors!' Right? It joins on the author and the nation table and finds all the nations named Greece and then returns all the matching scrolls. Not too bad! Especially if you have an index, right? But now, let’s say we want to see what indexes—in fact, how the database, in general, is going to satisfy this query. You ready for this? Boom! You just put .explain on the end of it, and Rails will automatically run the explainer.
00:26:51.920 Now, a little side effect of this—Rails will actually execute the query first and then return the explain, which I don’t entirely understand, but I do understand that there’s a rationale behind it. If that’s a problem, like if your query is one that’s going to take like three hours to run, then you might want to go directly to a database console and key in the explain yourself. But this is a quick way to get familiar with how the explain works. If we were to run this explain without indexes, it would spit out something like this. Now, this is really arcane output—very typical of the kind of thing Basil breathed, and this is MySQL output. MySQL output is going to be different depending on the database you’re using, but let’s talk through this quickly.
00:27:39.840 On the left, well first you see there’s three rows in the output. I’ve kind of had to break the table in half because it was so long, so bear with me on that. The top is the left side of the table, and the bottom is the right side of the table. So the database is saying, 'First thing I’m going to do is I’m going to look at the scrolls table.' Under the type column, you see that word ALL in all caps. That is bad news! That is MySQL saying I have no other recourse but to scan the entire table row by row! A bunch of nulls after that—we’ll talk about those in a minute.
00:28:16.720 But it’s saying under the rows column on the bottom there, it’s saying, 'I expect we’re going to look at about 43,000 rows!' But that number could be much, much higher! If that’s the number of rows in your table, right? If you’ve got four million scrolls in your database, that’s going to mean that it’s going to say, 'Yes, today we’re going to look at about four million rows—not good!' So next, the database says, 'Okay, for each scroll that we see, we’re next going to look at the people table.' This time, I have a key to look at because the primary key index is given for you for free! So it says, 'Okay, I can use that one! I’m going to use the scrolls.author reference or foreign key and I estimate that there’s going to be one matching record, which you’d expect because each scroll has one author!
00:29:04.960 And then it says, 'From that, then we’re going to join on the nations table.' Again, we can use the primary key, and we’re going to look at the people.nation_id and compare that with the nation_id on the nations table. I estimate there's gonna be one row there. So for every single row in the scrolls column, the database is saying, 'Okay, get the person; get this, get the nation. Is the nation Greece? Nope! Throw it out! Let's do it again.' Scroll, people, nation, yep, match, keep it! Over and over again for the entire scrolls table! That is what kills performance!
00:29:56.320 So once we add indexes, though, suddenly the query plan changes! Notice it’s not even starting with scrolls anymore! Now the database is saying, 'Aha! If I start with the nations table, this is going to be a lot more efficient.' We’re going to—now it’s suddenly saying, 'Oh, there’s possible keys to use here!' I have a primary key and I have this nation’s name key, and I think we’ll go with the nation’s name key because that lets me use the name column.
00:30:24.320 It estimates that there will be one matching row, and it’s using WHERE and USING INDEX. USING INDEX is a lovely sound because that means that the database is going to be looking at the index to solve it, and it can actually use just the index! It doesn’t even have to go to the data on disk. USING WHERE says it’s actually comparing the information in the WHERE clause to the index in order to come up with it. So it finds every nation named Greece, and then for every nation it finds there, it’s going to use the people table.
00:30:57.920 And here, it’s using the people.nation index because now it’s going the other direction, right? It can’t use—it’s not using the primary keys now; it’s looking up people by nation ID. And again, it says, 'I estimate we're going to find about 1.' Sometimes it's just an estimate, too! USING WHERE, USING INDEX—and then finally scrolls, and we’ll look up scrolls by author ID, and the result comes out, hey presto! I’m thinking about 107 rows we’re going to have to look at to get this, and that is a whole lot better than 40,000.
00:31:33.680 So see how explain can help you answer these questions? Like 'Why is this query going so slow?' You can look at it and see, 'Oh heck! It’s doing a full table scan on this table that used to have just a few but now has thousands and thousands of rows!' It’s a useful little spell to know! So this is Basil's new schema; he’s ripped out a bunch—you know, dozens of indexes that he doesn’t need, significantly slimming down his schema and he’s ready to try again. He summons the genie, makes his wish and hey presto, there’s a brand new library!
00:32:23.680 He gets to work—scrolls start coming in, one a minute. He’s filling out just a few indexes now for each one, and he and Fabian are answering the queries. A few hours go by; they're biting their nails, but it seems to be working! They’re actually staying on top of it this time! The queries come in, they’re handling it just fine! And every once in a while, you get the weirdo that’s like, 'You know, I need every Egyptian scroll about geometry that was written between the fifth and eighth century BC, and it needs to be in blue ink on yellow parchment.' They’re like, 'You go wait over there; I don’t have time to deal with you right now!' Next!
00:32:57.920 And then they’ll keep dealing with it—another person then goes and waits over in the corner until they have time to deal with them, which is usually going to be after the library closes and the scrolls stop coming in, and things are calm and they can go in and deal with these one-off obscure queries. The thing is, there’s only going to be like one, two, maybe three of them each day, so it’s totally doable! Sadly, and who knew, but Alexandrians have this thing about libraries—on general principle, they torched it! Basil and Fabian, you know, Basil was pretty crushed. Fabian’s like, 'Whatever, I almost cut— I almost died again, but that’s par for the course!'
00:33:34.080 But Basil has learned that he can do this! He’s found a system that’s going to work and he’s going to go now someplace where they will appreciate his libraries and not burn them down on general principle! And he is going to be happy! So now it comes back to you: how are your databases looking? What’s the health of your database? How have you been treating your database? What indexes are you missing? What can you do to increase the speed with which you can respond to customer queries? Look at your foreign keys!
00:34:19.040 Look at the 'belongs to,' the 'has many.' How many through? Look at your sorting—see what columns you’re sorting on. Use EXPLAIN to figure out whether or not your queries are actually using indexes. It’s a lot better to discover that your queries aren’t using indexes when the load is still low, like when your queries still seem fast. That’s the time to be checking to see whether or not they’re using indexes, because you don’t want to wait until suddenly these queries are taking four or five seconds to do and the requests are backing up.
00:34:53.920 Because what do you do to add an index? You have to do a migration! And a migration to add an index does a full table scan, which is not going to help your situation one little bit! Take my advice—add the indexes as soon as you can! And if that means adding them at the moment that you create the migration for the table, please do; that is the way to go! So if you’ve enjoyed Basil and Fabian at all, last summer I wrote a little novella online at blog.jamisbuck.org. I encourage you to go read it; it has nothing to do with databases!
00:35:38.720 It’s about them discovering pathfinding algorithms—totally unrelated to this at all. But because I have the opportunity to do a shameless plug, I’m also writing a book about maze algorithms for programmers, Mazes for Programmers, and if that sounds interesting to you at all, you can go to pragueprog.com/book/jbmaze. And it's in beta! I think I encourage all of you to go out right now and buy it! So, thank you!
00:36:12.920 Thank you.
Explore all talks recorded at RailsConf 2015
+122