RailsConf 2016
Level-up Your Active Record Skills: Learn SQL!

Level-up Your Active Record Skills: Learn SQL!

by Ryan Dlugosz

In the video "Level-up Your Active Record Skills: Learn SQL!" by Ryan Dlugosz, presented at RailsConf 2016, the speaker emphasizes the importance of SQL for Junior Rails Developers. He highlights that while Active Record is a powerful ORM (Object-Relational Mapping) tool, understanding SQL can greatly enhance a developer's ability to build efficient applications and answer complex business queries. The talk aims to provide an overview of basic-to-intermediate SQL techniques applicable in real-world scenarios.

Key points discussed include:

- Introduction to SQL: The speaker provides insights into the role of SQL in application development, highlighting its omnipresence in databases and its advantages over relying solely on Active Record.

- Transition from Traditional Coding to ORM: Dlugosz illustrates the evolution from embedding SQL queries in applications to using ORM tools like Active Record, noting the messiness and complexity this transition resolves.

- Real-world Example: He shares an anecdote about a junior developer trying to create a dashboard, demonstrating how SQL can simplify querying large datasets compared to Ruby code.
- Importance of Learning SQL: Dlugosz stresses that knowledge of SQL is essential as ORMs can vary, but SQL remains constant across different platforms. He likens SQL to the native language of databases, advocating for a deep understanding of it to develop as a programmer.

- Complex Queries and Joins: The speaker introduces various SQL queries, including counting records and performing aggregates with GROUP BY, as well as using joins to link related data across tables.

- Performance Optimization: Dlugosz explains tools like EXPLAIN, which help diagnose performance issues by providing insights into how the database processes queries.

- Cautions for Direct SQL Usage: The video concludes with a warning about the risks of executing raw SQL within Rails applications, emphasizing the need for input sanitization to mitigate vulnerabilities.

Important conclusions and takeaways from the presentation include:

- Learning SQL enhances a developer's skills and understanding of their applications.

- SQL makes it easier to perform complex queries efficiently compared to trying to manipulate large datasets in application code.

- Active Record should be complemented by a solid knowledge of SQL, allowing developers to know when to use raw SQL for specific requirements.

- Continuous learning and practical application, such as pairing with more experienced developers, are key to mastering these skills.

00:00:09.570 Good afternoon, everybody! How was lunch? Good? Good.
00:00:16.890 Awesome! This is the Junior Developer track, and I'm curious about everyone's backgrounds.
00:00:22.990 How many of you came to Rails from some kind of boot camp? Okay, how long was that boot camp? Eight weeks? Six weeks? Twelve weeks? Six months? Seven months? That's a long time, that's basically a degree.
00:00:34.870 How many people have a traditional computer science degree? Okay, cool.
00:00:49.390 And how many of you are learning by doing things, like watching internet videos and so forth? Well, just so you know, that never changes. I've been developing for a long time, and internet videos and just trying things out is pretty much how you have to roll.
00:00:57.309 Behind almost every Rails app, and almost every app in general, is a powerful language called SQL. I got the idea for this talk when I was working with a junior developer who was trying to build a dashboard page to provide some stats on user activity.
00:01:09.640 He was using the tools that are built into Ruby, because that was what he was familiar with. The problem with that is that when you put that code in production, all of a sudden instead of downloading just a couple dozen rows you have in your development system, you end up asking Ruby to pull down millions of rows for calculations.
00:01:25.210 This is a situation where clearly the database is the right place to be doing the heavy lifting. When we started pairing on this task, he explained the problem and I immediately jumped into a SQL prompt and started writing a SQL query that gave us the answers we were looking for for the dashboard.
00:01:39.159 The thing he said to me was, 'Wait, how did you do that?' This is a key moment for me because I've found that any time I say, 'Wait, how did you do that?' I'm about to learn something. It became a learning moment for both of us. I realized not everyone has the same SQL background, and he learned a lot about the tools available behind the scenes.
00:01:58.659 We spent the whole hour writing SQL queries, and I think we both came away learning a few things. SQL, to put it another way, is the hero that Rails deserves.
00:02:11.110 Let's address the pronunciation. Is it SQL or is it Sequel? I go back and forth between both; it doesn't really matter.
00:02:15.700 SQL is officially the correct term, but if you say Sequel and someone corrects you, that's totally fine. It used to be called Sequel and changed to SQL due to trademark disputes.
00:02:28.489 So, today we're going to talk a bit about the backstory of SQL, review some great examples, and discuss where to go from here.
00:02:42.560 A little about me – my name is Ryan DeLucas. I didn't sneeze through my name; it's actually pronounced that way. Anyone else here Polish? All right, we’ll find some barbecue pierogi after this.
00:02:56.000 I've been programming for about twenty years, with fifteen of those being professional work. I live in Cincinnati, Ohio with my family.
00:03:05.290 You can find me on Twitter, I'm @Lebowski. So, I don't really need to tell you my favorite movie.
00:03:15.420 I'm a consultant, which pretty much means I say yes to things before I actually know how to do them. Learning as you go is a skill that you'll develop as a developer.
00:03:24.430 Recently, I have started working as an instructor for Girl Development, which is an organization that helps adult women launch into technology careers. If that's something that interests you, talk to me afterward.
00:03:36.220 Now, let’s discuss the world before ORMs.
00:03:47.920 ORM stands for Object-Relational Mapping, which is an all-encompassing term for things like Active Record.
00:03:54.940 In the late '90s, we were building web applications, but we weren't using tools like Active Record.
00:04:01.000 Instead, we were putting the SQL directly in line with our code, using programming languages like Perl, ColdFusion, and PHP. PHP is still in use today, and more recently, the structure of PHP apps is much better than it used to be.
00:04:15.260 We would embed these SQL strings in the code and it got messy, complicated, and frustrating. Let me give you an example.
00:04:31.080 Say we had an application and wanted to display a list of all the people in our system. We could write a very basic SQL query to select the name and some parameters from our people table.
00:04:42.210 That's great; we have a query, but now we must execute it against our database and get back a bunch of data collected into an array of arrays.
00:04:54.510 We have all this data, but now to use it in our application in an object-oriented way, we need to create those people objects and collect them into an array.
00:05:07.439 We'll iterate through each row of the results to create a new person object and manually assign the data elements into the corresponding attributes of our objects.
00:05:27.400 Additionally, we must consider things like translating values, such as how in the database, it's often stored as 0 or 1 but in our object, we require boolean types.
00:05:38.700 This process of creating objects gets repetitive, monotonous, and error-prone. The code also becomes fragile.
00:05:52.560 If you change something in the query structure, it's easy to break that mapping. As developers, we often try to avoid repetitive tasks by looking for ways to automate.
00:06:02.060 I must admit something hilariously naive from my first paid application, back in the late '90s, it was a PHP app, a standard CMS-type of application for a small business.
00:06:15.800 Many of the forms looked pretty basic; we collect a few fields and submit them. But take a look at the top line where it submits to something called save_data.php.
00:06:28.000 My workaround was to pass a hidden field called table_name, and on the backend I would take that and dynamically build SQL queries to insert, update, or delete data from the database.
00:06:44.900 This way of coding is incredibly unsafe. If you don't understand why this approach is a bad idea, I suggest you attend some application security courses this week; you'll find them very insightful.
00:06:56.760 My solution went something like, 'Please don’t hack me on submit'. Although this was behind an admin interface, it’s still not the way to code.
00:07:09.170 Fast-forward to the mid-2000s, and ORMs were starting to gain popularity. In the Java space, we had Hibernate and JPA, while in Rails, we have Active Record.
00:07:25.080 With Active Record, we can take a giant mess of code and replace it with a simple one-liner.
00:07:37.020 If Active Record can do all that, why should I even bother learning SQL? One reason is because SQL is everywhere.
00:07:49.580 SQLite, which ships with Rails for development, is among the top five most popular pieces of software in the world.
00:08:00.340 SQLite is installed on every Android phone, every iPhone, every Mac, every Windows 10 PC, across all Linux distributions, and in many cars, airplanes—it's literally everywhere.
00:08:11.740 But even with that, why learn SQL if you can interact with it through Active Record and ORM tools? The truth is, ORMs come and go.
00:08:26.260 I've used three or four different ORMs, each with different APIs that change every time. Once you finally learn how to do something beyond the basics, you're looking everything up in documentation.
00:08:39.750 Three or four years from now, you might be using a different language and ORM, and you won’t be able to apply what you've learned about Active Record because the structures change.
00:08:51.570 However, the one constant language is SQL; it will always be the same.
00:08:59.100 So, apps, frameworks, and databases may come and go, but SQL will endure.
00:09:10.780 If you think this narrative is cheesy, you can thank my friend Chris Nelson from the Cincinnati user group for it.
00:09:17.470 The main point is that you want to learn the native language of whatever you are working on.
00:09:29.700 Richard Hipp, the creator of SQLite, mentioned in a podcast that being a tourist can get you by with pointers and a few keywords, but to succeed, you must learn the language.
00:09:38.750 By learning SQL, the native language of databases, you'll transition from a junior programmer to a seasoned pro who understands the right tool for the job.
00:09:56.170 Today, we'll look at some examples to demonstrate this.
00:10:04.310 To do this, we need to create a basic schema for authors and books using Active Record, as it's great for this kind of thing.
00:10:22.730 In our schema, an author has many books, while a book belongs to an author.
00:10:30.160 We will start by defining fields for authors, such as name, country, email, and birthdate.
00:10:40.070 For books, we’ll have fields like titles, pages, publication date, topics, ISBN number, and most importantly, an author ID.
00:10:47.340 This helps establish the relationships in the database where books are tied back to their corresponding authors.
00:10:56.140 Let’s assume you’re on a new project, or perhaps it's an existing one, and your boss or client wants information about the data you have.
00:11:04.480 For instance, they might want to know how many books you have. The SQL for this is pretty straightforward, and similarly, here's the Active Record call to achieve it.
00:11:25.360 To count how many authors we have, the SQL statement would just be a simple select to count the IDs from the authors table.
00:11:31.660 In Active Record, you would just call Author.count.
00:11:41.650 Now, let's see how we would handle this in a database terminal. It depends on your database, but if you're using Postgres, which I suggest, you can launch it against your database.
00:11:54.150 You will find yourself in another prompt where you can type commands to interact directly with the database. You would input your query and receive the results.
00:12:03.270 For example, we could discover that we have about four thousand books. Similarly, we can check the authors' table query and find that we have around five hundred authors.
00:12:11.030 That’s all very basic, but let’s dive deeper. We could ask how many books we have in each topic, which adds an interesting layer.
00:12:19.760 To do this, we would use a tool we haven't yet discussed called GROUP BY.
00:12:26.080 We would select our topic from the books table, count the occurrences, and group them by that topic.
00:12:38.160 This gives us a set of results with one line for each topic, along with the count of books associated with each topic.
00:12:48.630 Does that make sense to everyone? You can also execute this in Active Record. However, both are straightforward queries.
00:13:00.510 As you get more complicated in your queries, like the one we're about to create, you may find that writing the Active Record code takes as long as writing it directly in SQL.
00:13:14.060 Let's explore a more complex query—we want to find out how many books we have in each topic while also finding the minimum, maximum, and average page count.
00:13:23.410 To achieve this, we would start our SQL query like this: select topic, count(topic), minimum pages, maximum pages, average pages.
00:13:34.060 We will call minimum as a function the database provides, along with maximum and average calculations.”
00:13:43.930 After aggregating, we will group by topic. When viewing the results, we will have our topics listed along with their respective counts and our calculated page statistics.
00:13:56.360 While you may not explicitly calculate pages count, you might track user activity behind the scenes, requiring you to analyze metrics scattered across various tables.
00:14:05.660 Allowing the database to do the heavy lifting is far more efficient than trying to pull extensive data sets into your Rails application.
00:14:16.020 Now, let's discuss how to join data together.
00:14:29.390 How can we answer the question, 'Which authors have written five or more books on a given topic?' This question is relevant for outreach opportunities.
00:14:41.180 A join combines two tables based on criteria; in this case, an author's ID and a book’s author ID.
00:14:53.700 For our SQL, we'd start with selecting the author's name and the count of their books.
00:15:05.950 We'll select from the authors' table and join it with the books table linked by our criteria, grouping by both the author's name and the topic.
00:15:17.750 Next, we will want only those authors who have written more than four books, so we'll use a HAVING clause to limit our result set.
00:15:29.020 This gives us the data needed to see which of our 500 authors have actually written five or more books.
00:15:36.530 While reviewing the results, we might notice that topics like Ruby and Rails didn’t show up, but that’s just the randomness of our test data.
00:15:46.600 You can build upon these queries and nest them as needed. If you had sales data, for example, you could refine your search to include how many authors sold over a thousand copies.
00:15:54.890 Joining data is essential. We just covered a basic join, but there are multiple types of joins to explore.
00:16:09.570 Suffice it to say, there are ways to retrieve unmatched data alongside matched data.
00:16:19.470 Let's discuss a tool called EXPLAIN, which shows how the database is working behind the scenes.
00:16:36.300 This is useful for diagnosing performance issues. When you run into a situation where your local application runs well but the production version does not, this tool can help.
00:16:47.350 You can easily run EXPLAIN by prefixing any query. It will analyze the execution and tell you how it plans to find the information.
00:17:03.000 You can also use EXPLAIN ANALYZE, which runs the query and provides actual execution details.
00:17:12.020 EXPLAIN outputs can look dense and complicated, but a key takeaway is that if you see a 'sequential scan', that's usually a red flag.
00:17:23.050 A sequential scan means the database has to look through every single row, which is very slow. You can add an index to your database to help with this.
00:17:34.890 There is definitely an art to creating effective indexes, but the first step is running EXPLAIN on any slow queries or even just as a habit.
00:17:43.700 You can also check your development logs used by Rails; they log actual SQL queries, letting you investigate further.
00:17:50.720 By understanding the database's workings, you can learn to optimize.
00:18:01.840 Now, we have to discuss direct SQL calls from within a Rails application.
00:18:10.570 I want to emphasize that executing raw SQL directly can be dangerous; there are many edge cases.
00:18:21.440 You might use it in migration when Rails does not know how to create certain column types, such as geography type columns.
00:18:36.950 In this case, you would acquire a connection from the Active Record connection base to execute your SQL.
00:18:45.300 Another method can be using find_by_sql, but you must be cautious not to include unsanitized user input.
00:18:57.630 Using find_by_sql would return instantiated objects of the model type you've invoked.
00:19:04.480 For any data not specified in your parameters within your SQL, they return as virtual attributes on your model.
00:19:14.550 If you don’t know how to sanitize inputs, be careful: remember Bobby Tables?
00:19:27.050 You never know when little Bobby tables will show up in your application, so always sanitize your user inputs.
00:19:41.010 So, where do you go from here? There is obviously a lot to learn about SQL.
00:19:52.510 If you don’t have a book on databases and SQL at your desk, pick one up. SQL doesn’t change much, so a good book will serve you well.
00:20:02.610 Check the documentation for your specific database to learn about features and aggregation functions.
00:20:15.390 Pair programming is the best way for a junior developer to level up quickly; working alongside senior developers exposes you to real-world problems.
00:20:31.260 If you don’t have an opportunity to pair in person, there are many online platforms to connect with others.
00:20:41.800 Remember to scour your logs for database queries, understand them, and run EXPLAIN to get insight into your application's performance.
00:20:55.370 The takeaway is to learn the native language of your database.
00:21:02.460 Understanding what Active Record does for you behind the scenes is crucial.
00:21:10.420 Use Active Record as a great tool, but know when to jump in and use the database directly.
00:21:18.190 You have many tools at your disposal; knowing when and how to use them is key.
00:21:29.380 Thank you very much! If you have questions, I’d be happy to take them; we have a couple of minutes.
00:21:57.170 Yes?
00:22:02.460 My recommendation is don't execute raw SQL from within your Rails app without knowing what you're doing.
00:22:15.160 There are a few different ways to do it. You can get a connection from Active Record’s connection base and execute raw SQL.
00:22:22.950 This is useful inside migrations, especially when you need to create specific types of columns.
00:22:31.850 Using find_by_sql is another way to incorporate raw SQL, but be careful to avoid user data.
00:22:41.950 Using find_by_sql allows for returning instantiated objects.
00:22:48.600 These objects will have virtual attributes based on what you specify in your SQL.
00:22:56.370 Are there any other questions?
00:23:04.880 I don’t have a specific example for when table rewrites are necessary.
00:23:12.400 With any migration, you should be cautious to ensure your application remains stable.
00:23:20.670 For example, in Postgres, creating indexes concurrently is one way since Rails doesn’t usually support this.
00:23:31.260 You can build your migration by hand that utilizes this feature.
00:23:36.350 Does anyone have any additional questions?
00:23:45.550 It's not necessarily faster to write SQL directly; Active Record usually does the right thing.
00:23:55.450 You can observe this when looking at how your application operates.
00:24:06.250 Active Record translates your commands directly into SQL queries executed against the database.
00:24:15.730 In some scenarios, stepping in with raw SQL will provide better efficiency.
00:24:24.790 The key is to observe what Active Record does and learn from it. Thank you all!