Ruby Video | A Hands-on Experience with Complex SQL

Keep Ruby Weird 2016

A Hands-on Experience with Complex SQL

Play on YouTube

Craig Kerstiens

@craigkerstiens

Keep Ruby Weird 2016

#database-design

#query-optimization

#data-processing

A Hands-on Experience with Complex SQL

Craig Kerstiens • November 04, 2016 • Earth • Talk

In this engaging talk titled A Hands-on Experience with Complex SQL, presented by Craig Kerstiens at the Keep Ruby Weird 2016 event, the speaker addresses the intricacies of SQL, particularly focusing on the PostgreSQL database. The session is aimed at demystifying SQL, showing that it can be enjoyable and efficient to work with, especially when utilizing PostgreSQL’s powerful features.

Key points covered during the presentation include:

- Introduction to Speaker’s Background: Craig shares insights into his professional history, highlighting his role at Sidekiq Cloud and his involvement with Postgres Weekly, offering resources for keeping up with PostgreSQL.

- PostgreSQL Features: The speaker outlines the advantages of PostgreSQL, listing features such as JSONB support, transactional DDL, and the Listen/Notify pub/sub system, emphasizing its capabilities beyond the standard relational database.

- Common Challenges with SQL: Kerstiens addresses the common frustrations encountered with SQL, particularly regarding the readability and complexity of queries, which can deter developers from enjoying its use.

- Live SQL Demonstration: Throughout the talk, Craig conducts live demonstrations of SQL queries, illustrating real-world application. He creates a daily billing report live, showcasing the processes of querying, using Common Table Expressions (CTEs), and applying window functions for growth measurements.

- SQL Best Practices: The speaker provides practical tips for writing better SQL, suggesting the setup of a personalized psqlrc, choosing organized formatting styles, and emphasizing the importance of readability in SQL queries.

- Conclusion and Takeaways: Craig concludes by reinforcing that well-organized SQL aids in debugging and optimizing processes, encouraging attendance to Postgres Weekly for ongoing learning.

The session underlines that while SQL can seem daunting, especially in live coding situations, proactive use of PostgreSQL’s features and adhering to best practices can significantly enhance the experience.

A Hands-on Experience with Complex SQL
Craig Kerstiens • November 04, 2016 • Earth • Talk

A Hands-on Experience with Complex SQL by Craig Kerstiens

Keep Ruby Weird 2016

00:00:08 I'm going to cheat a little bit. I'm actually more nervous for this talk than usual because I am going to be doing a lot of live SQL, and SQL is not fun or exciting, but we'll see how this goes. A quick background: I head up Sidekiq Cloud. Before that, I worked at Roku for five and a half years, working a lot with parents and teams. I curate Postgres Weekly, so if you like this talk at all, go sign up. It's a weekly newsletter about Postgres trying to make SQL not so awful.

00:00:39 A little bit of audience participation here: Does anyone here not use Postgres? A few hands. I'm sorry; this may not be interesting to you at all, but by the end of it, you should realize why SQL's not so bad and why you should be using Postgres. This is a slide I show at pretty much every conference I talk about Postgres: it's a great database, really powerful, and it does a ton of things. If you're not sure what all these things are, there’s JSONB, a binary JSON directly in your database. There's transactional DDL, so if you're running a migration and something fails, you can roll it back. Listen/Notify is pub/sub directly in your database. You've basically got anything you could want; it's less of a relational database and more of a platform.

00:01:31 This is a really fun email from the Postgres archives from about ten years ago. If you're not familiar with Tom Lane, he's written a ton of code that you've leveraged. He co-authored the spec for JPEG, wrote the spec for PNG, and authored LibJPEG and LibPNG. For the last ten to twenty years, he's been a major committer to Postgres—probably 50% of every Postgres release in a way. But in short, the naming of PostgreSQL was the worst decision I ever made. This was ten years ago, and they still haven't changed it since. Postgres is a perfectly fine synonym.

00:02:18 All right, one more kind of fun point: This is from a colleague. I think this sums it up pretty well: it's more of a platform than anything else. How many people like writing SQL? That’s a few hands. How many people like reading other people's SQL? There's a reason for this because most of the time when people write SQL, it comes out looking like this. It doesn't have to; we'll get to that shortly.

00:03:00 Thanks to Kylie's talk, I had to interject this, so let me go ahead and execute that query. I don't know if you're going to be able to see it at all.

00:03:08 You can see the results directly from this query; we get a Mandelbrot generated from that one query. I'm not going to talk about how to write that query at all today. But I will now start to dig into a little bit of live SQL.

00:03:47 First of all, I'm curious how many of you have a bash profile set up? Most people with Vim, etc., how many of you have a psqlrc set up? A few hands there. It's super handy; you can do all sorts of things in there to trick out your psql, which is the editor that comes with Postgres.

00:04:06 Let me show you mine. There are a lot of things in here; one really nice feature is that I record my history file for every single database. I can actually go and look at which databases I’ve connected to and catch all the queries I've ever run for my Keeper Beware database. So if you’ve ever written a query, came back two months later, and completely forgotten what it was, it's there; I haven't lost it.

00:04:35 Backslash timing is a really nice feature to have on. This is going to show how long it took me to run every single query, actually making notes look somewhat distinguishable. So now I'm on my database. In this context, I have a really simple application to assume this is a database as a service where people come in and provision things and deprovision things. The query we’re going to try to create here live will show us our daily run rate and how that's changing over time.

00:05:18 So this is something pretty simple that a lot of businesses would care about: basically, if nothing changes today, on Monday, this is how many users have provisioned a database, how much I would bill on a monthly basis, and then how does that change tomorrow based on what people provision.

00:05:31 I can do backslash T and describe my tables, so this is going to show me their structure. So now I’m going to query. This will show me one record. I have the ID of this provision request—just a standard kind of ID of the billings—I’ve got a formation ID which references something else in my system, my organization, which refers to my user. Then there are a couple of interesting fields here; I have this period which uses a timestamp range. Postgres has a range type with values of 'from' and 'to' in that column.

00:06:10 This is actually a data type known as a range type, so I don't have to do 'from' and 'to' in two different fields. I also have the price for a month. I mentioned my psqlrc; one helpful feature is I have backslash auto on. If you notice, it formatted my output like this. If I grab two of these, it's going to format them nicely on my screen.

00:06:32 If I grab maybe just the period and my price per month, it knows my screen width and is going to format this for me. So it’s pretty handy in terms of interacting with it without having to do anything extra. Let’s see if we can actually do this: the first thing I'm going to do is look at all of our billings. We can see I started billing at the first of the year, and I’m first going to go ahead and generate some time, starting from the first of the year up to now with a one-day interval.

00:07:10 This will give me an output that I can project into an Excel file. I was going to have my date for every single day of the year; I’ll shorten this to make it easier to iterate with.

00:07:48 Now let’s start doing a little bit more here. The first thing I'm going to do is backslash e. Backslash e is going to pop open my default editor; if you set your environment variable with your editor, this could be Sublime Text, Emacs, VI, etc. This will be really handy for me to work in an environment that’s not just a CLI.

00:08:39 I will set up my dates; this is a Common Table Expression (CTE). CTEs are extremely handy and going to be key to making SQL look like something that people can follow. This will create a temporary table view during this query. From here, I can come back down and say select star from dates.

00:09:12 This will name the column, and we will do the exact same thing. Nothing special there, but now I can start to chain these together. First, I want to look on the billing side to see what my period and price per month data looks like.

00:09:41 Let’s grab my billings, getting the period and price for a month. Now, I'm actually going to try to do some logic here and say where 'period' is less than the current date, ensuring the query only returns relevant billing records.

00:10:32 To extract this, I want to make sure I check that the upper date is greater than the current date or that it is null—if I scroll back up, you'll see that an example one was provisioned on January 1st and deprovisioned on May 3rd. This way, I ensure the upper bound meets our criteria for the current date.

00:11:32 So basically, it's going to fall inside this date range, and I'm using a cross join here due to the lack of a direct join on the date, which means it’ll match every record to the correct date.

00:12:13 From here, I can get the correct table now, but we still see that the result isn't complete. I need to refine the query to get accurate outputs.

00:12:50 Now, I want to sum this up to see if we can get a daily rate. I will sum the price per month and group by the relevant periods.

00:13:04 Let's check if we’ve got a daily summary. Perfect!

00:13:32 So this is starting to look almost like a dashboard. What I want to try to do, if it works, is a window function. It's great to have my data for billing each day, but now I’ll also want to measure growth—month-over-month and week-over-week growth would be helpful.

00:14:59 This could be extracted into Ruby or calculated iteratively in Excel, but why not just do it in SQL? Let me grab the date and name this as my total for the day, and then I will be grouping by window.

00:15:19 The window function gives you the ability to iterate over the results. For example, I could rank everything by department and see where individuals fall based on their salaries. In this case, my aim is to measure comparison, like comparing January 7th to January 1st.

00:16:04 I’m going to apply a lag function which is built into window functions. This basically lets me compare today’s total to yesterday’s total; I could also add another layer for the previous week’s data.

00:17:03 As we check our data, remember we don’t have enough entries for this; I might need to adjust what I’m analyzing. Now I can create a comparison that allows month-over-month growth indications to be calculated seamlessly.

00:17:58 So just with what we’ve created, I’m calculating the growth and will showcase the results from last week as well.

00:18:28 With SQL, we’re creating outputs that often replace what could take a lot of ETL work. Here I have my run rate on a daily basis and my yesterday's total—we can see the metric with calculated growth.

00:19:34 Hopefully, that wasn't too painful to watch with live coding. If I summarize what we built here, it should be clear to anyone coming in that it’s straightforward to read. Clean, organized SQL helps with debugging, reasoning about data, and optimizing processes.

00:20:11 A few tips for writing better SQL: Set up your psqlrc; it’s really handy. Use a defined style, align your columns, and make sure common table expressions are structured for readability. Hard-core Postgres experts will point out they are an optimization boundary, but on a report like this, with performance measured in milliseconds, it’s not a big concern. Lastly, subscribe to Postgres Weekly for more tips like these!

Craig Kerstiens

@craigkerstiens

explore all talks recorded at Keep Ruby Weird 2016

Explore all talks recorded at Keep Ruby Weird 2016

Keep Ruby Weird 2016