00:00:08.519
Good morning! I'm Barrett Clark. I've been working with Ruby for about a decade now, and as you can tell, I'm from the United States. I currently work at The Container Store, a retailer in North America that specializes in organization and storage solutions. We sell boxes and help our customers organize their closets. We've also started playing with Docker containers.
00:00:24.519
Today, I'm going to share a dozen tips and tricks to help you get the most out of your database and your application. Some of these tips are supported by ActiveRecord, while others fall outside of what ActiveRecord provides. We're going to cover topics like awesome data types, proactive and reactive performance testing, time-series data, backup and restore strategies, PostGIS, scripting with Postgres, and more!
00:00:50.280
So, let's get started. For the first trick, we're going to explore how Postgres executes a query. We have a sample query here that filters by a couple of fields. The '10K' table is commonly used by Postgres developers for developing their features and running benchmarks. It contains 10,000 records with a variety of fields.
00:01:08.680
Here's one of the records, showing unique values in fields like 'unique_one' and 'unique_two.' With 10,000 records, there are 10,000 unique values and 100 possible field values, ranging from 0 to 99. To get a query plan in Postgres, you simply prepend 'EXPLAIN ANALYZE' to your query. When you execute it, you receive output instead of the actual query results, which provides insights about the query being analyzed.
00:01:43.720
The default output format is text, but you can also get it in JSON or XML if needed. Looking at the output, we can see that a sequence scan will occur on the '10K' table, as no indexes exist. For small tables, a table scan may be acceptable, but for larger tables with frequent queries, we should consider adding indexes.
00:02:20.879
It’s a good rule of thumb to index foreign keys, as they are often used for lookups and filtering. After adding some indexes and rerunning the query, we notice that we have transitioned to a bitmap heap scan and a bitmap index scan. Essentially, a heap scan is a smarter table scan that incorporates index consideration, thus increasing efficiency.
00:02:51.920
While we can improve execution times with indexes, keep in mind that query plans can get lengthy. For instance, I analyzed a query related to flight data that took about 50 seconds to run, which is much longer than my target of 500 milliseconds. I could visualize the query plan using tools like visual query plan tools. Analyzing the output helps identify hotspots, such as in this case where filtering by 'United Airlines' appears problematic.
00:03:44.360
From here, I could either explore if the field is indexed or make further considerations like partitioning the data or creating a materialized view. There are several ways to visualize query plans, but keep in mind that different environments yield different results. Factors like data, hardware, and configuration all affect how the query engine executes a query.
00:04:53.760
Now, sometimes we need to observe the database’s performance in real time. This is where PG Stat Statements come into play. By enabling an extension with the syntax 'CREATE EXTENSION,' we can capture statistics on executed queries. This assumes that the extension is already installed and permutations may be necessary depending on the operating system.
00:05:52.879
Once activated, this view logs statistics such as total minutes executed, the number of executions, and average execution time. However, be cautious, as it can introduce performance overhead. It’s not advisable to keep it active in a production environment all the time. Instead, enable it when needed, keeping in mind that it stores the last 5,000 statistics.
00:06:42.640
Let’s dive a bit deeper into how we can access the database using Ruby. You don’t necessarily need to use ActiveRecord or Rails for database access. For example, here's some plain Ruby code that runs the PG Stat Statements query. It involves loading a configuration file and specifying how to connect to the database.
00:07:03.760
You can expect a return of raw data as a PG result array. Remember, ActiveRecord simplifies working with PG result objects, but plain Ruby allows for direct interaction. This example also illustrates how to use Bundler in a Ruby script by including the required gems in a Gemfile.
00:07:51.920
Now, let’s shift our focus onto some handy data types that can enhance your application’s functionality. You can leverage non-relational structures within a relational database. Let’s consider how you manage data from APIs—many create local models mirroring those external resources.
00:08:30.600
It’s often useful to store the entire payload to ensure that no vital information is lost. Storing data in formats like JSON or JSON binary is beneficial for flexibility and indexing. JSONB, in particular, allows more efficient indexing and retrieval of specific elements within the payload.
00:09:11.120
Using PostgreSQL's jsonb features allows querying at more granular levels, including indexing JSON fields to improve performance. Additionally, PostgreSQL offers range data types, like date range and timestamp range, which consolidate start and end dates into a single field, thus optimizing data management.
00:10:00.000
By utilizing the range types, we can efficiently perform calculations like determining flight durations and enhancing the clarity of our data insights. For instance, I can compare flight durations with actual elapsed time recorded in the data.
00:10:39.920
The way these range data types operate might seem unusual, but the special operators available can facilitate quick queries. For example, we can check all flights in the air at a specific time using these range types, significantly simplifying data analysis.
00:11:25.920
Another valuable data type in postgresql is UUID, which supports storing identifiers that are unique across databases. To utilize UUIDs in Rails, it’s critical to enable the corresponding extension and configure migrations accordingly. This gives the added benefit of ensuring data integrity by preventing invalid values from being stored.
00:12:39.920
Bear in mind, while UUIDs prevent overflow in primary keys, they can also complicate queries due to their non-sequential nature. Adjusting ActiveRecord's default sorting to utilize fields like 'created_at' helps overcome issues related to the ordering of UUIDs.
00:13:28.240
Next up, let’s explore creating your own data, which is a pretty exciting trick. You can generate a series of timestamps or numbers to fill in gaps in your time series data, which can be particularly helpful when plotting data points.
00:14:19.920
Using a Common Table Expression (CTE) allows for composing complex queries by using simpler subqueries. For instance, generating timestamps for every hour in 1999 can be combined with data about flight statuses to produce a comprehensive view of when a given airplane was in the air.
00:15:13.520
By joining the generated timestamps with flight data, we can visualize flight activity alongside hourly timelines, even reflecting periods when no flights were logged. This kind of operation can be particularly useful for creating insightful visualizations.
00:16:08.240
Furthermore, using functions like COALESCE enables us to handle missing data. Imagine counting occurrences in a column where we only have even numbers. By generating a set of numbers while filtering out the even integers, we can cleverly use COALESCE to fill in gaps with zeroes where no data exists.
00:17:07.040
Now, let’s talk about database views, which are stored queries that ActiveRecord can interpret as tables. Views can be useful when you have frequent queries that consist of complex joins or subqueries.
00:18:01.160
Introducing views into your database can streamline your data retrieval process and provide a clean way to handle complex queries without cluttering your application’s general logic.
00:18:54.960
Using tools like the Scenic gem, you can efficiently manage views in Rails. Scenic allows you to create models and migrations for views and materialized views, promoting a cleaner separation of concerns.
00:19:50.480
This not only simplifies future edits to the underlying SQL but also enhances maintainability as you version those SQL scripts separately from your Rails application’s codebase.
00:20:31.760
Additionally, we can create functions in PostgreSQL, making it easy to encapsulate reusable logic for complex queries. This might include calculations such as mortgage payments that would benefit from being defined and maintained directly in the database.
00:21:19.920
PostgreSQL allows for functional programming with SQL, letting you write functions to shrink complex calculations into manageable, straightforward queries. Migrations in Rails can help keep those functions in sync as you develop your application.
00:22:53.920
Median calculations, despite being absent in PostgreSQL by default, can be done through custom functions written in SQL. It's essential to access such resources when performing statistical analysis.
00:23:40.080
To expand your database's capabilities utilizing schemas effectively means you can organize database objects by grouping related components logically and safely within your database.
00:24:32.960
Utilizing schemas to manage ETL processes allows for cleaner data transformation as well, making clear distinctions between original data and processed data.
00:25:28.160
Backup and restore strategies can also benefit from schemas, providing a simpler way to handle merges or report generation without affecting your main operations.
00:26:25.760
Extensions, such as PostGIS, further enhance PostgreSQL functionality for geospatial data analysis. This extension allows you to leverage GIS capabilities directly within your database.
00:27:07.920
By using the active record PostGIS adapter, developers can easily integrate geospatial queries into their Rails applications, allowing for powerful location-based data manipulation.
00:28:08.160
In working with geospatial data, consider using functions to calculate distances correctly. For instance, comparing airports involves understanding latitude and longitude appropriately to avoid confusion in data.
00:29:16.560
The calculations and queries allow for engaging, interactive locations to be incorporated in your application, whether for travel planning, shipping logistics, or other geographical analyses.
00:30:04.920
Finally, explore using foreign data wrappers, allowing access to external databases within your PostgreSQL instance. This feature enables easier integration of disparate data sources.
00:31:02.080
While useful, coupling different database systems can come with challenges regarding compatibility and management, so it's typically advisable to keep your Rails application managing those interactions.
00:31:46.960
As a bonus tip, don’t forget you can personalize your psql command line experience with a configuration file. It allows customization of the command line structure and visuals for an improved user interface while working.
00:32:31.960
Finally, I've authored a book on data visualization, which includes sections covering maps and geospatial analytics. If you’re interested, I have some copies to give away.
00:33:28.040
Feel free to reach out with any questions, as I enjoy discussing data. Thank you all for having me here; it’s my first time in Australia, and I’ve really enjoyed it!