Postgres 10, Performance, and You

by Gabe Enslein

The video titled "Postgres 10, Performance, and You" presented by Gabe Enslein at RailsConf 2018 focuses on the performance enhancements introduced in Postgres 10, highlighting new features that significantly improve database efficiency. Enslein begins with a brief introduction to Postgres, detailing its widespread popularity due to community-driven features like native JSON support, data integrity, and numerous extensions.

Key points discussed in the presentation include:

- Native Table Partitioning: Postgres 10 introduced efficient table partitioning to minimize bloat and improve query performance. This allows bulk operations to occur on partitions without overhead, enhancing performance and reducing database load.

- Hash Indexes: The new full support for hash indexes ensures reliability in indexing JSON data and enhances concurrency, resolving previous issues with index corruption and replication failures.

- Full-text Index Searching: A significant feature added is the ability to perform efficient searches directly on JSON blobs, making it easier to manage unstructured data.

- Parallel Queries: Enhancements in parallel processing allow better utilization of multiple CPU processors, which includes parallel bitmap heap scans, gather merges, and merge joins. These improvements reduce the need for sequential scans and optimize overall query performance.

Enslin illustrates these points through practical examples, such as managing a social interest platform and backend data structures that require handling large datasets efficiently. He notes that querying for quotes and user-generated content is more efficient due to Postgres 10's new capabilities.

The conclusion underscores the importance of adopting Postgres 10 for improved performance, especially for applications handling terabytes of data, and the value of leveraging the community's tools and Heroku's enhancements to manage and optimize Postgres databases effectively.

Overall, the talk conveys the message that the advancements in Postgres 10 can enable developers to build robust applications that handle complex queries and large datasets more efficiently.

Main Takeaways:

The shift to native features like table partitioning and enhanced indexing is crucial for managing large data efficiently.
Using Postgres 10's parallel processing capabilities is vital for optimizing query performance.
Leveraging community tools and extensions can significantly aid in maximizing Postgres’s performance capabilities.

00:00:11 Hello, everyone. I'm Gabe, and I'll be giving the talk on Postgres 10, performance, and you.

00:00:17 To kick things off, I'd like to ask how many of you currently use Postgres? Can I see a show of hands? A whole lot of folks! I wasn't sure if we had any MySQL users or anyone else. We're all gathered here for similar reasons.

00:00:34 We recognize that Postgres has a lot of exceptional features, including native JSON support, native hash support, and strong community standards for data integrity. It excels in handling large and complex subqueries and offers numerous extensions, maintained by the community with built-in support. Postgres 10 has introduced even more improvements, making it a more performant database.

00:01:12 Today, I'll discuss several significant topics, including native table partitioning, hash index resiliency, and full-text index searching on JSON and JSON blobs, which is now natively supported. I will also delve into parallel queries, their impact on large joins, indexes, and the accuracy of the query analyzer.

00:01:46 For those of you who are avid Postgres users, you'll find some terminology familiar. However, I won't go into detail about certain topics like bloat, vacuuming, sequential scans, or replication, which are changes that occurred in Postgres 10 but aren't the focus of this talk.

00:02:05 A little about me: I joined Heroku a little over a year and a half ago and have primarily worked on Sinatra and Ruby on Rails applications for the past four years.

00:02:18 Currently, my focus is on Postgres and high availability modes, dealing with large data growth, often in the terabytes.

00:02:37 I've worked with various data service engines that I won't list due to time constraints. On the side, I'm focusing on self-improvement and life coaching. I'm also involved in a super-secret project for a social interest platform.

00:02:57 I'm interested in providing a positive online environment, unlike some social networks that can be negative. My platform will include features like profiles curated content, which ranges from inspirational quotes to movies and TV shows.

00:03:14 Now, let's jump right into the content. First up, I want to discuss inspirational quotes. One quote that resonated with me during my travels to New Zealand is: "Happiness is not a destination; it's a way of life." I'm hopeful to see many variations of this quote populate the platform.

00:03:46 Even though quotes are just small strings of text, databases at any scale have significant RAM and plenty of disk space. However, most quotes are rarely updated or revisited after some time, and people typically desire the most recent ones, leading to challenges.

00:04:18 It's vital to manage the growth of one particular table quickly with efficient data handling. Postgres attempts to load as much data into memory as possible, and when overflow occurs, it starts writing to disk while handling data in a disk-efficient manner.

00:04:44 Temp disk space is not free. There are overhead costs associated with storing data on disk, and handling records in large quantities can significantly impact performance, especially with concurrent access issues for large datasets.

00:05:09 One crucial issue that users might encounter is the possibility of losing the ability to connect to the database entirely when running too many heavy queries simultaneously.

00:05:29 Who here likes fast queries? I hope everyone does! Fast queries and database uptime are essential.

00:05:41 One significant improvement in Postgres 10 is native table partitioning. While there have been extensions supporting this in the community, let's examine the advantages of native partitioning.

00:06:02 Native partitioning can avoid the need for vacuuming and reducing bloat altogether. Bulk loads, deletions, and offloads can happen with minimal overhead on your Postgres internals.

00:06:21 Access to your database connections will have much less cost due to less stress on any one singular table, optimizing query performance.

00:06:39 With native partitioning, you only interact with the sections of partitions that matter, while seldom-used data can reside on slower disk storage or even cold storage, allowing for noticeable performance improvements.

00:07:09 Moreover, previous data access methods like random index scans and access reads would sometimes be slower than sequential scans. However, native partitioning optimizes this.

00:07:24 Although pg_partman is still supported, it lacks some of the newer parallel support features I will address later.

00:07:36 It's important to note that native partitioning is not natively supported in Active Record. While there are several gems attempting to streamline this, they are not well-supported currently. One advantage is that tables can be created from your partitions, such as monthly or weekly tables, allowing for easier data segmentation.

00:08:12 Creating partitions from your partitions is possible, although that might be overkill. Next, I want to talk about customer addresses.

00:08:29 As we pivot to providing home delivery services on our platform, we need to collect a myriad of customer addresses accessible to our mobile app.

00:08:49 Address management is complicated as formats vary across North America, and regulations differ by state, impacting where delivery drivers can stop. We also need to gather unstructured data to ensure proper delivery.

00:09:14 While Postgres can support this with its key-value store feature, implementing when many records are involved can be slow. We might need to add an index for improved lookups.

00:09:33 There's a consensus that many people are wary of hash indexes due to their lack of support in Postgres 9.x, especially regarding replication and index corruption.

00:09:58 However, in Postgres 10, HStore is treated as a first-class citizen, making it crash-resistant, improving replication and query speed. We utilize HStore extensively at Heroku, particularly for maintaining records concurrently.

00:10:34 Now, turning back to unstructured data, I'm interested in incorporating more curated content on my platform, like movies, TV shows, and favorite characters.

00:11:09 Each show may have hundreds or thousands of characters. This diversity necessitates a flexible data storage method.

00:11:25 So, leveraging JSON blobs in Postgres allows us to store dynamic character data efficiently, making it easy to parse updates without complex relationships.

00:11:52 However, storing data in this manner opens us up to potential issues as each row is dynamically sized, leading to challenges in tracking.

00:12:10 The dynamic nature of JSON records can complicate the tracking of how data changes over time, which raises questions about data organization.

00:12:26 Using full-text index searching in Postgres 10 can help manage this unstructured data effectively, allowing for greater flexibility in how we handle these dynamic queries.

00:12:47 The ability to index various views on the data enhances our search capabilities and offers a way to target specific groups on the platform.

00:13:13 For example, we can tailor our searches to engage particular user interests, such as Marvel fans or those who can't get enough of certain characters.

00:13:32 Quotes were a significant feature of our platform, particularly on Throwback Thursdays, where users enjoy viewing past quotes.

00:13:52 To enhance this experience, our streams should include all the posters they like, not just their own posts, which considerably increases the number of queries.

00:14:05 Postgres 10 has implemented several optimizations to improve the speed of these types of queries. Today, I'll be covering three prominent optimizations:

00:14:24 parallel bitmap heap scans, gather merges, and merge joins.

00:14:50 Parallel searching and scanning were introduced in version 9.6 but had limitations primarily focusing on sequential scans. Postgres 10 changed this by enhancing parallelization for every level of table schemes, including indexes.

00:15:16 This means that Postgres can now perform parallel searches through indexes first, significantly reducing the amount of data loaded upfront and improving caching efficiency.

00:15:40 The efficient construction of index queries minimizes costly disk reads, leading to faster performance.

00:15:59 The gatherer mechanism introduced enables parallel query construction and result retrieval. It allows results to be returned in a more context-aware fashion, rather than just arbitrariness.

00:16:17 In Postgres 10, the optimization ensures the context of where data is found is maintained, leading to a more natural ordering of operations and improved runtime efficiency.

00:16:42 This context-driven retrieval significantly enhances retrieval times for large datasets, making operations much more intuitive.

00:17:00 Merge joins allow consistent parallel operations across various merges at all levels, improving the efficiency of requests as they traverse multiple layers of data.

00:17:24 In previous versions, this was constrained by indexed subquery loops or hashing. Now, parallel merge join enhances these operations, reducing overhead further as the requests scale.

00:17:56 Another example involves the tracking system we have at Heroku. We document internal and external issues related to our repositories as they come up, mapping them effectively to a repo ID.

00:18:23 Over the years, we've improved query performance significantly. The query analyzer has adjusted its processes with Postgres 10 to leverage parallel indexes, even for smaller datasets.

00:18:42 Despite being a smaller table, the performance gains displayed by the query analysis are enormous. This improved performance isn't just for larger datasets.

00:19:05 As we conclude, there's a couple of last-minute considerations with Postgres 10 related to Rails. I want to mention that Rails 4 unfortunately does not support Postgres 10.

00:19:29 If you're running Rails 4, you'll need to upgrade to Rails 5 to take advantage of Postgres 10 due to changes in connection protocols.

00:19:51 In addition, partitioning isn't natively supported by all Ruby on Rails ORMs. This includes specific index types like hash and full-text query searching capabilities available in Postgres.

00:20:17 Despite these limitations, we at Heroku strive to provide tools that facilitate smooth transitions and improvements with Postgres 10.

00:20:39 With the Postgres 10 release, we are adding numerous extensions, managed credentials for better permission management, and improved CLI analytics.

00:21:06 We've also introduced new hardware plans that optimize performance at the same price points. This includes doubling CPU cores on tiers and increasing disk sizes.

00:21:27 In closing, leveraging the Heroku CLI tools, particularly PG Diagnose, can greatly assist in identifying issues related to query performance, sequential scans, and caching efficiency.

00:21:56 Moreover, PG Bouncer provides effective connection pooling, an essential feature for managing load times and ensuring database stability.

00:22:24 Thank you for your time and attention. I hope you found this talk informative and useful. Feel free to visit our booth; I'm the last presenter for Heroku today. It’s been a pleasure to present to you all at RailsConf.