Talks

Postgres Demystified

by Craig Kerstiens

The video titled 'Postgres Demystified,' presented by Craig Kerstiens at the MountainWest RubyConf 2013, explores the multifaceted capabilities of Postgres as a powerful and flexible database solution. Kerstiens begins by introducing the audience to Postgres, emphasizing its robust nature as a community-driven project that has evolved significantly since its inception in 1995. He outlines the agenda, which covers a brief history of Postgres, performance optimization, querying techniques, and various features that enhance its development capabilities. Key points discussed include:

  • Postgres Origins and Community Ownership: Postgres began in 1995 from the Ingress database, and its community-driven nature ensures longevity and innovation independent of commercial interests.

  • Multiversion Concurrency Control (MVCC): This key feature allows users to read data simultaneously while transactions are processed, enhancing performance and user experience.

  • Rich Data Types: Postgres supports a variety of data types including arrays and range types, enabling efficient data manipulation and integrity.

  • Useful Extensions: Kerstiens highlights extensions such as CITEXT for case-insensitive text, Hstore for key-value functionality, and JSON data types for semi-structured data management.

  • Performance Insights: He explains the importance of sequential scans versus index scans and discusses tools like the EXPLAIN command to optimize query performance. The use of appropriate indexes is stressed for ensuring efficient database operations.

  • Advanced Querying Features: He introduces window functions and Common Table Expressions (CTEs) that simplify complex queries, and discusses the HyperLogLog extension for managing large sets effectively.

  • Recent Enhancements: The video covers how Postgres continuously evolves with new features and extensions that support big data and complex application needs, reflecting its growing community and strong development momentum.

In conclusion, Kerstiens positions Postgres as an ideal choice for modern applications due to its extensive features, performance optimization techniques, and strong community support. The audience is encouraged to consider Postgres for their data management due to its capabilities and ongoing development. Kerstiens’s insights illuminate why Postgres stands out in the landscape of database solutions today.

00:00:20.480 All right, hi everyone. My name is Craig.
00:00:21.920 I work at Heroku, specifically on the Heroku Postgres team.
00:00:24.119 Before I get started, I have a quick question: show of hands, how many here use Postgres already?
00:00:30.160 Awesome! And who does not use it at all?
00:00:31.840 Okay, okay! Hopefully, I can convince a few people by the time we're done.
00:00:38.719 Before diving in, here are some quick PSAs.
00:00:43.239 If you're on a Mac, which most people are in this room, check out Postgres.app if you're not already using it.
00:00:47.760 It's a more native way to use Postgres; it just works seamlessly like most things on a Mac.
00:00:55.160 Another resource is Postgres Weekly, a good weekly newsletter if you’re into Postgres.
00:01:02.719 Lastly, if you're already running Postgres and it's exposed to the internet and you weren't busy upgrading your database yesterday, please go do that now. It is the worst vulnerability they've had in six or seven years, including an escalation of privileges, a remote code execution, and a permanent denial of service.
00:01:11.240 This could be quite serious if your database is publicly facing, so go update if you haven't already.
00:01:17.040 So, the agenda today includes a brief history of Postgres and the project itself. We'll also discuss developing with Postgres, understanding performance—which is usually a black box for most application developers—and some querying techniques.
00:01:36.640 First off, it's Postgres or PostgresQL, not P-O-S-T-R or P-O-S-T-R QL. This is Tom Lane, who created or collaborated on the development of Tiff, JPEG, and PNG formats. He is also the primary contributor to Postgres.
00:02:02.240 Here's a mail from about eight years ago where he mentioned that renaming the project was probably the worst decision we made. Postgres has been around for a long time, with most considering its origin to be around 1995, coming out of Ingress, which shares roots with Oracle.
00:02:13.319 I'm not sure if that association is good or bad, but it's very different in that it's completely community-owned and driven. No one person can ever own Postgres, which ensures its longevity. If a company goes bankrupt, it cannot be acquired; it will always be community-owned and driven.
00:02:30.840 This aspect contributes significantly to its momentum. A key concept in Postgres is MVCC, or Multiversion Concurrency Control. It allows other users to read data while a transaction is being processed without waiting for locks, which is beneficial for performance and user experience.
00:03:01.600 A quote from a coworker describes Postgres as the 'Emacs of databases.' He suggests that it acts as a platform to build upon, giving users the ability to add extensions and functionalities beyond just storing text and numbers.
00:03:34.200 Now, developing with Postgres can be greatly enhanced with tools such as psql, which is your friend. If you don't use it, you should. Psql allows you to describe tables, query easily, and it can open your default editor, making it a robust query editor.
00:03:59.879 Postgres has a unique approach to data types, offering more than just the obvious numeric types. For example, it includes support for things like IP and MAC addresses, shapes for basic geospatial calculations, and arrays.
00:04:56.600 Using arrays inside your database can eliminate the need to join against other tables for simple use cases like tagging blog posts. Inserting and querying arrays is straightforward, allowing for easy data manipulation.
00:05:22.480 Range types are another newer feature introduced in Postgres 9.2. This is particularly useful for applications that require a classification based on a start and end, such as calendar events or appointment slots.
00:06:19.680 By adding exclusion constraints on these types, Postgres can automatically enforce data integrity, preventing issues such as overbooking in a classroom or conference room.
00:06:53.160 Beyond basic data types, Postgres is a powerful platform with a range of extensions available, significantly expanding its functionality. Extensions introduced in version 9.4 enabled applications to leverage additional capabilities easily.
00:07:38.440 For example, the case-insensitive text type (CITEXT) can help users migrating from MySQL. Users often appreciate discovering tools like Hstore, which enables users to integrate key-value store functionality within their databases, adding a flexible, NoSQL-like capability.
00:07:59.399 Postgres 9.2 also introduced a native JSON data type, allowing users to validate data formats directly in the database, making it easier to manage semi-structured data.
00:08:50.240 For those familiar with JavaScript, the embedded V8 engine opens up exciting possibilities for manipulating data within SQL queries, but caution is advised against running arbitrary scripts.
00:09:08.680 Postgres supports full-text search out-of-the-box, which is highly useful for applications that need robust text search capabilities. There are various extensions available to extend full-text features, including those focusing on geospatial data.
00:09:59.800 Now, let's discuss performance. How many of you are familiar with sequential scans and index scans? Very few. The general idea is that sequential scans are often assumed bad, but they are not always detrimental depending on the data workload.
00:10:43.680 Postgres has a variety of index types available, each tailored for different use cases, such as B-tree, GIST, KNN, and SP-GIST. Understanding the use cases for these indexes is crucial for optimizing database performance.
00:11:00.839 To grasp the performance of your queries, utilizing the EXPLAIN command is extremely helpful. It allows developers to see how queries will be executed, showing estimated times and the expected number of rows returned.
00:11:45.240 By analyzing the EXPLAIN output and using the EXPLAIN ANALYZE command, you can understand how your query performed in real-time versus the prediction it made, allowing for better adjustments and optimizations.
00:12:38.640 When designing a web application, common query times should ideally be around 10 milliseconds, with rare queries remaining below 100 milliseconds for an effective user experience.
00:13:00.240 Adding indexes to critical fields like salary will help optimize queries significantly. For example, using indexed scans improves efficiency from sequential scans dramatically, demonstrating the importance of strategically placing indexes.
00:14:12.439 Creating indexes concurrently is recommended to maintain table availability during index creation. Conditional indexes also come in handy when only a subset of your data is of importance.
00:15:01.920 While wildcard queries can be detrimental, full-text search queries have some distinct optimizing features in Postgres. Extensions like PG RoRocks and PG Stat Statement can be powerful tools for monitoring and maintaining performance.
00:15:54.760 As we dive into querying, window functions are a powerful feature allowing users to aggregate and filter results on specific subsets of data efficiently.
00:16:57.120 You'll often find that when performing operations to retrieve data, tools like CTEs (Common Table Expressions) enhance readability and usability of SQL queries.
00:17:51.280 For handling data at scale, Postgres's HyperLogLog extension enables efficient storage of unique values while maintaining a small memory footprint. This functionality combined with Postgres’s diverse set of extensions provides an avenue for tackling big data challenges.
00:18:37.679 Other extensions such as LISTEN and NOTIFY provide a publish/subscribe system in your database, allowing for efficient communication between applications and the database.
00:19:20.479 Reflecting on the evolution of Postgres, we see that the community is growing along with strong momentum. Features are continuously being added, enhancing extensibility, flexibility, and support for JSON and Big Data.
00:20:04.359 The foundation is being laid for future innovations, making Postgres a worthy database choice for numerous applications. Thank you for listening!