Chris Hasinski

Fantastic Databases And Where To Find Them

Fantastic Databases And Where To Find Them

by Chris Hasinski

In the presentation titled "Fantastic Databases And Where To Find Them" given by Chris Hasinski at Euruko 2023, the speaker explores the often misunderstood and underappreciated realm of databases, likening them to mythical creatures. He begins with a historical reference to 'heavy dragons' marked on old maps to represent unknown territories, illustrating how databases are similarly perceived in software development as daunting yet crucial components.

Key Points Discussed:

  • Understanding Databases: The dialogue often shifts when discussing databases, with many developers uncertain about their database systems, highlighting the importance of this component in applications.
  • Typical Rails Application Architecture: A Rails application can be divided into three parts: the user-facing front-end, the Ruby code, and the database, the latter being pivotal for data manipulation and storage.
  • Database as Mythical Creatures: Much like the 'heavy dragons' of old maps, databases can seem formidable but understanding them can demystify their role.
  • SQLite: An overview of SQLite is provided, emphasizing its speed, lightweight nature, and ease of use for small applications. It's ideal for quick setups and testing environments, though it requires consideration of its single-threaded write constraints.
  • PostgreSQL: PostgreSQL is presented as a versatile solution capable of handling various tasks, including acting as a cache and a message queue. The potential for performance improvements using PostgreSQL's features, such as unlocked tables for CI testing, is discussed.
  • Advanced Query Features: The discussion touches on window functions, cumulative sums, and effective indexing strategies within PostgreSQL, which can enhance query performance and efficiency.
  • Exotic Databases: The presentation mentions alternatives like CockroachDB and ClickHouse, noting their special capabilities and use cases, helping to broaden the audience’s understanding of available database technologies.

In conclusion, Hasinski encourages attendees to delve deeper into the fascinating world of databases, emphasizing the need to explore their features and functions to leverage them effectively in development. He also underscores the importance of understanding performance implications and optimizing queries to ensure efficient database management, which is essential in today’s data-driven applications.

00:00:10.559 All right, so my name is Chris, and today I'm going to talk to you about mythical creatures.
00:00:15.480 Let me start with heavy dragons. The term 'heavy dragons' is used to mark something on a map when you don't know what it is. It's an unexplored part of the map that you don't know anything about. This concept actually appears in very old maps where the phrase used was 'heavy lions' because they thought lions were mythical creatures. Now we know that's not true, but dragons still have that mythical quality.
00:00:21.720 This phrase appears on one old item, which is the Hum Linx globe. It has this inscription in Latin: 'Hirison', which means heavy dragons. I want you to keep this in mind for a moment when I talk about another scary thing, which is layoffs. The big topic in the door industry this year and last year has been layoffs. However, with every end comes a new beginning—essentially new recruitments. And with every recruitment, we get the same discussions and the same questions.
00:00:51.840 We usually ask questions to show that we care about the projects, such as: 'Which Rails version are you using right now?' If it's version seven, that's great! If it's six, that's okay; if it's five, it's kind of scary. But if it's four, you might want to ask for some extra money. If you are front-end inclined, you might ask to use React or perhaps TypeScript, which was all the trend until yesterday.
00:01:19.620 Other common topics include how you manage state on the front end, or if front-end isn't your thing, questions might focus on process-related topics like continuous integration, continuous deployment, or the very trendy topic of testing. Maybe they have mutation testing set up on something. These all sound good, but the problem begins when you ask about the database. Most conversations will shift when you say, 'Tell me about the database.' The other person might reply, 'I think it's PostgreSQL.' Then you ask, 'Which version?' and the answer is often uncertain.
00:01:48.180 Most of the real conversations we have revolve around data—specifically, storage and manipulation. That is precisely what the database does. If you don't believe me, let me show you a very simplified schema of how a typical Rails application functions. You can divide it into three parts: the part of the application that the user cares about, which is front-end; the Ruby part, which is where we spend our time; and the part that does most of the heavy lifting—the database itself.
00:02:30.780 Still don't believe me? There’s this concept called the test pyramid, which illustrates the ideal testing distribution in your application. When you Google 'test pyramid,' it usually looks something like this: a significant number of unit tests at the bottom, sparse integration tests that check interactions, and a sprinkle of end-to-end tests that ensure the business logic works. However, this is not how the typical split in a Rails application looks.
00:03:05.160 The typical split instead often resembles something more like this: we might implement some unit tests, but a lot of the testing ends up being integration testing. The integration testing primarily interacts with—you guessed it—the database. Rails has a peculiar relationship with databases. Many front-end applications become practically useless without a database. These applications exist as a point of interaction with databases, yet we often treat databases as mythical creatures, like saying 'Here be dragons.'
00:03:45.420 I'd like to talk about how to handle these mythical creatures and make them feel more familiar. First, handling mythical creatures requires acknowledging that databases can be very quirky. They are neither object-oriented nor purely functional; most databases utilize declarative languages. They typically facilitate connections for one user, who may have access to all tables, but they possess a redeeming quality: they are extremely fast at processing data.
00:04:02.099 Let’s start with a small one: SQLite. SQLite has a specific use case that we're all familiar with—I built this site on a whim and want to showcase it to the world without the hassle of setting up a database server. SQLite is actually wicked fast and robust. In its own documentation, it touts its capabilities and mentions it’s used by organizations handling millions of queries every day. It manages to do all this on minimal resources and just an ordinary VM shared with a handful of other applications on old hardware.
00:04:49.680 SQLite only performs single-threaded writes, but its reading speed is incredible. There's a public domain code, not even GPL, that's a level above that. Backing it up is a breeze—you can simply copy the file. For developers, the benefits lie in its ability to handle substantial traffic on free-tier hosting and embed easily into various applications. For example, I can create an Android library or integrate SQLite into a gem effortlessly. Additionally, because you can easily back up data, it becomes very useful for testing. If you can duplicate the data multiple times quickly, it provides the capability for parallel testing environments, which is pretty awesome.
00:05:41.040 While SQLite has immense strengths, there are some downsides. You might want to consider using processes instead of threads for your servers, as SQLite uses a lock when querying due to the single writer constraint. Retries may be necessary for long-running insert queries, and full-text search capabilities, a popular feature, is not inherently built-in; there are ways around this, but if you're heading down that path, you should also consider the elephant in the room: PostgreSQL.
00:06:14.040 PostgreSQL is super versatile and one of the best open-source projects out there. It can basically function as any part of your tech stack. There's an article by Stefan Schmidt on a blog called 'Amazing CTO' that emphasizes how PostgreSQL can handle various functions including caching, acting as a message queue, or even generating GraphQL responses. You can replace many components of your stack with PostgreSQL.
00:06:37.260 While you don’t generally want to do everything using PostgreSQL, it does offer interesting features, like the caching feature. This feature allows you to create an unlocked table, meaning it won't have a write-ahead log, allowing most data to remain in memory. This is advantageous, especially in Rails 6, where there are helpers within migrations. Now, you may ask, 'But I already have Redis, why do I need this?' Let's consider a use case.
00:07:13.920 Attach this to your RSpec configuration: you can create unlocked tables instead of real ones during CI. This approach can dramatically improve write speed—on less optimized projects, performance times improved from three minutes down to mere seconds. It’s fantastic! There are definitely more hidden features in PostgreSQL worth exploring, but let's check them out.
00:07:57.000 Looking through the PostgreSQL documentation, we’ll focus on advanced features. One of the more exciting features is using window functions, which allows for parallel calculations in queries. Window functions, though frequently misunderstood, are incredibly powerful. For example, instead of using two separate queries to compare an employee's salary with the departmental average, you can handle it within a single query using a window function.
00:08:34.680 You can also do something similar in Ruby without needing to dive deep into SQL syntax, making it easier for other Ruby developers to read and understand. There's also the ability to find the cheapest product per category, which is valuable for upselling in e-commerce. This can be accomplished using pattern matching, similarly to Ruby or Elixir, where you assign a row number to each category and order them by price, making it crystal clear when matching products.
00:09:07.560 Furthermore, the cumulative sum is another frequently queried function in SQL. PostgreSQL allows for composed queries, meaning you can express aggregate functions in a very readable manner. This function is truly useful when you need running totals or cumulative figures. It gives you aggregate data succinctly without overly complicated SQL, making operations faster and easier to manage.
00:09:48.180 Understanding assortment of query planning is also essential for efficiency. PostgreSQL keeps statistics on data, allowing it to estimate query rows effectively. There’s an explain helper in Active Record; however, it often lacks certain flags for detailed insights, necessitating you to revert to SQL for analysis. For more complex queries, joining data can be executed in a streamlined manner, combined with filtering for optimal performance.
00:10:26.160 When running an explain command without any options, it provides an output of strategies indicating how PostgreSQL interprets your query. These strategies, denoted as arrows, are crucial as each arrow represents a method the database will use to obtain the final result. Understanding these strategies can significantly improve how you structure your queries for optimally obtaining data.
00:10:59.640 You also have the ability to perform analyses alongside executing queries. By using the analyze command, you can assess your query's performance—just ensure to perform it within a transaction to avoid unintended data modifications. The output will yield useful information about the timing of execution, including planning time, execution time, and the estimated number of rows that were output during the querying process.
00:12:01.380 In practice, the discrepancy between estimated time and actual time can sometimes lead to unclear results. For example, sequential scans need time to gather data from disks, leading to variable performances. Thus, leveraging counts and estimated calculations becomes important, especially in cases where accurate numbers are required without incurring high computational costs.
00:12:42.720 Another dimension to look at within PostgreSQL is index creation. Indexing in PostgreSQL can only be done once per subquery, and they’re mainly aimed at optimization, either for filtering, sorting, or both. To effectively utilize indexing, it’s beneficial to identify existing or missing indices within your database. You can also look into the stats after running a vacuum analyze to determine which indices are no longer in use to maintain efficiency and prevent bloating.
00:13:36.279 PostgreSQL hosting options can also bring exciting prospects when engaging more advanced queries in your database. Analyzing the performance of your queries can help identify potential bottlenecks while offering valuable insights into how your indices are contributing to overall system performance. You can track performance over time, enabling a better understanding of the overall health and efficiency of your database.
00:14:29.400 As we explore performance enhancements, we must also keep an eye on exotic databases. One fun example is CockroachDB, which is designed for automatic sharding in PostgreSQL-like environments without extensive customization. CockroachDB aims to ensure data is evenly distributed across multiple nodes, therefore mitigating risks like data loss during outages or natural disasters.
00:15:44.760 Then, we have MySQL, which is more commonly used, especially among PHP developers. Once considered a joke, its capabilities have evolved considerably—a serious alternative to PostgreSQL worth looking into. Finally, there are very exotic databases like ClickHouse, which was developed in Russia specifically for high-performance data analysis without compromising on speed or efficiency.
00:16:22.740 Notably, Cloudflare processes six million requests per second through ClickHouse, showcasing its potential for handling hefty workloads. If you'd like to dive deeper into advanced database architectures, you might want to consider NoSQL and document databases, though such topics would indeed merit their own extensive discussions.
00:17:01.920 In conclusion, reach out to the fascinating world of databases that are out there. Explore the various features and functionalities that each specialty database has to offer. Thank you very much for your time!