00:00:10.559
All right, so my name is Chris, and today I'm going to talk to you about mythical creatures.
00:00:15.480
Let me start with heavy dragons. The term 'heavy dragons' is used to mark something on a map when you don't know what it is. It's an unexplored part of the map that you don't know anything about. This concept actually appears in very old maps where the phrase used was 'heavy lions' because they thought lions were mythical creatures. Now we know that's not true, but dragons still have that mythical quality.
00:00:21.720
This phrase appears on one old item, which is the Hum Linx globe. It has this inscription in Latin: 'Hirison', which means heavy dragons. I want you to keep this in mind for a moment when I talk about another scary thing, which is layoffs. The big topic in the door industry this year and last year has been layoffs. However, with every end comes a new beginning—essentially new recruitments. And with every recruitment, we get the same discussions and the same questions.
00:00:51.840
We usually ask questions to show that we care about the projects, such as: 'Which Rails version are you using right now?' If it's version seven, that's great! If it's six, that's okay; if it's five, it's kind of scary. But if it's four, you might want to ask for some extra money. If you are front-end inclined, you might ask to use React or perhaps TypeScript, which was all the trend until yesterday.
00:01:19.620
Other common topics include how you manage state on the front end, or if front-end isn't your thing, questions might focus on process-related topics like continuous integration, continuous deployment, or the very trendy topic of testing. Maybe they have mutation testing set up on something. These all sound good, but the problem begins when you ask about the database. Most conversations will shift when you say, 'Tell me about the database.' The other person might reply, 'I think it's PostgreSQL.' Then you ask, 'Which version?' and the answer is often uncertain.
00:01:48.180
Most of the real conversations we have revolve around data—specifically, storage and manipulation. That is precisely what the database does. If you don't believe me, let me show you a very simplified schema of how a typical Rails application functions. You can divide it into three parts: the part of the application that the user cares about, which is front-end; the Ruby part, which is where we spend our time; and the part that does most of the heavy lifting—the database itself.
00:02:30.780
Still don't believe me? There’s this concept called the test pyramid, which illustrates the ideal testing distribution in your application. When you Google 'test pyramid,' it usually looks something like this: a significant number of unit tests at the bottom, sparse integration tests that check interactions, and a sprinkle of end-to-end tests that ensure the business logic works. However, this is not how the typical split in a Rails application looks.
00:03:05.160
The typical split instead often resembles something more like this: we might implement some unit tests, but a lot of the testing ends up being integration testing. The integration testing primarily interacts with—you guessed it—the database. Rails has a peculiar relationship with databases. Many front-end applications become practically useless without a database. These applications exist as a point of interaction with databases, yet we often treat databases as mythical creatures, like saying 'Here be dragons.'
00:03:45.420
I'd like to talk about how to handle these mythical creatures and make them feel more familiar. First, handling mythical creatures requires acknowledging that databases can be very quirky. They are neither object-oriented nor purely functional; most databases utilize declarative languages. They typically facilitate connections for one user, who may have access to all tables, but they possess a redeeming quality: they are extremely fast at processing data.
00:04:02.099
Let’s start with a small one: SQLite. SQLite has a specific use case that we're all familiar with—I built this site on a whim and want to showcase it to the world without the hassle of setting up a database server. SQLite is actually wicked fast and robust. In its own documentation, it touts its capabilities and mentions it’s used by organizations handling millions of queries every day. It manages to do all this on minimal resources and just an ordinary VM shared with a handful of other applications on old hardware.
00:04:49.680
SQLite only performs single-threaded writes, but its reading speed is incredible. There's a public domain code, not even GPL, that's a level above that. Backing it up is a breeze—you can simply copy the file. For developers, the benefits lie in its ability to handle substantial traffic on free-tier hosting and embed easily into various applications. For example, I can create an Android library or integrate SQLite into a gem effortlessly. Additionally, because you can easily back up data, it becomes very useful for testing. If you can duplicate the data multiple times quickly, it provides the capability for parallel testing environments, which is pretty awesome.
00:05:41.040
While SQLite has immense strengths, there are some downsides. You might want to consider using processes instead of threads for your servers, as SQLite uses a lock when querying due to the single writer constraint. Retries may be necessary for long-running insert queries, and full-text search capabilities, a popular feature, is not inherently built-in; there are ways around this, but if you're heading down that path, you should also consider the elephant in the room: PostgreSQL.
00:06:14.040
PostgreSQL is super versatile and one of the best open-source projects out there. It can basically function as any part of your tech stack. There's an article by Stefan Schmidt on a blog called 'Amazing CTO' that emphasizes how PostgreSQL can handle various functions including caching, acting as a message queue, or even generating GraphQL responses. You can replace many components of your stack with PostgreSQL.
00:06:37.260
While you don’t generally want to do everything using PostgreSQL, it does offer interesting features, like the caching feature. This feature allows you to create an unlocked table, meaning it won't have a write-ahead log, allowing most data to remain in memory. This is advantageous, especially in Rails 6, where there are helpers within migrations. Now, you may ask, 'But I already have Redis, why do I need this?' Let's consider a use case.
00:07:13.920
Attach this to your RSpec configuration: you can create unlocked tables instead of real ones during CI. This approach can dramatically improve write speed—on less optimized projects, performance times improved from three minutes down to mere seconds. It’s fantastic! There are definitely more hidden features in PostgreSQL worth exploring, but let's check them out.
00:07:57.000
Looking through the PostgreSQL documentation, we’ll focus on advanced features. One of the more exciting features is using window functions, which allows for parallel calculations in queries. Window functions, though frequently misunderstood, are incredibly powerful. For example, instead of using two separate queries to compare an employee's salary with the departmental average, you can handle it within a single query using a window function.
00:08:34.680
You can also do something similar in Ruby without needing to dive deep into SQL syntax, making it easier for other Ruby developers to read and understand. There's also the ability to find the cheapest product per category, which is valuable for upselling in e-commerce. This can be accomplished using pattern matching, similarly to Ruby or Elixir, where you assign a row number to each category and order them by price, making it crystal clear when matching products.
00:09:07.560
Furthermore, the cumulative sum is another frequently queried function in SQL. PostgreSQL allows for composed queries, meaning you can express aggregate functions in a very readable manner. This function is truly useful when you need running totals or cumulative figures. It gives you aggregate data succinctly without overly complicated SQL, making operations faster and easier to manage.
00:09:48.180
Understanding assortment of query planning is also essential for efficiency. PostgreSQL keeps statistics on data, allowing it to estimate query rows effectively. There’s an explain helper in Active Record; however, it often lacks certain flags for detailed insights, necessitating you to revert to SQL for analysis. For more complex queries, joining data can be executed in a streamlined manner, combined with filtering for optimal performance.
00:10:26.160
When running an explain command without any options, it provides an output of strategies indicating how PostgreSQL interprets your query. These strategies, denoted as arrows, are crucial as each arrow represents a method the database will use to obtain the final result. Understanding these strategies can significantly improve how you structure your queries for optimally obtaining data.
00:10:59.640
You also have the ability to perform analyses alongside executing queries. By using the analyze command, you can assess your query's performance—just ensure to perform it within a transaction to avoid unintended data modifications. The output will yield useful information about the timing of execution, including planning time, execution time, and the estimated number of rows that were output during the querying process.
00:12:01.380
In practice, the discrepancy between estimated time and actual time can sometimes lead to unclear results. For example, sequential scans need time to gather data from disks, leading to variable performances. Thus, leveraging counts and estimated calculations becomes important, especially in cases where accurate numbers are required without incurring high computational costs.
00:12:42.720
Another dimension to look at within PostgreSQL is index creation. Indexing in PostgreSQL can only be done once per subquery, and they’re mainly aimed at optimization, either for filtering, sorting, or both. To effectively utilize indexing, it’s beneficial to identify existing or missing indices within your database. You can also look into the stats after running a vacuum analyze to determine which indices are no longer in use to maintain efficiency and prevent bloating.
00:13:36.279
PostgreSQL hosting options can also bring exciting prospects when engaging more advanced queries in your database. Analyzing the performance of your queries can help identify potential bottlenecks while offering valuable insights into how your indices are contributing to overall system performance. You can track performance over time, enabling a better understanding of the overall health and efficiency of your database.
00:14:29.400
As we explore performance enhancements, we must also keep an eye on exotic databases. One fun example is CockroachDB, which is designed for automatic sharding in PostgreSQL-like environments without extensive customization. CockroachDB aims to ensure data is evenly distributed across multiple nodes, therefore mitigating risks like data loss during outages or natural disasters.
00:15:44.760
Then, we have MySQL, which is more commonly used, especially among PHP developers. Once considered a joke, its capabilities have evolved considerably—a serious alternative to PostgreSQL worth looking into. Finally, there are very exotic databases like ClickHouse, which was developed in Russia specifically for high-performance data analysis without compromising on speed or efficiency.
00:16:22.740
Notably, Cloudflare processes six million requests per second through ClickHouse, showcasing its potential for handling hefty workloads. If you'd like to dive deeper into advanced database architectures, you might want to consider NoSQL and document databases, though such topics would indeed merit their own extensive discussions.
00:17:01.920
In conclusion, reach out to the fascinating world of databases that are out there. Explore the various features and functionalities that each specialty database has to offer. Thank you very much for your time!