Krzysztof Hasiński

Fantastic Databases And Where To Find Them

Fantastic Databases And Where To Find Them

by Krzysztof Hasiński

In the talk titled "Fantastic Databases And Where To Find Them," Chris Hasiński discusses the often-overlooked topic of databases in the context of Ruby on Rails (RoR) applications. Drawing analogies to mythical creatures, Hasiński emphasizes the need to better understand and engage with databases, which are frequently treated as complex, mysterious entities within the development community. The presentation begins by connecting the familiar yet daunting concept of dragons on ancient maps to the uncomfortable realities developers face regarding layoffs and employment security.

Key Points Discussed:
- Importance of Database Knowledge: Hasiński highlights how job interviews often lack discussions about databases, despite their critical role in applications. Developers frequently struggle to answer queries about database versions, reflecting a larger trend of ignorance regarding database functionalities.
- The Testing Pyramid: The speaker introduces the concept of the testing pyramid, illustrating how most Rails applications rely heavily on integration tests rather than unit tests due to their complex interactions with databases. This deviation from the ideal leads to a phenomenon termed the "ice cream cone" anti-pattern.
- SQLite Use Cases: Hasiński begins with SQLite, noting its effectiveness for small projects and low-traffic applications. He shares insights into its performance, ease of use, and limitations, particularly regarding write operations and threading, as well as its ability to support high request volumes efficiently.
- Postgres Features: Transitioning to Postgres, Hasiński describes it as a versatile choice that can fulfill multiple roles in an application stack. He touches on features like unlocked tables, window functions, and cumulative sums, emphasizing their power in optimizing database interactions and the overall efficiency of RoR applications.
- Performance Optimization Tools: The talk discusses the importance of understanding query execution plans and using tools like PG Hero for performance enhancement. The role of indices in database query optimization is also highlighted, along with the necessity of employing them judiciously.
- Emerging Database Technologies: Finally, he briefly mentions distributed databases such as CockroachDB and analytical solutions like ClickHouse, noting their strengths in handling data integrity and complex queries, respectively.

Concluding Thoughts:
Hasiński wraps up with the assertion that understanding databases is crucial for effective software development. By recognizing their unique characteristics and integrating them better into development practices, developers can harness their full potential while navigating the complexities of database management.

Overall, the talk serves as a call for developers to confront and understand the "mythical creatures" of their projects—the databases—ultimately leading to more robust and efficient applications.

00:00:13.080 Today, I would like to talk to you about mythical creatures. There's a saying, 'Here be dragons,' which apparently you can find on very old maps. People used to place them on maps when they didn’t know what was actually outside the boundaries. However, this isn't exactly true for the maps that are really old, as the original phrase was 'Here be lions,' not 'Here be dragons.' Dragons are mythical creatures, while lions are often thought to be mythical as well. This particular phrase, 'heavy dragons,' is specifically annotated on an ancient globe called the Hunt-Lenox Globe, which is written in Latin.
00:01:30.000 Now, I would like to relate this to programming, more specifically to the somewhat scary topic of layoffs. Layoffs are very much a reality right now, and many of you are likely looking for work. When you’re on an interview, it’s crucial to ask intelligent questions about the project.
00:01:50.000 Ask about the version of Rails being used; it shows you have a genuine interest in whether the software is up to date, maintainable, and secure. You can also inquire about architectural decisions, which are quite common. Additionally, you can ask front-end related questions to demonstrate that you possess knowledge beyond just backend development.
00:02:05.000 Questions about the process can reveal your infrastructure knowledge and show that you can collaborate well with others. Similarly, test-related questions are always appropriate since there are often discussions around testing practices.
00:02:18.000 However, once you start discussing databases, many people seem lost. When asked about the version of Postgres being used, you often hear, 'I’d need to check' because general shop talk revolves around data manipulation and storage.
00:02:35.000 Most conversations revolve around tasks related to work, which is precisely what databases do. If you look at a simplified diagram of Rails, most of it revolves around Ruby and its backend functionalities. The frontend usually caters more to front-end developers while the middle part is where the significant processing occurs.
00:02:50.000 If you’re not convinced about the critical role that the database plays, consider looking at the concept of the test pyramid. Google it, and you'll find that ideally, most of your code should be tested through unit tests, isolating them from database dependencies. After that, you add an integration layer of tests to see if all components work together, with a scant sprinkling of expensive end-to-end tests overseeing your business logic.
00:03:09.000 In the Ruby on Rails ecosystem, however, we encounter an anti-pattern known as the ‘ice cream cone,’ which flips this ideal pyramid upside down with mostly manual testing on top. My experience of over fifty Rails projects reveals that the actual testing pyramid generally does not resemble the ideal. We have unit tests, but it's challenging to isolate them; a larger portion consists of integration tests.
00:03:28.000 The reason for this lean towards integration tests is simple: Rails applications are often highly dependent on databases. Most of them are essentially points of interaction with databases, and I’m not saying that every single application is like this, but a vast majority are.
00:03:46.000 Yet, we treat these databases as mythical creatures that we’re not supposed to be concerned about. We hide them behind ActiveRecord and multiple abstraction layers. Here be dragons!
00:03:56.000 How do we handle these mythical creatures? First, we need to acknowledge that databases are indeed very strange. They are not object-oriented at all; most utilize declarative approaches, which differ from functional programming. Even those familiar with Elixir might find them odd. Generally, databases operate on a single-user model, leading to potential security risks because typically we only use one user account to connect to the database.
00:04:30.000 However, databases do possess remarkable strengths. They are typically much faster than anything we can accomplish in Ruby. Starting with SQLite—an example I generated during my journey—it’s usually ideal for small projects.
00:04:44.000 When thinking of SQLite, you might envision a simple website you created in your free time that you want to publish without the hassle of setting up an entire database server.
00:05:00.000 This is indeed a valid use case. Interestingly, the SQLite website itself operates using SQLite, supporting hundreds of thousands of requests daily. Each of those requests can generate nearly 200 queries, resulting in an astonishing total of 20 million queries each day!
00:05:11.000 It's impressive, considering that the SQLite infrastructure relies on seven-year-old hardware and a shared virtual machine with twenty-three other applications. This showcases just how powerful SQLite can be. It operates effectively under demanding conditions.
00:05:25.000 SQLite excels in reading data but has limitations in writing; only one thread can perform write operations at any given time. It’s essential to note that it's public domain software, which can be used freely without legal constraints.
00:05:39.000 Additionally, its backup process is incredibly straightforward—simply copy and paste files. These features are certainly appealing, especially when considering high traffic on a minimal budget for hosting.
00:05:53.000 One significant advantage of SQLite is its ability to prepare databases for quick backups or testing. It allows for cloning multiple versions of the database easily, enabling parallel testing. However, there are a few challenges.
00:06:05.000 For instance, when using SQLite in applications hosted on threaded servers, it can lock the entire interpreter, which significantly disrupts performance. Heavy write operations may lead to database locking errors, so it’s wise to implement retry logic, especially for frequently accessed endpoints.
00:06:19.000 Moreover, there is a lack of built-in support for search functionality, which means we often need to implement cumbersome workarounds. Rails doesn’t inherently support some features necessary for SQLite to operate with full effectiveness.
00:06:30.000 So, considering SQLite's limitations, let’s move on to the elephant in the room: Postgres, another tool designed for omnipresent database management. Almost everyone seems to use Postgres—raise your hand if you do? Yes, just about every one of you.
00:07:05.000 Postgres is often deemed the safest choice for a database because you can virtually do anything with it. It can even replace components of your entire stack, including Rails, if necessary. A notable post by Stefan Schmidt discusses how Postgres can substitute for nearly every part of your stack.
00:07:25.000 Essentially, it can function as a document database, a caching layer, a GraphQL server, a job queue, and could even replace Kafka if desired. While this might not be practical for every project, it’s advantageous for smaller companies looking for less maintenance.
00:07:45.000 Let’s explore one of Postgres's features. Starting with unlocked tables—these are tables without write-ahead logs. The write-ahead log is a crucial component of a database that captures all data changes to ensure data persistence.
00:08:05.000 If you don’t have this log, your tables will remain in-memory and will not persist through a crash. Consequently, this feature effectively behaves as a cache. You can use it in Rails, even employing helper methods in migrations.
00:08:20.000 In Rails 6, you can create these unlocked tables easily. Although you might think, 'I already have caching credits for this,' consider which database allows you to leverage this SQL-compatible caching feature.
00:08:35.000 If you set up a few lines in your test environment, you can realize substantial performance improvements in potentially horrible test suites, yielding a 25% boost in one of the most resource-heavy projects I worked on.
00:08:52.000 Next, let’s discuss window functions. They are quite powerful features in SQL. Remember that window functions allow you to perform calculations across sets of table rows, refining results without affecting the aggregate level of calculations.
00:09:08.000 Some people might find window functions confounding, believing they integrate complex syntax, but they are exceptionally powerful. For example, calculating employee salaries versus average salaries within a department demonstrates their utility.
00:09:23.000 You can extract data that indicates growing sales using these functions, making it easy to apply window functions in Rails while maintaining clarity.
00:09:42.000 Window functions allow you to aggregate data rolled up over certain timeframes or categories while retaining individual row details available for further processing.
00:09:57.000 The second example incorporates pattern matching in SQL, which can be particularly useful when it enables you to number every ID while restarting the count whenever you encounter a new category ID. This capability allows for efficient querying for the cheapest items in each category.
00:10:18.000 The last example focuses on cumulative sums, which is a frequently searched SQL query on Stack Overflow. By extracting values based on prior months, you can create cumulative revenues that adapt as more data gets collected.
00:10:36.000 These kinds of operations prove to be significantly swifter in SQL than executing similar operations in Ruby, especially with larger datasets.
00:10:57.000 Since SQL can understand the data it processes inherently, it can innovate strategies that often outpace Ruby implementations. SQL explains the execution plan that outlines its performance.
00:11:12.000 In learning to read execution plans, you can gain insight into your database's underlying processing and optimize your queries accordingly.
00:11:30.000 Being aware of query costs helps indicate potential performance bottlenecks. Using tools like PG Hero, which examines indices and provides suggestions for optimization, can significantly enhance database performance.
00:11:46.000 Indices serve to optimize database queries, but remember that utilizing too many can counterintuitively lead to performance degradation. Finding the most efficient indexing approach for your queries is vital.
00:12:07.000 Furthermore, testing different indexing strategies based on multiple factors can yield unexpected benefits for unique applications, particularly when tight on resources.
00:12:20.000 Distributed databases like CockroachDB leverage these principles effectively, eliminating much of the complexity that comes with sharding in Postgres, while ensuring data integrity across multiple machines.
00:12:35.000 The incorporation of geographical indices, or knowledge of where user data is located, can drastically improve performance and reduce costs for servers based on user proximity.
00:12:56.000 If considering alternatives, technologies like ClickHouse provide efficient handling of large datasets and are excellent for complex queries commonly found in business intelligence areas.
00:13:12.000 When utilizing databases that can pre-aggregate data, there’s potential not just for speed improvements, but also for large resource savings in endpoint activity when querying multiple sources.
00:13:28.000 In conclusion, various databases each have their advantages and are suited for specific use cases in application design. While NoSQL databases offer flexibility and performance, combining database functionalities seamlessly can yield excellent results.
00:13:42.000 Thank you for your time. Are there any questions?
00:14:00.000 I really appreciate the insight you provided on the unlocked tables, but I would love to take a picture of that slide. I was also wondering about the constraints when these can be used in tests.
00:14:11.000 I can find the slide on unlocked tables for you. It was rather early on, before we discussed window functions.
00:14:25.000 Here it is! The part that you'll want to note is that this content belongs in config environments test.rb to make it successful.
00:14:38.000 Remember, however, that if you try to restart the DB, those unlocked tables won’t persist unless you migrate again. This is best suited for continuous integration pipelines.
00:14:57.000 Right, the write-ahead log is essential for maintaining data integrity. If you lack this, there's no backing up which leads to data loss.
00:15:12.000 Right, and in the context of using `database_cleaner`, tables will truncate and thus effectively cater to your needs.
00:15:24.000 Would you please explain what happens after a Docker restart? Well, in that case, the existing tables would be detached unless you migrate them.
00:15:38.000 Thank you for clarifying that; it was exactly the kind of insight I was looking for. Any other questions?
00:15:53.000 Thanks for the great talk! A quick query: Have you worked with any gems such as Scenic?
00:16:10.000 I haven’t dealt with Scenic, but it certainly leverages Postgres's capabilities for managing views efficiently.
00:16:25.000 Yes, it allows for effective use of resources across your application when considering various views.
00:16:39.000 I think it might be beneficial to shift towards a model that’s more resilient against database changes.
00:16:53.000 True; while integrations of various libraries can bring more features, you can still use Rails capabilities without drastic measures.
00:17:04.000 Absolutely, and with migration, certain features can be easily integrated while not losing core functionalities.
00:17:12.000 Thank you again for all the information you shared during the talk.