00:00:15.940
My name is Rocio Delgado, and I live in Brooklyn, though I am originally from Mexico. I am a platform engineer at GitHub, and today I want to share some thoughts on database performance.
00:00:24.170
Why is it important to discuss performance in Active Record? Rails and similar frameworks make it easy to interact with a database, allowing us to focus on building amazing products. However, as we grow, with more users and increased load, our codebase becomes more complex. We often find ourselves duplicating queries across different parts of the code or optimizing for reads and pagination on large tables. Performance issues can affect your development cycle, team morale, and ultimately your customers. My experience has shown me that many of these problems can be prevented with the right information available upfront.
00:00:45.770
At GitHub, there are three key aspects of building a performance culture within any organization or team. First, performance must be the responsibility of the entire team—not just the engineers or management, but every individual in the organization. This requires processes and tools, as well as infrastructure to support automation, monitoring, and metrics. It's critical to measure everything, from individual query performance to loading times, so that we can proactively address any issue.
00:01:26.390
GitHub, a Rails and MySQL application, serves 60 million users and hosts 19 million repositories. Each week, users generate approximately 250,000 issues, resulting in a massive database size. I will share how we maintain optimal application and database performance amidst such scale. One of my favorite tools is Deployable, which is open-source and can be deployed anywhere. It greatly assists in metric tracking.
00:02:02.179
With tools like Graph, we can visually monitor performance and easily share this information with our team. This approach not only aids in teaching but also ensures that everyone has access to metrics, which is particularly useful in a remote working environment. Monitoring MySQL commands in GitHub is also crucial so that engineers can independently check table sizes and make data-driven decisions without needing approval from a database administrator.
00:02:42.209
There are two common database performance problems that we often encounter, regardless of the app's size. The first pitfall is N+1 queries, which occur when querying associated data inefficiently—typically when you fetch a parent record and subsequently retrieve child records individually. The solution is straightforward: by using eager loading with the 'includes' method, we can fetch associated data more efficiently.
00:03:06.360
To proactively detect N+1 queries, we leverage a tool called Pig, which was developed at GitHub. Pig places a bar at the top of the development page that displays helpful performance insights, including the number of objects being loaded and the time taken for various requests. If there are multiple queries being executed, it indicates a potential performance issue that should be addressed. We also incorporate instrumentation for our API calls to visualize how many queries are executed pre- and post-optimization.
00:04:08.010
The second performance issue to discuss is poor indexing. Increasing the number of indexes can lead to performance penalties during writes, so it's vital to balance the indexing strategy. Unused indexes can hurt write performance, while a lack of necessary indexes can slow down read operations. We use tools to identify which indexes may be outdated or unnecessary, allowing us to clean up our database schema effectively and maintain performance.
00:05:36.750
When creating indexes, ensure they align with the most critical queries. Collaborating with the DBA or understanding how to use tools like New Relic can reveal which queries need optimization. Creating multi-column indexes often yields better performance for complex queries. Additionally, cleaning up unused indexes regularly is crucial to maintain database performance. Regular analysis helps decouple the selective and non-selective index usage in a database architecture.
00:07:06.030
To identify when to create new indexes or address slow queries, teams can use monitoring tools like Haystack, which track exceptions. By analyzing which queries exceed the acceptable performance threshold, we can adjust our strategies accordingly. Understanding the need for indexes based on query performance allows us to systematically approach database optimization.
00:08:01.240
Tools like Hebert make it easier to clone production tables to staging environments for testing new indexing strategies before deploying to production. This ensures that changes do not adversely affect performance once applied. Proper infrastructure support for tooling is crucial in scaling these operations.
00:09:59.430
In conclusion, the key to good database performance is to stay ahead of issues proactively. Limiting round trips to the database and carefully considering pagination can significantly improve application responsiveness. Moreover, fostering a performance-centric culture within your team—where everyone has a role in maintaining performance—will lead to better outcomes. Start small, set achievable goals, and remember that sometimes slowing down leads to faster progress in the long run.
00:12:04.260
Thank you for your attention. I hope you found this information useful and that you can apply some of these insights in your work.