GORUCO 2016
Micro Talk: Database Performance at Scale for RoR Applications

Summarized using AI

Micro Talk: Database Performance at Scale for RoR Applications

Rocio Delgado • June 25, 2016 • Earth

In the video 'Micro Talk: Database Performance at Scale for RoR Applications,' Rocio Delgado, a platform engineer at GitHub, discusses the importance of optimizing database performance when building Ruby on Rails (RoR) applications. She highlights that while RoR facilitates rapid prototyping, performance issues can arise as applications scale, impacting user experience and team productivity.

Key points include:
- Team Responsibility: Performance should be a collective responsibility across the organization—not just for engineers. This requires proper processes, tools, and infrastructure for monitoring and automation.
- Monitoring Tools: Rocio emphasizes the use of tools like Deployable for metric tracking and Graph for visual performance monitoring. These tools improve accessibility to performance data and enhance team collaboration, especially in remote work settings.
- Common Pitfalls: Two prevalent performance issues are N+1 queries and poor indexing. N+1 queries occur when associated data is inefficiently retrieved, while poor indexing can lead to decreased performance for reads and writes.
- Eager Loading: To mitigate N+1 queries, using eager loading with the 'includes' method is a simple yet effective solution.
- Indexing Strategies: Strategies for indexing include maintaining a balance to avoid performance hits during writes while ensuring that necessary indexes are present for read operations. Tools like New Relic help identify queries that require optimization.
- Proactive Measures: Tools such as Haystack can be used to track query performance, allowing teams to adjust their indexing strategies in response to detected issues.
- Testing Changes: Adopting tools like Hebert enables testing of new indexing strategies in staging before applying them to production, ensuring that performance remains stable.
- Cultural Shift: Rocio underscores the necessity of fostering a performance-centric culture within the team, where everyone plays a part in maintaining database efficiency.

In conclusion, Rocio urges developers to take proactive measures and consider the implications of their architectural decisions on performance. By doing so, teams can avoid potential issues and create more responsive applications. The video serves as a comprehensive guide to enhancing database performance for RoR applications, encouraging teams to continuously optimize their systems for better user experiences.

Micro Talk: Database Performance at Scale for RoR Applications
Rocio Delgado • June 25, 2016 • Earth

Database Performance at Scale for RoR Applications by Rocio Delgado

We love Ruby on Rails because we can get a working prototype up and running in a very short period of time. It follows coding conventions that make it simple to go from one developer to the next, however, when the application scales to a certain point, there are some performance penalties that come from the auto-magic of Active Record. I intend to show the common pitfalls for performance degradation, monitoring tools to help identify problems and common solutions including MySQL indexing optimization, all this based on my current experience with one of the biggest RoR apps working at GitHub.

Help us caption & translate this video!

http://amara.org/v/PssJ/

GORUCO 2016

00:00:15.940 My name is Rocio Delgado, and I live in Brooklyn, though I am originally from Mexico. I am a platform engineer at GitHub, and today I want to share some thoughts on database performance.
00:00:24.170 Why is it important to discuss performance in Active Record? Rails and similar frameworks make it easy to interact with a database, allowing us to focus on building amazing products. However, as we grow, with more users and increased load, our codebase becomes more complex. We often find ourselves duplicating queries across different parts of the code or optimizing for reads and pagination on large tables. Performance issues can affect your development cycle, team morale, and ultimately your customers. My experience has shown me that many of these problems can be prevented with the right information available upfront.
00:00:45.770 At GitHub, there are three key aspects of building a performance culture within any organization or team. First, performance must be the responsibility of the entire team—not just the engineers or management, but every individual in the organization. This requires processes and tools, as well as infrastructure to support automation, monitoring, and metrics. It's critical to measure everything, from individual query performance to loading times, so that we can proactively address any issue.
00:01:26.390 GitHub, a Rails and MySQL application, serves 60 million users and hosts 19 million repositories. Each week, users generate approximately 250,000 issues, resulting in a massive database size. I will share how we maintain optimal application and database performance amidst such scale. One of my favorite tools is Deployable, which is open-source and can be deployed anywhere. It greatly assists in metric tracking.
00:02:02.179 With tools like Graph, we can visually monitor performance and easily share this information with our team. This approach not only aids in teaching but also ensures that everyone has access to metrics, which is particularly useful in a remote working environment. Monitoring MySQL commands in GitHub is also crucial so that engineers can independently check table sizes and make data-driven decisions without needing approval from a database administrator.
00:02:42.209 There are two common database performance problems that we often encounter, regardless of the app's size. The first pitfall is N+1 queries, which occur when querying associated data inefficiently—typically when you fetch a parent record and subsequently retrieve child records individually. The solution is straightforward: by using eager loading with the 'includes' method, we can fetch associated data more efficiently.
00:03:06.360 To proactively detect N+1 queries, we leverage a tool called Pig, which was developed at GitHub. Pig places a bar at the top of the development page that displays helpful performance insights, including the number of objects being loaded and the time taken for various requests. If there are multiple queries being executed, it indicates a potential performance issue that should be addressed. We also incorporate instrumentation for our API calls to visualize how many queries are executed pre- and post-optimization.
00:04:08.010 The second performance issue to discuss is poor indexing. Increasing the number of indexes can lead to performance penalties during writes, so it's vital to balance the indexing strategy. Unused indexes can hurt write performance, while a lack of necessary indexes can slow down read operations. We use tools to identify which indexes may be outdated or unnecessary, allowing us to clean up our database schema effectively and maintain performance.
00:05:36.750 When creating indexes, ensure they align with the most critical queries. Collaborating with the DBA or understanding how to use tools like New Relic can reveal which queries need optimization. Creating multi-column indexes often yields better performance for complex queries. Additionally, cleaning up unused indexes regularly is crucial to maintain database performance. Regular analysis helps decouple the selective and non-selective index usage in a database architecture.
00:07:06.030 To identify when to create new indexes or address slow queries, teams can use monitoring tools like Haystack, which track exceptions. By analyzing which queries exceed the acceptable performance threshold, we can adjust our strategies accordingly. Understanding the need for indexes based on query performance allows us to systematically approach database optimization.
00:08:01.240 Tools like Hebert make it easier to clone production tables to staging environments for testing new indexing strategies before deploying to production. This ensures that changes do not adversely affect performance once applied. Proper infrastructure support for tooling is crucial in scaling these operations.
00:09:59.430 In conclusion, the key to good database performance is to stay ahead of issues proactively. Limiting round trips to the database and carefully considering pagination can significantly improve application responsiveness. Moreover, fostering a performance-centric culture within your team—where everyone has a role in maintaining performance—will lead to better outcomes. Start small, set achievable goals, and remember that sometimes slowing down leads to faster progress in the long run.
00:12:04.260 Thank you for your attention. I hope you found this information useful and that you can apply some of these insights in your work.
Explore all talks recorded at GORUCO 2016
+1