00:00:00.160
Our next speaker is Dmitry Tsepelev. He is going to talk about building high-performance GraphQL APIs. Dmitry has a background in engineering at Evil Martians, where he works on open-source projects. Today, he will share insights into improving the performance of GraphQL APIs.
00:00:12.000
One day, you decided to give GraphQL a try and implemented an API for your new application using GraphQL. You deployed the app to production, but after a while, you realized that the responses were not super fast. How can you find out what is causing your GraphQL endpoint to be slow?
00:00:35.120
In this talk, we will discuss how queries are executed and what makes processing slower. We will learn how to measure performance in the GraphQL ecosystem and identify the aspects we should improve. Finally, we’ll explore possible solutions and some advanced techniques to keep your GraphQL app fast.
00:01:19.119
You can send your questions to Dmitry via the stream chat, and I will be picking them up from there. Dmitry, the floor is yours.
00:01:37.280
Okay, I hope you can hear me now and see my screen. Hello everyone and welcome to my talk called 'Building High-Performance GraphQL APIs.' My name is Dmitry, or Dima if that's easier.
00:01:43.119
I live in a small Russian town called Vladimir and I am a backend engineer at Evil Martians. I frequently engage in open-source projects, conference talks, and sometimes I write articles. Today, I plan to share some of my articles with you.
00:02:07.840
We use GraphQL extensively in our production applications, so this talk is specifically for people who are already using GraphQL in production. It's not an introductory talk, so we won't discuss the pros and cons. However, if you need details about the technology itself, I suggest looking at my previous talk titled 'Thinking Graphs,' which I gave two years ago.
00:02:30.080
You can also check one of Evil Martians' articles about writing GraphQL applications in Rails. There are three articles that guide you in building a GraphQL application from scratch, including a client for it. This can be a good starting point.
00:02:53.840
Now, let's start talking about performance. I want to begin with the query path that is executed when a GraphQL application processes a request. It's essential to know the points where things can go wrong.
00:03:10.720
In GraphQL, a query is just a string, which we can refer to as a query string. This string typically goes to the '/graphql' endpoint via a POST request. The request is sent to the GraphQL controller, where it must be passed to the GraphQL schema along with the arguments and other necessary variables.
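In a Rails app, that handoff might look roughly like this (a minimal sketch; `AppSchema`, the controller name, and the parameter names are assumptions, not code from the talk):

```ruby
# app/controllers/graphql_controller.rb
class GraphqlController < ApplicationController
  def execute
    result = AppSchema.execute(
      params[:query],                        # the raw query string
      variables: params[:variables],
      operation_name: params[:operationName],
      context: { current_user: current_user }
    )
    render json: result
  end
end
```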
00:03:56.160
Next, we need to prepare the abstract syntax tree or AST. We send the query string to the query parser method. Once we have the abstract syntax tree, we must validate it to ensure that it has valid syntax and that all required variables are present. Afterward, we execute the query. As you can see, the query is structured like a tree, which we traverse recursively to execute each field.
00:04:42.560
This execution stage is called 'execute field,' which continues until we execute all fields. Sometimes, we also need to access data sources, like a database. Occasionally, we might not want to execute a field immediately; in such cases, we can mark it as 'lazy,' postponing its execution until all non-lazy fields are processed.
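GraphQL Ruby supports this through `lazy_resolve`: a field can return a wrapper object, and the runtime unwraps it only after all eager fields have run. A minimal sketch (the `LazyValue` class and schema names are illustrative):

```ruby
# A lazy wrapper: a field can return one of these instead of a value,
# and the runtime calls #value only after all non-lazy fields are done.
class LazyValue
  def initialize(&block)
    @block = block
  end

  def value
    @block.call
  end
end

class AppSchema < GraphQL::Schema
  query QueryType
  # Tell the runtime how to unwrap LazyValue when it finally needs it
  lazy_resolve LazyValue, :value
end
```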
00:05:01.120
After finishing the execution of all fields, we obtain the query result object. The final step is to return this result to the client by converting it to JSON format before sending it back.
00:05:50.280
This is the general query execution path. Our plan today involves beginning with different methods on how to identify bottlenecks in your applications. Following that, we will discuss how to fix common issues typically encountered in GraphQL applications. Finally, we will cover some advanced techniques for addressing specific cases.
00:06:39.440
Before optimizing anything, you need measurements: it doesn't make sense to optimize blindly. If you do not have measurements, you should set them up first. In production applications, we often use services like New Relic; however, out of the box these may not be very helpful for GraphQL.
00:07:05.360
Given that all requests are directed to the same GraphQL endpoint, the data will be aggregated, preventing us from identifying which specific query is slow. To address this, Ruby GraphQL offers a feature called 'tracer.' This feature allows you to capture events from the execution engine and perform various actions with them.
00:07:50.720
For example, you can send that information to your monitoring tool, such as New Relic. To enable the tracing feature, you simply need to turn on the plugin in your schema, and afterward, you can see all the stages in New Relic. This setup allows you to identify which queries are slow.
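Enabling the built-in New Relic tracer is a one-line schema change (the module name shown here is from the 1.x tracing API; newer graphql-ruby versions expose the same idea through `trace_with`):

```ruby
class AppSchema < GraphQL::Schema
  query QueryType
  # Report parse/validate/execute timings to New Relic per operation
  use GraphQL::Tracing::NewRelicTracing
end
```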
00:08:59.919
If you are using your own monitoring solution, you will have to set it up manually. I can recommend a set of gems known as 'Yabeda,' created by my colleague Andrey Novikov. The idea is that you can declare all the metrics you'd like to collect from various components of your application, such as Rails or Sidekiq, and send them to any data source of your choice.
00:09:38.640
For instance, it can be sent to Prometheus. In the case of GraphQL, it sends various metrics such as GraphQL API request counts, field resolve run times, and counts for query and mutation fields. This setup provides visibility into the operations occurring in your application.
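With Yabeda, declaring such metrics looks roughly like this (the metric names and bucket values are illustrative, not the exact ones the yabeda-graphql gem ships with):

```ruby
Yabeda.configure do
  group :graphql do
    counter   :request_count, comment: "Number of GraphQL API requests"
    histogram :field_resolve_runtime,
              unit: :seconds,
              buckets: [0.001, 0.01, 0.1, 1, 10],
              comment: "Time spent resolving each field"
  end
end
```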
00:10:36.119
Next, I'd like to recommend a book titled 'Production Ready GraphQL.' I read it one day and found it to be filled with useful tips, some of which will feature in today's talk. One notable tip discusses client headers, which are beneficial when clients, such as mobile applications, are not immediately updated.
00:11:04.639
The problem is that you might identify a slow query and resolve it on the client side, only for the issue to resurface later. Without client headers, you won’t be able to tell whether the problem is new or old. Therefore, the solution is to add headers indicating the client name and client version.
00:11:30.560
Now that we have our monitoring setup, we can find out if a query is slow. However, sometimes a query may be fast under certain conditions and slow under others. For instance, a specific variable might cause the query to slow down in some cases, and conventional monitoring won’t help us.
00:12:15.520
Let me illustrate this with an example. Imagine we have a list of items for sale, each with a price in USD, but we want to show the prices in the user's local currency. While the query that returns items with prices may normally be fast, it can occasionally log as being really slow.
00:12:56.880
To address this, we can take advantage of the tracer feature I previously mentioned. We can write our custom tracer, which is a class with a single method called 'trace.' This method accepts a key, which represents the name of the event, and the data, which is a hash that varies depending on the event key.
00:13:39.360
What we do here is remember the time when we received the event, allow the execution engine to process the stage, and compare the new time with the previous one, logging it accordingly. By doing this, we can see the various stages with the time spent within the code.
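A minimal tracer of that shape might look like this (this follows the legacy `tracer` interface, where the engine calls `#trace` around every stage; newer graphql-ruby versions use `trace_with` modules instead). The schema hookup is omitted; below, the tracer is invoked directly just to show the contract:

```ruby
require "stringio"

class TimingTracer
  def initialize(io = $stdout)
    @io = io
  end

  # key is the event name (e.g. "parse", "validate", "execute_field"),
  # data is a hash whose contents depend on the key.
  def trace(key, _data)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield # let the execution engine run the stage
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @io.puts(format("%-20s %.4fs", key, elapsed))
    result
  end
end

io = StringIO.new
tracer = TimingTracer.new(io)
value = tracer.trace("parse", {}) { 21 * 2 } # stands in for a real stage
```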
00:14:33.440
For example, we may find that the field execution takes a considerable amount of time, but we won’t know which specific field is the bottleneck. Thus, we can change our tracer to compare keys with the 'execute field' string. If it matches, we print the field name to understand what is slow.
00:15:56.320
In my case, the problem turned out to be that we stored currency conversion rates in Redis. The first time the field was executed, it loaded a massive amount of data, which took a significant amount of time. The fix was straightforward: I removed the unnecessary data from Redis, and query execution improved drastically.
00:16:43.440
Another effective technique is called directive profiling. You can write a custom directive, such as one called 'PP,' which can apply any profiler you want to a specific field. This allows you to test performance on staging or even production while logging the results on the server for further analysis.
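A hypothetical `@pp` directive might be sketched like this, using graphql-ruby's runtime directive hook; the use of ruby-prof here is an assumption, and any profiler could be plugged in instead:

```ruby
class Directives::Pp < GraphQL::Schema::Directive
  graphql_name "pp"
  locations GraphQL::Schema::Directive::FIELD
  description "Profile resolution of the field this directive is attached to"

  # The runtime calls this around the field it decorates
  def self.resolve(object, arguments, context)
    result = nil
    profile = RubyProf.profile { result = yield }
    # Log the report server-side for later analysis
    RubyProf::FlatPrinter.new(profile).print($stderr)
    result
  end
end
```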
00:17:25.760
Next, I'd like to discuss parsing issues. When I prepared for the Ruby Russia conference in 2019, I wanted to explore how slow parsing could be. I created a small benchmark with a list of queries to test various aspects, such as nested fields and query complexity.
00:17:59.360
The results showed that smaller queries are parsed quickly, while larger queries with more fields and nesting take significantly longer, sometimes up to a second. Therefore, it’s essential to keep your queries simple, as there isn’t a way to reduce parsing time significantly.
00:18:44.800
The second issue worth mentioning is response building. Sometimes building a response can be slow due to the large response size. In one scenario I encountered, profiling indicated that most time was spent on the library responsible for handling large records, not due to developer error.
00:19:11.760
In such cases, keeping your responses small is crucial. Implementing pagination can significantly reduce the amount of data transferred and processed. Additionally, applying maximum depth and complexity limits can help regulate the size and complexity of queries.
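All three guards are built into GraphQL Ruby as schema-level settings (the specific numbers here are examples):

```ruby
class AppSchema < GraphQL::Schema
  query QueryType
  # Reject queries nested deeper than 10 levels
  max_depth 10
  # Reject queries whose summed field complexity exceeds 300
  max_complexity 300
  # Cap connection fields at 50 items per page unless asked for less
  default_max_page_size 50
end
```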
00:19:49.680
One key recommendation is to ensure that you are using the latest version of GraphQL Ruby, as each iteration usually brings performance improvements. I ran benchmarks from 2019 again after two years, and I noticed solid performance increases and reduced memory consumption.
00:20:20.160
Another common problem arises from database interactions. Database queries might be suboptimal, which can create a bottleneck. This talk won’t cover database optimization methods as that’s a topic in itself.
00:20:50.880
A significant issue for GraphQL users is the 'N+1 problem,' which occurs when you have a list of entities with associations. If you forget to preload these associations, you could execute one query to load the entities and many additional queries to retrieve associations for each.
00:21:33.520
To mitigate this, the simplest solution is to eager-load all associations using 'includes' or 'eager_load.' While this approach works well for smaller applications, it isn't always efficient, since it may load unnecessary data. Thus, using techniques like 'lookahead' can improve efficiency by checking whether an association was actually requested before loading it.
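GraphQL Ruby exposes this via the `:lookahead` extra, which tells a resolver which sub-fields the client actually selected (the `Post`/`comments` names below are illustrative):

```ruby
class QueryType < GraphQL::Schema::Object
  field :posts, [PostType], null: false, extras: [:lookahead]

  # Preload comments only if the client's query actually selects them
  def posts(lookahead:)
    scope = Post.all
    scope = scope.includes(:comments) if lookahead.selects?(:comments)
    scope
  end
end
```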
00:22:38.960
Lazy preloading is another useful technique, where associations are loaded only when they are accessed for the first time. Various gems can implement this feature, and I have one called 'ar_lazy_preload' for those interested.
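Usage with ar_lazy_preload looks roughly like this (based on the gem's README):

```ruby
# Associations are preloaded for the whole collection the first time
# any record in it touches them:
users = User.lazy_preload(:posts).limit(10)
users.first.posts # one query loads posts for all 10 users at once

# Or enable the behavior globally for every relation:
ArLazyPreload.config.auto_preload = true
```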
00:23:15.680
Moving on to advanced techniques for improving application performance, let’s discuss HTTP caching, which allows us to avoid loading data from the database when we know that the client already has valid data. We can set cache control and expiration headers to specify how long the data is valid.
00:24:08.480
Additionally, we can use 'ETags' or 'Last-Modified' headers for determining when data needs to be refreshed. However, typical HTTP caching can be challenging for GraphQL because it primarily uses POST requests that are not cacheable according to the specification.
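In Rails, conditional responses can be sketched with `stale?`, assuming the endpoint also accepts GET requests as discussed next (controller and schema names are assumptions):

```ruby
class GraphqlController < ApplicationController
  def execute
    result = AppSchema.execute(params[:query], variables: params[:variables])
    # Rails derives an ETag from the given value; if it matches the
    # client's If-None-Match header, a 304 with an empty body is sent.
    if stale?(etag: result.to_h)
      render json: result
    end
  end
end
```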
00:25:05.440
A solution is to allow GET requests for your GraphQL endpoint, enabling you to build an HTTP caching mechanism from scratch. For instance, if you have static data, like banners that don’t change frequently, you can cache them efficiently.
00:25:51.440
Another useful technique is called 'Apollo Persisted Queries.' Instead of sending the entire query string, you send a hash representing the query. The server checks its storage for the query associated with that hash. If found, the server can use it directly; if not, it returns an error until the query is saved.
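The handshake can be illustrated with a toy in-memory store (this is a simplified sketch of the protocol, not the Apollo implementation; real setups persist queries in Redis or a database):

```ruby
require "digest"

class QueryStore
  def initialize
    @store = {}
  end

  # The client normally sends only the SHA-256 hash; the full query
  # string is sent once, only after the server reports a miss.
  def resolve(hash:, query: nil)
    return @store[hash] if @store.key?(hash)
    return :persisted_query_not_found if query.nil?

    # First time we see this query: verify the hash, then persist it
    raise "hash mismatch" unless Digest::SHA256.hexdigest(query) == hash
    @store[hash] = query
  end
end

store = QueryStore.new
query = "{ items { id price } }"
hash  = Digest::SHA256.hexdigest(query)

first  = store.resolve(hash: hash)               # miss: client must retry
second = store.resolve(hash: hash, query: query) # query saved
third  = store.resolve(hash: hash)               # hit: no query string sent
```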
00:26:50.720
With GraphQL Ruby, there’s an existing implementation, although it's currently in the pro version. I have also created my own implementation, a gem called 'graphql-ruby-persisted_queries,' which stores queries and looks them up by hash.
00:27:36.080
To sum up, first ensure you have monitoring configured before making optimizations. If issues arise with specific queries, you can use a custom tracer to pinpoint performance issues directly.
00:27:51.840
If parsing issues persist, consider simplifying your queries or implementing compiled queries. It's crucial to address any N+1 problems as they can lead to significant performance issues. Keep your responses small, as smaller responses can be serialized quickly, and explore various caching strategies for better performance.
00:28:57.760
Now, I'm ready to take your questions. I will also post my slides on Twitter later, so please follow me for updates. Thank you all for your attention!
00:29:31.680
Thank you, Dima! Once again, I will drop your links in the stream chat. If you’re attending from the other side of the event, feel free to share your links with us too. We have a couple of questions. Are you ready to answer them?
00:29:44.800
Yes, I’m ready! Let’s do it.