GraphQL on Rails

EuRuKo 2016

00:00:05.050 Okay, so we're back and we'll continue with our next speaker.
00:00:08.830 That will be Marc-André Giroux, who comes all the way from Canada. He's from Shopify and is really into GraphQL, Rails, and Relay. He also likes to lift barbells over his head just for fun. So please welcome Marc-André Giroux.
00:00:42.859 So a lot of people! Who actually uses GraphQL? Okay, a bit less. Yes, I'm really happy to be here in Sofia for a couple of days. Yesterday, I tried Rakiya, and it was so good! If you haven't tried it yet, do it! I suspect I might have a few more tonight after the talks.
00:01:05.000 As you heard, I come from cold Montreal, Canada, and I work at Shopify, where I help people sell things on the Internet. So today, we're going to talk about GraphQL, but before we start, let me tell you a little story about how we currently fetch data from our client apps and how we structure our APIs generally.
00:01:27.039 Let's start with a simple UI component and examine what data we need and how we're going to fetch it. I've chosen the GitHub cart component here. Let's take a look at what we need. We might have a cart component and we may need a cart resource for it. As you see, we have products; we might need a product resource as well, and we might decide to have a product image nested inside that.
00:01:56.329 So, generally, how do we fetch this? We might start by using reusable endpoints or a typical REST architecture. In a typical REST architecture, we have resources that expose hyperlinks to other resources. By following these links, we're going to change state from, let's say, our cart resource to the product resource.
00:02:26.590 We might start by getting our cart, and that cart resource is going to return us maybe a few hyperlinks toward the products. So we're going to fetch those, and these product resources might point us toward the product images, and we're going to fetch those too. That provides us some advantages, but there's also something a little annoying with this approach, especially on mobile and slow networks, and that is way too many round trips.
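To make the round-trip cost concrete, here is a minimal sketch in plain Ruby. It assumes a cart with three products and one image per product, which is one layout that produces the seven requests mentioned in the talk; the paths, product names, and the in-memory `RESOURCES` table are all hypothetical stand-ins for real HTTP endpoints.

```ruby
# Hypothetical in-memory "API": each key stands in for one REST endpoint.
RESOURCES = {
  "/cart"       => { "links" => ["/products/1", "/products/2", "/products/3"] },
  "/products/1" => { "name" => "Shirt", "links" => ["/images/1"] },
  "/products/2" => { "name" => "Mug",   "links" => ["/images/2"] },
  "/products/3" => { "name" => "Cap",   "links" => ["/images/3"] },
  "/images/1"   => { "url" => "shirt.png", "links" => [] },
  "/images/2"   => { "url" => "mug.png",   "links" => [] },
  "/images/3"   => { "url" => "cap.png",   "links" => [] }
}.freeze

# Follows every hyperlink from the starting resource and returns the paths
# requested, in order. Each entry represents one round trip.
def crawl(path, requested = [])
  requested << path
  RESOURCES.fetch(path)["links"].each { |link| crawl(link, requested) }
  requested
end

ROUND_TRIPS = crawl("/cart")
puts "#{ROUND_TRIPS.length} round trips: #{ROUND_TRIPS.join(', ')}"
```

Every extra product adds two more requests under these assumptions, which is why the cost grows quickly on slow networks.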
00:02:58.290 Imagine that on mobile; I think we had seven round trips there. It's kind of annoying. So we try to find a solution to this. Sometimes we have something like a query parameter where we can tell it to give us the cart but also expand that resource and provide us with the products resource too. So that works, but what about deeper nesting? What if I want to select only some fields of these products? Maybe you're going to do another type of query parameter and say, 'Oh, I'd like the products, but only the name, description, and price.' That works too.
00:03:30.650 I've seen many ways of handling this by the way—different query parameters—and we kind of have to reinvent the wheel every time. I've yet to see a nice and pretty solution to this, so we might be tempted to use something else instead, such as custom endpoints. Imagine I want a cart with specific requirements and instead of using REST and having full resources returned to me, I'm just going to build my own endpoint that returns everything I need. Pretty easy, right? We're done and our client’s app is perfect!
00:04:05.040 However, as you might expect, things can get a little more complicated. We might have a cart version for a completely different view in our app and we're going to build a custom endpoint for that too. We might build another custom endpoint for the cases when we only want products and images but nothing else. It's a little annoying, right? So what ends up happening is that every time our client updates views or creates a new view version, or if the model itself changes, our server needs to either update endpoints or, even worse, create new endpoints specific to these changes.
00:04:54.550 If we come back to the REST or reusable endpoints method, we notice we have a specific relationship between the client and server. Our clients might say, 'Yo, give me the resource with ID ABC,' and the server would simply respond, 'Here’s that resource with ID ABC.' But what happens when we have a second version? The server needs to respond with that version 2. This means the server becomes responsible for the shape of the data the client wants. So, when views change and the model changes too, the server must keep up with every change. Imagine that every time our view changes, our endpoint changes, and the model changes, and we end up realizing that our views and models are pretty coupled.
00:06:16.020 This is where GraphQL comes in. With GraphQL, you can finally decouple your views from your models. But before we get deeper, let’s clarify what GraphQL is not: GraphQL is not a database, and it's not a special kind of graph database either. It’s also not a library—you can't just install GraphQL and solve all your problems. Furthermore, it's not language-specific.
00:06:56.640 So, what is it? GraphQL is a simple query language that allows us to fetch data shapes instead of resources by specifying exactly what we require. It's also a server specification that explains how a server parses and executes that query.
00:07:13.200 Here is a bit of my hello world example. This query here is basically fetching my shop and its name. We have selection sets, represented by brackets, which tell GraphQL that we want to select specific fields on that object. So it’s very simple. When we send that query to a GraphQL server, it gets parsed, validated, and finally executed. The cool part is that this query language is similar to JSON, but without the values, and what we get in return is actually JSON. So our query shape matches our response shape almost one-to-one.
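The way the response mirrors the query can be sketched in plain Ruby. This is not how a GraphQL server is implemented; it only illustrates the shape matching. The field name `myShop`, the shop data, and the nested-hash representation of the selection are assumptions made for the example.

```ruby
require "json"

# The query as it might be written, shown only for comparison.
QUERY = <<~GRAPHQL
  {
    myShop {
      name
    }
  }
GRAPHQL

# The same selection as nested hashes: a leaf field maps to nil,
# a field with sub-selections maps to another hash.
SELECTION = { "myShop" => { "name" => nil } }.freeze

# Hypothetical backing data with more fields than the query asks for.
DATA = { "myShop" => { "name" => "Example Shop", "currency" => "CAD" } }.freeze

# Copies only the selected fields, so the result mirrors the selection's shape.
def select_fields(selection, data)
  selection.each_with_object({}) do |(field, sub_selection), result|
    value = data.fetch(field)
    result[field] = sub_selection ? select_fields(sub_selection, value) : value
  end
end

puts QUERY
puts JSON.pretty_generate(select_fields(SELECTION, DATA))
```

The `currency` value never appears in the output because the selection did not ask for it.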
00:07:52.700 Here we have the name of my shop, and we understand easily that 'name' is a field on my shop. But what is my shop here? It's not some special field or some magic; it's that GraphQL needs you to define entry points into the graph. For example, this query would be equivalent in GraphQL if you defined both on the root of your graph. Getting the shop with ID 1 would simply be a programmatic way of getting my shop, but both are valid.
00:08:29.490 In this case, we could define that on the root of our GraphQL schema, where 'shop' takes an ID argument. This leads us to discover another feature of GraphQL: arguments. Arguments can be applied on fields, and you can think of fields as simple functions that take arguments and return a value.
00:09:13.400 Let’s take a more complicated example. Here, not only do we have the name of the shop, but we also have its location and products. Location and products are not exactly like the name, right? The name is a scalar field, which returns simply a string. However, location and products here are complex fields, referred to as object types, and we can query other fields on these object types.
00:09:52.610 The response is going to resemble our previous example but filled with the actual values. All that's possible because of the type system at the core of GraphQL, which is a powerful static type system. This allows you to define exactly what's possible. Let's walk through that query step by step. As I talked about earlier, my shop is at the root of our schema, and we have that type defined. On the query root, we could fetch my shop or we could get the shop with an ID. The returned type would contain the name, location, and products of the shop, and we know this because there's a type attached to it.
00:10:51.440 You can imagine how the rest of the query works: similarly, location returns a location or address type, which contains the city and address, while products returns a product type with name and price. If I take this query, does it look valid to you? Obviously, we know it's not valid because it doesn't make sense to have a 'no' field on my shop, and the type system knows that too. It validates that the query matches the schema before anything is executed.
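The validation step can be sketched as a walk over the selection that checks each field against a type table. This is a simplified illustration, not the validation algorithm from the GraphQL specification: the `SHOP_SCHEMA` table and its type names are assumptions, list types are ignored, and selections are again written as nested hashes.

```ruby
# Hypothetical schema: each type maps field names to the type they return.
SHOP_SCHEMA = {
  "Query"    => { "myShop" => "Shop" },
  "Shop"     => { "name" => "String", "location" => "Location", "products" => "Product" },
  "Location" => { "city" => "String", "address" => "String" },
  "Product"  => { "name" => "String", "price" => "Float" }
}.freeze

# Returns a list of error messages; an empty list means the selection is valid.
# Scalar types have no entry in the table, so selecting a sub-field on them
# is reported as an unknown field.
def validate(selection, type_name = "Query")
  fields = SHOP_SCHEMA.fetch(type_name, {})
  selection.flat_map do |field, sub_selection|
    field_type = fields[field]
    next ["Field '#{field}' doesn't exist on type '#{type_name}'"] unless field_type
    sub_selection ? validate(sub_selection, field_type) : []
  end
end

puts validate({ "myShop" => { "name" => nil, "location" => { "city" => nil } } }).inspect
puts validate({ "myShop" => { "no" => nil } }).inspect
```

Because the check needs only the schema and the query, the same function could run on a client that has a copy of the schema.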
00:11:18.970 If the query doesn't match, validation lets the server refuse to execute it and return an error instead. Not only does the server know that, but the client can know that too through introspection. At the root of a GraphQL schema, we can find a special __schema field that enables us to query meta fields on a GraphQL schema. We can fetch the fields, the description of the fields, the names of objects, and this allows the client to know almost as much about the schema as the server does. This enables really fun features like auto-documentation.
00:12:01.080 With all of this capability, we have the names and descriptions of those fields, which allows us to generate boilerplate code. Imagine we have some front-end code where we can guess the GraphQL types behind them because we have the schema. We can do static validation on the client; it's pointless to send a query with 'no' on my shop if we know client-side that it's not possible. We can also do some really nice IDE integration, like auto-completion of GraphQL queries and validation within the IDE.
00:12:43.610 An example of this is GraphiQL, a graphical schema explorer built in JavaScript. It allows us to navigate through our entire API easily, understanding everything that's possible on the right-hand side. This is when I first became amazed by GraphQL. Its capabilities are exciting!
00:13:20.080 So far, we've been sending a huge query containing everything we need to the GraphQL server. Yet, as developers, we like to decompose things into smaller units and perhaps later recompose them. Similarly, we do this with modern UI frameworks like React, where we divide components. For instance, if we had a product component, the only data it cares about is the product data, namely the ID, name, and price.
00:14:01.810 It would be nice if we had that kind of abstraction in GraphQL. It turns out it exists, and it’s called fragments. We can extract certain parts of our queries into fragments. All we need is to name it and identify the type on which it applies, enabling reuse of that fragment inside our original query. Each component can have its own fragment, and at runtime, when the query is actually sent, it gets consolidated into a larger query.
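A small sketch of how fragments might be kept next to their components and combined before sending. The fragment name `ProductFields`, the type name `Product`, and the `build_document` helper are hypothetical; clients such as Relay do this composition with their own tooling.

```ruby
# Fragment owned by a hypothetical product component.
PRODUCT_FRAGMENT = <<~GRAPHQL
  fragment ProductFields on Product {
    id
    name
    price
  }
GRAPHQL

# Parent query that spreads the fragment instead of listing product fields.
SHOP_QUERY = <<~GRAPHQL
  query {
    myShop {
      name
      products {
        ...ProductFields
      }
    }
  }
GRAPHQL

# Joins the query with the fragments it uses into one document to send.
def build_document(query, fragments)
  ([query] + fragments).map(&:strip).join("\n\n")
end

DOCUMENT = build_document(SHOP_QUERY, [PRODUCT_FRAGMENT])
puts DOCUMENT
```

If the product component later needs another field, only its fragment changes; the parent query stays the same.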
00:15:00.690 However, with GraphQL, we don't just have 'read' capabilities; we can also write mutations. Mutations are quite similar to normal fields, except they aren't nested. We have a mutation root and a query root. On the mutation root, we deal only with top-level actions, such as creating a product. They have side effects, but we maintain the same capability of retrieving only what we want back from the result.
00:15:54.890 The cool thing about GraphQL is that you don't need to change much about your existing business logic. It's all based on your existing code. In Ruby, we can use the GraphQL Ruby gem, which helps create the schema and execute the queries we receive. Let’s look at some code examples. We start by defining our product type, which is quite simple. We give it a name and a description, the basic stuff, and then we’ll add fields to that product.
00:16:45.890 For example, the field name here has one-to-one mapping with the database. Every field has a resolve function, which takes three arguments: the parent object, the arguments passed to the field, and context, which is a top-level context you pass with your query. This could include session-related data, authorization, authentication, or anything else really. However, not all fields are straightforward to map to the database. We can compute some new fields as needed.
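The three-argument resolve function can be illustrated with plain Ruby lambdas. This is deliberately not the GraphQL Ruby gem's API, whose exact DSL is not shown in the transcript; `ProductRecord`, `PRODUCT_FIELDS`, and the field names are hypothetical.

```ruby
# Stand-in for a database record.
ProductRecord = Struct.new(:name, :price_cents)

# Each field maps to a resolve function taking the parent object,
# the field's arguments, and the query-wide context.
PRODUCT_FIELDS = {
  # One-to-one mapping with a stored attribute.
  "name" => ->(obj, _args, _ctx) { obj.name },
  # Computed field: formats the stored cents using a currency from the context.
  "displayPrice" => lambda do |obj, _args, ctx|
    format("%.2f %s", obj.price_cents / 100.0, ctx[:currency])
  end,
  # Field that uses its arguments.
  "shortName" => ->(obj, args, _ctx) { obj.name[0, args.fetch(:length)] }
}.freeze

# Looks up the field's resolve function and calls it.
def resolve_field(field, obj, args = {}, ctx = {})
  PRODUCT_FIELDS.fetch(field).call(obj, args, ctx)
end

barbell = ProductRecord.new("Barbell", 19_999)
puts resolve_field("name", barbell)
puts resolve_field("displayPrice", barbell, {}, { currency: "CAD" })
puts resolve_field("shortName", barbell, { length: 3 })
```

Only `name` reads a stored attribute directly; the other two fields are computed at resolve time, which is the distinction the talk draws.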
00:17:39.650 Now that we have our product type, we can define our shop type, where we establish our products field. It's relatively easy, and it keeps the schema systematic. We build our query type for the query root and establish our schema like this. The tricky bit is that our queries can be quite large, meaning we can't really use query parameters due to length constraints. Thus, we'll need to POST to our endpoint.
00:18:30.270 Looking at our controller, we need three components: the query string, which contains our query; the variables, which are any arguments we're going to pass to our fields; and the context object mentioned before. With these three elements, we simply tell our schema to give us a response based on the query. Using GraphQL means our controller handles HTTP-related tasks like cookies and sessions while the rest gets directed to your schema.
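The controller's job can be sketched without Rails as a single method that extracts the three inputs and delegates. `StubSchema` is a stand-in that only echoes what it receives; its `execute` signature, the parameter names, and the `current_user` context key are assumptions for illustration, not the code shown in the talk.

```ruby
require "json"

# Stand-in for a real schema object; it echoes its inputs so the flow is visible.
module StubSchema
  def self.execute(query, variables: {}, context: {})
    {
      "data" => {
        "received" => {
          "query" => query,
          "variables" => variables,
          "viewer" => context[:current_user]
        }
      }
    }
  end
end

# What a controller action would do, minus the framework: pull the three
# inputs out of the request and hand them to the schema.
def handle_graphql_request(params, session)
  query     = params.fetch("query")
  variables = params.fetch("variables", {})
  context   = { current_user: session[:user_id] }
  JSON.generate(StubSchema.execute(query, variables: variables, context: context))
end

puts handle_graphql_request(
  { "query" => "{ myShop { name } }", "variables" => { "id" => 1 } },
  { user_id: 42 }
)
```

Everything HTTP-specific stays in `handle_graphql_request`; the schema never sees the request or the session directly.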
00:19:24.790 But GraphQL is not without its challenges. There's the infamous N+1 queries problem. Let’s understand it better with this example: let's say we have our shop type's products field, which returns an array of products associated with your shop. If we have an image field on the product type, the server will load each image individually since resolve functions are not executed in batches.
00:20:05.450 Shopify implemented a solution for the N+1 query problem called GraphQL Batch. It works by defining loaders that handle loading an image without fetching it immediately. Instead, we tell it to remember the ID and batch all resolves, loading the images at once. This method efficiently avoids the N+1 query issue.
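The batching idea can be sketched in plain Ruby as a loader that collects IDs and performs a single lookup when the first value is needed. This is not the GraphQL Batch API; `ImageLoader`, the `IMAGES` table, and the lambda-based lazy values are assumptions that only illustrate the technique.

```ruby
# Stand-in for an images table.
IMAGES = { 1 => "shirt.png", 2 => "mug.png", 3 => "cap.png" }.freeze

class ImageLoader
  # Number of batched lookups performed, standing in for database queries.
  attr_reader :query_count

  def initialize
    @pending_ids = []
    @query_count = 0
  end

  # Remembers the id and returns a lazy value instead of querying right away.
  def load(id)
    @pending_ids << id
    -> { results.fetch(id) }
  end

  private

  # Runs one batched lookup for every id collected so far, the first time
  # any lazy value is forced. Ids registered after that point are not
  # covered by this simplified sketch.
  def results
    @results ||= begin
      @query_count += 1
      IMAGES.slice(*@pending_ids.uniq)
    end
  end
end

loader = ImageLoader.new
lazy_images = [1, 2, 3].map { |id| loader.load(id) }
puts lazy_images.map(&:call).inspect
puts "lookups: #{loader.query_count}"
```

Three products are resolved with one lookup instead of three, which is the difference between N+1 queries and two.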
00:20:48.090 Another challenge is HTTP caching. As queries become complex, we find ourselves constrained to using POST requests, which means we lose HTTP caching. The solution tends to confuse people at first, but it's a client-side cache, and that is what most developers opt for. The cache is normalized, which resolves the issues that come up when multiple queries fetch the same ID in different contexts.
00:21:24.560 Take, for example, fetching all the products on a shop and later requesting a particular product with ID 1. I might end up with that product cached twice, once for each request. A normalized client-side cache instead stores the product once, and both results point to that same entry, so the cache stays consistent and there are no duplicates to keep in sync.
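A minimal sketch of a normalized cache, assuming records are keyed by type name and ID. The `NormalizedCache` class and its key format are hypothetical and far simpler than what clients such as Relay or Apollo implement.

```ruby
class NormalizedCache
  def initialize
    @records = {}
  end

  # Stores a record under a key built from its type and id, merging new
  # fields into whatever is already known about it.
  def write(type, record)
    key = "#{type}:#{record.fetch('id')}"
    (@records[key] ||= {}).merge!(record)
    key
  end

  # Returns the merged record for a type and id, or nil if it was never written.
  def read(type, id)
    @records["#{type}:#{id}"]
  end

  # Number of distinct records held.
  def size
    @records.size
  end
end

cache = NormalizedCache.new
# First request: the product arrives as part of the shop's product list.
cache.write("Product", { "id" => 1, "name" => "Shirt" })
# Second request: the same product fetched on its own, with another field.
cache.write("Product", { "id" => 1, "price" => 20 })

puts cache.size
puts cache.read("Product", 1).inspect
```

Both requests end up updating one entry, so any view reading product 1 sees the combined, current data.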
00:22:34.690 Additionally, Apollo and Relay are two examples of effective client-side caches. Their utility lies in restoring HTTP caching functionality while allowing us to perform advanced operations with the data available in our cache.
00:23:06.390 We can apply similarly useful measures from the server side, implementing creative structures to mitigate complex queries. Employing simple query timeouts to terminate queries that take too long or applying limits on query depth can safeguard the server integrity.
00:23:32.690 Query complexity assessment can involve scoring each field and restricting queries that exceed the set threshold from being executed. Everything I've showcased so far is part of the GraphQL specification, usable in most GraphQL servers today, alongside upcoming developments that express even more possibilities.
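Depth limits and complexity scoring can both be sketched as recursive walks over the selection. The per-field costs, the thresholds, and the nested-hash representation of the query are assumptions for illustration; real servers compute these on the parsed query.

```ruby
# Depth of a selection given as nested hashes (leaf fields map to nil).
def query_depth(selection)
  return 0 if selection.nil? || selection.empty?

  1 + selection.values.map { |sub| query_depth(sub) }.max
end

# Hypothetical per-field costs; unlisted fields cost 1.
FIELD_COSTS = { "products" => 10 }.freeze

# Sums the cost of every selected field, including nested ones.
def query_complexity(selection)
  return 0 if selection.nil?

  selection.sum { |field, sub| FIELD_COSTS.fetch(field, 1) + query_complexity(sub) }
end

# A query runs only if it stays within both limits.
def allowed?(selection, max_depth:, max_complexity:)
  query_depth(selection) <= max_depth && query_complexity(selection) <= max_complexity
end

SHOP_SELECTION = {
  "myShop" => { "name" => nil, "products" => { "name" => nil, "price" => nil } }
}.freeze

puts query_depth(SHOP_SELECTION)
puts query_complexity(SHOP_SELECTION)
puts allowed?(SHOP_SELECTION, max_depth: 3, max_complexity: 20)
```

Giving list fields such as `products` a higher cost is one way to make the score reflect how much work a query is likely to cause.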
00:24:01.810 One significant development is subscriptions. While querying allows flexible data fetching, real-time updates are quite challenging to implement with static caches. Subscriptions enable clients to express real-time data interests, prompting the server to send responses upon changes to the related data.
00:24:58.250 Another interesting feature under consideration is deferred queries. Imagine requiring some data from your query now while being comfortable waiting for other pieces of information later. With GraphQL's @defer directive, you tell the server which parts you need first and which parts can arrive later, allowing efficient response generation.
00:25:35.040 This approach flips the conventional interaction between clients and servers. Instead of the client making requests for static resources, we express our requirements for data shapes, giving the server the flexibility to meet those specifications. This promotes remarkable efficiency and predictability in how we fetch data, as it prevents over-fetching or under-fetching of data.
00:26:07.920 I'm thrilled to see the direction GraphQL is headed. Just last week, GitHub announced their public GraphQL API. We've been experimenting with it at Shopify and it's exciting to see other companies adopt these practices too. I hope you all will consider using GraphQL and have a bit of fun doing so. Thank you!