GraphQL - From Beginner To Expert
GraphQL is growing fast. Created by Facebook in 2012 and released to the public in 2015, it has taken the world by storm. Companies using it include Airbnb, Shopify, Lyft, GitHub and hundreds more. So something seems to be working right. Let’s take a look at why it might make sense to start your next project with GraphQL or even to convert your current API.
Why GraphQL
The standard before GraphQL came along was RESTful web services. While you might get 12 different answers if you ask 10 people what REST means, in general you have a main API endpoint (e.g. https://api.github.com) which exposes URLs for resources:
// repos for a specific organisation
https://api.github.com/orgs/:org/repos
// assigned issues to the authenticated user
https://api.github.com/issues
Then you can use the standard HTTP verbs to query those endpoints. A GET request retrieves data, a POST request invokes the resource (usually to create a new resource), a PATCH request updates a resource, etc.
GraphQL changes all that. There is one root endpoint (e.g. https://api.github.com/graphql) against which all queries are run. There is also typically only one HTTP verb to use: you POST your query with a JSON-encoded body. (Many servers also accept queries via GET, and the introspection query with which you retrieve the schema is an ordinary query sent the same way.)
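To make the single-endpoint idea concrete, here is a minimal sketch of how a client might package a query. The helper name buildGraphQLRequest is hypothetical, and the code only builds the request options; it does not send anything.

```javascript
// Hypothetical helper: packages a GraphQL query into options for fetch().
// Nothing is sent here; the function only describes the request.
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: "POST", // this sketch always uses POST
    headers: { "Content-Type": "application/json" },
    // The JSON-encoded body carries the query text and its variables.
    body: JSON.stringify({ query, variables }),
  };
}

const request = buildGraphQLRequest("{ allFilms { title } }");
console.log(request.method);
console.log(request.body);
```

Passing these options to fetch together with the endpoint URL would send the query. Real APIs such as GitHub's additionally expect an authorization header.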
What are the advantages of GraphQL?
So that’s nice, I don’t have to remember those HTTP semantics. What else?
Introspection
Every GraphQL API exposes its complete, up-to-date schema at any time, so as a frontend developer you always see exactly what you can query (of course you can turn this introspection off in production).
If you want to see an example, take a look at this Star Wars API.
No Overfetching
You can query exactly those fields that you need from the API. Let’s go back to our Star Wars API. You want a list of all movie titles. The movies resource has a lot more data about the film than the titles, though. There’s a list of characters, the director, producers, the planets they are set on, and a lot more. (Try to find it yourself. Go to https://swapi.graph.cool/, open the docs and open the allFilms query).
Now, loading all this data from possibly different database tables and sending all of it over the wire is quite a lot of work for just wanting the movie titles. Of course you could ask your friendly backend developer to create a new endpoint for just the titles. But what if the next developer needs an endpoint for directors?
With GraphQL you can ask for exactly what you need:
{
  allFilms {
    title
  }
}
Which results in this response (somebody should update it with the newest movies):
{
  "data": {
    "allFilms": [
      { "title": "A New Hope" },
      { "title": "Attack of the Clones" },
      { "title": "The Phantom Menace" },
      { "title": "Revenge of the Sith" },
      { "title": "Return of the Jedi" },
      { "title": "The Empire Strikes Back" },
      { "title": "The Force Awakens" }
    ]
  }
}
Now your product owner changed her mind and wants to show the director alongside the title? No problem:
{
  allFilms {
    title
    director
  }
}
Which results in this response:
{
  "data": {
    "allFilms": [
      { "title": "A New Hope", "director": "George Lucas" },
      { "title": "Attack of the Clones", "director": "George Lucas" },
      { "title": "The Phantom Menace", "director": "George Lucas" },
      { "title": "Revenge of the Sith", "director": "George Lucas" },
      { "title": "Return of the Jedi", "director": "Richard Marquand" },
      { "title": "The Empire Strikes Back", "director": "Irvin Kershner" },
      { "title": "The Force Awakens", "director": "J. J. Abrams" }
    ]
  }
}
(Good that you didn’t have to ask your backend dev for the title endpoint. He might get mad.)
No Underfetching
The opposite of overfetching is of course underfetching, which means you don’t get all the data you need in your request.
Let’s say you want to fetch all starships appearing in the movie A New Hope along with their cargo capacity (this is important to me!).
With a REST API you would probably need one query first to get a list of the starships from the movie and then a second one (to the /starship endpoint or something) to get the names and cargo capacities. This means at least two round trips from the client to the server. And in the worst case you need a separate API call for every starship’s cargo capacity (in our case 8).
With GraphQL you just send the following query:
{
  Film(title:"A new hope") {
    starships {
      name
      cargoCapacity
    }
  }
}
which results in
{
  "data": {
    "Film": {
      "starships": [
        { "name": "Sentinel-class landing craft", "cargoCapacity": 180000 },
        { "name": "Death Star", "cargoCapacity": 1000000000000 },
        { "name": "Millennium Falcon", "cargoCapacity": 100000 },
        { "name": "Y-wing", "cargoCapacity": 110 },
        { "name": "X-wing", "cargoCapacity": 110 },
        { "name": "TIE Advanced x1", "cargoCapacity": 150 },
        { "name": "Star Destroyer", "cargoCapacity": 36000000 },
        { "name": "CR90 corvette", "cargoCapacity": 3000000 }
      ]
    }
  }
}
No versioning needed.
With GraphQL you don’t need to version your schema if you are careful in your schema design. You can also add deprecation warnings to individual fields.
Disadvantages
Setting up a GraphQL server might be overkill for small APIs. Also concerns like rate-limiting are not straightforward (but definitely doable - check the later chapters).
The Basics of GraphQL
Type System
At the center of GraphQL is the schema with its type system. It is the contract between frontend and backend. At any time the frontend client can introspect the schema to see what data is available and in what form.
Every field can be a Scalar type, an Enum, a list or an object type. A Scalar just means a basic primitive. There are Int, Float, String, Boolean and ID.
Queries and Mutations
With Queries you fetch data (think of a GET request in a REST API); with Mutations you send data to the server (think of a POST request).
Each can be modified with arguments, like we did when we asked for the movie with a specific title:
{
  Film(title:"A new hope") {
    starships {
      name
      cargoCapacity
    }
  }
}
I could keep going but there are a lot of great introductions to GraphQL on the internet so I will focus our time here on the advanced parts. To learn more take a look at the great Introduction to GraphQL in the official docs and then come back to this article.
Schema Design
While changing the API with GraphQL is easier than with REST, try to put some effort in your first design before you start implementing. Think about what kind of queries you want to support.
While there are tools like Hasura that can create your whole API based on your database schema, this might not always be the best idea. It couples your database implementation to your API and might make refactorings more difficult.
Expose as little as possible
The less you expose the less you have to deprecate later. Will the client really need those fields? Adding them later is easier than removing them.
Explicit naming also helps you avoid breaking changes. Prefer imageUrl over image, because you might introduce an image object later (with url, title and thumbnail). Likewise, prefer likeCount over likes, because a list of the people who liked something might come later.
Consistent Naming
A query for retrieving repositories could be called getRepositories, findRepositories or just repositories. A mutation for creating one could be called addRepository, createRepository or newRepository. You’re free to pick what you like but it’s best to stay consistent.
The same goes for symmetric mutations. If you publishPost the opposite should be unpublishPost, not deletePost.
Specific Naming
Keep in mind that you might add fields later on. If your app has a feature for following another user, it might be better to call the mutation followUser instead of just follow from the beginning, as a new feature for following a project or organisation might come later as well.
Versioning
Why are types nullable by default? The spec could have been written to be non-nullable by default and made nullable with a ? (like: String?). One idea is that it makes deprecations easier, which in turn allows you to never need another version of your API. Having to support multiple versions adds complexity to the codebase and is error-prone.
So if you remove a field from your schema that was nullable anyway things will not break for the client.
So it is highly recommended to make your fields nullable if there is a chance that they will be removed in the future.
Use Descriptions
You can add a description to any type or field via the triple quote syntax:
"""
A banana represents exactly what you think it would
"""
type Banana {
}
Having these descriptions means every user definitely sees them when she checks the schema, instead of them being buried somewhere in a /docs page.
GraphQL Client-Side
Together with GraphQL itself the open-source ecosystem has been growing quite big. The two biggest players in this regard are arguably Apollo and Prisma.
The most well-known GraphQL client surely is Apollo Client. Facebook’s own library is called Relay.
An up-and-coming contender is Urql which tries to keep things simple but extensible (I’ve heard people say you need a PhD in Apollo to understand that stack completely).
In the following chapters I will implicitly talk about Apollo as it’s basically industry standard but most of it will apply to the other client libraries as well. I won’t go too deep into implementation details as those depend on your chosen client as well as your framework (React, Angular, Vue, etc.).
Mocking a Server
Once your team has defined the schema you don’t have to wait for the backend team to test the integration. Mocking a server is quite easy with, for example, mocking from GraphQL Tools. There are fine-grained controls for what exactly you want to return for each field and you can combine mocks with real resolvers. So once some backend functionality is implemented the mock can be replaced with the real resolver without hiccups.
Working with TypeScript
If you work with TypeScript you want to keep the GraphQL types in sync with the TypeScript types. One easy way to do it is with a code generator like GraphQL code gen. Just point it at your schema and it generates TS types you can import into your code. If the schema changes and you forgot to update your code you get a nice compilation error to tip you off.
Testing the client
End-to-end testing with, for example, Cypress shouldn’t change whether you have a REST or a GraphQL API, as we don’t want to test implementation details.
For unit tests you can mock responses with, for example, Apollo’s Mocked Provider. This works similarly to how you have always mocked your API responses. You just hard-code the response so your test does not depend on an outside entity and you can focus on your own code.
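The idea of hard-coding a response can be sketched without any client library. In this minimal example the function names loadFilmTitles and mockFetchGraphQL are hypothetical; the code under test receives its transport as a parameter so a test can swap in a canned response.

```javascript
// Code under test: extracts film titles from a GraphQL response.
// The transport is passed in, which makes it easy to replace in tests.
async function loadFilmTitles(fetchGraphQL) {
  const response = await fetchGraphQL("{ allFilms { title } }");
  return response.data.allFilms.map((film) => film.title);
}

// Hard-coded response standing in for the real API.
const mockFetchGraphQL = async () => ({
  data: { allFilms: [{ title: "A New Hope" }, { title: "Return of the Jedi" }] },
});

loadFilmTitles(mockFetchGraphQL).then((titles) => console.log(titles));
```

A client-specific helper such as a mocked provider plays the same role as mockFetchGraphQL here, just wired into the client's own component tree.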
Implementing Servers
Most server implementations in the JS/TS world are actually based on Apollo Server (like GraphQL Yoga or AWS Amplify). It comes with integrations for all well-known server frameworks like Express and has some nice extra goodies for prototyping and testing.
If you use a Django server, check out Graphene. If Spring is your thing, check out GraphQL Java.
Resolver Design
Resolvers are the bread and butter of backend GraphQL development. Here you define how the fields of a query should resolve to values we can send back to the client and how mutations should be processed.
Don’t overuse this layer, though. Keep your domain logic out of it and keep resolvers in general as dumb as possible. For example, keep your authorization code out of it. Delegate logic to a deeper layer (a service layer for example).
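As a rough illustration of a "dumb" resolver, the sketch below keeps the filtering logic in a service object and lets the resolver only forward arguments. The names starshipService, findByMinCargo and the minCargo argument are made up for this example.

```javascript
// Hypothetical service layer: owns the domain logic and the data access.
const starshipService = {
  ships: [
    { name: "Millennium Falcon", cargoCapacity: 100000 },
    { name: "X-wing", cargoCapacity: 110 },
  ],
  findByMinCargo(minCargo) {
    return this.ships.filter((ship) => ship.cargoCapacity >= minCargo);
  },
};

// The resolver stays thin: it only maps GraphQL arguments to a service call.
const resolvers = {
  Query: {
    starships(obj, args, context) {
      return context.starshipService.findByMinCargo(args.minCargo ?? 0);
    },
  },
};

const result = resolvers.Query.starships(null, { minCargo: 1000 }, { starshipService });
console.log(result.map((ship) => ship.name));
```

Because the service is reached through the context, other entry points (a REST route, a background job) can reuse the same logic, and the resolver can be tested with a fake service.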
Resolver Arguments
Let’s take a quick look at the four resolver arguments:
Query: {
  human(obj, args, context, info) {
    // [...]
  }
}
obj
Don’t make the mistake of thinking this is the ‘root’ object. It’s actually just the parent. If you have a query that gets resolved multiple layers deep you always get handed the object just one level up from the current one.
args
The arguments parameter is straightforward: An object with the arguments provided by the client.
context
The context object can be initialized on server start and mutated in resolver calls. Here you can store state from the current request like the user with his permissions, header information but also a database connection.
Try not to change the context from resolvers, though, as this might make it dependent on resolver execution order and quite brittle.
info
This is a grab bag of information of the current field that is resolved. The type is the following:
type GraphQLResolveInfo = {
  fieldName: string,
  fieldNodes: Array<Field>,
  returnType: GraphQLOutputType,
  parentType: GraphQLCompositeType,
  schema: GraphQLSchema,
  fragments: { [fragmentName: string]: FragmentDefinition }, // fragments used in query
  rootValue: any,
  operation: OperationDefinition, // the complete query
  variableValues: { [variableName: string]: any },
}
Misc
Per spec resolvers wait for promises (or their equivalent in other languages).
You don’t need to define resolvers for every field. If there is none defined by you the engine falls back to trivial resolvers. That means it just looks for fields on the parent object that are named like the field. So if the query asks for a firstName of an author field it will return author.firstName.
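The fallback behaviour can be approximated in a few lines. This is a simplified sketch of what such a trivial resolver might look like, not the code of any particular engine; trivialResolver and the sample author object are assumptions made for illustration.

```javascript
// Simplified stand-in for a default ("trivial") resolver: it reads the
// property on the parent object whose name matches the requested field.
function trivialResolver(parent, args, context, info) {
  if (parent == null) return undefined;
  const value = parent[info.fieldName];
  // If the property is a function, call it to obtain the value.
  return typeof value === "function" ? value.call(parent, args, context, info) : value;
}

const author = {
  firstName: "Ada",
  fullName() {
    return `${this.firstName} Lovelace`;
  },
};

console.log(trivialResolver(author, {}, {}, { fieldName: "firstName" })); // Ada
console.log(trivialResolver(author, {}, {}, { fieldName: "fullName" })); // Ada Lovelace
```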
Error Handling
Handling errors with GraphQL is quite confusing in the beginning. In general, the API can return data and errors in the same response. Because if one field of the query errors you still want to see the rest, right? Let’s take it step by step.
General considerations
Whatever happens on the GraphQL layer, you always get a 200 response code. Other codes are reserved for errors on the HTTP layer.
GraphQL errors don’t result in 4xx or 5xx error codes
Also, GraphQL errors are always caught so you don’t need a global mechanism to prevent your server from shutting down because of an uncaught exception.
User errors
If there is a syntax error (e.g. a missing bracket) or a validation error (e.g. you’re querying a field that doesn’t exist) you get back a response with only an error entry. Here is an example where I ask for a titles field instead of title:
{
  allFilms {
    titles
  }
}
Which results in this response
{
  "data": null,
  "errors": [
    {
      "locations": [
        {
          "column": 5,
          "line": 3
        }
      ],
      "message": "Cannot query field 'titles' on type 'Film'. Did you mean 'title' or 'vehicles'? (line 3, column 5):\n titles\n ^"
    }
  ]
}
As you can see we get back an errors array filled with error objects. Quite simple.
Errors during execution
If there is an error during execution you can get back both a data and an errors field. If an error is thrown while resolving a field, a null value is returned for it. If the field was declared non-null, the error is propagated to its parent. If the parent is nullable, the whole parent object returns null. If the parent was declared as non-null, though, the error propagates further upwards and so on.
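The propagation rule is easier to follow in code. Below is a toy executor written for this article, not taken from any GraphQL engine: each field is described by a hypothetical { nonNull, resolve, fields } object, resolvers are synchronous, and only thrown errors are handled.

```javascript
// Marker returned when a non-null field failed and its parent must become null.
const PROPAGATE = Symbol("propagate");

function executeField(name, field, parent, errors) {
  let value;
  try {
    value = field.resolve(parent) ?? null;
  } catch (error) {
    errors.push({ message: error.message, path: name });
    value = null; // a failed field resolves to null
  }
  if (value !== null && field.fields) {
    value = executeObject(field.fields, value, errors);
  }
  // A null in a non-null position cannot stay here: hand it to the parent.
  return value === null && field.nonNull ? PROPAGATE : value;
}

function executeObject(fields, parent, errors) {
  const result = {};
  for (const [name, field] of Object.entries(fields)) {
    const value = executeField(name, field, parent, errors);
    if (value === PROPAGATE) return null; // null out the whole object
    result[name] = value;
  }
  return result;
}

const selection = {
  hero: {
    nonNull: false,
    resolve: () => ({ name: "Luke" }),
    fields: {
      name: { nonNull: true, resolve: (hero) => hero.name },
      homeworld: { nonNull: true, resolve: () => { throw new Error("planet service down"); } },
    },
  },
  version: { nonNull: false, resolve: () => "1.0" },
};

const errors = [];
const data = executeObject(selection, {}, errors);
console.log(JSON.stringify({ data, errors }));
```

Here hero ends up as null because its non-null child homeworld failed, while the sibling field version still resolves, so the response carries both data and errors.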
Apollo Server adds a human-readable code to the extensions field of an error object so the client can handle problems more easily. See here.
Security
Regarding security there are two main concerns: handling auth and handling malicious actors. Let’s take a look at auth first.
Authentication and Authorization
Authentication is the question of who the user is, authorization is about what he is allowed to do.
Authentication is handled best by putting a user object on the GraphQL context. Then every field resolver can check if a user exists (is authenticated) or not. Some fields can be open to all users (a preview of a news article) while others are only allowed to be accessed by logged-in users.
How the user is put on the context is quite flexible and can be handled outside GraphQL. For example you might have an Express middleware like Passport that runs before your GraphQL middleware and puts the user object on the request object. The context creating function then has access to the request object and can just pluck it from there.
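Here’s a simple example with Apollo Server:
// typeDefs, resolvers, db, getUserFromToken and createToken are defined elsewhere
const server = new ApolloServer({
  typeDefs,
  resolvers,
  context({ req }) {
    // Express exposes the request headers on req.headers
    const token = req.headers.authorization
    const user = getUserFromToken(token)
    return { ...db, user, createToken }
  }
})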
Regarding authorization you have multiple options. The cleanest would be to let your business logic layer handle it. This makes sense as you might have other entrypoints besides your GraphQL API and thus want to have a single source of truth for your authorization rules. So in your resolver you just delegate it like this:
starship: (parent, args, context) => {
  return starshipRepository.get(context.user, parent)
}
Another option would be to wrap your resolvers in authorized and authenticated helper functions:
const authenticated = next => (root, args, context, info) => {
  if (!context.user) {
    throw new Error('not authenticated')
  }
  return next(root, args, context, info)
}
// assumes context.user exists, so wrap it in authenticated first
const authorized = (role, next) => (root, args, context, info) => {
  if (context.user.role !== role) {
    throw new Error(`Must be a ${role}`)
  }
  return next(root, args, context, info)
}
// a resolver example
const myResolver = authenticated((root, args, context, info) => {
  // ...
})
Handling malicious actors
GraphQL APIs have some unique challenges that didn’t exist in simple REST APIs. One is that you can send queries that take a lot of processing time.
Take a look at the following, valid query:
{
  Film(title:"A New Hope") {
    characters {
      films {
        characters {
          films {
            characters {
              films {
                characters {
                  films {
                    # [...]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
You could keep going as far as you want.
Options to handle this are:
Timeout
You just stop any query that takes longer than x seconds of server time.
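One way to sketch such a timeout is to race the query against a timer. The helper name withTimeout and the simulated slow query are assumptions for this example.

```javascript
// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Query timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared in either case.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical slow query, simulated with a timer.
const slowQuery = new Promise((resolve) => setTimeout(() => resolve({ data: {} }), 200));

withTimeout(slowQuery, 50)
  .then((result) => console.log("finished", result))
  .catch((error) => console.log(error.message));
```

Note that racing only stops the waiting. Work that is already running (database calls, nested resolvers) keeps going unless the resolvers themselves support cancellation.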
Limit query depth
There are utilities to analyze how deep a query is (like this library). You could say every query is only allowed 7 levels, for example.
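To show the idea behind a depth limit, here is a deliberately rough sketch that estimates depth by counting nested braces in the query text. The function names are hypothetical, and a real utility would walk the parsed query instead, since braces can also appear in strings, comments and input-object arguments.

```javascript
// Rough depth estimate: tracks the deepest nesting of selection sets.
// Assumption: braces only appear as selection-set delimiters.
function queryDepth(query) {
  let depth = 0;
  let maxDepth = 0;
  for (const char of query) {
    if (char === "{") {
      depth += 1;
      maxDepth = Math.max(maxDepth, depth);
    } else if (char === "}") {
      depth -= 1;
    }
  }
  return maxDepth;
}

// Throws when the query nests deeper than the configured limit.
function assertDepthAllowed(query, limit) {
  const depth = queryDepth(query);
  if (depth > limit) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${limit}`);
  }
}

const query = `{ Film(title: "A New Hope") { characters { films { title } } } }`;
console.log(queryDepth(query)); // 4
assertDepthAllowed(query, 7); // passes silently
```

This sketch counts the outermost selection set as level 1; libraries may count differently, so the limit has to be chosen with the counting rule in mind.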
Limit query complexity
You can calculate how complex the query is (e.g. with this library) and only allow queries under a certain limit.
Rate Limiting
If you implement rate limiting you can expose this info to the client either with HTTP headers like X-RateLimit-Remaining or even with an extra query field like GitHub does.
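A minimal sketch of the header approach could look like the following fixed-window counter. It is kept in memory for a single process, and createRateLimiter, the client key and the X-RateLimit-Limit header are assumptions chosen for illustration.

```javascript
// Fixed-window counter kept in memory; a sketch, not production code.
// Assumptions: one process, clients identified by a string key.
function createRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // clientKey -> { start, count }

  return function check(clientKey, now = Date.now()) {
    let entry = windows.get(clientKey);
    if (!entry || now - entry.start >= windowMs) {
      entry = { start: now, count: 0 }; // start a fresh window
      windows.set(clientKey, entry);
    }
    const allowed = entry.count < limit;
    if (allowed) entry.count += 1;
    return {
      allowed,
      // Headers the server could attach to the HTTP response.
      headers: {
        "X-RateLimit-Limit": String(limit),
        "X-RateLimit-Remaining": String(limit - entry.count),
      },
    };
  };
}

const check = createRateLimiter({ limit: 2, windowMs: 60000 });
console.log(check("client-a", 0).headers["X-RateLimit-Remaining"]); // 1
console.log(check("client-a", 1000).allowed); // true
console.log(check("client-a", 2000).allowed); // false
```

For GraphQL it often makes sense to count query cost rather than the number of requests, since a single request can be arbitrarily expensive.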
Performance
The n+1 Problem and Loaders
Look at this innocent little query:
{
  allFilms(first:3) {
    characters {
      name
    }
  }
}
If the API is backed by a relational database the most efficient way to get our data would be to select the IDs of the first three films
select id
from films
limit 3;
and then grab the names
select name
from characters
where film_id in (those, film, ids);
GraphQL has a problem here, though. Every resolver only has access to its parent (ignoring context); it doesn’t know anything about its “siblings”. So a naive implementation makes a separate request to the database for every film’s characters, or in the worst case for every character. Instead of two requests in total, it could be dozens.
To solve this problem the dataloader was introduced. It’s actually quite simple: it collects all requests in a specific time frame (by default a single tick of the event loop) and then sends a batched request.
Using it is highly recommended.
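The batching idea can be illustrated with a small loader written from scratch. This is a sketch in the spirit of the dataloader, not the real library: createLoader, the sample data and the use of Node's process.nextTick are assumptions, and caching is left out.

```javascript
// Minimal batching loader (not the real library).
// Keys requested during the same tick are collected and loaded in one call.
function createLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } entries

  function flush() {
    const batch = queue;
    queue = [];
    batchFn(batch.map((item) => item.key))
      .then((values) => batch.forEach((item, i) => item.resolve(values[i])))
      .catch((error) => batch.forEach((item) => item.reject(error)));
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        if (queue.length === 0) process.nextTick(flush); // schedule once per batch
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Hypothetical batch function standing in for one SQL query with IN (...).
let databaseCalls = 0;
const charactersByFilmId = { 1: ["Luke", "Leia"], 2: ["Yoda"], 3: ["Rey"] };
const characterLoader = createLoader(async (filmIds) => {
  databaseCalls += 1;
  return filmIds.map((id) => charactersByFilmId[id] ?? []);
});

// Three resolvers ask independently, yet only one batched call is made.
Promise.all([1, 2, 3].map((id) => characterLoader.load(id))).then((results) => {
  console.log(results, "database calls:", databaseCalls);
});
```

In a resolver you would call characterLoader.load(film.id) and return the promise. Creating the loader per request (for example on the context) keeps batches from mixing data across users.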
Inspiration
The GraphQL specification is found here and the reference implementation here.
To see a complete, production-grade application using GraphQL take a look at Spectrum.
Thanks for reading!