Versioning schema releases

This example demonstrates using a GitHub repo as a central registry that coordinates the versioning and release of subschemas. Similar to the goals of managed federation, a central registry allows subschemas to be precomposed and tested together before releasing into production. This isn’t prohibitively difficult to set up: a simple Git repo with some light code wrapping can generally get the job done as well as or better than hosted services. Using a Git repo actually offers several distinct advantages:

  • Multiple subservice schemas may be composed on a branch together and released at once, affording the opportunity for hard schema cutovers across multiple services.
  • Comprehensive test suites may be written to assure the integrity of the composed gateway schema, and can easily run using continuous integration services. In fact, versioning subschemas in your gateway app’s repo allows CI to run tests using both your release candidate schemas and your actual gateway server code.
  • Git is a de facto standard tool with numerous frontends available, and can be adapted to meet virtually any versioning and deployment need.

This example demonstrates:

  • Using the GitHub API to manage a simple schema registry.
  • Hot reloading from a remote Git repo.
  • Running development and production environments.

Local Setup

cd versioning-schema-releases
yarn install

1. Configure a registry repo

This example uses live integration with the GitHub API. You’ll need to set up a repo:

  1. Create a new GitHub repo with a README (or other default file, so long as the repo is not empty). Public or private doesn’t matter.
  2. Create a personal access token with repo-only access.
  3. Make a copy of the provided /repo.template.json file and rename it /repo.json. This copy will contain secrets (so it is git-ignored). Add the following:
     • owner: your GitHub username.
     • repo: name of the new registry repo.
     • token: your personal access token.
     • mainBranch: name of the default branch in your repo; GitHub defaults to main, but you can customize that.
     • registryPath: a directory path to where versioned schemas will be placed in the repo.
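As a quick sanity check before starting the servers, the config could be validated with something like the following. This is only a sketch: the five keys come from the list above, but validateRepoConfig and the placeholder values (including the "graphql" registry path) are hypothetical and not part of the example code.

```javascript
// Keys taken from the setup list above; the validation logic itself is illustrative.
const REQUIRED_KEYS = ["owner", "repo", "token", "mainBranch", "registryPath"];

// Returns a list of problems; an empty list means the config looks usable.
function validateRepoConfig(config) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    const value = config[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`"${key}" must be a non-empty string`);
    }
  }
  return problems;
}

// Placeholder values only; substitute your own.
const exampleConfig = {
  owner: "your-github-username",
  repo: "your-registry-repo",
  token: "<personal access token>",
  mainBranch: "main",
  registryPath: "graphql", // hypothetical directory name
};

console.log(validateRepoConfig(exampleConfig)); // []
console.log(validateRepoConfig({ ...exampleConfig, token: "" }));
```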

2. Development server & initial schema commit

With that done, you’re ready to start development mode:

yarn start-development

The following services are available for interactive queries:

  • Development gateway: http://localhost:4000/graphql
  • Inventory subservice: http://localhost:4001/graphql
  • Products subservice: http://localhost:4002/graphql

Visit the gateway, and note that you have some development-only mutations available:

  • createSchemaReleaseBranch
  • updateSchemaReleaseBranch
  • createOrUpdateSchemaReleaseBranch
  • mergeSchemaReleaseBranch

Run createSchemaReleaseBranch to build a release candidate:

mutation {
  createSchemaReleaseBranch(name: "initial-release") {
    name
    sha
    url
    pullRequestUrl
  }
}

The name argument is a GitHub branch name for this release candidate. Running that mutation, you should get a response that includes the pullRequestUrl of a newly created PR in the remote repo containing the initial schema version (if you don’t get a link, check your repo config). Review the PR on GitHub, then merge it! You can also get fancy and merge it using a local mutation:

mutation {
  mergeSchemaReleaseBranch(name: "initial-release") {
    name
    sha
  }
}
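To give a rough idea of what a mutation like createSchemaReleaseBranch might do behind the scenes, here is a sketch that builds (but does not send) a plausible sequence of GitHub REST calls: create a branch ref, write each schema file, then open a pull request. The endpoint paths follow the public GitHub REST API; planReleaseBranch, the file names, and the commit messages are hypothetical, and the example's actual implementation may differ.

```javascript
// Builds request descriptors only; sending them (with an Authorization header
// carrying the token) is left to whatever HTTP client you prefer.
// Assumes registryPath has no leading or trailing slash.
function planReleaseBranch(config, branchName, baseSha, files) {
  const base = `/repos/${config.owner}/${config.repo}`;
  const requests = [
    {
      // Create the release branch from the current head of the main branch.
      method: "POST",
      path: `${base}/git/refs`,
      body: { ref: `refs/heads/${branchName}`, sha: baseSha },
    },
  ];
  for (const [name, sdl] of Object.entries(files)) {
    requests.push({
      // Create a file on the branch. Updating an existing file additionally
      // requires that file's current blob sha in the body.
      method: "PUT",
      path: `${base}/contents/${config.registryPath}/${name}`,
      body: {
        message: `Update ${name}`,
        content: Buffer.from(sdl, "utf8").toString("base64"),
        branch: branchName,
      },
    });
  }
  requests.push({
    // Open a pull request from the release branch into the main branch.
    method: "POST",
    path: `${base}/pulls`,
    body: { title: `Schema release: ${branchName}`, head: branchName, base: config.mainBranch },
  });
  return requests;
}

const plan = planReleaseBranch(
  { owner: "your-github-username", repo: "your-registry-repo", mainBranch: "main", registryPath: "graphql" },
  "initial-release",
  "<sha of main>",
  { "products.graphql": "type Product { upc: ID! }" } // hypothetical file and SDL
);
console.log(plan.map((r) => `${r.method} ${r.path}`));
```

Keeping the plan separate from the network calls makes the sequence easy to inspect and test without touching a real repo.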

3. Production server

Now that you’ve committed an initial version of your schema to the registry repo, you’re ready to start the production server (the initial commit must be made first so that your production server has a schema to load).

In a second terminal window:

yarn start-production

The following services are now available for interactive queries:

  • Production gateway: http://localhost:4444/graphql
  • Development gateway: http://localhost:4000/graphql
  • Inventory subservice: http://localhost:4001/graphql
  • Products subservice: http://localhost:4002/graphql

4. Hot-reload version releases

You’ll notice that the production server is polling the registry repo for version changes. Let’s give it something to find… go into the Products subservice and add a new dummy field to the Product type:

type Product {
  ...
  dummy: String
}

Now commit those changes to a release branch:

mutation {
  createOrUpdateSchemaReleaseBranch(name: "incremental-release") {
    name
    sha
    url
    pullRequestUrl
  }
}

Try adding another dummy field to the schema, and then run the above mutation again to add the additional change into the release branch. You can even make changes to both subservices and stage them together for release.

With all changes staged in your PR, watch the production gateway terminal logs and then merge the release PR (via the GitHub UI or mergeSchemaReleaseBranch). You should see the production gateway cut over on the next polling cycle (remember, you’ll still need to reload GraphiQL to bring the revised schema into the UI):

version 1607572592741: 9d2a592013beddcca779e58d5f8fbb6930434ef4
version 1607572592741: 9d2a592013beddcca779e58d5f8fbb6930434ef4
VERSION UPDATE: d2709f88d9f60994d6d248902b8183534d4715a9
version 1607572603563: d2709f88d9f60994d6d248902b8183534d4715a9
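The polling behavior behind logs like these can be approximated with a small loop that compares the latest registry commit against the one currently loaded. The sketch below is not the example's actual code: createVersionPoller, fetchSha, and reload are hypothetical names, and the log lines merely imitate the format shown above, assuming the version number is the time the schema was loaded.

```javascript
// fetchSha: async function returning the latest commit sha of the registry.
// reload:   async function that rebuilds the gateway schema for a given sha.
// log/now are injectable so the poller can be exercised without a real clock.
function createVersionPoller({ fetchSha, reload, log = console.log, now = Date.now }) {
  let current = null; // { sha, loadedAt } once a schema has been loaded

  return async function poll() {
    const sha = await fetchSha();
    if (current === null || current.sha !== sha) {
      // The first poll loads silently; later changes are announced.
      if (current !== null) log(`VERSION UPDATE: ${sha}`);
      await reload(sha);
      current = { sha, loadedAt: now() };
    }
    log(`version ${current.loadedAt}: ${current.sha}`);
    return current;
  };
}

// Usage sketch (the 5 second interval is arbitrary):
// const poll = createVersionPoller({ fetchSha, reload });
// setInterval(() => poll().catch(console.error), 5000);
```

Note that the schema is only rebuilt when the sha changes, so an idle poll costs a single lightweight request.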

Without polling

Polling is by no means necessary to trigger gateway schema reloads. An even simpler solution is to set up a dedicated mutation that reloads the gateway schema, and then call it manually or in response to deployment hooks. Try running the _reloadGateway mutation in this example to manually trigger a reload:

mutation {
  _reloadGateway
}

Of course, this sort of reload trigger should not be exposed in public APIs. You should remove this sort of utility from public-facing schemas, or use a separate non-GraphQL endpoint (such as a REST service) to trigger the action.
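As one possible shape for such a separate endpoint, the sketch below shows a framework-agnostic handler that only reloads when a shared secret is presented. Everything here is an assumption for illustration: createReloadHandler, the /reload path, and the x-reload-token header are hypothetical names, and the request is modeled as a plain object rather than any particular server's request type.

```javascript
// secret: shared token known only to your deployment tooling.
// reload: async function that rebuilds the gateway schema.
function createReloadHandler({ secret, reload }) {
  return async function handle(request) {
    if (request.method !== "POST" || request.path !== "/reload") {
      return { status: 404 };
    }
    // A plain comparison keeps the sketch short; a production handler would
    // likely prefer a constant-time comparison of the token.
    if (!secret || request.headers["x-reload-token"] !== secret) {
      return { status: 401 };
    }
    await reload();
    return { status: 204 };
  };
}

// Usage sketch: adapt `handle` to your HTTP server of choice.
// const handle = createReloadHandler({ secret: process.env.RELOAD_TOKEN, reload });
```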

Summary

There’s a lot to be said about schema versioning and release strategies. Let’s be (fairly) brief and distill a few key points:

Post-deploy hooks don’t fix everything

Say you diligently call mergeSchemaReleaseBranch (or a similar release command) from your subservice’s post-deploy hook… Neat! However, there will still be latency between that release and the next gateway polling interval. Even if you’re not polling, post-deploy hooks probably won’t be timed perfectly with your subservice(s) starting up, especially when deploying many instances. Long story short: there will always be latency.

Therefore, you may find that post-deploy hooks aren’t a magic bullet for orchestrating seamless schema rollouts. Whether the revised gateway schema rolls out in seconds (thanks to post-deploy hooks), or minutes (until you press a merge button by hand), either scenario presents a window in which conflicting schema errors may occur. That said, releases really boil down to either being non-breaking or breaking; the latter of which needs a maintenance window or a deeper strategy.
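To make the breaking versus non-breaking distinction concrete, here is a deliberately simplified sketch that flags a release as breaking when any existing field disappears. Schemas are modeled as plain objects mapping type names to field names purely for illustration; findRemovedFields and classifyRelease are hypothetical helpers, and a real check would also need to consider changed field types, arguments, and so on (the graphql package ships a findBreakingChanges utility that covers many more cases).

```javascript
// Lists every "Type.field" present in the old schema but absent from the new one.
function findRemovedFields(oldSchema, newSchema) {
  const removed = [];
  for (const [typeName, fields] of Object.entries(oldSchema)) {
    const nextFields = new Set(newSchema[typeName] || []);
    for (const field of fields) {
      if (!nextFields.has(field)) removed.push(`${typeName}.${field}`);
    }
  }
  return removed;
}

// Additive changes are treated as safe; removals require a deeper strategy.
function classifyRelease(oldSchema, newSchema) {
  return findRemovedFields(oldSchema, newSchema).length === 0 ? "non-breaking" : "breaking";
}

// Field names other than `dummy` are made up for the example.
const before = { Product: ["upc", "name"] };
const additive = { Product: ["upc", "name", "dummy"] };
const subtractive = { Product: ["upc"] };

console.log(classifyRelease(before, additive)); // non-breaking
console.log(classifyRelease(before, subtractive)); // breaking
```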

Zero-downtime rollouts follow a basic formula

The best release strategies will always deploy updated subservices quietly behind the gateway proxy layer in a way that activates new features without breaking existing ones. Following this pattern, it doesn’t really matter if a gateway schema rollout takes seconds or minutes after a subservice deploy:

  1. Deploy all updated subservices while keeping existing subservice features operational.
  2. Push all updated subservice schemas to the gateway as a single cutover.
  3. Decommission old subservices, and/or outdated subservice features.

Versioning subschemas with gateway code is a neat idea

Keeping your subschemas and gateway code versioned in proximity to one another using a shared repo or submodules actually solves a lot of problems:

  • Test coverage can be tailored to your application design; subschemas are composed together and integration tests run against your real application code, versus relying on a representational gateway test harness.
  • The production gateway may start itself up using a local copy of subschemas, versus relying on the availability of a remote registry. In the event of a registry outage, the gateway is only ever blocked from performing live updates. It can always be redeployed and restarted safely.
  • Development environments can be run with only select subservices running (this is extremely important as your service cluster grows). The rest of the gateway schema can then be filled in with mocked subschemas from the local registry.
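The last point can be sketched as a small planning step: given the subschemas stored in the local registry and the names of the services actually running, decide which subschemas to back with live services and which to mock. The planSubschemas helper, the "live"/"mocked" labels, and the SDL snippets are hypothetical; how mocks are actually attached depends on your gateway tooling.

```javascript
// registry: { serviceName: sdlString } as read from the local registry copy.
// runningServices: names of the subservices started in this environment.
function planSubschemas(registry, runningServices) {
  const running = new Set(runningServices);
  return Object.keys(registry).map((name) => ({
    name,
    mode: running.has(name) ? "live" : "mocked",
    sdl: registry[name],
  }));
}

// Illustrative SDL only; service names match the subservices in this example.
const localRegistry = {
  inventory: "type Query { inStock: Boolean }",
  products: "type Query { productCount: Int }",
};

// Only the Products subservice is running; Inventory is filled in with mocks.
console.log(planSubschemas(localRegistry, ["products"]).map((s) => `${s.name}: ${s.mode}`));
```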