Hotfixes for Shipped Apps with GraphQL Query Overrides

In earlier posts, we covered how Zalando uses persisted queries and schema stability and how GraphQL directives let us control server behavior through query metadata. In this post, we look at how “persisted query overrides” can re-route a query ID in production without changing the client.

The problem: you cannot change an app that has already shipped

One of the core guarantees of our persisted queries system is that query IDs are stable and immutable. The query IDs are bundled into the mobile app at build time. In production, the GraphQL server accepts only these known IDs and executes the associated queries.

This setup gives us significant safety and observability guarantees. But it creates a very specific operational problem: what happens when a query that is already bundled in a shipped mobile app needs to change?

Consider this scenario: a new version of the Zalando app is released. A week later, the team realizes that a particular query bundled in that release has a subtle issue, such as:

it triggers excessive load on a downstream system
a directive-encoded rate-limiting rule needs to be tightened
a field reports unexpected errors on the server side

Or another scenario: we support an old version of the app that is still widely used, and one of its queries doesn’t quite work with new changes in the GraphQL server. We want to re-route that query to a different, compatible query so that the server-side changes can move forward.

The query ID is baked into the binary. Users cannot update their apps overnight. And even if they could, there is always a long tail of users running older versions.

Editing the query body associated with the ID is one option, but it comes with significant drawbacks.

Editing the query in place

The ID of a query is the hash of its body. The ID → Query relationship is immutable, which lets us inspect exactly what any client version is running. The moment we edit the body of a query, its ID becomes an opaque, mutable pointer: the ID “A” would mean whatever we last decided to store under that key. Observability suffers too, because metrics keyed by query ID no longer tell us which query body actually ran.

The deeper problem isn’t messy history or awkward rollbacks. It’s that editing in place destroys the foundational guarantee that ID → Query is immutable. It is possible to build a system with history, tracked changes, version-aware observability, and rollbacks, but that is far more complexity than the original problem warrants.

In our systems, clients do not store or retrieve state between builds, which makes persisting queries a stateless operation. If clients had to remember the query IDs created in a previous build and reuse them in the next one, they would become stateful and messy. To keep clients stateless between builds, the GraphQL server recomputes the query hash and checks whether it already exists. If it does, the server simply returns the existing ID and does not create a new query. An edit feature that breaks the immutable ID → Query contract would disturb this too.
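The stateless persist step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Zalando implementation: the `QueryStore` class, the in-memory dictionary, and the choice of SHA-256 over the raw query body are all assumptions made for the example.

```python
import hashlib


class QueryStore:
    """Hypothetical in-memory stand-in for the persisted query table."""

    def __init__(self):
        self.queries = {}  # query ID -> query body

    def persist(self, body: str) -> str:
        # Assumption: the ID is the SHA-256 hex digest of the body as sent.
        query_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        # If the hash is already known, keep the existing record untouched.
        self.queries.setdefault(query_id, body)
        return query_id


store = QueryStore()
body = "query product($id: ID!) { product(id: $id) { id price } }"

# Two builds persisting the same body get the same ID without sharing state.
first_build_id = store.persist(body)
second_build_id = store.persist(body)
```

Because the ID is derived only from the body, a client never needs to remember anything from a previous build. Any edit to a stored body would break the assumption that recomputing the hash leads back to the same record.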

Overrides: Re-routing, not editing

Overrides are re-routing pointers on the query record. When the client asks the GraphQL server to execute query A, we compile and execute query B if we have an override from source:A to target:B. Query A is left exactly as it was. Its body, hash, and other metadata are untouched. The override is a separate column that points to Query B. The identity of A never changes.

Query B is a real, independent persisted query with its own hash and its own body. It can also be executed on its own. Overrides keep the ID → Query contract intact for both queries, and clients remain stateless when persisting queries.

What an override looks like

An override is a field on the query record that points to another query ID. The source query A remains unchanged. Its body and hash are exactly as the client bundled them. Only the override pointer is new:

{
  "id": "a1b2c3",
  "body": "query productCard @maxCountInBatch(value: 100) { ... }",
  "override": "d4e5f6"
}

{
  "id": "d4e5f6",
  "body": "query productCard @maxCountInBatch(value: 10) { ... }",
  "override": null
}

When the client sends a request with id: "a1b2c3", the server checks the override field and transparently executes "d4e5f6" instead. From the client’s perspective, nothing has changed.

POST /graphql

{
  "id": "a1b2c3",
  "variables": { "id": "12345" }
}

The server resolves a1b2c3 → override → d4e5f6 and executes the target query with the same variables.
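The lookup above could be sketched like this. It is only an illustration under stated assumptions: `QueryRecord` and `resolve` are hypothetical names, the records live in a plain dictionary, and the query bodies are kept elided exactly as in the records shown earlier.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class QueryRecord:
    id: str
    body: str
    override: Optional[str] = None  # ID of the target query, if any


def resolve(records: Dict[str, QueryRecord], requested_id: str) -> QueryRecord:
    """Return the record that should be executed for the requested ID."""
    source = records[requested_id]  # KeyError for IDs that were never persisted
    if source.override is None:
        return source
    # One hop is enough because overrides cannot be chained.
    return records[source.override]


records = {
    "a1b2c3": QueryRecord(
        id="a1b2c3",
        body="query productCard @maxCountInBatch(value: 100) { ... }",
        override="d4e5f6",
    ),
    "d4e5f6": QueryRecord(
        id="d4e5f6",
        body="query productCard @maxCountInBatch(value: 10) { ... }",
    ),
}

executed = resolve(records, "a1b2c3")

# Rolling back only clears the pointer; the body of the source is untouched.
rolled_back = {**records, "a1b2c3": replace(records["a1b2c3"], override=None)}
```

In this sketch the source record is never rewritten when it is resolved, so the body stored under a1b2c3 still matches what the client bundled, and the target d4e5f6 can be requested directly like any other persisted query.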

Query overrides are designed for a specific class of problems, not as a general-purpose query editing tool. In practice, we use them in situations like:

Fixing certain types of GraphQL query bugs in already released app versions.
Supporting older app versions during GraphQL server updates.
Changing directive-encoded values: e.g. @maxCountInBatch, @omitErrorTag, etc.
Temporarily suppressing errors.

Rate-limiting example

Say a query in a shipped app has a rate-limiting directive that turned out to be too permissive:

# Query A: already shipped, ID baked into the app binary
query productSearch($term: String!) @maxCountInBatch(value: 100) {
  search(term: $term) {
    name
  }
}

We cannot change this query or its directive in place. Instead, we persist a new query B with the tightened limit and create an override A → B:

# Query B: new, independently persisted
query productSearch($term: String!) @maxCountInBatch(value: 10) {
  search(term: $term) {
    name
  }
}

Error tagging example

Say a query in a shipped app causes a field to tag certain acceptable failures as errors, so we are alerted continually:

# Query C: already shipped, ID baked into the app binary
query product($id: ID!) {
  product(id: $id) {
    id
    price
    reviews {
      starValues
    }
  }
}

Say we have moved to a new reviews system with richer content, available at the field reviewsAndRatings. We want to deprecate the old reviews field, but older apps are stuck with it. We have two options: continue supporting the old field for a long time, or accept a degraded experience for users of older apps, where reviews are suppressed and errors from that field are omitted.

Adding an override to omit errors would look like:

# Query D: new, with omitErrorTag
query product($id: ID!) {
  product(id: $id) {
    id
    price
    reviews @omitErrorTag {
      starValues
    }
  }
}

If the new reviews schema can serve the data that older versions expect, the alternative is an override that aliases the new fields to the old names, so the response keeps the structure the older app was built against.

# Query E: new, with alternative override using aliases if the new schema is compatible
query product($id: ID!) {
  product(id: $id) {
    id
    price
    reviews: reviewsAndRatings {
      starValues: stars
    }
  }
}

Chain of overrides

Chains of overrides would be impractical to maintain for these use cases, so we enforce a policy: a query can participate in only one override, either as a source or as a target. If an override A → B exists, we cannot add B → C or C → A. A self-override such as A → A is rejected as well.
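A check for this policy might look like the sketch below. The function names and the dictionary that maps source IDs to target IDs are assumptions made for illustration; the real system stores the pointer on the query record.

```python
from typing import Dict


def can_add_override(overrides: Dict[str, str], source: str, target: str) -> bool:
    """Check the one-override-per-query policy for a proposed source -> target."""
    if source == target:
        return False  # a query cannot override itself
    # Every query that already acts as a source or as a target.
    involved = set(overrides) | set(overrides.values())
    return source not in involved and target not in involved


def add_override(overrides: Dict[str, str], source: str, target: str) -> None:
    """Register the override, or raise if it would violate the policy."""
    if not can_add_override(overrides, source, target):
        raise ValueError(f"override {source} -> {target} violates the policy")
    overrides[source] = target


overrides: Dict[str, str] = {}
add_override(overrides, "A", "B")
```

With A → B registered, this sketch rejects B → C, C → A, and A → A. Taken literally, the rule also rejects A → C and C → B, since A and B each already participate in an override.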

Conclusion

Persisted query overrides are a narrow escape hatch, and not a general editing mechanism. The override is stored separately from the query body, so both the source and the target keep their own identity. Clients stay stateless when persisting queries, and we can roll back quickly by simply removing the override pointer. We use them sparingly for bugs in shipped queries, directive value fixes, or compatibility shims for older app versions.

If you have any comments, corrections, or questions about this post, feel free to reach out to me on Twitter - @heisenbugger.