Today, we’re excited to introduce Hive Router, a new open-source GraphQL federation gateway.

GraphQL federation is now the standard way to scale GraphQL across services. Most teams want three things: speed, predictability, and no vendor lock-in.

Hive Router is our answer: an open-source, Rust-based Federation gateway with top-tier performance and full Apollo Federation compatibility - without tying your architecture to a single vendor.

  • Open Source - MIT licensed, transparent, and community-driven.
  • Apollo Federation - Fully compatible; works seamlessly with Federation standards.
  • Fast and Efficient - Designed in Rust for speed, low memory use, and efficiency.
  • Familiar Query Plans - Apollo-style query plans, no new concepts to learn.

If you’re using Apollo Router for your GraphQL federation setup today, this is the alternative you’ve been waiting for.

Unmatched Performance and Efficiency

Performance is where Hive Router truly shines. In our benchmarks, Hive Router sustained ~1830 requests per second (RPS) with a p95 latency of only ~48 ms, while keeping CPU and memory usage remarkably low (around 49 MB max memory). This means Hive Router can handle heavy GraphQL workloads with minimal latency and resource footprint.

  • Hive Router: 1831 rps
  • Cosmo Router: 586 rps
  • Grafbase Gateway: 461 rps
  • Apollo Router: 330 rps

To ensure consistency and reproducibility, the numbers presented here are from our newly open-sourced benchmark, run on an Azure Standard D4s v4 VM (4 vCPUs, 16 GiB memory) with Linux (Ubuntu 24.04). The benchmark website provides a complete breakdown of how we measured performance, the queries we used, and the subgraph setup, so you can replicate our findings.

Hive Router’s optimizations allow it to serve ~6x more traffic in our benchmark than Apollo Router at a fraction of the latency. In practice, adopting Hive Router can translate to snappier responses for end-users and lower infrastructure costs due to its efficient use of CPU and RAM.

| Gateway | Version | RPS (reqs/s) | P95 (ms) | P99.9 (ms) | CPU (max %) | MEM (max MB) |
| --- | --- | --- | --- | --- | --- | --- |
| Hive Router | main | 1831.09 | 48.58 | 78.84 | 167 | 48 |
| Cosmo Router | v0.247.0 | 585.79 | 128.25 | 348.17 | 263 | 119 |
| Grafbase Gateway | v0.48.1 | 461.19 | 137.81 | 395.73 | 133 | 94 |
| Apollo Router | v2.6.0 | 329.84 | 196.46 | 472.21 | 270 | 192 |

Results from our open-source benchmark

Under stress tests, Hive Router also maintained 100% reliability (no errors or timeouts) and the lowest p99.9 latency, indicating robust handling of concurrent requests. Whether you’re dealing with bursty traffic or high sustained load, Hive Router’s Rust engine keeps throughput high and tail latencies low.

  • Hive Router: p95 48.6 ms | p99.9 78.8 ms
  • Cosmo Router: p95 128.3 ms | p99.9 348.2 ms
  • Grafbase Gateway: p95 137.8 ms | p99.9 395.7 ms
  • Apollo Router: p95 196.5 ms | p99.9 472.2 ms

We’ve optimized everything from the networking to the execution engine for speed - so your federated GraphQL API remains performant at scale.

Of course, raw speed isn’t the only concern - correctness and spec compliance are equally important.

Built on a GraphQL Federation Audit

Over a year ago, we leveraged our GraphQL federation experience to create a comprehensive GraphQL Federation Audit tool. This audit was directly informed by real-world usage patterns and customer feedback: every test case in the suite was crafted based on scenarios we encountered through our clients’ federated GraphQL implementations.

We released this auditing tool (open-sourced for the community) well ahead of our Rust gateway, ensuring we had a proven foundation to build on. This approach meant that our development was based on actual insights from the field, rather than guesswork.

Crucially, the audit became the bedrock of our gateway’s development. Instead of relying on speculation or trial-and-error, we used the audit’s extensive test suite (over 180 test cases across ~40 test suites) as a base to implement and verify every aspect of Apollo Federation compatibility.

| Gateway | Compatibility | Test Cases (passed / failed) | Test Suites (passed / failed) |
| --- | --- | --- | --- |
| Hive Router | 100.00% | 189 / 0 | 42 / 0 |
| Apollo Router | 97.88% | 185 / 4 | 40 / 2 |
| Cosmo Router | 94.71% | 179 / 10 | 36 / 6 |
| Grafbase Gateway | 90.48% | 171 / 18 | 35 / 7 |

Results from our open-source audit

Thanks to this audit-driven approach, our gateway was built from day one to meet the highest standards of correctness and completeness. In fact, it currently passes 100% of the Federation compliance tests (189 out of 189 test cases) in our audit.

What About Our JavaScript-Based Hive Gateway?

For those wondering about our existing JavaScript-based Hive Gateway, it isn’t going anywhere. We continue to offer and support both solutions. Hive Router is designed for users who need the highest level of performance, while our Node.js gateway remains a fantastic option for teams prioritizing easy integration with the JavaScript ecosystem. This allows you to choose the best gateway for your needs.


Why Is Our GraphQL Router So Performant?

Our GraphQL router was engineered from the ground up with performance as a top priority. In this chapter, we detail the key design decisions and optimizations that make it exceptionally fast and efficient.

We chose a high-performance systems language, optimized memory usage, implemented a parallel execution model, and fine-tuned networking and concurrency - all validated by real-world benchmarks. The result is a federated GraphQL gateway that delivers higher throughput and lower latencies than existing solutions, without sacrificing reliability or spec compliance.

Choosing Rust for Speed (No Garbage Collector)

One of the first decisions was to implement the router in Rust. Rust provides C++ level performance while eliminating the overhead of a garbage-collected runtime. Unlike languages with automatic garbage collection (e.g. JavaScript, Java or Go), Rust doesn’t pause execution for GC, which means no unpredictable latency spikes and lower CPU overhead. Memory is managed through Rust’s compile-time ownership system, so memory is freed deterministically when it’s no longer needed - with zero runtime GC cost. This gives us consistent low-latency performance, an important factor for a gateway that must respond to requests quickly.

Efficient Memory Management: Arena Allocations and Zero-Copy JSON

Memory allocation patterns have a huge impact on performance. We use arena allocation strategies (also known as region-based memory management) for request handling. In an arena allocator, we pre-allocate a block of memory and then create all related objects within that block. Allocation becomes a simple pointer bump, and freeing everything is one operation when the request completes. Both allocating and deallocating many small objects via an arena is much faster than using the general heap for each object.
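To make the idea concrete, here is a minimal, hypothetical sketch of region-based allocation (illustrative names only, not Hive Router's actual implementation): all per-request data goes into one pre-allocated buffer, allocation is an append, and teardown is a single reset.

```rust
// Minimal illustration of arena (region-based) allocation. All per-request
// objects live in one pre-allocated buffer; "freeing" everything is a single
// reset when the request completes.
struct Arena {
    buf: Vec<u8>, // one contiguous block, reserved up front
}

impl Arena {
    fn with_capacity(bytes: usize) -> Self {
        Arena { buf: Vec::with_capacity(bytes) }
    }

    /// Allocation is effectively a pointer bump: append the bytes and
    /// return their (offset, length) handle into the arena.
    fn alloc(&mut self, data: &[u8]) -> (usize, usize) {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        (start, data.len())
    }

    fn get(&self, (start, len): (usize, usize)) -> &[u8] {
        &self.buf[start..start + len]
    }

    /// Dropping every allocation at once is one cheap operation;
    /// the reserved capacity is kept for the next request.
    fn reset(&mut self) {
        self.buf.clear();
    }
}

fn main() {
    let mut arena = Arena::with_capacity(4096);
    let name = arena.alloc(b"Ada");
    let email = arena.alloc(b"ada@lab");
    assert_eq!(arena.get(name), b"Ada");
    assert_eq!(arena.get(email), b"ada@lab");
    arena.reset(); // all request-scoped data released in one step
}
```

A production arena hands out typed references rather than byte handles, but the cost model is the same: many tiny allocations collapse into one buffer append each, and deallocation is O(1) per request instead of O(objects).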

As an illustration, consider a subgraph response such as:

{"user":{"id":1,"name":"Ada","email":"ada@lab"},"active":true}

The router keeps this response as raw bytes and records the byte offsets of each value while scanning it once (for example, `1` at bytes 14-14, `Ada` at 24-26, `true` at 57-60). The final response then references those byte slices by offset - no copies are made.

Another optimization is zero-copy JSON processing for subgraph responses. When our router receives JSON data from underlying services, we avoid unnecessary serialization/deserialization passes. Instead of first parsing into a generic JSON tree and then copying data into our response, we directly stream data into our internal structures. By doing this, we eliminate redundant data copies and transformations. In practice, this meant not building large in-memory JSON DOMs, and it allowed us to insert subgraph results into the final response with minimal overhead. This one-pass parsing saves CPU time and memory, contributing to our router’s high throughput.
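The following sketch shows the underlying idea (the offsets are hand-recorded here; a real parser records them while scanning the bytes once). The final response is stitched together from slices borrowed out of the subgraph's raw bytes, so the values are never re-parsed or re-encoded.

```rust
// Hypothetical sketch of zero-copy response assembly: borrow byte ranges
// out of the raw subgraph response instead of building a JSON tree.
fn main() {
    let subgraph: &[u8] =
        br#"{"user":{"id":1,"name":"Ada","email":"ada@lab"},"active":true}"#;

    // Offsets a streaming parser would record on its single pass.
    let id = &subgraph[14..15];     // the bytes `1`
    let name = &subgraph[23..28];   // the bytes `"Ada"`
    let active = &subgraph[57..61]; // the bytes `true`

    // Stitch the final response from borrowed slices; only the small
    // structural punctuation is written, never the values themselves.
    let mut out = Vec::new();
    out.extend_from_slice(b"{\"user\":{\"id\":");
    out.extend_from_slice(id);
    out.extend_from_slice(b",\"name\":");
    out.extend_from_slice(name);
    out.extend_from_slice(b"},\"active\":");
    out.extend_from_slice(active);
    out.push(b'}');

    assert_eq!(
        &out[..],
        &br#"{"user":{"id":1,"name":"Ada"},"active":true}"#[..]
    );
}
```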

Parallel Query Execution with Wave Planning

GraphQL federation often requires orchestrating multiple subgraph calls to resolve a single client query. Our router executes query plans in waves of parallel operations, maximizing concurrency where possible. We modeled our execution after the Apollo Federation approach of having Parallel and Sequence sections in the plan. In simple terms, each “wave” corresponds to a set of subgraph fetches that can run concurrently, and waves are executed one after another in sequence when there are dependencies. This ensures we exploit all independent fetches in parallel before moving to the next stage that might depend on those results.
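The wave model can be sketched in a few lines. This is an illustrative std-threads version (Hive Router uses async tasks, and the service names here are made up): everything inside a wave is spawned concurrently, and the next wave only starts once the current one has fully joined.

```rust
use std::thread;

// Stand-in for a subgraph HTTP call.
fn fetch(service: &str) -> String {
    format!("data from {service}")
}

fn main() {
    // Waves run in sequence; fetches inside a wave run in parallel.
    let waves: Vec<Vec<&str>> = vec![
        vec!["products"],          // wave 1: no dependencies
        vec!["reviews", "prices"], // wave 2: both depend on wave 1's keys
    ];

    for wave in waves {
        // Spawn every fetch in this wave concurrently...
        let handles: Vec<_> = wave
            .into_iter()
            .map(|svc| thread::spawn(move || fetch(svc)))
            .collect();
        // ...and wait for all of them before starting the next wave,
        // since its fetches may need entity keys from these results.
        for h in handles {
            println!("{}", h.join().unwrap());
        }
    }
}
```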

Hive Router produces a similar plan structure to Apollo Router, so you will feel right at home. Here’s an example of a query plan.

QueryPlan {
  Sequence {
    Fetch(service: "products") {
      { products { __typename upc } }
    },
    Flatten(path: "products.@") {
      Fetch(service: "reviews") {
        { ... on Product { __typename upc } } =>
        { ... on Product { reviews { author { username } } } }
      }
    }
  }
}

Tuned I/O and Connection Management

Networking is another area we aggressively optimized. Our router is I/O-bound - it needs to accept client requests and fetch data from multiple services quickly. We built on Rust’s asynchronous runtime (Tokio) and the highly efficient Hyper HTTP library, which are known for handling massive numbers of connections with minimal overhead. The router uses non-blocking I/O, meaning threads are never stuck waiting on network calls; instead, they can handle other tasks while awaiting responses. This lets a single machine handle many concurrent GraphQL requests effectively.

We also implemented connection pooling for outgoing calls to subgraphs. Rather than establishing a new HTTP connection for every subgraph request, the router reuses persistent connections. By maintaining a pool of keep-alive connections to each downstream service, we can send GraphQL requests back-to-back without delay. This also helps avoid running out of ephemeral ports (the short-lived network ports that can get exhausted with too many new connections) or other system resources that too many new connections can consume (the stress test scenario in our benchmark covers this).
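In practice, Hyper manages the keep-alive connections for us, but the pooling logic reduces to something like this hypothetical sketch (`Conn` stands in for a real HTTP connection, and all names are illustrative): reuse an idle connection when one exists, dial only when the pool is empty, and return the connection afterwards instead of closing it.

```rust
use std::collections::HashMap;

// Stand-in for a keep-alive HTTP connection to one subgraph.
struct Conn {
    target: String,
}

#[derive(Default)]
struct Pool {
    idle: HashMap<String, Vec<Conn>>,
}

impl Pool {
    /// Reuse an idle connection when one exists; only dial a new one when
    /// the pool is empty, so ephemeral ports are not exhausted under load.
    fn checkout(&mut self, target: &str) -> Conn {
        self.idle
            .get_mut(target)
            .and_then(|conns| conns.pop())
            .unwrap_or_else(|| Conn { target: target.to_string() }) // new dial
    }

    /// Return the connection after the request instead of closing it.
    fn checkin(&mut self, conn: Conn) {
        self.idle.entry(conn.target.clone()).or_default().push(conn);
    }
}

fn main() {
    let mut pool = Pool::default();
    let c = pool.checkout("reviews"); // first call dials a new connection
    pool.checkin(c);                  // kept alive for the next request
    let _c2 = pool.checkout("reviews"); // second call reuses it, no new dial
    assert!(pool.idle.get("reviews").map_or(true, |v| v.is_empty()));
}
```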

Lock-Free Concurrency

High concurrency can introduce contention if not managed properly. We designed the router to avoid global locks and heavy contention points. Each request is largely handled in isolation, and shared data structures are read-only to minimize synchronization. Our internal response assembly completely avoids mutex locks on the hot path. This lock-free approach means threads don’t block each other, allowing near-linear scaling with core count. As a result, even under parallel load, CPU cores spend time doing useful work rather than waiting on locks. This is one reason our router’s CPU utilization is very efficient in benchmarks - we achieve high RPS with relatively low CPU% because there’s little contention.
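The pattern behind this is simple and worth a sketch (illustrative types, not Hive Router's internals): state shared across requests is immutable and sits behind an `Arc`, so workers read it with a cheap reference-count bump instead of a lock, while per-request state is owned by exactly one task and needs no synchronization at all.

```rust
use std::sync::Arc;
use std::thread;

// Read-only after startup; safe to share across threads without a mutex.
struct Supergraph {
    sdl: String,
}

fn main() {
    let schema = Arc::new(Supergraph {
        sdl: "type Query { hello: String }".into(),
    });

    let handles: Vec<_> = (0..4)
        .map(|i| {
            // Cloning an Arc is a refcount increment, not a lock acquisition.
            let schema = Arc::clone(&schema);
            thread::spawn(move || {
                // Each request assembles its own response buffer: no shared
                // mutable state on the hot path, hence nothing to contend on.
                format!("worker {i} planned against {} bytes of SDL", schema.sdl.len())
            })
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```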

Real-World Profiling and Benchmark Results

We did not rely solely on theory or micro-benchmarks – our performance optimizations were guided by extensive profiling and real-world testing. The team collected traces of actual federated workloads running in production-like environments (multiple subgraphs, realistic query patterns, concurrent users) and used them to fine-tune the router. By analyzing flame graphs and latency distributions on dedicated VMs, we identified bottlenecks and verified that each optimization resulted in noticeable improvements. For example, profiling revealed how much time was spent in JSON serialization, which justified our zero-copy parsing approach.

We built comprehensive benchmark suites ([1], [2]) with multiple scenarios to validate performance across different situations. These scenarios included:

  • Baseline throughput under constant load (to measure raw requests per second and latency).
  • Peak load bursts with high concurrent clients (to ensure reliability and no failures under stress).
  • Upstream delay scenarios where subgraphs respond slowly or have added network latency (to test backpressure and see how the router queues and maintains throughput).
  • Complex query scenarios with deep nesting and multiple subgraph hops (to test the effectiveness of our parallel execution waves and merging logic).
  • Resource usage tests to record memory footprint and CPU usage at steady state.

We ran these benchmarks on isolated hardware to get consistent, unbiased results.

Summary

The outstanding performance of our GraphQL router is no accident – it’s the result of deliberate choices at every layer of the stack. We built a system that handles federated GraphQL queries with exceptional efficiency, making our router a great choice for production environments that demand speed at scale.

With this solid foundation, users can expect faster responses, lower infrastructure costs (since a single router instance can do the work of many), and the ability to confidently handle large supergraphs or high request volumes.

Being open-source, Hive Router’s development is transparent and community-driven. We invite developers to report issues, suggest improvements, and even contribute code. Our goal is to build a GraphQL federation router that the community can rely on and help shape. The Guild will continue to maintain Hive Router with the same commitment we give to our other popular projects (like GraphQL Codegen and Yoga) - you can count on long-term support and integration with the broader GraphQL Hive platform (for schema registry, analytics, etc.).
