Performance Benchmark Results: SP1 zkVM on TEE H200 GPUs Hits <20% Overhead

2025-03-21

This research was funded by Succinct, whose support made this evaluation possible.

Introduction

If you’re working with zero-knowledge virtual machines (zkVMs) or exploring secure computing, you’ve likely run into the trade-off between performance and privacy. Our latest benchmark study at Phala Network, funded by Succinct, tackles that trade-off head-on, testing SP1 zkVM on NVIDIA H200 GPUs inside a Trusted Execution Environment (TEE).

Key Takeaway: Overhead stays below 20% for complex workloads like zkEVMs, ZK-TLS, zkRollups, and zkML, with plenty of room to scale!

View the full benchmark study here, or keep reading for a quick summary.

Why zkVMs and TEEs?

zkVMs like SP1 (from Succinct) let you write code in familiar languages and automatically generate zero-knowledge proofs—no deep crypto knowledge required. They’re great for verifiable apps like zkEVMs or rollups, but proof generation is computationally heavy, often pushing you to GPUs.
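For a flavor of what that looks like, here is a minimal Fibonacci guest in the style of SP1's published examples. This is a sketch; check Succinct's docs for the current API:

```rust
// SP1 guest program: ordinary Rust, compiled to a RISC-V target.
// The prover later generates a zero-knowledge proof that this exact
// code ran on the committed inputs and outputs.
#![no_main]
sp1_zkvm::entrypoint!(main);

pub fn main() {
    // Read an input supplied by the host.
    let n = sp1_zkvm::io::read::<u32>();

    // Plain Rust logic; no hand-written circuits or constraint systems.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let c = a.wrapping_add(b);
        a = b;
        b = c;
    }

    // Commit the result to the proof's public values.
    sp1_zkvm::io::commit(&a);
}
```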

The catch: outsourcing that work can expose your data, since the execution circuit is public. TEEs fix that by isolating the computation, and NVIDIA’s H200 GPUs bring hardware-level encryption to the table.

We wanted to see how this combo holds up in practice!

The Benchmark Setup

Hardware Config: We ran SP1 zkVM on a rig with 2 Intel Xeon Platinum 8558 CPUs (96 cores, TDX enabled), 2TB RAM, and 8 NVIDIA H200 NVL GPUs—each with 141GB memory and 4.8TB/s bandwidth, set to Confidential Computing mode. For the test, we used a VM with 8 CPU cores, 32GB RAM, and a single H200 GPU pass-through.

Software Config: Our dstack SDK (built on private-ml-sdk and dstack) let us run Docker images in the TEE without tweaks. We pulled Succinct’s zkvm-perf codebase, updated via this PR.
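For context on what the harness exercises, here is a rough sketch of host-side proving with SP1's SDK. The names (ProverClient, SP1Stdin) follow SP1's documented host interface but vary across SDK versions, the ELF path is hypothetical, and routing proofs to the GPU is typically done by selecting SP1's CUDA prover (e.g. via the SP1_PROVER=cuda environment variable):

```rust
use sp1_sdk::{ProverClient, SP1Stdin};

// Hypothetical path to the compiled RISC-V guest ELF.
const FIBONACCI_ELF: &[u8] = include_bytes!("../elf/riscv32im-succinct-zkvm-elf");

fn main() {
    // With the CUDA prover selected, proof generation runs on the GPU;
    // inside a TEE CVM with H200 pass-through, this is exactly the
    // encrypted CPU-GPU path the benchmark measures.
    let client = ProverClient::new();
    let (pk, vk) = client.setup(FIBONACCI_ELF);

    // Supply the guest's input.
    let mut stdin = SP1Stdin::new();
    stdin.write(&1000u32);

    // Generate and verify a proof of the guest execution.
    let proof = client.prove(&pk, stdin).run().expect("proving failed");
    client.verify(&proof, &vk).expect("verification failed");
}
```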

For comparison with AWS hardware performance, we used Succinct's existing SP1 benchmark data.

What We Learned

The H200 brings privacy and verifiability to SP1 with low overhead, while providing ample compute and memory to support larger, more complex zkVM payloads.

The numbers tell a clear story:

  • Overhead Trends: TEE overhead—mostly from encrypted CPU-GPU data transfers—drops below 20% as workload complexity rises. It’s a fixed cost, so longer tasks dilute it (a toy model below the figures makes this concrete). For lighter jobs, pipelining can trim it further.
  • Memory Usage: The H200’s 141GB memory was underutilized, averaging 10% across tests. That’s a lot of headroom for bigger circuits or datasets.
  • Performance Edge: Against AWS G6 (L4 GPU), the H200 pulls ahead on heavier workloads, balancing TEE security with raw power.
Figure 1: Performance comparison showing decreasing overhead on larger workloads
Figure 2: TEE performance overhead over AWS G6 2xlarge
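As promised above, here is a toy model of that dilution effect: if encrypted CPU-GPU transfers add a roughly fixed cost per proof, the relative overhead shrinks as proving time grows. The numbers below are illustrative, not measurements:

```rust
// Toy model: a fixed TEE I/O cost diluted by growing proving time.
fn tee_overhead_pct(fixed_tee_secs: f64, proving_secs: f64) -> f64 {
    100.0 * fixed_tee_secs / proving_secs
}

fn main() {
    let fixed = 5.0; // illustrative fixed cost of encrypted transfers (s)
    for proving in [10.0, 30.0, 60.0, 300.0] {
        // Relative overhead falls as the proof itself takes longer.
        println!(
            "{proving:>5.0}s proof -> {:>5.1}% TEE overhead",
            tee_overhead_pct(fixed, proving)
        );
    }
}
```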

You can dig into the raw data here. The H200’s specs—3341 TFLOPS (INT8), 4.8TB/s bandwidth—make it a beast, and TEE mode doesn’t choke it as much as you’d expect.

Hardware Costs in Context

The H200 packs roughly 6x the memory and compute of AWS G6’s L4 for about 3x the cost. If your workload leans on memory or scales up, it’s worth considering, especially with TEE baked in.

When evaluating the practicality of this approach, cost is an important factor. While the H200 represents the high end of NVIDIA’s GPU offerings, the cost-performance ratio remains favorable for many use cases:

| GPU Model   | Memory | Bandwidth | Compute (INT8) | Hourly Cost |
| ----------- | ------ | --------- | -------------- | ----------- |
| H200 NVL    | 141GB  | 4.8TB/s   | 3341 TFLOPS    | $2.5        |
| L4 (AWS G6) | 24GB   | 300GB/s   | 485 TFLOPS     | $0.805      |

The H200 offers roughly 7x the INT8 compute and nearly 6x the memory of the L4, at about 3x the hourly cost, making it a cost-effective choice for memory-intensive and computationally demanding zkVM workloads. For many applications, the performance benefits and additional security guarantees justify the increased cost.
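One way to sanity-check that from the table alone: at $2.5/hr versus $0.805/hr, the H200 only needs to prove about 3.1x faster than the L4 to break even on cost per proof. A minimal sketch of that arithmetic:

```rust
// Break-even speedup: how much faster the H200 must be than the L4
// for cost-per-proof to match, using the hourly prices from the table.
fn main() {
    let h200_hourly = 2.5_f64;
    let l4_hourly = 0.805_f64;

    // Cost per proof = hourly_cost / proofs_per_hour, so costs match
    // when the H200's throughput advantage equals the price ratio.
    let break_even_speedup = h200_hourly / l4_hourly;
    println!("break-even speedup: {break_even_speedup:.2}x");
    // Above ~3.1x throughput, the H200 is cheaper per proof, before
    // even counting its TEE guarantees and memory headroom.
}
```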

What This Means for Developers

For complex stuff—zkEVMs, rollups, or even ML inference—the low overhead makes TEE-enabled H200 a solid pick. The memory headroom hints at supporting larger circuits or multi-workload setups. We’ve seen it work firsthand with SP1, and our research paper dives deeper into TEE GPU behavior.

Phala Cloud is also prepping H200 GPU support. For now, you can deploy your CVM in a CPU TEE to test our TEE tech stack; soon you’ll be able to spin up zkVM workloads in a GPU TEE without fuss. The dstack SDK, from which Phala Cloud’s TEE stack is derived, does the heavy lifting, letting you deploy as-is via Docker.

It’s not about selling you something—it’s about giving you tools to experiment with.

Use Cases to Explore

Here’s where it gets practical:

  1. Financial services: Process sensitive transaction data with both privacy and verifiability, enabling compliant yet confidential financial applications.
  2. Healthcare analytics: Analyze private medical data while providing cryptographic guarantees about the correctness of the analysis without exposing the underlying data.
  3. Confidential machine learning: Train and run inference on private models and data while providing verifiable guarantees about the model outputs.
  4. Decentralized identity systems: Process identity verification without exposing personal information, with cryptographic guarantees about the verification process.

The H200’s capacity suggests zkVMs haven’t hit their ceiling yet. If you’re tinkering with bigger ideas, this could be your sandbox.

Wrapping Up

Running SP1 zkVM on TEE-enabled H200 GPUs keeps overhead low and leaves room to grow—good news if you’re building secure, verifiable systems. Phala’s work here (shoutout to Succinct for funding) shows what’s possible when zkVMs meet modern hardware.

Curious? Check the full results or drop by the Phala community to chat about it.

GPU H200 support on Phala Cloud is coming—stay tuned if you want to give it a spin.

About Phala

Phala Network is a decentralized cloud that offers secure and scalable computing for Web3.

With Phat Contracts, an innovative programming model enabling trustless off-chain computation, developers can create new Web3 use cases.
