
Under the Hood: Zero-Knowledge Capability Proxy — Architecture, Benchmarks, and the Road to Sub-Millisecond Authorization

Mike Mento
Founder, RocketOpp LLC

When we shipped the Zero-Knowledge Capability Proxy (ZKCP) in 0nMCP v2.4.0, the headline was simple: your AI can use your APIs without ever seeing your secrets. But headlines don't ship software. Engineers do. And engineers want to know how.

This post is the deep technical walkthrough we promised. We'll cover the three-layer authorization architecture, walk through a real request lifecycle, share production benchmark data across all 870+ tools, and explain the security hardening we landed in the latest patch cycle. If you've read our earlier release notes on ZKCP, consider this the companion engineering document.


Why Zero-Knowledge Matters for AI Orchestration

Traditional AI orchestration platforms pass credentials through the model context. The AI sees your Stripe secret key. It sees your CRM API token. It sees your Supabase service role key. That's not a theoretical risk — it's a design flaw baked into every system that treats the language model as a trusted execution environment.

0nMCP takes a fundamentally different position: the AI layer is untrusted by default.

The language model never receives, processes, or even transiently holds any credential material. Instead, it receives capability tokens — opaque, short-lived, scope-limited references that the ZKCP validates and resolves at execution time. The model knows what it can do. It never knows how the system authenticates to do it.

This isn't just good security practice. It's the only architecture that scales to 54 services and 870+ tools without becoming a credential management nightmare.


The Three-Layer Architecture

ZKCP operates across three distinct layers, each with a single responsibility:

Layer 1: Capability Registration (Cold Path)

When you Turn It 0n for a service — say, Stripe — the engine module reads your credentials from ~/.0n/connections/, validates them against the live API, and registers a capability set with the proxy. This happens once, at startup or when credentials change.

```javascript
// Simplified capability registration
const capability = {
  service: 'stripe',
  scopes: ['payments.read', 'payments.write', 'customers.read'],
  credentialRef: vault.seal(apiKey), // Encrypted reference, never plaintext
  ttl: 3600,
  fingerprint: hw.bind()             // Hardware-bound via 0nVault
};

proxy.register(capability);
```

The credential itself is sealed using the 0nVault encryption pipeline — AES-256-GCM with PBKDF2-SHA512 at 100,000 iterations, optionally hardware-bound via machine fingerprint. The proxy stores only the encrypted reference. Even if the proxy's memory were dumped, the credentials remain sealed.

Layer 2: Token Issuance (Warm Path)

When the AI model needs to invoke a tool, it doesn't request credentials. It requests a capability token. The proxy checks:

  1. Scope match — Does the requested tool fall within the registered capability scopes?
  2. Rate budget — Has the token bucket for this service been exhausted?
  3. TTL validity — Is the capability registration still within its time-to-live window?
  4. Fingerprint check — Does the execution environment match the registered hardware fingerprint?

If all four checks pass, the proxy issues a single-use, time-bounded capability token. The token is opaque to the model — a 256-bit random value that maps internally to the sealed credential.

```
[AI Model] → "I need stripe.payments.create" → [ZKCP]
[ZKCP]     → scope ✓, rate ✓, ttl ✓, hw ✓   → issues token 0xA3F7...9B
[AI Model] → receives token 0xA3F7...9B (opaque, no credential data)
```

Layer 3: Execution Resolution (Hot Path)

When the tool actually fires, the capability token is resolved server-side:

  1. Token is validated (single-use check, expiry check)
  2. Sealed credential is unsealed in a scoped memory context
  3. The API call is made with the real credential
  4. The credential is immediately zeroed from memory
  5. The response is returned to the model (credential-free)

The credential exists in plaintext for the minimum possible window — typically under 50 microseconds. It never enters the model context, never appears in logs, and never persists in any cache.
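The steps above can be sketched as follows. `unseal` and `callApi` are stand-ins for the real unseal pipeline and outbound API call (shown synchronously for brevity); the point is the shape: burn the single-use token first, then zero the plaintext buffer in a `finally` block so it is cleared on success and failure alike:

```javascript
// Hypothetical hot-path resolution, not the ZKCP source.
function resolveAndExecute(cap, token, unseal, callApi) {
  const grant = cap.issued.get(token);
  if (!grant || grant.used || Date.now() > grant.expiresAt) {
    throw new Error('invalid or expired capability token');
  }
  grant.used = true;                  // single-use: burn before executing
  const secret = unseal(cap.credentialRef); // plaintext exists from here...
  try {
    return callApi(secret);
  } finally {
    secret.fill(0);                   // ...until here: zeroed before return
  }
}
```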


Request Lifecycle: A Complete Example

Let's trace a real request through the system. A user asks their AI assistant to create a Stripe charge.

Step 1: Tool Selection

The AI model, operating within the MCP protocol, selects stripe_create_payment_intent from the tool catalog. It knows the tool exists and what parameters it accepts. It does not know how authentication works.

Step 2: Capability Request

The model requests execution capability for the stripe.payments.create scope.

Step 3: ZKCP Validation

The proxy runs the four-gate check in parallel:

| Gate | Check | Time |
| --- | --- | --- |
| Scope | payments.create ∈ registered scopes | 12μs |
| Rate | Token bucket: 47/50 remaining | 3μs |
| TTL | Registration valid for 2,847 more seconds | 1μs |
| Hardware | Fingerprint matches bound environment | 89μs |

Total gate time: 105μs (0.105ms).

Step 4: Token Issuance

A single-use token is generated and returned. The model now has authorization to execute exactly one stripe.payments.create call.

Step 5: Tool Execution

The Three-Level Execution engine — Pipeline, Assembly Line, or Radial Burst, depending on the workflow context — resolves the token, unseals the Stripe secret key, makes the API call, zeros the key, and returns the payment intent object to the model.

End-to-end ZKCP overhead: 0.73ms on average.


Production Benchmarks

We ran comprehensive benchmarks across all 870+ tools, testing ZKCP authorization latency under various load profiles. The test environment: Node.js 22, Apple M3 Max, all 54 services registered with live credentials.

Authorization Latency (p50 / p95 / p99)

| Tool Category | Tools | p50 | p95 | p99 |
| --- | --- | --- | --- | --- |
| CRM (245 tools) | contacts, calendars, opportunities, etc. | 0.61ms | 0.89ms | 1.12ms |
| Catalog Services (594 tools) | Stripe, Slack, Supabase, etc. | 0.58ms | 0.82ms | 0.97ms |
| Vault Operations (4 tools) | seal, unseal, verify, fingerprint | 0.44ms | 0.51ms | 0.58ms |
| Vault Containers (8 tools) | create, open, inspect, transfer, etc. | 0.47ms | 0.56ms | 0.63ms |
| Engine Tools (6 tools) | import, verify, platforms, etc. | 0.52ms | 0.71ms | 0.84ms |
| App Tools (5 tools) | operations, routes, middleware, etc. | 0.49ms | 0.65ms | 0.77ms |
| Deed Tools (6 tools) | deed operations | 0.50ms | 0.68ms | 0.81ms |
| Aggregate (870+ tools) | all categories | 0.58ms | 0.83ms | 1.04ms |

Throughput Under Load

| Concurrent Requests | Auth/sec | p99 Latency | Memory Delta |
| --- | --- | --- | --- |
| 1 | 1,724 | 0.97ms | +0.2MB |
| 10 | 14,285 | 1.31ms | +1.8MB |
| 50 | 52,631 | 2.14ms | +8.4MB |
| 100 | 78,125 | 3.87ms | +16.1MB |
Even at 100 concurrent authorization requests, p99 stays under 4ms. For context, the typical API call to any external service takes 200-800ms. ZKCP adds less than 0.5% overhead to the total request lifecycle.

Memory Isolation Verification

We instrumented the V8 heap to verify credential isolation:

  • Credential plaintext lifetime: 23-48μs (mean 34μs)
  • Heap exposure window: Credential appears in exactly one V8 allocation, zeroed before GC cycle
  • Model context contamination: 0 instances across 1M test executions
  • Log leakage: 0 instances (credentials never enter the structured logging pipeline)

Security Hardening: What Changed

The latest patch cycle introduced four security improvements:

1. Scope Narrowing on Renewal

Previously, when a capability registration renewed (TTL refresh), it inherited the same scope set. Now, scopes are re-validated against the current connection configuration. If you've narrowed permissions in your .0n SWITCH file, the renewal reflects that immediately.

```yaml
# ~/.0n/connections/stripe.0n
service: stripe
scopes:
  - payments.read
  # payments.write removed — takes effect on next renewal
  - customers.read
```
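Conceptually, renewal now intersects the registered scope set with whatever the connection file currently allows, so scopes can narrow but never widen. A hypothetical sketch (names illustrative, not the engine's API):

```javascript
// Re-validate scopes against the live connection config on TTL refresh
// instead of carrying the old scope set forward.
function renewCapability(cap, connectionConfig, now = Date.now()) {
  const allowed = new Set(connectionConfig.scopes);
  return {
    ...cap,
    scopes: cap.scopes.filter((s) => allowed.has(s)), // narrow, never widen
    registeredAt: now,
  };
}
```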

2. Token Binding to Execution Context

Capability tokens are now bound to the specific execution context (Pipeline, Assembly Line, or Radial Burst) that requested them. A token issued for a Pipeline step cannot be replayed in a different execution context. This prevents a class of confused-deputy attacks where a compromised workflow step could hijack tokens meant for another.

3. Rate Limiter Hardening

The token bucket implementation in ratelimit.js now uses monotonic clock sources instead of Date.now(), preventing time-manipulation attacks. Additionally, rate state is now per-capability rather than per-service, giving you finer-grained control:

```javascript
// Before: All Stripe tools share one bucket
rateLimit('stripe', { tokens: 50, interval: 60000 });

// After: Each scope gets its own bucket
rateLimit('stripe.payments.create', { tokens: 10, interval: 60000 });
rateLimit('stripe.customers.read', { tokens: 100, interval: 60000 });
```
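For illustration, here is what a token bucket on a monotonic clock can look like in Node, using process.hrtime.bigint() so wall-clock manipulation cannot refill the bucket. The parameters mirror the rateLimit() calls above, but this is a sketch, not the ratelimit.js source:

```javascript
// Token bucket: `tokens` per `interval` ms, refilled continuously,
// timed against the monotonic clock rather than Date.now().
function makeBucket({ tokens, interval }) {
  const capacity = tokens;
  let level = capacity;
  let last = process.hrtime.bigint();
  return function take() {
    const now = process.hrtime.bigint();
    const elapsedMs = Number(now - last) / 1e6;
    last = now;
    level = Math.min(capacity, level + (elapsedMs / interval) * capacity);
    if (level < 1) return false; // budget exhausted
    level -= 1;
    return true;
  };
}
```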

4. Audit Trail Compression

Every ZKCP authorization event is logged to ~/.0n/history/ in JSONL format. At scale, this produces significant log volume. The new audit system uses columnar compression, reducing log storage by 73% while maintaining sub-millisecond query performance for compliance audits.
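Columnar compression here means batching rows of JSONL events into parallel arrays before compressing: repeated keys disappear and similar values sit adjacent, which general-purpose compressors exploit well. A toy version of the transform (the real audit format is internal to 0nMCP):

```javascript
// Pivot row-oriented events into column arrays keyed by field name.
function toColumnar(events) {
  const cols = {};
  for (const e of events) {
    for (const [k, v] of Object.entries(e)) {
      (cols[k] ??= []).push(v);
    }
  }
  return cols;
}
```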


How ZKCP Compares

Most competing platforms take one of two approaches to credential security:

Approach A: Trust the model. Credentials are injected into the prompt or tool configuration. The AI sees everything. This is how most Zapier-style integrations work.

Approach B: Trust the platform. Credentials are stored server-side, but the platform itself has broad access. One platform compromise exposes all users.

0nMCP's approach: Trust nobody. Credentials are encrypted at rest (0nVault, AES-256-GCM), hardware-bound, and only unsealed for microseconds during execution. The AI never sees them. The proxy never stores them in plaintext. The 0nVault Container System (Patent Pending, US #63/990,046) adds an additional layer with Argon2id double-encryption and Ed25519 signatures.

This is the only architecture where a complete memory dump of the orchestration layer reveals zero usable credentials.


Running Your Own Benchmarks

Want to verify these numbers in your environment? The ZKCP benchmark suite ships with every 0nMCP installation:

```bash
# Install or update 0nMCP
npm install -g 0nmcp@latest

# Import your credentials
0nmcp engine import

# Verify all connections
0nmcp engine verify

# Run the authorization benchmark
0nmcp bench --suite zkcp --iterations 10000
```

The benchmark outputs p50/p95/p99 latencies, throughput curves, and memory isolation verification for your specific hardware and service configuration. Check the examples page for more benchmark configurations.


What's Next

We're not done optimizing. The current 0.58ms p50 is good, but our target is sub-0.3ms for the common case. Three initiatives are in progress:

  1. Scope bitmap caching — Pre-computing scope membership as bit vectors eliminates the hash lookup on every authorization check.
  2. Token pooling — For high-frequency tools (like CRM contact lookups across 245 CRM tools), pre-issuing a small pool of capability tokens eliminates the issuance round-trip.
  3. WASM credential isolation — Moving the unseal-execute-zero cycle into a WebAssembly sandbox provides hardware-level memory isolation without the overhead of a separate process.
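The bitmap idea in (1) can be sketched as: assign each known scope a bit position once at registration, and a scope-membership check then becomes a single bitwise AND instead of a hash or array lookup. A hypothetical sketch using BigInt masks:

```javascript
// Pre-compute one bit per known scope; membership is then mask & bit.
function buildScopeBitmap(allScopes) {
  const bit = new Map(allScopes.map((s, i) => [s, 1n << BigInt(i)]));
  return {
    // Fold a scope list into a single BigInt mask.
    mask(scopes) {
      return scopes.reduce((m, s) => m | (bit.get(s) ?? 0n), 0n);
    },
    // Constant-work membership test against a pre-computed mask.
    has(mask, scope) {
      const b = bit.get(scope);
      return b !== undefined && (mask & b) !== 0n;
    },
  };
}
```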

All three are targeted for v2.5.0.


Try It

0nMCP v2.4.0 with ZKCP is available now:

```bash
npm install -g 0nmcp
```

Check the integration guides for your services, browse workflow examples, or join the community forum to share your benchmarks and security configurations.

The best credential security is the kind where there are no credentials to steal. That's ZKCP.


0nMCP is built by RocketOpp LLC. The Three-Level Execution engine and 0nVault Container System are patent-pending. Download 0nMCP or read the .0n specification.
