// the_problem

Every distributed job system eventually hits the same wall.

You have an executor (a third-party service, a community plugin, a sandboxed process), and it needs credentials to do its job. A GitHub token. A database password. An API key. Without it, the executor can't function. But the moment you hand it over, you've lost control of it.

Most systems solve this by not solving it. They inject the secret into the environment and trust the execution context. CI runners get secrets as env vars. Plugin systems pass API keys directly to plugin code. LLM agent frameworks hand credentials straight to the agent and move on.

This works until it doesn't. A compromised executor, a log line with a secret in it, a misconfigured container: any one of them and the credential is out. Worse, you often don't know how long it has been exposed or how many times it has been used.

The pattern described here solves a specific problem:

How do you give an executor the ability to use a secret without ever sending it the secret?

// background

The pattern applies to any system where:

  • A central coordinator dispatches jobs to executors it doesn't fully control
  • Those executors need short-lived access to credentials to complete their job
  • The coordinator cannot trust the executor's environment unconditionally

This covers: AI agent platforms, plugin architectures, distributed CI systems, multi-tenant job runners, and any third-party execution model.

The two things that make this hard:

  1. The executor needs the real credential at runtime; you can't fake it
  2. You don't want the credential to outlive the job or be reusable if stolen

Standard approaches fail on point 2. This pattern solves both.


// how_it_actually_works

The core idea: separate the credential pointer from the credential itself, and require proof of identity before releasing the value.

The coordinator sends the executor a signed request payload containing a reference string, not the secret. Something like:

"github_token": "cref://tenant_abc/github_token"

That string is useless on its own. To get the real value, the executor must call a credential store with:

  • A one-time request token proving this job was legitimately issued
  • Its own registered client identity
  • The tenant scope it is authorised for

The credential store validates all three independently, marks the resolution as used, and returns a short-lived secret valid only for the expected duration of the job plus a small buffer.

The executor uses the secret. The secret expires. Even if it was intercepted in flight, it is now useless.

The communication path:

Coordinator
     │
     │  Builds + signs request payload
     │  (contains credential references, NOT credentials)
     ▼
  [mTLS encrypted channel]
     │
     ▼
Executor receives payload
     │
     │  Runs validation chain (8 steps)
     ▼
Credential Store
     │
     │  Validates request token + client ID + tenant scope
     │  Marks resolution as used (one-time)
     │  Returns short-lived credential
     ▼
Executor runs job
     │
     ▼
Returns signed response payload

// the_request_payload

Every job dispatch is a signed payload. This is the complete structure:

{
  "payload_version": "1.0",
  "request_id": "req_8f3kd92",
  "session_id": "sess_abc123_attempt1",
  "client_id": "data-sync-service-v1",
  "mode": "async",
  "scope_token": "scope::tenant_abc::sess_456",
  "webhook_url": "https://platform.example/webhooks/req_8f3kd92",
  "timeout_ms": 120000,
  "inputs": { "task": "sync_records", "source": "external_api" },
  "credential_refs": {
    "github_token": "cref://tenant_abc/github_token",
    "db_password": "cref://tenant_abc/db_password"
  },
  "request_token": "eyJhbGc...",
  "nonce": "a7f3k9x2",
  "signature": "hmac_sha256_of_full_payload"
}

Key fields:

Field            Purpose
request_id       Unique per dispatch. Ties nonce, audit log and replay detection together.
session_id       Groups all executors in one logical session. Shared context namespace.
client_id        Registered identity of the intended recipient. Verified against the executor itself.
mode             sync or async. Tells the receiver which execution path to follow.
scope_token      Access boundary. Executor can only read/write within its tenant + session scope.
credential_refs  Pointer strings only. Never real values.
request_token    RS256 JWT. One-time. 5-minute TTL. Signed with the coordinator's private key.
nonce            Random. Burned on first use. Replays blocked unconditionally.
signature        HMAC-SHA256 of the canonical payload (signature field excluded). First check; everything else is conditional on it passing.

The credential_refs field is the core of the model. What travels in the payload is a string: a pointer. The real credential never leaves the credential store until the executor earns it through the validation chain.


// the_8_step_validation_chain

Every incoming payload goes through this sequence in order. No step is optional. No shortcut exists. credential_refs are never resolved until Step 7.

On any failure at any step: hard reject, no credentials resolved, alert logged. The chain does not continue.
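The article does not spell out all eight steps here. The attack scenarios below name Step 1 (HMAC), Step 2 (request token), Step 3 (nonce), Step 5 (client ID) and Step 7 (credential resolution). What follows is a minimal sketch of the control flow only. It assumes each pre-resolution step can be written as a predicate over the payload. `run_validation_chain`, the step names and the callbacks are hypothetical, and the checks are passed in rather than guessed:

```python
from typing import Callable, Dict, List, Tuple


class HardReject(Exception):
    """Raised on the first failed check; nothing after it runs."""


Check = Callable[[dict], bool]


def run_validation_chain(
    payload: dict,
    pre_checks: List[Tuple[str, Check]],
    resolve_credentials: Callable[[dict], Dict[str, str]],
    alert: Callable[[str], None],
) -> Dict[str, str]:
    """Run the pre-resolution checks in order, then resolve credential_refs."""
    for name, check in pre_checks:
        try:
            ok = check(payload)
        except Exception:
            # A check that errors out (e.g. its backing store is down) counts as a failure
            ok = False
        if not ok:
            alert(f"validation failed at {name}")
            raise HardReject(name)
    # Reached only when every earlier check passed: this is the one place
    # where credential_refs are turned into values.
    return resolve_credentials(payload)
```

Treating an exception inside a check as a failure keeps the chain fail-closed, which matters for the Redis-backed nonce step.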


// the_6_layers

Strip away the job dispatch context and what you have is six independent security layers stacked on top of each other. Bypassing one does not help with the others.

LAYER 1 : HMAC SIGNATURE
  Every payload signed with a per-executor shared secret
  Cannot forge without knowing the key
  First check; everything else is conditional on it passing

LAYER 2 : REQUEST TOKEN (RS256 JWT, one-time)
  Issued per job by coordinator
  5 minute TTL
  RS256: executors can verify but cannot create tokens
  Contains: request_id, client_id, tenant_id, expiry

LAYER 3 : NONCE BURN
  Random nonce stored in Redis before dispatch
  Burned on first receipt
  Replay impossible even with valid HMAC + valid JWT

LAYER 4 : CREDENTIAL STORE AUTH CHAIN
  credential_refs only resolvable with:
    valid request_token + matching client_id + matching tenant_id
  Credential store validates independently of the executor
  Credential TTL = expected job duration + 20% buffer
  One-time resolution per request per credential

LAYER 5 : MUTUAL TLS (mTLS)
  Both coordinator and executor present CA-signed certificates
  Endpoint URL alone is useless without a valid cert
  Man-in-the-middle blocked without a CA-signed cert and its private key

LAYER 6 : NETWORK ISOLATION
  Executor has egress firewall
  Only declared domains reachable
  Compromised executor cannot exfiltrate to arbitrary endpoints

Each layer is independent. A stolen request token doesn't bypass the nonce burn. A valid nonce doesn't help without the HMAC. A valid HMAC without the request token doesn't unlock the credential store.


// the_shared_secret_lifecycle

The HMAC layer depends on a shared secret: a unique cryptographic key between the coordinator and one specific executor. Every executor gets its own. No two executors share the same secret.

data-sync-service-v1   → secret: "sk_a7f3k9x2b8c4d1e6..."
image-processor-v1     → secret: "sk_m2n5p8q1r4s7t0u3..."
report-generator-v1    → secret: "sk_z9y6x3w0v7u4t1s8..."

Registration:

Executor registers with coordinator
        ↓
Coordinator generates unique shared secret:
  secret = crypto.randomBytes(32).toString('hex')
        ↓
Coordinator stores:
  Credential store → raw secret (coordinator uses to sign)
  Internal DB      → bcrypt hash only (audit/rotation)
        ↓
Returns to registrant ONE TIME ONLY:
{
  "client_id": "data-sync-service-v1",
  "shared_secret": "a7f3k9x2b8c4d1e6...",
  "note": "Store this securely. Not shown again."
}
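Here is the registration flow in Python as a minimal sketch. `secrets.token_hex(32)` matches the `crypto.randomBytes(32).toString('hex')` call above. The plain dicts standing in for the credential store and internal DB are assumptions. bcrypt is a third-party package, so the sketch swaps in the standard library's salted PBKDF2 (`hashlib.pbkdf2_hmac`) to show where the hash-only copy goes. Substitute bcrypt if that is what you deploy. All function names are hypothetical:

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 200_000  # illustrative work factor for the stand-in hash


def register_executor(client_id: str, credential_store: dict, internal_db: dict) -> dict:
    # 32 random bytes as 64 hex chars, like crypto.randomBytes(32).toString('hex')
    raw_secret = secrets.token_hex(32)
    # The coordinator needs the raw value to sign, so it lives in the credential store
    credential_store[client_id] = raw_secret
    # Internal DB keeps only a salted hash (PBKDF2 here, standing in for bcrypt)
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", raw_secret.encode(), salt, PBKDF2_ITERATIONS)
    internal_db[client_id] = {"salt": salt.hex(), "hash": digest.hex()}
    # The only time the raw secret is handed to the registrant
    return {
        "client_id": client_id,
        "shared_secret": raw_secret,
        "note": "Store this securely. Not shown again.",
    }


def matches_stored_hash(client_id: str, candidate: str, internal_db: dict) -> bool:
    # Audit/rotation check: does a presented secret match the stored hash?
    record = internal_db[client_id]
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode(), bytes.fromhex(record["salt"]), PBKDF2_ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), record["hash"])
```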

At runtime, the coordinator signs every payload:

1. Fetch raw secret from credential store
2. Build payload
3. Compute canonical JSON
   (keys sorted alphabetically, no whitespace,
    signature field excluded)
4. Compute HMAC:
   signature = HMAC-SHA256(raw_secret, canonical_json)
5. Add signature to payload
6. Dispatch

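Canonical JSON matters. Both sides must HMAC the exact same bytes. Sorting keys alphabetically removes differences in key order. The two sides must also agree on separators, string escaping and number formatting, especially when the coordinator and executor are written in different languages. RFC 8785 (JSON Canonicalization Scheme) defines one such form.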
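The signing steps above can be written as a minimal Python sketch (the function names and the example key are hypothetical). Two dicts with different key order produce the same signature. There is a caveat for mixed-language deployments. Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`) and writes the float `1.0` as `1.0`. JavaScript's `JSON.stringify` does neither. Sorting keys alone is therefore not enough if the coordinator runs on Node and the executor on Python.

```python
import hashlib
import hmac
import json


def canonical_json(payload: dict) -> bytes:
    # Sorted keys (recursively), no whitespace, signature field excluded
    body = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(payload: dict, raw_secret: bytes) -> dict:
    # HMAC-SHA256 over the canonical bytes, then attach the hex digest
    signature = hmac.new(raw_secret, canonical_json(payload), hashlib.sha256).hexdigest()
    return {**payload, "signature": signature}
```

Because the signature field is excluded before hashing, re-signing an already signed payload yields the same signature.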

Executor verifies:

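import hashlib, hmac, json

class HardReject(Exception): pass

def verify_hmac(payload: dict, shared_secret: bytes) -> None:
    # shared_secret must be bytes, e.g. os.environ["SHARED_SECRET"].encode()
    received_sig  = str(payload.get("signature", "")).encode()
    payload_copy  = {k: v for k, v in payload.items() if k != "signature"}
    canonical     = json.dumps(payload_copy, sort_keys=True, separators=(',', ':'))
    expected_sig  = hmac.new(shared_secret, canonical.encode(), hashlib.sha256).hexdigest().encode()
    # Constant-time comparison: timing doesn't reveal how much of the signature matched
    if not hmac.compare_digest(received_sig, expected_sig):
        raise HardReject("HMAC mismatch")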

Secret rotation:

Registrant triggers rotation
        ↓
Coordinator:
  1. Generates new secret
  2. Updates credential store + internal hash
  3. Returns new secret ONE TIME
  4. Sets old secret grace TTL = 5 minutes
     (allows in-flight jobs to complete)
  5. After 5 minutes → old secret invalid
        ↓
Registrant:
  Updates shared secret in executor environment
  Redeploys
  New secret active

Location               What is stored      Why
Executor environment   Raw secret          To verify incoming payloads
Credential store       Raw secret          Used by the coordinator to sign payloads
Internal DB            Bcrypt hash only    Audit and rotation; never the raw secret
Payloads               Never               Secrets never travel in any form
Logs                   Never               Never logged anywhere, ever
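The rotation flow doesn't say which side honours the 5-minute grace TTL. This minimal sketch assumes that whichever side verifies signatures keeps the previous secret next to the new one. It accepts either secret until the grace deadline. `SecretSet` and its methods are hypothetical:

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 300  # the 5-minute grace TTL from the rotation flow


@dataclass
class SecretSet:
    current: bytes
    previous: Optional[bytes] = None
    previous_valid_until: float = 0.0

    def rotate(self, new_secret: bytes, now: float) -> None:
        # Keep the old secret, but only until now + GRACE_SECONDS
        self.previous = self.current
        self.previous_valid_until = now + GRACE_SECONDS
        self.current = new_secret

    def verify(self, message: bytes, signature: str, now: float) -> bool:
        keys = [self.current]
        if self.previous is not None and now < self.previous_valid_until:
            keys.append(self.previous)
        received = signature.encode()
        matched = False
        for key in keys:
            expected = hmac.new(key, message, hashlib.sha256).hexdigest().encode()
            # Check every candidate key rather than stopping at the first match
            matched |= hmac.compare_digest(expected, received)
        return matched
```

Once the deadline passes, `verify` silently stops considering the old key. No separate cleanup step is needed for it to become invalid.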

// how_credential_resolution_actually_works

Step-by-step: how "cref://tenant_abc/github_token" becomes a real credential.

Step 1 : Executor receives payload

"credential_refs": {
  "github_token": "cref://tenant_abc/github_token"
},
"request_token": "eyJhbGc..."

Just strings. Validation-chain steps 1–6 run first. Nothing is resolved yet.

Step 2 : Validation passes

All six pre-resolution checks (validation-chain steps 1–6) cleared. Now credential resolution (Step 7 of the chain) runs.

Step 3 : Client library calls credential store

GET http://localhost:8200/v1/secret/tenant_abc/github_token
Headers:
  X-Request-Token: eyJhbGc...
  X-Client-ID:     data-sync-service-v1
  X-Tenant-ID:     tenant_abc
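A client library might turn the reference into that request as follows. This minimal sketch assumes `cref://<tenant>/<name>` always maps to `/v1/secret/<tenant>/<name>`, an inference from this single example. `parse_cref`, `build_resolution_request` and `STORE_BASE` are hypothetical names. The sketch builds the request but does not send it:

```python
import urllib.request
from typing import Tuple
from urllib.parse import quote, urlsplit

STORE_BASE = "http://localhost:8200/v1/secret"  # base taken from the example request


def parse_cref(ref: str) -> Tuple[str, str]:
    """Split cref://<tenant>/<name> into (tenant, name)."""
    parts = urlsplit(ref)
    name = parts.path.lstrip("/")
    if (parts.scheme != "cref" or not parts.netloc or not name or "/" in name
            or parts.query or parts.fragment):
        raise ValueError(f"not a credential reference: {ref!r}")
    return parts.netloc, name


def build_resolution_request(ref: str, request_token: str, client_id: str) -> urllib.request.Request:
    tenant, name = parse_cref(ref)
    return urllib.request.Request(
        f"{STORE_BASE}/{quote(tenant, safe='')}/{quote(name, safe='')}",
        method="GET",
        headers={
            "X-Request-Token": request_token,
            "X-Client-ID": client_id,
            # Taken from the reference; the store still checks it against the token
            "X-Tenant-ID": tenant,
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. `urllib` normalises header-name case (`X-Client-ID` is stored as `X-client-id`). HTTP header names are case-insensitive, so the store sees the same headers.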

Step 4 : Credential store validates independently

Check 1: Is request token signature valid?
  Credential store holds coordinator public key
  Verifies RS256 independently
  FAIL → 403

Check 2: Is request token expired?
  FAIL → 403

Check 3: Is this client authorised for this tenant?
  FAIL → 403

Check 4: Has this request already resolved this credential?
  Redis: GET cref_resolved:req_8f3kd92:github_token
  Already exists → one-time use violated
  FAIL → 403

Check 5: Does the credential exist?
  FAIL → 404

ALL PASS → release credential

Step 5 : Mark resolution as used

Redis: SET cref_resolved:req_8f3kd92:github_token 1 EX 300

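This request can never resolve this credential again, provided Check 4 and this write happen as one atomic operation (e.g. SET cref_resolved:req_8f3kd92:github_token 1 NX EX 300). With a separate GET and SET, two concurrent requests can both pass Check 4 before either marks the resolution.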
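Checks 1 to 5 plus the Step 5 write can be shown as a minimal in-process sketch. Its assumptions:

- RS256 verification is delegated to a `verify_token` callback. In practice this would be a JWT library holding the coordinator's public key.
- The token's claims carry `request_id`, `client_id`, `tenant_id` and `exp`.
- `OneTimeMarks` is an in-memory stand-in for Redis.

The sketch deviates from the listed order in two places. First, the one-time mark is taken with a single set-if-absent (Redis `SET ... NX EX`), so Check 4 and Step 5 cannot interleave. Second, the existence check runs before the mark, so a 404 doesn't consume the resolution. All names are hypothetical:

```python
import threading
from typing import Callable, Dict, Optional, Set, Tuple

MARK_TTL_SECONDS = 300  # matches EX 300 in Step 5


class OneTimeMarks:
    """In-memory stand-in for Redis `SET key 1 NX EX ttl` (set only if absent)."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}
        self._lock = threading.Lock()

    def set_nx_ex(self, key: str, ttl: int, now: float) -> bool:
        with self._lock:
            expires = self._expiry.get(key)
            if expires is not None and expires > now:
                return False  # already marked: someone resolved first
            self._expiry[key] = now + ttl
            return True


def resolve_credential(
    name: str,
    headers: Dict[str, str],
    verify_token: Callable[[str], Optional[dict]],
    grants: Set[Tuple[str, str]],
    vault: Dict[Tuple[str, str], str],
    marks: OneTimeMarks,
    now: float,
) -> Tuple[int, Optional[str]]:
    # Check 1: RS256 signature, delegated to verify_token (returns claims or None)
    claims = verify_token(headers.get("X-Request-Token", ""))
    if claims is None:
        return 403, None
    # Check 2: expiry
    if claims.get("exp", 0) <= now:
        return 403, None
    # Check 3: client authorised for this tenant, and both match the token
    client, tenant = headers.get("X-Client-ID"), headers.get("X-Tenant-ID")
    if client != claims.get("client_id") or tenant != claims.get("tenant_id"):
        return 403, None
    if (client, tenant) not in grants:
        return 403, None
    # Check 5 runs before the mark so a 404 doesn't use up the one-time resolution
    if (tenant, name) not in vault:
        return 404, None
    # Check 4 + Step 5 as one atomic set-if-absent
    key = f"cref_resolved:{claims['request_id']}:{name}"
    if not marks.set_nx_ex(key, MARK_TTL_SECONDS, now):
        return 403, None
    return 200, vault[(tenant, name)]
```

The 300-second mark lifetime matches the 5-minute token TTL. By the time the mark expires, Check 2 already rejects the token.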

Step 6 : Return short-lived credential

{
  "secret": "ghp_actualRealToken123abc",
  "ttl": 54,
  "expires_at": 1720000354
}

TTL = expected job duration + 20% buffer. A job expected to run 45 seconds gets a credential valid for 54 seconds. After that it is dead, even if intercepted.
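Here is the TTL arithmetic as a minimal sketch (function names hypothetical). It uses integer ceiling division so the 20% buffer never rounds down. The issue time of 1720000300 is inferred from the example (`expires_at` minus `ttl`):

```python
from typing import Dict, Union

BUFFER_PERCENT = 20  # "expected job duration + 20% buffer"


def credential_ttl(expected_job_seconds: int) -> int:
    # expected * 1.2, rounded up, using integers to avoid float rounding surprises
    return (expected_job_seconds * (100 + BUFFER_PERCENT) + 99) // 100


def issue_short_lived(secret: str, expected_job_seconds: int, now: int) -> Dict[str, Union[str, int]]:
    ttl = credential_ttl(expected_job_seconds)
    return {"secret": secret, "ttl": ttl, "expires_at": now + ttl}
```

`issue_short_lived("ghp_actualRealToken123abc", 45, now=1720000300)` returns `{"secret": "ghp_actualRealToken123abc", "ttl": 54, "expires_at": 1720000354}`, matching the response above.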

Step 7 : Injected into handler

def handle(inputs, credentials):
    token = credentials.get("github_token")
    # → "ghp_actualRealToken123abc"
    # Handler code has no knowledge of the resolution chain

The handler gets the real value. It has no visibility into request tokens, nonces, or how the credential store works internally.
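The injection could work as in this minimal sketch. `run_job` and the `resolve_ref` callback are hypothetical. `resolve_ref` stands for the credential-store call in Steps 3 to 6 and is invoked only after the validation chain has passed. The handler receives a plain name-to-value mapping and nothing else:

```python
from typing import Callable, Dict

Handler = Callable[[dict, Dict[str, str]], object]


def run_job(payload: dict, handler: Handler, resolve_ref: Callable[[str], str]) -> object:
    # Turn each pointer into a value; the handler never sees refs or tokens
    credentials = {name: resolve_ref(ref) for name, ref in payload["credential_refs"].items()}
    return handler(payload["inputs"], credentials)
```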


// attack_scenarios

Attack 1 : Direct call with no token

Attacker knows executor endpoint URL
Crafts a fake payload
Sends it directly

BLOCKED AT: Layer 1 (HMAC)
  No shared secret → cannot compute valid HMAC
  Executor rejects at Step 1
  credential_refs never touched

Attack 2 : Replay attack

Attacker captures a valid payload from the network
Resends it 1 second later

BLOCKED AT: Layer 3 (Nonce Burn)
  Nonce burned when original was processed
  Redis returns nil on second lookup
  Rejected at Step 3
  HMAC valid, JWT valid: doesn't matter

Attack 3 : Stolen credential reference string

Attacker obtains: "cref://tenant_abc/github_token"
Calls credential store directly

BLOCKED AT: Layer 4 (Credential Store Auth)
  Where is your request token?
  Attacker has none
  403 Forbidden

Attack 4 : Stolen request token + credential reference

Attacker captures both from network
Calls credential store with valid request token

BLOCKED AT: Layer 4 (One-Time Resolution)
  Has req_8f3kd92 already resolved github_token?
  Redis: YES
  403 Forbidden
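  Credential not released again
  Caveat: this holds only if the legitimate executor resolved first.
  If the attacker wins the race, the attacker gets the credential and
  the real job fails Check 4, which at least surfaces the theft.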

Attack 5 : Expired token replay

Attacker saves a valid payload
Waits 10 minutes
Tries to replay

BLOCKED AT: Layer 2 (JWT TTL)
  expires_at < now
  Rejected at Step 2
  Nonce check never reached

Attack 6 : Misdirected payload

Attacker captures payload intended for data-sync-service-v1
Sends it to report-generator-v1

BLOCKED AT: Step 5 (Client ID Check)
  Payload client_id: "data-sync-service-v1"
  This executor's ID: "report-generator-v1"
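  Mismatch → hard reject
  In practice Step 1 fails first: the payload was signed with
  data-sync-service-v1's secret, which this executor does not hold.
  The client ID check is the backstop if a secret is ever reused.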

Attack 7 : Compromised executor exfiltration

Executor environment is compromised
Attacker tries to send resolved credentials to external server

BLOCKED AT: Layer 6 (Network Isolation)
  Egress firewall active
  Undeclared domain → connection blocked
  Data cannot reach an undeclared endpoint (declared domains remain a possible channel)

// gotchas

The shared secret is the weakest point. It's a long-lived symmetric key sitting in an environment variable. If it leaks through a log line, a docker inspect, or a misconfigured secret manager, an attacker can forge payloads. Rotation exists, but rotation is reactive. Treat the shared secret with the same care as a root CA private key.

Canonical JSON is not optional. Both sides must hash identical bytes. Any difference in key ordering, whitespace or escaping produces a different HMAC. Every payload then fails with an HMAC mismatch that says nothing about the cause. Pick a canonical form, document it, and enforce it in both the coordinator and every client library.

Redis is a dependency of the nonce system. If Redis is unavailable, nonce validation cannot run. The safe failure mode is to reject all payloads, not to skip the check. A degraded nonce system that falls through to "accept" turns a temporary outage into a permanent replay window.
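Here is a fail-closed nonce burn as a minimal sketch. With real Redis a single `DEL` returns the number of keys it removed. One call therefore both checks and burns the nonce, and two concurrent receipts cannot both succeed. `InMemoryNonceStore` imitates that behaviour and adds an `available` switch to simulate an outage. The store class, the key format and `burn_nonce` are hypothetical. With a real client you would catch its connection errors the same way:

```python
import threading
from typing import Set


class NonceStoreUnavailable(Exception):
    pass


class InMemoryNonceStore:
    """Stand-in for Redis: delete() returns how many keys it removed, like DEL."""

    def __init__(self) -> None:
        self._keys: Set[str] = set()
        self._lock = threading.Lock()
        self.available = True

    def add(self, key: str) -> None:
        with self._lock:
            self._keys.add(key)

    def delete(self, key: str) -> int:
        if not self.available:
            raise NonceStoreUnavailable("nonce store unreachable")
        with self._lock:
            if key in self._keys:
                self._keys.remove(key)
                return 1
            return 0


def burn_nonce(store: InMemoryNonceStore, request_id: str, nonce: str) -> bool:
    key = f"nonce:{request_id}:{nonce}"
    try:
        # One atomic delete both checks and burns: a replay removes 0 keys
        return store.delete(key) == 1
    except NonceStoreUnavailable:
        # Fail closed: an unreachable store means reject, never "accept and continue"
        return False
```

A nonce rejected during an outage is not burned. Whether a retry after recovery should be allowed is a policy choice the coordinator has to make.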

Short TTLs require clock sync. JWT and credential expiry validation compare timestamps. If the coordinator, executor and credential store clocks drift apart by more than the leeway the validators allow, valid tokens will be rejected (or expired ones accepted). NTP sync is not optional in this architecture.
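This minimal sketch shows the time checks, assuming a JWT library has already verified the RS256 signature and decoded the claims into a dict. `exp` and `iat` are the registered JWT claim names from RFC 7519. The article only says the token carries an expiry, so the `iat` check is an added assumption. The 5-second leeway is an illustrative value, not from the source. `claims_fresh` is a hypothetical name:

```python
from collections.abc import Mapping

MAX_SKEW_SECONDS = 5  # illustrative; keep it small relative to the 5-minute token TTL


def claims_fresh(claims: Mapping, now: float, leeway: float = MAX_SKEW_SECONDS) -> bool:
    """Time checks on claims whose RS256 signature has already been verified."""
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        return False  # no expiry: reject rather than treat as non-expiring
    if now > exp + leeway:
        return False  # expired, even allowing for small clock drift
    iat = claims.get("iat")
    if isinstance(iat, (int, float)) and iat > now + leeway:
        return False  # "issued in the future": clocks are further apart than the leeway
    return True
```

Leeway only absorbs small drift. It does not replace NTP.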

One-time resolution has no retry. If the executor resolves a credential and then crashes before using it, the credential is gone for that request. A new request must be issued. Design your job retry logic around this.
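This minimal retry sketch works within that constraint. Every attempt asks the coordinator for a fresh dispatch (new `request_id`, nonce and request token) instead of re-running the old payload. `run_with_retries`, `dispatch` and `execute` are hypothetical. The `attempt` number passed to `dispatch` could feed something like the `sess_abc123_attempt1` session ID in the example payload:

```python
from typing import Callable, Optional


def run_with_retries(
    dispatch: Callable[[int], dict],
    execute: Callable[[dict], object],
    max_attempts: int = 3,
) -> object:
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        # A brand-new dispatch per attempt, so its credentials can be resolved once more
        payload = dispatch(attempt)
        try:
            return execute(payload)
        except Exception as exc:  # the job failed; never re-run the same payload
            last_error = exc
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```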


// verdict

The pattern works because each layer answers a different question:

  • HMAC : was this payload built by someone who knows our shared secret?
  • JWT : was this job legitimately dispatched by the coordinator, recently?
  • Nonce burn : is this the first time this exact payload has been seen?
  • Client ID check : is this payload meant for me specifically?
  • Credential store auth : has this job earned the right to resolve this credential?
  • Short TTL : even if everything above fails, how long does the attacker have?
  • Network isolation : even if the executor is compromised, where can data go?

No single layer is sufficient. That's the point. An attacker who breaks one still faces six more independent checks. The credential never travels in the dispatch payload. It's issued, used, and expires in the time it takes the job to run.

The gap is operational. The shared secret is a long-lived symmetric key. That's the part that requires the most care in deployment, not the pattern itself.

// knowledge should be free // code should be open // tools should serve people, not platforms

// end of transmission //