// the_problem
Every distributed job system eventually hits the same wall.
You have an executor (a third-party service, a community plugin, a sandboxed process) and it needs credentials to do its job. A GitHub token. A database password. An API key. Without it, the executor can't function. But the moment you hand it over, you've lost control of it.
Most systems solve this by not solving it. They inject the secret into the environment and trust the execution context. CI runners get secrets as env vars. Plugin systems pass API keys directly to plugin code. LLM agent frameworks hand credentials straight to the agent and move on.
This works until it doesn't. A compromised executor, a log line with a secret in it, a misconfigured container: any one of these and the credential is out. Worse, you often don't know how long it has been exposed or how many times it has been used.
The pattern described here solves a specific problem:
How do you give an executor the ability to use a secret without ever sending it the secret?
// background
The pattern applies to any system where:
- A central coordinator dispatches jobs to executors it doesn't fully control
- Those executors need short-lived access to credentials to complete their job
- The coordinator cannot trust the executor's environment unconditionally
This covers: AI agent platforms, plugin architectures, distributed CI systems, multi-tenant job runners, and any third-party execution model.
The two things that make this hard:
- The executor needs the real credential at runtime; you can't fake it
- You don't want the credential to outlive the job or be reusable if stolen
Standard approaches fail on point 2. This pattern solves both.
// how_it_actually_works
The core idea: separate the credential pointer from the credential itself, and require proof of identity before releasing the value.
The coordinator sends the executor a signed request payload containing a reference string, not the secret. Something like:
"github_token": "cref://tenant_abc/github_token"
That string is useless on its own. To get the real value, the executor must call a credential store with:
- A one-time request token proving this job was legitimately issued
- Its own registered client identity
- The tenant scope it is authorised for
The credential store validates all three independently, marks the resolution as used, and returns a short-lived secret valid only for the expected duration of the job.
The executor uses the secret. The secret expires. Even if intercepted mid-flight, it's dead.
The communication path:
Coordinator
│
│ Builds + signs request payload
│ (contains credential references, NOT credentials)
▼
[mTLS encrypted channel]
│
▼
Executor receives payload
│
│ Runs validation chain (8 steps)
▼
Credential Store
│
│ Validates request token + client ID + tenant scope
│ Marks resolution as used (one-time)
│ Returns short-lived credential
▼
Executor runs job
│
▼
Returns signed response payload
// the_request_payload
Every job dispatch is a signed payload. This is the complete structure:
{
"payload_version": "1.0",
"request_id": "req_8f3kd92",
"session_id": "sess_abc123_attempt1",
"client_id": "data-sync-service-v1",
"mode": "async",
"scope_token": "scope::tenant_abc::sess_456",
"webhook_url": "https://platform.example/webhooks/req_8f3kd92",
"timeout_ms": 120000,
"inputs": { "task": "sync_records", "source": "external_api" },
"credential_refs": {
"github_token": "cref://tenant_abc/github_token",
"db_password": "cref://tenant_abc/db_password"
},
"request_token": "eyJhbGc...",
"nonce": "a7f3k9x2",
"signature": "hmac_sha256_of_full_payload"
}
Key fields:
| Field | Purpose |
|---|---|
| request_id | Unique per dispatch. Ties nonce, audit log, and replay detection together. |
| session_id | Groups all executors in one logical session. Shared context namespace. |
| client_id | Registered identity of the intended recipient. Verified against the executor itself. |
| mode | sync or async. Tells the receiver which execution path to follow. |
| scope_token | Access boundary. The executor can only read/write within its tenant + session scope. |
| credential_refs | Pointer strings only. Never real values. |
| request_token | RS256 JWT. One-time. 5-minute TTL. Signed by the coordinator's private key. |
| nonce | Random. Burned on first use. Replay attacks blocked unconditionally. |
| signature | HMAC-SHA256 of the full payload. First check; everything else is conditional on this passing. |
The credential_refs field is the core of the model. What travels in the payload is a string, a pointer. The real credential never leaves the credential store until the executor earns it through the validation chain.
// the_8_step_validation_chain
Every incoming payload goes through an eight-step sequence, in order. No step is optional. No shortcut exists. credential_refs are never resolved until Step 7.
On any failure at any step: hard reject, no credentials resolved, alert logged. The chain does not continue.
// the_6_layers
Strip away the job dispatch context and what you have is six independent security layers stacked on top of each other. Bypassing one does not help with the others.
LAYER 1 : HMAC SIGNATURE
Every payload signed with a per-executor shared secret
Cannot forge without knowing the key
First check; everything else is conditional on this passing
LAYER 2 : REQUEST TOKEN (RS256 JWT, one-time)
Issued per job by coordinator
5 minute TTL
RS256: executors can verify but cannot create tokens
Contains: request_id, client_id, tenant_id, expiry
LAYER 3 : NONCE BURN
Random nonce stored in Redis before dispatch
Burned on first receipt
Replay impossible even with valid HMAC + valid JWT
LAYER 4 : CREDENTIAL STORE AUTH CHAIN
credential_refs only resolvable with:
valid request_token + matching client_id + matching tenant_id
Credential store validates independently of the executor
Credential TTL = expected job duration + 20% buffer
One-time resolution per request per credential
LAYER 5 : MUTUAL TLS (mTLS)
Both coordinator and executor present CA-signed certificates
Endpoint URL alone is useless without a valid cert
Man-in-the-middle impossible
LAYER 6 : NETWORK ISOLATION
Executor has egress firewall
Only declared domains reachable
Compromised executor cannot exfiltrate to arbitrary endpoints
Each layer is independent. A stolen request token doesn't bypass the nonce burn. A valid nonce doesn't help without the HMAC. A valid HMAC without the request token doesn't unlock the credential store.
// the_shared_secret_lifecycle
The HMAC layer depends on a shared secret: a unique cryptographic key between the coordinator and one specific executor. Every executor gets its own. No two executors share the same secret.
data-sync-service-v1 → secret: "sk_a7f3k9x2b8c4d1e6..."
image-processor-v1 → secret: "sk_m2n5p8q1r4s7t0u3..."
report-generator-v1 → secret: "sk_z9y6x3w0v7u4t1s8..."
Registration:
Executor registers with coordinator
↓
Coordinator generates unique shared secret:
secret = crypto.randomBytes(32).toString('hex')
↓
Coordinator stores:
Credential store → raw secret (coordinator uses to sign)
Internal DB → bcrypt hash only (audit/rotation)
↓
Returns to registrant ONE TIME ONLY:
{
"client_id": "data-sync-service-v1",
"shared_secret": "a7f3k9x2b8c4d1e6...",
"note": "Store this securely. Not shown again."
}
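The registration step above can be sketched in Python. This is a minimal illustration using the stdlib: `secrets.token_hex` mirrors `crypto.randomBytes(32).toString('hex')`, and `pbkdf2_hmac` stands in for the bcrypt hash the system actually stores (the function and field names here are illustrative, not part of the system):

```python
import secrets
import hashlib

def register_executor(client_id: str) -> dict:
    # Generate a unique 32-byte shared secret, hex-encoded (64 chars).
    shared_secret = secrets.token_hex(32)

    # The internal DB stores only a slow hash (the system uses bcrypt;
    # pbkdf2_hmac is a stdlib stand-in for this sketch).
    salt = secrets.token_bytes(16)
    internal_hash = hashlib.pbkdf2_hmac(
        "sha256", shared_secret.encode(), salt, 100_000
    ).hex()

    # Raw secret goes to the credential store; the hash to the internal DB.
    # The response below is returned to the registrant exactly once.
    return {
        "client_id": client_id,
        "shared_secret": shared_secret,
        "internal_hash": internal_hash,
        "note": "Store this securely. Not shown again.",
    }
```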
Runtime coordinator signs every payload:
1. Fetch raw secret from credential store
2. Build payload
3. Compute canonical JSON
(keys sorted alphabetically, no whitespace,
signature field excluded)
4. Compute HMAC:
signature = HMAC-SHA256(raw_secret, canonical_json)
5. Add signature to payload
6. Dispatch
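Steps 2–5 of the signing flow fit in a few lines. A minimal sketch (the function name is illustrative; the canonical form matches the executor-side verification shown below):

```python
import hmac
import json
from hashlib import sha256

def sign_payload(payload: dict, shared_secret: bytes) -> dict:
    # Canonical form: keys sorted alphabetically, no whitespace,
    # signature field excluded.
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    # HMAC-SHA256 over the canonical bytes, attached to the payload.
    payload["signature"] = hmac.new(
        shared_secret, canonical.encode(), sha256
    ).hexdigest()
    return payload
```

Because keys are sorted before hashing, the signature is identical regardless of the order the payload dict was built in.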
Canonical JSON matters. Both sides must hash the exact same bytes. Sorting keys alphabetically ensures the coordinator and executor always compute the same hash regardless of serialisation order.
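A two-line demonstration of the invariance: two dicts built in different key orders serialise to the same canonical bytes.

```python
import json

a = json.dumps({"b": 2, "a": 1}, sort_keys=True, separators=(",", ":"))
b = json.dumps({"a": 1, "b": 2}, sort_keys=True, separators=(",", ":"))
# Both produce the same bytes, so both sides compute the same HMAC.
assert a == b == '{"a":1,"b":2}'
```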
Executor verifies:
import hmac
import json
from hashlib import sha256

received_sig = payload["signature"]
payload_copy = {k: v for k, v in payload.items() if k != "signature"}
canonical = json.dumps(payload_copy, sort_keys=True, separators=(',', ':'))
# shared_secret is the raw secret as bytes
expected_sig = hmac.new(shared_secret, canonical.encode(), sha256).hexdigest()
if not hmac.compare_digest(received_sig, expected_sig):
    raise HardReject("HMAC mismatch")
Secret rotation:
Registrant triggers rotation
↓
Coordinator:
1. Generates new secret
2. Updates credential store + internal hash
3. Returns new secret ONE TIME
4. Sets old secret grace TTL = 5 minutes
(allows in-flight jobs to complete)
5. After 5 minutes → old secret invalid
↓
Registrant:
Updates shared secret in executor environment
Redeploys
New secret active
| Location | What is stored | Why |
|---|---|---|
| Executor environment | Raw secret | To verify incoming payloads |
| Credential store | Raw secret | Coordinator uses to sign payloads |
| Internal DB | Bcrypt hash only | Audit and rotation; never raw |
| Payloads | Never | Secrets never travel in any form |
| Logs | Never | Never logged anywhere, ever |
// how_credential_resolution_actually_works
Step-by-step: how "cref://tenant_abc/github_token" becomes a real credential.
Step 1 : Executor receives payload
"credential_refs": {
"github_token": "cref://tenant_abc/github_token"
},
"request_token": "eyJhbGc..."
Just strings. Steps 1–6 run first. Nothing is resolved yet.
Step 2 : Validation passes
Steps 1–6 cleared. Now credential resolution runs.
Step 3 : Client library calls credential store
GET http://localhost:8200/v1/secret/tenant_abc/github_token
Headers:
X-Request-Token: eyJhbGc...
X-Client-ID: data-sync-service-v1
X-Tenant-ID: tenant_abc
Step 4 : Credential store validates independently
Check 1: Is request token signature valid?
Credential store holds coordinator public key
Verifies RS256 independently
FAIL → 403
Check 2: Is request token expired?
FAIL → 403
Check 3: Is this client authorised for this tenant?
FAIL → 403
Check 4: Has this request already resolved this credential?
Redis: GET cref_resolved:req_8f3kd92:github_token
Already exists → one-time use violated
FAIL → 403
Check 5: Does the credential exist?
FAIL → 404
ALL PASS → release credential
Step 5 : Mark resolution as used
Redis: SET cref_resolved:req_8f3kd92:github_token 1 EX 300
This request can never resolve this credential again.
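Check 4 and Step 5 together form a check-and-burn. One way to make them atomic is a single `SET ... NX EX` rather than a separate GET and SET, so two concurrent requests can never both win. A sketch with an in-memory stand-in for Redis (the class and key format are illustrative; only the `cref_resolved:` key shape comes from the steps above):

```python
import time

class MiniRedis:
    # In-memory stand-in for Redis, just for this sketch.
    def __init__(self):
        self._store = {}

    def set_nx_ex(self, key: str, value: str, ttl: int) -> bool:
        # Emulates SET key value NX EX ttl: only the first caller
        # within the TTL window succeeds.
        now = time.time()
        existing = self._store.get(key)
        if existing and existing[1] > now:
            return False
        self._store[key] = (value, now + ttl)
        return True

def try_resolve(store: MiniRedis, request_id: str, cred_name: str) -> bool:
    # Check-and-burn in one step: the first resolution wins, every
    # later attempt for the same request + credential is rejected.
    key = f"cref_resolved:{request_id}:{cred_name}"
    return store.set_nx_ex(key, "1", 300)
```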
Step 6 : Return short-lived credential
{
"secret": "ghp_actualRealToken123abc",
"ttl": 54,
"expires_at": 1720000354
}
TTL = expected job duration + 20% buffer. A job expected to run 45 seconds gets a credential valid for 54 seconds. After that it's dead, even if intercepted.
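The TTL arithmetic is a one-liner. Integer math avoids float rounding at the boundary (function name illustrative):

```python
def credential_ttl(expected_duration_s: int) -> int:
    # Expected job duration + 20% buffer, in whole seconds.
    return expected_duration_s + expected_duration_s // 5

credential_ttl(45)  # → 54
```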
Step 7 : Injected into handler
def handle(inputs, credentials):
token = credentials.get("github_token")
# → "ghp_actualRealToken123abc"
# Executor has no knowledge of the resolution chain
The executor gets the real value. Has no visibility into request tokens, nonces, or how the credential store works internally.
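The credential store's side of Step 4 through Step 6 can be sketched as one function. This assumes the RS256 signature (Check 1) has already been verified by a JWT library and the claims extracted; the parameter names and the in-memory stores are illustrative, not the store's real API:

```python
import time

def resolve_credential(
    token_claims: dict,   # claims from an already-verified request token
    client_id: str,
    tenant_id: str,
    cred_name: str,
    authorised: dict,     # tenant_id -> set of authorised client_ids
    vault: dict,          # (tenant_id, cred_name) -> secret value
    resolved: set,        # burned (request_id, cred_name) pairs
):
    # Check 2: is the request token expired?
    if token_claims["exp"] < time.time():
        return 403, None
    # Check 3: is this client authorised for this tenant?
    if client_id not in authorised.get(tenant_id, set()):
        return 403, None
    # Check 4: has this request already resolved this credential?
    key = (token_claims["request_id"], cred_name)
    if key in resolved:
        return 403, None
    # Check 5: does the credential exist?
    if (tenant_id, cred_name) not in vault:
        return 404, None
    # All pass: burn the resolution (Step 5) and release (Step 6).
    resolved.add(key)
    return 200, vault[(tenant_id, cred_name)]
```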
// attack_scenarios
Attack 1 : Direct call with no token
Attacker knows executor endpoint URL
Crafts a fake payload
Sends it directly
BLOCKED AT: Layer 1 (HMAC)
No shared secret → cannot compute valid HMAC
Executor rejects at Step 1
credential_refs never touched
Attack 2 : Replay attack
Attacker captures a valid payload from the network
Resends it 1 second later
BLOCKED AT: Layer 3 (Nonce Burn)
Nonce burned when original was processed
Redis returns nil on second lookup
Rejected at Step 3
Valid HMAC, valid JWT: doesn't matter
Attack 3 : Stolen credential reference string
Attacker obtains: "cref://tenant_abc/github_token"
Calls credential store directly
BLOCKED AT: Layer 4 (Credential Store Auth)
Where is your request token?
Attacker has none
403 Forbidden
Attack 4 : Stolen request token + credential reference
Attacker captures both from network
Calls credential store with valid request token
BLOCKED AT: Layer 4 (One-Time Resolution)
Has req_8f3kd92 already resolved github_token?
Redis: YES
403 Forbidden
Credential not released again
Attack 5 : Expired token replay
Attacker saves a valid payload
Waits 10 minutes
Tries to replay
BLOCKED AT: Layer 2 (JWT TTL)
expires_at < now
Rejected at Step 2
Nonce check never reached
Attack 6 : Misdirected payload
Attacker captures payload intended for data-sync-service-v1
Sends it to report-generator-v1
BLOCKED AT: Step 5 (Client ID Check)
Payload client_id: "data-sync-service-v1"
This executor's ID: "report-generator-v1"
Mismatch → hard reject
Attack 7 : Compromised executor exfiltration
Executor environment is compromised
Attacker tries to send resolved credentials to external server
BLOCKED AT: Layer 6 (Network Isolation)
Egress firewall active
Undeclared domain → connection blocked
Data cannot leave to attacker's endpoint
// gotchas
The shared secret is the weakest point. It's a long-lived symmetric key sitting in an environment variable. If it leaks (through a log line, a docker inspect, or a misconfigured secret manager) an attacker can forge payloads. Rotation exists, but rotation is reactive. Treat the shared secret with the same care as a root certificate.
Canonical JSON is not optional. Both sides must hash identical bytes. Any difference in key ordering or whitespace produces a different HMAC and breaks validation silently. Pick a canonical form, document it, enforce it in both the coordinator and every client library.
Redis is a dependency of the nonce system. If Redis is unavailable, nonce validation cannot run. The safe failure mode is to reject all payloads, not skip the check. A degraded nonce system that falls through to "accept" turns a temporary outage into a permanent replay window.
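The fail-closed shape is small but easy to get backwards. A sketch, with a toy stand-in for the Redis nonce store (class and method names are illustrative):

```python
class NonceStore:
    # Minimal stand-in for the Redis nonce store.
    def __init__(self, available: bool = True):
        self._nonces = set()
        self.available = available

    def add(self, nonce: str):
        if not self.available:
            raise ConnectionError("nonce store unreachable")
        self._nonces.add(nonce)

    def burn(self, nonce: str) -> bool:
        # True iff the nonce existed and is now burned.
        if not self.available:
            raise ConnectionError("nonce store unreachable")
        if nonce in self._nonces:
            self._nonces.discard(nonce)
            return True
        return False

def validate_nonce(store: NonceStore, nonce: str) -> bool:
    try:
        return store.burn(nonce)
    except ConnectionError:
        # Fail closed: if the store is down, reject every payload
        # rather than skipping the check and opening a replay window.
        return False
```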
Short TTLs require clock sync. JWT expiry validation compares timestamps. If coordinator and executor clocks are skewed by more than the TTL buffer, valid tokens will be rejected. NTP sync is not optional in this architecture.
One-time resolution has no retry. If the executor resolves a credential and then crashes before using it, the credential is gone for that request. A new request must be issued. Design your job retry logic around this.
// verdict
The pattern works because each layer answers a different question:
- HMAC : was this payload built by someone who knows our shared secret?
- JWT : was this job legitimately dispatched by the coordinator, recently?
- Nonce burn : is this the first time this exact payload has been seen?
- Client ID check : is this payload meant for me specifically?
- Credential store auth : has this job earned the right to resolve this credential?
- Short TTL : even if everything above fails, how long does the attacker have?
- Network isolation : even if the executor is compromised, where can data go?
No single layer is sufficient. That's the point. An attacker who breaks one still faces six more independent checks. The credential never travels. It's issued, used, and expires in the time it takes the job to run.
The gap is operational. The shared secret is a long-lived symmetric key. That's the part that requires the most care in deployment, not the pattern itself.
// knowledge should be free // code should be open // tools should serve people, not platforms
// end of transmission //