Choosing among five options through the agent-access-architecture lens
A grounded walkthrough of Vercel AI SDK, Claude Agent SDK, Gas City, Cloudflare Agents, and the new OpenAI Agents SDK sandbox path — updated to reflect the repo’s current state: five concrete implementations, four agent-runtime paths, and one new Python sandbox spike.
Thesis
Once you factor in the full access chain (agent → capability gateway → domain service → real system), the winner depends on whether you are choosing a reasoning framework or a runtime foundation.
Workflow requirements and architecture constraints
Workflow requirements from the repo
- Artifact-driven pipeline: qualification report, PM brief, site preview, QA verdict, outreach packet, learning event.
- Two-stage qualification: deterministic prescreening plus LLM judgment.
- Specialized stage ownership with explicit build / QA separation.
- Bounded revision loop with machine-readable failure reasons.
- Ready-for-reachout gate after QA and required contact data.
- Durable state, retries, and resumability.
- Concrete repo proof now exists for all five options, not just theoretical fit.
Security architecture constraints
- Agents should request capabilities, not hold raw secrets.
- Reasoning must be separated from privileged execution.
- All side effects should be explicit, validated, logged, and often approval-gated.
- The policy-enforced capability gateway is the trust boundary — not MCP, not the model SDK itself.
- Use domain-scoped internal services for DB, payments, messaging, deployments, and admin actions.
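What "agents request capabilities, not secrets" looks like in practice can be sketched in a few lines. All names below are illustrative, not from the repo: agents ask the gateway for a named capability, the gateway checks a policy table and logs every attempt, and only domain services behind the gateway ever hold raw credentials.

```typescript
// Minimal capability-gateway sketch (hypothetical capability and agent names):
// the gateway is the trust boundary — it decides, logs, and flags approval gates.

type Capability = "read_lead" | "deploy_preview" | "send_outreach_email";

interface CapabilityRequest {
  agentId: string;
  capability: Capability;
  args: Record<string, string>;
}

interface Decision {
  allowed: boolean;
  reason: string;
  requiresApproval: boolean;
}

// Policy table: which agent may request which capability, and whether a
// human approval gate sits in front of the side effect.
const policy: Record<
  Capability,
  { allowedAgents: string[]; requiresApproval: boolean }
> = {
  read_lead: { allowedAgents: ["qualifier", "brief_writer"], requiresApproval: false },
  deploy_preview: { allowedAgents: ["website_builder"], requiresApproval: false },
  send_outreach_email: { allowedAgents: ["outreach_operator"], requiresApproval: true },
};

const auditLog: CapabilityRequest[] = [];

function evaluate(req: CapabilityRequest): Decision {
  auditLog.push(req); // every attempt is logged, allowed or denied
  const rule = policy[req.capability];
  if (!rule.allowedAgents.includes(req.agentId)) {
    return { allowed: false, reason: "agent_not_permitted", requiresApproval: false };
  }
  return { allowed: true, reason: "policy_match", requiresApproval: rule.requiresApproval };
}

const ok = evaluate({ agentId: "outreach_operator", capability: "send_outreach_email", args: {} });
const denied = evaluate({ agentId: "qualifier", capability: "send_outreach_email", args: {} });
```

Note that no credential appears anywhere in this layer; the gateway returns a decision, and the domain service that actually sends email owns the SMTP or API secret.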
What the architecture lens rewards and penalizes
Penalized
- SDKs that only help with prompting and typed outputs.
- Stacks where approvals, audit, persistence, and identity must all be bolted on later.
Rewarded
- Durable per-entity runtime state.
- Resumable orchestration with explicit gates.
- A clean split between reasoning and execution.
Result
We must distinguish between a great SDK for agent reasoning and a great platform for securely running agents.
Vercel AI SDK — strongest app-layer artifact generator
Example in this repo
src/vercel-ai-sdk/workflow.ts uses real generateObject(...) calls for qualification, PM brief creation, site generation, revision, and outreach packet creation.
It is runnable once ANTHROPIC_API_KEY is set, and it writes a real structured workflow result plus generated HTML artifacts to disk.
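The value of schema-first stage contracts is easiest to see in miniature. The sketch below uses a hand-rolled validator as a stand-in for the SDK's generateObject-plus-schema enforcement, with illustrative field names rather than the repo's actual qualification schema: a conforming model output parses into a typed artifact, and a malformed one fails fast at the stage boundary.

```typescript
// Schema-first stage contract sketch (field names are illustrative; a manual
// validator stands in for the SDK's schema enforcement).

interface QualificationReport {
  qualified: boolean;
  score: number; // 0-100
  disqualifiers: string[];
}

function parseQualificationReport(raw: unknown): QualificationReport {
  const obj = raw as Record<string, unknown>;
  if (
    typeof obj?.qualified !== "boolean" ||
    typeof obj?.score !== "number" ||
    obj.score < 0 ||
    obj.score > 100 ||
    !Array.isArray(obj?.disqualifiers)
  ) {
    throw new Error("model output violated the stage contract");
  }
  return obj as unknown as QualificationReport;
}

// A response that satisfies the contract parses cleanly...
const report = parseQualificationReport({ qualified: true, score: 82, disqualifiers: [] });

// ...while a malformed one is rejected instead of flowing downstream.
let rejected = false;
try {
  parseQualificationReport({ qualified: "yes", score: 82, disqualifiers: [] });
} catch {
  rejected = true;
}
```

This is the SDK's core strength: every stage handoff is a typed artifact, so downstream stages never negotiate with free-form model prose.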
Pros
- Excellent typed outputs and schema-first stage contracts.
- Low complexity and easy integration into product code.
- Good for deterministic app orchestration around model calls.
Cons through the architecture lens
- No built-in durable workflow runtime.
- No real concept of per-agent identity or approval gates.
- Not a trust boundary; you still need gateway, services, persistence, audit, and sandboxing.
Best fit
Best as a component, not as the system foundation.
Use it when you already have strong backend security and workflow infrastructure, and you mainly need typed generation inside that safer system.
Claude Agent SDK — best match for orchestrator + specialists
Example in this repo
src/claude-agent-sdk/workflow.ts defines specialist agents like qualifier, brief_writer, website_builder, and outreach_operator.
It now runs as a real end-to-end example, producing structured JSON output and generated HTML artifacts whenever Claude authentication is available in the shell.
Pros
- Natural subagent model for stage specialization.
- Strong fit for tool-mediated autonomous work.
- Keeps the workflow aligned with the real agent-runtime problem.
Architecture caveat
- Still not the trust boundary.
- Can become dangerous if agents get direct shell, repo, browser, or secret-heavy access without gateway discipline.
- Needs external capability gateway + domain services to stay secure.
Best fit
Best reasoning layer if you explicitly enforce:
- narrow artifact context,
- tool calls only to capability surfaces,
- no raw credentials,
- restricted sandboxes and workspaces.
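The "narrow artifact context" discipline can be enforced mechanically. In the sketch below, the specialist names follow the repo's workflow.ts, but the context-slicing helper and lead-record fields are hypothetical: each specialist receives only the artifact fields its stage needs, never the whole lead record.

```typescript
// Narrow artifact context sketch (specialist names from the repo's workflow.ts;
// the slicing helper and field names are illustrative).

interface LeadRecord {
  company: string;
  contactEmail: string;
  internalNotes: string; // must never reach the website_builder
  qualification: { qualified: boolean; score: number };
  brief: { headline: string; sections: string[] };
}

// Per-specialist whitelist of artifact fields.
const contextSlices = {
  qualifier: ["company"],
  brief_writer: ["company", "qualification"],
  website_builder: ["brief"],
  outreach_operator: ["company", "contactEmail", "brief"],
} as const;

type Specialist = keyof typeof contextSlices;

function contextFor(specialist: Specialist, lead: LeadRecord): Partial<LeadRecord> {
  const slice: Partial<LeadRecord> = {};
  for (const field of contextSlices[specialist]) {
    (slice as Record<string, unknown>)[field] = lead[field as keyof LeadRecord];
  }
  return slice;
}

const lead: LeadRecord = {
  company: "Relay North",
  contactEmail: "ops@example.com",
  internalNotes: "pricing sensitivity: high",
  qualification: { qualified: true, score: 82 },
  brief: { headline: "New site", sections: ["hero"] },
};

const builderCtx = contextFor("website_builder", lead);
// builderCtx carries only `brief`; internalNotes and contactEmail are absent
```

Slicing context in the orchestrator, rather than trusting each specialist's prompt to ignore extra data, is what keeps the reasoning layer safe to run autonomously.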
Gas City — orchestration substrate with platform-style control
Example in this repo
The gascity/ implementation uses a city config, formulas, prompts, and stage workers to instantiate the same Relay North pipeline with durable artifact handoffs.
The current version is actually runnable end-to-end, auto-installs missing local tooling, routes steps to Claude-backed specialist workers, and writes a workflow-summary.json proof artifact.
Pros
- Explicit orchestration separate from app code.
- Clear routing, stage ownership, and workflow externalization.
- Closer in spirit to a true control plane than app-layer SDKs.
Cons
- Highest operational and ecosystem complexity here.
- Feels more like adopting an orchestration operating system than choosing a library.
- Still requires deliberate capability-gateway and service-boundary design for security.
Best fit
Best when orchestration itself is the product challenge.
Good if you expect many workflow variants and want a real substrate, but probably heavier than needed at the current product stage.
Cloudflare Agents — best integrated runtime candidate
Example in this repo
The Cloudflare spike maps one lead to one durable Agent and uses Workflows for the bounded execution path.
It includes status, approval, rejection, and artifact export control-plane routes — plus a deterministic proof that revision 1 fails QA, revision 2 passes, and invalid control-plane actions return 409.
Pros
- Durable per-lead state is a first-class primitive.
- Resumable orchestration and approval holds feel native.
- Natural home for workflow identity, timeline events, and operator actions.
- The local walkthrough already proves load-sample → start → awaiting_approval → approve/reject → artifacts.
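The control-plane semantics are essentially a small state machine. The toy version below mirrors the walkthrough's statuses, but the transition table itself is a sketch, not the spike's code: valid actions advance the workflow, and invalid ones return a 409-style conflict instead of corrupting state.

```typescript
// Toy control-plane state machine (statuses mirror the walkthrough; the
// transition table is illustrative, not the Cloudflare spike's actual code).

type Status = "created" | "running" | "awaiting_approval" | "approved" | "rejected";
type Action = "start" | "reach_approval_gate" | "approve" | "reject";

const transitions: Record<Status, Partial<Record<Action, Status>>> = {
  created: { start: "running" },
  running: { reach_approval_gate: "awaiting_approval" },
  awaiting_approval: { approve: "approved", reject: "rejected" },
  approved: {},
  rejected: {},
};

function apply(status: Status, action: Action): { code: number; status: Status } {
  const next = transitions[status][action];
  if (!next) return { code: 409, status }; // conflict: action invalid in this state
  return { code: 200, status: next };
}

let s: Status = "created";
s = apply(s, "start").status;               // running
s = apply(s, "reach_approval_gate").status; // awaiting_approval
const conflict = apply(s, "start");         // 409: cannot restart while gated
const approved = apply(s, "approve");       // operator approval advances the workflow
```

What the durable-Agent model buys you is that this status lives with the lead entity itself, so the 409 check holds even across restarts and concurrent operator actions.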
Cons / cautions
- The spike currently proves runtime ergonomics, not full production auth and service boundaries.
- Adopting it is a platform decision, not just an SDK choice.
- Introduces stronger vendor/runtime coupling than Claude or Vercel.
Best fit
Best integrated runtime foundation if you want the secure architecture to be reflected in the runtime itself.
Especially compelling when one workflow instance naturally maps to one durable runtime entity.
OpenAI Agents SDK — best Python sandbox workstation candidate
Example in this repo
The new spike keeps orchestration in app code and uses sandbox agents as stage-specific workers over one shared workspace.
The implementation stops at the approval gate, serializes sandbox session state, and can resume later for outreach — which is exactly the shape the new sandbox release was meant to enable.
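The pause/resume shape is simple to illustrate. The serialization format below is hypothetical (the real spike serializes the SDK's sandbox session state, and this sketch is in TypeScript rather than the spike's Python for consistency with the other examples): the host stops at the approval gate, persists state as JSON, and rehydrates it later for the outreach stage.

```typescript
// Pause/resume sketch (hypothetical serialization shape; the real spike
// serializes the sandbox SDK's session state, not this hand-rolled struct).

interface SessionState {
  leadId: string;
  stage: "qualify" | "build" | "awaiting_approval" | "outreach";
  workspaceFiles: string[];
}

function pauseAtApprovalGate(state: SessionState): string {
  // In a real workflow this JSON would be written to durable storage.
  return JSON.stringify({ ...state, stage: "awaiting_approval" });
}

function resumeForOutreach(serialized: string): SessionState {
  const state = JSON.parse(serialized) as SessionState;
  if (state.stage !== "awaiting_approval") {
    throw new Error("can only resume from the approval gate");
  }
  return { ...state, stage: "outreach" };
}

const paused = pauseAtApprovalGate({
  leadId: "lead-42",
  stage: "build",
  workspaceFiles: ["index.html", "notes.md"],
});
const resumed = resumeForOutreach(paused);
```

The point of the pattern is that the host, not the agent, owns the clock: the workflow can sit at the gate for hours or days and come back with its workspace intact.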
Pros
- Filesystem + shell + memory make artifact-first workflows feel natural.
- Host-owned orchestration matches the SDK documentation's guidance for stateful, app-controlled agent systems.
- Serialized sandbox session state gives a real pause / resume story.
- More portable than Cloudflare if you want sandboxed work without picking one edge runtime.
Cons / cautions
- Python-first: adopting it here means accepting a polyglot runtime next to the TypeScript comparisons.
- You still own the approval/control plane and secure service boundary design.
- Less opinionated integrated runtime than Cloudflare Agents.
Best fit
Best Python sandbox workstation layer when agents need files, commands, packages, snapshots, and resumable host-controlled state.
Especially compelling for workstation-style tasks, code/workspace operations, and approval-gated long-running jobs.
How the five options compare against architecture criteria
| Option | Reasoning fit | Durability / resume | Approval / control plane | Security boundary support | Complexity |
|---|---|---|---|---|---|
| Vercel AI SDK | High for typed artifact generation | Low | Low | Low without substantial custom infra | Low |
| Claude Agent SDK | Very high for orchestrator + specialists | Medium | Medium | Medium if paired with gateway/services | Medium |
| Gas City | Medium | High | High | Medium but design still required | High |
| Cloudflare Agents | High | Very high | Very high | High as an integrated runtime shape | Medium |
| OpenAI Agents SDK | High for sandboxed specialist work | High | High with host-owned approval flow | High when paired with a capability gateway and narrow sandboxes | Medium with a Python/runtime split |
Different priorities produce different winners
Best reasoning/orchestration SDK
Claude Agent SDK, then OpenAI Agents SDK, Cloudflare Agents, Vercel AI SDK, Gas City.
Best secure runtime foundation
Cloudflare Agents, then OpenAI Agents SDK + capability gateway, Claude Agent SDK + custom gateway/services, Gas City, Vercel AI SDK.
Lowest complexity to ship
Vercel AI SDK, then Claude Agent SDK, Cloudflare Agents, OpenAI Agents SDK, Gas City.
Interpretation
Cloudflare wins when you ask, “What should host this securely?”
Claude wins when you ask, “What should do the reasoning work?”
OpenAI wins when you ask, “What should inspect files, run commands, and resume a sandboxed workspace in Python?”
Vercel wins when you ask, “What gets us a clean typed app integration fastest?”
Gas City wins when you ask, “Do we want an orchestration substrate as a strategic bet?”
What I recommend for Relay North right now
Default recommendation
Claude Agent SDK remains the best default and the primary foundation recommendation if the goal is to move fast without prematurely locking the whole runtime to one platform.
- Best fit for orchestrator + specialists.
- Best fit for narrow artifact handoffs.
- Best fit for build / QA / rebuild loops and multi-step agentic work.
- Still the repo’s strongest overall choice if forced to pick one SDK today.
- The new OpenAI sandbox path is promising, but strongest when the team is happy to operate a Python-first workstation layer.
Important condition
This only remains true if the team enforces the agent-access architecture:
- agents request capabilities,
- gateway enforces policy,
- domain services own secrets and side effects,
- sandboxes stay narrow.
Alternative strategic recommendation
If the team wants the runtime itself to embody durable state, approvals, workflow identity, and control-plane semantics, then Cloudflare Agents is still the most interesting next bet. If the team instead wants a portable sandboxed workstation layer with host-owned orchestration, the new OpenAI Agents SDK is now the clearest Python-first alternative.
The de-risking experiment to run next
Cloudflare path
- Add a real capability gateway layer.
- Make approval and caller identity real.
- Put a fake-but-realistic domain service behind the gateway.
- Prove agents never touch raw credentials.
OpenAI sandbox path
- Run the new Python spike with a real API key and operator approval stop/resume.
- Put the same capability gateway in front of every privileged action.
- Compare serialized sandbox-state resume against Cloudflare workflow resume.
- Measure whether the Python sandbox layer reduces glue code enough to justify a polyglot runtime.
Decision question
The next experiment should not be “another generic demo.” It should be a security-architecture prototype that compares Cloudflare’s integrated runtime against OpenAI’s sandbox-resume path while preserving the same capability-gateway trust boundary.
Evidence used from the repo
Files inspected
- requirements.md
- agent-access-architecture.md
- README.md
- src/vercel-ai-sdk/workflow.ts
- src/claude-agent-sdk/workflow.ts
- cloudflare/README.md
- openai-agents-python/README.md
- openai-agents-python/relay_north_openai_agents/workflow.py
- docs/plans/2026-04-14-cloudflare-agents-spike.md
- cloudflare/src/lead-agent.ts
- cloudflare/src/workflows/lead-workflow.ts
Key evidence points
- The repo now contains five proof-of-concept implementations with concrete runnable paths.
- The Cloudflare spike proves durable per-lead state, forced QA fail/pass, and an approval-gated workflow stop at awaiting_approval.
- The OpenAI spike proves a clean Python sandbox-agent pattern with host-owned orchestration and serialized sandbox session state.
- The Claude example most directly expresses orchestrator + specialists and remains the best single-SDK pick.
- The Vercel example is strongest on typed app-layer generation but still needs surrounding runtime infrastructure.
- Gas City is now actually runnable end-to-end, but remains the heaviest substrate-style option.