Relay North · SDK Evaluation
Strategic Review

Choosing among five options through the agent-access-architecture lens

A grounded walkthrough of Vercel AI SDK, Claude Agent SDK, Gas City, Cloudflare Agents, and the new OpenAI Agents SDK sandbox path — updated to reflect the repo’s current state: five concrete implementations, four agent-runtime paths, and one new Python sandbox spike.

5 implementation options · Repo status refreshed · Runnable examples included · Architecture-aware comparison · Security + runtime tradeoffs

Thesis

Once you factor in the full chain (agent → capability gateway → domain service → real system), the winner depends on whether you’re choosing a reasoning framework or a runtime foundation.

  • Best pure agent framework: Claude Agent SDK
  • Best integrated runtime bet: Cloudflare Agents
  • Best Python sandbox workstation: OpenAI Agents SDK
  • Best app-layer toolkit: Vercel AI SDK
  • Best orchestration substrate: Gas City
Prepared for Schlop City / Relay North exploration 1 / 13
What we are optimizing for

Workflow requirements and architecture constraints

Workflow requirements from the repo

  • Artifact-driven pipeline: qualification report, PM brief, site preview, QA verdict, outreach packet, learning event.
  • Two-stage qualification: deterministic prescreening plus LLM judgment.
  • Specialized stage ownership with explicit build / QA separation.
  • Bounded revision loop with machine-readable failure reasons.
  • Ready-for-reachout gate after QA and required contact data.
  • Durable state, retries, and resumability.
  • Concrete repo proof now exists for all five options, not just theoretical fit.

Security architecture constraints

  • Agents should request capabilities, not hold raw secrets.
  • Reasoning must be separated from privileged execution.
  • All side effects should be explicit, validated, logged, and often approval-gated.
  • The policy-enforced capability gateway is the trust boundary — not MCP, not the model SDK itself.
  • Use domain-scoped internal services for DB, payments, messaging, deployments, and admin actions.
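The capability-gateway constraint can be made concrete as a pure policy check that sits between the agent and any domain service. Every name here (Policy, authorize, the capability strings) is illustrative, not taken from agent-access-architecture.md.

```typescript
// Hypothetical capability-gateway sketch: the gateway, not the agent,
// decides whether a requested capability is allowed and whether it
// needs human approval. Agents never see raw secrets; domain services
// behind the gateway own credentials and side effects.
interface CapabilityRequest {
  agentId: string;
  capability: string; // e.g. "messaging.send", "db.read"
}

interface Policy {
  allowed: Record<string, string[]>; // agentId -> permitted capabilities
  approvalGated: Set<string>;        // capabilities requiring an operator
}

type Decision =
  | { allow: true; pendingApproval: boolean }
  | { allow: false; reason: string };

function authorize(policy: Policy, req: CapabilityRequest): Decision {
  const caps = policy.allowed[req.agentId] ?? [];
  if (!caps.includes(req.capability)) {
    return { allow: false, reason: `agent ${req.agentId} lacks ${req.capability}` };
  }
  return { allow: true, pendingApproval: policy.approvalGated.has(req.capability) };
}
```

The trust boundary is this function and the audit log around it, not the model SDK making the request.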
The architecture lens changes the ranking. 2 / 13
Decision frame

What the architecture lens rewards

Durability: Can one lead map cleanly to resumable state and workflow progress?
Control plane: Is there a natural place for approval, status, audit, and operator actions?
Isolation: Can agents stay narrow while services own secrets and execution?
Pragmatism: Can the team ship without adopting unnecessary platform complexity?

Penalized

  • SDKs that only help with prompting and typed outputs.
  • Stacks where approvals, audit, persistence, and identity must all be bolted on later.

Rewarded

  • Durable per-entity runtime state.
  • Resumable orchestration with explicit gates.
  • A clean split between reasoning and execution.

Result

We must distinguish between a great SDK for agent reasoning and a great platform for securely running agents.

Same workflow, different layers of the stack. 3 / 13
Option 1

Vercel AI SDK — strongest app-layer artifact generator

Example in this repo

src/vercel-ai-sdk/workflow.ts uses real generateObject(...) calls for qualification, PM brief creation, site generation, revision, and outreach packet creation.

const { object } = await generateObject({
  model,
  schema: siteArtifactOutput,
  prompt: `Generate the complete website artifact...`,
});

It is runnable once ANTHROPIC_API_KEY is set, and it writes a real structured workflow result plus generated HTML artifacts to disk.
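The value of the schema-first contract is that malformed model output never enters the pipeline. A minimal dependency-free sketch of the same idea follows; the field names (title, html) are illustrative stand-ins for the repo's actual siteArtifactOutput schema, which in the SDK is expressed with zod and validated by generateObject itself.

```typescript
// Hand-rolled stage-contract guard, playing the role the zod schema
// plays inside generateObject: reject output that does not satisfy
// the stage's typed contract before downstream stages consume it.
interface SiteArtifact {
  title: string;
  html: string;
}

function parseSiteArtifact(raw: unknown): SiteArtifact {
  const obj = raw as Partial<SiteArtifact> | null;
  if (!obj || typeof obj.title !== "string" || typeof obj.html !== "string") {
    throw new Error("model output failed the site-artifact stage contract");
  }
  return { title: obj.title, html: obj.html };
}
```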

Pros

  • Excellent typed outputs and schema-first stage contracts.
  • Low complexity and easy integration into product code.
  • Good for deterministic app orchestration around model calls.

Cons through the architecture lens

  • No built-in durable workflow runtime.
  • No real concept of per-agent identity or approval gates.
  • Not a trust boundary; you still need gateway, services, persistence, audit, and sandboxing.

Best fit

Best as a component, not as the system foundation.

Use it when you already have strong backend security and workflow infrastructure, and you mainly need typed generation inside that safer system.

Summary: great artifact production layer, weak autonomous runtime story. 4 / 13
Option 2

Claude Agent SDK — best match for orchestrator + specialists

Example in this repo

src/claude-agent-sdk/workflow.ts defines specialist agents like qualifier, brief_writer, website_builder, and outreach_operator.

const run = query({
  prompt,
  options: {
    agent,
    agents: specialistAgents,
    outputFormat: { type: 'json_schema', schema },
  },
});

It now runs as a real end-to-end example with structured JSON output and generated HTML artifacts when Claude authentication is available in the shell.
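The specialist map passed as `agents` above is just a record of agent definitions. The sketch below shows a plausible shape; the descriptions, prompts, and tool lists are illustrative, not copied from src/claude-agent-sdk/workflow.ts.

```typescript
// Hypothetical specialist-agent definitions for the orchestrator +
// specialists pattern. Each specialist gets a narrow description,
// its own prompt, and a deliberately small tool allowlist.
interface AgentDefinition {
  description: string;
  prompt: string;
  tools?: string[];
}

const specialistAgents: Record<string, AgentDefinition> = {
  qualifier: {
    description: "Scores a lead and writes the qualification report",
    prompt: "You qualify inbound leads against the prescreen criteria...",
    tools: ["Read", "Write"],
  },
  website_builder: {
    description: "Builds the site preview artifact from the PM brief",
    prompt: "You generate a static site preview for the qualified lead...",
    tools: ["Read", "Write"],
  },
};
```

Keeping each specialist's tool list this narrow is what makes the framework compatible with the capability-gateway discipline described earlier.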

Pros

  • Natural subagent model for stage specialization.
  • Strong fit for tool-mediated autonomous work.
  • Keeps the workflow aligned with the real agent-runtime problem.

Architecture caveat

  • Still not the trust boundary.
  • Can become dangerous if agents get direct shell, repo, browser, or secret-heavy access without a gateway discipline.
  • Needs external capability gateway + domain services to stay secure.

Best fit

Best reasoning layer if you explicitly enforce:

  • narrow artifact context,
  • tool calls only to capability surfaces,
  • no raw credentials,
  • restricted sandboxes and workspaces.
Summary: strongest agent framework, but execution must stay behind services. 5 / 13
Option 3

Gas City — orchestration substrate with platform-style control

Example in this repo

The gascity/ implementation uses a city config, formulas, prompts, and stage workers to instantiate the same Relay North pipeline with durable artifact handoffs.

The current version is actually runnable end-to-end, auto-installs missing local tooling, routes steps to Claude-backed specialist workers, and writes a workflow-summary.json proof artifact.

Pros

  • Explicit orchestration separate from app code.
  • Clear routing, stage ownership, and workflow externalization.
  • Closer in spirit to a true control plane than app-layer SDKs.

Cons

  • Highest operational and ecosystem complexity here.
  • Feels more like adopting an orchestration operating system than choosing a library.
  • Still requires deliberate capability-gateway and service-boundary design for security.

Best fit

Best when orchestration itself is the product challenge.

Good if you expect many workflow variants and want a real substrate, but probably heavier than needed at the current product stage.

Summary: architecturally serious, pragmatically heavy. 6 / 13
Option 4

Cloudflare Agents — best integrated runtime candidate

Example in this repo

The Cloudflare spike maps one lead to one durable Agent and uses Workflows for the bounded execution path.

async startWorkflow() {
  const workflowRunId = await this.runWorkflow('LEAD_WORKFLOW', { leadId });
}

async approveOutreach(reason?: string) {
  await this.approveWorkflow(this.state.workflowRunId!, { reason });
}

It includes status, approval, rejection, and artifact export control-plane routes — plus a deterministic proof that revision 1 fails QA, revision 2 passes, and invalid control-plane actions return 409.
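The 409 behavior described above amounts to a small state machine over control-plane actions. This sketch uses assumed state and action names, not the spike's exact ones, to show why invalid operator actions can be rejected deterministically.

```typescript
// Illustrative control-plane state guard: each action is legal only
// from specific workflow states; anything else gets an HTTP-style 409.
type LeadState = "idle" | "running" | "awaiting_approval" | "approved" | "rejected";

const transitions: Record<string, { from: LeadState[]; to: LeadState }> = {
  start:   { from: ["idle"],              to: "running" },
  approve: { from: ["awaiting_approval"], to: "approved" },
  reject:  { from: ["awaiting_approval"], to: "rejected" },
};

function applyAction(
  state: LeadState,
  action: string,
): { status: 200; state: LeadState } | { status: 409; error: string } {
  const t = transitions[action];
  if (!t || !t.from.includes(state)) {
    return { status: 409, error: `cannot ${action} while ${state}` };
  }
  return { status: 200, state: t.to };
}
```

Because the durable per-lead Agent owns this state, the guard runs against the single authoritative copy rather than a cache.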

Pros

  • Durable per-lead state is a first-class primitive.
  • Resumable orchestration and approval holds feel native.
  • Natural home for workflow identity, timeline events, and operator actions.
  • The local walkthrough already proves load-sample → start → awaiting_approval → approve/reject → artifacts.

Cons / cautions

  • The spike currently proves runtime ergonomics, not full production auth and service boundaries.
  • Adopting it is a platform decision, not just an SDK choice.
  • Introduces stronger vendor/runtime coupling than Claude or Vercel.

Best fit

Best integrated runtime foundation if you want the secure architecture to be reflected in the runtime itself.

Especially compelling when one workflow instance naturally maps to one durable runtime entity.

Summary: strongest runtime/control-plane story. 7 / 13
Option 5

OpenAI Agents SDK — best Python sandbox workstation candidate

Example in this repo

The new spike keeps orchestration in app code and uses sandbox agents as stage-specific workers over one shared workspace.

sandbox = await client.create(manifest=build_manifest())
qualification = await Runner.run(
    qualifier,
    prompt,
    run_config=RunConfig(
        sandbox=SandboxRunConfig(session=sandbox),
        group_id=lead.id,
    ),
)
state = client.serialize_session_state(sandbox.state)

The implementation stops at the approval gate, serializes sandbox session state, and can resume later for outreach — which is exactly the shape the new sandbox release was meant to enable.

Pros

  • Filesystem + shell + memory make artifact-first workflows feel natural.
  • Host-owned orchestration matches the docs guidance for stateful app-controlled agent systems.
  • Serialized sandbox session state gives a real pause / resume story.
  • More portable than Cloudflare if you want sandboxed work without picking one edge runtime.

Cons / cautions

  • Python-first: adopting it here means accepting a polyglot runtime next to the TypeScript comparisons.
  • You still own the approval/control plane and secure service boundary design.
  • Less opinionated integrated runtime than Cloudflare Agents.

Best fit

Best Python sandbox workstation layer when agents need files, commands, packages, snapshots, and resumable host-controlled state.

Especially compelling for workstation-style tasks, code/workspace operations, and approval-gated long-running jobs.

Summary: strongest portable sandbox-agent option. 8 / 13
Comparative scoring

How the five options compare against architecture criteria

Option | Reasoning fit | Durability / resume | Approval / control plane | Security boundary support | Complexity
Vercel AI SDK | High for typed artifact generation | Low | Low | Low without substantial custom infra | Low
Claude Agent SDK | Very high for orchestrator + specialists | Medium | Medium | Medium if paired with gateway/services | Medium
Gas City | Medium | High | High | Medium but design still required | High
Cloudflare Agents | High | Very high | Very high | High as an integrated runtime shape | Medium
OpenAI Agents SDK | High for sandboxed specialist work | High | High with host-owned approval flow | High when paired with a capability gateway and narrow sandboxes | Medium with a Python/runtime split
The winner changes depending on which layer we’re choosing. 9 / 13
Rankings

Different priorities produce different winners

1. Best reasoning/orchestration SDK

Claude Agent SDK, then OpenAI Agents SDK, Cloudflare Agents, Vercel AI SDK, Gas City.

2. Best secure runtime foundation

Cloudflare Agents, then OpenAI Agents SDK + capability gateway, Claude Agent SDK + custom gateway/services, Gas City, Vercel AI SDK.

3. Lowest complexity to ship

Vercel AI SDK, then Claude Agent SDK, Cloudflare Agents, OpenAI Agents SDK, Gas City.

Interpretation

Cloudflare wins when you ask, “What should host this securely?”

Claude wins when you ask, “What should do the reasoning work?”

OpenAI wins when you ask, “What should inspect files, run commands, and resume a sandboxed workspace in Python?”

Vercel wins when you ask, “What gets us a clean typed app integration fastest?”

Gas City wins when you ask, “Do we want an orchestration substrate as a strategic bet?”

Do not collapse these categories into one decision. 10 / 13
Recommendation

What I recommend for Relay North right now

Default recommendation

Claude Agent SDK remains the best default and the primary foundation recommendation if the goal is to move fast without prematurely locking the whole runtime to one platform.

  • Best fit for orchestrator + specialists.
  • Best fit for narrow artifact handoffs.
  • Best fit for build / QA / rebuild loops and multi-step agentic work.
  • Still the repo’s strongest overall choice if forced to pick one SDK today.
  • The new OpenAI sandbox path is promising, but strongest when the team is happy to operate a Python-first workstation layer.

Important condition

This only remains true if the team enforces the agent-access architecture:

  • agents request capabilities,
  • gateway enforces policy,
  • domain services own secrets and side effects,
  • sandboxes stay narrow.

Alternative strategic recommendation

If the team wants the runtime itself to embody durable state, approvals, workflow identity, and control-plane semantics, then Cloudflare Agents is still the most interesting next bet. If the team instead wants a portable sandboxed workstation layer with host-owned orchestration, the new OpenAI Agents SDK is now the clearest Python-first alternative.

Short version: Claude for reasoning, Cloudflare for runtime, OpenAI for sandboxed workstations. 11 / 13
Next step

The de-risking experiment to run next

Cloudflare path

  • Add a real capability gateway layer.
  • Make approval and caller identity real.
  • Put a fake-but-realistic domain service behind the gateway.
  • Prove agents never touch raw credentials.

OpenAI sandbox path

  • Run the new Python spike with a real API key and operator approval stop/resume.
  • Put the same capability gateway in front of every privileged action.
  • Compare serialized sandbox-state resume against Cloudflare workflow resume.
  • Measure whether the Python sandbox layer reduces glue code enough to justify a polyglot runtime.

Decision question

The next experiment should not be “another generic demo.” It should be a security-architecture prototype that compares Cloudflare’s integrated runtime against OpenAI’s sandbox-resume path while preserving the same capability-gateway trust boundary.

That will tell us whether Cloudflare’s integrated runtime or OpenAI’s sandbox-resume path buys more leverage for the trust boundary. 12 / 13
Appendix

Evidence used from the repo

Files inspected

  • requirements.md
  • agent-access-architecture.md
  • README.md
  • src/vercel-ai-sdk/workflow.ts
  • src/claude-agent-sdk/workflow.ts
  • cloudflare/README.md
  • openai-agents-python/README.md
  • openai-agents-python/relay_north_openai_agents/workflow.py
  • docs/plans/2026-04-14-cloudflare-agents-spike.md
  • cloudflare/src/lead-agent.ts
  • cloudflare/src/workflows/lead-workflow.ts

Key evidence points

  • The repo now contains five proof-of-concept implementations with concrete runnable paths.
  • The Cloudflare spike proves durable per-lead state, forced QA fail/pass, and an approval-gated workflow stop at awaiting_approval.
  • The OpenAI spike proves a clean Python sandbox-agent pattern with host-owned orchestration and serialized sandbox session state.
  • The Claude example most directly expresses orchestrator + specialists and remains the best single-SDK pick.
  • The Vercel example is strongest on typed app-layer generation but still needs surrounding runtime infrastructure.
  • Gas City is now actually runnable end-to-end, but remains the heaviest substrate-style option.