Shipping a SaaS from my phone

I built Runvault with Claude Code agents doing most of the typing, often while away from my laptop. The hard part wasn't getting the agents to write code — it was making sure I could actually verify their work from wherever I happened to be. This is the Railway and Cloudflare setup that made that loop work.

The loop I was optimizing for

When most of your code is being written by agents, the bottleneck in your day stops being typing speed and starts being verification. The agent opens a PR. You need to know: does this thing actually work end-to-end, against a real database, with real auth, calling the real third-party APIs? Did it break something else?

"Read the diff" is not enough. "Run it locally" works at a desk, sometimes. "Trust the tests" works for a while and then catches up with you. The version of the loop I wanted is:

  1. Agent opens a PR.
  2. CI passes.
  3. A real, isolated copy of the entire app — UI, API, background worker, sandbox, database, queue — comes up at its own URL.
  4. I open that URL on whatever device I have on me and actually use the feature.
  5. If it's wrong, I tell the agent what's wrong. If it's right, I merge.

Step 3 is the one that took real infrastructure work. Every other step was already free. This post is about step 3.

The app, briefly

It's an agent platform: users chat with it from a dashboard or a messaging channel, and the agent runs tool calls inside an isolated sandbox. That gives you four moving parts:

  • a dashboard SPA for the UI,
  • a gateway API (HTTP + webhooks + SSE),
  • an agent worker (background queue consumer that runs the agent loop), and
  • a sandbox host (where the agent's shell actually executes, with persistent per-user storage).

Plus Postgres and Redis. I run the API and worker on Railway, the sandbox host as a Cloudflare Worker with containers, and the dashboard from Cloudflare Pages.

Figure: production topology. The interesting part isn't what this looks like; it's that I wanted this exact shape to exist per PR, automatically.

Three environments, one repo

The mental model is the simplest one that supports the loop I described:

  • PR previews — one full stack per open pull request, torn down when it merges or closes. This is the one that matters most for agent-driven dev.
  • staging — a long-lived environment that tracks the staging branch.
  • production — tracks main.

Every PR is opened against staging. When it merges, both platforms redeploy staging automatically. When I'm ready to ship, I run a single GitHub Action that fast-forwards main to a known-good staging commit, and that one ref bump is the trigger for production deploys on both platforms.

Figure: branches map one-to-one to environments. Promotion is a fast-forward, not a separate deploy.

The shape is conventional. The thing that makes it work for agent-driven dev is that every branch, PR branches included, gets a real, working URL: not a static preview of the UI, not a screenshot bot, not a partial mock. A full stack.

Railway: one project, three environments

Railway's concept of environments inside a project is what made the per-PR thing tractable. Each environment is its own copy of every service, its own Postgres, its own Redis, its own env vars. The killer feature is that PR-preview environments are automatic — open a PR, Railway spins up a fresh environment named after the PR number, runs the same services, gives them throwaway *.up.railway.app URLs.

This is the part that maps directly onto an agent's workflow. The agent doesn't need to know anything about deployment; it just pushes a branch and opens a PR. By the time I look at my notifications, there's already a live URL waiting.

Figure: each Railway environment is a full, isolated stack. Production and staging have two services; PR previews add a third for the dashboard.

Why the dashboard runs on Railway only for PR previews

In production and staging the dashboard lives on Cloudflare Pages, because Pages is genuinely better at serving an SPA at the edge. But the whole point of a PR preview is that I can poke at the new UI from my phone, so the dashboard needs to be wired up to that PR's gateway-api, not the shared staging one — and threading a per-PR API URL into a Pages preview build was more friction than just adding the dashboard as a third Railway service in the preview environment.

Railway lets one service reference another service's auto-generated domain via a template variable. So the PR-preview dashboard is configured with VITE_API_URL=https://${{gateway.RAILWAY_PUBLIC_DOMAIN}}, and Railway substitutes the right URL for whichever PR is being deployed. Open PR #123, the dashboard at runvault-pr-123.up.railway.app is already pointed at gateway-pr-123.up.railway.app. No config plumbing, no manual step for me, no instructions for the agent to remember.

Cookie gotcha worth knowing. up.railway.app is on the Public Suffix List, like vercel.app and netlify.app. That means two Railway auto-domains are not the same site for cookie purposes, even though they look like they should be. The dashboard's API calls in a PR preview are therefore cross-site requests, and browsers do not attach a SameSite=Lax cookie to cross-site fetch or XHR calls. So the session cookie has to be SameSite=None; Secure in PR previews and SameSite=Lax in production (where my real domains are same-site). I toggle it based on the RAILWAY_ENVIRONMENT_NAME env var that Railway injects automatically.
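A minimal sketch of that toggle, not the actual Runvault code. It assumes the long-lived environments are named "production" and "staging", that staging also sits on same-site domains, and that any other environment name means a PR preview. The function names and the `Path=/` attribute are illustrative.

```typescript
// Sketch: pick session-cookie attributes from the Railway environment name.
// Assumption: "production" and "staging" are the only long-lived environments;
// every other name is treated as a PR preview on *.up.railway.app.

type SameSite = "lax" | "none";

interface SessionCookieOptions {
  httpOnly: boolean;
  secure: boolean;
  sameSite: SameSite;
}

const LONG_LIVED_ENVIRONMENTS = new Set(["production", "staging"]);

function sessionCookieOptions(envName: string | undefined): SessionCookieOptions {
  // envName is undefined when running outside Railway (for example local dev).
  const onRailway = envName !== undefined;
  const isPreview = onRailway && !LONG_LIVED_ENVIRONMENTS.has(envName);
  return {
    httpOnly: true,
    // SameSite=None is only accepted by browsers together with Secure.
    secure: onRailway,
    sameSite: isPreview ? "none" : "lax",
  };
}

// Serialize the options into a Set-Cookie header value.
function toSetCookie(name: string, value: string, o: SessionCookieOptions): string {
  const parts = [`${name}=${encodeURIComponent(value)}`, "Path=/"];
  if (o.httpOnly) parts.push("HttpOnly");
  if (o.secure) parts.push("Secure");
  parts.push(`SameSite=${o.sameSite === "none" ? "None" : "Lax"}`);
  return parts.join("; ");
}
```

In a server this would be called with `process.env.RAILWAY_ENVIRONMENT_NAME`, so the same build behaves correctly in every environment without a per-environment flag.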

Promotion is a git operation

I deliberately did not build a "deploy to production" button that talks to Railway's API. Instead I rely on Railway's existing "watch this branch" behavior: production watches main, staging watches staging, everything else makes a preview environment.

A small GitHub Action fast-forwards main to a chosen staging commit with safety rails: the SHA must be reachable from origin/staging, origin/main must already be an ancestor of it (true fast-forward, no merges), and CI must have passed on it. The push is gated behind a GitHub Environment with required reviewers, so production deploys still need a human approval — but I'm approving a git push, not configuring a deploy. Which, again, is friendly to working from a phone: tap "approve" in the GitHub mobile app, walk away.
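The three safety rails can be expressed as a small pure function. This is a sketch under assumptions, not the workflow's real script: the `PromotionDeps` interface and `checkPromotion` name are hypothetical, and the git and CI lookups are injected so the logic can be exercised without a repository. In a real workflow `isAncestor` could wrap `git merge-base --is-ancestor <ancestor> <descendant>`, which exits 0 when the first commit is an ancestor of the second.

```typescript
// Sketch: the promotion guards, with git and CI lookups injected.
interface PromotionDeps {
  // True when `ancestor` is reachable from `descendant`
  // (a commit counts as an ancestor of itself).
  isAncestor(ancestor: string, descendant: string): boolean;
  // True when CI finished successfully for this commit.
  ciPassed(sha: string): boolean;
}

type PromotionResult = { ok: true } | { ok: false; reason: string };

function checkPromotion(sha: string, deps: PromotionDeps): PromotionResult {
  // 1. The commit must already be on staging.
  if (!deps.isAncestor(sha, "origin/staging")) {
    return { ok: false, reason: "commit is not reachable from origin/staging" };
  }
  // 2. main must be an ancestor of the commit, so the push is a true fast-forward.
  if (!deps.isAncestor("origin/main", sha)) {
    return { ok: false, reason: "origin/main is not an ancestor; not a fast-forward" };
  }
  // 3. CI must have passed on exactly this commit.
  if (!deps.ciPassed(sha)) {
    return { ok: false, reason: "CI has not passed on this commit" };
  }
  return { ok: true };
}
```

Keeping the checks in one ordered function means a rejected promotion always reports the first rail it hit, which is easy to read in a workflow log on a phone.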

Cloudflare: a Worker per environment, isolated by R2 prefix

The sandbox host is where the agent's shell tool actually runs. It's a Cloudflare Worker that owns Durable Objects (one per tenant) and mounts an R2 bucket into each container as /workspace. Persistent storage, edge-located compute, container instances on demand — a really nice fit for the problem.

It's also the part of the system with the loudest blast radius. A single botched Worker deploy can kill every in-flight agent run in every environment that shares that Worker. I learned that the hard way: for a while I had one shared sandbox Worker serving both staging and PR previews. A PR that changed the Worker's internal protocol broke staging mid-execution. Worse, PR-preview gateway-apis were seeing staging's tenant state because they all pointed at the same R2 prefix.

That's exactly the kind of bug that destroys the loop I was trying to build. If a PR preview can scribble on staging's data, the previews aren't actually safe playgrounds, and the "agent opens a PR, I test it from my phone" flow stops being something I trust. So I gave every environment its own Worker, and isolated each Worker's storage with an R2 prefix.

Figure: three Workers, one bucket, three prefixes. Tenant state cannot leak across environments even if a bug temporarily routes a request to the wrong place.
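To make the prefix idea concrete, here is a rough sketch of environment-scoped object keys. The key layout (`env/<name>/tenants/<id>/...`) and the function names are assumptions for illustration; the post does not say how the real prefixes are laid out.

```typescript
// Sketch: scope every object key to one environment's prefix.
// Assumption: keys look like env/<envName>/tenants/<tenantId>/<path>.

function envPrefix(envName: string): string {
  if (!/^[a-z0-9-]+$/.test(envName)) {
    throw new Error(`unexpected environment name: ${envName}`);
  }
  // The trailing slash matters: "env/pr-1/" is not a prefix of "env/pr-12/...".
  return `env/${envName}/`;
}

function tenantObjectKey(envName: string, tenantId: string, relativePath: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`unexpected tenant id: ${tenantId}`);
  }
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  // Reject traversal so a path can never climb out of its own prefix.
  if (segments.some((s) => s === "." || s === "..")) {
    throw new Error("path traversal is not allowed");
  }
  return `${envPrefix(envName)}tenants/${tenantId}/${segments.join("/")}`;
}

// Guard for reads and deletes: does this key belong to the given environment?
function belongsToEnv(key: string, envName: string): boolean {
  return key.startsWith(envPrefix(envName));
}
```

A check like `belongsToEnv` is also a reasonable guard in a per-PR cleanup script, since it refuses to touch keys that live under another environment's prefix.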

How the Railway side finds the right Worker

This was the part I expected to be annoying and turned out to be elegant. Each environment's Worker needs a shared secret to authenticate calls from its Railway environment, and the Railway services need to know which Worker URL to hit. The naïve approach is to pipe values from CI back into Railway env vars per environment — lots of glue, easy to drift, the kind of thing an agent would not know how to set up correctly.

Instead, both sides derive the same values from the environment name. The GitHub Action computes the Worker name as runvault-sandbox-host-${ENV_NAME} and the shared secret as HMAC(master, "sandbox-host:" + ENV_NAME). Railway services do the exact same derivation at runtime, using the RAILWAY_ENVIRONMENT_NAME that Railway already injects. No CI → Railway plumbing: both sides reach the same answer from the same inputs. A new PR pops into existence, and every secret it needs is already correct.
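The derivation is small enough to sketch in full. The Worker-name pattern and the "sandbox-host:" label come from the description above; the hash function (SHA-256), the hex encoding, and the function names are assumptions, since the post only says HMAC. This version uses Node's built-in crypto module, which fits the Railway side; a Worker would do the equivalent with Web Crypto.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Worker name derived from the environment name alone.
function workerName(envName: string): string {
  return `runvault-sandbox-host-${envName}`;
}

// Per-environment shared secret: HMAC(master, "sandbox-host:" + envName).
// Assumption: HMAC-SHA256 with hex output.
function sandboxSecret(masterSecret: string, envName: string): string {
  return createHmac("sha256", masterSecret)
    .update(`sandbox-host:${envName}`)
    .digest("hex");
}

// Constant-time comparison for checking a presented secret.
function secretsMatch(presentedHex: string, expectedHex: string): boolean {
  const a = Buffer.from(presentedHex, "hex");
  const b = Buffer.from(expectedHex, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because the output depends only on the master secret and the environment name, CI and the Railway service agree without exchanging anything, and knowing one environment's derived secret does not reveal another's.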

Pages for the production dashboard

The dashboard SPA on Cloudflare Pages is deliberately boring: Pages watches the repo, builds with pnpm install && pnpm --filter dashboard build, serves the output with SPA fallback. VITE_API_URL is set at build time per environment so the right API origin is baked into the bundle. Pages handles caching, TLS, custom domains. There's no GitHub Action for this — Pages just builds when the branch advances.

GitHub Actions: the glue, but only just

I kept CI/CD minimal. Most of the deployment intelligence lives on the platforms themselves; GitHub Actions only does three things:

  1. CI — install, build, lint, type-check, test. Runs on every PR and every push to staging or main. One job, no matrix. This is the gate that has to go green before I trust a preview URL.
  2. Sandbox-host deploy — the only workflow that actually pushes code anywhere. Triggers on PR open/sync/close and on pushes to staging / main. Resolves the right environment name, computes the right Worker name and secrets, runs wrangler deploy, and on PR close runs a cleanup script that deletes the Worker, the per-PR R2 prefix, the container app, and the Durable Object namespace.
  3. Promote to production — the fast-forward push to main, gated by a required-reviewer GitHub Environment.
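The "resolves the right environment name" step in the sandbox-host workflow can be sketched as a mapping from the triggering event to an environment and a mode. The `pr-<number>` naming and the type and function names are assumptions; whatever name is used has to match what Railway reports in RAILWAY_ENVIRONMENT_NAME, or the derived Worker name and secret will not line up.

```typescript
// Sketch: map a GitHub event to the environment to act on and what to do there.
type DeployEvent =
  | { kind: "pull_request"; action: "opened" | "synchronize" | "reopened" | "closed"; number: number }
  | { kind: "push"; branch: string };

type Plan = { envName: string; mode: "deploy" | "teardown" } | null;

function planSandboxDeploy(event: DeployEvent): Plan {
  if (event.kind === "pull_request") {
    // Assumed naming convention for preview environments.
    const envName = `pr-${event.number}`;
    // Closing a PR (merged or not) tears its Worker down.
    return { envName, mode: event.action === "closed" ? "teardown" : "deploy" };
  }
  if (event.branch === "staging") return { envName: "staging", mode: "deploy" };
  if (event.branch === "main") return { envName: "production", mode: "deploy" };
  // Pushes to any other branch do nothing; previews are driven by PR events.
  return null;
}
```

Routing deploy and teardown through one function keeps the two modes on the same naming logic, so cleanup always targets the Worker that the deploy created.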

Things not in GitHub Actions: Railway deploys (Railway watches the branches itself), Pages deploys (Pages watches the branches itself), Postgres migrations (run on container start), Worker URL plumbing (derived). Less to break, fewer secrets to manage, and fewer places where an agent's PR can drift from the deploy reality.

The cleanup step on PR close matters more than it sounds. When the dev loop is "open lots of PRs, throw most of them away," the per-PR resources add up fast — a stale Worker per closed PR, a stale R2 prefix, a stale Durable Object namespace. Forgetting to garbage-collect those quietly turns a $50/month bill into something embarrassing. So PR close runs the same workflow that deployed the Worker, just in tear-down mode.

Figure: three event sources, three pairs of outcomes. Each row is one shape of deploy.

The loop, in practice

Here's what a typical session ends up looking like. I'm on a walk, or on a train, or out to lunch. An agent finishes something I asked it for earlier — a new dashboard page, a fix to the integrations flow, a tweak to how the sandbox handles file uploads. It opens a PR. My phone buzzes when CI passes maybe two minutes later, and again when the Cloudflare workflow finishes deploying the per-PR Worker.

I tap into the PR, click the Railway preview URL, sign in, and use the feature. Not a screenshot of it. The actual feature, running against an actual database, hitting an actual sandbox Worker isolated from every other PR and from staging. If something's off, I dictate a follow-up to the agent and put my phone away; if it's right, I approve and merge. Staging redeploys automatically. The next morning, or whenever I get around to it, I run the promote workflow from the GitHub mobile app, approve the protected-environment prompt, and production catches up.

Whole features have landed in this codebase that I never ran on my laptop.

What worked, what I'd do differently

What worked. Letting each platform watch its own branch instead of orchestrating deploys from CI removed a whole class of "the deploy ran but the secret didn't update" bugs. Deriving per-environment secrets from a master secret plus the environment name removed the rest. And the single biggest win, the thing that turned this whole stack into something I'd recommend, is per-PR isolation across every tier. Preview environments that share state with staging are a trap, even when nothing seems to go wrong for weeks. When you start opening many PRs a day because agents are cheap, that trap snaps shut very quickly.

What I'd do differently. I should have set up the per-PR Cloudflare Worker from day one. I spent a stretch sharing the staging Worker across previews "to save time" and lost more time to weird state-leak bugs — and to second-guessing PR previews when I should have been trusting them — than the Worker workflow ever cost me to write. The pattern of "every environment gets its own copy of the loudest thing" is a much better default than "share until it breaks," especially when the dev loop depends on the previews being trustworthy.

The thing that surprised me. The infrastructure work in this post wasn't really about ops at all. It was about removing the last excuse for being chained to a laptop. The agents were already doing most of the typing; the missing piece was a way to verify their work that didn't require me to be at a desk. Once that piece was in place, the shape of my workweek changed.
