If you’ve spent any time trying to put an AI agent into production, you’ve hit the same wall most of us have: the framework you build with and the platform you ship to are usually optimized for completely different worlds. Frameworks assume a long-lived process with filesystem access. Platforms push you toward stateless functions with 10-second timeouts. The result is a lot of glue code, a lot of workarounds, and a lot of agents that work great in dev and fall apart in production.
Flue and Unkey Deploy are interesting because they were designed with the same assumptions about what an agent actually needs to run. This post is a look at what Flue is, what makes it different from “yet another LLM SDK,” and why pairing it with Unkey Deploy removes most of the friction of getting agents into the hands of real users.
There’s a working reference implementation at github.com/perkinsjr/Flue-with-Unkey.
What Flue actually is
Flue calls itself “The Agent Harness Framework,” and the framing matters. Most of the AI tooling you see today is an SDK: a thin wrapper around model APIs that gives you a typed chat.completions.create and calls it a day. Flue is a step up the abstraction ladder. It’s a framework in the same sense that Next.js is a framework: opinionated structure, build pipeline, deploy targets, runtime conventions. You write agents, not request handlers, and Flue compiles them into runnable artifacts.
Concretely, a Flue project looks like this:
```
.flue/
  agents/
    translate.ts
    summarize.ts
    analyze.ts
  roles/
    analyst.md
  skills/
    summarize/
      SKILL.md
```
Each .ts file in .flue/agents/ is an autonomous agent. Each Markdown file in roles/ is a system prompt. Each SKILL.md in skills/ is a reusable capability. The framework picks all of this up at build time and emits HTTP endpoints, one per agent, plus a health check and session machinery.
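Role files are just Markdown. As a hypothetical example (the actual analyst.md in the repo may read differently), a roles/analyst.md could be nothing more than:

```markdown
You are a careful data analyst. Answer with concrete numbers where you
can, state your assumptions explicitly, and say "I don't know" rather
than guessing.
```

No registration step, no config entry: the file's presence in roles/ is what makes it available to agents at build time.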
A minimal agent looks like this:
```typescript
import type { FlueContext } from '@flue/sdk/client';
import * as v from 'valibot';

export const triggers = { webhook: true };

export default async function ({ init, payload }: FlueContext) {
  const agent = await init({ model: 'anthropic/claude-sonnet-4-6' });
  const session = await agent.session();

  return await session.prompt(
    `Translate this to ${payload.language}: "${payload.text}"`,
    {
      result: v.object({
        translation: v.string(),
        confidence: v.picklist(['low', 'medium', 'high']),
      }),
    },
  );
}
```
A few things worth flagging:
The result schema is enforced. Flue uses Valibot to validate model output, so you don’t have to write a parser, a fallback parser, and then a fallback for the fallback parser. If the model returns something that doesn’t fit, Flue handles the retry loop.
session is a real concept. Sessions are how Flue threads conversation state across multiple requests. Same session ID, same thread. Different ID, new conversation. On Node, sessions live in process memory by default. On Cloudflare Workers, they live in Durable Objects. (More on why this matters in a second.)
There’s no HTTP framework code in your agent. You don’t import hono or express. Flue generates the HTTP layer at build time, with one route per agent at POST /agents/<name>/<id> plus GET /health. Your code is the agent, not the plumbing.
And finally, Flue ships a sandbox. By default agents get a virtual filesystem powered by just-bash, which is fast and cheap. If you need actual processes, you can opt into Daytona-backed Linux containers. Either way, the sandbox is a real thing your agent code can use.
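To make the retry point concrete, here is a hand-rolled sketch of the validate-and-retry pattern. This is not Flue's actual implementation (Flue validates with Valibot internally); validate, promptWithRetry, and the hard-coded schema shape are illustrative names matching the translate agent above.

```typescript
// Sketch of the pattern Flue automates: parse model output, check it
// against the declared schema, and re-prompt on failure. Illustration
// only -- Flue's real loop uses the Valibot schema you pass in.
type Confidence = 'low' | 'medium' | 'high';
type TranslationResult = { translation: string; confidence: Confidence };

// Check raw model output against the shape the agent's schema declares.
function validate(raw: string): TranslationResult | null {
  try {
    const data = JSON.parse(raw);
    if (typeof data.translation !== 'string') return null;
    if (!['low', 'medium', 'high'].includes(data.confidence)) return null;
    return { translation: data.translation, confidence: data.confidence };
  } catch {
    return null; // not even valid JSON
  }
}

// Re-prompt the model until its output validates, up to maxAttempts.
async function promptWithRetry(
  callModel: () => Promise<string>,
  maxAttempts = 3,
): Promise<TranslationResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const parsed = validate(await callModel());
    if (parsed) return parsed;
  }
  throw new Error(`model output failed validation after ${maxAttempts} attempts`);
}
```

The value of the framework is that none of this lives in your agent file: you declare the schema and Flue owns the loop.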
What Flue is not
Flue is not a chat UI. It’s not a model gateway. It’s not a vector store. It’s the runtime your agent runs inside, plus the build tooling to ship it. That narrow scope is part of why it slots cleanly into a deploy platform: there’s no opinion about hosting, no proprietary runtime to fight with, and no “you must use our gateway” lock-in.
Which is what brings me to Unkey Deploy.
Where most agent platforms force compromises
If you take that translate agent and try to deploy it on a typical serverless platform, you’ll start to feel the friction within a day:
Timeouts. Most agent loops involve multiple model calls, tool invocations, and validation retries. A 10-second function timeout is a hard wall. A 60-second one buys you a longer wall. Even the more generous limits on serverless container platforms still cap at the 10 to 15 minute mark, which is exactly the wrong shape for a “let the agent figure it out” workflow.
Session state. Flue’s Node target keeps sessions in process memory. That works beautifully on a server that’s been up for an hour. It works terribly on an ephemeral container that gets recycled between requests. You either lose conversation continuity or you bolt on Redis and pay the latency tax on every prompt.
Sandbox processes. The just-bash sandbox spawns child processes. The Daytona option needs a real Linux container. Functions-as-a-service runtimes are deliberately stripped down to make these things hard or impossible. You end up doing without the sandbox or moving the heavy lifting to a separate worker, which defeats the point.
Cold starts. LLM-backed endpoints are already slow. Adding a 1 to 3 second cold start to the front of every request, on top of model latency, kills your p95.
Auth. Flue intentionally doesn’t ship auth in the agent layer, which is the right call: an agent shouldn’t know what an API key is. But that means you need an auth layer somewhere. Most teams roll their own middleware, which works until it doesn’t.
Why Unkey Deploy fits
Unkey Deploy was built around the opposite assumption from serverless: that an API is a long-lived process that benefits from a real server underneath it. Our tagline is “Ship APIs, not infrastructure.” That covers a lot of ground, but in the context of running Flue agents, it cashes out as four specific things.
1. Real servers, no timeouts to engineer around. Your Flue agent runs in a container that stays up. A multi-step agent loop that takes 90 seconds is not an event you have to special-case. There’s no streaming-only escape hatch, no “split your work into chunks so each fits under the timeout.” It just runs.
2. In-process session state that doesn’t get nuked. Flue’s default Node session store is in-memory, and it works on Unkey Deploy because the process keeps running. You can layer in a custom store later if you need durability across restarts, but you don’t need it on day one to have a working product.
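For intuition, an in-process session store is about this much code. This is a hypothetical sketch, not Flue's actual store or interface; the point is that a plain Map is a perfectly good session store when the process underneath it stays alive.

```typescript
// Hypothetical in-memory session store, in the spirit of Flue's Node
// default. Works because the container is long-lived; on an ephemeral
// runtime the Map would be wiped between requests.
type Message = { role: 'user' | 'assistant'; content: string };

class MemorySessionStore {
  private sessions = new Map<string, Message[]>();

  // Return the conversation history for a session (empty for new sessions).
  history(id: string): Message[] {
    return this.sessions.get(id) ?? [];
  }

  // Append a message, creating the session on first use.
  append(id: string, message: Message): void {
    const messages = this.sessions.get(id) ?? [];
    messages.push(message);
    this.sessions.set(id, messages);
  }
}
```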
3. Container-native deploy. The reference repo ships a multi-stage Dockerfile that builds with flue build --target node, strips dev dependencies, and runs the result. Unkey Deploy reads that Dockerfile and turns git push into a running container. Nothing about Flue’s build output has to be reshaped to fit a custom runtime.
4. Auth and rate limiting at the platform boundary. This is the one that matters most for agents. Unkey Deploy includes Sentinel, a proxy layer that sits in front of your container and handles API key verification, rate limiting, and identity. Your Flue agent stays auth-free. Sentinel checks the key, applies whatever per-key or per-identity rate limits you’ve configured, and passes the request through. From the agent’s point of view, the request just arrives, already authenticated.
That last point is worth dwelling on. Agents are unusually easy to abuse. They’re slow, they’re expensive per request, and they often hit downstream APIs (your bill, your model provider’s bill, your customer’s bill). Front-of-house rate limiting and key-scoped quotas aren’t a nice-to-have. Doing this at the platform layer instead of in your agent code means you don’t ship a rate limiter, you don’t run Redis, and you don’t have to debug a leaky token bucket at 2am.
Putting it together
The reference repo at perkinsjr/Flue-with-Unkey is small enough to read end to end in five minutes, but it captures the shape of a real deployment.
The Dockerfile is a standard multi-stage build:
```dockerfile
# syntax=docker/dockerfile:1.7

# Build stage
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm ci
COPY tsconfig.json ./
COPY .flue ./.flue
RUN npx flue build --target node
RUN npm prune --omit=dev

# Runtime stage
FROM node:22-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/.flue ./.flue
COPY package.json ./
USER node
EXPOSE 8080
CMD ["node", "dist/server.mjs"]
```
flue build --target node produces a Hono server with one route per agent. Unkey Deploy injects PORT=8080. The container exposes GET /health, which the platform uses to confirm the deploy is live.
Environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, anything else your agents need) live in the Unkey dashboard under Project → App → Environment → Variables. They’re injected at runtime and are not baked into the image.
To call an agent, a client hits something like:
```
POST https://<your-app>.unkey.app/agents/translate/<session-id>
Authorization: Bearer <unkey-key>
Content-Type: application/json

{ "text": "Hello", "language": "Spanish" }
```
Sentinel verifies the bearer token, applies rate limits, and passes the request to the container. The container resolves the translate agent, threads the call through the session ID, runs the loop, validates the result against the Valibot schema, and returns. None of that requires any Unkey-specific code in the agent. The integration is the deployment, not a library you import.
Where this leaves you
The pattern that works for shipping Flue agents in production is, roughly:
Use Flue for what it’s good at: writing agents as code, with structured output, sessions, skills, and a sandbox you can trust. Use Unkey Deploy for what it’s good at: running that code on a real server, with auth and rate limiting at the door. Don’t build the glue between them. There isn’t any to build.
For most teams I’ve talked to, the limiting factor on getting an agent into production isn’t the model or the framework. It’s the stack of decisions about hosting, auth, quotas, and operational headroom that you have to make before you can let the first user hit it. Flue plus Unkey Deploy collapses most of those decisions into a Dockerfile and a dashboard. Which, in 2026, is about the right amount of friction for shipping a small piece of working software.
If you want to try it, clone the example, point it at your own Anthropic or OpenAI key, and deploy. The whole loop, from git clone to a live URL with rate-limited auth in front of it, is well under an hour.
Links: Flue framework · Unkey · Reference repo