Full-stack product development in the AI era

Speed is no longer the constraint. Quality is.

May 11, 20266 min read

The leading lean teams ship customer feedback to production in six to eight hours. They do it without losing track of what is well-written, what is secure, what is fast, and what will still be reliable in three months. That speed does not come from AI. It comes from the system around the AI.

That distinction matters, because most teams have it backwards. They reach for AI expecting velocity and find regression instead — duplicate components, swallowed errors, subtly broken authorization, tests that prove nothing. The problem is not that AI is unreliable. The problem is that AI is a force multiplier, and a force multiplier on a weak system multiplies the weakness.

I lead full-stack at Mono.ge — the Georgian Legal OS. For the past nine months, we have rebuilt our development process around AI. The lesson has been consistent: as soon as execution gets fast, the surrounding system has to get sharper. Specs have to be tighter. Reviews have to land in the right places. Gates have to be real, not symbolic. The discipline shifts from the line of code to the shape of the workflow.

What follows is the system. Not the philosophy. The actual mechanics.

                           ┌─ mono.ge ─────────────────────────────────────────────────────────┐                           
                           │                                                                   │                           
                           │  ┌─ Context ──────┐    ┌─ Guardrails ──┐    ┌─ Agents ─────┐      │                           
                           │  │ Specs          │    │ Lint rules    │    │ ▓ · · · · ·  │      │                           
 Customer requests         │  │ CLAUDE.md      │    │ Type config   │    │ · · · · · ·  │      │                           
 Bug reports         ─────→┤  │ Conventions    ├←──→┤ Hooks         ├←──→┤ · · · · · ·  │      ├──→ Production             
 Feedback                  │  │ Decisions      │    │ CI gates      │    │ · · · · · ·  │      │                           
                           │  │ Tests          │    └───────────────┘    └──────────────┘      │                           
                           │  │ Code           │                                               │                           
                           │  └────────────────┘                                               │                           
                           │                                                                   │                           
                           └───────────────────────────────────────────────────────────────────┘

The frame. AI is a force multiplier on engineers who already know the stack cold. If a teammate could not write what Claude just produced, the team has a quality problem disguised as a velocity problem. The edge of a two-or-three-person team is not generating more code. It is making fewer, smaller, better decisions, faster.

Specs are the unit of work. Every meaningful change starts with a one-page spec, written before the agent is opened. What the user does. What changes in the database — tables, migrations, row-level security. What endpoints exist. Edge cases that matter. What is explicitly out of scope. The spec lives in /specs/YYYY-MM-DD-feature.md and becomes the seed prompt. Specs are what humans review. Code is what the agent writes. This is the single largest lever against slop, because agents diverge fast when the brief is vague.

The repo is a prompt. A short, sharp CLAUDE.md at the root is worth more than any meeting. It states the stack versions, the file structure with examples, the forbidden patterns — any, useEffect for derivable state, swallowed errors — the database conventions, and the one error-handling pattern the team has agreed on. Per-folder CLAUDE.md files extend the rules where conventions are local: app/api/, trigger/, supabase/migrations/. The repo teaches the agent, every time, with no human in the loop.

Guardrails are set once and saved forever. TypeScript in strict mode, with noUncheckedIndexedAccess and exactOptionalPropertyTypes. Biome or oxlint instead of ESLint — they are faster, and the agent gets the config right on the first try. Lefthook for pre-commit typecheck, lint, and format on changed files; full typecheck and tests on pre-push. CI runs the same plus Playwright, RLS tests, and a bundle-size guard. Hono routes are contract-tested with Zod. Supabase types are regenerated on every migration through a committed script. Lint enforces a 300-line ceiling on files, a 50-line ceiling on functions, and a banned-import list.

The file-size cap is the most effective anti-slop rule we have. It forces decomposition before bloat.

Parallelism without chaos. Each developer runs two or three agent sessions concurrently, each in its own git worktree. One person, multiple agents, no context-switch pain. Branches stay short — hours, not days. Trunk-based. No long-lived feature branches.

Vertical slices, small PRs. Every pull request delivers one user-visible capability or one infrastructure change. Never scaffolding. Target a hundred to two hundred lines of code, with a hard ceiling at four hundred. The agent writes the implementation, the tests, and the migration in the same PR. When a slice naturally splits, the PRs stack.

Two-layer review. A human reviewer looks at the things that carry real risk — spec alignment, auth, row-level security, payments, data deletion, API contracts, anything in security/ or billing/. A separate agent session, with clean context, reviews lint compliance, duplication, dead code, edge cases against the spec, and whether tests cover behavior rather than implementation. Both have to pass. With three people on the team, at least one senior human stays in the loop on every PR.

┌─ Spec ─┐    ┌─ Agent ─┐    ┌─ Review ────────────────────┐    ┌─ Merge ─┐    ┌─ Production ─┐
│        │ ═→ │         │ ─→ │ Human  · Auth, RLS, pay…    │ ─→ │         │ ─→ │              │
│        │    │         │    │ Agent  · Lint, dedup, edges │    │         │    │              │
└────────┘    └─────────┘    └─────────────────────────────┘    └─────────┘    └──────────────┘

Tests that do not bloat.Five to ten Playwright happy paths. Three to five sad paths. That is the full UI test suite. Hono integration tests cover every route and every status branch against a local Supabase. RLS tests assert, explicitly, that user A cannot see user B's data — agents are never trusted on row-level security without these. Unit tests are reserved for pure logic: parsers, calculators, state machines. Tests that check whether a function was called with a particular argument are banned. The agent will write fifty of those if you let it, and none of them will catch a real bug.

Anti-slop is a continuous practice.A weekly audit — a fresh agent given the week's diff and asked for duplication, dead code, inconsistent patterns. A twenty-percent refactor budget, scheduled, not aspirational. One library per concern: one toast, one form, one query, one validation. Documented in CLAUDE.md, enforced by the linter. When the agent proposes a new dependency, the bar is what it replaces. Net-new dependencies require explicit justification in the PR.

Daily rhythm. Nine a.m., a fifteen-minute sync — what is shipping today, what is blocked, what is risky. Morning, the spec is finalized and the agent is kicked off. Afternoon, review and polish. Five p.m., staging is promoted to production or rolled back. Nothing sits on staging overnight. On Friday, one hour per person for cleanup.

┌─ 09:00 Sync ─┐   ┌─ Morning ────┐   ┌─ Afternoon ──┐   ┌─ 17:00 Ship ─┐   ┌─ Wrap ───────┐
│ Today,       │ ═ │ Spec → agent │ ─ │ Review,      │ ─ │ Promote or   │ ─ │ 17:30        │
│ blockers     │   │              │   │ polish       │   │ roll         │   │              │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘

If a feature did not ship on Tuesday, the next thing to debug is the process, not the codebase.

Observability from the first commit. Sentry. Product analytics. Logs piped somewhere queryable. The background-job dashboard checked daily. Real signal beats agent confidence every time. When Sentry spikes, the stack trace becomes the next prompt, verbatim.

MCP and slash commands. Supabase MCP for schema queries during development. GitHub and Linear MCP for issue context. Sentry MCP for error context. Slash commands for the moves the team makes every day: /spec, /migration, /trigger-task, /ship, /audit. These pay back inside a week.

None of this comes from working harder. None of it comes from being smarter than the AI. It comes from building the system that makes AI safe to trust.

The shift is not from human to AI. It is from writing code to designing the conditions under which code gets written.

Speed and quality are not in tension here. They come from the same source: a system tight enough that the agent cannot drift, and small enough that any drift is visible within hours.