I did not code 36 packages in 3 weeks. I directed AI agents to build them, with the architectural discipline a senior team would apply. That distinction is the entire story.
AI-directed development, as I practice it, is not autocomplete at scale. It is a senior engineer running a junior team made of language models: strict layering, enforced dependency directions, test gates at every package boundary. The agents wrote the code. I wrote the rules they had to follow.
Across about three weeks (roughly nine days of architecture audits and authoring the agent skill files, then a 12-day agent-directed build: 2026-04-17 → 2026-04-28, 289 commits), I directed agents to ship a 36-package React Native game engine: 35 @flare-engine/* scoped packages plus the create-flare-app CLI scaffold, with ~1,936 passing tests. This post defends one claim: the leverage came from the architecture I imposed before a single agent wrote a line, not from the agents themselves.
What the 3 weeks actually included
The window has a real start and a real end. About nine days of architecture audits, layer planning, and authoring the per-role agent skill files came first: no engine code written, just the rules the agents would later follow. Then the build window: first commit at 2026-04-17 16:30, phase 11 closed 2026-04-28 22:30. Twelve calendar days of build inside the broader three-week project.
What filled the build window: monorepo initialization, package scaffolding, per-package agent sessions with skill files and scoped instructions, architecture reviews after each phase, human-in-the-loop decisions on every dependency edge, and the test discipline that kept the graph stable as packages accumulated.
Those 289 commits — roughly 24 per build-day — were driven by eight specialized agents working a defined phase plan: an architect, three engine roles (builder, planner, researcher), three game roles (the same trio scoped to the showcase game), and a verifier that closed each phase. High intensity, but bounded by a known end state, not an open-ended slog.
Why 36 packages? The 35 @flare-engine/* scoped packages are the engine proper. create-flare-app is the CLI scaffold that bootstraps blank, top-down, and shmup game templates. Both ship; both count.
What the 3 weeks did not include
The engine is not open source yet. The first public release is targeted for October 2026. Readers cannot inspect the repo today; this post compensates with the architecture diagram and provenance links to internal docs.
No game has shipped on top of the engine yet. Pan Tvardowski, a vertical shoot-'em-up built on the engine as the canonical showcase, is integration-tested but not on store pages. Two August 2026 posts — a launch teaser and a v1 retrospective — will be the proof points. I am not citing Pan Tvardowski's integration coverage numbers here; those belong in the launch story.
There is no native acceleration module. The engine runs on Shopify's react-native-skia and CanvasKit WASM. C++ offload is a post-launch concern.
The same workflow I used here ports to non-game React Native prototypes. If you are building a data-heavy dashboard or a flow-based onboarding experience with a dozen packages, the layering discipline transfers. That is a future post; this one is the origin story.
How AI-directed development survives a 36-package monorepo
The load-bearing decision happened before any agent session started: five strict layers, dependency edges flowing downward only, peer deps for native platform libraries, and a test gate at every package boundary before promotion to the next layer.
The diagram is the architecture. Every package knows exactly what it can import. The enforcement lives in package.json and pnpm's workspace resolver, not in a CONTRIBUTING document. Here is the entire dependency declaration for @flare-engine/ecs, a Layer 1 package:
```json
// packages/ecs/package.json
{
  "name": "@flare-engine/ecs",
  "dependencies": {
    "@flare-engine/events": "workspace:*",
    "@flare-engine/math": "workspace:*"
  }
}
```

Both declared dependencies are Layer 0 packages. Nothing from Layer 2 (render, input, physics...), nothing from Layer 3 (scene, animation...), nothing from Layer 4. If an agent writes `import { Scene } from '@flare-engine/scene'` inside `packages/ecs/src/`, pnpm refuses to resolve the import: the package was never declared as a dependency, so it is not on disk inside the package's node_modules. TypeScript fails at typecheck, the build fails before commit, and the verifier agent rejects the diff. Three points of detection on a single failure, all driven by one declarative file.
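For concreteness, this is the shape of that failure from inside the package. The file name is hypothetical; the diagnostic is TypeScript's standard unresolved-module error:

```ts
// packages/ecs/src/world.ts (hypothetical file in the Layer 1 package)
import { Scene } from '@flare-engine/scene';
// error TS2307: Cannot find module '@flare-engine/scene'
// or its corresponding type declarations.
```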
The contrast lower in the graph proves the rule. @flare-engine/scene (Layer 3) declares events, math, core, ecs, and render as deps — exactly the lower-layer set its tier permits, nothing above. The graph is auditable in seconds: open any package's package.json and read down.
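The scene manifest is not public yet, but the dependency set just listed pins down its shape. A sketch under that assumption:

```json
// packages/scene/package.json (sketch; the real file ships with the OSS release)
{
  "name": "@flare-engine/scene",
  "dependencies": {
    "@flare-engine/events": "workspace:*",
    "@flare-engine/math": "workspace:*",
    "@flare-engine/core": "workspace:*",
    "@flare-engine/ecs": "workspace:*",
    "@flare-engine/render": "workspace:*"
  }
}
```

Native platform libraries such as @shopify/react-native-skia are also absent from lists like this one; per the layering rules above, those enter as peer dependencies rather than hard deps.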
The agents do not infer this layering from the diagram. They read it as plain rules in their per-role briefing. Here is the relevant block from the engine-builder agent file — the role that lives in packages/**:
```md
## Scope
**You own:** packages/**, benchmarks/**, engine-scoped docs.
**You never touch:** apps/**, .github/**, design docs.

## Constraints
- No features beyond the API spec.
- No `any`, `as any`, `@ts-ignore`.
- No default exports.
- No allocations in hot paths.
- No imports from higher layers.
```

Eight files like this one run the build, one per role. Plain markdown, no DSL. The agent reads the file at session start and the constraints become its self-check on every diff. "No imports from higher layers" is the same rule the package.json above enforces, restated at the moment the agent decides whether to write the import in the first place. Two layers of enforcement: the rule the agent agreed to, and the configuration that catches it if the agent forgets.
When an agent proposed a shortcut — pulling a higher-layer utility into a lower-layer package to save a few lines — pnpm rejected it at install time, not at code review. The architecture enforced itself.
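If you want that same guarantee as one explicit CI step instead of three incidental failures, the layer map is small enough to audit directly. A minimal sketch, not from the Flare repo: the layer numbers follow the assignments named above, and the rest of the map is left to fill in:

```ts
// scripts/check-layers.ts: hypothetical CI gate, not from the Flare repo.
// Fails the build on any workspace dependency edge that does not point
// strictly downward through the layer graph.
import { existsSync, readFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';

// Layer assignments named in this post; extend with the remaining packages.
const LAYERS: Record<string, number> = {
  '@flare-engine/events': 0,
  '@flare-engine/math': 0,
  '@flare-engine/ecs': 1,
  '@flare-engine/render': 2,
  '@flare-engine/scene': 3,
};

let violations = 0;
for (const dir of readdirSync('packages')) {
  const manifestPath = join('packages', dir, 'package.json');
  if (!existsSync(manifestPath)) continue;
  const pkg = JSON.parse(readFileSync(manifestPath, 'utf8'));
  if (!(pkg.name in LAYERS)) continue; // unmapped package: extend LAYERS
  for (const dep of Object.keys(pkg.dependencies ?? {})) {
    // Vendor dependencies are not layer-checked; only workspace edges are.
    if (dep in LAYERS && LAYERS[dep] >= LAYERS[pkg.name]) {
      console.error(
        `${pkg.name} (L${LAYERS[pkg.name]}) -> ${dep} (L${LAYERS[dep]}): not a downward edge`,
      );
      violations += 1;
    }
  }
}
process.exit(violations > 0 ? 1 : 0);
```

pnpm still does the day-to-day enforcement; a script like this just turns the layer map itself into a reviewed artifact, and it also flags same-layer edges, which the graph as described never needs.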
The discipline runs both ways, though. Early in the build, a single ambiguous instruction in an agent prompt cascaded into nearly a hundred terminal sessions running the same module's test suite in parallel — the orchestrator obeying my broken rules at full speed. The agents were not wrong; my rules were. Broken rules execute at scale, and the failure mode is loud.
That enforceability is the point. The agent does not need discipline of its own. The graph carries it — and where the graph cannot reach, every prompt becomes the gate.
The numbers behind the claim
- 36 packages: 35 `@flare-engine/*` scoped + `create-flare-app` CLI scaffold
- About 3 weeks elapsed: ~9 days of architecture audits + agent-skill authoring, then 12 calendar days of build (2026-04-17 first commit → 2026-04-28 phase-11 close)
- 289 commits in the build window (verified via `git log --since=2026-04-17 --until=2026-04-29`)
- ~1,936 passing tests: monorepo total, all green as of 2026-05-04
- 8 specialized agents: architect, engine-builder, engine-planner, engine-researcher, game-builder, game-planner, game-researcher, verifier
- ~17,500 source LOC across 36 packages: median ~380, smallest ~60 (the `leaderboards`/`achievements`/`analytics` adapters, thin wrappers over vendor APIs), largest ~1,200 (`ui` and `math`). About one test per ten lines of code.
The test count is a number, not a guarantee. The regime is mostly unit tests plus integration tests on a single Android device (Samsung Galaxy A54 5G, release build). iOS coverage is pending. The count keeps climbing; the regime it covers is narrower than that count implies.
Limits
This is where the honest accounting lives.
n=1, single operator. There is no A/B comparison, no team study. I cannot tell you whether this approach would scale to a three-person team or fail in a distributed context. METR's 2025 RCT is the most rigorous published counterweight: experienced OSS maintainers measured roughly 19% slower with AI tools on their own complex repos, despite feeling about 20% faster. Stack Overflow's 2025 survey corroborates the friction: 84% of developers now use AI tools, but 66% cite "AI almost right but not quite" as their top frustration. The only durable evidence I can offer is the shipped artifact: 36 packages, ~1,936 tests, a defined layer graph. Read the commit count and the test count — don't take the framing on faith.
The engine is not OSS yet. October 2026 is the RC target. Until then, you cannot fork it, inspect the full source, or file a PR. The architecture diagram and this post are the public record.
No shipped game yet. Pan Tvardowski is the proof game. "The engine ships games" is aspirational until Pan Tvardowski reaches store pages.
289 commits in 12 build-days = 24/day. The pace is real, but it was produced under high-intensity sprint conditions inside the build window; it is not what normal maintenance looks like.
AI agent infrastructure is commoditizing. Phaser 4 (released April 2026) ships an in-tree skills/ folder for AI coding agents and explicitly markets that frontier LLMs know its API. Custom agent infrastructure was a meaningful differentiator in early 2026; by the time Flare OSS ships, "I built custom agent infra" will be table stakes for any serious framework. The moat is the engine design, not the tooling.
Why an engine, not just a game
Which raises a fair question — if the infrastructure is commoditizing, why bother building this engine at all? The honest answer is personal.
I taught myself JavaScript about seven years ago. From there: React, TypeScript, React Native, and eventually native iOS and Android. Across those same years I put roughly 4,000 hours into Satisfactory. The gamer half and the engineer half of my head eventually asked the same question — what if I shipped the games I kept daydreaming about, on the stack I already love?
React Native plus games. The engine is the bridge between the two.
Pan Tvardowski, the showcase game I keep referring back to, is my reimagining of a vertical shoot-'em-up dressed in Polish folklore. After that, I want to ship a top-down adventure game shaped like A Link to the Past with a roguelike twist — time-gated zones, looping runs, build-up that lets you close a zone in a single loop, and two or three viable paths into every new area instead of one canonical route. (These are still assumptions, not committed design.) After that, a Slay the Spire-style deck-builder.
That backlog is the reason an engine — not a one-shot game — is worth the investment. Three games on three different mechanic axes (twin-stick action, exploration roguelike, deterministic deck-builder) share the same primitives: ECS, scenes, input, audio, IAP, save state, localization. I love React Native, I respect what C++ can do underneath it, and I want to ship games on phones people actually carry around. An engine is the only thing that lets all three plans keep moving in parallel.
Close
AI-directed development is a senior engineering discipline, not a shortcut. The shortcut story is the one AI vendors want to tell. The real story is that you cannot outsource architectural judgment to a language model; you can only hire it to execute the judgment you already have.
If you want to pressure-test this on your own stack this week: pick one package in your monorepo, write down what it is allowed to import (by layer name, not by gut feeling), and codify the rule in package.json so pnpm's workspace resolver bounces anything you did not declare. Then see whether your next convenience import fails to resolve. That single constraint surfaces more hidden coupling than any linter rule.
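A minimal sketch of the starting point, with hypothetical @acme/* package names standing in for your own scope:

```json
// packages/reports/package.json (hypothetical; your package, your allowed imports)
{
  "name": "@acme/reports",
  "dependencies": {
    "@acme/format": "workspace:*",
    "@acme/api-client": "workspace:*"
  }
}
```

With pnpm's default isolated node_modules, every other workspace package is simply not resolvable from here until you declare it, so the declaration doubles as the review artifact.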
Two threads from here. Next week's post takes on the React Native 2D landscape — why Phaser and PixiJS do not port, and what the gap looks like at the API level. Later in this series, I will publish the actual agent skill files and phase plan that ran the 12-day build, for any senior engineer who wants to copy the workflow.
Subscribe to the RSS at /blog/rss.xml if you want it in your reader rather than your inbox.