
Zero-alloc game loops in TypeScript - how I hit 60 fps on a mid-range Android

Plain {x,y} vectors, one generic Pool, atlas batching - the zero-allocation TypeScript discipline that holds 60 fps for a 2D game across three Android devices.

16.67 ms sounds generous until a garbage collector decides to think.

On each of three Android devices (a year-old mid-tier, an older flagship, and a current flagship), running release builds on the Hermes runtime, a 2D TypeScript game loop has roughly that long to advance every entity, run every system, and flush every draw command to Skia before the next display refresh. The zero-allocation TypeScript discipline keeps that budget intact across a 36-package engine and a shipped showcase game. Five rules carry the load: plain {x, y} objects instead of Vec2 class instances, a mandatory out parameter on every hot-path operation, one generic Pool<T> reused wherever transient objects appear, module-level scratch state for random-number generation, and an atlas render pipeline that groups draw commands by texture so the GPU sees one draw call per batch instead of one per sprite.

Measured outcome (three Android devices, release build, Hermes, TestingBot Maestro sweep 2026-05-26): worst cell across five gameplay scenarios landed at 9.52 ms p95 frame-time (Pixel 10, particle-storm) - 7.15 ms of headroom against the 16.67 ms / 60 fps budget. The four real-gameplay flows stayed between 0.93 ms and 4.18 ms. The frame budget is decided at module load - not at the closure-allocating callsite an autocomplete suggested.

The frame budget on a mid-range Android

16.67 ms is the 60-fps wall. Hermes ships the Hades concurrent garbage collector (GC) by default - Hades keeps pause times short, roughly 70%+ shorter than the older JavaScriptCore (JSC) engine, but "short" is not zero, and the pauses scale with allocation rate. A single allocation inside a for loop over 300 particles is not one allocation: it is 300 allocations per frame, which is 18,000 allocations per second at 60 fps. Each one is a fresh object the GC has to trace and collect.

The game loop itself - onFixedUpdate / onUpdate / onLateUpdate / onPreRender / onPostRender signals, a fixed 16.67 ms step accumulator, a maxDt cap to prevent the spiral of death - is defined in packages/core/src/game-loop.ts and packages/core/src/clock.ts. That structure is not the subject here. What the loop runs inside every signal is the subject.
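
For orientation only, a fixed-step accumulator with a maxDt cap can be sketched as below. This is a minimal illustration under assumptions, not the code in packages/core/src/game-loop.ts: the names FIXED_STEP_MS, MAX_DT_MS, LoopState and advance are hypothetical, and the 100 ms cap is an assumed value.

```typescript
// Hypothetical sketch of a fixed-step accumulator; not the engine's game-loop.ts.
const FIXED_STEP_MS = 1000 / 60; // 16.67 ms per simulation step
const MAX_DT_MS = 100;           // assumed cap; the engine's real maxDt is not stated

interface LoopState {
  accumulator: number; // unsimulated time carried between frames, in ms
  steps: number;       // total fixed steps run so far
}

/** Advances the simulation by `dtMs` of wall-clock time; returns steps run this frame. */
function advance(
  state: LoopState,
  dtMs: number,
  fixedUpdate: (stepMs: number) => void,
): number {
  // Clamp long frames so one stall cannot queue an unbounded number of steps
  // (the "spiral of death").
  const dt = dtMs > MAX_DT_MS ? MAX_DT_MS : dtMs;
  state.accumulator += dt;

  let ran = 0;
  while (state.accumulator >= FIXED_STEP_MS) {
    fixedUpdate(FIXED_STEP_MS);
    state.accumulator -= FIXED_STEP_MS;
    ran++;
  }
  state.steps += ran;
  return ran;
}
```

The state object is created once and mutated in place, so the accumulator itself adds nothing to the per-frame allocation count.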

Five gameplay scenarios were captured on 2026-05-26 via TestingBot Maestro on three real Android devices (release build, Hermes runtime). Each cell is the p95 frame-time across a 30-second steady-state window, measured by a recorder that brackets the live game loop (postfx + ECS + Skia render), not a synthetic micro-bench. The 16.67 ms / 60 fps budget is the reference:

| Scenario | Galaxy A55 | Pixel 6 | Pixel 10 |
| --- | --- | --- | --- |
| boot (ambient Zone 1) | 0.93 ms | 1.51 ms | 1.76 ms |
| boss-smok (mid-FSM + bullets) | 1.00 ms | 1.58 ms | 3.40 ms |
| boss-lucyper (3-phase postfx) | 1.11 ms | 1.61 ms | 2.03 ms |
| zone-transitions (despawn + respawn every 5s) | 2.25 ms | 4.18 ms | 3.73 ms |
| particle-storm (synthetic worst-case) | 3.67 ms | 6.69 ms | 9.52 ms |

Worst observed cell across the sweep: 9.52 ms (Pixel 10, particle-storm). That leaves 7.15 ms of headroom against the 16.67 ms 60-fps budget - roughly 43% of the frame unused at p95 even under the synthetic worst-case load. The four real-gameplay flows (boot, boss fights, zone transitions) sat between 0.93 ms and 4.18 ms - 75-95% of the budget unspent.

Postfx-stage p95 stayed under 0.030 ms in every cell across the sweep. The screen-effect pipeline is not the bottleneck.

Methodology and caveats. Each cell is a single TestingBot capture (n=1) from a cloud device farm, 2026-05-26. The 30-second window starts after stable steady state. "p95 frame-time" is the 95th-percentile of per-frame durations - one frame in twenty exceeded the listed value during the window. With n=1 per cell, this is a preliminary baseline, not a device-vs-device ranking; the noise floor between cloud-farm sessions is wider than several of the inter-device gaps in the table. A nightly cron is now accumulating samples; a future post will share rolling medians once the dataset is meaningful.
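
One way to compute such a p95 without allocating during capture is to write durations into a preallocated typed array and sort once after the window closes. The sketch below is an assumption about how a recorder could work, not the recorder used for the table: FrameRecorder, record and percentile are hypothetical names, and it uses the nearest-rank percentile definition.

```typescript
// Hypothetical sketch; the post's actual recorder is not shown.
class FrameRecorder {
  private readonly samples: Float64Array;
  private count = 0;

  /** `capacity` is the maximum number of frames kept (e.g. 30 s * 60 fps = 1800). */
  constructor(capacity: number) {
    this.samples = new Float64Array(capacity); // allocated once, up front
  }

  /** Stores one frame duration in ms; silently drops samples once the buffer is full. */
  record(durationMs: number): void {
    if (this.count < this.samples.length) {
      this.samples[this.count++] = durationMs;
    }
  }

  /** Nearest-rank percentile for p in (0, 100]. Copies the data, so call it after capture. */
  percentile(p: number): number {
    if (this.count === 0) return NaN;
    // Typed arrays sort numerically by default, unlike plain arrays.
    const sorted = this.samples.slice(0, this.count).sort();
    const rank = Math.max(1, Math.ceil((p * this.count) / 100)); // 1-based rank
    return sorted[Math.min(rank, this.count) - 1]!;
  }
}
```

With 20 recorded frames, percentile(95) returns the 19th-smallest duration, which matches the "one frame in twenty exceeded the listed value" reading.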

This sweep also replaces a different measurement methodology from April 2026. The earlier numbers (Samsung Galaxy A54 5G, entity-count stress with 10/30/50 enemies and 50/150/300 bullets, average fps) covered a narrower scenario set on one device. The current sweep traded that for p95 frame-time across in-game gameplay flows on three devices, which is closer to how the shipped game actually runs.

Plain objects beat classes - zero-allocation TypeScript by construction

The cheapest object is the one that already exists.

A Vec2 class allocates on every construction. A plain { x, y } object allocated once and mutated in place does not. The engine uses a plain-object type plus a static namespace for all math operations - no prototype chain, no GC pressure from construction:

// packages/math/src/vec2.ts
/** Vec2 plain object type - no class, no GC pressure. */
export type Vec2 = { x: number; y: number };
 
/** Creates a Vec2 plain object. */
export function vec2(x = 0, y = 0): Vec2 {
  return { x, y };
}

The static namespace includes an optional out parameter on every operation. Optional means callers who do not care about allocations can ignore it. The mandatory-out variant removes the option entirely:

// packages/math/src/vec2.ts
  /** Component-wise addition. Writes into out if provided. */
  add(a: Vec2, b: Vec2, out?: Vec2): Vec2 {
    if (out) {
      out.x = a.x + b.x;
      out.y = a.y + b.y;
      return out;
    }
    return { x: a.x + b.x, y: a.y + b.y };
  },

Vec2.add(a, b) returns a fresh object. That is the natural autocomplete completion - it is what a senior dev would reach for on first pass, and it is wrong in any loop over hundreds of entities. The discipline is to supply the out buffer and never let that form ship in a hot path. The lint catches some obvious cases; code review catches the rest.
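
The difference between the two call forms can be sketched as follows. The add function is a local stand-in that mirrors the excerpt above so the block runs on its own; the Mover shape and the two integrate functions are hypothetical and exist only for illustration.

```typescript
type Vec2 = { x: number; y: number };

// Local stand-in mirroring the `add` excerpt; the real namespace lives in packages/math.
const Vec2 = {
  add(a: Vec2, b: Vec2, out?: Vec2): Vec2 {
    if (out) {
      out.x = a.x + b.x;
      out.y = a.y + b.y;
      return out;
    }
    return { x: a.x + b.x, y: a.y + b.y };
  },
};

// Hypothetical entity shape, for illustration only.
interface Mover {
  pos: Vec2;
  vel: Vec2;
}

/** Allocating form: one fresh object per entity per frame. */
function integrateAllocating(movers: Mover[]): void {
  for (let i = 0; i < movers.length; i++) {
    const m = movers[i]!;
    m.pos = Vec2.add(m.pos, m.vel); // new { x, y } on every iteration
  }
}

/** Zero-alloc form: the entity's own position object doubles as the out buffer. */
function integrateInPlace(movers: Mover[]): void {
  for (let i = 0; i < movers.length; i++) {
    const m = movers[i]!;
    Vec2.add(m.pos, m.vel, m.pos); // writes into the existing object
  }
}
```

Passing m.pos as both input and output is safe for add because each component is read before it is written; that would need checking again for any operation that mixes components.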

For the genuinely hot path - computing a normalised velocity vector inside an enemy update loop - the engine drops the optional form entirely:

// packages/math/src/vec2.ts
  /** Zero-alloc; writes into `out`. Mandatory `out` so callers avoid temporaries in hot paths. */
  normalizeOut(dx: number, dy: number, speed: number, out: Vec2): Vec2 {
    const len = Math.sqrt(dx * dx + dy * dy);
    if (len <= 1e-6) {
      out.x = 0;
      out.y = 0;
      return out;
    }
    const inv = speed / len;
    out.x = dx * inv;
    out.y = dy * inv;
    return out;
  },

normalizeOut has no return-a-fresh-object path. The caller owns the buffer. In a 300-bullet update loop, that is 300 existing Vec2 objects written into - not 300 new ones allocated. Autocomplete will not write this form for you; the signature forces the discipline on the caller.
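
A caller-side sketch of that loop, under assumptions: normalizeOut is copied from the excerpt above as a free function so the block is self-contained, while the Bullet shape and steerBullets are hypothetical names that do not come from the engine.

```typescript
type Vec2 = { x: number; y: number };

// Local copy of the normalizeOut excerpt so the sketch runs on its own.
function normalizeOut(dx: number, dy: number, speed: number, out: Vec2): Vec2 {
  const len = Math.sqrt(dx * dx + dy * dy);
  if (len <= 1e-6) {
    out.x = 0;
    out.y = 0;
    return out;
  }
  const inv = speed / len;
  out.x = dx * inv;
  out.y = dy * inv;
  return out;
}

// Hypothetical bullet shape: `vel` is allocated once, when the bullet is created.
interface Bullet {
  x: number;
  y: number;
  vel: Vec2;
}

/** Steers every bullet toward the target at `speed` units per second. */
function steerBullets(
  bullets: Bullet[],
  targetX: number,
  targetY: number,
  speed: number,
  dtSeconds: number,
): void {
  for (let i = 0; i < bullets.length; i++) {
    const b = bullets[i]!;
    // The bullet's own velocity object is the out buffer: no temporaries.
    normalizeOut(targetX - b.x, targetY - b.y, speed, b.vel);
    b.x += b.vel.x * dtSeconds;
    b.y += b.vel.y * dtSeconds;
  }
}
```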

One Pool to rule them - twelve lines, three patterns

The pool pattern is not new. Robert Nystrom's Game Programming Patterns documented it in 2014. What is new here is that one twelve-line generic primitive runs the entire 36-package engine plus every transient in Pan Tvardowski - particles, sprite-batch commands, audio sources, explosion FX slots. One shape, applied uniformly:

// packages/core/src/pool.ts
/**
 * Generic object pool for zero-alloc gameplay.
 *
 * - acquire() returns a recycled or newly created object
 * - release(obj) returns it to the pool
 * - prewarm(count) pre-allocates objects
 */
export class Pool<T> {
  private items: T[] = [];
  private factory: () => T;
  private resetFn: ((item: T) => void) | undefined;
 
  /**
   * @param factory Function to create a new instance.
   * @param reset Optional function to reset an object before reuse.
   */
  constructor(factory: () => T, reset?: (item: T) => void) {
    this.factory = factory;
    this.resetFn = reset;
  }
 
  /** Get an object from the pool (or create a new one). */
  acquire(): T {
    if (this.items.length > 0) {
      return this.items.pop()!;
    }
    return this.factory();
  }
 
  /** Return an object to the pool. Calls reset if configured. */
  release(item: T): void {
    if (this.resetFn) this.resetFn(item);
    this.items.push(item);
  }
 
  /** Pre-allocate objects into the pool. */
  prewarm(count: number): void {
    for (let i = 0; i < count; i++) {
      this.items.push(this.factory());
    }
  }
 
  /** Number of available objects in the pool. */
  get available(): number {
    return this.items.length;
  }
 
  /** Clear all pooled objects. */
  clear(): void {
    this.items.length = 0;
  }
}

Three real consumers, three patterns, one primitive:

ParticlePool (packages/particles/src/particle-pool.ts) wraps Pool<Particle> and calls prewarm(256) at construction - 256 particle slots ready before the first frame. The update() method uses a swap-and-pop release so the active list never shifts: when a particle's lifetime hits zero, it swaps with the last element and pops, returning the dead particle to the pool in O(1).

SpriteBatch (packages/render/src/sprite-batch.ts) keeps an internal _pool: SpriteDrawCommand[] and an _acquire() method that pops from the pool or calls the factory. Every draw() call writes into a recycled command object. At flush(), every command returns to the pool and _commands.length = 0 resets the list in place - no splice, no filter, no GC.

Pan Tvardowski explosion FX slots - the third pattern is module-level rather than class-level: a fixed-size array prewarmed at module load, not inside a constructor.

[Diagram: lifecycle of one slot through that pool.]

The slots themselves, prewarmed at module load in Pan Tvardowski:

// apps/pan-tvardowski/src/fx/explosions.ts
const SPIRAL_POOL_SIZE = 8;
 
interface SpiralSlot {
  entityId: number;
  active: boolean;
}
 
const _spiralPool: SpiralSlot[] = [];
for (let _i = 0; _i < SPIRAL_POOL_SIZE; _i++) {
  _spiralPool.push({ entityId: -1, active: false });
}
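
How a caller might claim and free one of those slots can be sketched like this. The prewarm loop repeats the excerpt above; acquireSpiralSlot and releaseSpiralSlot are hypothetical helpers, and returning null on exhaustion is an assumed policy rather than the game's documented behaviour.

```typescript
const SPIRAL_POOL_SIZE = 8;

interface SpiralSlot {
  entityId: number;
  active: boolean;
}

// Same module-level prewarm as the excerpt above.
const _spiralPool: SpiralSlot[] = [];
for (let i = 0; i < SPIRAL_POOL_SIZE; i++) {
  _spiralPool.push({ entityId: -1, active: false });
}

/** Hypothetical helper: claims the first free slot, or returns null when all are busy. */
function acquireSpiralSlot(entityId: number): SpiralSlot | null {
  for (let i = 0; i < _spiralPool.length; i++) {
    const slot = _spiralPool[i]!;
    if (!slot.active) {
      slot.active = true;
      slot.entityId = entityId;
      return slot;
    }
  }
  return null; // pool exhausted: skip the effect rather than allocate mid-frame
}

/** Hypothetical helper: marks a slot free again without removing it from the array. */
function releaseSpiralSlot(slot: SpiralSlot): void {
  slot.active = false;
  slot.entityId = -1;
}
```

A linear scan over eight slots is cheap enough that a free list is not needed at this size.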

The key point: prewarm sizing is a scene-load decision, not a per-frame one. The worst time to grow a pool is during gameplay. All three patterns - class-level with constructor prewarm, class-level with internal pop/factory, and module-level with literal array init - move that cost to module or scene load. Once the game loop is running, a correctly sized pool never grows; the factory fallback in acquire() is a safety net, not the plan.
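
The swap-and-pop release described for ParticlePool can be sketched as below. This is not the code in packages/particles/src/particle-pool.ts: the Particle shape, the two module-level arrays and the function names are assumptions made for the illustration.

```typescript
// Hypothetical particle shape; the real Particle type is not shown in this post.
interface Particle {
  x: number;
  y: number;
  life: number; // seconds remaining
}

const free: Particle[] = [];   // recycled particles waiting for reuse
const active: Particle[] = []; // particles currently alive

/** Prewarm at load time so spawn() does not allocate during gameplay. */
function prewarm(count: number): void {
  for (let i = 0; i < count; i++) free.push({ x: 0, y: 0, life: 0 });
}

function spawn(x: number, y: number, life: number): Particle {
  const p = free.pop() ?? { x: 0, y: 0, life: 0 }; // fallback only if under-sized
  p.x = x;
  p.y = y;
  p.life = life;
  active.push(p);
  return p;
}

/** Ages every particle; dead ones are removed in O(1) via swap-and-pop. */
function update(dtSeconds: number): void {
  let i = 0;
  while (i < active.length) {
    const p = active[i]!;
    p.life -= dtSeconds;
    if (p.life <= 0) {
      const last = active.pop()!;       // remove the tail element
      if (last !== p) active[i] = last; // move it into the dead slot
      free.push(p);
      // i is not advanced: the swapped-in particle still needs its update.
    } else {
      i++;
    }
  }
}
```

The trade-off is that the active list loses its ordering, which is usually acceptable for particles but would matter for anything drawn in insertion order.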

Scratch state and the module-level linear congruential generator

Random numbers should not allocate either.

Math.random() cannot be seeded, so it cannot replay a sequence. A per-call array of candidate values is worse, because it allocates. The engine uses a module-level scalar _lcgS with a linear congruential generator (LCG) function _lcgNext() - re-seeded per call from the spawn position so identical spawns produce identical particle sequences:

// apps/pan-tvardowski/src/fx/explosions.ts
// Scratch LCG state - zero-alloc; re-seeded per call.
let _lcgS = 1;
 
function _lcgNext(): number {
  _lcgS = (_lcgS * 16807) % 2147483647;
  return (_lcgS - 1) / 2147483646;
}
 
// [inside _spawnParticles:]
// Seed the LCG reproducibly from position so identical spawns are consistent.
_lcgS = (x * 31 + y * 17) | 0 || 1;

No new, no array, no closure capture per call. One number on the module scope, rewritten in place. The deterministic seeding from spawn position is load-bearing for replay testing - a future post will cover deterministic game testing in depth.
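
One caveat worth sketching: with the one-line seed shown above, a negative coordinate sum produces a negative state, and _lcgNext() then returns negative values. If spawn positions can be negative, the seed can be folded into the valid range first. The helper below is a hypothetical variant, not the game's code; seedFromPosition and the two constants are names introduced here, and it produces different sequences from the original seed expression.

```typescript
const LCG_MODULUS = 2147483647; // 2^31 - 1
const LCG_MULTIPLIER = 16807;   // Park-Miller "minimal standard" multiplier

// Module-level scratch state, as in the excerpt above.
let _lcgS = 1;

function _lcgNext(): number {
  // The product stays below 2^53, so the arithmetic is exact in a double.
  _lcgS = (_lcgS * LCG_MULTIPLIER) % LCG_MODULUS;
  return (_lcgS - 1) / (LCG_MODULUS - 1); // in [0, 1)
}

/** Hypothetical helper: folds a position into a valid state in [1, LCG_MODULUS - 1]. */
function seedFromPosition(x: number, y: number): void {
  const h = (x * 31 + y * 17) | 0;               // 32-bit signed integer hash
  _lcgS = (Math.abs(h) % (LCG_MODULUS - 1)) + 1; // never 0, never negative
}
```

Reseeding from the same position replays the same sequence, which is the property the replay tests depend on.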

The render side - atlas batching is part of the allocation story

A draw call is the most expensive unit of work in the system. Issuing fewer of them is as important as pooling particles.

SpriteBatch.flush() (packages/render/src/sprite-batch.ts) groups all queued draw commands by the triple (image, effect, alpha) - the atlas key. Every command that shares the same key goes into one batch. The render call fires once per batch - one draw call per atlas texture batch (per-texture, not per-sprite). A change in effect or alpha breaks a batch by design; that is a structural property, not a bug.

At the end of flush(), every SpriteDrawCommand returns to the internal pool, _commands.length = 0 resets the list, and the batch records are cleared in place. The command objects survive across frames - the frame boundary is a reset, not a reconstruction.
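
The shape of that flush can be sketched as follows, with loud caveats. The DrawCommand fields, the submit callback and every function name here are assumptions, not the engine's SpriteDrawCommand or SpriteBatch API. The sketch also simplifies the grouping: it merges only consecutive commands that share a key, which preserves draw order; whether the engine's flush() reorders commands is not stated in this post.

```typescript
// Hypothetical command shape; field names are assumptions.
interface DrawCommand {
  image: number;  // texture / atlas id
  effect: number; // effect id, 0 = none
  alpha: number;
  x: number;
  y: number;
}

const _commands: DrawCommand[] = []; // queued this frame
const _pool: DrawCommand[] = [];     // recycled command objects

function acquire(): DrawCommand {
  return _pool.pop() ?? { image: 0, effect: 0, alpha: 1, x: 0, y: 0 };
}

function draw(image: number, x: number, y: number, effect = 0, alpha = 1): void {
  const c = acquire(); // recycled object, overwritten field by field
  c.image = image;
  c.effect = effect;
  c.alpha = alpha;
  c.x = x;
  c.y = y;
  _commands.push(c);
}

function sameKey(a: DrawCommand, b: DrawCommand): boolean {
  return a.image === b.image && a.effect === b.effect && a.alpha === b.alpha;
}

/** Calls `submit` once per run of same-key commands; returns the number of batches. */
function flush(
  submit: (cmds: DrawCommand[], start: number, end: number) => void,
): number {
  let batches = 0;
  let start = 0;
  for (let i = 1; i <= _commands.length; i++) {
    // Close the current run at the end of the list or when the key changes.
    if (i === _commands.length || !sameKey(_commands[start]!, _commands[i]!)) {
      submit(_commands, start, i); // half-open range [start, i)
      batches++;
      start = i;
    }
  }
  for (let i = 0; i < _commands.length; i++) _pool.push(_commands[i]!); // recycle
  _commands.length = 0; // reset in place, no new array
  return batches;
}
```

Passing index ranges to submit, rather than slicing a sub-array per batch, keeps the flush itself free of allocations.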

Microbench figures from the Bun harness (harness, not device - see Limits):

| Benchmark | Config | Result | Throughput |
| --- | --- | --- | --- |
| Entity-Component-System (ECS) step | 1,000 entities (Pos+Vel) | 23.7 µs/frame | 42.2K ops/sec |
| Render flush | 1,000 sprites in 1 atlas | 26.6 µs/frame | 37.6K ops/sec |
| Integration frame | 1,000 entities + 300 particles | 9.01 ms/frame | 111 ops/sec |

The integration frame - Entity-Component-System step plus render flush plus particle update, 1,000 entities and 300 particles - completes in 9.01 ms on the desktop harness. That leaves 7.66 ms, roughly 46% of the 16.67 ms / 60-fps budget, unused. The device p95 numbers (the scenario table earlier in this post) are what matter for shipped behaviour; the harness figures show the math fits on a desktop CPU.

What the measurements say - and do not

The three devices in this sweep span roughly four years of Android hardware: a Galaxy A55 (Exynos 1480, Android 14) as a mid-tier reference, a Pixel 6 (Tensor G1, Android 13) as an older flagship, and a Pixel 10 (Tensor G5, Android 16, 120 Hz) as a current flagship. All ran release builds on Hermes. iOS measurements are pending; no JavaScriptCore comparison exists yet.

Every cell in the sweep cleared the 16.67 ms / 60 fps budget at p95. The worst cell (Pixel 10, particle-storm, 9.52 ms) is the floor of the headroom claim - everything else sat further inside the frame. With n=1 per cell, do not read inter-device differences as device rankings; read them as a baseline distribution.

The harness microbenches (23.7 µs ECS step, 26.6 µs render flush, 9.01 ms integration frame) are from bun bench on a desktop CPU, not a phone CPU. They validate that the math fits the budget in isolation. The device p95 numbers above are what happened on actual hardware. Both matter; they are not the same measurement.

As of 2026-05-20, the engine carried approximately 2,481 tests green and Pan Tvardowski carried 1,168 - the p95 numbers above came from that same engine running the shipped game's gameplay code paths, not from a synthetic stress harness.

Limits

Three Android devices, single OS family. Every measurement in this post is Android, release build, Hermes runtime. No iOS, no JavaScriptCore (JSC) comparison. The sweep names three devices spanning a year-old mid-tier to a current flagship; with n=1 per cell this is a baseline, not a fleet model.

p95 is one metric. The long tail (p99, p99.9, frame spikes) is not in this snapshot. A frame that exceeds the 16.67 ms budget but falls within the slowest 5% of the window is hidden by definition. A nightly cron is now accumulating samples so a future post can publish rolling medians plus the long tail.

The April A54 stress test is retired. That run used entity-count synthetic loads (10/30/50 enemies, 50/150/300 bullets) and reported average fps - including a heavy-mode dip to 48 fps. The new sweep traded that for p95 frame-time across in-game scenarios on three devices. The two datasets are not directly comparable; the May sweep is closer to shipped gameplay, the April sweep pushed harder on raw entity counts.

Harness microbenches are not device measurements. 23.7 µs Entity-Component-System (ECS) step and 26.6 µs render flush are from bun bench on a desktop dev machine. They show the algorithm fits the frame budget; the device p95 numbers in the scenario table are what actually happens on a phone.

Discipline, not a tool. No lint rule catches "you closed over a fresh { x, y } inside a worklet." Code review catches it; tests do not. The normalizeOut signature communicates intent but cannot enforce that every caller supplies a pre-existing buffer. The senior dev's job is to recognise the pattern and reject the natural autocomplete completion - which returns a fresh object - in any hot path.

Hermes regime, 2026. Hades concurrent GC keeps pauses short but not zero. A closure-allocating callsite inside a 300-iteration update loop still moves the needle. The claim is that the discipline is robust under Hermes' GC model - not that Hermes removes the problem.

The engine is not open-source software (OSS) yet. The October 2026 Release Candidate (RC) target is when the engine publishes to npm. The code blocks and provenance paths in this post are the current public record; readers cannot inspect the repository today.

Not the orchestration sequel. The patterns in this post predate any AI-directed session - they are discipline, not orchestration.

Close

The rule a reader can apply this week: in any hot loop, name the buffer you write into. If the call signature does not accept an out, fix the signature first.

The same discipline applies outside games. A React Native app running a Reanimated worklet, a Skia canvas, or a gesture handler doing per-frame coordinate math wants the same shape: pre-allocate the state object, mutate it in place on update, never close over a fresh { x, y } inside the worklet. An animated dashboard rendering 200 cards at 60 fps has the same problem as a game rendering 200 bullets - the frame budget is the frame budget.

If you want the context this post grew from: the 3-week build story for the origin, the engine-gap post for the why, and the architecture deep-dive for the structural layer the discipline runs inside. The next deep-dive covers the Skia Atlas pipeline in detail - how useRSXformBuffer drives the per-texture batching at higher sprite counts.
