
Skia Atlas in React Native: batching 2,000 sprites into one draw call per texture

Skia Atlas React Native: how drawAtlas batches sprites into one draw call per texture, why the declarative path stalls on budget Android, and the imperative PictureRecorder path that doesn't - with benchmarks.

The default way to render many moving sprites in React Native Skia is to call drawImage once per sprite per frame. At a hundred sprites that is fine on most hardware. As the count climbs into the hundreds and past a thousand, you are issuing that many separate GPU commands every frame - and on budget Android that is where it bites. That is the problem drawAtlas was built to solve.

Atlas batching in React Native Skia collapses N sprites that share a texture atlas into one draw call: one GPU command, one state switch, one call to canvas.drawAtlas(image, sourceRects, xforms, paint). The GPU processes the whole list in one pass. This is not a trick specific to game engines; the same primitive can flatten an animated data-viz card, a map-marker cluster, or a confetti burst just as effectively.

This post covers the drawAtlas primitive itself, both ways @shopify/react-native-skia exposes it (declarative and imperative), the trade-offs between them, and the real numbers behind the engine's choice.

The cost model: why one draw call per texture matters

Every draw call carries overhead beyond the pixels it paints: binding a texture, uploading a paint state, issuing the command, and synchronizing with the GPU command queue. On desktop WebGL that overhead is cheap. On Android's Skia path inside a React Native app it is not.

With a per-sprite approach, N sprites = N draw calls. The GPU is fast at rendering pixels but is not designed for issuing and switching state thousands of times per frame. You hit the driver's state-change ceiling before you hit the fill-rate ceiling.

An atlas packs many sprite frames into one texture. drawAtlas tells the GPU: here is the texture, here is an array of source rectangles (which frame to sample), here is an array of transforms (where and how big to draw each one), draw them all. That is one draw call regardless of N.

The drawAtlas primitive

@shopify/react-native-skia exposes the Skia C++ drawAtlas call directly on the canvas. Its signature:

// pseudocode - from the rn-skia docs, not engine code
canvas.drawAtlas(
  image,        // SkImage - the packed sprite sheet
  rects,        // SkHostRect[] - source frame in the texture per sprite
  xforms,       // SkRSXform[] - scale+rotation+translation per sprite
  paint,        // SkPaint - tint, alpha, blend mode (shared for the batch)
);

The two arrays are parallel: element i in rects pairs with element i in xforms. The image and the paint are shared by the whole batch, which is why the engine groups sprites by (texture, effect, alpha) before issuing the call - sprites with different tints or alpha values go into separate batches, each with its own paint.

The [scos, ssin, tx, ty] RSXform transform is worth understanding. scos is scale * cos(angle), ssin is scale * sin(angle), tx and ty are the translation. For axis-aligned scaling (no rotation) ssin = 0 and scos = dstWidth / srcWidth. The engine stores these per-sprite and mutates them in place each frame - zero allocation in steady state.
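
The arithmetic can be checked in isolation. The sketch below is plain TypeScript and does not touch the real SkRSXform object: the RSXformData shape and the helpers makeRSXform and applyRSXform are hypothetical names introduced for illustration. It assumes the frame's top-left corner is the local origin and applies the matrix [scos, -ssin, tx; ssin, scos, ty] that the four values encode.

```typescript
// Plain-data stand-in for an RSXform (hypothetical; the real SkRSXform is a native object).
interface RSXformData {
  scos: number; // scale * cos(angle)
  ssin: number; // scale * sin(angle)
  tx: number;   // translation x
  ty: number;   // translation y
}

// Builds the four values from a uniform scale, a rotation in radians, and a translation.
function makeRSXform(scale: number, angleRad: number, tx: number, ty: number): RSXformData {
  return { scos: scale * Math.cos(angleRad), ssin: scale * Math.sin(angleRad), tx, ty };
}

// Maps a point in the sprite's local space to canvas space.
function applyRSXform(m: RSXformData, x: number, y: number): { x: number; y: number } {
  return {
    x: m.scos * x - m.ssin * y + m.tx,
    y: m.ssin * x + m.scos * y + m.ty,
  };
}

// Axis-aligned case: a 32 px wide source frame drawn 64 px wide at (10, 20).
const axisAligned = makeRSXform(64 / 32, 0, 10, 20);
const corner = applyRSXform(axisAligned, 32, 32); // bottom-right corner of the frame

// Rotated case: unit scale, quarter turn, no translation.
const quarterTurn = makeRSXform(1, Math.PI / 2, 0, 0);
const rotated = applyRSXform(quarterTurn, 1, 0);

console.log(corner, rotated);
```

In the axis-aligned case ssin is exactly 0, so the transform reduces to a scale plus an offset, which is the only case the engine excerpt later in this post fills in.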

The declarative binding: <Atlas> + useRSXformBuffer

The rn-skia docs present a JSX-first API for Atlas. You allocate typed RSXform and rect buffers, fill them in a Reanimated worklet on the UI thread, then hand them to an <Atlas> component:

// pseudocode - from the rn-skia docs - the engine does NOT use this path
import { Atlas, useRSXformBuffer, useRectBuffer } from "@shopify/react-native-skia";
 
const xforms = useRSXformBuffer(spriteCount, (val, i) => {
  "worklet";
  val.set(scale, 0, positions[i].x, positions[i].y);
});
 
<Atlas image={spriteSheet} sprites={rects} transforms={xforms} />

The appeal is real: animated in a Reanimated worklet, the transforms run on the UI thread with no JS-thread involvement per frame. For hundreds of sprites with smooth, hardware-accelerated motion this is the happy path.

The community has found a wall, however. Skia GitHub issue #2521 documents that on cheap Android hardware the declarative path stalls, because the JavaScript-to-native (JSI) call JsiRSXform.set fires once per sprite per frame. On an OPPO A16, frame drops appear at around 300 sprites. That is a crowdsourced community report, not a controlled benchmark, so device and scene specifics shift the number. GitHub issue #2688 found that on a Samsung Galaxy A34 the Picture API outperformed the Atlas API even at 1,000 animated circles. The bottleneck is JSI object churn per sprite, not GPU fill rate.

The imperative binding: how SpriteBatch works

The engine takes the other route: imperative canvas.drawAtlas inside a PictureRecorder, with per-sprite transform math done in plain TypeScript on preallocated pools.

The grouping layer is SpriteBatch in @flare-engine/render. Each frame, game systems call batch.draw(...) for every visible sprite. When the frame owner flushes, SpriteBatch groups commands by (image, effect, alpha) identity and calls a drawAtlas callback once per group:

// packages/render/src/sprite-batch.ts
flush(
  drawAtlas: (image: SkImageLike, commands: SpriteDrawCommand[]) => void,
  renderer?: Renderer,
): void {
  if (this._commands.length === 0) return;
 
  const batches = this._batches;
  for (const cmd of this._commands) {
    let batch: Batch | undefined;
    for (let i = 0; i < batches.length; i++) {
      const b = batches[i]!;
      if (b.image === cmd.image && b.effect === cmd.effect && b.alpha === cmd.alpha) {
        batch = b;
        break;
      }
    }
    if (!batch) {
      batch = { image: cmd.image, effect: cmd.effect, alpha: cmd.alpha, commands: [] };
      batches.push(batch);
    }
    batch.commands.push(cmd);
  }
 
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i]!;
    drawAtlas(batch.image, batch.commands);
    if (renderer) { renderer.recordDrawCall(); renderer.recordBatch(); }
  }
  // ... pool cleanup
}

The drawAtlas callback is implemented in SkiaFrameRenderer (in @flare-engine/react). It fills preallocated SkHostRect[] and SkRSXform[] pools and calls the real Skia drawAtlas on the recording canvas:

// packages/react/src/skia-bridge.ts
function fillXforms(commands: SpriteDrawCommand[], n: number, dx: number, dy: number): void {
  for (let i = 0; i < n; i++) {
    const cmd = commands[i] as SpriteDrawCommand;
    const scos = cmd.srcW === 0 ? 1 : cmd.dstW / cmd.srcW;
    xforms[i]?.set(scos, 0, cmd.dstX + dx, cmd.dstY + dy);
  }
}
 
// inside drawAtlas callback (normal path - no outline, no zone-bg):
ensurePoolCapacity(n);
fillSrcRects(commands, n);
fillXforms(commands, n, 0, 0);
const srcs = srcRects.length === n ? srcRects : srcRects.slice(0, n);
const dsts = xforms.length === n ? xforms : xforms.slice(0, n);
const paint = effect === undefined ? defaultPaint : resolveEffectPaint(effect);
paint.setAlphaf(alpha);
current.drawAtlas(skImage, srcs, dsts, paint);
paint.setAlphaf(1);

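Everything is done on preallocated pools - srcRects and xforms grow once and their elements are mutated in place every frame. The one allocation visible in this excerpt is slice(0, n), which copies the array (not the pooled objects) whenever the pool is longer than the batch. No JsiRSXform.set per sprite; xforms[i]?.set(...) mutates the existing SkRSXform object without crossing the JSI boundary per call.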
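
The grow-once, mutate-in-place pattern can be sketched without Skia. Everything here is an assumption made for illustration: PooledXform is a plain object standing in for a pooled SkRSXform, and xformPool, ensureCapacity, and fillFrame are hypothetical names, not the engine's. The point is only that after the first frame at a given sprite count, later frames create no new objects.

```typescript
// Hypothetical plain-object transform; stands in for a pooled SkRSXform.
interface PooledXform {
  scos: number;
  ssin: number;
  tx: number;
  ty: number;
}

const xformPool: PooledXform[] = [];
let poolAllocations = 0; // counts how many pool objects were ever created

// Grows the pool only when a frame needs more slots than any earlier frame did.
function ensureCapacity(n: number): void {
  while (xformPool.length < n) {
    xformPool.push({ scos: 1, ssin: 0, tx: 0, ty: 0 });
    poolAllocations++;
  }
}

// Overwrites the first n slots in place; no objects are created in the loop.
function fillFrame(n: number, frame: number): void {
  ensureCapacity(n);
  for (let i = 0; i < n; i++) {
    const slot = xformPool[i]!;
    slot.scos = 1;
    slot.ssin = 0;
    slot.tx = i * 4 + frame;
    slot.ty = frame;
  }
}

fillFrame(500, 0);
const firstSlot = xformPool[0];
const allocationsAfterFirstFrame = poolAllocations;

// Sixty more frames at the same sprite count reuse the same 500 objects.
for (let frame = 1; frame <= 60; frame++) fillFrame(500, frame);

console.log(poolAllocations === allocationsAfterFirstFrame, xformPool[0] === firstSlot);
```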

The recording canvas current is opened by a PictureRecorder at the start of each frame and finished at the end. <GameView> then hands the resulting SkPicture to usePictureAsTexture, which converts it to an SkImage that a plain <Image> component can display:

// packages/react/src/game-view.tsx
const sceneTexture = usePictureAsTexture(picture, sceneSize);
 
// inside the onPreRender frame hook:
bridge.beginFrame(w, h, 0, pd);              // opens PictureRecorder
ctx.renderer.begin(clearColor);
ctx.renderer.applyCamera(/* ... */);
if (onRenderBackground) onRenderBackground(ctx, ctx.renderer.canvas);
renderSystem();                              // queues batch.draw() calls
ctx.batch.flush(bridge.drawAtlas, ctx.renderer); // one drawAtlas per group
if (onRenderPostFx) onRenderPostFx(ctx, ctx.renderer.canvas, bridge);
ctx.renderer.end();
const pic = bridge.endFrame();              // finishRecordingAsPicture
if (pic !== null) setPicture(pic);

The key line is ctx.batch.flush(bridge.drawAtlas, ctx.renderer). On a busy gameplay screen - enemies, bullets, and a few background sprites - that collapses to one drawAtlas call per unique texture+effect+alpha combination: often two or three draw calls total for the sprite pass.
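
A runnable miniature makes the draw-call count concrete. This is a sketch under stated assumptions, not the engine's SpriteBatch: MiniCommand, miniFlush, and repeat are hypothetical, a string imageId stands in for image identity, and the grouping uses a Map keyed by a composite string instead of the linear scan in the excerpt above. It assumes imageId and effect never contain the "|" separator.

```typescript
// Hypothetical command shape; the engine's SpriteDrawCommand carries more fields.
interface MiniCommand {
  imageId: string;            // assumption: a string id stands in for image identity
  effect: string | undefined; // undefined means "no effect"
  alpha: number;
}

// Groups commands by (image, effect, alpha) and invokes the callback once per group.
// Returns the number of callback invocations, i.e. the draw-call count.
function miniFlush(
  commands: MiniCommand[],
  drawAtlas: (imageId: string, batch: MiniCommand[]) => void,
): number {
  const groups = new Map<string, MiniCommand[]>();
  for (const cmd of commands) {
    const effectKey = cmd.effect === undefined ? "none" : `fx:${cmd.effect}`;
    const key = `${cmd.imageId}|${effectKey}|${cmd.alpha}`;
    let group = groups.get(key);
    if (group === undefined) {
      group = [];
      groups.set(key, group);
    }
    group.push(cmd);
  }
  let calls = 0;
  groups.forEach((group) => {
    drawAtlas(group[0]!.imageId, group);
    calls++;
  });
  return calls;
}

// Helper: n copies of the same command.
function repeat(n: number, cmd: MiniCommand): MiniCommand[] {
  const out: MiniCommand[] = [];
  for (let i = 0; i < n; i++) out.push({ imageId: cmd.imageId, effect: cmd.effect, alpha: cmd.alpha });
  return out;
}

// A made-up scene: 40 enemies, 300 bullets, 5 half-transparent background sprites.
const scene: MiniCommand[] = repeat(40, { imageId: "enemies", effect: undefined, alpha: 1 })
  .concat(repeat(300, { imageId: "bullets", effect: undefined, alpha: 1 }))
  .concat(repeat(5, { imageId: "background", effect: undefined, alpha: 0.5 }));

const batchSizes: number[] = [];
const drawCallCount = miniFlush(scene, (_imageId, batch) => {
  batchSizes.push(batch.length);
});

console.log(drawCallCount, batchSizes);
```

The scene's sprite counts are invented for the example; what carries over is that 345 queued sprites reach the callback as three groups, one per distinct key.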

The post-processing composite (zone backgrounds, runtime shaders on the player) is a separate <Image><RuntimeShader/></Image> JSX pass that runs after the picture is finished. It does not run through the Atlas pipeline; the Atlas pass and the post-processing pass are two distinct Skia operations.

Numbers: what is measured vs what is designed

Here is the full render microbenchmark table (off-device Bun, no GPU, no React Native bridge - measures queueing and grouping cost only):

| Scenario | Cost / frame (median) | Ops / sec |
|---|---|---|
| flush 100 sprites / 1 atlas | 3.13 µs | 319.8K/s |
| flush 500 sprites / 1 atlas | 13.70 µs | 73.0K/s |
| flush 1,000 sprites / 1 atlas | 26.6 µs | 37.6K/s |
| flush 500 sprites / 4 atlas | 16.95 µs | 59.0K/s |
| flush 1,000 sprites / 8 atlas | 35.79 µs | 27.9K/s |

Source: the engine's render benchmarks, generated 2026-04-22 with Bun 1.3.11 on win32 x64. These are off-device microbenches: they exercise the TypeScript queueing and grouping hot path, not actual Skia calls and not on-device frame time. Think of them as a regression gate and an order-of-magnitude sizing tool, not a frame-time prediction.

The table shows two things clearly. First, 1,000 sprites through a single atlas cost 26.6 µs to queue and group - a tiny fraction of a 16.67 ms frame budget. Second, more atlas textures cost more: spreading 1,000 sprites across 8 atlases costs 35.79 µs, about 35% more than the 1-atlas case. Batching is per texture, so that layout produces eight draw calls instead of one. Pack your sprite sheets.
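
Both ratios follow directly from the table. The snippet below only reproduces that arithmetic; the constant and variable names are made up for the example, and the two costs are copied from the table rows for 1,000 sprites.

```typescript
// 60 fps frame budget in microseconds (about 16,667 µs).
const FRAME_BUDGET_US = 1000000 / 60;

// Median flush costs copied from the table, in microseconds.
const flushOneAtlasUs = 26.6;    // 1,000 sprites / 1 atlas
const flushEightAtlasUs = 35.79; // 1,000 sprites / 8 atlas

// Fraction of one frame spent queueing and grouping in the 1-atlas case.
const budgetShare = flushOneAtlasUs / FRAME_BUDGET_US;

// Relative cost increase when the same sprites are spread over 8 atlases.
const extraCost = flushEightAtlasUs / flushOneAtlasUs - 1;

console.log(`share of frame budget: ${(budgetShare * 100).toFixed(2)}%`);
console.log(`8 atlases vs 1 atlas: +${(extraCost * 100).toFixed(0)}%`);
```

The share comes out well under one percent of the frame, which is why the table is useful as a regression gate rather than as a frame-time prediction.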

The on-device measurement is the E8 TestingBot baseline (TestingBot Maestro, 2026-05-26, n=1 per cell). The full Pan Tvardowski render frame - entity loop, physics, sprite batching, post-processing composite - held 60 fps with headroom across three real Android phones. The table shows two of the five sweep scenarios:

| Device | boss-smok frame p95 | particle-storm frame p95 | Budget at 60 fps |
|---|---|---|---|
| Galaxy A55 | 1.00 ms | 3.67 ms | 16.67 ms |
| Pixel 6 | 1.58 ms | 6.69 ms | 16.67 ms |
| Pixel 10 | 3.40 ms | 9.52 ms | 16.67 ms |

These are full-frame numbers, not Atlas-isolated measurements: every scenario runs the batched sprite pass alongside the entity loop, physics, and the post-processing composite. None of them is a pure sprite-count stress test - they are real gameplay scenes (boss-smok drives one boss through its phase script; particle-storm saturates the particle cap). The worst cell in the matrix is the Pixel 10 particle-storm frame at 9.52 ms, which leaves 43% headroom under the 60 fps budget; particle-storm drives the engine's immediate-mode path (not Atlas), so it is the whole-engine floor, not an Atlas ceiling. The honest claim is narrow: across five real scenes on three devices, the frame that contains the Atlas pass never came close to the budget.
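
The headroom figure is one minus the ratio of frame time to budget. As a quick check, with the particle-storm p95 values copied from the table and a hypothetical headroom helper:

```typescript
const BUDGET_MS = 16.67; // 60 fps frame budget, as used in the table

// Fraction of the frame budget left unused by a frame of the given duration.
function headroom(frameMs: number, budgetMs: number = BUDGET_MS): number {
  return 1 - frameMs / budgetMs;
}

// particle-storm frame p95 per device, copied from the table (milliseconds).
const particleStormP95: Array<[string, number]> = [
  ["Galaxy A55", 3.67],
  ["Pixel 6", 6.69],
  ["Pixel 10", 9.52],
];

particleStormP95.forEach(([device, ms]) => {
  console.log(`${device}: ${(headroom(ms) * 100).toFixed(1)}% headroom`);
});

// The worst cell in the matrix.
const worstCaseHeadroom = headroom(9.52);
```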

The engine's batch capacity is designed for 500–2,000 sprites (sprite-atlas-min / sprite-atlas-max, design bound dated 2026-04-24). That 2,000 is a design ceiling, not a measurement: no isolated 2,000-sprite device test has run. The E8 baseline is the real measured anchor.

For the device spread on the declarative path, community reports (Skia GitHub issue #2521 - not controlled benchmarks) range from roughly 300 sprites on an OPPO A16 before frame drops to roughly 15,000 on an iPhone 12 mini before slight drops begin. Device capability varies; treat both numbers as directional.

Limits

A few things this post does not prove and a few genuine constraints.

The 2,000 number is design capacity, not a measured benchmark. A render-isolated 2,000-sprite test would need a dedicated EAS build plus a full TestingBot sweep - deferred for now. The E8 baseline above, which covers real gameplay loads, is the honest on-device anchor in the meantime.

The A54 heavy-load 48 fps is physics-bound, not Atlas-bound. The Samsung Galaxy A54 5G heavy load (50 enemies + 300 bullets) averages 48 fps. The engine's own benchmarks pin the bottleneck on CollisionWorld.step at around 350 colliders, which consumes most of the 16.67 ms frame budget by itself. This number says nothing about sprite rendering; do not read it as an Atlas ceiling.

The 26.6 µs flush is an off-device microbench (covered above): a regression indicator for the TypeScript queueing and grouping path, not a device frame-time estimate.

More atlas textures cost more (covered above): batching is per-texture, so pack your sprites into as few sheets as you can.

The declarative useRSXformBuffer path has a reported JSI-per-transform stall on cheap Android hardware. GitHub issues #2521 and #2688 both document it. On budget devices the per-sprite JSI boundary crossing is the bottleneck, not the GPU. The engine chose the imperative PictureRecorder path to avoid this. The declarative path still works well on capable hardware and may be the simpler starting point if your target device tier is mid-range or above.

This is not the post-processing pipeline. The Atlas/PictureRecorder pass renders sprites to a texture. A separate JSX <Image><RuntimeShader/></Image> composite handles zone backgrounds and runtime shader effects afterwards. Those are two distinct passes; the post-processing shaders do not run through the Atlas pipeline.

Community numbers carry methodology caveats. The ~300 OPPO A16 floor and ~15,000 iPhone 12 mini headline both come from GitHub issue #2521 - crowdsourced reports, not controlled benchmarks. Device model, OS version, build type, and scene specifics all affect the number.

Close

If you are rendering more than a few hundred animated sprites, tiles, or data-driven markers in React Native, drawAtlas is the right primitive. Whether you take the declarative <Atlas> + useRSXformBuffer route or the imperative PictureRecorder route depends on your target device tier and how much transform arithmetic you want inside a Reanimated worklet.

The same batching discipline that keeps a game engine's sprite pass cheap applies directly to non-game React Native: an animated dashboard with 200 cards, a live scatter-plot with moving data points, a confetti burst, a map-marker cluster at high density. One draw call per texture, regardless of count.

The companion technique that makes the transform math zero-alloc - preallocated SkRSXform[] and SkHostRect[] pools mutated in place each frame - is covered in "Zero-alloc game loops in TypeScript". The context for why native GPU is the only credible path for any of this in React Native is in "The React Native game engine gap in 2026".

Try it on your densest screen. One drawAtlas call where you had a thousand.
