Serialization Optimised ECS
January 5, 2026 by Farms
In the last post I sketched out the shape of a Multitap game: three loops running at different rates, passing state between them. The simulation loop computes the next state from inputs, the render loop draws the state to the screen, and the input loop keeps everyone's inputs in sync.
But what actually is this "state" we're passing around?
The serialization problem
Our architecture has state crossing boundaries constantly:
The simulation runs in a WebAssembly module. The renderer is React Three Fiber running on the main thread. If we want to keep the main thread responsive, we probably want the simulation ticking away in a worker. In browser land, crossing these boundaries means going via the Channel Messaging API, either copying object structures or carefully passing ArrayBuffers.
Optimistic strategies like prediction and eager application (which I intend to implement) will further exacerbate this issue. Instead of a single state copy in and out of WASM land per tick, it could be dozens when a rollback is triggered, or even hundreds during some kind of "catch-up" phase where we need to replay multiple previous ticks of input data.
So keeping any serialization and deserialization of the state to an absolute minimum is going to be an unusually dominant constraint in this design.
ECS, but not for the usual reasons
Entity Component Systems are usually pitched on cache locality and query performance. The Struct-of-Arrays layout means iterating over all positions in a game world touches contiguous memory, which makes your CPU's cache happy.
I'm not planning on making anything that gets anywhere near the kind of size where cache locality comes into play, but an ECS structure offers something else that is interesting to me: a simple, predictable memory layout.
If I define a schema upfront that describes all my component types and their fields, I can calculate the exact byte offset of every piece of data at build time. Position component for entity 42? That's at byte offset 1,234. Velocity Y for entity 7? Byte offset 892.
If I also fix the maximum number of entities upfront, I can pre-allocate the entire state buffer as a single contiguous block of memory. No dynamic allocation, no resizing, no pointer chasing.
And if the memory layout is fixed and known, I don't need to serialize anything. The state is already just bytes. Copying state becomes memcpy. Passing state to WASM becomes copying bytes into linear memory. Passing state to a worker becomes transferring an ArrayBuffer. No encoding, no decoding.
The state is the bytes. No serialization.
A fixed-size buffer
A buffer layout in memory could look something like:
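Roughly like this (section order and field widths here are an illustrative sketch, not the exact generated layout):

```
┌─────────────────────────────────────────────┐
│ Entity table (maxEntities slots)            │
│   [generation | component bitmask] per slot │
├─────────────────────────────────────────────┤
│ Position array  (maxEntities × vec3)        │
│ Velocity array  (maxEntities × vec3)        │
│ Health array    (maxEntities × 4 bytes)     │
├─────────────────────────────────────────────┤
│ Singletons (MatchState, …)                  │
└─────────────────────────────────────────────┘
```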
Every entity gets a slot in the entity table with a generation counter (for detecting stale references) and a bitmask tracking which components it has. The component data itself is stored in separate arrays, one per component type. Singletons like match state or global timers get their own section at the end.
The total size can be calculated at build time from the schema describing the components we plan to use.
A game with Position, Velocity, and Health components and 1,000 max entities might come out to around 25KB. That's the size regardless of how many entities are actually alive. Wasteful? Very! But predictable.
We're not building a framework for everyone. We can make questionable decisions that work for our specific constraints. Like requiring a fixed slab of memory up front.
Schema-first design
Requiring that all components be declared up front will let us precompute the state layout.
A schema might look something like:
{
  "maxEntities": 100,
  "components": [
    { "name": "Position", "type": "vec3" },
    { "name": "Velocity", "type": "vec3" },
    { "name": "Health", "type": "compound", "fields": [
      { "name": "current", "type": "int16" },
      { "name": "max", "type": "int16" }
    ]},
    { "name": "MatchState", "type": "compound", "singleton": true, "fields": [
      { "name": "score", "type": "int32" },
      { "name": "timeRemaining", "type": "f32" }
    ]},
    { "name": "IsDead", "type": "tag" }
  ]
}
The order of components in the array determines their bit position in the bitmask and their order in memory. Arrays preserve ordering in all languages, which matters when we need to generate identical layouts from TypeScript and Rust.
Tags like IsDead are components with no data. They just set a bit in the bitmask to mark presence. Zero storage cost.
Singletons are components that exist once for the whole match rather than per-entity. Global scores, timers, configuration. They get stored separately from the per-entity component arrays.
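As a sketch of the build-time size calculation described above, here's how a schema could be walked to produce the total buffer size. The type widths (vec3 as three f32s, a 2-byte generation plus a 4-byte bitmask per entity slot) are illustrative assumptions, not the real codegen's layout:

```typescript
// Sketch: computing total buffer size from the schema at build time.
// Field widths and the entity-table layout are assumed for illustration.

type ScalarType = "int16" | "int32" | "f32";

interface ComponentDef {
  name: string;
  type: "vec3" | "compound" | "tag";
  singleton?: boolean;
  fields?: { name: string; type: ScalarType }[];
}

const SCALAR_SIZES: Record<ScalarType, number> = { int16: 2, int32: 4, f32: 4 };

function componentSize(c: ComponentDef): number {
  if (c.type === "tag") return 0; // tags live only in the bitmask
  if (c.type === "vec3") return 3 * 4; // three f32s
  return (c.fields ?? []).reduce((sum, f) => sum + SCALAR_SIZES[f.type], 0);
}

function totalBufferSize(maxEntities: number, components: ComponentDef[]): number {
  // Entity table: generation (2 bytes) + component bitmask (4 bytes) per slot.
  const entityTable = maxEntities * (2 + 4);
  let perEntity = 0;
  let singletons = 0;
  for (const c of components) {
    if (c.singleton) singletons += componentSize(c);
    else perEntity += componentSize(c);
  }
  return entityTable + maxEntities * perEntity + singletons;
}
```

Under those assumptions, the example schema (100 entities; Position and Velocity at 12 bytes, Health at 4, MatchState as an 8-byte singleton, IsDead at 0) comes out to 600 + 100 × 28 + 8 = 3,408 bytes.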
Codegen for two languages
From a single schema we can then generate accessor code for both TypeScript (renderer) and Rust (simulation). Both sets of accessors operate on the same memory layout with identical offsets.
The TypeScript side might look like:
// Auto-generated from schema
export function getPositionX(state: DataView, entity: EntityRef): number {
  const offset = POSITION_OFFSET + unpackIndex(entity) * POSITION_SIZE;
  return state.getFloat32(offset, true);
}

export function setPositionX(state: DataView, entity: EntityRef, value: number): void {
  const offset = POSITION_OFFSET + unpackIndex(entity) * POSITION_SIZE;
  state.setFloat32(offset, value, true);
}
And the Rust side:
// Auto-generated from schema
pub unsafe fn get_position_x(state: *const u8, entity: EntityRef) -> f32 {
    let offset = POSITION_OFFSET + unpack_index(entity) as u32 * POSITION_SIZE;
    load_f32(state.add(offset as usize))
}

pub unsafe fn set_position_x(state: *mut u8, entity: EntityRef, value: f32) {
    let offset = POSITION_OFFSET + unpack_index(entity) as u32 * POSITION_SIZE;
    store_f32(state.add(offset as usize), value);
}
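To see the TypeScript accessors in action, here's a self-contained, runnable version. The offset constants and the `unpackIndex` helper are made-up stand-ins for what the codegen would emit:

```typescript
// Runnable sketch of the generated accessors. POSITION_OFFSET and
// POSITION_SIZE are illustrative values, not real generated constants.
type EntityRef = number;

const POSITION_OFFSET = 600; // where the Position array starts (assumed)
const POSITION_SIZE = 12;    // vec3 = three little-endian f32s

function unpackIndex(entity: EntityRef): number {
  return entity & 0xffff; // low 16 bits are the slot index (see below)
}

function getPositionX(state: DataView, entity: EntityRef): number {
  const offset = POSITION_OFFSET + unpackIndex(entity) * POSITION_SIZE;
  return state.getFloat32(offset, true);
}

function setPositionX(state: DataView, entity: EntityRef, value: number): void {
  const offset = POSITION_OFFSET + unpackIndex(entity) * POSITION_SIZE;
  state.setFloat32(offset, value, true);
}
```

Both sides read and write a `DataView` over the one preallocated buffer; nothing is ever encoded or decoded.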
The API is verbose. Lots of getPositionX, setPositionX, getPositionY, setPositionY functions. But that's fine. Our primary consumer is an AI agent, and agents don't mind repetitive APIs as long as the usage is crystal clear. No ambiguity about what each function does.
WASM linear memory
WebAssembly has a linear memory model. The WASM module gets a flat ArrayBuffer that it can read and write directly. From JavaScript, that same ArrayBuffer is accessible as wasmInstance.exports.memory.buffer.
This is perfect for our minimal-serialization goal. Before calling the simulation:
- Copy state bytes into WASM linear memory
- Call the simulation function (passing the offset where state lives)
- Simulation reads and writes state directly in place
- After returning, copy state bytes back out
No encoding at the function boundary. No decoding on the WASM side. The simulation just gets a pointer to bytes laid out exactly how it expects. Accessors read and write at known offsets.
The copies in and out are just memcpy operations. Fast. And we only need to do them once per tick rather than serializing every field individually.
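The per-tick dance can be sketched like this. Here the WASM linear memory is simulated with a plain `Uint8Array`; with a real module it would be a view over `wasmInstance.exports.memory.buffer`, and `tick` would be an exported WASM function:

```typescript
// Sketch of the per-tick copy in/out of linear memory.
// STATE_SIZE and STATE_OFFSET are assumed values from the schema.
const STATE_SIZE = 4096;
const STATE_OFFSET = 0;

const hostState = new Uint8Array(STATE_SIZE);    // authoritative copy
const linearMemory = new Uint8Array(STATE_SIZE); // stand-in for WASM memory

function runTick(tick: (offset: number) => void): void {
  // 1. memcpy state into linear memory
  linearMemory.set(hostState, STATE_OFFSET);
  // 2. simulation reads and writes state in place
  tick(STATE_OFFSET);
  // 3. memcpy the result back out
  hostState.set(linearMemory.subarray(STATE_OFFSET, STATE_OFFSET + STATE_SIZE));
}
```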
Entity references
One detail worth mentioning: how do entities reference each other? If a projectile needs to track which player fired it, that reference needs to survive across ticks and be safe even if the player despawns.
We pack entity references into 32 bits: 16 bits for the slot index, 16 bits for a generation counter.
EntityRef: [ generation (16 bits) | index (16 bits) ]
When an entity is despawned, its generation counter increments. Any old references still holding the previous generation will fail validation when dereferenced. No dangling pointers, no use-after-free, and it all lives in the same fixed buffer with the same zero-serialization property.
Trade-offs
Nothing's free. This design trades away some things traditional ECS implementations care about:
The fixed allocation wastes memory. An entity slot reserves space for all components whether that entity has them or not. A sparse world with few entities but many component types will have lots of unused bytes.
No runtime schema changes. Components are defined at build time. You can't add new component types while the game is running.
Queries will be O(n), at least initially. We scan all entity slots checking bitmasks. No acceleration structures, no archetype tables. For our target of 100-1000 entities in quick arcade games this is fine. Maybe I'll revisit this.
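That linear scan is about as simple as queries get. A sketch, assuming bit positions follow schema order (Position = bit 0, Velocity = bit 1, and so on):

```typescript
// O(n) query: scan every slot's bitmask for the required components.
// Bit assignments here are illustrative.
const POSITION_BIT = 1 << 0;
const VELOCITY_BIT = 1 << 1;

function queryEntities(bitmasks: Uint32Array, required: number): number[] {
  const matches: number[] = [];
  for (let i = 0; i < bitmasks.length; i++) {
    if ((bitmasks[i] & required) === required) matches.push(i);
  }
  return matches;
}
```

A movement system would then call something like `queryEntities(masks, POSITION_BIT | VELOCITY_BIT)` each tick.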
But in exchange we get very efficient state copying. When rollback networking needs to recompute dozens of ticks within a single frame, that matters more than query performance. At least at my toy scale.
Enough rambling
So that's the shape of the state layer:
- A schema-first design
- A fixed binary buffer
- ECS-shaped
- Accessors generated for multiple languages
- Designed around skipping serialization entirely
It's probably not super original, but I couldn't find much out there where people are building ECS-shaped things around minimizing serialization costs. That said, there are like a million ECS implementations out there; I didn't check them all.
Next up, I need to do a little design work around inputs and the wire format. But it's coming together.