Solutions: 28 — Sort for locality

Exercise 1 — Compute spatial cells

#![allow(unused)]
fn main() {
fn spatial_cell(pos: (f32, f32), cell_size: f32) -> u32 {
    let x = (pos.0 / cell_size).floor() as i32 as u32 & 0xFFFF;
    let y = (pos.1 / cell_size).floor() as i32 as u32 & 0xFFFF;
    (x << 16) | y
}
}

For 1 000 random creatures in a 100 × 100 world with cell_size = 10:

#![allow(unused)]
fn main() {
let mut hist = std::collections::BTreeMap::new();
for &p in &pos {
    *hist.entry(spatial_cell(p, 10.0)).or_insert(0) += 1;
}
}

Output: roughly 100 cells, each holding ~10 creatures (uniform distribution). A skewed distribution would cluster.

Exercise 2 — Sort by cell

#![allow(unused)]
fn main() {
fn sort_creatures_for_locality(world: &mut World, cell_size: f32) {
    let n = world.pos.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by_key(|&i| spatial_cell(world.pos[i], cell_size));

    apply_permutation_inplace(&mut world.pos, &order);
    apply_permutation_inplace(&mut world.vel, &order);
    apply_permutation_inplace(&mut world.energy, &order);
    apply_permutation_inplace(&mut world.id, &order);
    // also: gen, birth_t

    // Rebuild id_to_slot
    for (new_slot, &id) in world.id.iter().enumerate() {
        world.id_to_slot[id as usize] = new_slot as u32;
    }
}
}

After the sort, pos[0..10] are all in the same cell (or a small number of adjacent cells). Spatial neighbours are now memory neighbours.

Exercise 4 — Time `next_event`

A naive next_event for each creature scans the next 100 entries of pos:

Pre-sort: those 100 entries are random — random RAM access, ~50–100 ns per check, ~5 µs per creature. At 1M creatures, ~5 seconds per tick. Impossible.

Post-sort: those 100 entries are spatial neighbours — sequential reads, ~1–2 ns per check, ~150 ns per creature. At 1M creatures, ~150 ms per tick. Still over budget for 30 Hz, but ~30× faster.

The combination with the sort cadence (exercise 5) usually brings this in budget.

Exercise 5 — Sort cadence

Sort cost at 1M: ~10 ms. Per-tick savings on next_event post-sort: ~500 ms (compared to pre-sort). One sort every 50 ticks amortises sort cost to 0.2 ms/tick — vastly cheaper than the savings.

If the world’s positions barely change tick-to-tick, you can sort even less often. If positions change wildly, you need more frequent sorts. The right cadence is data-dependent and worth measuring.

A simple stripe packing puts cells with the same x in adjacent linear positions. A Z-order (Morton) curve interleaves x and y bits, so spatial neighbours in 2D are usually neighbours in the linear order — even across “stripe” boundaries.

A Morton encoder for 16-bit x, y:

#![allow(unused)]
fn main() {
fn morton_2d(x: u16, y: u16) -> u32 {
    let mut x = x as u32;
    let mut y = y as u32;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    y = (y | (y << 8)) & 0x00FF00FF;
    y = (y | (y << 4)) & 0x0F0F0F0F;
    y = (y | (y << 2)) & 0x33333333;
    y = (y | (y << 1)) & 0x55555555;
    x | (y << 1)
}
}

Z-order typically gives ~10–30 % better cache locality than stripe packing on 2D access patterns. The cost is a few more bit operations per row.

An Introduction to Programming, using ECS & EBP in Rust

Solutions: 28 — Sort for locality

Exercise 1 — Compute spatial cells

Exercise 2 — Sort by cell

Exercise 4 — Time `next_event`

Exercise 5 — Sort cadence

Exercise 6 — Z-order

Keyboard shortcuts

An Introduction to Programming, using ECS & EBP in Rust

Solutions: 28 — Sort for locality

Exercise 1 — Compute spatial cells

Exercise 2 — Sort by cell

Exercise 4 — Time next_event

Exercise 5 — Sort cadence

Exercise 6 — Z-order

Exercise 4 — Time `next_event`