Interning and the arenas
Every interesting value in suffete — an Element, a Type, an element list, a known-items list, a defining entity, a callable signature — is interned. This chapter covers how interning works and why suffete leans on it for both correctness and performance.
This is internals reading. The public API works without understanding it. But if you're optimising analyser-side code or debugging an arena-related issue, this chapter is the one.
What interning gives you
Two ElementIds with the same logical content are the same handle. Equality is u32 == u32. Hashing is u32 as u64. Two parts of a process that construct the same Element independently end up with the same handle and never know they raced.
Three consequences:
- Cheap equality. No structural compare, no recursion, no hash chain walk. One CPU compare.
- Stable identity. A
TypeIdconstructed today is the same handle as one constructed tomorrow (same process). Useful for caching. - Compact representation. Every Element is one
u32. ATypeIdis oneu64. A union ofnElements is oneu64plus a borrowed slice ofnu32s.
What interning costs
Interning has a fixed memory cost: the dedup table. Suffete uses dashmap::DashMap for the dedup, which is concurrent-friendly but not zero-cost. The lookup is a hash of the value plus a bucket walk.
For a content-keyed value, the hash is over the content (the kind tag plus the payload). For a slice-keyed value, the hash is over the slice contents.
The intern operation:
- Hash the input.
- Probe the dashmap.
- On a hit, return the existing slot.
- On a miss, allocate the value, push it into the per-kind arena, return the new slot.
Steps 2-3 dominate: the hash is fast, the dashmap lookup is the cost.
The arena types
Two basic shapes:
Arena<T>— interns whole values ofT. One slot per uniqueT. Used for: per-element-kind payloads, defining entities, callable signatures, known-items lists, etc.SliceArena<T>— interns slices ofT. One slot per unique slice content. The slice is leaked to'staticon first sight. Used for: the union body slice, object type-args, derived type lists, known-property lists, etc.
Both store underlying values in a boxcar::Vec (a thread-safe append-only vector). Both expose 1-based slot indices ; slot 0 is reserved as the NonZero{U32,U64} niche.
The interner singleton
There is exactly one interner per process. It contains every arena, lives behind a OnceLock for the process lifetime, and is Sync ; concurrent use from multiple threads is supported and uncontended in the common case (the dashmap shards by hash).
Per-element-kind arenas
Each payload-bearing Element kind gets its own arena. The interner exposes per-kind intern + lookup methods. Trivial-kind elements (no payload) don't need interning ; their handles are constructed directly from the kind tag and the canonical slot, and the prelude exposes them as constants.
The Type interner
The Type arena is keyed on the content slice plus the flags. Two Types with the same elements (in any order) and the same flags are the same TypeId.
Type interning does:
- Sort and dedup the input slice (canonicalisation).
- Intern the sorted slice into the element-list slice arena to get a stable handle.
- Intern the
(slice_handle, flags)pair into the Type arena to get a slot. - Construct the
TypeIdfrom(slot, flags, meta=0, reserved=0).
The sort+dedup step uses sort_unstable + dedup, with a SIMD prefilter that skips both when the input is already canonical. Most analyser-side calls hit the fast path because the join layer produces canonical output.
A singleton-Type cache is layered on top: interning a one-element slice is a hashmap lookup keyed on the single Element ID, skipping the sort + the dashmap hash entirely.
Memory growth
Arenas only grow. There is no GC, no reference counting, no compaction. The trade-off:
- Pro: every handle is stable for the process lifetime; no use-after-free; no churn.
- Con: a long-running analyser session that processes many distinct types accumulates them all.
In practice, the analyser's working set saturates: most types are constructed once and reused many times, and the arena converges. Mago's expected memory budget for the type arenas is in the tens of megabytes for a large monorepo.
If the analyser truly needs per-session deallocation (e.g. an LSP server processing many distinct codebases), the recommendation is to spawn a fresh process per session. Suffete does not support arena resets within a process.
Concurrency
The dashmap is the concurrency point. Two threads interning the same value race on the dashmap entry; whichever wins inserts, the other reads back the winner's slot. Both end up with the same handle.
The arenas themselves (boxcar) are append-only and lock-free for read; writes synchronise minimally. Two threads pushing different values append independently.
Performance numbers
Approximate, on a modern x86_64 desktop:
- Trivial-kind handle access: zero ; it's a
const. - Cache-hit intern: ~30ns.
- Cache-miss intern (first time this payload is seen): ~100ns plus the cost of the boxed clone if the content is fresh.
- Singleton-Type cache hit: ~10ns.
- Slow-path Type intern: ~100-300ns depending on slice length.
- Payload lookup by handle: ~5ns; one boxcar index plus a deref.
The lattice's hot loops are dominated by the family rules, not interning. Interning is the constant-cost overhead.
See also: The ElementId tag layout, SIMD scans, Performance philosophy.