Building a union from scratch

Three ways to construct a union type, depending on what the analyser is doing.

Method 1: literal construction with TypeBuilder

For when the analyser knows the elements ahead of time (a parsed PHP type expression, a hand-coded fixture):

use suffete::{TypeBuilder, prelude::{INT, STRING, NULL}};

let int_or_string_or_null = TypeBuilder::new()
    .push(INT)
    .push(STRING)
    .push(NULL)
    .build();

build is structural: it sorts and dedups but does not apply join's canonicalisation rules (no range merging, no literal collapse, no subtype absorption).

If the analyser wants those rules applied:

let canonical = TypeBuilder::new()
    .push(INT)
    .push(STRING)
    .push(NULL)
    .build_canonical();

build_canonical runs the join's canonicalisation pass: subsumption, range merging, literal collapse, etc. Use it when the analyser wants the smallest representation.

Method 2: incremental, via join::compute

For when the analyser is producing a union from a set of computed values (e.g. the union of a switch's case values, or the join of a control-flow merge):

use suffete::{TypeId, join};

fn join_all(types: &[TypeId]) -> TypeId {
    let mut elements = Vec::new();
    for &t in types {
        elements.extend_from_slice(t.as_ref().elements);
    }
    TypeId::union(&join::compute(&elements))
}

join::compute takes the combined element multiset and returns the canonical element list. Wrapping it in TypeId::union interns the result. The result is the canonical union of every input.

The join::compute pass applies the canonicalisation rules: range merging, literal collapse, etc.

Method 3: from_type for incremental modification

For when the analyser already has a union and wants to push one or two more elements:

use suffete::{TypeBuilder, prelude::NULL};

let nullable = TypeBuilder::from_type(existing_type)
    .push(NULL)
    .build();

from_type enables the origin short-circuit: if the buffer is unchanged after the mutations, build() returns the original TypeId without re-interning. Useful when the mutations might be no-ops.

Comparison of the three methods

MethodUse whenCanonicalisationCost
TypeBuilder::push * / build()Constructing from known elementsStructural only (sort, dedup)Sort + intern lookup
TypeBuilder::push * / build_canonical()Constructing from known elements, want canonical formFull canonicalisationSort + canonicalise + intern
join::compute over collected elementsFolding a sequenceFull canonicalisation1 intern + 1 canonicalise
TypeBuilder::from_type / push / build()Adding to an existing typeStructural onlyOrigin check + sort + intern

For most analyser code, build (or build_canonical if you need the canonical form) is the right choice. The join::compute pattern is for pipelines where each input is computed lazily.

Worked example: union of switch case types

function f(int $kind): string {
    return match ($kind) {
        1, 2 => 'a',
        3 => 'b',
        default => 'c',
    };
}

The analyser computes the type of each match-arm body and joins them to produce the function's return type:

let arm1 = ...; // type of 'a' = literal "a"
let arm2 = ...; // type of 'b' = literal "b"
let arm3 = ...; // type of 'c' = literal "c"

let return_t = join_all(&[arm1, arm2, arm3]);
// return_t == "a"|"b"|"c"  (or, after literal collapse threshold, just `string`)

If the analyser wants to preserve the literals (no collapse), use TypeBuilder::push * / build. If the analyser wants the canonical form (let the lattice decide whether to collapse), use join::compute (which uses the lattice's literal-collapse threshold) or build_canonical.

Worked example: making a type nullable

use suffete::{TypeBuilder, prelude::NULL, predicates::contains_null};

fn make_nullable(t: TypeId) -> TypeId {
    if contains_null(t) {
        return t;  // already nullable
    }
    TypeBuilder::from_type(t).push(NULL).build()
}

The contains_null check avoids the unnecessary intern lookup for already-nullable inputs. The from_type ensures the origin short-circuit fires when (paradoxically) the type was already nullable but we hit this code anyway.

Worked example: union of class names

The analyser has a list of classes the user declared in a docblock and wants to construct the type:

use suffete::{TypeBuilder, ElementId};
use mago_word::Word;

fn classes_to_union(names: &[Word]) -> TypeId {
    let mut b = TypeBuilder::new();
    for &name in names {
        b.push(ElementId::object_named(name.as_bytes()));
    }
    b.build()
}

let t = classes_to_union(&[
    mago_word::word(b"Foo"),
    mago_word::word(b"Bar"),
    mago_word::word(b"Baz"),
]);
// t == Foo | Bar | Baz

If two of the classes are related by inheritance, build_canonical (or a separate join::compute pass) would absorb the descendant ; build keeps them distinct.

Performance notes

  • TypeBuilder::push is O(1).
  • build is O(n log n) for the sort plus the interner cost. The interner has a fast path that detects already-sorted-and-unique input via SIMD.
  • build_canonical is O(n²) worst case (subsumption is pairwise), but typical inputs are small.
  • join::compute is the canonicalisation cost over the collected element multiset, O(N).

For analyser hot loops constructing many unions, consider:

  • Reusing one TypeBuilder instance across iterations (clear and re-push).
  • Pre-sorting if you know the input is canonical, so the interner's fast path fires.
  • Choosing build over build_canonical if you don't need the canonical form ; the latter is strictly more expensive.

See also: TypeBuilder for the construction API; join for the canonicalisation rules; Predicates for the contains_null and friends used as fast pre-checks.