Introduction
Suffete is a standalone PHP type system, written in Rust, designed to be the type-system core of a static analyzer.
It is not an analyzer. It does not parse PHP, walk control flow, infer types from expressions, or report diagnostics. It is the layer underneath all of those: a representation of every PHP type a real-world analyzer needs to express, plus the operations that move types around the lattice — subtype check, union, intersection, set difference, narrowing under assertions, generic substitution, generic inference.
If you are building an analyzer, suffete answers the questions you would otherwise have to answer yourself, and it answers them with a stable contract: same inputs, same answers, no spooky action across the codebase.
What this book covers
The book is organised in eight parts. Each part stands on its own; you don't need to read them in order, but the earlier parts assume less.
- Part I — Foundations — what suffete is, why it exists as a separate crate, a quick guided tour, and the notation we use throughout.
- Part II — The Type Universe — every PHP type expressible in suffete: scalars, special elements, objects, arrays, lists, iterables, callables, resources, class-like strings, refinement axes, the
NegatedandIntersectedwrappers, unresolved elements, and templates. - Part III — The Lattice — the operations that relate and combine types:
refines,overlaps,meet,join,subtract,narrow. Plus the algebraic laws every operation is required to satisfy. - Part IV — Generics — template parameters, variance, capture-free substitution, the
standininference round, and the inheritance specialisation that resolves what a class supplies to a parameter of one of its ancestors. - Part V — Public API — the Rust crate surface:
TypeId,ElementId,TypeBuilder, predicates, inspection, transformation, theWorldtrait, casting, expansion, serialization. - Part VI — Cookbook — recipes that compose the API into the operations a real analyzer needs.
- Part VII — Internals — optional reading, for when you want to know how the sausage is made: interning, the bit layout of a handle, the SIMD scans, the performance philosophy.
- Part VIII — Reference — terse, scannable reference material: every element kind, every lattice option, every prelude constant.
Who this book is for
You know PHP. You know Rust. You know what a static analyzer is and why one would want a type system to lean on. You may not know much programming-language theory, but you know what a tree is, what a set is, what a partial order is.
The book does not try to teach PL theory from scratch. Where we use a term like "lattice", "subtype", "variance", or "least upper bound", we define it the first time it shows up and assume it after that.
A note on naming
The PL-theory term for the indivisible pieces a type is built from is atom. Suffete code calls them Elements. This book uses Element throughout to match the code, but you will occasionally see "atom" in references to the underlying theory. Treat the words as interchangeable.
The crate also exposes a separately-named type called mago_atom::Atom (re-exported as the atom! macro). That is unrelated — it is an interned string handle (a name pool), not a type-system element.
A note on stability
Suffete is pre-1.0 and the API is unstable. The book documents the API as it stands at the latest commit on main. If you are pinning to a published version, prefer cargo doc --open for the version-correct API; this book is the conceptual reference, not a frozen release artifact.
How to read examples
PHP source is shown as:
function f(int|string $x): bool { /* ... */ }
Rust against the suffete crate is shown as:
use suffete::{TypeBuilder, TypeId, prelude::{INT, STRING}};
let union: TypeId = TypeBuilder::new()
.push(INT)
.push(STRING)
.build();
Mathematical notation uses MathJax. The subtype relation is $\tau \mathrel{<:} \sigma$ ; least upper bound is $\tau \sqcup \sigma$ ; greatest lower bound is $\tau \sqcap \sigma$ ; set difference is $\tau \setminus \sigma$ ; bottom is $\bot$ ; top is $\top$. The full notation table is in the glossary.
Diagrams use mermaid:
graph LR
Never[never] --> Int[int]
Never --> String[string]
Int --> Scalar[scalar]
String --> Scalar
Scalar --> Mixed[mixed]
When you find something wrong
If a definition contradicts the code, the code wins and the book is buggy. Please open an issue or a PR; the book lives in book/ of the suffete repository and accepts the same kind of contributions as the code.
What suffete is
Suffete is a Rust crate that implements the PHP type system as a self-contained, queryable data structure with a comprehensive set of operations.
It has three deliverables:
- A representation. A handle-based, intern-deduplicated, content-addressed data model that can express every PHP type a real-world analyzer needs to express.
- Operations on that representation. Subtyping, overlap, intersection, union, set difference, narrowing, generic substitution, generic inference, expansion of unresolved forms, structural transformations.
- A
Worldtrait. The single abstraction by which the type system asks questions about the user's codebase: "does classDextend classC?", "what does classCdeclare for propertyp?", "what is the upper bound of template parameterTon classC?". An analyzer plugs in its codebase model behind this trait; suffete itself stays codebase-agnostic.
That is the entire surface. There is no parser. There is no AST. There is no notion of a file, a statement, a scope, a control-flow graph, a diagnostic, or a configuration.
What suffete does
Concretely, given two TypeIds and a World, suffete answers:
- $\tau \mathrel{<:} \sigma$ — does every value of type $\tau$ also have type $\sigma$? (refines)
- $\tau \mathrel{\#} \sigma$ — are $\tau$ and $\sigma$ disjoint? (overlaps)
- $\tau \sqcap \sigma$ — what is the greatest lower bound? (meet)
- $\tau \sqcup \sigma$ — what is the least upper bound? (join)
- $\tau \setminus \sigma$ — what remains of $\tau$ after removing $\sigma$? (subtract)
- $\mathit{narrow}(\tau, \pi)$ — given an assertion $\pi$ that holds on top of $\tau$, what is the refined type? (narrow)
It also answers, given a type, simpler structural questions: is this guaranteed truthy? does this contain null? is this a single literal value the analyzer can constant-fold? does this contain a free template parameter anywhere in its tree? — and many more. These are the predicates.
And it offers the build-side primitives: making a new type from elements, walking a type and rebuilding it under a transformation, applying a generic substitution, expanding an alias.
What suffete does not do
It does not parse PHP. There is no lexer, no parser, no docblock interpreter. The analyzer brings types in from somewhere — a parser, a serialised cache, a hand-built fixture — and constructs TypeIds through suffete's builders and prelude.
It does not run a control-flow analysis. The lattice operations are pure functions of their inputs; they do not know which line of code they came from, what branch they are on, or which assertions led to them being asked. The analyzer asks; suffete answers.
It does not produce diagnostics. When refines returns false, the analyzer chooses what message to display, where to point the user, and at what severity. Suffete returns a boolean and a LatticeReport carrying structured side information; the message-writing is the analyzer's job.
It does not maintain a codebase model. Class hierarchies, declared properties, declared methods, template parameter bounds — all of those live in the analyzer's data structures and are queried through World on demand.
What this buys you
Three things, in roughly decreasing order of value to a downstream consumer:
Single source of truth for hard semantics. Subtyping in PHP is not simple. The interaction of generics and intersections, of nullability and the truthiness axes on mixed, of literal collapse with refined ranges, of object shape with HasMethod — these are conditions that every analyzer has to get right and few of them do. If suffete is correct, every analyzer that depends on suffete is correct on those conditions, automatically.
A property-test battery, run for everyone. Suffete is verified against an algebraic-law battery: idempotence, commutativity, associativity, identity, absorption, the GLB and LUB bounds, the soundness interlock between the operations. Every PR runs that battery. When suffete says $\tau \sqcap \sigma$ is some value, you can rely on it being a lower bound — that is checked on thousands of cases per CI run.
Performance you don't have to think about. The interner, the SIMD scans, the canonical-form fast paths, the singleton caches — they are tuned, and they run on every analyzer that depends on suffete, without the analyzer needing to know how. You write refines(t1, t2, world, opts, &mut report); it returns in nanoseconds for typical inputs.
How it relates to the rest of Carthage
Suffete is being designed in isolation so that the type-system contract can be specified, exercised, and benchmarked without an analyzer in the loop. Once it is stable, it is intended to replace the type-system core inside Mago, the Carthage Software PHP toolchain.
Mago is one consumer. Anyone else writing a PHP analyzer in Rust is invited to be another. Suffete will not change its API on Mago's behalf; if it works for Mago, it works for everyone.
See also: Why a separate type system — the rationale for not building this inside an analyzer in the first place.
Why a separate type system
The PHP type system is a serious piece of work. It is not as small as it looks from the docs. There are at least a dozen design decisions a real-world analyzer has to make about how types interact, and most of them are the difference between a useful tool and a frustrating one.
This chapter is the case for solving those problems exactly once.
The problem with bundling
Most static analyzers grow a type system organically. The analyzer is built first, the type system is whatever the analyzer needed at the moment, and the seams between the two stop existing. Three things happen.
Specification rots. The type system never becomes a thing that can be written down; its rules live in the analyzer's source as comments-near-callsites and inferred-from-tests. New contributors patch existing rules without seeing the whole; the rules drift; nobody has a complete picture; and the only way to find out what the rules are is to read every callsite.
Cross-tool divergence. If you and your colleagues use a different analyzer for different reasons — one for IDE feedback, one for CI, one for refactoring — the answers you get to the same question differ. Each analyzer disagrees with the others on at least some of: literal-int range merging, the truthiness collapse on mixed, what happens to HasMethod under intersection with a class that lacks it, whether int refines float, whether a sealed shape with all-optional keys equals an empty shape, what variance Traversable<K, V> has on each parameter. In an ideal world all three tools agree. In practice they don't, and there is no shared spec to negotiate against.
Performance is invisible. The type system is sprinkled throughout the analyzer. There is no single hot loop to optimise. The cost of a subtype_of call is one HashMap allocation here, one clone() there, one quadratic loop somewhere else, none of which look bad locally. The only people who can move the needle are the ones with the entire architecture in their head, and they are usually busy.
The bet
The bet suffete makes is that all three of those problems get easier if the type system is outside the analyzer.
Specification becomes the artifact. You can run property tests on it. You can write a book about it. You can argue about whether a particular law should hold without simultaneously arguing about which line of which analyzer file should change.
Convergence becomes possible. Two analyzers that share the same type-system crate cannot disagree on the type-system answers. They can still disagree on flow analysis, on diagnostic policy, on what the user actually meant — those are analyzer concerns. But "is int<0,10> a subtype of int?" has one answer in Rust, and every consumer gets it.
Performance becomes a single target. The hot loops are visible. The benchmarks are runnable. The optimisations live in one place and benefit everyone.
What this costs
There are three real costs.
An API surface. Bundled type systems have no API; they have callsites. A standalone type system has a public Rust API that is permanent in the sense that breaking it breaks every consumer at once. Suffete acknowledges this: the API is currently unstable, broken often, and will stay that way until the design has settled. Once stable, breaking it requires a major version.
A World indirection. The type system needs to ask the analyzer about the codebase: class hierarchies, declared methods, template parameter bounds. Bundled type systems can reach directly into the analyzer's HashMaps. Suffete cannot; it goes through the World trait. The analyzer pays for one virtual dispatch per query. Suffete amortises this with caching where it can and goes for free wherever the question can be answered without the world.
Vocabulary alignment. The analyzer and the type-system crate have to agree on what a type is. If the analyzer wants to express a type the crate cannot represent, the analyzer is stuck. The mitigation is the completeness mandate: any PHP type that a real-world analyzer expresses — Mago, PHPStan, Psalm, Hack — must be representable in suffete. If you find one that isn't, that is a bug.
Why now, why Rust
Mago is a Rust toolchain. Doing the same work in two languages would be untenable; doing it in not-Rust would lose Mago's performance baseline. The same reasoning that pulled mago into Rust pulls suffete with it.
Beyond Mago, the long-term story is that other Rust-based PHP tools — IDE servers, refactoring tools, test runners that need to reason about types — should not have to re-implement this. There aren't many of them today. Suffete bets there will be more, and that they would all rather build on top of a settled spec than write their own.
See also: What suffete is — the surface area suffete deliberately covers (and what it deliberately doesn't).
Quick tour
This chapter walks through suffete by example. It is not a tutorial — the assumed audience already knows what a type system does — but a 10-minute look at the API surface so the rest of the book has a concrete shape to point at.
We will build types, ask the lattice questions about them, narrow under an assertion, and substitute through a generic. Every snippet in this chapter compiles against the suffete crate at HEAD.
Setting up
use suffete::{
TypeBuilder, TypeId,
prelude::{INT, STRING, NULL, FALSE, TYPE_MIXED, TYPE_NEVER},
lattice::{self, LatticeOptions, LatticeReport},
world::NullWorld,
};
TypeBuilder is the mutable builder for a type. prelude::* exposes the well-known constants — INT is the ElementId for the unconstrained integer; STRING is the unconstrained string; TYPE_MIXED is the TypeId of the universal top; TYPE_NEVER is the TypeId of the empty bottom. NullWorld is the trivial World that knows nothing about classes ; useful for examples that don't touch the codebase.
Constructing a union
The PHP type int|string is a two-element union:
let int_or_string: TypeId = TypeBuilder::new()
.push(INT)
.push(STRING)
.build();
TypeBuilder::build interns the result, sorting and deduplicating the elements. Call it twice with the same elements (in any order) and you get the same TypeId back — handle equality is content equality.
let a = TypeBuilder::new().push(INT).push(STRING).build();
let b = TypeBuilder::new().push(STRING).push(INT).build();
assert_eq!(a, b);
Asking the lattice
Is int|string a subtype of int|string|null? Yes:
let nullable = TypeBuilder::new().push(INT).push(STRING).push(NULL).build();
let world = NullWorld;
let opts = LatticeOptions::default();
let mut report = LatticeReport::new();
assert!(lattice::refines(int_or_string, nullable, &world, opts, &mut report));
The reverse direction does not hold — null is not in int|string:
assert!(!lattice::refines(nullable, int_or_string, &world, opts, &mut report));
refines returns a boolean; the &mut LatticeReport collects structured side information about why a particular answer was reached (coercion edges, template defaults, etc.). For most questions you will ignore the report.
Overlap and disjointness
refines is one-directional. To ask "is there any value in both types?" use overlaps:
let int_only = TypeBuilder::new().push(INT).build();
let string_only = TypeBuilder::new().push(STRING).build();
assert!(!lattice::overlaps(int_only, string_only, &world, opts, &mut report));
assert!(lattice::overlaps(int_or_string, int_only, &world, opts, &mut report));
Meet, join, subtract
The three combinators return TypeId:
// meet (greatest lower bound, ⊓): the values both types share.
let common = lattice::meet(int_or_string, nullable, &world, opts, &mut report);
// common == int|string
// join (least upper bound, ⊔): the smallest type containing both.
let either = lattice::join(int_only, string_only, &world, opts, &mut report);
// either == int|string
// subtract: what remains after removing the second from the first.
let only_string = lattice::subtract(int_or_string, int_only, &world, opts, &mut report);
// only_string == string
Narrowing under an assertion
Suppose you have a value of type int|string|null and the analyzer has just observed an assertion that excludes null. The result is int|string:
let after_null_check = lattice::narrow(
nullable, // input type
int_or_string, // assertion: the value is one of these
&world, opts, &mut report,
);
// after_null_check == int|string
Narrowing is the operation that consumes the assertions an analyzer extracts from if, instanceof, comparisons against constants, and so on.
Predicates
For top-level structural questions — "is every element in this type guaranteed truthy?", "does this type contain any object element?", "is this type a single literal that can be constant-folded?" — you call into the predicates module:
use suffete::predicates::{is_truthy, contains_null, is_constant_foldable};
assert!(!contains_null(int_or_string));
assert!(contains_null(nullable));
let one = TypeBuilder::new().push(suffete::ElementId::int_literal(1)).build();
assert!(is_truthy(one));
assert!(is_constant_foldable(one));
These do not need a World; they ask only about the structure of the type as you've handed it to them.
A generic, briefly
Generics get their own Part IV. One snippet to show the shape:
use suffete::{ElementId, template::substitute};
use suffete::element::payload::GenericParameterInfo;
// Suppose `T` is bound on class `Box<T>`. Substitute `int` for it everywhere
// in `T|null`.
let t = ElementId::generic_parameter("T", "Box", /* upper bound: */ TYPE_MIXED);
let t_or_null = TypeBuilder::new().push(t).push(NULL).build();
let int_t = TypeBuilder::new().push(INT).build();
let result = substitute(t_or_null, &|info: &GenericParameterInfo| {
if info.name.as_str() == "T" { Some(int_t) } else { None }
});
// result == int|null
The substitution is capture-free. Recursion into nested types (object type-args, callable parameters, conditional then/else, etc.) is handled by the walker; you supply only the leaf decision.
Where to look next
If you want the data model first: Part II — The Type Universe starts with the element kinds and works through every payload variant.
If you want the operations first: Part III — The Lattice covers refines, overlaps, meet, join, subtract, narrow in order.
If you want to start writing code: Part V — Public API is the by-module API reference, and Part VI — Cookbook shows the API composed into common analyzer recipes.
See also: What suffete is and Glossary and notation.
Glossary and notation
The book uses a small but consistent set of mathematical and PL-theory terms. They are defined here once. Subsequent chapters use them without re-defining.
Notation
| Symbol | Meaning |
|---|---|
| $\tau, \sigma, \rho$ | Range over types. |
| $T, U, V$ | Range over template parameters. |
| $\Gamma$ | The program environment: the function from class, interface, trait, enum, function, and constant names to their declarations. In the API, this is the World trait. |
| $\Theta$ | A template environment: a partial function from template parameters (qualified by their defining entity) to types. |
| $\Delta$ | A defining entity: a class-like, function, method, or closure that introduces template parameters. |
| $\tau \mathrel{<:} \sigma$ | $\tau$ refines $\sigma$: every value of type $\tau$ is also a value of type $\sigma$. |
| $\tau \equiv \sigma$ | $\tau \mathrel{<:} \sigma$ and $\sigma \mathrel{<:} \tau$. |
| $\tau \sqcup \sigma$ | Join: the least upper bound (union, with absorption rules applied). |
| $\tau \sqcap \sigma$ | Meet: the greatest lower bound (intersection, with literal/range collapse). |
| $\tau \setminus \sigma$ | Set difference: values in $\tau$ that are not in $\sigma$. |
| $\tau \mathrel{\#} \sigma$ | $\tau$ and $\sigma$ are disjoint: $\tau \sqcap \sigma \equiv \bot$. |
| $\tau \Rightarrow \sigma$ | A coercion edge: an admissible non-subtype edge in non-strict positions. PHP's runtime, on the boundary, allows e.g. int to flow into a float parameter. The coercion edge is recorded on the LatticeReport. |
| $\bot$ | Bottom: the empty type, no values. PHP's never. |
| $\top$ | Top: the universal type, all values. PHP's mixed. |
| $\Gamma \vdash D \prec C$ | $\Gamma$ records that class-like $D$ extends, implements, or uses-as-trait $C$ (transitively). |
| $\sigma[T \mapsto \rho]$ | Capture-free substitution of $T$ by $\rho$ in $\sigma$. |
| $\sigma\Theta$ | Simultaneous substitution of $\sigma$ under the environment $\Theta$. |
| $\Theta_1 \circ \Theta_2$ | Composition of template environments. |
| $\mathit{tparam}_C$ | The indexed list of template parameters of class $C$, in declaration order. |
| $\mathit{ext}_{D \to C}$ | The extension binding: the type arguments $D$ supplies to $C$ along the inheritance chain. |
| $\mathit{specialise}(C, T, D\langle\bar\rho\rangle)$ | The type of $C$'s template parameter $T$ in the context of an instantiated descendant $D\langle\bar\rho\rangle$. |
| $\mathit{expand}(\tau)$ | Resolution of non-structural forms in $\tau$ (aliases, references, derived types, conditionals). |
| $\mathit{standin}(\sigma, \rho, \Theta)$ | An inference round: walk parameter $\sigma$ against argument $\rho$, accumulating bounds into $\Theta$. |
| $\mathit{infer}(\sigma, \Theta)$ | Inferred-replacement of templates in $\sigma$ using a fully-determined $\Theta$. |
| $\mathit{narrow}(\tau, \pi)$ | Apply the assertion $\pi$ to refine the type $\tau$. |
Vocabulary
The terms below are used throughout the book in their PL-theory sense. Where a term has a more colloquial meaning that might confuse, the technical sense is noted.
Atom. The PL-theory name for the indivisible piece a type is built from. Suffete calls these Elements. The two terms are interchangeable; this book prefers "Element" everywhere except in references to literature.
Element. The indivisible unit a type is built from: a single integer literal, the unconstrained string, the named class Foo, the negation of null. See Elements.
Type. A set of values, expressed as a finite union of Elements. A singleton Element is a one-element type.
Element kind. The family or shape of an Element — int, string, named-class object, list, negation, intersection, conditional, and so on.
Refines / refinement / subtype. "Refines" is the verb form of "is a subtype of". $\tau$ refines $\sigma$ ($\tau \mathrel{<:} \sigma$) iff every value of $\tau$ is also a value of $\sigma$. "Refinement" is also used for the refinement axes that can be carried on certain element kinds (the truthiness, non-empty, non-null, isset-from-loop axes on mixed; the casing/numeric/non-empty axes on string); context disambiguates.
Overlap / disjoint. Two types overlap if their meet is inhabited (contains at least one value). They are disjoint if their meet is empty (uninhabited).
Inhabited / uninhabited. A type is inhabited if it contains at least one value. $\bot$ is the canonical uninhabited type, but some sealed shapes (e.g. an empty sealed object shape with required properties) and some Intersected wrappers (e.g. Foo & !Foo) are also uninhabited despite not being syntactically $\bot$.
Meet ($\sqcap$). The greatest lower bound of two types. Operationally: the largest type that refines both. PHP's intersection.
Join ($\sqcup$). The least upper bound of two types. Operationally: the smallest type that both refine into. PHP's union, with literal collapse and range merging applied.
Subtract ($\setminus$). Set-theoretic difference: $\tau \setminus \sigma$ is the values in $\tau$ that are not in $\sigma$. Used by the analyzer to eliminate a positive case (e.g. after !is_int($x)).
Narrow. The composite operation that combines an input type with an assertion type. Often equivalent to meet (positive assertions) or subtract (negative assertions) but with extra rules for cross-cutting concerns like the truthiness axes. See narrow.
Lattice. The set of types ordered by $\mathrel{<:}$, equipped with $\sqcap$ and $\sqcup$ as meet and join, with $\bot$ at the bottom and $\top$ at the top. The PHP type lattice is not a complete lattice in the strict mathematical sense — there are infinite ascending chains in literal-int ranges, for example — but the operations behave as a lattice does on the cases that matter.
Variance. The relationship between a generic parameter and the substitution direction. Covariant: a subtype of T in a subtype of Box<T>. Contravariant: a supertype of T. Invariant: only T itself. See variance.
Substitution. Replacing a template parameter with a concrete type, capture-free. Written $\sigma[T \mapsto \rho]$. See substitute.
Inference (standin). A round of bound-collection: walking a parameter type against an argument type and recording, for each free template, the bounds the argument imposes. Repeated inference rounds against multiple call-site arguments accumulate into a template environment $\Theta$. See standin.
Inferred-replacement. Applying a fully-determined template environment to a type to produce the inferred concrete result. Sometimes called "infer". See standin.
Inheritance specialisation. Computing what a descendant supplies to an ancestor's template parameter, given the descendant's instantiation. See specialise.
Expansion. Resolving the non-structural forms — aliases, references, member references, global references, conditionals, derived types — into structural ones. The lattice operations work on already-expanded inputs in most cases; expansion is the analyzer's responsibility. See expand.
Wrapper. A composite Element that contains other Elements: a negation (set complement) and an intersection (head plus a list of conjuncts). Wrappers exist in the Element universe; the union itself is in the Type universe. See wrappers.
Refinement axis. A boolean or low-cardinality property carried on top of an Element. The narrowed-mixed kind carries non-null, truthiness, empty, and isset-from-loop axes. The string kind carries casing, numeric-shape, non-empty axes. Axes never multiply the element count — they are bits on the same Element. See refinement axes.
Coercion edge. A non-subtype, non-overlap relationship that PHP's runtime allows on the parameter boundary. The classic example is int flowing into a float parameter. The lattice records the use of a coercion edge so the analyzer can warn about it.
Sealed. A keyed-array, list, or object shape is sealed when it asserts no entries beyond those it lists. Unsealed shapes admit additional entries.
Defining entity. The class-like, function, method, or closure that introduces a template parameter. Every template parameter is keyed by (name, defining_entity) so substitutions know which parameters they apply to.
World. The analyzer's codebase model, exposed to suffete as the source of class hierarchies, declared methods, declared properties, template parameter bounds. Suffete itself stores none of this.
See also: Quick tour for the notation in action, and Element kinds for the exhaustive list of element-kind names referenced throughout the book.
Elements: the indivisible types
Every type in suffete is a finite union of Elements. An Element is the smallest unit a type is built from — a single integer, the unconstrained string, the named class Foo, the negation of null. A type with one Element is a singleton; a type with several is a union of those.
The PL-theory term for an Element is atom. The two terms are interchangeable; the book uses "Element" to match the code. (See the glossary and the introduction note on naming.)
The split
A type lives at two levels:
- Element — one indivisible piece.
intis an Element.nullis an Element.list<int>is an Element.Foo & Baris an Element (an intersection wrapping a head and a conjunct). - Type — a union of Elements, plus a small bag of flow-flag metadata.
int|stringis a type with two Elements.list<int>is a type with one Element.
Singletons are still types: int (the type) is a one-element type containing the int Element. The Type/Element distinction is what lets the union machinery be straightforwardly set-of-elements without every operation having to handle "what if this is one element" specially.
graph TD
T["Type<br/>(union of Elements)"]
T --> E1["Element #1"]
T --> E2["Element #2"]
T --> E3["Element #3<br/>(may itself wrap further<br/>Elements via negation/intersection)"]
Element kinds, by family
The universe groups into families. The next chapters of Part II cover each family in detail; this section is a one-screen index.
Scalars
The PHP scalar primitives, plus their unions and refinements.
int,float,string,bool,true,falsenumeric(the disjoint unionint | float | numeric-string)scalar(the disjoint unionbool | int | float | string)array-key(the disjoint unionint | string)class-string,interface-string,enum-string,trait-string
See scalars and class-like strings.
Special elements
The well-known landmarks that are not values themselves but anchors in the lattice.
null(the valuenull)void(the absence of a return; falsy and null-equivalent at value sites — avoidfunction returnsnullto its caller, sovoidcollapses tonullin any non-degenerate union)never(the empty type, $\bot$ — a function that never returns has return typenever)mixed(the unconstrained universe top, $\top$, plus its narrowed variants carrying truthiness/non-null/non-empty axes)placeholder(an inference hole)
See special elements.
Object family
The forms that describe an object reference at runtime.
- Named classes (
Foo,Foo<int>, with optionalstatic/$thismodality) - The unconstrained
object - Enums (whole
Status, or a single caseStatus::Active) - Structural shapes (
object{name: string, age: int}) has-method<m>andhas-property<p>predicates
See objects, enums, and structural object types.
Array / list family
PHP's two array views, in their generic and shape forms.
- Keyed arrays:
array<K, V>, sealedarray{a: int, ...}, the emptyarray{} - Lists:
list<T>,non-empty-list<T>, sealedlist{0: int, ...}
See arrays and lists.
Iterable / callable
iterable<K, V>callable, plus signatures and closures
Resources
resource,open-resource,closed-resource, plus optional named kinds (resource<curl>)
See resources.
Wrappers
The two combinators that contain other Elements.
- Negation: $\neg \tau$
- Intersection: $H \sqcap C_1 \sqcap \dots \sqcap C_n$
See wrappers.
Unresolved elements
Forms that name a future resolution rather than carrying its result. Lattice operations on these route through expansion.
- Aliases (a type alias, name-only)
- References (
Treferenced by name, before binding) - Member references (
Foo::T) - Global references
- Conditional types (
T extends U ? X : Y) - Derived types (
key-of<T>,value-of<T>,T[K], ...) - Inference variables (analyser-introduced placeholders)
See unresolved elements.
Generics
- Free template parameters (
Tinside a generic class declaration, with its constraint, defining entity, and qualifier)
Closure
The Element universe is closed under the lattice operations in the following sense: every well-formed application of meet, join, subtract, or narrow to two types produces another well-formed type. The result may be $\bot$ (never) or $\top$ (mixed), or it may be a wrapper (negation / intersection) — but it is always a type that can itself be the input to further operations.
The unresolved kinds break this closure on the lattice side: the lattice is not directly defined on aliases, references, conditionals, etc. The contract is that those are expanded out before the lattice is asked anything important. See expand for the operation, and unresolved elements for the kinds.
A worked example
The PHP type:
non-empty-list<int>|null
is a type with two Elements: the non-empty-list<int> element and the null element. The order in which they were constructed is irrelevant; the universe canonicalises unions so that any two paths to the same value-set produce the same type.
See also: Element kinds for the exhaustive list with one-line descriptions; Constructing types for the API used to build them.
Scalars
The scalar family is PHP's primitive value space: integers, floats, strings, booleans, plus the cross-cutting unions numeric, scalar, and array-key. Every scalar admits refinement: a literal value, a bounded range, a string with declared casing or non-empty flag, a boolean specialised to true or false.
Integers
int is the family for some integer. Four refinement levels:
| PHP-side | Denotes |
|---|---|
int | Any integer. |
literal-int | Some integer literal (the value is unknown, but the analyser knows it is a literal, not a computed result). |
int(7), 7 | The single integer 7. |
int<0, 100>, int<-∞, 0>, int<0, ∞> | A closed integer range (bounds may be open on either side). |
Subtyping among int variants
graph BT
L7["int(7)"] --> R0_10["int<0, 10>"]
L7 --> UL["literal-int"]
R0_10 --> Pos["int<0, ∞>"]
R0_10 --> U["int"]
Pos --> U
UL --> U
int(n)refinesint<lo, hi>iff $n \in [lo, hi]$.int<lo', hi'>refinesint<lo, hi>iff $[lo', hi'] \subseteq [lo, hi]$.- Every
intvariant refines plainint. int(n)refinesliteral-int;literal-intrefinesint.
Range arithmetic
The lattice operations on ranges are interval arithmetic with optional infinite bounds. Meet of two ranges is their intersection (becomes never if disjoint); join is the smallest range containing both. Ranges of one literal collapse to a literal; ranges that grow past a threshold widen to int.
Floats
Three refinement levels:
| PHP-side | Denotes |
|---|---|
float | Any float. |
float(3.14), 3.14 | The single float literal. 0.0 and -0.0 are distinct under PHP's identity comparison. |
| (analyser-internal) | A float that is provably non-zero. Used by the truthiness machinery rather than appearing in user-written types. |
There is no PHP-side range form for floats — the precision questions involved (denormals, NaN, signed zero) make ranges more trouble than they are worth.
Strings
The richest scalar. A string carries a literal slot plus a small bag of refinement flags:
| Flag | PHP-side | Meaning |
|---|---|---|
non-empty | non-empty-string | The string is at least one character. |
truthy | truthy-string | Truthy at runtime (excludes "" and "0"). |
lowercase | lowercase-string | Every character is lowercase. |
uppercase | uppercase-string | Every character is uppercase. |
numeric | numeric-string | Parses as an integer or float. |
These are axes — combinations are valid: non-empty-lowercase-string carries both bits.
Subtyping among string variants
graph BT
L["literal 'foo'"] --> NEL["non-empty-lowercase-string"]
NEL --> NES["non-empty-string"]
NEL --> LS["lowercase-string"]
NES --> S["string"]
LS --> S
- A literal refines an unspecified string with flags iff the literal value satisfies every flag (e.g. literal
"foo"refinesnon-empty-lowercase-string). - Unspecified-with-flags $F'$ refines unspecified-with-flags $F$ iff $F \subseteq F'$ (more flags = stricter).
truthyimpliesnon-empty(any non-""non-"0"string is both).
Numeric strings
numeric-string is the dominator that takes a string into the same family as int and float. It has two refining ancestors: string (every numeric-string is a string) and numeric (every numeric-string is numeric). The lattice handles both directions.
Booleans
Three landmarks:
| PHP-side | Denotes |
|---|---|
true | The value true. |
false | The value false. |
bool | Either of true or false. |
Subtyping:
truerefinesbool.falserefinesbool.trueandfalseare disjoint.
The lattice canonicalises true | false to bool (join), and decomposes bool into true | false when subtracting (e.g. bool \ true = false).
The cross-cutting unions
Three forms are true unions — disjoint covers over multiple primitive families, treated as a single name for compactness, and decomposed by the lattice when a question requires it.
scalar
$$\mathit{scalar} = \mathit{bool} \cup \mathit{int} \cup \mathit{float} \cup \mathit{string}$$
Every bool, int, float, or string refines scalar.
numeric
$$\mathit{numeric} = \mathit{int} \cup \mathit{float} \cup \mathit{numeric{-}string}$$
Same dominator behaviour as scalar, but over the smaller cover. numeric is not a subset of scalar directly — they overlap on int and float but numeric-string is a refinement of string rather than a direct member of scalar.
array-key
$$\mathit{array{-}key} = \mathit{int} \cup \mathit{string}$$
Used in PHP's array keys and as the bound on most generic key parameters. Subsumes every int and string variant.
A worked subtype query
function f(int|string $x): bool { /* ... */ }
f(7); // OK: int(7) <: int|string
f("hello"); // OK: literal "hello" <: int|string
f(3.14); // FAIL: float is disjoint from int|string
See also: Refinement axes for the detail on string axes and the analogous mixed axes; class-like strings for the strings that name classes; refines for the subtype rules.
Special elements
A handful of Element kinds are not values in the ordinary sense; they are landmarks. They mark the bottom and top of the lattice, the absence of a value, the presence of a value that has not yet been resolved, and the universe top with optional axis refinements layered on top. They have outsized impact on the lattice rules, so this chapter treats them in one place.
never ($\bot$)
The empty type. No values inhabit never. PHP-side: the return type of a function that never returns (always throws or exits, an infinite loop).
- The bottom of the lattice. Refines every type.
- Disjoint from every type including itself — there is no value that inhabits
never, so $\mathit{never} \sqcap \mathit{never} = \bot$ and $\mathit{overlaps}(\mathit{never}, \mathit{never}) = \mathit{false}$. - $\mathit{never} \sqcup \tau = \tau$ for every $\tau$. The join unit.
- $\mathit{never} \sqcap \tau = \mathit{never}$ for every $\tau$. The meet annihilator.
mixed ($\top$)
The universal top. Some value, no constraints. PHP-side: mixed. It is the type every value-bearing type refines.
mixed carries optional refinement axes that narrow it without changing its kind:
is_non_null— the value is notnull. PHP-side:non-null mixed.truthiness—Truthy,Falsy, or undetermined. PHP-side:truthy mixed/falsy mixed.is_empty— analyser-internal marker for "the value isfalsy".is_isset_from_loop— analyser-internal marker for "this value flowed through a loop body".
A narrowed mixed (one or more axes set) sits between vanilla mixed and the kinds that imply the axis. non-null mixed sits above every non-null-bearing kind.
graph BT
Int[int] --> NN["non-null mixed"]
Str[string] --> NN
True_[true] --> Truthy["truthy mixed"]
Truthy --> NN
NN --> Mixed[mixed]
Null --> Mixed
Void --> Mixed
The short version of the rules: a type refines mixed-with-axes iff every axis the container constrains is implied by the input. int refines non-null mixed because int is structurally non-null. string refines truthy mixed only when the string is also known truthy.
null
The single value null.
- Falsy. Disjoint from every kind that asserts non-null (most kinds —
int,string,Foo, etc. — andnon-null mixed). - $\mathit{null} \sqcup \tau$ produces a nullable type when $\tau$ does not already admit
null. - The lattice's
narrowmachinery recognisesis_null($x)/!is_null($x)assertions specifically.
void
The absence of a return value. PHP-side: a function declared with return type void. The runtime returns the value null from such a function, so at every value-flow site void is observationally null ; the distinction is purely a declarative constraint on the function body (a : void return forbids return $expr).
- Falsy. Does not refine
non-null mixed(because the runtime value isnull). - Preserved alone: a singleton union containing only
voidround-trips asvoid, so a: voidreturn-type annotation survives interning. - Canonicalised to
nullin any non-degenerate union:void | T(for any value-bearingT) becomesT | null;void | nullbecomesnull;void | neverbecomesvoid(theneveris dropped first, leavingvoidalone). This matches PHP runtime semantics: callers of avoidfunction seenull, never a phantomvoidvalue.
In the original mago bug that prompted this rule, a match arm produced true | void and the analyser collapsed it to true, dropping the null branch entirely. The correct answer is true | null, which is what the lattice now produces.
placeholder
The inference-time hole. Marks a position where the analyser knows a type will exist but has not yet committed to one. Conceptually _ in TypeScript or ? in some inference contexts.
- The lattice treats
placeholderas $\top$ for the purposes ofrefinesandoverlaps; everything refinesplaceholderandplaceholderrefines everything. This keeps a partially-inferred type from blocking the analysis until inference completes. - Should not appear in a finalised type. If you see it after expansion, the analyser failed to commit.
Summary
| Element | Inhabited? | Position in lattice |
|---|---|---|
never | no | $\bot$ |
mixed (vanilla) | yes (every value) | $\top$ |
mixed (narrowed) | yes | between $\top$ and the kinds that imply the axis |
null | yes (one value) | strict subtype of mixed |
void | yes (the runtime value null) | strict subtype of mixed; preserved alone, canonicalised to null in any union of length > 1 |
placeholder | by convention, treated as $\top$ | inference-only |
Why mixed carries axes instead of being wrapped
A natural alternative would be to express narrowed mixed as mixed & non-null ; an Intersected of mixed and a hypothetical NonNull element. Suffete instead puts the axes directly on mixed for two reasons:
- No rule duplication. With axes attached to the type, the
mixed-family rules are local. With a wrapper, every other type would need to know how to interact with that wrapper. - Compactness.
non-null truthy mixedis one type carrying two bits, not a chain of three nested wrappings.
The trade-off is that mixed has special rules in the lattice. Those rules are tested against the algebraic-law battery so the elaboration does not introduce soundness drift.
See also: Refinement axes for the full semantics of the mixed and string axes; refines for the subtype rules; glossary for the relation $\bot$ to
neverand $\top$ tomixed.
Objects, enums, and structural object types
PHP objects can be referenced six ways. Five describe a runtime object; one (has-method/has-property) describes only a structural property the runtime object must have.
| PHP-side | Denotes |
|---|---|
Foo, Foo<int>, Foo & Bar | A named class, with optional template arguments. |
object | Any object, no class commitment. |
Status, Status::Active | A backed or pure enum, possibly narrowed to a single case. |
object{name: string, age: int} | A structural object: any object that has these properties. |
has-method<save> | Anything that declares method save. |
has-property<id> | Anything that declares property id. |
This chapter describes what each means. The lattice rules (subtyping, meet, etc.) live in refines and the operation chapters.
Named classes (Foo)
A reference to a specific class, optionally instantiated with template arguments and/or marked with a late-static modality (static, $this).
Type arguments
When a class is generic (@template T), the user supplies the arguments at the use site. If the class declares more parameters than the user supplies, the missing ones are filled with their declared upper bounds and marked as default-filled so downstream variance checks know to skip them. If the user supplies more than the class declares, the surplus is dropped (truncated to the declared arity). This collapses Box<int, string> and Box<int> to the same form when Box declares one parameter.
Modality flags
Three flags govern the late-static modality:
is_this—$this(the receiver). Strongest.is_static— the late-static class. Weaker thanis_this.remapped_parameters— analyser-internal marker for an inheritance-binding remap.
is_this implies is_static. The modality check at refinement time:
- A container marked
is_thisaccepts onlyis_thisinputs. - A container marked
is_staticacceptsis_staticoris_thisinputs. - A plain
Fooaccepts any of the above.
Intersections
Foo & Bar is expressed via the Intersected wrapper, with Foo as head and Bar as a conjunct.
object (the unconstrained form)
The "some object, class not committed". Used when the analyser knows the value is an object but has not pinned down a class. Refines mixed; refined by every named class, enum, structural shape, and has-* form.
Enums
PHP enums can be backed (enum Status: string { case Active = 'active'; ... }) or pure (enum Color { case Red; ... }). The type system models them with two forms:
Status— the whole enum (any case).Status::Active— the single case.
A vanilla Status is the join of all its cases. A specific case is a strict subtype of the enum. Two cases of the same enum are disjoint.
An enum case behaves structurally like an object shape with a name property (always present) and a value property (present only on backed enums). The lattice synthesises that shape on demand, so Status::Active refines object{name: string} automatically.
Structural object types (object{...})
A structural object type is a list of declared properties (each with a name, a type, and an optional flag) plus a sealed bit.
Sealed vs unsealed
- An unsealed shape
object{name: string, ...}is satisfied by any object that has at least the listed properties (with refining types). Extra properties are allowed. - A sealed shape
object{name: string}is satisfied by an object that has exactly those properties.
The sealed-vs-sealed rule mirrors the keyed-array rule (see arrays): every required key in the container must be present in the input shape with a refining value, and the input cannot introduce required keys the container does not list.
Optional properties
A property can be marked optional (name?: string). Required is a strict subtype of optional ; the input is more committal.
Intersections
object{x: int} & has-method<draw> is the Intersected wrapper with the shape as head and has-method<draw> as conjunct.
has-method<m>
PHP-side: has-method<m>. Denotes "anything that declares a method called m". No commitment to a specific class.
A refines has-method<m> iff one of:
Aishas-method<m>itself (the same method name).Ais a named classCand the analyser confirmsCdeclares (or inherits)m.Ais an enum and the enum declaresm.
The lattice asks the analyser's codebase model. If it does not know, the conservative answer is "no" ; better to over-reject than to under-reject.
Multiple has-method constraints chain through the Intersected wrapper: has-method<save> & has-method<delete>.
has-property<p>
Symmetric to has-method, with p as the property name. The lattice handles the value property specially for backed enums (see above). Beyond that, the rules are the same.
How the kinds compose
graph BT
HM["has-method<save>"] --> O[object]
HP["has-property<id>"] --> O
Shape["object{x: int}"] --> O
Foo["Foo"] --> O
StatusActive["Status::Active"] --> Status["Status (enum)"]
Status --> O
O --> Mixed[mixed]
Every member of the object family refines object; object refines mixed.
Within the family:
Foorefinesobject{x: int}if the analyser records thatFoohas a propertyxwhose declared type refinesint.Foorefineshas-method<m>if the analyser records thatFoo(or an ancestor) declaresm.object{x: int, y: string}refinesobject{x: int}(sealed-vs-unsealed: extra key allowed).object{x: int}does not refineobject{x: int, y: string}; the input is missingy.
A worked example
function f(Stringable $x): string {
return $x->__toString();
}
class Foo {
public function __toString(): string { return 'foo'; }
}
f(new Foo()); // OK if Foo implements Stringable
The parameter is the named class Stringable. The argument is the named class Foo. The lattice asks the analyser's codebase model: does Foo descend from Stringable? If yes, the call type-checks.
A subtler example with a structural type:
/**
* @param object{handle: resource} $obj
*/
function process($obj): void { /* ... */ }
The parameter is a structural object{handle: resource}. Passing a Database instance succeeds iff the analyser confirms Database::handle exists with a type that refines resource.
See also: refines for the full subtype rules across the object family; wrappers for the
Intersectedrepresentation.
Arrays and lists
PHP has one runtime collection type — the array — but the type system distinguishes two views of it: keyed arrays (the general form, with key and value parameters) and lists (int-keyed, contiguous, with a single element type). Both views support generic and shape forms. Shapes can be sealed or unsealed.
| PHP-side | Denotes |
|---|---|
array<K, V>, array{a: int, b?: string, ...<K, V>}, array{} | Keyed array. |
list<T>, non-empty-list<T>, list{0: int, 1: string, ...<T>} | Int-keyed contiguous list. |
This chapter describes what each means. The lattice rules for subtyping, meet, join, and so on are in refines, meet, and join.
Keyed arrays
A keyed array can be in one of four states:
| State | PHP-side | Notes |
|---|---|---|
| Generic | array<K, V>, non-empty-array<K, V> | K and V are type parameters. Any entries allowed, all keys constrained by K, all values by V. |
| Unsealed shape | array{a: int, ...<K, V>} | Specific known entries plus a rest type for unknown entries. The bare ... is shorthand for ...<array-key, mixed>. |
| Sealed shape | array{a: int} | Specific entries; no extras. |
| Empty | array{} | The single empty array. |
The unsealed shape's rest parameters constrain the additional entries beyond the listed ones ; they say nothing about the listed entries' types. So array{a: int, ...<string, bool>} means: the entry a is an int, and any other entries have string keys and bool values. The bare ... form is the common case where the user does not want to constrain the extras and is exactly equivalent to ...<array-key, mixed>.
Known entries
A shape's entries each carry:
- A key (an integer literal like
0, a string literal like"name", or a class constant likeFoo::KEY). - A value type.
- An
optionalflag (PHP?:array{name?: string}).
The lattice's refines chapter has the full required-vs-optional rules.
Sealed vs unsealed
A sealed shape commits to having no entries beyond those listed. An unsealed shape allows additional entries, with their key and value types constrained by the rest parameters (...<K, V>, defaulting to ...<array-key, mixed>). The lattice consequences:
array{a: int, b: string}(sealed) refinesarray{a: int}(sealed) only if the input has exactly the keys the container does (extras are forbidden).array{a: int, b: string}(sealed) refinesarray{a: int, ...}(unsealed) by ignoring the extras.array{a: int, b: string}refinesarray{a: int, ...<string, bool>}only if every extra entry beyondahas astringkey and aboolvalue ; hereb: stringhas a string key but astringvalue, so it fails the rest constraint.array{}refines every keyed-array container that admits empty (i.e. is notnon-empty-array).
non-empty-array
The non-empty axis asserts that the array has at least one entry. A shape with at least one required entry is automatically non-empty; the explicit form exists for the generic case.
Lists
A list is structurally an int-keyed array with contiguous keys starting at 0. Suffete carries it as a separate family because PHP analysers commonly distinguish them and many operations are sharper in list-shape (the keys are positional, not arbitrary).
| State | PHP-side | Notes |
|---|---|---|
| Generic | list<T>, non-empty-list<T> | Single element type. Any number of entries allowed, all values constrained by T. |
| Unsealed shape | list{0: int, ...<T>} | Known prefix plus a rest element type. The bare list{0: int, ...} is shorthand for ...<mixed>. |
| Sealed shape | list{0: int, 1: string} | Fixed-shape list, no extras. |
| Empty | list{} | The single empty list (equivalent to array{} viewed as a list). |
Known elements
Indices start at 0; required entries must be contiguous from 0. Optional entries can be skipped only at the tail (PHP supports [int, int, ?string] but not [int, ?int, int]).
Cross-family relationships
list<int>refinesarray<int, int>; lists are int-keyed arrays.non-empty-list<int>refineslist<int>; non-empty is a strict refinement.list{0: int, 1: string}(sealed) refineslist<int|string>; sealed-vs-generic with element-type covering all known types.- An empty sealed list (
list{}) is uninhabited if combined withnon-empty.
How keyed arrays and lists relate
graph BT
SealList["list{0: int, 1: int}"] --> NonEmptyList["non-empty-list<int>"]
NonEmptyList --> List["list<int>"]
List --> KArray["array<int, int>"]
SealedKArray["array{0: int, 1: int}"] --> KArray
KArray --> Empty["array<mixed, mixed>"]
A list refines a keyed-array of compatible parameters when the container's key parameter accepts int and the value parameter accepts the list's element type.
A worked example
/**
* @param array{name: string, age: int} $u
*/
function f(array $u): void { /* ... */ }
f(['name' => 'Hannibal', 'age' => 24]); // OK: matches the shape
f(['name' => 'Hannibal']); // FAIL: missing required 'age'
f(['name' => 'Hannibal', 'age' => 24, 'rank' => 'general']); // FAIL: extra key, sealed
The parameter is a sealed keyed-array shape; the argument types are inferred (or user-annotated) sealed shapes. The sealed-vs-sealed rule fires and decides each case.
Intersections
array<K, V> & SomeConjunct and list<T> & SomeConjunct are expressed via the Intersected wrapper.
See also: refines for the per-pair subtype rules; iterables and callables for how arrays relate to
iterable<K, V>; meet for sealed-shape intersection.
Iterables and callables
Two PHP types that are not values themselves but commitments about what a value supports: iterable<K, V> says "this can be foreached over", and callable says "this can be invoked".
iterable<K, V>
PHP-side: iterable<K, V>. Denotes any value that can be iterated with foreach: an array, a Traversable, a generator, an IteratorAggregate, and so on.
iterable is its own family because array <: iterable and Traversable <: iterable both hold, but iterable does not commute with arbitrary classes; it must be tracked as its own thing.
Subtyping
iterable is the supertype of every array, list, and Traversable (or descendant). The variance is covariant on K and V: iterable<int, Foo> refines iterable<int, object>.
The lattice handles:
array<K, V>refinesiterable<K, V>; arrays are iterable, with the same key/value parameters.list<T>refinesiterable<int, T>; lists key by int.Traversable<K, V>(or any descendant) refinesiterable<K, V>; the analyser supplies the variance via the codebase model.
Note that iterable sits above Traversable<K, V> ; not all iterables are traversables (an array is iterable but not a Traversable).
Intersections
iterable<K, V> & Countable is expressed via the Intersected wrapper.
callable
PHP-side: callable, plus the more refined callable(int): string, Closure(int): string, and so on.
Three variants:
- Bare
callable— no signature commitment. Any callable: a function name string, aClosure, an[$obj, 'method']array, an__invoke-bearing object. callable(...): T— the open form with a declared signature. Includes any of the above whose effective signature refines the declared one.Closure(...): T— strictly theClosureclass instance with the declared signature.
Signatures
A signature has parameters, a return type, and a throws clause:
- Parameters: each one carries a type, plus
optional/variadic/by_referenceflags. The optionalnameis for diagnostic alignment; it does not affect typing. - Return type: covariant.
- Throws: declared exception types (PHPDoc).
Variance
The standard rules for callables:
- Return type is covariant. A callable returning
Foorefines a callable returningBariffFoorefinesBar. - Parameter types are contravariant. A callable taking
Barrefines a callable takingFooiffFoorefinesBar. - Optional / variadic / by-reference must align in the obvious way.
How iterable relates to array and list
A partial order:
graph BT
SealedList["list{0: int}"] --> List["list<int>"]
SealedArray["array{0: int}"] --> KArr["array<int, int>"]
List --> KArr
KArr --> Iter["iterable<int, int>"]
A list refines a keyed-array of compatible parameters; a keyed-array refines an iterable of compatible parameters.
See also: arrays and lists; refines for the variance rules.
Resources
PHP's resource is a runtime handle to an external resource: a file, a database connection, a stream, a curl handle. The type system models this as the resource family, with optional refinements for the resource's open-or-closed state and its named kind.
The two axes
A resource carries two refinement axes:
State
Three values:
- Unspecified — the analyser has no knowledge of openness. PHP-side:
resource. - Open — the resource is open. PHP-side:
open-resource. - Closed — the resource is closed. PHP-side:
closed-resource.
Open and Closed are disjoint. Unspecified is the join.
The state tracks the result of fclose($f) and similar calls in the analyser: after fclose($f), the analyser narrows $f from open-resource to closed-resource.
Kind
An optional named kind — resource<curl>, resource<gd image>, resource<stream>. The name corresponds to PHP's get_resource_type($r) return value.
A resource without a named kind admits any kind. With a name, it admits only that specific kind.
Subtyping
graph BT
OpenCurl["open-resource<curl>"] --> Open["open-resource"]
OpenCurl --> Curl["resource<curl>"]
Open --> R[resource]
Curl --> R
Closed["closed-resource"] --> R
OpenrefinesUnspecified;ClosedrefinesUnspecified; both refine the unrefined.- A named kind refines the unnamed;
open-resource<curl>(state =Open, kind =curl) refines bothopen-resourceandresource<curl>.
A worked example
/**
* @param open-resource $r
*/
function read($r): string { /* ... */ }
$f = fopen('/tmp/x', 'r'); // open-resource<stream>
read($f); // OK: open-resource<stream> <: open-resource
fclose($f); // narrows $f to closed-resource<stream>
read($f); // analyser warns: passed a closed resource
The lattice answers each refines query straightforwardly. The narrowing on fclose is the narrow operation with an analyser-supplied assertion that the resource is now Closed.
See also: refines for the per-pair rules; narrow for state-flip narrowing.
Class-like strings
A class-like string is a string value whose runtime content is the fully-qualified name of a class, interface, trait, or enum. PHP-side: class-string, class-string<Foo>, interface-string, enum-string, trait-string. The type system uses these to track the connection between a string-as-name and the class it refers to.
The two axes
Every class-like string has two axes:
Kind
Five values, one per family:
class— only class names (excluding interfaces, traits, enums).interface— only interface names.trait— only trait names.enum— only enum names.unspecified— any of the above.
unspecified is the join of the others. The lattice's refines chapter has the per-pair rules.
Specifier
How specific the class-name is:
- Unspecified — the
class-stringform, no commitment to a particular class. - Literal — the
class-string<Foo>form, where the FQN is known. - Of a type —
class-string<T>whereTis some bounded form (the analyser substitutes through this whenTis bound). - Generic — the
::classlookup on a generic parameter, before the parameter is substituted.
Subtyping
A class-like string refines another by both axes:
graph BT
L["class-string<App\\Foo>"] --> ClassUnspec["class-string"]
ClassUnspec --> AnyUnspec["any class-like-string"]
InterfaceUnspec["interface-string"] --> AnyUnspec
AnyUnspec --> Str[string]
- The kind axis:
class$\mathrel{<:}$unspecified,interface$\mathrel{<:}$unspecified, etc. Different specific kinds are disjoint (a class-string is not an interface-string). - The specifier axis: a literal
class-string<Foo>refines a literalclass-string<F>iff the analyser confirmsFoodescends fromF.
The cross-cutting check is that a class-like string is also a string ; every class-like string refines string.
Why a separate form
PHP's class-string<Foo> is not just "a string that happens to spell Foo"; it carries the analyser's knowledge that the string can be passed to new, used in is_a(), paired with instanceof, and so on. Modelling it as a refinement of a plain string would lose the connection. Modelling it as its own form preserves the connection and lets the lattice apply class-hierarchy reasoning where needed.
A worked example
/** @param class-string<Throwable> $cls */
function throw_named(string $cls): never {
throw new $cls('boom');
}
throw_named(RuntimeException::class); // OK
throw_named(stdClass::class); // FAIL: stdClass does not descend from Throwable
The parameter is class-string<Throwable>. The arguments are class-string<RuntimeException> and class-string<stdClass>.
The lattice asks the analyser: does RuntimeException descend from Throwable? (yes). Does stdClass? (no). It returns the matching answer.
Crossing into string
A class-string<Foo> refines string directly. A string does not refine class-string<Foo> — most strings are not class names.
The other direction matters for narrowing: an analyser observing class_exists($s) returning true can narrow $s from string down to class-string. The narrow operation handles this when the analyser supplies the assertion.
See also: scalars for the
stringfamily; refines for the per-pair rules.
Refinement axes
A refinement axis is a boolean (or low-cardinality) property carried directly on a type, not as a separate type. The lattice treats axes as bits on the same form: setting an axis tightens the type without changing what it is.
Axes are how suffete avoids exploding the universe. There is no separate "non-null mixed" form distinct from mixed ; mixed carries a non-null axis. There is no separate "non-empty string" form distinct from string ; string carries a non-empty axis. The lattice rules work over axes by checking that every axis the container constrains is implied by the input.
The general rule
Given a form $F$ that carries an axis $a$, and a container of form $F$ with $a$ set:
- An input of the same form $F$ refines the container iff its $a$ is also set.
- An input of a different form $F'$ refines the container iff $F'$ structurally guarantees $a$.
The "structurally guarantees" half does the work. int structurally guarantees non-null (every integer is not null), so int refines non-null mixed even though int is not mixed.
Axes on mixed
Four axes:
| Axis | PHP-side | Refinement |
|---|---|---|
| non-null | non-null mixed | The value is not null. |
| truthy | truthy mixed | The value is truthy at runtime. |
| falsy | falsy mixed | The value is falsy at runtime. |
| empty | analyser-internal | The value is falsy (paired with the falsy truthiness). |
| isset-from-loop | analyser-internal | The value flowed through a loop body. |
Combinations are valid. A truthy value cannot be null (PHP), so truthy mixed already implies non-null.
Structural implications
What other types satisfy each axis:
- non-null is implied by every form except
null,void, andmixedwithout the non-null axis. Soint,string,Foo,array,list<int>,resource,callableall refinenon-null mixed. - truthy is implied by:
true- any object (objects are truthy in PHP)
- any resource
- non-zero integer literals; integer ranges that exclude 0
- non-zero float literals
- string literals that are not
""or"0" truthy-stringandnon-empty-string(with the truthy axis explicitly set)non-empty-array,non-empty-list
- falsy is implied by:
false,null,void0,0.0(and-0.0)""and"0"- the empty array, the empty list
- empty is the same as falsy for refinement purposes.
- isset-from-loop propagates only on
mixeditself; no other form structurally implies it.
Axes on string
Five axes:
| Axis | PHP-side | Refinement |
|---|---|---|
| non-empty | non-empty-string | Length ≥ 1. |
| truthy | truthy-string | Truthy at runtime (excludes "" and "0"). |
| lowercase | lowercase-string | Every character is lowercase. |
| uppercase | uppercase-string | Every character is uppercase. |
| numeric | numeric-string | Parses as int or float. |
Combinations are valid: non-empty-lowercase-string carries both axes.
Refinement direction: more axes = stricter. non-empty-lowercase-string refines non-empty-string, lowercase-string, and string. The same rule as for mixed: every axis the container constrains must be carried by the input.
Structural implications
A string literal carries every axis it satisfies:
- non-empty iff the literal is not
"". - truthy iff the literal is not
""and not"0". - lowercase iff every character is lowercase.
- uppercase iff every character is uppercase.
- numeric iff the literal parses as an int or float.
Axes on resources
resource carries two refinements:
- state —
Open,Closed, or unspecified. - kind — an optional named kind (
curl,gd image, ...).
Subtyping is the conjunction: state must match (or be unspecified on the container) and kind must match (or be unspecified on the container). See resources.
Axes on objects
A named class carries three modality flags:
$this— the object is exactly the receiver.static— the object is the late-static class.- A class can be analyser-known
final; the universe does not flag this on the type itself but the lattice consults the analyser at refinement time.
$this implies static. The modality check at refinement time:
- A container marked
$thisaccepts only$thisinputs. - A container marked
staticacceptsstaticor$thisinputs. - A plain
Fooaccepts any of the above.
A structural object shape carries sealed as its main axis (see objects).
Axes on arrays and lists
Both kinds carry:
- non-empty — at least one entry.
- sealed — the shape commits to having no entries beyond those listed.
Both follow the standard "more axes = stricter" rule.
Why axes, not wrappers
A natural alternative would be to wrap a type in a refining wrapper: an outer "non-null" wrapping a mixed. Suffete uses axes for two reasons:
- No rule duplication. With axes attached to the type, the rule for "does this axis hold?" lives in one place. With wrappers, every other type would need to know how to interact with the wrapper.
- Compactness. A
non-null truthy lowercase stringis one type carrying three bits, not a chain of three nested wrappings.
The trade-off is that the forms carrying axes have more elaborate rules. Those rules are tested against the algebraic-law battery (laws) so the elaboration does not introduce soundness drift.
A worked example
The PHP type truthy lowercase non-empty-string carries three string axes: non-empty, truthy, and lowercase.
Refinement:
- The literal
"foo"refines it ;"foo"is non-empty, truthy, and lowercase. "FOO"does not — fails lowercase.""does not — fails non-empty and truthy."0"does not — fails truthy.non-empty-string(only one axis) does not — missing lowercase and truthy.lowercase-string(only one axis) does not — missing non-empty and truthy.
See also: scalars for the string family in full; special elements for the mixed axes; refines for the per-axis lattice rules.
Wrappers: negation and intersection
Two forms wrap other types: they exist to compose the universe with set complement and set intersection. They are the universe's two combinators, and they account for most of the lattice's complexity ; whenever the rules need to do something nontrivial, it's usually because a wrapper is involved.
| Form | PHP-side | Denotes |
|---|---|---|
| Negation | !T, !int, !Foo | Set complement: every value not in $T$. |
| Intersection | Foo & Bar & Baz | Set intersection of two or more types. |
Negation
PHP-side: !T. Denotes the set complement of $T$ — every value in the PHP universe except those in $T$.
Subtyping
The standard rules:
- $\mathit{never} \mathrel{<:} \neg \tau$ for every $\tau$ ; never is in every set's complement.
- $\tau \mathrel{<:} \neg \sigma$ iff $\tau \sqcap \sigma \equiv \bot$ (i.e. $\tau$ and $\sigma$ are disjoint).
- $\neg \sigma \mathrel{<:} \tau$ iff $\neg \tau \mathrel{<:} \sigma$ (Boolean duality, when both inner types are well-formed for the rule).
- $\neg \neg \tau \equiv \tau$ ; double negation collapses.
- $\neg \mathit{never} \equiv \top$ (vanilla mixed); $\neg \top \equiv \mathit{never}$.
Negation in narrowing
The most common use of negation is in narrowing. After !is_int($x), the analyser narrows $x by the type !int. The lattice's subtract operation handles this directly.
Intersection
PHP-side: Foo & Bar & Baz. Denotes the set intersection of all parts.
An intersection is represented as a head and a list of conjuncts. Foo & Bar & Baz has Foo as head and [Bar, Baz] as conjuncts. The head is canonicalised so that Foo & Bar and Bar & Foo are the same intersection.
One representation, always
Every PHP intersection is expressed as the same wrapper, regardless of the head's type. Foo & Bar, int & literal-int, array<int, V> & Countable, T & Foo (a generic parameter conjoined with a class) — they all use the wrapper. There is no in-payload conjunct list on individual types.
Subtyping
The standard rules from PL theory:
- $\tau \mathrel{<:} (H \sqcap C_1 \sqcap \dots \sqcap C_n)$ iff $\tau \mathrel{<:} H$ and $\tau \mathrel{<:} C_i$ for every $i$ (intersection on the right is a conjunction of refinements).
- $(H \sqcap C_1 \sqcap \dots \sqcap C_n) \mathrel{<:} \sigma$ iff $H \mathrel{<:} \sigma$ or some $C_i \mathrel{<:} \sigma$ (intersection on the left is a disjunction; some side must do the work).
The asymmetry — intersection on the left is a disjunction, intersection on the right is a conjunction — is the standard "Int-L / Int-R" rule.
Uninhabited intersections
An intersection can be uninhabited — Foo & !Foo is empty. The construction phase detects the trivial cases:
H & !H(a conjunct that negates the head) collapses to $\bot$.- A conjunct that is lattice-disjoint with the head also collapses.
The full uninhabited check is part of overlaps.
A worked example
The PHP type:
Stringable & Countable & !Iterator
is an intersection with Stringable as head, and [Countable, !Iterator] as conjuncts.
A class Foo refines this iff:
- $\mathit{Foo} \mathrel{<:} \mathit{Stringable}$ (the analyser's codebase confirms
FooimplementsStringable). - $\mathit{Foo} \mathrel{<:} \mathit{Countable}$ (similarly for
Countable). - $\mathit{Foo} \mathrel{\#} \mathit{Iterator}$ (
Foodoes not implementIterator).
The lattice asks the codebase for the first two and uses overlaps for the third.
See also: refines for the Int-L / Int-R rules; overlaps for the uninhabited-detection logic; meet for how intersections are constructed during the meet operation; subtract for how negations are introduced.
Unresolved elements
A handful of element kinds name a future resolution rather than carrying its result. They appear in the type universe because the analyser needs a way to express "the type that you'd get if you looked this up", and they get replaced by structural elements through the expansion operation when the lattice needs to reason about them.
If you call refines(τ, σ, ...) with an unresolved element on either side, the contract is that the analyser has already expanded it. The lattice itself does not invoke expansion; it would loop forever or worse if it did. The expand chapter has the rules for when and how the analyser does this.
| Form | PHP-side | Denotes |
|---|---|---|
| Alias | /** @type UserId = int */ | A user-introduced name for a type. |
| Reference | T (a referenced template before binding) | A name that refers to some declared template parameter or alias. |
| Member reference | Foo::T, self::T | A reference to a class-member type. |
| Global reference | T declared at the file level | A reference to a global type variable. |
| Conditional | T extends U ? X : Y | A conditional type, resolved by checking subject vs target. |
| Derived | key-of<T>, value-of<T>, T[K], ... | A type derived from another by a transformation. |
| Variable | analyser-introduced | A placeholder for an inferred-but-not-yet-pinned type. |
Aliases
A user-introduced name for a type, declared at the package or class level:
/** @type UserId = int */
Aliases are nominal: two distinct aliases with identical underlying types are not equal. UserId and int are not the same alias even when UserId is declared as int. Expansion turns them into the same structural type, but the alias preserves the name for diagnostics and refactoring.
References
A reference is a name that has not yet been bound to a specific declaration. Used during PHPDoc parsing and resolution, before the analyser has determined whether the name refers to a class, an alias, a template parameter, or something else.
Expansion resolves the name through the analyser's symbol table:
- If the name is a class-like, becomes a named class.
- If the name is an alias, becomes the alias's underlying type.
- If the name is a template parameter, becomes a generic parameter.
- If the name is unknown, expansion fails and the analyser surfaces a diagnostic.
References can carry their own type arguments: Foo<int> is one reference. Intersections like Foo<int> & Bar use the Intersected wrapper.
Member references
A reference to a type declared as a member of another type:
/** @return self::ItemType */
Expansion looks up the name on the resolved class via the analyser's codebase model, substitutes the class's template environment, and returns the resulting type.
Global references
A reference to a global type variable (rare, but used in some PHPDoc dialects):
/** @global TItem $item */
Expansion looks the name up in the analyser's global type variable table.
Conditional types
PHP-side: T extends U ? X : Y. Has four pieces: a subject, a target, a then branch, and an otherwise branch.
Expansion checks the subject <: target test under the current template environment. If true, the result is then; otherwise, the result is otherwise.
Conditional types are most useful inside generic declarations:
/**
* @template T
* @return ($t extends int ? string : bool)
*/
function f($t) { ... }
After substitution T := int(7), the conditional resolves to string.
Derived types
A family of derived-type forms, each describing a transformation:
key-of<T>— the keys ofTif it's an array, list, or shape.key-of<array{a: int, b: string}>expands to'a' | 'b'.value-of<T>— the values ofT.value-of<array{a: int, b: string}>expands toint | string.T[K]— index access; the type of accessing keyKonT. Used for shape lookups.properties-of<T>— the properties ofTas a structural shape.int-mask-of<T>/int-mask<L>— bitmask types, for flags.new<T>— the type ofnew T(...); useful whenTis a class-string.- template-of-class — the type of template parameter
Pon classCas instantiated on objectO. Resolved by specialise.
Expansion routes each variant to the matching transformation, which itself may produce more unresolved kinds (for example, key-of<T> where T is itself a reference).
Inference variables
The analyser's own placeholder for an inferred-but-not-yet-pinned type. Variables are introduced by the analyser during inference and resolved by the analyser when inference completes. The lattice does not see them in finalised types.
Why they exist in the universe
The alternative would be to keep unresolved forms as a separate AST and only build types for resolved cases. Suffete prefers a single universe for two reasons:
- Uniform structure. Every type — resolved or not — is a finite union of elements. Walkers, transforms, and serialisation work on every type without a special case for "is this resolved?".
- Lazy expansion. Resolving every reference at construction time would force the analyser to have a complete codebase model before constructing any type. The unresolved forms let the analyser construct types eagerly and resolve them on demand.
The trade-off is the rule above: lattice operations on unresolved forms are not directly defined. The analyser must expand first.
A worked example
/**
* @template T
* @param T $item
* @return ($T extends list<infer U> ? U : T)
*/
function head($item) { ... }
The return type is a conditional with:
subject=Ttarget=list<U>then=Uotherwise=T
Calling head(['a', 'b']) gives the analyser an argument type list<string>. Inference (see standin) binds T := list<string>, and U := string is bound by the conditional's match.
Expansion then sees the subject <: target test holds and picks the then branch, yielding string.
Without expansion, the lattice cannot see any of this ; the conditional is opaque to it.
See also: expand for the operation that resolves these; generics for the template machinery.
Templates and generic parameters
A template parameter is a placeholder in a generic declaration: T on a class Box, K/V on a class Map, T on a function id. PHP has no native generics syntax; they are introduced by @template T (or @template-covariant, @template-contravariant) in a PHPDoc block.
This chapter covers what a free template parameter is in the type universe. The operations that act on it — substitution, inference, specialisation — are in Part IV.
What a template parameter carries
A template parameter has four pieces of information:
- A name (
T,K,V, ...) — the user-visible identifier. - A defining entity — the class-like, function, method, or closure that declared it. Two parameters with the same name on different entities are different parameters:
Tonclass Boxis distinct fromTonclass Foo. - An upper bound (the constraint) — PHP-side, the
T extends Fooclause. Defaults tomixed. - An optional qualifier — used at the use site for specific forms like
T::class.
The name plus the defining entity together identify the parameter. Substitution, inference, and specialisation all key on this pair.
Subtyping
A free template parameter behaves like its constraint for subtype questions:
- $T \mathrel{<:} \tau$ iff $\mathit{constraint}(T) \mathrel{<:} \tau$.
- $\tau \mathrel{<:} T$ iff $\tau \mathrel{<:} \mathit{constraint}(T)$ ; with caveats (the variance-aware rules in refines).
Two parameters that share the same (name, defining_entity) are the same thing. Two with different defining entities are distinct even when their names match.
How free parameters appear in types
Inside the body of a generic class, a free T shows up wherever it is used:
/**
* @template T
*/
class Box {
/** @var T */
public mixed $value;
/** @return T */
public function get() { return $this->value; }
/** @param T $value */
public function set($value): void { $this->value = $value; }
}
The field $value, the return of get, and the parameter of set all reference the same T (same name, same defining entity).
When the user instantiates Box<int> (in a PHPDoc context such as @var Box<int>), the analyser substitutes every T for (name="T", defining_entity=Box) with int, producing the concrete versions of the field, getter, and setter types.
Free vs bound
A template parameter is free until it is substituted. After substitution, the parameter is gone — replaced with the concrete type.
A type that contains a free template is not "wrong" — it just has not been instantiated yet. The lattice still answers questions about it (using the constraint as the upper bound), but the answers are conservative: $T \mathrel{<:} \tau$ holds iff $\tau$ accepts every value the constraint admits.
Variance
The variance of a parameter is declared at the class level, not on the parameter itself. The analyser registers each class's parameter list with the codebase model; the lattice asks for the declared variance when it needs it during a refines query on instantiated classes.
Variance is one of:
- Covariant ; subtype of
Tproduces subtype ofBox<T>. - Contravariant ; supertype of
Tproduces subtype ofBox<T>. - Invariant ; only the same
Tproduces a refinement.
See the variance chapter for the full rules.
Default-filled parameters
When a class declares a T parameter and the user references the class as plain Box (rather than Box<int>), the analyser fills T with the constraint (mixed by default) and marks the resulting type as default-filled. The marker rides along with the type wherever it is later nested.
The variance check at refinement time consults the marker: a default-filled type-argument is allowed to flow either direction (subject to variance) without producing a strict refinement failure ; the lattice records the use of the default on its report so the analyser can warn about the unpinned position.
A worked example
/**
* @template T
*/
class Box {
/** @var T */
public mixed $value;
}
/** @var Box<int> $ints */
$ints = new Box();
Inside the class body, the field $value has type T (free).
After the analyser sees the @var Box<int> annotation, it builds the instantiated Box. It walks the class's stored field types and substitutes each T with int, producing the concrete field type for this instance.
The T parameter itself is not modified ; substitution is a pure function returning new types.
See also: Substitute for the substitution operation; Variance for how variance is declared and used; Standin for the inference round that binds free parameters.
Subtyping: refines
The subtype relation $\tau \mathrel{<:} \sigma$ is suffete's most important operation. Read it as: every value of type $\tau$ is also a value of type $\sigma$.
Reading the rules
The lattice rules read as a dispatch tree: first, the universal axioms ; then, family-specific rules ; then, fall through to false. The dispatch happens at the Element level, not the Type level: the Type-level rule is just "every Element on the left must refine some Element on the right", with a few cross-cutting collapse rules.
Type-level rule
For Types $\tau, \sigma$:
$$\tau \mathrel{<:} \sigma \quad\iff\quad \forall e \in \tau,;; e \mathrel{<:} \sigma$$
(Read $e \mathrel{<:} \sigma$ as: there exists some Element in $\sigma$ that the Element $e$ refines, OR a fan-out coverage rule fires that lets a single Element be covered by a union of Elements on the right.)
The fan-out rules let the lattice handle cases like int<-∞, 0> refining int<-∞, -1> | int(0) ; no single Element on the right covers it, but the union does. The rules are:
- Int union coverage — an integer range may be covered by a sorted union of integer ranges and literals.
- String union coverage — a string literal is covered by an unspecified string with matching axes, or by another literal.
- List union coverage — a sealed list is covered by a union of unsealed lists with element types covering the known positions.
- Array union coverage — same for keyed arrays.
- Intersected partition coverage — an intersection may be covered by a union if each part of the intersection is.
- Negation partition coverage — a negation of a finite-cover universe (such as
boolor an enum) may be covered by the union of its complement.
These coverage rules are why the rules are more elaborate than the textbook "for-each-on-the-left, exists-on-the-right" rule.
Universal axioms
| Rule | Reading |
|---|---|
| Reflexivity | $a \mathrel{<:} a$ for every Element. |
| Bottom | $\bot \mathrel{<:} a$ for every Element. ($\bot$ has no values; the universal "every value of $\bot$ is a value of $a$" holds vacuously.) |
| Top | $a \mathrel{<:} \top$ for every Element. (Every value is in $\top$.) |
| Placeholder | placeholder refines and is refined by everything (inference convenience; should not appear in finalised types). |
The order is significant: the universal axioms fire first, the family dispatch last.
Family dispatch
Dispatch is by the family of the right-hand side (the container) and then by the family of the left-hand side (the input). One rule set per family (objects, arrays, scalars, etc.) plus the wrappers and unresolved elements.
| Container | Input | Rule |
|---|---|---|
int | int | range / literal subtyping |
float | float | float-family rules |
string | string, class-like-string | string axis subset, literal flags |
| class-like-string | class-like-string | kind + specifier rules |
bool, true, false | bool, true, false | trivial subtype rules |
| object family | object family | named, shape, has-method, etc. |
array, list | array, list | sealed, optional, variance |
iterable | array, list, named class | Traversable etc. |
callable | callable | return covariant, params contravariant |
resource | resource | state + kind axes |
scalar / numeric / array-key | (decomposed) | true-union dominator covers |
| negation | (anything) | the negation rules below |
| intersection | (anything) | the Int-L / Int-R rules below |
mixed | (anything) | narrowed-mixed axis rules |
| generic parameter | (anything) | constraint projection |
| alias / reference / ... | (anything) | unresolved — caller must expand first |
The wrapper rules
Negation on the right
$a \mathrel{<:} \neg b$ iff $a \sqcap b \equiv \bot$ (i.e. $a$ is disjoint from $b$).
Example: int <: !string iff int is disjoint from string ; true.
Negation on the left
$\neg a \mathrel{<:} b$ iff $\neg b \mathrel{<:} a$ (Boolean duality), with caveats. When $a$ has a finite cover (e.g. bool is true | false), the lattice can directly check whether the complement is contained in $b$.
Intersection on the right (Int-R)
$\tau \mathrel{<:} (H \sqcap C_1 \sqcap \dots \sqcap C_n)$ iff $\tau \mathrel{<:} H$ and $\tau \mathrel{<:} C_i$ for every $i$.
Conjunction-of-refinements: the container demands $\tau$ refine every part.
Intersection on the left (Int-L)
$(H \sqcap C_1 \sqcap \dots \sqcap C_n) \mathrel{<:} \sigma$ iff $H \mathrel{<:} \sigma$ or some $C_i \mathrel{<:} \sigma$.
Disjunction-of-refinements: the intersection refines $\sigma$ when some part already does.
The asymmetry is critical: intersection on the left is easier to refine into something (any part can do the work); intersection on the right is harder to refine into (every part must accept).
The mixed-axes rule
If the container is mixed with axes $A$, the input refines it iff every axis in $A$ is implied by the input. The implications are listed in refinement axes.
Example: int <: non-null mixed ; the non-null axis is structurally guaranteed by int.
Generic-parameter projection
Free template parameters are treated as their constraint for the purpose of subtyping.
- Left: $T \mathrel{<:} \sigma$ iff $\mathit{constraint}(T) \mathrel{<:} \sigma$.
- Right: $\tau \mathrel{<:} T$ iff $\tau \mathrel{<:} \mathit{constraint}(T)$ (with extra checks; see variance).
The same parameter on both sides reflexively refines (rule 1, equality). Two parameters with different defining entities are different parameters; whether they refine each other depends on their constraints.
Coercion edges
PHP allows certain non-subtype coercions on parameter boundaries: int can flow into a float parameter, for example. The lattice has a single toggle to admit these on demand:
| Edge | Source | Target |
|---|---|---|
int → float | int parameter | float parameter |
numeric-string → int | string parameter | int parameter (with cast) |
numeric-string → float | string parameter | float parameter (with cast) |
The default is to admit these (matches PHP's loose-types behaviour); analysers in strict-types mode disable them. When a coercion edge fires, it is recorded so the analyser can warn.
Disjointness
refines decides $\tau \mathrel{<:} \sigma$. For the symmetric question "do these two types share any value?", use overlaps.
Visualising the dispatch
flowchart TD
Start[Start: refines a, b] --> Refl{a == b?}
Refl -->|yes| True
Refl -->|no| Top{is b == mixed?}
Top -->|yes| True
Top -->|no| Bot{is a == never?}
Bot -->|yes| True
Bot -->|no| Phold{placeholder on either side?}
Phold -->|yes| True
Phold -->|no| Wrap{a or b a wrapper?}
Wrap -->|negation| NegRule[Apply negation rule]
Wrap -->|intersection| IntRule[Apply Int-L / Int-R]
Wrap -->|no| Generic{a is a generic parameter?}
Generic -->|yes| Project[Recurse on constraint]
Generic -->|no| Family[Family dispatch]
NegRule --> End[Return]
IntRule --> End
Project --> End
Family --> End
True --> End
A worked example
/** Inputs */
$a = int|string;
$b = int|string|null;
$a <: $b ; every Element of int|string (namely int and string) refines some Element of int|string|null. Trivial.
$b <: $a is false ; the null on the left does not refine any Element on the right. No fan-out applies.
See also: overlaps for the symmetric "share a value?" question; meet for the operation that produces a type rather than a boolean; laws for the algebraic identities the implementation is required to satisfy.
Overlap and disjointness
The overlap relation $\tau \mathrel{\#} \sigma$ asks the symmetric question: do $\tau$ and $\sigma$ share at least one value?. Its negation is disjointness.
Overlap holds iff there exists some value $v$ such that $v \in \tau$ and $v \in \sigma$. Disjointness holds iff no such value exists.
Overlap is the test you want when neither side dominates the other. int<0,10> and int<5,15> overlap on the value 7, but neither is a subtype of the other. Refines does not see this; overlap does.
Why both refines and overlaps
Refines is one-directional: $\tau \mathrel{<:} \sigma$ tells you if every value of $\tau$ is also in $\sigma$. Overlap is symmetric: $\tau \mathrel{\#} \sigma$ tells you if any value is in both.
The two are independent:
- $\tau \mathrel{<:} \sigma$ implies $\tau \mathrel{\#} \sigma$ when $\tau$ is inhabited.
- $\tau \mathrel{\#} \sigma$ does not imply $\tau \mathrel{<:} \sigma$ or $\sigma \mathrel{<:} \tau$.
In the analyser, refines answers parameter-vs-argument questions; overlap answers questions like "could this instanceof Foo ever succeed?" (yes iff the variable's type overlaps with Foo).
The Type-level rule
For Types $\tau, \sigma$:
$$\tau \mathrel{\#} \sigma \quad\iff\quad \exists e_1 \in \tau, e_2 \in \sigma ;:; e_1 \mathrel{\#} e_2$$
Some pair of Elements, one from each side, must overlap. The Element-level rule does the work.
The Element-level dispatch
The Element-level rule fires its tests in this order:
neveraxiom — if either side isnever, the answer isfalse.neverhas no values; the empty set overlaps nothing.- Uninhabited check — if either side is uninhabited (e.g.
Foo & int, the empty sealed shape), the answer isfalse. - Reflexivity — identical inhabited Elements overlap.
- Top —
mixedoverlaps every inhabited Element. - Placeholder — overlaps everything by inference convention.
- Generic-parameter projection — a free template parameter overlaps $\sigma$ iff its constraint overlaps $\sigma$.
- Subsumption shortcut — if $a \mathrel{<:} b$ or $b \mathrel{<:} a$, they overlap (one inhabits the other).
- Family-specific overlap rules — last resort.
The order matters. The never axiom must fire before the reflexivity rule, otherwise never would (wrongly) be reported as overlapping itself.
The never axiom
never overlaps with nothing, including itself.
In set-theoretic terms: never is the empty set. The intersection of the empty set with anything is the empty set. So overlap with never is false, always.
Subsumption shortcut
If $a \mathrel{<:} b$ and $a$ is inhabited, then $a$ has some value, and that value is in $b$, so $a \mathrel{\#} b$. The lattice uses this as an early-exit: try refines(a, b) and refines(b, a), and if either holds, return true.
This makes the common cases (one side dominates the other) fast and avoids re-implementing the case analysis the family rules already do.
Family-specific positive rules
When neither subsumption nor the universal axioms fire, the lattice consults the family-specific rules. These add precision in cases where neither side dominates:
- Int ranges:
int<0,10>overlapsint<5,15>because both contain5..=10. - String literals and refinements:
"hello"overlapsnon-empty-string(the literal satisfies the axis). - Class hierarchies:
FoooverlapsBarif the world says they share a descendant, or one is final and structurally satisfies the other. - Iterables and arrays:
iterable<int, int>overlapsarray<int, int>; arrays are iterables. - Narrowed mixed: cross-axis overlap rules.
The family rules are incomplete by design: adding a positive rule never weakens correctness (the relation is monotone in true outcomes). Missing rules cost precision but not soundness ; a downstream narrow returns never instead of a real overlap, and the analyser is conservative rather than wrong.
Uninhabited Elements
Some Elements have no values:
neveritself.- An intersection whose head and some conjunct are disjoint, or whose head is
never, or where two conjuncts are disjoint (e.g.Foo & int). - A sealed object shape with no properties.
Note that array{} (the empty sealed array) is not uninhabited ; it has exactly one inhabitant, the empty array.
A worked example
Consider three integer ranges:
- $r_1 = \texttt{int<0, 10>}$
- $r_2 = \texttt{int<5, 15>}$
- $r_3 = \texttt{int<20, 30>}$
Then:
- $r_1 \mathrel{\#} r_2$ ; they share
5..=10. - $r_1$ does not overlap $r_3$ ; the ranges are disjoint.
Visualising overlap vs refines
graph TD
Int[int]
Range1["int<0, 10>"]
Range2["int<5, 15>"]
Lit5["int(5)"]
Lit20["int(20)"]
Range1 --> Int
Range2 --> Int
Lit5 --> Range1
Lit5 --> Range2
Lit20 --> Int
In this diagram:
int<0,10>overlapsint<5,15>; they shareint(5)..int(10). Neither refines the other.int(5)refinesint<0,10>; subsumption. Also they overlap.int<0,10>does not overlapint(20); disjoint.
A subtle case: empty types overlap with nothing
never does not overlap itself. There are no values in never, so the question "do they share a value?" is vacuously false. This is correct set-theoretically and is what the analyser expects ; an instanceof against an uninhabited type is unreachable, not always-true.
When overlaps and refines disagree
By construction, when both terms are inhabited:
- $\tau \mathrel{<:} \sigma$ implies $\tau \mathrel{\#} \sigma$.
- $\sigma \mathrel{<:} \tau$ implies $\tau \mathrel{\#} \sigma$.
But:
- $\tau \mathrel{\#} \sigma$ does not imply either direction of refines.
If you find a case where refines holds and overlaps does not (with an inhabited subtype), that is a soundness violation. The laws chapter has the algebraic identity that catches this in CI.
See also: refines for the asymmetric subtype relation; meet for the operation that produces a type representing the overlap; laws for the soundness interlock between refines and overlap.
Greatest lower bound: meet
The meet $\tau \sqcap \sigma$ is the largest type that refines both $\tau$ and $\sigma$. PHP-side: the type-level intersection.
What "greatest lower bound" means
Among all types $\rho$ such that $\rho \mathrel{<:} \tau$ and $\rho \mathrel{<:} \sigma$:
- $\rho = \mathit{never}$ always works (vacuously).
- The lattice's job is to find the largest such $\rho$.
Operationally: meet is the type whose value-set is the intersection of $\tau$'s and $\sigma$'s value-sets, expressed as concretely as the kind system allows.
Examples:
| $\tau$ | $\sigma$ | $\tau \sqcap \sigma$ |
|---|---|---|
int | string | never |
int|string | int|null | int |
int<0,10> | int<5,15> | int<5,10> |
mixed | int | int |
Foo | Bar (sibling classes) | Foo & Bar |
Foo | !Foo | never |
non-null mixed | int|null | int |
array<int, mixed> | array<int, string> | array<int, string> |
Type-level rule
For every Element pair $(e_1, e_2)$ where $e_1 \in \tau$ and $e_2 \in \sigma$, compute the per-pair meet, and union the results. If every pair is disjoint, the result is never.
$$\tau \sqcap \sigma ;=; \bigsqcup_{e_1 \in \tau,; e_2 \in \sigma} (e_1 \sqcap e_2)$$
Element-level dispatch
For Elements $a, b$, the per-pair meet fires its tests in this order:
- Uninhabited check — if either is uninhabited, the meet is
never. - Reflexivity — $a \sqcap a = a$.
neveraxiom — $\bot \sqcap a = \bot$.- Top — $\top \sqcap a = a$.
- Placeholder — placeholder $\sqcap a = a$ by inference convention.
- PHP runtime separation —
intandfloatare runtime-coercible but not subset-related; treat them as disjoint at the value level. - Wrappers — Negated and Intersected dispatch (below).
- Narrowed mixed — axis-AND rules (below).
- Subsumption — if one refines the other, return the smaller.
- Generic-parameter projection — meet via constraint.
- Family-specific intersection rules — last resort.
The wrapper rules
Meet involving negation
The standard set-theoretic identity:
$$a \sqcap \neg b ;=; a \setminus b$$
This routes to subtract. For int $\sqcap$ !int(0), the result is int<-∞, -1> | int<1, ∞>.
The dual case ($\neg a \sqcap b = b \setminus a$) is handled symmetrically.
The case $\neg a \sqcap \neg b$ uses De Morgan: $\neg(a \sqcup b)$. The result is a single negation wrapping the join.
Meet involving intersection
Intersection with anything adds a new conjunct: $(H \sqcap C_1 \sqcap \dots \sqcap C_n) \sqcap X = H \sqcap C_1 \sqcap \dots \sqcap C_n \sqcap X$. The universe canonicalises by sorting and deduplicating conjuncts.
Intersection meet intersection flattens both conjunct lists and re-intersects.
If the new conjunct is incompatible with the head or another conjunct, the meet collapses to never.
The narrowed-mixed meet
Meet of two narrowed mixed Elements ANDs their axes: non-null mixed $\sqcap$ truthy mixed = non-null truthy mixed. If the axes are inconsistent (e.g. truthy mixed $\sqcap$ falsy mixed), the result is never.
Meet of a mixed-with-axes and a non-mixed Element: the non-mixed Element is returned, but only if it satisfies the axes ; otherwise never. non-null mixed $\sqcap$ int = int. non-null mixed $\sqcap$ null = never.
Some narrowings decompose into multiple Elements. truthy mixed $\sqcap$ int produces int<-∞, -1> | int<1, ∞> (the 0 is excluded). The result is still a single type, just with two Elements.
Family-specific intersection rules
Each family has its own intersection logic:
- Int / Int: range intersection.
int<0,10> $\sqcap$ int<5,15> = int<5,10>. Empty intersection →never. - Float / Float: literal-vs-literal equality, otherwise
never. - String / String: literal-vs-literal equality; unspecified-with-axes $\sqcap$ literal returns the literal if it satisfies the axes; axis-AND otherwise.
- String / class-like-string: literal
"Foo"$\sqcap$class-string<Foo>returns the literal if it names the class; otherwisenever. - Object / Object: same class → per-position parameter meet; different classes → an
Foo & Barintersection. - Object / enum:
never(objects and enums are disjoint). - Object / object shape: an
object & object{...}intersection, unless the world says the class doesn't have the shape's properties. - Object / has-method, has-property: similarly an intersection.
- Array / Array: sealed-vs-sealed key-by-key; sealed-vs-generic; generic-vs-generic per-parameter.
- List / List: same idea, by index.
- Iterable / Array, List: per-parameter, then return the more specific kind.
- Callable / Callable: signature meet (return type meet, parameter type joins for contravariance, throws meet).
Sealed shape meet
A specifically tricky case. Meet of two sealed shapes:
array{a: int, b?: string} ∩ array{a: int<0,100>, c: bool}
The result is a sealed shape with the union of keys, each value being the meet of the per-key types:
a:int $\sqcap$ int<0,100> = int<0,100>; required (both required).b: only on the left, optional on the left, missing on the right. A required key on the right would force absence on the left ; herebis optional on the left, so the result hasb?: string(still optional).c: only on the right, required. This is a problem — the right requirescbut the left does not havec. The result isneverbecause no value satisfies both.
The full sealed-shape-meet logic checks each key against the other side's known items and the sealed bit, and returns never if the result would be uninhabited.
A worked example
/**
* @param int<0, 10> $x
* @param int<5, 15> $y
*/
The meet of these two parameter types is int<5, 10>.
/**
* @param int|string|null $a
* @param int|null $b
*/
The meet is int|null ; the string Element on the left has no overlap with the right.
Visualising the result lattice
graph BT
Bot["never (⊥)"]
Int["int"]
Str["string"]
IS["int|string"]
ISN["int|string|null"]
Bot --> Int
Bot --> Str
Int --> IS
Str --> IS
IS --> ISN
style Bot fill:#fdd,stroke:#a44
style ISN fill:#dfd,stroke:#4a4
Meet moves down this lattice; the meet of two types is the highest node that sits below both.
Properties
The laws chapter checks all of these on every CI run:
- Idempotence: $\tau \sqcap \tau \equiv \tau$.
- Commutativity: $\tau \sqcap \sigma \equiv \sigma \sqcap \tau$.
- Associativity: $(\tau \sqcap \sigma) \sqcap \rho \equiv \tau \sqcap (\sigma \sqcap \rho)$.
- Identity: $\tau \sqcap \top \equiv \tau$.
- Annihilator: $\tau \sqcap \bot \equiv \bot$.
- GLB property: For all $\rho$, $\rho \mathrel{<:} \tau$ and $\rho \mathrel{<:} \sigma$ implies $\rho \mathrel{<:} (\tau \sqcap \sigma)$.
The GLB property is the soundness interlock with refines: meet must produce a lower bound, and any other lower bound must refine it.
See also: refines for the per-Element subtyping; join for the dual operation; narrow for the assertion-aware variant; subtract for the difference operation that meet uses internally for negations; laws for the algebraic checks.
Least upper bound: join
The join $\tau \sqcup \sigma$ is the smallest type that both $\tau$ and $\sigma$ refine. PHP-side: the union, with absorption and merging applied.
What "least upper bound" means
Among all types $\rho$ such that $\tau \mathrel{<:} \rho$ and $\sigma \mathrel{<:} \rho$:
- $\rho = \mathit{mixed}$ always works.
- The lattice's job is to find the smallest such $\rho$.
Operationally: join is the type whose value-set is the union of $\tau$'s and $\sigma$'s value-sets, expressed as concretely as the kind system allows after canonicalisation.
Examples:
| $\tau$ | $\sigma$ | $\tau \sqcup \sigma$ |
|---|---|---|
int | string | int|string |
int(0) | int(1) | int<0,1> |
int<0,10> | int<5,15> | int<0,15> |
true | false | bool |
int(0) | int(1) | ... | int(9) | -- | int<0,9> (range collapse threshold) |
Foo | Bar | Foo|Bar (no canonical class union) |
int | int|string | int|string (subsumption) |
never | int | int |
mixed | int | mixed |
How join is computed
Conceptually: gather every Element from both sides, then canonicalise.
The canonicalisation rules apply in order:
- Top wins: if
mixedis in the union, the result ismixedand everything else is dropped. - Bottom is absorbed: if
neveris in the union, drop it (unless it's the only thing). - Void canonicalises to null in non-degenerate unions: in any union of length > 1,
voidis replaced bynull(and not duplicated ifnullis already present). PHP runtime: avoidfunction returns the valuenullto its caller, so at every value-flow sitevoidis observationallynull.voidalone is preserved so a: voidreturn-type annotation round-trips. Examples:void | intcollapses toint | null;void | nullcollapses tonull;void | nevercollapses tovoid(theneveris dropped first). - Bool composition:
true | false = bool.bool | true = bool.bool | false = bool. - Resource composition:
open-resource | closed-resource = resource(when no specific kind). - Same-kind dominator: same-kind Elements may collapse to a single range or merged unspecified form.
- Range merging: adjacent or overlapping integer ranges merge.
- Literal collapse: many literals of the same kind collapse to a range or unspecified Element when they exceed a threshold.
- Family-level absorption:
int | int<0,10> = int.string | non-empty-string = string. Subsumption applied symmetrically. - Subtype absorption: any Element subsumed by a structurally larger Element in the same multiset is dropped (
Foo | BarwhereBar extends Foocollapses toFoo; the rule consults the world).
Range merging
Joining adjacent or overlapping integer ranges merges them:
| $\tau$ | $\sigma$ | $\tau \sqcup \sigma$ |
|---|---|---|
int<0,5> | int<3,10> | int<0,10> (overlap) |
int<0,5> | int<6,10> | int<0,10> (adjacent) |
int<0,5> | int<10,15> | int<0,5> | int<10,15> (gap) |
int(7) | int<8,10> | int<7,10> |
int<0,∞> | int<-∞,5> | int (unspecified) |
The rule applies symmetrically: literals join with ranges that contain or are adjacent to them; ranges that span the entire integer line collapse to int.
Literal collapse
A union of many literals of the same kind collapses. The thresholds are conservative: a small number of literals stays as a union (so the analyser can constant-fold), but a large union widens.
- Int literals: ten literals of the same range form a range; many literals form unspecified
int. - Float literals: similar, with a higher threshold (floats have less natural "tight" structure).
- String literals: more than a small number of literals collapses to unspecified
stringwith the join of their axes (e.g. all literals satisfylowercase→ result islowercase-string).
The defaults err on the side of widening eagerly; that prevents the analyser from carrying around 1000-element unions that no human will ever look at.
Subtype absorption
If two Elements are in the union and one refines the other, the more specific one is dropped:
int | int<0,10> → int
Foo | Bar → Foo (when Bar extends Foo)
literal "hello" | non-empty-string → non-empty-string
The absorption is not applied for coercion-driven refinements. int $\mathrel{<:}$ float via PHP runtime coercion does not drive absorption: int|float stays distinct so the analyser knows the value was originally typed as int.
Object union
Joining two named objects with no subtype relationship gives back the union:
Foo | Bar → Foo|Bar (when neither extends the other)
If one extends the other, subtype absorption fires:
Foo | Bar → Foo (when Bar extends Foo)
If both extend a common ancestor, the lattice does not automatically widen to the ancestor. Given class A {} class B extends A {} class C extends A {}, B | C → B|C, not A. This is conservative but correct: widening to the ancestor would lose the information that the value cannot be class D extends A.
The analyser can request widening explicitly via the inspection layer, which does walk up the hierarchy.
Sealed shape join
Joining two sealed shapes:
array{a: int} ⊔ array{b: string} → array{a?: int, b?: string}
Each key on the left becomes optional in the result (it might not be present on the right side); each key on the right becomes optional in the result (similarly). Common keys with different value types use the per-value join.
A sealed shape joined with a generic array decomposes the shape and joins:
array{a: int} ⊔ array<string, bool> → array<string|'a', int|bool>
A worked example
int | string joined with int is int | string ; the right side refines the int Element on the left, which absorbs it.
int<0, 5> joined with int<3, 10> is int<0, 10> ; the ranges overlap and merge.
true joined with false is bool ; the bool-composition rule fires.
Visualising the lattice
graph BT
Bot["never (⊥)"]
Int["int"]
Str["string"]
Lit3["int(3)"]
Lit5["int(5)"]
R35["int<3,5>"]
IS["int|string"]
Top["mixed (⊤)"]
Bot --> Lit3
Bot --> Lit5
Lit3 --> R35
Lit5 --> R35
R35 --> Int
Int --> IS
Str --> IS
IS --> Top
style Bot fill:#fdd,stroke:#a44
style Top fill:#dfd,stroke:#4a4
Join moves up this lattice; the join of two types is the lowest node that sits above both.
Properties
The laws chapter checks:
- Idempotence: $\tau \sqcup \tau \equiv \tau$.
- Commutativity: $\tau \sqcup \sigma \equiv \sigma \sqcup \tau$.
- Associativity: $(\tau \sqcup \sigma) \sqcup \rho \equiv \tau \sqcup (\sigma \sqcup \rho)$.
- Identity: $\tau \sqcup \bot \equiv \tau$.
- Annihilator: $\tau \sqcup \top \equiv \top$.
- LUB property: For all $\rho$, $\tau \mathrel{<:} \rho$ and $\sigma \mathrel{<:} \rho$ implies $(\tau \sqcup \sigma) \mathrel{<:} \rho$.
- Absorption: $\tau \sqcap (\tau \sqcup \sigma) \equiv \tau$ and $\tau \sqcup (\tau \sqcap \sigma) \equiv \tau$.
The LUB property is the soundness interlock with refines: join must produce an upper bound, and any other upper bound must contain it.
See also: meet for the dual; refines for the per-Element subtyping that drives subsumption; laws for the algebraic checks.
Set difference: subtract
The subtract $\tau \setminus \sigma$ removes from $\tau$ the values that are in $\sigma$. PHP-side: the type after a negative narrowing — $x of type int|string after is_int($x) returns false becomes string.
What "set difference" means
Subtract is the type whose value-set is $\tau \setminus \sigma$, expressed as concretely as the kind system allows.
Examples:
| $\tau$ | $\sigma$ | $\tau \setminus \sigma$ |
|---|---|---|
int|string | int | string |
int|string|null | null | int|string |
int<0,10> | int<5,15> | int<0,4> |
bool | true | false |
mixed | null | non-null mixed |
Foo|Bar | Foo | Bar |
non-empty-list<int> | list{} | non-empty-list<int> (no overlap to remove) |
int | int | never |
How subtract is computed
For each Element on the left, attempt to subtract every Element on the right; collect the results.
Per-pair subtract may split a single Element into multiple pieces:
int<0,10> \ int(5)producesint<0,4> | int<6,10>(range punching).bool \ trueproducesfalse.mixed \ nullproducesnon-null mixed.
Element-level dispatch
For a per-pair subtract $a \setminus b$:
- Reflexivity — $a \setminus a = \emptyset$ (the input is fully removed).
neveron the right — $a \setminus \bot = a$ (nothing to remove).neveron the left — $\bot \setminus b = \bot$.- No overlap — if $a$ and $b$ are disjoint, $a \setminus b = a$.
- Subsumption — if $a \mathrel{<:} b$, then $a \setminus b = \emptyset$ (everything in $a$ is removed).
- Family-specific subtract rules — last resort.
The early exit on disjoint pairs is essential for performance: most subtract calls in an analyser are negative narrowings on small types where most pairs are disjoint.
True-union dominator subtract
The scalar, numeric, and array-key Elements are true unions (disjoint covers). When the right-hand side lands inside one of their members, the left-hand side decomposes:
scalar = bool | int | float | string
numeric = int | float | numeric-string
array-key = int | string
scalar \ int → bool | float | string
numeric \ float → int | numeric-string
array-key \ string → int
Family-specific subtract rules
Int
int<0,10> \ int(5) = int<0,4> | int<6,10>; range punching.int<0,10> \ int<5,15> = int<0,4>; the overlap is removed.int \ literal-int = int(the overlap is one literal among infinitely many; the unspecified set does not measurably shrink).int \ int = never.
Float
float \ float-literal = float(no special punching for floats; one literal does not measurably change the unspecified set).float-literal-x \ float-literal-x = never.
String
string \ literal "foo" = string(one literal does not change the unspecified set).non-empty-string \ "" = non-empty-string(no overlap; no change).string \ non-empty-string = literal ""(only the empty string remains).string \ string = never.
Bool, true, false
bool \ true = false.bool \ false = true.bool \ bool = never.true \ false = true(no overlap).
Object
Foo \ Foo = never.Foo \ Bar = Foo(no overlap if neither extends the other; world-aware).Foo \ Bar = neverifFooextendsBar(every Foo is a Bar; remove all Foo).Foo \ Bar = FooifBarextendsFoo(some Foo is a Bar, but not all; we cannot easily express the difference, so conservatively keep all of Foo).
List, Array
- Sealed list / array subtract is shape-by-shape; rarely splits, mostly all-or-nothing.
- Generic list / array subtract is per-parameter.
- A sealed list minus the empty list returns the original (no overlap; sealed non-empty has no element to remove).
Mixed
mixed \ null = non-null mixed; the canonical case for nullable narrowing.mixed \ false = non-falsy mixed(truthy mixed | null).mixed \ truthy = falsy mixed.
Why subtract may produce many Elements
A single Element minus another may produce zero, one, or many Elements:
- Zero: full removal (e.g.
int(5) \ int(5)). - One: most cases (the input is unchanged or replaced with a slightly narrower form).
- Many: range punching (e.g.
int<0,10> \ int(5)), or other splitting.
A worked example
int|string minus int is string ; the Element-by-Element dispatch removes the int Element entirely and leaves string untouched (disjoint from int).
int<0, 10> minus int(5) is int<0, 4> | int<6, 10> ; range punching produces a two-Element result.
bool minus true is false ; the bool family has an exact rule.
When subtract is conservative
PHP types are infinite in some axes (literal strings, literal ints) and finite in others (booleans, enum cases). Subtract is exact on the finite cases (because the kind system can express the complement) and conservative on the infinite cases:
string \ "foo" = string(we can't express "every string except 'foo'" precisely, so we keepstringand lose nothing soundness-wise).int<0,10> \ "foo" = int<0,10>(no overlap; nothing to remove).
Properties
The laws chapter checks:
- Bound: $(\tau \setminus \sigma) \mathrel{<:} \tau$ for every $\tau, \sigma$.
- Disjoint after: $(\tau \setminus \sigma) \sqcap \sigma \equiv \bot$ when the subtract is exact (and is "best-effort disjoint" when it is conservative).
- Identity: $\tau \setminus \bot \equiv \tau$.
- Annihilator: $\bot \setminus \sigma \equiv \bot$.
- Self: $\tau \setminus \tau \equiv \bot$ (when the subtract can express the full removal).
The Bound property is the most important: subtract must never produce a larger type than the input. If subtract returns $T$ such that $T$ is not a subtype of $\tau$, that is a soundness violation.
See also: meet for the operation that uses subtract internally for negation; narrow for the assertion-aware operation that often boils down to subtract; laws for the algebraic checks.
Narrowing under assertions
The narrow operation $\mathit{narrow}(\tau, \pi)$ refines a type $\tau$ by an assertion $\pi$. PHP-side: the type after a control-flow assertion fires — the type of $x after if ($x instanceof Foo), after if (is_int($x)), after if ($x !== null), and so on.
What "narrow" means
The analyser observes some assertion that holds about a value of type $\tau$. It encodes the assertion as a type $\pi$ — the set of values that satisfy the assertion. Narrow returns the type whose values are exactly the values of $\tau$ that satisfy $\pi$.
For most assertions, narrow is just meet ($\tau \sqcap \pi$). The cases where narrow does extra work:
- Negative assertions that can be expressed as a positive assertion are recognised and routed to subtract instead of meet ($\tau$ is
int|stringand the assertion is "not int" → subtractintfrom $\tau$). - Mixed-axis assertions (truthiness, non-null, non-empty) propagate the axis even when the input is not
mixed, narrowing kinds on the way (mixedaftertruthybecomestruthy mixed;int|nullafternon-nullbecomesint). - Unsound input cases are detected and the lattice avoids producing nonsense (narrowing
Foobyintisnever, notFoo & int).
Operation summary
The relationship to the other operations:
| Assertion shape | Operation |
|---|---|
| Positive: "value satisfies $\pi$" | $\tau \sqcap \pi$ (meet) |
| Negative: "value does not satisfy $\pi$" | $\tau \setminus \pi$ (subtract) |
| Mixed-axis (truthy, non-null, etc.) | meet with axis-propagation rules |
Class assertion (instanceof Foo) | meet, with class-hierarchy rules |
Equality assertion ($x === 5) | meet with the singleton type |
In the analyser layer, the assertion-extraction code chooses the right form. narrow is the operation that runs after the choice is made.
Worked examples
Nullable narrowing
function f(?int $x): int {
if ($x !== null) {
return $x; // here $x has type int (not int|null)
}
return 0;
}
The assertion is non-null mixed. Meet of int|null with non-null mixed — int satisfies non-null (kept), null does not (dropped). Result: int.
Boolean narrowing
function f(bool $x): true {
if ($x === true) {
return $x; // $x has type true
}
throw new RuntimeException();
}
The assertion is true. Meet of bool with true is true (the literal-true Element).
Class narrowing
function f(object $x): Foo {
if ($x instanceof Foo) {
return $x; // narrows to Foo
}
throw new RuntimeException();
}
The assertion is Foo. Meet of object (the unconstrained object) with Foo is Foo.
Class anti-narrowing
function f(Foo|Bar $x): Bar {
if (!($x instanceof Foo)) {
return $x; // subtracts Foo, leaves Bar
}
throw new RuntimeException();
}
The negative assertion subtracts Foo from Foo|Bar, leaving Bar.
Truthiness narrowing
/**
* @return non-empty-string
*/
function f(int|string $x): string {
if (is_string($x) && strlen($x) > 0) {
return $x;
}
throw new RuntimeException();
}
Two narrowings: first is_string narrows to string; then strlen > 0 narrows to non-empty-string.
The is_string assertion is the type string; meet of int|string with string is string. The strlen > 0 assertion translates to a mixed-axis assertion that the value is non-empty-string; meet with string produces non-empty-string.
Negative assertion handling
The lattice recognises a few negative-assertion patterns and routes them to subtract:
- $\mathit{narrow}(\tau, \neg \sigma)$ → $\tau \setminus \sigma$.
- $\mathit{narrow}(\tau, \sigma)$ where $\sigma$ is
non-null mixed→ meet with the axis (no different from positive narrowing). - $\mathit{narrow}(\tau, \sigma)$ where $\sigma$ is the empty type →
never(a no-overlap assertion narrows to bottom).
The analyser layer typically constructs the assertion type with the negation already in place, and narrow does the right thing.
Mixed-axis propagation
When the assertion is a narrowed mixed, the lattice propagates the axes through the input's kinds:
- non-null axis: drop any input Element that is
nullorvoid. Keep everything else. - truthy axis: keep input Elements that structurally guarantee truthy (objects, resources, non-zero literals, non-empty truthy strings, non-empty arrays/lists). For ambiguous Elements (general
int, generalstring), narrow them:int→int<-∞,-1> | int<1,∞>,string→non-empty truthy string. - falsy axis: keep
null,void,false,int(0),float(0.0),"","0", empty arrays/lists. For ambiguous Elements, drop or narrow. - empty: same as falsy.
- isset-from-loop: only propagates between
mixedElements; non-mixedinputs lose this axis.
Decomposition of narrowed mixed
A mixed-axis narrowing may produce multiple Elements:
int $\sqcap$ truthy mixed→int<-∞,-1> | int<1,∞>(the0is excluded).string $\sqcap$ falsy mixed→"" | "0"(only the falsy strings remain).
The result is still one type, just with multiple Elements.
A worked example: chained narrowings
function classify(mixed $x): string {
if (is_int($x)) { return "int"; }
if (is_string($x) && $x !== "") { return "non-empty string"; }
if (is_array($x) && count($x) > 0) { return "non-empty array"; }
return "other";
}
The analyser narrows $x differently in each branch:
- After
is_int:int. - After
is_string && != "":non-empty-string. - After
is_array && count > 0:non-empty-array<array-key, mixed>. - The "else" branch sees the negation of each prior assertion:
mixed \ int \ non-empty-string \ non-empty-array<array-key, mixed>.
Each narrowing is a narrow call in the analyser; the result feeds the next.
Visualising narrow
flowchart LR
Input["Input τ"] --> Op[narrow]
Assertion["Assertion π"] --> Op
Op --> Result["τ ∩ π<br/>(or τ \ π for negative<br/>assertions)"]
Properties
The laws chapter checks:
- Bound: $\mathit{narrow}(\tau, \pi) \mathrel{<:} \tau$.
- Reflexivity: $\mathit{narrow}(\tau, \tau) \equiv \tau$.
- Annihilator: $\mathit{narrow}(\tau, \bot) \equiv \bot$.
- Identity (top assertion): $\mathit{narrow}(\tau, \top) \equiv \tau$.
- Idempotence: $\mathit{narrow}(\mathit{narrow}(\tau, \pi), \pi) \equiv \mathit{narrow}(\tau, \pi)$.
- Soundness vs meet: $\mathit{narrow}(\tau, \pi) \mathrel{<:} (\tau \sqcap \pi)$ when $\pi$ is the positive form ; the narrow result is at most as large as the meet result, sometimes smaller because the axis-propagation rules tighten further.
See also: meet for the operation narrow specialises; subtract for the negative-assertion path; refines and overlaps for the underlying relations; laws for the soundness checks.
Soundness: the algebraic laws
The lattice operations form an algebra. There are identities every implementation of meet, join, subtract, narrow, refines, and overlaps must satisfy. Suffete checks them on every CI run, on a property-test battery generating thousands of distinct type triples per case.
This chapter lists the laws, explains what each one rules out, and shows the test infrastructure that runs them.
Why laws matter
A lattice implementation that fails one of these is not just imprecise — it is wrong. Downstream analysers will report bugs that aren't there, miss bugs that are, or flip-flop on the same code depending on which order the analyser asks questions. Worse, the bugs are not local: a bug in meet shows up as wrong output in unrelated parts of the analyser that happened to call meet.
Suffete catches them by treating the laws as proofs that must hold on every input. The property tests generate random types and check the laws on them; a single failure stops the build.
The laws
Idempotence
| Operation | Law |
|---|---|
| meet | $\tau \sqcap \tau \equiv \tau$ |
| join | $\tau \sqcup \tau \equiv \tau$ |
| narrow | $\mathit{narrow}(\tau, \tau) \equiv \tau$ |
A type combined with itself is itself. Failure means: meet/join/narrow is producing a type that is structurally different from the input despite the inputs being equal.
Commutativity
| Operation | Law |
|---|---|
| meet | $\tau \sqcap \sigma \equiv \sigma \sqcap \tau$ |
| join | $\tau \sqcup \sigma \equiv \sigma \sqcup \tau$ |
| overlaps | $\mathit{overlaps}(\tau, \sigma) = \mathit{overlaps}(\sigma, \tau)$ |
The order of arguments to a symmetric operation doesn't matter. Failure means: the operation is privileging the left or right argument.
Associativity
| Operation | Law |
|---|---|
| meet | $(\tau \sqcap \sigma) \sqcap \rho \equiv \tau \sqcap (\sigma \sqcap \rho)$ |
| join | $(\tau \sqcup \sigma) \sqcup \rho \equiv \tau \sqcup (\sigma \sqcup \rho)$ |
Grouping doesn't matter. Failure means: the canonicalisation step is missing some pair.
Identity
| Operation | Law |
|---|---|
| meet | $\tau \sqcap \top \equiv \tau$ |
| join | $\tau \sqcup \bot \equiv \tau$ |
| subtract | $\tau \setminus \bot \equiv \tau$ |
| narrow | $\mathit{narrow}(\tau, \top) \equiv \tau$ |
The top is the meet identity, the bottom is the join identity. Subtracting nothing leaves the input alone. Narrowing by no assertion leaves the input alone.
Annihilator
| Operation | Law |
|---|---|
| meet | $\tau \sqcap \bot \equiv \bot$ |
| join | $\tau \sqcup \top \equiv \top$ |
| subtract | $\bot \setminus \sigma \equiv \bot$ |
| narrow | $\mathit{narrow}(\tau, \bot) \equiv \bot$ |
| narrow | $\mathit{narrow}(\bot, \pi) \equiv \bot$ |
Bottom annihilates meet; top annihilates join. Subtracting from bottom leaves bottom. Narrowing by an impossible assertion (or narrowing the empty type) yields bottom.
Absorption
| Operation pair | Law |
|---|---|
| meet/join | $\tau \sqcap (\tau \sqcup \sigma) \equiv \tau$ |
| join/meet | $\tau \sqcup (\tau \sqcap \sigma) \equiv \tau$ |
The "lattice absorption" identities. These tie meet and join together: a value that is a meet of $\tau$ and a join with $\tau$ is just $\tau$. Failure means: the operations are defined inconsistently with each other.
Bounds (the GLB and LUB properties)
| Operation | Law |
|---|---|
| meet (GLB) | For every $\rho$ such that $\rho \mathrel{<:} \tau$ and $\rho \mathrel{<:} \sigma$, $\rho \mathrel{<:} (\tau \sqcap \sigma)$. |
| join (LUB) | For every $\rho$ such that $\tau \mathrel{<:} \rho$ and $\sigma \mathrel{<:} \rho$, $(\tau \sqcup \sigma) \mathrel{<:} \rho$. |
Meet is the greatest lower bound — any other lower bound refines it. Join is the least upper bound — it refines any other upper bound. These are the soundness interlocks between meet/join and refines.
Failure of the GLB property means: the lattice is computing a meet that is not actually the greatest lower bound — it's leaving a smaller type that still bounds both inputs from below. Imprecise but sound.
Failure in the other direction — meet returning something that is not a lower bound — would be a soundness violation. The bound checks both ways.
Subsumption interlock
| Law |
|---|
| $\tau \mathrel{<:} \sigma \iff \tau \sqcap \sigma \equiv \tau$ |
| $\tau \mathrel{<:} \sigma \iff \tau \sqcup \sigma \equiv \sigma$ |
Subtype is equivalent to "meet is unchanged" or "join is unchanged". A failure means: refines is disagreeing with what meet/join compute, which is a global inconsistency.
Subtract bound
| Law |
|---|
| $(\tau \setminus \sigma) \mathrel{<:} \tau$ |
Subtract never produces a larger type than the input. Failure is a soundness violation — the analyser would think a value has more refinement than it does.
Subtract disjoint after
| Law (when subtract is exact) |
|---|
| $(\tau \setminus \sigma) \sqcap \sigma \equiv \bot$ |
After removing $\sigma$ from $\tau$, what remains is disjoint from $\sigma$. This holds when subtract is exact (the kind system can express the complement); when subtract is conservative, the law holds in the weaker form "approximately disjoint" and is checked with a is_uninhabited or overlaps == false proxy.
Refines/overlaps interlock
| Law |
|---|
| $\tau \mathrel{<:} \sigma$ and $\tau$ inhabited implies $\mathit{overlaps}(\tau, \sigma)$ |
Subtype implies overlap (assuming the subtype is non-empty). Failure means: refines is saying yes while overlaps is saying no, which is a contradiction.
Narrow vs meet
| Law (positive assertions) |
|---|
| $\mathit{narrow}(\tau, \pi) \mathrel{<:} (\tau \sqcap \pi)$ |
Narrow's positive form is at least as tight as meet. Often equal; sometimes strictly tighter (the axis-propagation rules of narrow do work meet doesn't).
How they're checked
A property-test battery generates random worlds (with random class hierarchies and template parameters) and random types exercising every Element kind family, then asserts every law on every triple. Failures shrink to a minimal counter-example and are recorded so future CI runs re-check the same input.
Two batteries cover the law set:
- The pair battery runs every pair-shaped law on $(\tau, \sigma)$. Idempotence, commutativity, identity, annihilator, GLB/LUB, subsumption interlock, subtract bound, refines/overlaps interlock.
- The triple battery runs the triple-shaped laws on $(\tau, \sigma, \rho)$. Associativity (meet, join), absorption.
Each property-test case calls both batteries on the generated triples.
Why laws beat tests
Hand-written tests check what the author thought to test. The author writes "meet of int and string is never" because that's the obvious case. They miss the edge cases — the nested intersection with a negated head, the sealed shape with an optional uninhabited entry, the narrowed-mixed with the truthy axis interacting with a literal float.
Property tests check what the laws require, and the random generator finds the cases the author didn't think of. A single CI run on a thousand random triples covers more ground than a year of manually-written test cases.
The trade-off is that property tests find counter-examples but don't tell you why. The investigation work is on the human. The shrunk inputs and an instrumented path through the lattice (showing which rule fired on which Element pair) keep that investigation short.
See also: refines, overlaps, meet, join, subtract, narrow — the operations the laws are about.
Template parameters in depth
The universe chapter on templates covered the template-parameter Element kind. This chapter covers what happens with template parameters: how a class's parameters are declared, what the analyser must register about them, and how the lattice consults that information.
Declaration
A class with template parameters is declared in PHP via PHPDoc:
/**
* @template T
* @template-covariant V
* @template K of array-key
* @template U of Iterator = ArrayIterator
*/
class Box {
// ...
}
The analyser parses each @template line and registers, for each parameter:
- A name (
T,V,K,U). - A defining entity (the class
Box). - An upper bound (the
of Xclause; defaults tomixed). - A variance (covariant, contravariant, invariant; defaults to invariant unless declared otherwise).
- An optional default (the
= Xclause; used when the user supplies fewer arguments than declared).
Suffete itself does not store this information. The analyser registers it with its world implementation, and the lattice queries the world when it needs a parameter's variance, upper bound, or default.
What the world supplies
The world tells the lattice three things about a class's template parameters:
- The arity (how many
@templatelines the class declares). - For each position, the parameter's variance, upper bound, and default.
- For each (descendant, ancestor, position), the type the descendant supplies to the ancestor's parameter at that position.
The third one is what makes inheritance work: see specialise.
Instantiation
A use-site instantiation Box<int, string> is a named-object Element that carries its type arguments in declaration order. The first argument fills T, the second fills V. If the class declares more parameters than supplied, the missing ones are filled from the upper bound (or the declared default) and the type is flagged as having received a template default ; the lattice tolerates the default at variance check time.
Inheritance and parameter mapping
When class Bag<X> extends Box<X, int>:
Bag'sXcorresponds toBox'sT.Box'sVis bound tointfromBag's perspective.
The lattice uses this when checking Bag<string> refines Box<string, int>: it asks the world what Bag supplies to Box's parameters, substitutes Bag's actual arguments through, and compares positionally with Box's declared variance.
The full algorithm is in specialise.
Defining entities
Every template parameter is keyed by (name, defining_entity). Two parameters with the same name on the same class are the same parameter ; two with the same name on different classes are different parameters. The defining entity can be:
- A class-like (the parameter is declared on a class, interface, trait, or enum).
- A function or method.
- A closure (analyser-assigned identity).
Capture-free substitution uses the defining entity to know which parameters a substitution applies to.
Free vs bound vs partially-applied
A template-parameter Element is free until the analyser substitutes it. Three states:
- Free. The parameter appears in the type with no commitment to a value.
Box<T>::valueisT(free). The lattice can answer questions aboutTusing its constraint as an upper bound. - Bound. The parameter has been substituted. After
Box<int>::value,T := int, the field type isint, noTElement remains. - Partially applied / default-filled. The user wrote
Boxinstead ofBox<int>. The analyser fillsTwith the upper bound (mixedby default), flags the type as carrying a template default, and the lattice tolerates the default at variance check time (recording a coercion cause).
A worked example
/**
* @template T
* @template-covariant V
*/
class Map {
/** @var array<T, V> */
public array $entries = [];
/**
* @param T $key
* @return V
*/
public function get($key) { return $this->entries[$key]; }
}
/**
* @extends Map<string, mixed>
*/
class StringMap extends Map {
/** @var array<string, mixed> */
public array $entries = [];
}
Inside Map's body, the field $entries has type array<T, V>, where T and V are template-parameter Elements with defining entity Map.
Inside StringMap's body, the field $entries has type array<string, mixed> ; fully concrete because StringMap extends Map<string, mixed>.
When the analyser checks StringMap refines Map<string, mixed>:
- The lattice gets the container's parameters:
Map'sT(invariant by default) andV(covariant, declared with@template-covariant). - It asks the world what
StringMapsupplies toMap's position 0 ; the answer isstring. - It asks the same for position 1 ; the answer is
mixed. - Compare position 0 with invariant:
stringis equivalent tostring. ✓ - Compare position 1 with covariant:
mixedrefinesmixed. ✓ - Result:
StringMaprefinesMap<string, mixed>. ✓
The variance, the inheritance binding, and the per-position check are all driven by the world. Suffete itself orchestrates the dispatch.
See also: variance for the per-variance refinement rules; substitute for how
Tis replaced with a concrete type; standin for inferringTfrom call-site arguments; specialise for the inheritance-binding resolution.
Variance
The variance of a template parameter governs whether a subtype substitution at the parameter position produces a subtype of the parameterised type.
| Variance | Rule | Source syntax |
|---|---|---|
| Covariant | $\sigma \mathrel{<:} \sigma'$ implies $\mathrm{C}\langle\sigma\rangle \mathrel{<:} \mathrm{C}\langle\sigma'\rangle$ | @template-covariant T |
| Contravariant | $\sigma \mathrel{<:} \sigma'$ implies $\mathrm{C}\langle\sigma'\rangle \mathrel{<:} \mathrm{C}\langle\sigma\rangle$ | @template-contravariant T |
| Invariant | only $\sigma \equiv \sigma'$ produces $\mathrm{C}\langle\sigma\rangle \equiv \mathrm{C}\langle\sigma'\rangle$ | @template T (default) |
Why variance matters
PHP's type system without variance is rigid: Box<int> $\mathrel{<:}$ Box<mixed> only when the analyser hand-codes the rule. With variance declarations, the lattice answers correctly and uniformly:
- Read-only containers (e.g.
Iterator<T>) are naturally covariant: anIterator<int>is anIterator<mixed>(you can read ints out of it as mixeds). PHP-side:@template-covariant T. - Write-only containers (e.g. a
Sink<T>acceptingTvalues) are naturally contravariant: aSink<mixed>accepts an int (because it accepts everything). SoSink<mixed> $\mathrel{<:}$ Sink<int>. PHP-side:@template-contravariant T. - Read-and-write containers (e.g.
Box<T>with bothget(): Tandset(T): void) are invariant: changingTwould break either the getter or the setter. PHP-side:@template T(default).
How the lattice uses variance
When the lattice decides $\mathrm{C}\langle a_1, \dots, a_n\rangle \mathrel{<:} \mathrm{C}\langle b_1, \dots, b_n\rangle$ (same class on both sides), it asks the world for each parameter's variance, and compares positionally:
- Covariant: $a_i \mathrel{<:} b_i$.
- Contravariant: $b_i \mathrel{<:} a_i$.
- Invariant: $a_i \mathrel{<:} b_i$ AND $b_i \mathrel{<:} a_i$.
If every position passes, the whole refines query passes.
Variance for inherited classes
When the input is a descendant of the container's class, the lattice asks the world to resolve what the descendant supplies to each ancestor parameter. The variance check then runs on the resolved arguments.
For example, with class StringList extends ArrayList<string> and class ArrayList<T> implements Iterator<T>:
StringList <: Iterator<string>
↓
StringList → ArrayList<string> (StringList extends ArrayList<string>)
↓
ArrayList<string> → Iterator<string> (ArrayList implements Iterator<T>, T := string)
↓
Iterator<string> <: Iterator<string> ✓ (covariant or invariant, both pass)
The lattice's object family handles this chain.
Variance and substitution
Variance does not affect substitution itself ; substitution always replaces every template-parameter Element matching the substitution key, regardless of variance. Variance affects the use site: when the analyser asks "is this instance a subtype of that?", the variance is consulted to decide which direction of refines to check.
Default-filled type-args and variance
When a class declares a parameter and the user supplies fewer arguments than declared, the missing args are filled with their declared upper bounds (or mixed if undeclared). The resulting type is flagged as carrying a template default.
The variance check at refinement time consults the flag:
- A default-filled type-arg flowing into a covariant or invariant position is allowed without strict refinement; the lattice records the coercion cause so the analyser can warn.
- A default-filled type-arg flowing into a contravariant position is checked normally.
This rule prevents Box (the parameter-less reference) from being non-refining in Box<int> callsites, while still letting the analyser surface the use of an unpinned parameter in its diagnostics.
A worked example
/**
* @template-covariant T
*/
interface Iterator {
/** @return T */
public function current(): mixed;
}
/**
* @implements Iterator<int>
*/
class IntIterator implements Iterator { /* ... */ }
function f(Iterator $it): void { /* ... */ }
f(new IntIterator());
Checking IntIterator refines Iterator<numeric>:
IntIteratoris not the same class asIterator, so the lattice walks the inheritance.IntIterator implements Iterator<int>; the world resolves position 0 ofIteratorfromIntIteratortoint.- The container's
Iteratordeclares position 0 as covariant. - Covariant: check
int $\mathrel{<:}$ numeric. ✓ (intis in thenumerictrue-union). - Result:
IntIteratorrefinesIterator<numeric>. ✓
If Iterator's T were invariant, the answer would be different: int $\equiv$ numeric is false (numeric admits float and numeric-string, int does not), so the call would fail.
Bivariant / unsafe-covariant
PHP-side, neither is supported. Suffete does not have a fourth variance kind; the standard three are sufficient for what real-world PHP analysers express.
Multiple parameters
When a class has multiple parameters, each has its own variance. Map<K, V> typically declares K as invariant (because keys are used both to read and write) and V as covariant (because the user typically reads values out). The variance check runs per-position with the per-position variance.
flowchart LR
Container["Map<K, V><br/>K invariant, V covariant"] --> Pos0[K: invariant check]
Container --> Pos1[V: covariant check]
Pos0 --> Result[All positions pass → refines]
Pos1 --> Result
See also: refines for the object-family rules that consult variance; specialise for inheritance-binding resolution.
Substitution
Substitution replaces a template parameter with a concrete type. PHP-side: when the user writes Box<int> and the analyser instantiates Box's body, every T becomes int.
Capture-free
Substitution is capture-free: it walks the type tree and applies the substitution only to free parameters that match the substitution's keys, identified by (name, defining_entity). A parameter declared on a different entity but sharing the same name is not substituted.
In practice:
- A template parameter with defining entity
Boxand a substitution targetingBox::Tis substituted. - A template parameter with defining entity
Foois not substituted by the same substitution, even if it is also calledT.
This keeps substitution semantically clean even when types nest other generic uses.
How substitution walks the tree
Substitution is a structural transform. It recurses into every nested type carrier:
- An object's type arguments.
- A list's element type and known elements.
- A keyed array's key parameter, value parameter, and known items.
- An iterable's key and value types.
- An object shape's known properties.
- A callable's parameter types, return type, and throws.
- A class-like-string's constraint.
- A template parameter's constraint (the parameter Element itself is the substitution target).
- An alias or reference's type arguments.
- A conditional's subject, target, then, and otherwise branches.
- A derived type's nested types.
- A negation's inner.
- An intersection's head and conjuncts.
At each leaf (every template-parameter Element), the substitution table is consulted. If the table has a binding, the type is substituted in place; otherwise the parameter is kept.
Substitution and unions
A substitution may replace one parameter with a union type. The walker handles this correctly: substituting T := int|string into the type T|null produces int|string|null, with the substitution flat-merged into the parent union (the lattice's join is run to collapse).
Substitution into nested generic uses
When the type contains other generic uses (e.g. class-string<T> or Box<T>), substitution walks into them and replaces the substituted parameter wherever it appears. So substituting T := int into Box<T> | class-string<T> | T produces Box<int> | class-string<int> | int.
Identity short-circuit
If the substitution produces no change (no bound parameter is found in the input), substitution returns the original type ; no re-interning, no allocation. This is the common case in the analyser when a callsite happens not to bind any parameter the substitution targets.
The walker maintains this guarantee even through deep recursion: as long as no leaf changes, the parents propagate the original handles up.
Multi-step substitution
A single substitution call can carry bindings for several parameters at once, keyed by (name, defining_entity). The walker visits each parameter once and looks up its binding in the table.
A worked example
/**
* @template T
* @template V
*/
class Map {
/** @var array<T, V> */
public array $entries = [];
/**
* @param T $key
* @return V
*/
public function get($key) { /* ... */ }
}
When the analyser instantiates Map<string, int> and asks for the concrete type of $entries:
- The declared field type is
array<T, V>. - The analyser builds a substitution:
Map::T := string,Map::V := int. - Substitution walks
array<T, V>, finds the two parameter Elements, replaces them. - The result is
array<string, int>.
A different class with its own T would not be touched by this substitution, because the defining entity differs.
See also: Templates in depth for the parameter Element kind; standin for the inverse direction (collecting bounds from arguments); specialise for inheritance binding.
Inference: standin and infer
When the user calls a generic function or method without supplying explicit type arguments — id($x) rather than id<int>($x) — the analyser must infer the type arguments from the call-site arguments. Two primitives do the work:
- standin — a single inference round: walk a parameter type against an argument type, and accumulate the bounds the argument imposes on each free template parameter.
- infer — apply a fully-determined template environment to a type.
standin: one round of inference
standin is one round — one parameter, one argument, one walk. The analyser typically runs standin once per call-site argument, against the matching parameter type, before reading the accumulated environment.
The standin environment
The environment collects two kinds of bound for each template parameter:
- Lower bounds ($\sigma \mathrel{<:} T$): types the argument provided that the parameter must accept.
- Upper bounds ($T \mathrel{<:} \sigma$): types the parameter must produce that the argument is forced to expose.
The "upper" / "lower" terminology comes from the lattice direction:
graph BT
Lower["argument-side type<br/>(lower bound on T)"] --> T["T (the parameter)"]
T --> Upper["constraint type<br/>(upper bound on T)"]
A simple call id<T> invoked with id(7) accumulates a lower bound int <: T (the argument forces T to be at least int).
How standin walks
Like substitution, standin is a structural recursion. It walks the parameter type and the argument type together, in lockstep, and at each leaf — every template-parameter Element on the parameter side — it records a bound.
The recursion respects the same nesting substitution does: object type-args, list element types, callable signatures, intersections, conditionals, and so on.
When the parameter and argument structures don't align (a parameter expecting array<K, V> matched against an argument of type int), the walker silently fails to record bounds — there is no way to derive a bound from a structural mismatch. The analyser sees an empty (or partial) environment and reports a type error in its own layer.
infer: applying a determined environment
After all standin rounds complete, the analyser has an environment containing every bound. To produce the inferred type arguments, it:
- For each parameter
Twith one or more lower bounds, joins them:T := lo_1 ⊔ lo_2 ⊔ … ⊔ lo_n. - Verifies the inferred
Tagainst any upper bounds: eachT <: hi_imust hold. - Verifies the inferred
Tagainst the parameter's declared upper bound (the constraint). - Substitutes the inferred values back into the function's return type.
infer is the convenience that walks a type and substitutes every template-parameter Element using the joined lower bounds in the environment. Parameters with no lower bound are filled with their declared upper bound (or mixed).
Worked example: identity function
/**
* @template T
* @param T $x
* @return T
*/
function id($x) { return $x; }
id(7); // T inferred to int(7) (or generalised to int).
id("hello"); // T inferred to literal "hello" (or generalised to string).
id($foo); // where $foo: Foo. T inferred to Foo.
For id(7):
- The parameter type is
T; the argument type isint(7). standinwalks them together; the parameter side is theTElement, so the environment recordsT → lower bound = int(7).infersubstitutes the function's return type (T) with the joined lower bound, producingint(7).
Worked example: collection
/**
* @template T
* @param array<T> $xs
* @return T
*/
function head(array $xs) { /* ... */ }
head([1, 2, 3]); // T → int<1,3>
head(['a', 'b']); // T → 'a'|'b' (or generalised to string)
head([new Foo(), new Bar()]); // T → Foo|Bar
For head([1, 2, 3]):
- The parameter type is
array<T>; the argument type isarray<int, int<1,3>>(the literal array's inferred type). standinrecurses into the array's value parameter: parameter side isT, argument side isint<1, 3>.- The environment records
T → lower bound = int<1, 3>. infersubstitutes the return typeTwithint<1, 3>.
Worked example: multiple arguments
/**
* @template T
* @param T $a
* @param T $b
* @return array{0: T, 1: T}
*/
function pair($a, $b): array { /* ... */ }
pair(1, "hello"); // T → int(1)|"hello" (generalised to int|string)
The analyser runs standin once per argument:
- First argument: parameter
T, argumentint(1). Environment:T → [int(1)]. - Second argument: parameter
T, argument"hello". Environment:T → [int(1), "hello"].
infer joins the lower bounds, producing int(1) | "hello".
Variance and standin
Variance affects which side of the lattice direction standin records the bound on:
- Covariant position: argument is a lower bound on
T. - Contravariant position: argument is an upper bound on
T. - Invariant position: argument is both a lower and upper bound (must be exact match).
flowchart LR
Cov["param: Iterator<T> (T covariant)<br/>arg: Iterator<int>"] --> CovBound["T ≥ int (lower bound)"]
Contra["param: Sink<T> (T contravariant)<br/>arg: Sink<int>"] --> ContraBound["T ≤ int (upper bound)"]
Inv["param: Box<T> (T invariant)<br/>arg: Box<int>"] --> InvBound["T = int (both)"]
The standin walker queries the world for the variance of each parameter when descending into a parameterised type.
Why two operations
Conceptually, inference is one operation: "given parameter and arguments, return the inferred environment". The split into standin (per-argument) and infer (final synthesis) lets the analyser:
- Run multiple standin rounds across multiple arguments.
- Inject extra bounds from other sources (e.g. a hand-written
@template T of inton the call site). - Verify intermediate states (e.g. abort inference early if a contradiction is found).
- Apply the result to multiple types (the return type, the throws clause, derived types).
A subtle case: variables in arguments
Suppose the analyser is inferring inside a generic function body:
/**
* @template T
* @param T $x
* @return T
*/
function id($x) { return $x; }
/**
* @template U
* @param U $y
* @return U
*/
function wrap($y) { return id($y); }
For wrap's body call id($y):
- The argument is
$yof typeU(a template-parameter Element). - The parameter is
id::T(a different template-parameter Element). standinrecords:id::T → lower bound = U.
The lower bound is itself a template-parameter Element (since U is still free in wrap's body). The inferred return type for id($y) is U. When wrap is later invoked with a concrete argument, the chain resolves.
See also: substitute for the substitution operation
inferuses internally; specialise for inheritance-binding resolution; variance for how variance is consulted during walking.
Inheritance specialisation
When StringList extends ArrayList<string> is checked against Iterator<string>, the lattice has to compute what StringList supplies to Iterator's template parameter. The chain — StringList → ArrayList<string> → Iterator<T> (where ArrayList implements Iterator) — has to be unwound, and T has to be resolved through every step.
This is inheritance specialisation, written formally as $\mathit{specialise}(C, T, D\langle\bar\rho\rangle)$: given that descendant D is instantiated with arguments $\bar\rho$, what is the type of ancestor C's template parameter T?
What the world supplies
The lattice cannot resolve specialisation alone ; it requires codebase knowledge of which classes extend which, with what type arguments. The world answers, for any (descendant, ancestor, position), the type the descendant supplies to that parameter, expressed in the descendant's own template namespace.
When the lattice walks D <: C<\bar\rho>, it asks the world for each position of C's parameters and substitutes through.
What "in the descendant's namespace" means
The world's answer may itself contain template-parameter Elements that refer to the descendant's parameters. For example:
/**
* @template X
* @implements Iterator<X>
*/
class Bag implements Iterator { /* ... */ }
Bag says: "I implement Iterator<X>, where X is my parameter." The world's answer for Bag's contribution to Iterator's position 0 is a template-parameter Element for Bag::X.
When the lattice is checking Bag<int> <: Iterator<int>:
- Ask the world for
Bag's contribution toIterator's position 0 ; the answer isBag::X. - Substitute
Bag's actual arguments through the answer:Bag::X := int→ result isint. - Compare with the container's
int, withIterator's declared variance.
The substitution in step 2 is a substitute call with the binding Bag::X := int.
A multi-step chain
/**
* @template T
*/
interface Iterator { /* ... */ }
/**
* @template T
* @implements Iterator<T>
*/
class ArrayIterator implements Iterator { /* ... */ }
/**
* @template U
* @extends ArrayIterator<U>
*/
class TypedList extends ArrayIterator { /* ... */ }
For TypedList<string> <: Iterator<string>:
- Walk:
TypedList → Iteratoris not direct. - The world records:
TypedList extends ArrayIterator<U>. SoTypedList's contribution toArrayIterator's position 0 isTypedList::U. - The world records:
ArrayIterator implements Iterator<T>. SoArrayIterator's contribution toIterator's position 0 isArrayIterator::T. - Compose:
TypedList's contribution toIterator's position 0 is the composed result,TypedList::U. (The world walks the chain on the analyser's behalf.) - The lattice substitutes
TypedList's actual arguments:TypedList::U := string. - Result:
string. - Compare with the container's
Iterator<string>;string $\equiv$ string(or covariant), passes.
Variance through the chain
Variance is per-parameter on each ancestor. The lattice consults the variance of Iterator's T (covariant), not TypedList's U (which has its own variance, declared independently).
This is correct: when checking TypedList<X> <: Iterator<Y>, the question is whether Iterator accepts the supplied parameter at the supplied variance, not whether TypedList's parameter is somehow compatible with Iterator's.
The same-class case
When the input and container are the same class, no chain walk is needed:
Box<int> <: Box<numeric>
The lattice asks Box's parameter variance and compares positionally. With T covariant: int <: numeric ✓. With T invariant: int $\equiv$ numeric is false (numeric admits float, int does not), the refines fails.
Resolving arity differences
If an ancestor declares more parameters than the descendant, the descendant must supply values for all of them. If an ancestor declares fewer, the descendant cannot supply more.
The world handles this in its inheritance-mapping implementation; the lattice does not enforce arity here.
When the world has no mapping
If the world has no mapping for a given (descendant, ancestor, position), the lattice falls back to using the parameter's upper bound (or mixed) for that position.
This is the conservative answer: the analyser couldn't prove the inheritance, so don't enforce a tight refinement.
A worked example
/**
* @template T
* @template-covariant V
*/
interface Map {
/**
* @param T $key
* @return V
*/
public function get($key);
}
/**
* @template W
* @implements Map<string, W>
*/
class StringMap implements Map { /* ... */ }
For StringMap<int> <: Map<string, int>:
- Input class
StringMapis not the same as container classMap. - The world says
StringMapdescends fromMap. - The world supplies
StringMap's contribution toMap's position 0:string(from@implements Map<string, W>, position 0). - The world supplies
StringMap's contribution toMap's position 1:StringMap::W(theWfrom the@implementsclause). - The lattice substitutes
StringMap's actual args:StringMap::W := int. The position-1 result becomesint. - Compare positions:
- Position 0:
string $\equiv$ string(Map's T is invariant; both pass). - Position 1:
int <: int(Map's V is covariant; passes).
- Position 0:
- Result:
StringMap<int> <: Map<string, int>. ✓
Why specialisation lives outside the lattice
Specialisation requires codebase knowledge: which classes extend which, with what type arguments. The lattice itself is codebase-agnostic; it asks the world, which is the analyser's responsibility. Specialisation is the protocol between the lattice and the world for the inheritance case.
This separation is what lets the lattice's correctness be checkable in isolation: the inheritance mapping is a black-box function the world supplies, and the lattice's behaviour is correct given the world's answers. The world's answers themselves are the analyser's correctness concern.
See also: Templates in depth for the parameter Element kind; variance for the rules each position is checked under; substitute for the substitution applied during specialisation.
TypeId, ElementId, and identity
Suffete is a handle-based crate. Every interesting value — a Type, an Element, an interned string of element IDs, a parameter list, a known-items list, a defining entity — is referenced by a small NonZero handle. The actual data lives in process-global arenas. Handles are cheap to compare, hash, copy, and pass around. The arenas are append-only and never freed.
This chapter covers the four handles a user of the public API will see most.
ElementId
The interned handle to a single Element. Layout: NonZeroU32, packed as
[kind tag: 6 bits] [arena slot: 26 bits]
The high 6 bits encode the ElementKind (1..=63 ; 64 values reserved, 30+ used). The low 26 bits encode the per-kind arena slot (0..=2^26-1 ≈ 67 million, more than enough).
Two ElementIds compare equal iff they refer to the same canonical Element ; this is the interner's contract. Equality is one u32 compare; hashing is trivial.
use suffete::{ElementId, ElementKind};
// ElementId is Copy, Eq, Hash, Ord
let id: ElementId = suffete::prelude::INT;
assert_eq!(id.kind(), ElementKind::Int);
The kind() method is (id.raw() >> 26) as u8 cast through ElementKind. Constant-time, branch-free.
The view() method resolves the handle to the borrowed Element view, which carries a &'static SomeInfo for payload-bearing kinds:
use suffete::Element;
let elem: Element = id.view();
match elem {
Element::Int(info) => { /* info: &'static IntInfo */ }
Element::Object(info) => { /* info: &'static ObjectInfo */ }
// ...
_ => {}
}
For trivial kinds (no payload) the view returns a unit-like variant: Element::Null, Element::Never, Element::Mixed(...), etc.
TypeId
The interned handle to a Type: a union of Elements plus a small bag of FlowFlags plus 8 bits of caller-defined meta. Layout: NonZeroU64, packed as
[slot: 32 bits] [flags: 16 bits] [meta: 8 bits] [reserved: 8 bits]
slot— the index into the type-content arena. TwoTypeIds with the same slot share the same internedType(the same element-set).flags— theFlowFlagsbitset. Riding on the handle keeps the arena content-keyed; toggling a flag is bit-twiddling, not a re-intern.meta— 8 bits of consumer-defined storage. Suffete never inspects it. Use it for tag-style metadata (provenance enum, severity, boolean markers); for anything that needs more bits or indexes a side table, the consumer should keep their ownHashMap<TypeId, T>.reserved— reserved for future suffete use; always zero. Not exposed publicly.
Equality and hashing compare all 64 bits: t1 == t2 means same content AND same flags AND same meta.
use suffete::{TypeId, FlowFlags, prelude::TYPE_INT};
// TypeId is Copy, Eq, Hash, Ord
let t: TypeId = TYPE_INT;
assert_eq!(t.flags(), FlowFlags::EMPTY);
assert_eq!(t.meta(), 0);
For comparison ignoring flags / meta:
let t1 = ...;
let t2 = ...;
assert!(t1.content_eq(&t2)); // same elements, ignore flags/meta
For deriving related handles in O(1) without touching the arena:
let with_flag = t.with_flags(t.flags().with_from_template_default(true));
let with_meta = t.with_meta(7);
The with_* methods return new handles in O(1) ; the underlying arena entry is unchanged.
Resolving a TypeId to its content
let view = t.as_ref();
let elements: &'static [ElementId] = view.elements;
The returned &'static reference is a real 'static slice into the per-type element-list arena ; safe to hold for the lifetime of the process.
ElementListId
A handle to an interned slice &'static [ElementId]. Used as the type of intersection-conjunct lists, of any-kind element lists stored on a payload.
use suffete::ElementListId;
use suffete::interner::interner;
let elements: &[ElementId] = &[suffete::prelude::INT, suffete::prelude::STRING];
let id: ElementListId = interner().intern_element_list(elements);
let resolved: &'static [ElementId] = interner().get_element_list(id);
assert_eq!(resolved, elements);
ElementListId is also NonZeroU32. Two lists with the same content have the same ElementListId.
TypeListId
A handle to an interned slice &'static [TypeId]. Used as the type of object type-args, of derived-info type lists.
Same shape as ElementListId. Same interning guarantee.
Construction discipline
The intended pattern for building a type:
- Use
TypeBuilderto push elements (and set flags) over a sequence of mutations. - Call
build()to intern once.
The TypeBuilder chapter covers the API in detail. Direct interner calls are also possible:
let t = suffete::interner::interner().intern_type(
&[suffete::prelude::INT, suffete::prelude::STRING],
suffete::FlowFlags::EMPTY,
);
The interner handles canonicalisation (sort + dedup) and dedup against existing entries.
What guarantees handle equality
For TypeId:
- Same
(slot, flags, meta)triple → same handle. - Same content (sorted+deduped element list) + same flags + same meta → same handle (because the slot dedups by content).
For ElementId:
- Same kind + same payload → same handle.
The interner enforces both. Two TypeIds constructed at different points in time, on different threads, with the same logical inputs, will compare ==. Likewise for ElementId.
This is what makes the lattice fast: two-element comparisons are one u32 apiece. Hashing a TypeId is hashing one u64.
Lifetime guarantees
&'static [ElementId]returned from interner methods is real'static; the arena is in aOnceLockfor the process lifetime.&'static SomeInforeturned viaElement::Int(info)(etc.) is real'static.Atom(interned strings viamago_atom) is also real'static.
The arenas grow over the lifetime of the process; they never shrink. Re-interning the same value is idempotent.
A worked example
use suffete::{TypeBuilder, ElementId, prelude::{INT, STRING}};
let t1 = TypeBuilder::new().push(INT).push(STRING).build();
let t2 = TypeBuilder::new().push(STRING).push(INT).build();
let t3 = TypeBuilder::new().push(INT).push(STRING).push(INT).build();
assert_eq!(t1, t2); // sort makes order irrelevant
assert_eq!(t1, t3); // dedup makes duplicates irrelevant
assert_eq!(t1.as_ref().elements.len(), 2);
See also: Constructing types: TypeBuilder and prelude for the user-facing build API; The ElementId tag layout for the bit-level details; Interning and the arenas for how the storage works.
Constructing types: TypeBuilder and prelude
There are two routes to a TypeId:
TypeBuilder— a mutable scratch buffer. Push elements, change flags, thenbuild()to intern once. The intended path for most analyser-side type construction.prelude— well-known constants for the common cases (INT,STRING,TYPE_NEVER,TYPE_MIXED,EMPTY_ARRAY, etc.). Use these directly when applicable.
For per-Element construction (named objects, sealed shapes, callable signatures, etc.), the ElementId constructors and the interner are the underlying tools; TypeBuilder is a layer on top.
TypeBuilder
use suffete::{TypeBuilder, TypeId, ElementId, FlowFlags};
let mut b = TypeBuilder::new();
b.push(suffete::prelude::INT);
b.push(suffete::prelude::STRING);
let t: TypeId = b.build();
TypeBuilder is a Vec<ElementId> + FlowFlags plus an optional origin TypeId. Mutations are direct vec operations; build() calls the interner once at the end.
Construction
TypeBuilder::new(); // empty buffer, EMPTY flags
TypeBuilder::from_type(some_t); // start from an existing TypeId; remembers it as origin
from_type enables the origin short-circuit: if build() is reached with the buffer in the same shape and flags as the origin, it returns the original TypeId without re-interning. Useful for transforms that mostly leave the type alone.
Element mutations
b.push(elem); // append
b.extend(iter_of_elems); // extend
b.remove(elem); // remove first occurrence
b.remove_all(elem); // remove every occurrence
b.retain(|e| /* keep this? */); // arbitrary filter
b.replace(old, new); // replace first occurrence
b.map(|e| /* transform */); // in-place per-element map
b.flat_map(|e| /* expand */); // 1-to-N expansion
b.contains(elem); // O(n) lookup (SIMD-accelerated)
Mutations preserve the order they happened in. The interner sorts and dedups on build.
Flag mutations
b.set_flags(FlowFlags::EMPTY.with_from_template_default(true));
b.modify_flags(|f| f.with_from_template_default(true));
Flags are FlowFlags ; a 16-bit bitset. The most relevant in user code:
from_template_default— set by suffete when a type-arg was filled with the parameter's upper bound rather than the user's value. Used by the variance check at refinement time.
The full list is in the reference.
Build modes
let t1 = b.build(); // sort + dedup, intern.
let t2 = b.build_canonical(); // sort + dedup + apply canonicalisation rules from `join`, intern.
build()runs the interner's structural canonicalisation: sort the element list byElementId, dedup, intern. No subsumption, no range merging, no literal collapse. The result is the input verbatim, just canonicalised.build_canonical()runs the join's canonicalisation: subsumption, range merging, literal collapse, true-union dominators, etc. Equivalent tolattice::join(t1, TYPE_NEVER, ...)for anyt1. Use this when you want the analyser's "official" canonical form.
The build() path is used by the analyser when it knows the input is already canonical or when the caller doesn't want collapses (e.g. preserving int|literal-int as two distinct elements). build_canonical() is used when the analyser wants the smallest possible expression.
Origin short-circuit
let original = ...; // some existing TypeId
let t = TypeBuilder::from_type(original)
.map(|e| if e == suffete::prelude::INT { suffete::prelude::STRING } else { e })
.build();
// If the map happened to return every element unchanged, t == original.
// If anything changed, t is a new interned TypeId.
The dirty-tracking is conservative: any mutation that could have changed the buffer flips the dirty bit, even if the mutation was a no-op (e.g. remove(elem) for an elem not in the buffer). The dirty bit only affects the short-circuit; build always produces a correct result.
prelude
The prelude exposes well-known constants. Every Element kind that has a singleton trivial form has a constant; many of the common payload-bearing forms have one too.
Element constants (ElementId)
use suffete::prelude::*;
NEVER, MIXED, NULL, VOID, PLACEHOLDER, // landmarks
TRUE, FALSE, BOOL, // booleans
INT, FLOAT, STRING, NUMERIC, SCALAR, ARRAY_KEY, // scalars (unrefined)
NUMERIC_STRING, // numeric-string
NON_EMPTY_STRING, EMPTY_STRING, INT_ZERO, // common refinements + literals
OBJECT_ANY, // any object
ITERABLE_MIXED_MIXED, // iterable<mixed, mixed>
CALLABLE, // bare callable
RESOURCE, OPEN_RESOURCE, CLOSED_RESOURCE, // resources
EMPTY_ARRAY, // array{}
The complete list is in the prelude reference.
Type constants (TypeId)
The TYPE_* constants are one-element types wrapping the corresponding Element:
use suffete::prelude::*;
TYPE_NEVER, TYPE_MIXED, TYPE_NULL, TYPE_VOID,
TYPE_TRUE, TYPE_FALSE, TYPE_BOOL,
TYPE_INT, TYPE_FLOAT, TYPE_STRING, TYPE_NUMERIC, TYPE_SCALAR,
// ...
TYPE_INT is TypeBuilder::new().push(INT).build(), but pre-computed at boot time and exposed as a const. Use the TYPE_* constants when you need a TypeId for a single element ; it saves a build call.
Per-element construction
For Elements that need a payload, use the constructors on ElementId:
ElementId::int_literal(42); // int(42)
ElementId::string_literal("hello"); // literal "hello"
ElementId::int_range(IntRange::new(Some(0), Some(100))); // int<0,100>
ElementId::named_object(atom("Foo")); // Foo
ElementId::named_object_with_args(atom("Box"), &[TYPE_INT]); // Box<int>
ElementId::enum_case(atom("Status"), atom("Active")); // Status::Active
ElementId::generic_parameter(name, defining_entity, constraint);
ElementId::intersected(head, &[conjunct1, conjunct2]); // head & conjunct1 & conjunct2
For more elaborate Elements (callable signatures, sealed shapes), call the interner methods directly:
use suffete::interner::interner;
let info = ObjectShapeInfo {
known_properties: Some(...),
flags: ObjectShapeFlags::default().with_sealed(true),
};
let shape = interner().intern_object_shape(info);
The full set of interner methods is in src/interner/store.rs, generated by the element_arena_methods! macro.
A worked example
Build the PHP type non-empty-list<int|string>|null:
use suffete::{TypeBuilder, ElementId};
use suffete::prelude::{INT, STRING, NULL};
use suffete::element::payload::{ListInfo, ListFlags};
use suffete::interner::interner;
// First the inner element type: int|string
let int_or_string = TypeBuilder::new().push(INT).push(STRING).build();
// Then the list element
let list_elem: ElementId = ElementId::list(int_or_string, /* non_empty: */ true);
// Finally the union with null
let result = TypeBuilder::new()
.push(list_elem)
.push(NULL)
.build();
Three intern calls (one for the inner union's element list, one for the list payload, one for the outer union). All idempotent ; calling this code twice with the same inputs produces the same TypeId both times.
Performance notes
TypeBuilder::new()isVec::new(); no allocation until pushing.pushisVec::push; amortised O(1).build()allocates a temporary sorted vec; the cost isO(n log n)for the sort plus the interner's hash lookup. The interner has a fast path that detects already-sorted-and-unique input (viasimd::is_sorted_strict) and skips the temporary allocation entirely.- Singleton types (
build()on a one-element buffer) hit a per-Element cache and skip both sort and the dashmap lookup.
For analyser code that builds many types in a loop, reuse a single TypeBuilder across iterations:
let mut b = TypeBuilder::new();
for elem in elements_to_process {
b.push(elem);
let _ = b.build();
b = TypeBuilder::new(); // or b.set_flags(FlowFlags::EMPTY) ; both reset
}
See also: TypeId, ElementId, and identity for the handle types; Prelude constants for the full list of
prelude::*; Lattice options and reports forFlowFlags.
Predicates: is_X, contains_X, and friends
The predicates module is the place for single-call structural questions about a TypeId. Each predicate answers one question and returns a bool. They are pure — no World, no options, no report.
use suffete::predicates::{
is_int, is_string, is_truthy, contains_null, is_singleton, is_constant_foldable,
contains_template_anywhere, is_fully_resolved,
};
This chapter walks through the naming conventions, the families, and the cost model.
Naming conventions
The predicate names follow a strict scheme:
is_X(ty)— guaranteed: every Element oftyis in familyX. Conservative:falsewhen any Element is outsideX, including fornever(the all-bottom type).contains_X(ty)— possible at the top level: at least one top-level Element oftyis in familyX.is_truthy(ty)/is_falsy(ty)— every Element guaranteed truthy / falsy at runtime.could_be_truthy(ty)/could_be_falsy(ty)— at least one Element could be truthy / falsy.*_anywhere(ty)— recurses into every nested-type carrier (object args, list element types, callable signatures, etc.). Use these for "does this tree contain any unresolved Element?" or "is there a free template anywhere?".
is_X is false on never: is_int(TYPE_NEVER) = false, because the empty type contains no Int Elements (vacuously, also no non-Int Elements). The conservative reading is what most analyser callers want; if you want vacuous-true, check is_X(ty) || ty == TYPE_NEVER.
Kind-family predicates
For each PHP type family, an is_X and (where useful) a contains_X:
is_X | contains_X | Family |
|---|---|---|
is_int | contains_int | Int |
is_float | contains_float | Float |
is_string | contains_string | String |
is_bool | contains_bool | Bool, True, False |
is_null | contains_null | Null |
is_void | contains_void | Void |
is_list | — | List |
is_keyed_array | — | Array |
is_array | contains_array | Array, List |
is_iterable | contains_iterable | Iterable (the Iterable kind only ; not arrays/lists) |
is_object | contains_object | Object, Enum, ObjectShape, HasMethod, HasProperty, ObjectAny |
is_resource | contains_resource | Resource |
is_callable | contains_callable | Callable |
is_array_key | — | ArrayKey |
is_scalar | — | Scalar, Int, Float, String, Bool, True, False, ClassLikeString, Numeric, ArrayKey |
is_numeric | — | Numeric, Int, Float |
| — | contains_mixed | Mixed |
is_X is slice.iter().all(|e| e.kind() in family). contains_X is slice.iter().any(|e| e.kind() in family). Single-kind versions of these (is_int, contains_int, etc.) route through SIMD-accelerated scans; multi-kind versions use the scalar matches.
Truthiness predicates
is_truthy(ty) // every Element guaranteed truthy
is_falsy(ty) // every Element guaranteed falsy
could_be_truthy(ty) // at least one Element could be truthy
could_be_falsy(ty) // at least one Element could be falsy
The is_* variants are vacuously false for never. The could_be_* variants are also false for never (no Elements, so no possibility).
The truthiness implications are the same as the refinement axes chapter: an Object is always truthy, an empty array is always falsy, int(0) is falsy, int(7) is truthy, int<-∞,-1> | int<1,∞> is truthy, an unconstrained int is could_be both, etc.
Literal predicates
is_literal(ty) // every Element is a literal-shaped value
// (specific int/float/string literal, true, false, null, void)
is_constant_foldable(ty) // is_literal && is_singleton
// — most useful "can I constant-fold this?" check
is_literal is true for types like int(7), "foo" | "bar", true | false, int(0) | int(1). It is false for int, non-empty-string, Foo.
is_constant_foldable adds the singleton requirement: exactly one Element. Use this when the analyser wants to know whether the type can be replaced by a concrete value at this program point.
Structural predicates
is_never(ty) // ty == TYPE_NEVER
is_mixed(ty) // ty == TYPE_MIXED (vanilla, no narrowing)
is_singleton(ty) // exactly one Element
is_union(ty) // more than one Element
is_mixed is the vanilla mixed test ; it returns false for narrowed mixed variants (non-null mixed, truthy mixed, etc.). To detect any mixed variant, use contains_mixed.
Tree-walking predicates
Three predicates recurse into every nested-type carrier (using the inspect walker):
contains_mixed_anywhere(ty) // any Mixed (vanilla or narrowed) anywhere in the tree
contains_template_anywhere(ty) // any free GenericParameter anywhere
contains_placeholder_anywhere(ty) // any Placeholder anywhere
contains_unresolved_anywhere(ty) // Alias, Reference, MemberReference, GlobalReference, Conditional, Derived
is_fully_resolved(ty) // negation of contains_unresolved_anywhere
These are the predicates to call before invoking the lattice on a possibly-unresolved type. The lattice's contract is that it works on resolved inputs; the analyser checks is_fully_resolved and calls expand if not.
Cost model
Predicates are cheap. The dispatch:
- The kind-family
is_X/contains_X(single kind) variants use SIMD scans with thresholds — sub-nanosecond for short slices,O(n / lane_width)for long ones. - The multi-kind variants (
is_bool,is_object, etc.) use scalarmatches!with early exit. - The truthiness predicates iterate Elements once and dispatch per-kind. Per-kind cost is constant (one to three comparisons). Total cost is
O(n). - The literal predicates iterate Elements once with a per-kind kind-only check.
O(n). - The tree-walking
*_anywherepredicates use the inspect walker with a short-circuiting predicate. Cost isO(tree size)worst case, but typical short-circuit on the first hit givesO(small)in practice.
None of the predicates allocate. None take a &mut argument. They are safe to call from any context.
A worked example
use suffete::{TypeBuilder, prelude::{INT, STRING, NULL, TRUE}, ElementId};
use suffete::predicates::{
is_int, is_string, is_singleton, contains_null, is_constant_foldable,
is_truthy, could_be_falsy,
};
let int_or_string = TypeBuilder::new().push(INT).push(STRING).build();
assert!(!is_int(int_or_string)); // mixed kinds, not all int
assert!(!is_string(int_or_string)); // ditto
assert!(!is_singleton(int_or_string));
assert!(!contains_null(int_or_string));
let nullable_int = TypeBuilder::new().push(INT).push(NULL).build();
assert!(contains_null(nullable_int));
let lit = TypeBuilder::new().push(ElementId::int_literal(7)).build();
assert!(is_singleton(lit));
assert!(is_constant_foldable(lit));
assert!(is_truthy(lit)); // 7 is truthy
assert!(!could_be_falsy(lit)); // 7 is never falsy
let true_t = TypeBuilder::new().push(TRUE).build();
assert!(is_truthy(true_t));
When you need more than a predicate
Predicates answer single yes/no questions. For:
- "Walk the type and collect data" → use inspect.
- "Walk the type and produce a new type" → use transform.
- "Compare two types" → use the lattice operations.
See also: Inspection: walking the tree for the underlying walker the
*_anywherepredicates use; Refinement axes for the truthiness rules; Special elements for the landmark Elements (is_never,is_mixed).
Inspection: walking the tree
The inspect module is the place for deep, short-circuiting boolean queries on a TypeId. Where predicates answer single questions about top-level structure, inspect recurses into every nested-type carrier in the tree and stops the moment the answer is known.
use suffete::inspect;
inspect::any(ty, |elem| /* predicate */); // true iff at least one Element in the tree satisfies
inspect::all(ty, |elem| /* predicate */); // true iff every Element in the tree satisfies
The closure is called at every level: top-level union elements, plus every Element nested inside any payload. It is not called twice on the same Element.
What "every nested-type carrier" means
The walker descends through:
Object'stype_argsandintersections.List'selement_typeandknown_elements.Array'skey_param,value_param,known_items,intersections.Iterable'skey_type,value_type,intersections.ObjectShape'sknown_properties,intersections.HasMethod's andHasProperty'sintersections.Callable'sSignature'sparameters,return_type,throws.ClassLikeString'sOfType/Genericconstraint.GenericParameter'sconstraint.Reference'stype_argsandintersections.Conditional's 4 operands.Derived's nestedTypeIds (all 8 variants).Negated'sinner.Intersected'sheadandconjuncts.
A trivial-kind Element (no payload to recurse into) is visited once and the walker moves on.
any
pub fn any<F: FnMut(ElementId) -> bool>(ty: TypeId, mut predicate: F) -> bool;
Returns true iff some Element anywhere in ty's tree satisfies predicate. Short-circuits on the first match.
use suffete::ElementKind;
let has_object = inspect::any(ty, |e| matches!(e.kind(), ElementKind::Object | ElementKind::Enum));
all
pub fn all<F: FnMut(ElementId) -> bool>(ty: TypeId, mut predicate: F) -> bool;
Returns true iff every Element anywhere in ty's tree satisfies predicate. Short-circuits on the first failure.
all is the negation of any with a flipped predicate:
inspect::all(ty, |e| /* P(e) */) == !inspect::any(ty, |e| !/* P(e) */)
Both signatures are available because all reads more naturally for the "for-every" case.
How the recursion works
The walker is post-order. For each Element it:
- Calls the predicate on the Element itself.
- If the predicate did not short-circuit, descends into every nested
TypeIdcarrier (per the kind's payload). - For each nested
TypeId, recurses with the same predicate.
The descent is implemented as a per-kind dispatch: the walker has a descend_object, a descend_list, etc. The dispatch is exhaustive ; every payload-bearing kind has a descender, and trivial kinds skip directly to the next.
Cost
The cost of inspect::any(ty, p) is bounded by the size of ty's tree, where the tree size is the number of Element occurrences across all nested types. For a flat type like int|string, the tree size is 2. For a deeply nested generic like Map<Foo, list<array{a: int, b: ?Bar}>>, the tree size is in the tens.
The walker does not allocate. It uses the call stack for recursion and a &mut F for the closure.
The most common use of inspect::any is the *_anywhere predicates (predicates) and the equivalent custom predicates analyser code writes. Performance is rarely a concern for this module ; the closure is the cost driver.
A worked example: collecting class names
The walker is for boolean queries; collecting data requires using a RefCell or owning the closure's environment:
use std::cell::RefCell;
use suffete::{ElementKind, inspect, TypeId};
use suffete::interner::interner;
use mago_atom::Atom;
fn collect_class_names(ty: TypeId) -> Vec<Atom> {
let names = RefCell::new(Vec::new());
inspect::any(ty, |e| {
match e.kind() {
ElementKind::Object => {
names.borrow_mut().push(interner().get_object(e).name);
}
ElementKind::Enum => {
names.borrow_mut().push(interner().get_enum(e).name);
}
_ => {}
}
false // don't short-circuit; visit every Element
});
names.into_inner()
}
Returning false from the predicate keeps the walker going; the caller gets the full enumeration.
For a transformation (build a new type), use transform instead.
A worked example: finding a free template
use suffete::{ElementKind, inspect, TypeId};
use suffete::interner::interner;
use mago_atom::{Atom, atom};
fn contains_template_named(ty: TypeId, target: Atom, target_class: Atom) -> bool {
inspect::any(ty, |e| {
if e.kind() == ElementKind::GenericParameter {
let info = interner().get_generic_parameter(e);
info.name == target /* and check the defining_entity matches target_class */
} else {
false
}
})
}
Short-circuits on the first hit. If the type doesn't contain the target template anywhere, the cost is the full tree walk.
The relationship to predicates
Several predicates in the predicates module are thin wrappers around inspect::any:
contains_mixed_anywhere(ty)isinspect::any(ty, |e| e.kind() == ElementKind::Mixed).contains_template_anywhere(ty)isinspect::any(ty, |e| e.kind() == ElementKind::GenericParameter).contains_placeholder_anywhere(ty)isinspect::any(ty, |e| e.kind() == ElementKind::Placeholder).
Use the predicate when you want the standard variant. Use inspect::any directly when you have a custom condition.
A subtle case: visiting the same Element twice
The walker is non-deduplicating. If two distinct nested positions reach the same ElementId (because the same Element appears in multiple places in the tree), the predicate is called for each occurrence, not just once per ElementId.
let int_t = TypeBuilder::new().push(suffete::prelude::INT).build();
let nested = TypeBuilder::new()
.push(ElementId::keyed_array(int_t, int_t)) // array<int, int>
.build();
let mut count = 0;
inspect::any(nested, |_| { count += 1; false });
// count == 3: the array Element + the two int Elements (one for k, one for v)
This is usually what you want; if you need uniqueness, accumulate into a HashSet<ElementId> in the closure's environment.
See also: Predicates for the standard wrappers; Transformation for the dual operation that produces a new type; Element kinds for the exhaustive list of payload-bearing kinds the walker descends into.
Transformation: map, flat_map, filter
The transform module is the place for building a new type from an existing one by applying a closure at every Element position. Where inspect recurses with a bool-returning predicate, transform recurses with an outcome-returning closure that decides whether to keep, replace, expand, or drop each Element.
use suffete::transform;
let result = transform::map(ty, |elem| /* new ElementId */);
let result = transform::flat_map(ty, |elem| /* iterator of ElementId */);
let result = transform::filter_map(ty, |elem| /* Option<ElementId> */);
let result = transform::filter(ty, |elem| /* bool */);
Each entry-point shape is a thin wrapper over the same underlying walker, which is post-order and recurses through every nested-type carrier (the same set as inspect).
The four entry points
map
pub fn map<F>(ty: TypeId, f: F) -> TypeId
where F: FnMut(ElementId) -> ElementId;
Apply f at every Element position; replace each in place. The returned type has the same shape as the input but with each Element substituted.
// Replace every `int` with `int<0, ∞>` (positive ints).
let result = transform::map(ty, |elem| {
if elem == suffete::prelude::INT {
ElementId::int_range(IntRange::new(Some(0), None))
} else {
elem
}
});
flat_map
pub fn flat_map<I, F>(ty: TypeId, f: F) -> TypeId
where I: IntoIterator<Item = ElementId>, F: FnMut(ElementId) -> I;
Apply f at every Element position; expand each into zero or more new Elements. Useful when one Element decomposes into a union (e.g. an integer range split).
// Decompose every `int` into `int<-∞,-1> | int(0) | int<1,∞>`.
let result = transform::flat_map(ty, |elem| {
if elem == suffete::prelude::INT {
vec![negative_int, zero, positive_int]
} else {
vec![elem]
}
});
filter_map
pub fn filter_map<F>(ty: TypeId, f: F) -> TypeId
where F: FnMut(ElementId) -> Option<ElementId>;
The combination of map and filter: Some(elem) keeps with replacement, None drops.
// Drop every Mixed Element; keep the rest unchanged.
let result = transform::filter_map(ty, |elem| {
if elem.kind() == ElementKind::Mixed { None } else { Some(elem) }
});
filter
pub fn filter<F>(ty: TypeId, f: F) -> TypeId
where F: FnMut(ElementId) -> bool;
Keep an Element iff f returns true. Equivalent to filter_map with if f(elem) { Some(elem) } else { None }, but reads cleaner for the common case.
let result = transform::filter(ty, |elem| elem.kind() != ElementKind::Null);
// result is `ty` with all Null Elements removed
How the walker works
The walker is post-order:
- For each Element at the current level, recurse into every nested
TypeIdcarrier (per the kind's payload), transforming each via the same closure. - If any nested
TypeIdchanged, re-intern the Element with the rebuilt payload. - Run the closure on the (possibly rebuilt) Element. The closure decides drop / replace / expand / leave.
Each level commits with a single intern_type call. Nothing is interned redundantly between levels.
Identity short-circuit
If the closure returns the input Element unchanged (or Some(input) for filter_map, or true for filter) at every level, the walker returns the original TypeId without re-interning. This is the common case when the transform happened to leave the type alone.
The closure can express this naturally:
transform::map(ty, |elem| {
if elem == suffete::prelude::INT { suffete::prelude::INT } else { elem }
});
// Returns ty unchanged; walker detects no leaf changed.
Recursion stops at non-payload types
The walker descends into every payload-bearing kind, but the closure is not called on the inner TypeIds themselves ; the walker substitutes those recursively using the same closure. The closure sees only ElementIds.
For example, array<int, string> is one Element of kind Array with payload KeyedArrayInfo { key_param: TypeId(int), value_param: TypeId(string) }. The walker:
- Recurses into
TypeId(int), calling the closure on theIntElement. - Recurses into
TypeId(string), calling the closure on theStringElement. - If either changed, rebuilds the
ArrayElement with the new key/value. - Calls the closure on the
ArrayElement. - The closure decides what to do with the
ArrayElement.
The closure can replace the entire Array Element with something else, or replace just the key parameter (by examining e.kind() and the kind's payload), or leave it alone.
A worked example: substitute a class name
Substitute every reference to class OldName with class NewName throughout a type:
use suffete::{ElementId, ElementKind, transform};
use suffete::interner::interner;
use mago_atom::{atom, Atom};
fn rename_class(ty: TypeId, from: Atom, to: Atom) -> TypeId {
transform::map(ty, |elem| {
match elem.kind() {
ElementKind::Object => {
let info = interner().get_object(elem);
if info.name == from {
let mut new_info = *info;
new_info.name = to;
interner().intern_object(new_info)
} else {
elem
}
}
_ => elem,
}
})
}
The walker handles the recursion into Object's type_args, and into nested arrays/lists/iterables/etc. The closure only handles the leaf decision.
A worked example: drop nullability
use suffete::{ElementKind, transform};
fn drop_null(ty: TypeId) -> TypeId {
transform::filter(ty, |elem| elem.kind() != ElementKind::Null)
}
The walker descends into every nested type and applies the filter. A nested ?int (i.e. int|null) becomes int. A nested array<int, ?Foo> becomes array<int, Foo>.
When to use transform vs lattice vs template::substitute
transformwhen the change is structural and Element-by-Element. Useful for renames, drops, kind-substitutions.lattice::*when the question is set-theoretic (subtype, intersection, union, difference). The lattice handles the canonicalization rules;transformdoes not.template::substitutewhen the change is parameter-driven: replacing a freeGenericParameterwith a concrete type.substituteistransform::mapspecialised to the parameter case.
Cost
Same as inspect: bounded by the tree size. The interner cost dominates when the transform actually changes things ; one intern call per level that has at least one changed Element. The identity short-circuit makes no-op transforms free.
See also: Inspection for the dual (boolean queries); Substitute for the parameter-substitution variant; TypeBuilder for an alternative when you want a flat, non-recursive build.
Hierarchy and World
Suffete is codebase-agnostic. It does not know which classes the user has declared, what they extend, what methods and properties they declare, what template parameters they have, what their declared variances are, or what type aliases the user has defined. All of that knowledge lives in the analyser and is exposed to suffete through one trait: World.
use suffete::world::World;
This chapter covers the trait surface and the conventions analyser implementations should follow.
The trait
World is a query interface. Every method takes a &self and returns a value (or None). Suffete never mutates the world; it only reads.
Sketch of the surface (the actual trait has more methods; this is the most-used subset):
pub trait World: Sized + Sync {
// Inheritance
fn descends_from(&self, descendant: Atom, ancestor: Atom) -> bool;
fn inherited_template_argument(
&self,
descendant: Atom,
ancestor: Atom,
position: usize,
) -> Option<TypeId>;
// Class properties and methods
fn class_has_method(&self, class: Atom, method: Atom) -> bool;
fn class_has_property(&self, class: Atom, property: Atom) -> bool;
fn class_property_type(&self, class: Atom, property: Atom) -> Option<TypeId>;
fn class_constant_type(&self, class: Atom, constant: Atom) -> Option<TypeId>;
// Class-like classification
fn is_final(&self, class_like: Atom) -> bool;
fn is_interface(&self, class_like: Atom) -> bool;
fn is_trait(&self, class_like: Atom) -> bool;
fn is_enum(&self, class_like: Atom) -> bool;
// Templates
fn template_parameter_arity(&self, class_like: Atom) -> usize;
fn template_parameter_at(&self, class_like: Atom, position: usize)
-> Option<TemplateParameter>;
fn template_parameter_index(&self, class_like: Atom, name: Atom)
-> Option<usize>;
// Enums
fn enum_backing(&self, enum_name: Atom) -> Option<EnumBacking>;
// Aliases
fn resolve_alias(&self, name: Atom) -> Option<TypeId>;
// ... and more
}
The exact method list lives in src/world/mod.rs.
What suffete asks the world
The lattice's family rules consult the world wherever it cannot answer from the type alone:
-
Object family (
refines,meet,overlaps):descends_from(D, C)for inheritance checks.inherited_template_argument(D, C, i)for variance through inheritance.template_parameter_at(C, i)for declared variance.class_has_method(C, m)forFoo <: has-method<m>.class_has_property(C, p)forFoo <: has-property<p>.class_property_type(C, p)for shape-vs-named subtyping.is_final(C)for finality-aware refines and meet.
-
Enum family:
enum_backing(E)for the structural shape ofStatus::Active'svalueproperty.
-
Class-like-string family:
descends_from(C, D)forclass-string<C> <: class-string<D>.is_interface(C),is_enum(C),is_trait(C)for the kind-axis check.
-
Conditional and Derived expansion (in
expand):class_constant_type(C, K)forMemberReference.resolve_alias(name)forAlias.- The full set of derived-type queries.
The NullWorld
Suffete ships a trivial implementation: NullWorld. It returns:
descends_from: onlydescendant == ancestor.class_has_method,class_has_property: alwaysfalse.- All the others:
Noneor zero or empty.
NullWorld is useful for:
- Examples and tests where the analyser's codebase model is irrelevant.
- The lattice's join canonicalisation, which uses
NullWorldfor the structural-only subsumption check (so thatint|floatis not collapsed tofloatvia PHP runtime coercion).
use suffete::world::NullWorld;
let world = NullWorld;
Implementing World
The analyser implements World by reading from its codebase model. A sketch:
use suffete::world::{World, TemplateParameter, EnumBacking, Variance};
use suffete::TypeId;
use mago_atom::Atom;
pub struct AnalyserWorld {
classes: HashMap<Atom, ClassInfo>,
aliases: HashMap<Atom, TypeId>,
// ... other tables ...
}
impl World for AnalyserWorld {
fn descends_from(&self, d: Atom, a: Atom) -> bool {
if d == a { return true; }
let mut current = self.classes.get(&d);
while let Some(info) = current {
if info.parents.contains(&a) { return true; }
current = info.parents.first().and_then(|p| self.classes.get(p));
}
false
}
fn class_has_method(&self, c: Atom, m: Atom) -> bool {
self.classes.get(&c)
.map(|info| info.methods.contains_key(&m))
.unwrap_or(false)
}
fn template_parameter_at(&self, c: Atom, i: usize) -> Option<TemplateParameter> {
let info = self.classes.get(&c)?;
info.template_parameters.get(i).copied()
}
// ... and so on ...
}
The implementation can use any storage strategy ; HashMap, indexed slices, B-trees, lazy-loading from disk. Suffete only cares about the answers.
Performance contract
Suffete calls World methods frequently during lattice operations on object-family types. A refines(IntList, Iterable<int>) involves at least:
- One
descends_from(to confirm IntList implements Iterable). - One
inherited_template_argument(to get IntList's contribution to Iterable's parameter). - One
template_parameter_at(for the declared variance).
For an analyser checking thousands of refines calls per file, a slow World is a bottleneck. The conventions:
descends_fromshould be O(1) amortised. Pre-compute the transitive closure of inheritance; cache the answer.class_has_methodandclass_has_propertyshould be O(1). Use a HashMap.inherited_template_argumentshould be O(1) amortised. Pre-compute the inheritance bindings.template_parameter_atshould be O(1). Index by position into a small vector.
The analyser pays the up-front cost when it ingests the codebase; suffete pays the per-query cost on every lattice call.
Threading
The trait requires Sync. Lattice operations are pure functions and can be called from multiple threads simultaneously, all sharing one &World. The analyser's implementation must allow concurrent reads (typical: read-only after ingestion).
A worked example: descendant lookup
use suffete::world::World;
use suffete::{TypeBuilder, ElementId};
use mago_atom::atom;
struct DemoWorld;
impl World for DemoWorld {
fn descends_from(&self, d: mago_atom::Atom, a: mago_atom::Atom) -> bool {
// For demo: class B descends from class A; everything else is itself only.
if d == a { return true; }
if d.as_str() == "B" && a.as_str() == "A" { return true; }
false
}
// ... other methods stub out to None / false / 0 ...
fn template_parameter_arity(&self, _: mago_atom::Atom) -> usize { 0 }
fn template_parameter_at(&self, _: mago_atom::Atom, _: usize)
-> Option<suffete::world::TemplateParameter> { None }
fn template_parameter_index(&self, _: mago_atom::Atom, _: mago_atom::Atom)
-> Option<usize> { None }
fn inherited_template_argument(&self, _: mago_atom::Atom, _: mago_atom::Atom, _: usize)
-> Option<suffete::TypeId> { None }
fn class_has_method(&self, _: mago_atom::Atom, _: mago_atom::Atom) -> bool { false }
fn class_has_property(&self, _: mago_atom::Atom, _: mago_atom::Atom) -> bool { false }
fn class_property_type(&self, _: mago_atom::Atom, _: mago_atom::Atom) -> Option<suffete::TypeId> { None }
fn class_constant_type(&self, _: mago_atom::Atom, _: mago_atom::Atom) -> Option<suffete::TypeId> { None }
fn is_final(&self, _: mago_atom::Atom) -> bool { false }
fn is_interface(&self, _: mago_atom::Atom) -> bool { false }
fn is_trait(&self, _: mago_atom::Atom) -> bool { false }
fn is_enum(&self, _: mago_atom::Atom) -> bool { false }
fn enum_backing(&self, _: mago_atom::Atom) -> Option<suffete::world::EnumBacking> { None }
fn resolve_alias(&self, _: mago_atom::Atom) -> Option<suffete::TypeId> { None }
}
let world = DemoWorld;
let class_a = TypeBuilder::new().push(ElementId::named_object(atom("A"))).build();
let class_b = TypeBuilder::new().push(ElementId::named_object(atom("B"))).build();
let mut report = suffete::lattice::LatticeReport::new();
let opts = suffete::lattice::LatticeOptions::default();
assert!(suffete::lattice::refines(class_b, class_a, &world, opts, &mut report));
Hierarchy module
Suffete ships a hierarchy module with helpers for managing class-hierarchy data structures. It's a convenience for analyser implementations that don't already have one. See src/hierarchy/mod.rs for the API.
The hierarchy module is not a World implementation by itself ; it's a kit you can compose into your World. Most analysers will have their own.
See also: Templates in depth for the template-related world methods; Specialise for the inheritance-binding protocol; Expand for the resolution methods on the world.
Casting and runtime compatibility
Two related but distinct operations live in their own modules:
cast— produces a new type that the source type would coerce to in a given target context.compatibility— answers a boolean: does the source type, in any of its parts, share runtime compatibility with the target?
Both exist because PHP's runtime has rules that the static lattice doesn't fully capture. int does not refine float in the static sense (int(0) is not a float), but int can be passed to a float parameter (PHP coerces). The cast and compatibility operations let the analyser model those.
cast
use suffete::cast;
let result: TypeId = cast::to(input, target_kind, &world, options);
The exact API has multiple cast variants depending on the target:
cast::to_int(ty)— what is the type after(int)$x?cast::to_float(ty)— what is the type after(float)$x?cast::to_string(ty)— what is the type after(string)$x?cast::to_bool(ty)— what is the type after(bool)$x?cast::to_array(ty)— what is the type after(array)$x?
Each follows PHP's runtime cast semantics. The result is typically a single Element of the target kind, with as much refinement as can be preserved (e.g. casting int(0) to bool gives false; casting int<-∞,-1>|int<1,∞> to bool gives true).
use suffete::{TypeBuilder, prelude::{INT, INT_ZERO}};
let zero = TypeBuilder::new().push(INT_ZERO).build();
let one = TypeBuilder::new().push(suffete::ElementId::int_literal(1)).build();
let any_int = TypeBuilder::new().push(INT).build();
let zero_bool = cast::to_bool(zero); // false
let one_bool = cast::to_bool(one); // true
let any_bool = cast::to_bool(any_int); // bool (could be either)
The cast operation is deterministic for refined inputs (a literal cast to a type produces a literal output) and conservative for unrefined inputs (an int cast to bool produces bool, since the analyser can't statically know which way it'll go).
compatibility
use suffete::compatibility;
let compatible: bool = compatibility::runtime_compatible(a, b, &world, options);
Asks: is there some pair of runtime values, one in a and one in b, that PHP would consider compatible? The answer is a boolean.
This is not the same as overlaps. overlaps is the static type-set intersection: do a and b share a value in the kind sense? runtime_compatible is broader: does PHP's runtime allow a value of a to be used where a value of b is expected?
Examples where runtime_compatible differs from overlaps:
intandfloat:overlapsreturnsfalse(no integer is a float);runtime_compatiblereturnstrue(PHP coerces).numeric-stringandint:overlapsreturnsfalse;runtime_compatiblereturnstrue(PHP coerces).intandbool:overlapsreturnsfalse;runtime_compatiblereturnstruein non-strict mode (PHP coerces 0 to false, non-zero to true).Fooandclass-string<Foo>:overlapsreturnsfalse(one is an object, the other a string);runtime_compatiblereturnstrueif the analyser is checking parameter passing where a class-string can produce an instance.
Use cases:
- The analyser is checking a function call boundary in non-strict mode and wants to know whether to warn about the argument type.
- The analyser is checking a
switchstatement againstcasevalues and wants to know which cases are reachable under PHP's loose comparison. - The analyser is checking an
==(loose equality) operator and wants to know whether the comparison can possibly returntrue.
For strict questions ("could this value, statically, be both?"), use overlaps instead.
How the two relate
The runtime_compatible operation is a superset of overlaps:
overlaps(a, b) → runtime_compatible(a, b). (If they share a value, they're compatible.)runtime_compatible(a, b) does not imply overlaps(a, b). (Compatibility includes coercion edges.)
refines(a, b) → runtime_compatible(a, b) (assuming a is inhabited).
Lattice options that affect cast / compatibility
The LatticeOptions php_runtime_coerce toggle controls whether the lattice itself admits coercion edges in refines. Cast and compatibility are always coercion-aware ; they exist to model the runtime. The php_runtime_coerce option toggles whether refines also models the runtime (in non-strict mode) or stays strict.
In strict-types mode (declare(strict_types=1)), refines should be called with php_runtime_coerce = false. The cast and compatibility operations are unchanged ; they always reflect runtime behaviour.
A worked example
use suffete::{TypeBuilder, prelude::{INT, FLOAT}, lattice::{self, LatticeOptions, LatticeReport}, world::NullWorld, compatibility, cast};
let world = NullWorld;
let opts = LatticeOptions::default();
let mut rep = LatticeReport::new();
let int_t = TypeBuilder::new().push(INT).build();
let float_t = TypeBuilder::new().push(FLOAT).build();
// Static refines: int does not refine float.
assert!(!lattice::refines(int_t, float_t, &world, opts, &mut rep));
// But runtime allows the coercion.
assert!(compatibility::runtime_compatible(int_t, float_t, &world, opts));
// Casting int to float gives float.
let casted = cast::to_float(int_t);
assert_eq!(casted, float_t);
When to use which
| Question | Operation |
|---|---|
Is a a strict subtype of b? | lattice::refines |
Do a and b share a value statically? | lattice::overlaps |
Is a runtime-compatible with b (including coercions)? | compatibility::runtime_compatible |
What does a coerce to when forced to type b? | cast::to_X |
What is the smallest type containing both a and b? | lattice::join |
What is the type of a's and b's shared values? | lattice::meet |
A subtle case: refines, overlaps, and runtime_compatible disagree
For int and float:
refines(int, float)=false(strictly: an int is not a float).overlaps(int, float)=false(no integer is a float as a value).runtime_compatible(int, float)=true(PHP coerces).meet(int, float)=never(no shared values).join(int, float)=int|float(no canonical merge).
The strict static analysis (refines, overlaps, meet) treats them as disjoint. The runtime model (runtime_compatible, cast) acknowledges the coercion. The analyser chooses which to consult based on the diagnostic it's producing.
See also: refines, overlaps for the strict static answers; Lattice options and reports for the
php_runtime_coercetoggle.
Expansion: resolving unresolved elements
The expand module resolves the unresolved Element kinds — Alias, Reference, MemberReference, GlobalReference, Conditional, Derived, Variable — into structural types the lattice can reason about.
use suffete::expand;
let resolved: TypeId = expand::expand(input, &world, &template_env);
The contract is one direction: the analyser must call expand on a type before passing it to the lattice if the type might contain any unresolved Element. The lattice does not invoke expansion itself; the recursion would loop, and the analyser knows when expansion is safe in a way the lattice cannot.
Why expansion is the analyser's job
Resolving an Alias requires looking the alias name up in the analyser's alias table. Resolving a Conditional requires evaluating the subject vs target subtype check, which requires the world. Resolving a Derived requires walking into the target type and applying the transformation.
All of these are traversals the analyser may want to control: cache the result, abort on a cycle, expand only some kinds (e.g. expand aliases but leave conditionals lazy), produce diagnostics on unresolved names. The lattice would have one fixed strategy; the analyser benefits from having the choice.
What expand does
expand(ty, world, template_env) walks the type tree and replaces every unresolved Element with its structural form, recursing into nested types. The high-level rules:
Alias { name }— resolve viaworld.resolve_alias(name). If the alias is itself an alias chain, follow until structural.Reference { name, type_args, intersections }— resolvenameto a class-like or a template parameter via the world's symbol table. Substitutetype_argsif applicable.MemberReference { class, name }— resolve the member name on the class viaworld.class_constant_type(or similar member-type query).GlobalReference { name }— resolve via the analyser's global type variable table.Conditional { subject, target, then, otherwise }— undertemplate_env, checksubject <: target. Returnthenif yes,otherwiseif no.Derived(...)— apply the per-variant transformation.Variable { id }— look upidin the analyser's inference state.
The template_env argument is the current substitution environment — bindings from template parameters to types. Used by Conditional (to substitute subject and target before the check) and by Derived variants that reference templates.
Worked example: Alias
/** @type UserId = positive-int */
function find(UserId $id): User { ... }
The parameter type is parsed as Alias { name: "UserId" }. Before the analyser checks find(7), it expands:
let alias_t = ...; // contains Alias { name: "UserId" }
let resolved = expand::expand(alias_t, &world, &template_env);
// resolved == int<1, ∞> (the underlying type of UserId)
The world's resolve_alias("UserId") returns the underlying type; expansion substitutes it.
Worked example: Conditional
/**
* @template T
* @return ($T extends int ? string : bool)
*/
function classify(): mixed { ... }
The return type is Conditional { subject: T, target: int, then: string, otherwise: bool }. After the call site has bound T := int(7):
let template_env = ...; // T → int(7)
let cond_t = ...; // Conditional element wrapped in a TypeId
let resolved = expand::expand(cond_t, &world, &template_env);
// Step 1: substitute template_env into the conditional.
// subject (T) becomes int(7); target stays int.
// Step 2: check int(7) <: int → true.
// Step 3: take the `then` branch.
// resolved == string
If T were bound to bool instead, the target check would fail and the otherwise branch (bool) would be taken.
Worked example: Derived (KeyOf)
/** @type Shape = array{a: int, b: string} */
/** @type Keys = key-of<Shape> */
The type Keys is Derived(KeyOf(Alias("Shape"))). Expansion:
- Resolve the inner
Alias("Shape")→array{a: int, b: string}. - Apply
KeyOfto the result: extract the keys →'a' | 'b'.
The Derived(KeyOf) variant has a per-kind transformation. For keyed arrays, it returns the union of literal-string keys (or array-key if the array is unsealed). For lists, it returns int (or a range of valid indices). For objects, it returns the property names.
Recursion and termination
Aliases can be chains: A = B, B = C, C = int. Expansion follows the chain. Suffete does not detect cycles directly ; the world's resolve_alias is expected to return None for unknown names, which terminates the chain naturally.
If the analyser needs cycle detection, it should detect cycles in its alias table during ingestion and report a diagnostic before suffete sees the type.
Partial expansion
Sometimes the analyser wants to expand some kinds but not others ; expand aliases but defer conditionals until enough template bindings exist, for example. The crate offers per-kind expansion helpers in addition to the catch-all expand::expand:
let alias_only = expand::expand_aliases(input, &world);
let conditionals_only = expand::expand_conditionals(input, &world, &template_env);
The exact list of partial expanders is in src/expand/. The catch-all is the most common entry point.
Cost
Expansion is O(tree size) for the type, plus the per-kind costs:
AliasandReferenceresolution: one world query per kind.Conditionalresolution: onerefinescall (lattice cost) plus the substitution.Derivedresolution: variant-specific, but bounded by the input's tree size.
The most expensive cases are Conditional chains where each branch is itself a Conditional with a non-trivial subject ; the lattice is invoked recursively. Most analyser-level types resolve in microseconds.
Idempotence
expand is idempotent: expand(expand(t)) == expand(t) (assuming no world changes between calls). The analyser can call expand without worrying about double expansion.
Detecting whether expansion is needed
The predicates chapter exposes:
use suffete::predicates::is_fully_resolved;
if !is_fully_resolved(ty) {
ty = expand::expand(ty, &world, &template_env);
}
This is the recommended pattern for analyser code that consumes types from the type-source layer (the parser, the codebase model, the docblock interpreter) and feeds them to the lattice.
A worked example: full pipeline
use suffete::{TypeId, expand, lattice::{self, LatticeOptions, LatticeReport}};
use suffete::predicates::is_fully_resolved;
fn analyser_check<W: suffete::world::World>(
input: TypeId,
expected: TypeId,
world: &W,
template_env: &TemplateEnv,
) -> bool {
let input = if is_fully_resolved(input) { input } else { expand::expand(input, world, template_env) };
let expected = if is_fully_resolved(expected) { expected } else { expand::expand(expected, world, template_env) };
let opts = LatticeOptions::default();
let mut report = LatticeReport::new();
lattice::refines(input, expected, world, opts, &mut report)
}
The pattern: expand each side if needed, then call the lattice. Trivial wrapper, but the right interface to enforce on every analyser-side query.
See also: Unresolved elements for the kinds expansion handles; Predicates for
is_fully_resolved; World for the resolution methods used during expansion; Conditional and Derived for the per-variant rules.
Serialization
The serialize module provides serde implementations for TypeId, ElementId, and the surrounding handle types. The intended use case: persisting types across analyser runs (caching), exchanging types over a wire (LSP, RPC), or producing a human-readable rendering for diagnostics.
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct Diagnostic {
pub message: String,
pub expected: suffete::TypeId,
pub actual: suffete::TypeId,
}
TypeId and ElementId implement Serialize and Deserialize ; the same is true of the support handles (ElementListId, TypeListId, etc.).
Wire format
The wire format serialises the content of the handle, not the bit pattern. Two reasons:
- Cross-process stability. The bit pattern of a
TypeIdis a process-local arena slot ; it has no meaning in another process. The content (the element kinds, the payloads, the nested types) is portable. - Compactness for human-readable output. Producing a JSON like
{"kind": "Object", "name": "Foo", "args": [{"kind": "Int"}]}is more useful in a diagnostic than{"slot": 1234, "flags": 0, "meta": 0}.
The exact format is suffete-defined; consumers should not rely on the specific shape, only on round-trip safety: serialise + deserialise produces an equivalent handle (same == after re-interning).
Round-trip safety
let original: TypeId = ...;
let json = serde_json::to_string(&original).unwrap();
let recovered: TypeId = serde_json::from_str(&json).unwrap();
assert_eq!(original, recovered);
Deserialising the JSON re-interns the type into the current process's arenas. The recovered TypeId may have a different bit pattern from the original (different process, different arena state) but compares equal via the content-based Eq.
Serialising types referencing the world
Some types reference names the world owns: class names, enum names, alias names, template parameter names. Serialisation includes the names. Deserialisation interns the names through the standard mago_atom::atom! machinery; the recovered type carries the same names.
This means: a deserialised type can be passed to a World of a different analyser run, and the world's queries on those names will work as expected (assuming the codebase still has the same names).
What is not serialised
- The
metabyte ofTypeId. Themetais consumer-defined; serialisation includes it, but suffete does not know what it means. - Internal arena slot indices. They are process-local.
- Any caches or precomputed state.
Performance
Serialisation walks the type tree and emits a structured representation. The cost is O(tree size). For an analyser caching thousands of types, the serialise-side cost is the dominant one (deserialise re-interns into existing arenas, which is fast).
For very large collections, bincode or postcard are faster than serde_json ; both are supported via serde's standard mechanisms.
A worked example
use suffete::{TypeBuilder, prelude::{INT, STRING}};
let original = TypeBuilder::new().push(INT).push(STRING).build();
// Serialise to JSON.
let json = serde_json::to_string_pretty(&original).unwrap();
println!("{}", json);
// Deserialise back.
let recovered: suffete::TypeId = serde_json::from_str(&json).unwrap();
assert_eq!(original, recovered);
Using serialize for diagnostics
Diagnostics typically want a human-readable rendering of the type, not the wire format. For that, use the Display implementation:
let pretty: String = format!("{}", original.as_ref()); // "int|string"
Or the Typed trait's pretty_with_indent method for multi-line, indented output suitable for hover-style displays.
The serialize module is for storage and transport. The Display / Typed::pretty methods are for diagnostics.
See also: Handles for the underlying types being serialised; the
Typedtrait for diagnostic-quality rendering.
Answering "is A a subtype of B?"
The most common analyser question. Recipe with the full setup, expansion, and result-handling.
The pattern
use suffete::{TypeId, lattice::{self, LatticeOptions, LatticeReport}, predicates::is_fully_resolved, expand};
use suffete::world::World;
fn is_subtype<W: World>(
sub: TypeId,
sup: TypeId,
world: &W,
template_env: &TemplateEnv,
strict: bool,
) -> (bool, LatticeReport) {
// 1. Expand any unresolved Elements (Alias, Conditional, Derived, ...).
let sub = if is_fully_resolved(sub) { sub } else { expand::expand(sub, world, template_env) };
let sup = if is_fully_resolved(sup) { sup } else { expand::expand(sup, world, template_env) };
// 2. Configure lattice options.
let opts = LatticeOptions::default()
.with_php_runtime_coerce(!strict);
// 3. Run the lattice query, collecting report side-info.
let mut report = LatticeReport::new();
let result = lattice::refines(sub, sup, world, opts, &mut report);
(result, report)
}
Reading the result
The (bool, LatticeReport) pair is structured for diagnostic emission:
bool == true; the subtype check holds (under the chosen strictness).bool == false; the check failed; the analyser surfaces a type error.report.causes; bitset of which coercion-tolerant rules fired. Surface in the diagnostic if you want to warn (e.g. "this passes, but uses PHP's int-to-float coercion").
Strictness
The strict parameter maps to PHP's declare(strict_types=1):
- strict mode (
strict = true): no coercion edges.int <: floatisfalse.numeric-string <: intisfalse. - non-strict mode (
strict = false, the default): coercion edges admitted withCoercionCauses::PHP_RUNTIME_COERCErecorded.
The analyser typically chooses based on the file's declare(strict_types) directive. Files without a directive default to non-strict.
Worked examples
Direct subtype
use suffete::{TypeBuilder, prelude::{INT, STRING}};
let int_t = TypeBuilder::new().push(INT).build();
let int_or_str = TypeBuilder::new().push(INT).push(STRING).build();
let (ok, _report) = is_subtype(int_t, int_or_str, &world, &template_env, true);
assert!(ok);
Coercion-only subtype (non-strict)
use suffete::{TypeBuilder, prelude::{INT, FLOAT}};
let int_t = TypeBuilder::new().push(INT).build();
let float_t = TypeBuilder::new().push(FLOAT).build();
let (ok_strict, _) = is_subtype(int_t, float_t, &world, &template_env, true);
assert!(!ok_strict); // strict: int does not refine float.
let (ok_loose, report) = is_subtype(int_t, float_t, &world, &template_env, false);
assert!(ok_loose); // non-strict: int coerces.
assert!(report.causes.contains(suffete::lattice::CoercionCauses::PHP_RUNTIME_COERCE));
Class-hierarchy subtype
use suffete::{TypeBuilder, ElementId};
use mago_atom::atom;
// Setup (in your World): class B extends A.
let b_t = TypeBuilder::new().push(ElementId::named_object(atom("B"))).build();
let a_t = TypeBuilder::new().push(ElementId::named_object(atom("A"))).build();
let (ok, _) = is_subtype(b_t, a_t, &world, &template_env, true);
assert!(ok);
Generic subtype with variance
// World registers Iterator<T> as covariant on T.
let int_iter = TypeBuilder::new().push(
ElementId::named_object_with_args(atom("Iterator"), &[suffete::prelude::TYPE_INT])
).build();
let mixed_iter = TypeBuilder::new().push(
ElementId::named_object_with_args(atom("Iterator"), &[suffete::prelude::TYPE_MIXED])
).build();
let (ok, _) = is_subtype(int_iter, mixed_iter, &world, &template_env, true);
assert!(ok); // covariance: int <: mixed
Failure with a useful report
let foo = TypeBuilder::new().push(ElementId::named_object(atom("Foo"))).build();
let bar = TypeBuilder::new().push(ElementId::named_object(atom("Bar"))).build();
let (ok, report) = is_subtype(foo, bar, &world, &template_env, true);
assert!(!ok);
// report contains structured information you can use to construct
// a diagnostic message, like which family of rules failed.
Performance
refines is the hot path. Suffete optimises for the common cases:
- Reflexivity:
refines(t, t) = truein one comparison. - Singleton-vs-singleton: most analyser queries are between one-Element types; the cartesian is degenerate.
- Subsumption shortcut in unions: if any Element on the left refines any Element on the right by the universal axioms (top, bot, placeholder), short-circuit.
Expected cost for a typical analyser query: tens of nanoseconds. Worst-case (deep generics, multi-conjunct intersections, full fan-out coverage): microseconds.
When to expand
Expand exactly once per type, before the first lattice query. The expansion result is itself a TypeId and can be cached per (input, world-version, template_env) if the analyser sees it many times.
The is_fully_resolved check is cheap (a tree walk with a short-circuiting predicate), so the recipe above always-checks-then-expands. For analyser code that knows expansion has already happened (e.g. inside a single statement's analysis), skip the check.
When to skip the world
If the question doesn't require the world (e.g. is_subtype(int_or_string, int)), you can pass NullWorld:
use suffete::world::NullWorld;
let (ok, _) = is_subtype(int_or_string, int_t, &NullWorld, &TemplateEnv::default(), true);
The lattice queries the world only when needed. For fully-structural inputs, the world is never asked.
See also: refines for the rules; overlaps for the symmetric "share a value?" question; LatticeOptions for the strictness toggles.
Building a union from scratch
Three ways to construct a union type, depending on what the analyser is doing.
Method 1: literal construction with TypeBuilder
For when the analyser knows the elements ahead of time (a parsed PHP type expression, a hand-coded fixture):
use suffete::{TypeBuilder, prelude::{INT, STRING, NULL}};
let int_or_string_or_null = TypeBuilder::new()
.push(INT)
.push(STRING)
.push(NULL)
.build();
build is structural: it sorts and dedups but does not apply join's canonicalisation rules (no range merging, no literal collapse, no subtype absorption).
If the analyser wants those rules applied:
let canonical = TypeBuilder::new()
.push(INT)
.push(STRING)
.push(NULL)
.build_canonical();
build_canonical runs the join's canonicalisation pass: subsumption, range merging, literal collapse, etc. Use it when the analyser wants the smallest representation.
Method 2: incremental, via lattice::join
For when the analyser is producing a union from a set of computed values (e.g. the union of a switch's case values, or the join of a control-flow merge):
use suffete::{TypeId, lattice};
fn join_all<W: suffete::world::World>(types: &[TypeId], world: &W) -> TypeId {
let mut result = suffete::prelude::TYPE_NEVER;
let opts = lattice::LatticeOptions::default();
let mut report = lattice::LatticeReport::new();
for &t in types {
result = lattice::join(result, t, world, opts, &mut report);
}
result
}
The fold starts at TYPE_NEVER (the join identity) and applies join pairwise. The result is the canonical union of every input.
Each join call applies the canonicalisation rules: range merging, literal collapse, etc.
Method 3: from_type for incremental modification
For when the analyser already has a union and wants to push one or two more elements:
use suffete::{TypeBuilder, prelude::NULL};
let nullable = TypeBuilder::from_type(existing_type)
.push(NULL)
.build();
from_type enables the origin short-circuit: if the buffer is unchanged after the mutations, build() returns the original TypeId without re-interning. Useful when the mutations might be no-ops.
Comparison of the three methods
| Method | Use when | Canonicalisation | Cost |
|---|---|---|---|
TypeBuilder::push * / build() | Constructing from known elements | Structural only (sort, dedup) | Sort + intern lookup |
TypeBuilder::push * / build_canonical() | Constructing from known elements, want canonical form | Full canonicalisation | Sort + canonicalise + intern |
Repeated lattice::join | Folding a sequence | Per-step canonicalisation | N intern + N canonicalise |
TypeBuilder::from_type / push / build() | Adding to an existing type | Structural only | Origin check + sort + intern |
For most analyser code, build (or build_canonical if you need the canonical form) is the right choice. The repeated-join pattern is for pipelines where each input is computed lazily.
Worked example: union of switch case types
function f(int $kind): string {
return match ($kind) {
1, 2 => 'a',
3 => 'b',
default => 'c',
};
}
The analyser computes the type of each match-arm body and joins them to produce the function's return type:
let arm1 = ...; // type of 'a' = literal "a"
let arm2 = ...; // type of 'b' = literal "b"
let arm3 = ...; // type of 'c' = literal "c"
let return_t = join_all(&[arm1, arm2, arm3], &world);
// return_t == "a"|"b"|"c" (or, after literal collapse threshold, just `string`)
If the analyser wants to preserve the literals (no collapse), use TypeBuilder::push * / build. If the analyser wants the canonical form (let the lattice decide whether to collapse), use lattice::join (which uses the lattice's literal-collapse threshold) or build_canonical.
Worked example: making a type nullable
use suffete::{TypeBuilder, prelude::NULL, predicates::contains_null};
fn make_nullable(t: TypeId) -> TypeId {
if contains_null(t) {
return t; // already nullable
}
TypeBuilder::from_type(t).push(NULL).build()
}
The contains_null check avoids the unnecessary intern lookup for already-nullable inputs. The from_type ensures the origin short-circuit fires when (paradoxically) the type was already nullable but we hit this code anyway.
Worked example: union of class names
The analyser has a list of classes the user declared in a docblock and wants to construct the type:
use suffete::{TypeBuilder, ElementId};
use mago_atom::Atom;
fn classes_to_union(names: &[Atom]) -> TypeId {
let mut b = TypeBuilder::new();
for &name in names {
b.push(ElementId::named_object(name));
}
b.build()
}
let t = classes_to_union(&[
mago_atom::atom("Foo"),
mago_atom::atom("Bar"),
mago_atom::atom("Baz"),
]);
// t == Foo | Bar | Baz
If two of the classes are related by inheritance, build_canonical (or a separate lattice::join fold) would absorb the descendant ; build keeps them distinct.
Performance notes
TypeBuilder::pushis O(1).buildis O(n log n) for the sort plus the interner cost. The interner has a fast path that detects already-sorted-and-unique input via SIMD.build_canonicalis O(n²) worst case (subsumption is pairwise), but typical inputs are small.lattice::joinis the canonicalisation cost per pair, O(N) over the fold.
For analyser hot loops constructing many unions, consider:
- Reusing one
TypeBuilderinstance across iterations (clear and re-push). - Pre-sorting if you know the input is canonical, so the interner's fast path fires.
- Choosing
buildoverbuild_canonicalif you don't need the canonical form ; the latter is strictly more expensive.
See also: TypeBuilder for the construction API; join for the canonicalisation rules; Predicates for the
contains_nulland friends used as fast pre-checks.
Narrowing a parameter type from instanceof
A canonical analyser pattern: the parameter has type T, and the body has an instanceof check. Inside the if, the type is narrowed to the intersection; in the else, the type is narrowed to the difference.
The PHP
function f(Foo|Bar|null $x): void {
if ($x instanceof Foo) {
// $x is Foo here
} else {
// $x is Bar|null here
}
}
The recipe
use suffete::{TypeId, lattice};
use suffete::world::World;
struct Branches { in_branch: TypeId, else_branch: TypeId }
fn narrow_instanceof<W: World>(
input: TypeId, // the parameter's type
target: TypeId, // the type from the instanceof's RHS, e.g. Foo
world: &W,
) -> Branches {
let opts = lattice::LatticeOptions::default();
let mut report = lattice::LatticeReport::new();
let in_branch = lattice::narrow(input, target, world, opts, &mut report);
let else_branch = lattice::subtract(input, target, world, opts, &mut report);
Branches { in_branch, else_branch }
}
The narrow for the positive case typically reduces to meet (input ⊓ target). The subtract for the negative case gives back the input minus the target.
Worked example
use suffete::{TypeBuilder, ElementId, prelude::NULL};
use suffete::world::NullWorld;
use mago_atom::atom;
let foo = ElementId::named_object(atom("Foo"));
let bar = ElementId::named_object(atom("Bar"));
let input = TypeBuilder::new().push(foo).push(bar).push(NULL).build();
let target = TypeBuilder::new().push(foo).build();
let world = NullWorld;
let branches = narrow_instanceof(input, target, &world);
// branches.in_branch == Foo
// branches.else_branch == Bar | null
When target is itself a union
if ($x instanceof Foo || $x instanceof Bar) {
// $x is Foo|Bar here
}
The analyser computes the union of the two instanceof targets and passes it as the target. The recipe is unchanged ; narrow and subtract handle multi-element targets.
When instanceof is on a generic class
function f(mixed $x): void {
if ($x instanceof Iterator) {
// $x is Iterator<mixed, mixed> here (the parameter's upper bounds)
}
}
The analyser constructs the target as Iterator<mixed, mixed> (or with whatever bounds the codebase declares). The narrowing handles this through the world's parameter declaration: each parameter is filled with its upper bound, and the from_template_default flag is set so the variance check is permissive.
let target = TypeBuilder::new().push(
ElementId::named_object_with_args(
atom("Iterator"),
&[suffete::prelude::TYPE_MIXED, suffete::prelude::TYPE_MIXED],
)
).build();
let branches = narrow_instanceof(input, target, &world);
// branches.in_branch == Iterator<mixed, mixed>
// (with the from_template_default flag set on the args)
When the input has no overlap with the target
function f(string $x): void {
if ($x instanceof Foo) {
// unreachable; $x is `never` here
}
}
let input = TypeBuilder::new().push(suffete::prelude::STRING).build();
let target = TypeBuilder::new().push(ElementId::named_object(atom("Foo"))).build();
let branches = narrow_instanceof(input, target, &world);
// branches.in_branch == TYPE_NEVER
// branches.else_branch == string
The analyser sees in_branch == TYPE_NEVER and can surface a diagnostic ("this branch is unreachable").
When the input fully refines the target
function f(Foo $x): void {
if ($x instanceof Foo) {
// always true; no narrowing needed
} else {
// unreachable; $x is `never` here
}
}
let input = TypeBuilder::new().push(foo).build();
let target = TypeBuilder::new().push(foo).build();
let branches = narrow_instanceof(input, target, &world);
// branches.in_branch == Foo (unchanged)
// branches.else_branch == TYPE_NEVER
The analyser can warn ("this instanceof is always true") or use the result silently.
Negation pattern
function f(Foo|Bar|null $x): void {
if (!($x instanceof Foo)) {
// $x is Bar|null here
} else {
// $x is Foo here
}
}
The analyser computes the same Branches and swaps which branch corresponds to which:
let branches = narrow_instanceof(input, target, &world);
// In the !instanceof branch (the if-body): branches.else_branch
// In the else branch: branches.in_branch
Combining with other narrowings
The analyser can chain narrowings as control flow proceeds:
function f(Foo|Bar|null $x): void {
if ($x === null) { return; } // narrow: subtract null
if ($x instanceof Foo) { ... } // narrow: meet with Foo
}
After the null check, the analyser narrows $x to Foo|Bar. Inside the instanceof Foo branch, narrows again to Foo. Each step is a standalone narrow or subtract call.
Performance
narrow and subtract are both lattice operations. Their cost is bounded by the input sizes; for typical analyser inputs (small unions, no fan-out), each call is sub-microsecond.
The analyser typically runs many narrowings during a single function body's analysis. Reuse the LatticeOptions and LatticeReport if possible; both are cheap to construct, but reusing avoids the Default cost in tight loops.
See also: narrow, subtract, meet for the underlying operations; predicates::is_never for detecting unreachable branches in the result.
Resolving a generic class against an instance
The analyser sees a method call on a generic class. The class is Box<T>; the method declares function get(): T; the instance is Box<int>. What is the return type of $box->get() ?
This is the canonical generic-resolution recipe.
The PHP
/**
* @template T
*/
class Box<T> {
/** @return T */
public function get(): mixed { ... }
}
$box = new Box<int>();
$x = $box->get(); // type of $x is int
The recipe
use suffete::{TypeId, ElementId, ElementKind, template::substitute};
use suffete::element::payload::{ObjectInfo, GenericParameterInfo, DefiningEntity};
use suffete::interner::interner;
use suffete::world::World;
use mago_atom::Atom;
fn resolve_method_return<W: World>(
instance_type: TypeId, // Box<int>
method_name: Atom, // "get"
declared_return: TypeId, // T (a free GenericParameter)
world: &W,
) -> TypeId {
// 1. Extract the class name and arguments from the instance.
let i = interner();
let view = instance_type.as_ref();
let elements = view.elements;
if elements.len() != 1 || elements[0].kind() != ElementKind::Object {
// not a single-class type; analyser fallback (e.g. union of resolutions)
return declared_return;
}
let obj_info: &ObjectInfo = i.get_object(elements[0]);
let class_name = obj_info.name;
let class_entity = i.intern_defining_entity(DefiningEntity::ClassLike(class_name));
let actual_args = obj_info.type_args
.map(|id| i.get_type_list(id).to_vec())
.unwrap_or_default();
// 2. Substitute the class's template parameters with the instance's args.
substitute(declared_return, &|info: &GenericParameterInfo| -> Option<TypeId> {
if info.defining_entity != class_entity {
return None;
}
let position = world.template_parameter_index(class_name, info.name)?;
actual_args.get(position).copied()
})
}
How it works
- Pull the class name and actual arguments out of the instance type. (For a multi-class union, the analyser would resolve each branch and join.)
- Walk the declared return type, replacing every
GenericParameterElement whosedefining_entitymatches the class with the corresponding actual argument. - The walker handles deeply-nested uses of the parameter (e.g.
array<T, T>withT := intbecomesarray<int, int>).
Worked example
use suffete::{TypeBuilder, ElementId};
use suffete::element::payload::{GenericParameterInfo, DefiningEntity};
use suffete::interner::interner;
use mago_atom::atom;
let i = interner();
let box_class = i.intern_defining_entity(DefiningEntity::ClassLike(atom("Box")));
// Declared field type: T
let t_param: ElementId = i.intern_generic_parameter(GenericParameterInfo {
name: atom("T"),
defining_entity: box_class,
constraint: suffete::prelude::TYPE_MIXED,
qualifier: None,
});
let declared_return = TypeBuilder::new().push(t_param).build();
// Instance: Box<int>
let instance = TypeBuilder::new().push(
ElementId::named_object_with_args(atom("Box"), &[suffete::prelude::TYPE_INT])
).build();
// Resolve.
let resolved = resolve_method_return(instance, atom("get"), declared_return, &world);
// resolved == int
Multi-parameter case
/**
* @template K
* @template V
*/
class Map {
/**
* @param K $key
* @return V
*/
public function get($key) { /* ... */ }
}
The recipe walks the entire declared return type. For Map<string, int>::get, the declared return is V, which substitutes to int. For a hypothetical Map<string, int>::keys() declaring array<K>, the return would substitute to array<string>.
The substitution is per-parameter; the position lookup via template_parameter_index handles each.
Default-filled parameters
If the user wrote Box instead of Box<int>, the analyser fills T with the upper bound (mixed by default) and stamps the type with from_template_default. The recipe produces mixed as the resolved return; the variance check at the call site tolerates the default-fill (recording CoercionCauses::TEMPLATE_DEFAULT on the report).
Multi-class union
function f(Box<int>|Bag<string> $bw): mixed {
return $bw->get();
}
The analyser resolves each branch separately and joins:
fn resolve_method_on_union<W: World>(
instance_type: TypeId,
method_name: Atom,
world: &W,
) -> TypeId {
let mut result = suffete::prelude::TYPE_NEVER;
let opts = LatticeOptions::default();
let mut report = LatticeReport::new();
for &elem in instance_type.as_ref().elements {
// Look up the method's declared return on this Element's class.
let declared = world.method_return_type(elem, method_name); // hypothetical
let resolved = resolve_method_return(
TypeBuilder::new().push(elem).build(),
method_name,
declared,
world,
);
result = lattice::join(result, resolved, world, opts, &mut report);
}
result
}
The join folds the per-branch resolutions into the final return type.
Inheritance
If class IntBox extends Box<int> and the analyser sees (new IntBox())->get():
The recipe is unchanged ; the instance type is IntBox, not Box. To get the declared return for get (which lives on Box), the analyser looks up the method on the world ; the world walks the inheritance and returns Box::get's declared return.
For substitution, the parameter is T on Box (not T on IntBox). The recipe's class_entity should be Box, not IntBox. The actual argument is found via inherited_template_argument(IntBox, Box, 0) = int.
A more robust recipe handles this:
fn resolve_method_return_with_inheritance<W: World>(
instance_type: TypeId,
method_name: Atom,
declared_return: TypeId,
declaring_class: Atom, // Box (where get is declared)
world: &W,
) -> TypeId {
let i = interner();
let view = instance_type.as_ref();
let elements = view.elements;
if elements.len() != 1 || elements[0].kind() != ElementKind::Object {
return declared_return;
}
let obj_info: &ObjectInfo = i.get_object(elements[0]);
let instance_class = obj_info.name;
let actual_args = obj_info.type_args
.map(|id| i.get_type_list(id).to_vec())
.unwrap_or_default();
// For each parameter declared on the declaring class, resolve via inheritance.
let arity = world.template_parameter_arity(declaring_class);
let mut bindings: Vec<Option<TypeId>> = Vec::with_capacity(arity);
for pos in 0..arity {
// The declaring class's parameter at position pos.
// Find what the instance class supplies.
if instance_class == declaring_class {
bindings.push(actual_args.get(pos).copied());
} else {
let inherited = world.inherited_template_argument(instance_class, declaring_class, pos);
// The inherited type may itself reference the instance class's parameters;
// substitute those with actual_args.
bindings.push(inherited.map(|t| substitute_class_params(t, instance_class, &actual_args, world)));
}
}
let declaring_class_entity = i.intern_defining_entity(DefiningEntity::ClassLike(declaring_class));
substitute(declared_return, &|info: &GenericParameterInfo| -> Option<TypeId> {
if info.defining_entity != declaring_class_entity {
return None;
}
let position = world.template_parameter_index(declaring_class, info.name)?;
bindings.get(position).copied().flatten()
})
}
fn substitute_class_params<W: World>(
ty: TypeId,
instance_class: Atom,
actual_args: &[TypeId],
world: &W,
) -> TypeId {
let i = interner();
let instance_entity = i.intern_defining_entity(DefiningEntity::ClassLike(instance_class));
substitute(ty, &|info: &GenericParameterInfo| -> Option<TypeId> {
if info.defining_entity != instance_entity {
return None;
}
let position = world.template_parameter_index(instance_class, info.name)?;
actual_args.get(position).copied()
})
}
The lattice's specialise chapter is the formal protocol behind this recipe. The recipe above is what the analyser does to get the resolved return type.
Performance
The substitution walker is O(tree size) per call. The world queries (template_parameter_index, inherited_template_argument) are typically O(1) amortised in a well-implemented analyser. Total cost per method-call resolution is sub-microsecond for typical analyser inputs.
See also: Substitute for the substitution operation; Specialise for the inheritance protocol; Templates in depth for the parameter-Element details.
Walking a type to find every class name
The analyser wants to extract every class-like name a type references — for refactoring, for symbol tracking, for diagnostics. The recipe uses inspect with a closure that collects into a RefCell.
The recipe
use std::cell::RefCell;
use suffete::{TypeId, ElementKind, inspect};
use suffete::interner::interner;
use mago_atom::Atom;
fn collect_class_names(ty: TypeId) -> Vec<Atom> {
let names = RefCell::new(Vec::new());
inspect::any(ty, |elem| {
let i = interner();
match elem.kind() {
ElementKind::Object => {
names.borrow_mut().push(i.get_object(elem).name);
}
ElementKind::Enum => {
names.borrow_mut().push(i.get_enum(elem).name);
}
ElementKind::ClassLikeString => {
use suffete::element::payload::ClassLikeStringSpecifier;
let info = i.get_class_like_string(elem);
if let ClassLikeStringSpecifier::Literal(name) = info.specifier {
names.borrow_mut().push(name);
}
}
_ => {}
}
false // don't short-circuit; visit every Element
});
names.into_inner()
}
Returning false from the predicate keeps the walker going. If you want to short-circuit on the first hit (e.g. "does this type reference any class?"), return true after a match.
How inspect::any enumerates
The walker is post-order and recurses into every nested-type carrier (inspect chapter has the full list). For each Element:
- Recurse into nested types (object args, list element, callable params, etc.).
- Call the closure on this Element.
- If the closure returns
true, short-circuit; otherwise continue.
A type like Box<Map<string, Foo|Bar>> | null produces this enumeration order (post-order):
string, Foo, Bar, Map, Box, null
The recipe above pushes only the class-like Elements (Foo, Bar, Map, Box), giving [Foo, Bar, Map, Box].
Worked example
use suffete::{TypeBuilder, ElementId};
use mago_atom::atom;
let inner = TypeBuilder::new()
.push(ElementId::named_object(atom("Foo")))
.push(ElementId::named_object(atom("Bar")))
.build();
let map_t = TypeBuilder::new().push(
ElementId::named_object_with_args(atom("Map"), &[suffete::prelude::TYPE_STRING, inner])
).build();
let outer = TypeBuilder::new().push(
ElementId::named_object_with_args(atom("Box"), &[map_t])
).push(suffete::prelude::NULL).build();
let names = collect_class_names(outer);
// names == [Foo, Bar, Map, Box] (in some order)
Deduplicating
The walker visits the same ElementId once per occurrence, not per ID. If a class appears multiple times in the tree, the closure runs multiple times. To dedupe:
use std::collections::HashSet;
fn collect_unique_class_names(ty: TypeId) -> HashSet<Atom> {
let names = RefCell::new(HashSet::new());
inspect::any(ty, |elem| {
let i = interner();
match elem.kind() {
ElementKind::Object => { names.borrow_mut().insert(i.get_object(elem).name); }
ElementKind::Enum => { names.borrow_mut().insert(i.get_enum(elem).name); }
_ => {}
}
false
});
names.into_inner()
}
HashSet::insert does the dedup; the closure runs as many times as there are occurrences, but only unique names are stored.
Reach into intersection conjuncts
The walker recurses into Object's intersections automatically. For Foo & Bar, the recipe collects both Foo and Bar. For deeply nested intersections (e.g. Foo & Bar & Baz<Qux>), the recipe collects all four.
Filtering by class_like_kind
If you want only class names (not interfaces, traits, enums), the recipe can filter:
fn collect_class_only<W: World>(ty: TypeId, world: &W) -> Vec<Atom> {
let names = RefCell::new(Vec::new());
inspect::any(ty, |elem| {
if elem.kind() == ElementKind::Object {
let name = interner().get_object(elem).name;
// Filter: only keep classes, exclude interfaces/traits.
if !world.is_interface(name) && !world.is_trait(name) {
names.borrow_mut().push(name);
}
}
false
});
names.into_inner()
}
The world query keeps the recipe codebase-aware.
Performance
inspect::any is O(tree size) when the closure never short-circuits. For typical analyser types (tree size of a few dozen Elements), the walk is sub-microsecond.
The closure cost is the dominant factor. The recipes above do one interner().get_object(elem) per Object Element, which is one &'static arena read per element ; sub-nanosecond.
Allocating the result vector or hashset has its own cost; for hot paths, reuse a pre-allocated buffer.
A subtle case: ClassLikeString specifier
ClassLikeString Elements may carry their class name in the Literal(name) specifier, in the OfType { constraint } (a generic constraint), or in the Generic { constraint } (a ::class lookup on a generic). The recipe above handles only the Literal case.
For OfType and Generic, the constraint is a TypeId ; the walker descends into it automatically and the closure sees any Object Elements inside the constraint.
Variant: collect class names with positions
If the analyser wants to know where in the type each class name appeared (e.g. for diagnostic spans), the closure can track context using a stack in its environment:
let names_with_positions = RefCell::new(Vec::new());
let position = RefCell::new(0u32);
inspect::any(ty, |elem| {
*position.borrow_mut() += 1;
if elem.kind() == ElementKind::Object {
let name = interner().get_object(elem).name;
names_with_positions.borrow_mut().push((name, *position.borrow()));
}
false
});
The position counter increments on every visit, giving each class a stable position in the walk.
See also: Inspection for the walker; Element kinds for the full list of payload-bearing kinds and their nested-type carriers; World for codebase-aware filtering.
Interning and the arenas
Every interesting value in suffete — an Element, a Type, an element list, a known-items list, a defining entity, a callable signature — is interned. This chapter covers how interning works and why suffete leans on it for both correctness and performance.
This is internals reading. The public API works without understanding it. But if you're optimising analyser-side code or debugging an arena-related issue, this chapter is the one.
What interning gives you
Two ElementIds with the same logical content are the same handle. Equality is u32 == u32. Hashing is u32 as u64. Two parts of a process that construct the same Element independently end up with the same handle and never know they raced.
Three consequences:
- Cheap equality. No structural compare, no recursion, no hash chain walk. One CPU compare.
- Stable identity. A
TypeIdconstructed today is the same handle as one constructed tomorrow (same process). Useful for caching. - Compact representation. Every Element is one
u32. ATypeIdis oneu64. A union ofnElements is oneu64plus a borrowed slice ofnu32s.
What interning costs
Interning has a fixed memory cost: the dedup table. Suffete uses dashmap::DashMap for the dedup, which is concurrent-friendly but not zero-cost. The lookup is a hash of the value plus a bucket walk.
For a content-keyed value, the hash is over the content (the kind tag plus the payload). For a slice-keyed value, the hash is over the slice contents.
The intern operation:
- Hash the input.
- Probe the dashmap.
- On a hit, return the existing slot.
- On a miss, allocate the value, push it into the per-kind arena, return the new slot.
Steps 2-3 dominate: the hash is fast, the dashmap lookup is the cost.
The arena types
Two basic shapes:
Arena<T>— interns whole values ofT. One slot per uniqueT. Used for: per-element-kind payloads, defining entities, callable signatures, known-items lists, etc.SliceArena<T>— interns slices ofT. One slot per unique slice content. The slice is leaked to'staticon first sight. Used for: the union body slice, object type-args, derived type lists, known-property lists, etc.
Both store underlying values in a boxcar::Vec (a thread-safe append-only vector). Both expose 1-based slot indices ; slot 0 is reserved as the NonZero{U32,U64} niche.
The interner singleton
There is exactly one interner per process. It contains every arena, lives behind a OnceLock for the process lifetime, and is Sync ; concurrent use from multiple threads is supported and uncontended in the common case (the dashmap shards by hash).
Per-element-kind arenas
Each payload-bearing Element kind gets its own arena. The interner exposes per-kind intern + lookup methods. Trivial-kind elements (no payload) don't need interning ; their handles are constructed directly from the kind tag and the canonical slot, and the prelude exposes them as constants.
The Type interner
The Type arena is keyed on the content slice plus the flags. Two Types with the same elements (in any order) and the same flags are the same TypeId.
Type interning does:
- Sort and dedup the input slice (canonicalisation).
- Intern the sorted slice into the element-list slice arena to get a stable handle.
- Intern the
(slice_handle, flags)pair into the Type arena to get a slot. - Construct the
TypeIdfrom(slot, flags, meta=0, reserved=0).
The sort+dedup step uses sort_unstable + dedup, with a SIMD prefilter that skips both when the input is already canonical. Most analyser-side calls hit the fast path because the join layer produces canonical output.
A singleton-Type cache is layered on top: interning a one-element slice is a hashmap lookup keyed on the single Element ID, skipping the sort + the dashmap hash entirely.
Memory growth
Arenas only grow. There is no GC, no reference counting, no compaction. The trade-off:
- Pro: every handle is stable for the process lifetime; no use-after-free; no churn.
- Con: a long-running analyser session that processes many distinct types accumulates them all.
In practice, the analyser's working set saturates: most types are constructed once and reused many times, and the arena converges. Mago's expected memory budget for the type arenas is in the tens of megabytes for a large monorepo.
If the analyser truly needs per-session deallocation (e.g. an LSP server processing many distinct codebases), the recommendation is to spawn a fresh process per session. Suffete does not support arena resets within a process.
Concurrency
The dashmap is the concurrency point. Two threads interning the same value race on the dashmap entry; whichever wins inserts, the other reads back the winner's slot. Both end up with the same handle.
The arenas themselves (boxcar) are append-only and lock-free for read; writes synchronise minimally. Two threads pushing different values append independently.
Performance numbers
Approximate, on a modern x86_64 desktop:
- Trivial-kind handle access: zero ; it's a
const. - Cache-hit intern: ~30ns.
- Cache-miss intern (first time this payload is seen): ~100ns plus the cost of the boxed clone if the content is fresh.
- Singleton-Type cache hit: ~10ns.
- Slow-path Type intern: ~100-300ns depending on slice length.
- Payload lookup by handle: ~5ns; one boxcar index plus a deref.
The lattice's hot loops are dominated by the family rules, not interning. Interning is the constant-cost overhead.
See also: The ElementId tag layout, SIMD scans, Performance philosophy.
The ElementId tag layout
ElementId is a NonZeroU32 newtype. The 32 bits are split into two fields:
bit 31 30 29 28 27 26 | 25 24 23 ... 2 1 0
[ kind tag ] | [ arena slot ]
6 bits 26 bits
| Field | Bits | Width | Range |
|---|---|---|---|
| Kind tag | 31..26 | 6 | 1..=63 (64 values reserved; 30+ used) |
| Arena slot | 25..0 | 26 | 0..=2^26-1 ≈ 67 million |
The kind tag is 1-based: tag 0 is reserved as the NonZeroU32 niche, so ElementId(0) is impossible and Option<ElementId> is the same size as ElementId.
Why this layout
Three reasons:
- One-shot dispatch. A
kind()call is a right-shift by 26 plus a cast throughElementKind. No memory access, no branch. - Compact
ElementId. 32 bits per Element handle means the union body for ann-element type takes4nbytes ; a hot cache line fits 16 Elements. - SIMD-friendly. A slice of
ElementIdis a slice ofu32. AVX2 processes 8 lanes per 256-bit register; NEON processes 4 per 128-bit. The kind extraction is a vectorised right-shift; the equality scan is a vectorised compare. See SIMD scans.
Why 6 bits for the kind
PHP's type system has roughly 30 distinct Element kinds. 6 bits gives 63 usable values (excluding the niche), which leaves room for growth. 5 bits would have been tight; 7 would have wasted a bit.
Why 26 bits for the slot
26 bits gives 67 million slots per kind. The largest arena in any real-world analyser is the Object arena (every distinct (name, type_args, flags) tuple is one entry), which tops out in the millions on the largest codebases. 67 million is far beyond what any analyser is expected to need.
The slot space is per-kind, so two Elements with the same arena-slot index but different kinds are still distinct ElementIds.
How kind() works
impl ElementId {
const KIND_BITS: u32 = 6;
pub(super) const SLOT_BITS: u32 = u32::BITS - Self::KIND_BITS; // 26
const SLOT_MASK: u32 = (1u32 << Self::SLOT_BITS) - 1; // 0x03FF_FFFF
pub const fn kind(self) -> ElementKind {
let tag = (self.0.get() >> Self::SLOT_BITS) as u8;
// SAFETY: every ElementId is constructed from a valid ElementKind.
unsafe { core::mem::transmute(tag) }
}
}
The transmute is sound because every ElementId is constructed via ElementId::new(kind, slot), which sets the high bits to kind as u8 << 26. Tag values that don't correspond to an ElementKind discriminant cannot occur.
Why the kind tag is in the high bits
Two reasons:
- Sort order matches kind order. Sorting a
&[ElementId]by rawu32value clusters Elements by kind first, then by slot within kind. The interner relies on this for fast canonical-form detection (thesimd::is_sorted_strictprimitive checks adjacent strict ordering). - SIMD shift access. AVX2's
vpsrld(packed shift right logical, dword) takes an immediate count; right-shifting 8 lanes by 26 in one instruction extracts the kind tag from each lane simultaneously.
If the tag were in the low bits, a left-shift would be needed to compare against a ElementKind value, but the resulting compare would still work; the high-bit choice is a small win for the SIMD path.
The NonZero niche
NonZeroU32 requires the value to be non-zero. ElementId(0) is therefore impossible, which lets Option<ElementId> be 32 bits (the niche stores None as the all-zero pattern).
The kind tag is 1-based: ElementKind::Null is discriminant 1, not 0. Since the tag occupies the high 6 bits, the all-zero u32 would correspond to "kind 0, slot 0", which is never a valid Element ; the niche is naturally available.
TypeId is similar but bigger
TypeId is NonZeroU64, with this layout:
bit 63 ... 32 | 31 ... 16 | 15 ... 8 | 7 ... 0
[ slot: 32 ] | [flags:16] | [meta:8] | [reserved: 8]
- slot: 32 bits ; index into the type-content arena.
- flags: 16 bits ;
FlowFlagsbitset. - meta: 8 bits ; consumer-defined.
- reserved: 8 bits ; always zero.
Flags and meta ride on the handle (they don't change the slot), so toggling a flag is bit-twiddling, not a re-intern.
A subtle case: TypeId equality
Two TypeIds compare equal iff all 64 bits match. That includes flags and meta. To compare just content: t1.content_eq(&t2) ignores flags and meta. The full == is for "are these the exact same handle for use in caches keyed on the full handle".
See also: Interning and the arenas, SIMD scans.
SIMD scans
The element::simd module provides SIMD-accelerated scans over &[ElementId] slices. Six primitives, all of them dispatch through scalar fallback / AVX2 (x86_64) / NEON (aarch64), with hand-tuned inline assembly on the AVX2 path.
This is the lowest level of the crate. The lattice and predicate code consume these primitives; the user-facing API is unaware of them.
The primitives
| Function | Returns | Used by |
|---|---|---|
contains(slice, needle) | bool ; needle anywhere in slice | join's canonicalize, predicates |
position_of(slice, needle) | Option<usize> ; first index of needle | TypeBuilder::remove, TypeBuilder::replace |
any_of_kind(slice, kind) | bool ; some Element has this kind | predicates::contains_X, lattice prefilters |
all_of_kind(slice, kind) | bool ; every Element has this kind (false on empty) | predicates::is_X |
count_of_kind(slice, kind) | usize ; count of Elements with this kind | join's literal-collapse threshold |
is_sorted_strict(slice) | bool ; slice is strictly increasing under unsigned u32 order | interner's canonicality fast path |
The dispatch:
x86_64+ AVX2 (runtime-detected viastd::is_x86_feature_detected): 8 lanes (256 bits) per iteration. Hand-rolled inline assembly using FFmpeg's pointer-end trickery (one register holds a negative byte offset that doubles as both the load displacement and the loop counter).aarch64+ NEON (baseline ISA, no runtime check): 4 lanes (128 bits) per iteration. NEON intrinsics.- Other architectures or short slices: tight scalar loop.
The thresholds are 8 lanes for AVX2 and 4 lanes for NEON. Below that, scalar wins because the SIMD setup outweighs the parallel work.
Why ElementId-as-u32 is a perfect fit
ElementId is a NonZeroU32 (layout chapter). A slice of ElementId is contiguous 32-bit lanes. Equality scans (contains, position_of) and kind scans (any_of_kind, all_of_kind, count_of_kind) reduce to a per-chunk:
- Load 8 (AVX2) or 4 (NEON) lanes into a SIMD register.
- (For kind scans) Right-shift by 26 to extract the kind tag.
- Compare-equal against a broadcasted needle (or kind value).
- Reduce to a scalar via movemask (AVX2) or maxv/minv (NEON).
- Branch on the reduction.
The per-chunk cost is constant; the per-iteration overhead is one load, two-to-three SIMD ops, one branch.
The FFmpeg pointer-end trick
The AVX2 paths use a register-saving technique borrowed from FFmpeg. Instead of:
mov rcx, 0
.loop:
vmovdqu ymm1, [rsi + rcx*4]
; ... compare ...
inc rcx
cmp rcx, r8 ; r8 = chunk count
jb .loop
The trick is:
mov rdi, [end of slice] ; pointer to end-of-chunked-region
mov rcx, -bytes ; negative byte offset
.loop:
vmovdqu ymm1, [rdi + rcx]
; ... compare ...
add rcx, 32 ; byte stride = 32 (8 lanes × 4)
jl .loop
The negative offset doubles as the loop counter and the addressing-mode displacement. The loop tail is add + jl (two instructions, one micro-op fusion) instead of inc + cmp + jb (three instructions).
The savings are small per iteration but real, especially on long slices.
NEON: intrinsics, not assembly
NEON lacks a per-lane movemask instruction; the scalar reduction is done with vmaxvq_u32 (max across lanes) or vminvq_u32 (min across lanes) plus a per-element-position bit-set trick. The intrinsics version is fast enough that hand-rolled assembly doesn't help.
NEON is baseline on AArch64 (every aarch64 CPU has it). No runtime detection needed; the SIMD path runs whenever the slice is long enough.
Where the primitives are called from
The SIMD primitives are called by hot paths the lattice traverses:
lattice::overlapsandlattice::refinesusesimd::any_of_kindas a prefilter for theNegated,Intersected,Mixed,Object, etc. families ; "if no Element has this kind, skip the family rule entirely".lattice::join's canonicalisation usessimd::containsto detect well-known dominators (MIXED,NEVER,BOOL,RESOURCE, etc.) before applying the rule.predicates::is_int,is_string, etc. usesimd::all_of_kindas their core.predicates::contains_int,contains_string, etc. usesimd::any_of_kind.intern_type's slow path usessimd::is_sorted_strictto skip the sort + dedup when the input is already canonical.TypeBuilder::removeandTypeBuilder::replaceusesimd::position_of.
When the SIMD threshold isn't met
Most analyser-side unions are small (1-5 Elements). The threshold gates ensure that the SIMD code only runs when there's enough work to pay back the setup. On short slices, the scalar fallback runs ; LLVM autovectorises what it can.
Why hand-rolled assembly
The autovectoriser generates correct SIMD code, but it doesn't know:
- That the loop bound is the chunk count (not the byte count).
- That the broadcast can be done once outside the loop.
- That
vpcmpeqd + vpmovmskb + test + jnzis a tighter early-exit than the equivalent generated code. - That the FFmpeg pointer-end trick saves one micro-op per iteration.
For the AVX2 paths, hand-rolled assembly is consistently 10-30% faster than the autovectorised scalar on long slices. For the NEON paths, intrinsics-with-careful-loop are within 5% of hand-rolled assembly, and the maintenance is much lower.
Safety
Every SIMD function is marked unsafe fn. The public entry points are safe and gate on the threshold + the runtime feature detection (for AVX2). Inside the SIMD function:
- Unaligned loads (
vmovdqu,vld1q_u32) — both architectures support these without alignment. - Bounds: the function is called only when the slice is at least
THRESHOLDlanes; the chunk count islen / lanes_per_chunk, capped to fit. - Tail handling: each function has a scalar tail loop for the leftover lanes after the chunked region.
The unsafe blocks are local to each function and have SAFETY comments documenting the invariants.
Performance numbers
Approximate, on a modern x86_64 desktop, for a slice of 64 Elements:
simd::contains(hit early): ~5ns.simd::contains(miss to end): ~20ns.simd::any_of_kind(hit early): ~6ns.simd::any_of_kind(miss to end): ~25ns.simd::is_sorted_strict(already sorted, 64 elements): ~30ns.- Scalar equivalent (64 elements): ~3-5x slower across the board.
For typical analyser unions (5-10 Elements), the scalar fallback runs and the cost is on the order of the per-Element work itself ; nanoseconds per call.
See also: The ElementId tag layout, Interning and the arenas, Performance philosophy.
Performance philosophy
Suffete is a hot path. An analyser checking a large codebase calls refines, overlaps, meet, join, subtract, narrow and the predicates millions of times. A constant-factor regression in any of those shows up as a percentage of total analysis time.
This chapter is the philosophy. Specific optimisations are in the interner, element-id layout, and SIMD chapters.
The principles
1. Performance is a feature, not an afterthought
Performance regressions stop landing in suffete the same way correctness regressions stop. Every PR runs the codspeed benchmark suite; a regression beyond a small threshold blocks the merge. The benchmarks cover the lattice operations on a representative mix of inputs (small unions, large unions, deep generics, intersections, narrowed mixed).
This is not "ship and optimise later". The cost model is part of the design.
2. Compact representation beats clever rules
Most of suffete's speed comes from:
ElementIdis au32. Equality is one compare.TypeIdis au64. Equality is one compare.- Slices of
ElementIdare dense u32 arrays. SIMD-friendly.
The lattice rules themselves are not particularly clever ; they are the textbook subtype rules. The speed comes from the data being amenable to fast comparison and SIMD scans, not from algorithmic tricks in the rules.
3. Intern aggressively, free nothing
The interner never frees. Every ElementId and TypeId is 'static. The trade-off is committed: stable handles in exchange for accumulated memory.
In practice, the analyser's working set saturates. After ingesting a codebase, the arena is "warmed" and most subsequent queries hit existing entries. A long-running session converges; it does not grow without bound except as new types appear.
For sessions that need true per-context isolation (LSP servers across distinct codebases), the recommendation is process-per-session rather than arena-reset.
4. Pure functions, no global mutable state
Every operation is a pure function of its inputs (TypeIds, &World, LatticeOptions). Same inputs → same output. No internal state.
This buys:
- Concurrency. The lattice can be called from any thread without coordination.
- Memoisation. Callers can cache results keyed on the input handles, without worrying about invalidation.
- Testability. Property tests work because the operations are deterministic. The algebraic-law battery checks identities that only hold for pure functions.
The single piece of mutable state that exists — the interner — is concurrent-safe and append-only. Reads see a consistent snapshot.
5. Hot paths are tuned, cold paths are clear
The hot paths (the lattice operations, the SIMD primitives, the interner's lookup) are optimised aggressively: hand-rolled assembly, careful branch ordering, prefilter gates, inline annotations. The cold paths (the construction APIs, the diagnostic helpers, the serialize layer) prioritise clarity over speed.
The split is informed by codspeed. A function that doesn't show up in the benchmarks gets readable code; a function that does gets the optimisation effort.
6. Codspeed is the source of truth
We do not benchmark locally. Local microbenchmarks vary by CPU, OS, background load, and disk cache. The numbers don't transfer.
Codspeed runs every PR on a controlled VM and reports relative deltas to the base commit. A "5% regression" means 5% on Codspeed's hardware, which is not your hardware, but is the same hardware as the previous run. The signal is reproducible.
Locally we ship hunches and let codspeed vote. If a change is hopefully an improvement, we push and look at the benchmark output. If codspeed flags it as a regression, we revert; if it's a win, we keep.
7. SIMD is for the things SIMD is good at
SIMD wins when the data is naturally u32 or u64 lanes, the operation is per-lane independent (compare-equal, shift-extract, etc.), and the slice is long enough to amortise the setup. SIMD does not win when the data is heterogeneous (each Element triggers a different per-kind branch), the work per Element is heavyweight (a recursive refines call cannot be vectorised), or the slice is short.
Suffete uses SIMD for the slice scans and as prefilters in the lattice (does this slice contain any Negated Element? gate the family rule on the answer). The lattice's per-pair work is not SIMD; it can't be.
8. Allocation is a tax
Every allocation is a roundtrip to the allocator. The lattice's hot path is allocation-free: handles are passed by value, the report is mutated in-place, the workspace lives on the stack.
Callers should follow the same: reuse LatticeReport instances across queries, reuse TypeBuilder instances when constructing many types, prefer single-element prelude constants over building one-element types from scratch.
9. Branches are a tax too
Modern CPUs predict branches well, but predictable branches are a constant cost; mispredicted branches are 10-20× worse. Suffete's hot loops minimise both:
- Family dispatch in the lattice is a
matchon(a.kind(), b.kind()), which the compiler turns into a jump table for the common case. - The universal axioms (top, bot, placeholder) are checked first; once they fire, the rest of the dispatch is dead code.
- The SIMD scans use early-exit: on a match, branch out of the loop; on a miss, branch back to the loop start. The loop-back is well-predicted.
10. The lattice is not the only thing that matters
Suffete is one piece of an analyser's pipeline. The analyser's parser, the docblock interpreter, the symbol table, the codebase ingestion, the diagnostic emitter — all of these have their own performance budgets. Suffete's job is to be one constant-factor cost in that pipeline, not to dominate it.
If suffete becomes the bottleneck of a real analyser, that's a bug. The expected cost of suffete in a representative analysis is measured in low single-digit percent of total analyser time.
What doesn't matter
A few things that seem like they should matter, but don't:
- The exact number of cycles per refines call. It varies by case. The codspeed delta is what matters.
- Avoiding small allocations in cold paths.
Vec::newis cheap. Prematurely optimising it makes the code worse and saves nothing. - Inlining everything.
#[inline]helps when LLVM doesn't see the benefit; it hurts when overused (code bloat, instruction cache pressure). Suffete uses#[inline]for the small-and-hot functions and lets LLVM decide for the rest. - Lock-free data structures everywhere. The interner uses dashmap because it's a concurrent hashmap, not because every read needs to be lock-free. The lattice's hot path doesn't take locks at all.
When in doubt
Ship it. Look at codspeed. Revert if it regresses; keep it if it doesn't. Don't theorise about performance ; measure.
See also: Interning and the arenas, The ElementId tag layout, SIMD scans.
Element kinds
Every Element is one of the following kinds. The kind is the high 6 bits of the ElementId ; see the layout chapter. Trivial-kind Elements have no payload; payload-bearing kinds carry a &'static SomeInfo reference into the matching arena.
This is reference material, organised by family.
Trivial kinds (no payload)
| Kind | Prelude constant | Denotes |
|---|---|---|
Null | NULL | The value null. |
Never | NEVER | The empty type ($\bot$). |
Void | VOID | The PHP void (absence of return value). |
Placeholder | PLACEHOLDER | Inference-time hole. |
Bool | BOOL | true or false. |
True | TRUE | The value true. |
False | FALSE | The value false. |
Scalar | SCALAR | bool | int | float | string. |
Numeric | NUMERIC | int | float | numeric-string. |
ArrayKey | ARRAY_KEY | int | string. |
ObjectAny | OBJECT_ANY | Any object, no class commitment. |
Mixed | MIXED (vanilla) | Universe top ($\top$), with optional axes via MixedInfo. |
Mixed is technically payload-bearing (the MixedInfo carries the four axes), but the vanilla form (all axes default) is exposed as a constant and behaves trivially.
Scalar payload kinds
| Kind | Payload | Denotes |
|---|---|---|
Int | IntInfo | Integer (Unspecified, UnspecifiedLiteral, Literal(n), Range(IntRange)). |
Float | FloatInfo | Float (Unspecified, Literal, NonZero). |
String | StringInfo | String with literal slot + refinement flags. |
ClassLikeString | ClassLikeStringInfo | A string naming a class-like (kind + specifier). |
See scalars and class-like strings.
Object family
| Kind | Payload | Denotes |
|---|---|---|
Object | ObjectInfo | Named class (with optional type args, intersections, modality flags). |
Enum | EnumInfo | Enum (with optional case). |
ObjectShape | ObjectShapeInfo | Structural object type (with known properties, sealed/unsealed). |
HasMethod | HasMethodInfo | Anything declaring a method named m. |
HasProperty | HasPropertyInfo | Anything declaring a property named p. |
See objects.
Collection family
| Kind | Payload | Denotes |
|---|---|---|
Array | KeyedArrayInfo | Keyed array (generic, sealed shape, or empty). |
List | ListInfo | Int-keyed list (generic or sealed). |
Iterable | IterableInfo | iterable<K, V>. |
See arrays and iterables and callables.
Resource
| Kind | Payload | Denotes |
|---|---|---|
Resource | ResourceInfo | PHP resource (open / closed, named kind). |
See resources.
Callable
| Kind | Payload | Denotes |
|---|---|---|
Callable | CallableInfo | Bare, signature, or closure form. |
Wrappers
| Kind | Payload | Denotes |
|---|---|---|
Negated | NegatedInfo | Set complement: !T. |
Intersected | IntersectedInfo | Head + conjunct list: H & C1 & ... & Cn. |
See wrappers.
Generics
| Kind | Payload | Denotes |
|---|---|---|
GenericParameter | GenericParameterInfo | Free template parameter (T). |
See templates.
Unresolved
| Kind | Payload | Denotes |
|---|---|---|
Alias | AliasInfo | Type alias (resolved via World::resolve_alias). |
Reference | SymbolReference | Name reference (class, alias, parameter, etc.). |
MemberReference | MemberReference | Class-member type (Foo::T). |
GlobalReference | GlobalReference | Global type variable. |
Conditional | ConditionalInfo | T extends U ? X : Y. |
Derived | DerivedInfo | KeyOf<T>, ValueOf<T>, IndexAccess<T, K>, etc. |
Variable | VariableInfo | Analyser-introduced inference placeholder. |
See unresolved elements and expand.
How to enumerate
use suffete::ElementKind;
// Every variant in declaration order.
for k in ElementKind::iter() {
println!("{:?}", k);
}
The total count of variants is the size of core::mem::variant_count::<ElementKind>(). The maximum tag value is one less than the count.
Constructing each kind
For trivial kinds, use the prelude constants. For payload-bearing kinds, use the ElementId constructors when available, or interner().intern_* directly:
use suffete::{ElementId, prelude::INT};
use suffete::interner::interner;
use suffete::element::payload::*;
use mago_atom::atom;
// Trivial.
let _ = suffete::prelude::NULL;
let _ = INT;
// Convenience constructors.
let _ = ElementId::int_literal(7);
let _ = ElementId::string_literal("hello");
let _ = ElementId::named_object(atom("Foo"));
let _ = ElementId::named_object_with_args(atom("Box"), &[suffete::prelude::TYPE_INT]);
let _ = ElementId::enum_case(atom("Status"), atom("Active"));
// Direct interner.
let info = ResourceInfo { state: ResourceState::Open, kind: Some(atom("curl")) };
let _ = interner().intern_resource(info);
The full list of constructors is in src/element/id.rs; the per-kind interner methods are generated by the element_arena_methods! macro in src/interner/store.rs.
See also: Elements: the indivisible types for the conceptual overview; The ElementId tag layout for the bit layout; Prelude constants for the full prelude.
Lattice options and reports
The lattice operations take two extra parameters beyond the types and the world: LatticeOptions and a &mut LatticeReport. This chapter documents both.
LatticeOptions
The configuration knobs for a single lattice query. All fields default to "strict reading"; the analyser sets them based on the file's strictness mode and any per-call overrides.
use suffete::lattice::LatticeOptions;
let opts = LatticeOptions::default();
let opts = opts.with_php_runtime_coerce(false); // strict mode
let opts = opts.with_ignore_null(true); // ignore null in unions
let opts = opts.with_ignore_false(true); // ignore false in unions
The full set of options:
| Field | Default | Effect |
|---|---|---|
php_runtime_coerce | true | Admit PHP runtime coercion edges (int → float, numeric-string → int/float). Records CoercionCauses::PHP_RUNTIME_COERCE on the report when fired. Set to false for strict-types mode. |
ignore_null | false | When checking refines(τ, σ), drop null from τ before the per-Element check. Used by analyser code that has separately verified non-nullability. |
ignore_false | false | Same for false. Used in nullable-via-false patterns. |
merge_list_element_types | true | In join, merge sealed-list element types where possible. |
merge_keyed_array_params | true | In join, merge keyed-array key/value parameters where possible. |
int_literal_collapse_threshold | (small) | The number of distinct int literals at which join collapses to int or a range. |
string_literal_collapse_threshold | (small) | Same for string literals. |
The with_* methods return a new LatticeOptions ; the type is Copy, so chaining is cheap.
LatticeReport
A buffer the lattice writes into during a query. It carries:
- Coercion causes ; a bitset of which special rules fired, suitable for the analyser to surface in a diagnostic.
use suffete::lattice::{LatticeOptions, LatticeReport, CoercionCauses};
let mut report = LatticeReport::new();
// ... call lattice operations, passing &mut report ...
if report.causes.contains(CoercionCauses::PHP_RUNTIME_COERCE) {
// The check passed but used a runtime-coercion edge.
// Surface a warning if the analyser's policy requires.
}
if report.causes.contains(CoercionCauses::TEMPLATE_DEFAULT) {
// The check passed but used a default-filled template parameter.
}
CoercionCauses
A bitflags type:
| Cause | Meaning |
|---|---|
PHP_RUNTIME_COERCE | A PHP-runtime coercion edge was used (e.g. int → float). |
TEMPLATE_DEFAULT | A default-filled template parameter was tolerated. |
IGNORE_NULL | The query ran with null ignored. |
IGNORE_FALSE | The query ran with false ignored. |
The bits are non-zero when the corresponding rule contributed to the answer. The analyser reads the bitset after the query and decides what to do.
Reusing reports
LatticeReport::new() is cheap (just a zeroed bitset). For analyser hot loops, reuse one instance:
let mut report = LatticeReport::new();
for query in queries {
report.reset(); // clear the bitset
let _ = lattice::refines(query.a, query.b, &world, opts, &mut report);
// ... handle the result and the report ...
}
reset clears the bitset in O(1).
FlowFlags
A 16-bit bitset that rides on every TypeId. Stored in the flags field of the handle's u64 representation.
use suffete::FlowFlags;
let f = FlowFlags::EMPTY;
let f = f.with_from_template_default(true);
let f = f.with_isset_from_loop(true);
let f = f.with_explicitly_nullable(true);
The full set of flags:
| Flag | Meaning |
|---|---|
from_template_default | This type-arg was filled with the parameter's upper bound rather than the user's value. The variance check tolerates it (recording CoercionCauses::TEMPLATE_DEFAULT). |
isset_from_loop | This value flowed through a loop body. Used by the analyser's flow-typing to track variables introduced inside a loop. |
explicitly_nullable | The type was constructed as nullable via ?T syntax (rather than as a separate T | null union). Carried for diagnostic precision. |
The flags do not affect lattice operations directly ; they are metadata the analyser reads out via TypeId::flags(). The exception is from_template_default, which the lattice's variance check consults.
A worked example
use suffete::{TypeBuilder, prelude::{INT, FLOAT}};
use suffete::{lattice::{self, LatticeOptions, LatticeReport, CoercionCauses}, world::NullWorld};
let int_t = TypeBuilder::new().push(INT).build();
let float_t = TypeBuilder::new().push(FLOAT).build();
let world = NullWorld;
let mut report = LatticeReport::new();
// Strict mode: int does not refine float.
let opts_strict = LatticeOptions::default().with_php_runtime_coerce(false);
assert!(!lattice::refines(int_t, float_t, &world, opts_strict, &mut report));
// Non-strict mode: int coerces.
report.reset();
let opts_loose = LatticeOptions::default(); // php_runtime_coerce = true
assert!(lattice::refines(int_t, float_t, &world, opts_loose, &mut report));
assert!(report.causes.contains(CoercionCauses::PHP_RUNTIME_COERCE));
The same int <: float query, two different opts, two different answers. The report tells the analyser why the loose answer is loose.
See also: refines, meet, join, subtract, narrow, overlaps for the operations that consume these.
Prelude constants
The suffete::prelude module exposes well-known constants. Importing it gives you the common-case Elements and Types without going through the interner.
use suffete::prelude::*;
Elements (ElementId)
Landmarks
| Constant | Element | Notes |
|---|---|---|
NEVER | Never | $\bot$. |
MIXED | Mixed (vanilla) | $\top$. |
NULL | Null | The value null. |
VOID | Void | The PHP void. |
PLACEHOLDER | Placeholder | Inference hole. |
Booleans
| Constant | Element |
|---|---|
TRUE | True |
FALSE | False |
BOOL | Bool |
Scalars (unrefined)
| Constant | Element |
|---|---|
INT | Int (unspecified) |
FLOAT | Float (unspecified) |
STRING | String (unspecified, no flags) |
NUMERIC | Numeric (true union) |
SCALAR | Scalar (true union) |
ARRAY_KEY | ArrayKey (true union) |
Common refinements
| Constant | Element |
|---|---|
NUMERIC_STRING | String (unspecified, with is_numeric=true) |
NON_EMPTY_STRING | String (unspecified, with is_non_empty=true) |
EMPTY_STRING | String (literal "") |
INT_ZERO | Int (literal 0) |
Object family
| Constant | Element |
|---|---|
OBJECT_ANY | ObjectAny |
Iterable / callable
| Constant | Element |
|---|---|
ITERABLE_MIXED_MIXED | Iterable<mixed, mixed> |
CALLABLE | Callable (bare form, no signature) |
Resources
| Constant | Element |
|---|---|
RESOURCE | Resource (unrefined) |
OPEN_RESOURCE | Resource (state = Open, no kind) |
CLOSED_RESOURCE | Resource (state = Closed, no kind) |
Arrays
| Constant | Element |
|---|---|
EMPTY_ARRAY | array{} (the empty sealed shape) |
Types (TypeId)
Each TYPE_* constant is the singleton type wrapping the corresponding Element. Use these when you need a TypeId for a single Element ; it saves the construction call.
| Constant | Type | Equivalent |
|---|---|---|
TYPE_NEVER | never | TypeBuilder::new().push(NEVER).build() |
TYPE_MIXED | mixed | TypeBuilder::new().push(MIXED).build() |
TYPE_NULL | null | ... |
TYPE_VOID | void | ... |
TYPE_TRUE | true | ... |
TYPE_FALSE | false | ... |
TYPE_BOOL | bool | ... |
TYPE_INT | int | ... |
TYPE_FLOAT | float | ... |
TYPE_STRING | string | ... |
TYPE_NUMERIC | numeric | ... |
TYPE_SCALAR | scalar | ... |
TYPE_ARRAY_KEY | array-key | ... |
TYPE_NUMERIC_STRING | numeric-string | ... |
TYPE_NON_EMPTY_STRING | non-empty-string | ... |
TYPE_OBJECT_ANY | object | ... |
TYPE_RESOURCE | resource | ... |
TYPE_OPEN_RESOURCE | open-resource | ... |
TYPE_CLOSED_RESOURCE | closed-resource | ... |
TYPE_EMPTY_ARRAY | array{} | ... |
The TYPE_* constants are computed at boot time and exposed as consts. Comparing t == TYPE_INT is a single u64 compare ; faster than constructing a TypeBuilder and calling build.
When to use the prelude
For the trivial cases:
use suffete::prelude::*;
// Use the Element constant when constructing a union:
let t = TypeBuilder::new().push(INT).push(STRING).build();
// Use the Type constant when you need a TypeId directly:
let int_only: TypeId = TYPE_INT;
// Use the Type constant in operations:
let result = lattice::refines(some_t, TYPE_MIXED, &world, opts, &mut report);
For non-trivial Elements (named objects, sealed shapes, callable signatures), use the ElementId constructors or the interner directly. The prelude only covers the well-known singletons.
Naming conventions
UPPER_SNAKE_CASEforElementIdconstants.TYPE_UPPER_SNAKE_CASEforTypeIdconstants.- The element name matches the PHP-side name, lowercased and snake-cased:
INT,STRING,OBJECT_ANY,EMPTY_ARRAY,NUMERIC_STRING.
Adding to the prelude
The prelude is the place for frequently-needed constants. New entries should:
- Have a clearly-defined PHP-side meaning (so the name is unambiguous).
- Be a singleton (one canonical Element / Type per name).
- Get used often enough that an analyser shouldn't have to construct it inline.
Refinement variants that can be expressed as flag combinations on the basic kinds (e.g. truthy-string) are typically not in the prelude ; they are constructed via the interner with a StringInfo carrying the flags.
See also: Constructing types: TypeBuilder and prelude for the broader construction API; Element kinds for the full kind list.
Contributing
The book lives in book/ of the suffete repository. It's an mdbook with mathjax and mermaid enabled. Contributions are welcome — chapter improvements, fixes for incorrect descriptions, new cookbook recipes, missing diagrams.
Building locally
Install mdbook and the mermaid preprocessor:
cargo install mdbook
cargo install mdbook-mermaid
Then, from the repo root:
cd book
mdbook-mermaid install . # one-time, copies mermaid assets into theme/
mdbook serve --open # builds and watches; opens in browser
The book rebuilds on file changes; the browser auto-reloads.
Style
Match the existing chapters:
- Dense but readable. No filler. Short paragraphs. Real examples in every chapter.
- Math via mathjax.
$\tau \mathrel{<:} \sigma$,$\tau \sqcup \sigma$, etc. - Diagrams via mermaid when they clarify (not as decoration).
- Code samples are
rust,ignore; the book is not a doctest harness. - PHP samples are
php; PHP is not compiled by mdbook either, but the syntax highlighting is helpful. - "Element" not "atom" in code-related text. The atom note is in the introduction and glossary.
- No emojis anywhere ; book or commit messages.
- No em-dashes anywhere.
- Cross-link to other chapters when a concept comes up that has its own treatment.
- Every chapter ends with a "see also" footer.
Naming chapter files
Chapter files live in book/src/<part>/<name>.md. The file name is kebab-case. The SUMMARY.md references each chapter by its full path.
When you add a chapter, register it in SUMMARY.md ; otherwise it doesn't render.
Verifying
Before submitting:
cd book
mdbook build # checks links, mermaid, mathjax
mdbook test # smoke-tests code samples (rust,ignore is skipped, but the parser runs)
Open book/book/index.html in your browser and click through every page to check that math and diagrams render. CI does the same on every PR.
Where to file issues
Issues that affect the book go to the suffete repository with a book label. PRs touching book/ trigger the book CI workflow.