Semantic Sites

On this page: The underlying category · Coordinate kinds · Morphisms · Grothendieck topology · Covering families · Building sites in Python · Analogy with the étale site

The Underlying Category

A semantic site is a pair $(\mathcal{C}, J)$ where $\mathcal{C}$ is a small category whose objects are semantic coordinates (code entities) and whose morphisms are semantic relationships (imports, calls, inheritance, …), and $J$ is a Grothendieck topology on $\mathcal{C}$.

Definition 2.1 — Semantic Site

$$\text{SemanticSite} \;=\; (\mathcal{C}_{\text{code}},\; J_{\text{Groth}})$$

$$\text{obj}(\mathcal{C}_{\text{code}}) = \{\text{semantic coordinates}\}, \quad \text{mor}(\mathcal{C}_{\text{code}}) = \{\text{semantic relationships}\}$$

The category is constructed automatically by JuGeo's static analyser when you call jugeo.build_site() on a module or package. The analyser traverses the AST, resolves imports, infers types from annotations, and registers all detected relationships as morphisms.

💡

Finite and computable. Real codebases are always finite, so $\mathcal{C}_{\text{code}}$ is a finite category. This means the Grothendieck topology is given by an explicit list of covering sieves for each object, computed from the dependency graph. There are no size issues.

Coordinate Kinds

Every object of $\mathcal{C}$ has a kind drawn from the following enumeration. The kind determines which morphisms are available and how covering families are generated.

MODULE: A Python .py file
PACKAGE: A directory with __init__.py
FUNCTION: def or lambda
CLASS: class statement
METHOD: def inside a class body
VARIABLE: Module-level assignment
TYPE: Type alias or TypeVar
IMPORT: import / from … import

Kinds are partially ordered by containment: PACKAGE > MODULE > CLASS > METHOD (and MODULE > FUNCTION > VARIABLE). This partial order generates the containment morphisms that underpin restriction maps in the sheaf.
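The containment order can be made concrete with a small sketch. The `Kind` enum and the `contains` helper below are stand-ins written for illustration, not JuGeo's own `CoordinateKind` API; they encode only the two chains stated above and derive the rest by transitivity.

```python
from enum import Enum


class Kind(Enum):
    """Illustrative stand-in for the coordinate kinds (not JuGeo's enum)."""
    PACKAGE = "package"
    MODULE = "module"
    CLASS = "class"
    METHOD = "method"
    FUNCTION = "function"
    VARIABLE = "variable"


# Direct containment steps, read off the two chains in the text:
# PACKAGE > MODULE > CLASS > METHOD and MODULE > FUNCTION > VARIABLE.
DIRECT = {
    Kind.PACKAGE: {Kind.MODULE},
    Kind.MODULE: {Kind.CLASS, Kind.FUNCTION},
    Kind.CLASS: {Kind.METHOD},
    Kind.FUNCTION: {Kind.VARIABLE},
}


def contains(outer: Kind, inner: Kind) -> bool:
    """Return True if `outer` strictly contains `inner` (transitive closure)."""
    stack = list(DIRECT.get(outer, ()))
    seen = set()
    while stack:
        kind = stack.pop()
        if kind is inner:
            return True
        if kind not in seen:
            seen.add(kind)
            stack.extend(DIRECT.get(kind, ()))
    return False


def comparable(a: Kind, b: Kind) -> bool:
    """Two kinds are comparable if they are equal or one contains the other."""
    return a is b or contains(a, b) or contains(b, a)
```

The order is only partial: `CLASS` and `FUNCTION` both sit below `MODULE`, but neither contains the other, so `comparable(Kind.CLASS, Kind.FUNCTION)` is false.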

Morphisms

Morphisms in $\mathcal{C}_{\text{code}}$ capture semantic relationships. JuGeo tracks five classes:

Containment — $f \hookrightarrow m$ (function f is defined inside module m)
Call — $f \xrightarrow{\text{call}} g$ (function f calls function g)
Import — $m \xrightarrow{\text{import}} n$ (module m imports from module n)
Inheritance — $C \xrightarrow{\text{inh}} D$ (class C inherits from class D)
Type annotation — $f \xrightarrow{\text{type}} T$ (f's return type is T)

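Morphisms compose by chaining: a relationship from f to g followed by a relationship from g to h yields a composite relationship from f to h. JuGeo stores morphisms in a sparse adjacency structure and indexes them by kind for efficient lookup during cover generation and descent.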

Morphism composition in $\mathcal{C}_{\text{code}}$

$$f \xrightarrow{r_1} g \xrightarrow{r_2} h \;\implies\; f \xrightarrow{r_2 \circ r_1} h$$
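One way to see why chaining gives a well-defined category is to model each morphism as a chain of relationship labels. The `Arrow` record and the `then` helper below are hypothetical names for illustration and say nothing about how JuGeo stores morphisms internally.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Arrow:
    """Hypothetical morphism record: endpoints plus a chain of relationship labels."""
    source: str
    target: str
    labels: Tuple[str, ...] = ()


def identity(obj: str) -> Arrow:
    """The identity on `obj` is the empty chain of relationships."""
    return Arrow(obj, obj)


def then(first: Arrow, second: Arrow) -> Arrow:
    """Follow `first`, then `second`; defined only when the endpoints match."""
    if first.target != second.source:
        raise ValueError(f"cannot compose: {first.target!r} != {second.source!r}")
    return Arrow(first.source, second.target, first.labels + second.labels)


f_calls_g = Arrow("f", "g", ("call",))
g_calls_h = Arrow("g", "h", ("call",))
h_returns_T = Arrow("h", "T", ("type",))

composite = then(f_calls_g, g_calls_h)
print(composite)  # Arrow(source='f', target='h', labels=('call', 'call'))
```

Because composition is tuple concatenation, associativity and the identity laws hold automatically, which is what the category axioms require.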

The Grothendieck Topology

A Grothendieck topology $J$ on $\mathcal{C}$ assigns to each object $U$ a collection $J(U)$ of covering sieves — sieves that "cover" $U$ in the sense that local data on the cover can be glued to global data on $U$.

JuGeo defines the topology declaratively through coverage axioms. The default topology uses three axiom schemas:

Containment coverage — A module is covered by the set of all its top-level definitions. A class is covered by the set of all its methods. Formally: $\{f_i \hookrightarrow U\}_i \in J(U)$.
Import coverage — A package is covered by the set of all modules it imports. This lets module-level judgments be assembled into package-level guarantees.
Call-graph coverage — A function is covered by the set of functions it directly calls (and transitively, if configured). This supports interprocedural analysis.

The three schemas satisfy the axioms of a Grothendieck topology (maximality, stability under base change, transitivity).
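For reference, the three axioms can be stated for sieves as follows. This is the standard textbook formulation, not a quotation from JuGeo's documentation; $t_U$ denotes the maximal sieve on $U$ and $h^{*}S$ the pullback of a sieve $S$ along $h$.

```latex
\begin{aligned}
&\textbf{Maximality:} && t_U = \{\, f \mid \operatorname{cod}(f) = U \,\} \in J(U). \\
&\textbf{Stability:} && S \in J(U),\ h\colon V \to U
  \;\implies\; h^{*}S = \{\, g \mid \operatorname{cod}(g) = V,\ h \circ g \in S \,\} \in J(V). \\
&\textbf{Transitivity:} && S \in J(U),\ R \text{ a sieve on } U,\
  h^{*}R \in J(\operatorname{dom} h) \text{ for all } h \in S
  \;\implies\; R \in J(U).
\end{aligned}
```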

📚

Custom topologies. You can extend or replace the default topology by subclassing GrothendieckTopology and overriding covering_sieves(obj). This is useful for domain-specific analyses, e.g., when you want coverage based on test files rather than call graphs.
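A rough sketch of the subclassing pattern follows. To stay runnable without JuGeo installed it defines a local stand-in for `GrothendieckTopology`; the return type of `covering_sieves`, the `CoveredByTestsTopology` class, and the representation of a sieve as a frozenset of test-file names are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


class GrothendieckTopology:
    """Local stand-in so the sketch runs on its own (not the JuGeo class)."""

    def covering_sieves(self, obj: str) -> List[FrozenSet[str]]:
        raise NotImplementedError


class CoveredByTestsTopology(GrothendieckTopology):
    """Hypothetical topology: an object is covered by the test files that exercise it."""

    def __init__(self, tests_for: Dict[str, List[str]]) -> None:
        self._tests_for = tests_for

    def covering_sieves(self, obj: str) -> List[FrozenSet[str]]:
        tests = self._tests_for.get(obj, [])
        # One covering sieve per object, represented by its generating test files.
        return [frozenset(tests)] if tests else []


topology = CoveredByTestsTopology(
    {"mymod.validate": ["tests/test_validate.py", "tests/test_pipeline.py"]}
)
for sieve in topology.covering_sieves("mymod.validate"):
    print(sorted(sieve))
```

The sketch does not check the topology axioms. A real subclass would still need, for example, the maximal sieve on every object to count as covering.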

Covering Families

A covering family $\{U_i \to U\}_{i \in I}$ is the data JuGeo uses when it needs to prove a property of $U$ by proving it locally on each $U_i$ and then gluing. The covering family is the principal data of the topology — a sieve is just the closure of a family under pre-composition.

In practice, covering families look like:

cover(mymod) → [mymod.f1, mymod.f2, mymod.MyClass]
cover(MyClass) → [MyClass.__init__, MyClass.method_a, MyClass.method_b]
cover(mymod.process) → [mymod.validate, mymod.transform, mymod.emit]
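The passage from a covering family to its sieve can be sketched as a reachability computation. The version below makes a simplifying assumption: there is at most one morphism between any two objects, so a morphism into $U$ is identified by its source. The `generated_sieve` function and the `arrows_into` table are illustrative names, not JuGeo API.

```python
from typing import Dict, Iterable, Set


def generated_sieve(family: Iterable[str], arrows_into: Dict[str, Set[str]]) -> Set[str]:
    """Sources of all morphisms that factor through some member of `family`.

    `arrows_into[x]` lists the objects with a direct morphism into `x`.
    """
    sieve: Set[str] = set()
    stack = list(family)
    while stack:
        obj = stack.pop()
        if obj in sieve:
            continue
        sieve.add(obj)
        # Pre-composition: anything mapping into `obj` also maps into the base.
        stack.extend(arrows_into.get(obj, ()))
    return sieve


# Containment morphisms taken from the cover examples above.
arrows_into = {
    "mymod": {"mymod.f1", "mymod.f2", "mymod.MyClass"},
    "mymod.MyClass": {"MyClass.__init__", "MyClass.method_a", "MyClass.method_b"},
}

sieve = generated_sieve(["mymod.f1", "mymod.f2", "mymod.MyClass"], arrows_into)
print(sorted(sieve))
```

The family has three members, but the sieve it generates also contains the three methods of `MyClass`, because each of them reaches `mymod` through `mymod.MyClass`.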

Building Sites in Python

python

from jugeo.geometry.site import (
    Coordinate,
    CoordinateKind,
    CoveringFamily,
    Morphism,
    MorphismKind,
    SiteBuilder,
)

# Coordinates are addressed by a path tuple plus a kind.
module = Coordinate(("mymod",), CoordinateKind.MODULE)
validate = Coordinate(("mymod", "validate"), CoordinateKind.FUNCTION)
transform = Coordinate(("mymod", "transform"), CoordinateKind.FUNCTION)
emit = Coordinate(("mymod", "emit"), CoordinateKind.FUNCTION)

# A covering family for the module: one morphism from each function into it.
cover = CoveringFamily(
    base=module,
    members=[
        Morphism(validate, module, MorphismKind.RESTRICTION, label="validate→module"),
        Morphism(transform, module, MorphismKind.RESTRICTION, label="transform→module"),
        Morphism(emit, module, MorphismKind.RESTRICTION, label="emit→module"),
    ],
    label="module-cover",
)

# Register the coordinates, the morphisms, and the cover, then build the site.
site = (
    SiteBuilder("demo")
    .add_coordinates([module, validate, transform, emit])
    .add_morphisms(cover.members)
    .add_covering_family(cover)
    .build()
)

print(site.coordinate_count())           # 4
print(site.morphism_count())             # 3
print(site.topology.is_covering(cover))  # True

python

from jugeo.geometry.site import Coordinate, CoordinateKind, GrothendieckTopology

module = Coordinate(("mypackage",), CoordinateKind.MODULE)
helper = Coordinate(("mypackage", "utils", "helper"), CoordinateKind.FUNCTION)
api = Coordinate(("mypackage", "api"), CoordinateKind.MODULE)

print(helper.parent().name)         # mypackage.utils
print(module.is_prefix_of(helper))  # True
print(module.children(["api", "db"]))

canonical = GrothendieckTopology.canonical()
print(canonical.identity_axiom_check(module))  # True
print(api.common_ancestor(helper).name)        # mypackage

Analogy with the Étale Site

Algebraic geometers will recognise the structure: JuGeo's semantic site is deliberately modelled on the étale site of a scheme. The analogy is instructive:

Étale site of a scheme X

Objects: étale maps U → X
Morphisms: X-morphisms U → V
Covers: jointly surjective families
Sheaves: local systems, coherent sheaves
Cohomology: Galois representations, ℓ-adic

Semantic site of a codebase

Objects: semantic coordinates (MODULE, FUNCTION, …)
Morphisms: import / call / containment
Covers: containment / import / call-graph families
Sheaves: judgment sheaves, type sheaves
Cohomology: H¹ obstruction classes (unresolvable conflicts)

The key difference is that the étale site is defined over a geometric object while JuGeo's site is defined over a discrete finite category. This means all sheaf conditions reduce to finite systems of equations, making them computable. There is no need for the machinery of derived categories — the relevant structure is captured by the explicit DAG of semantic relationships.
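As a toy illustration of the "finite systems of equations" point, the sketch below treats the local data on each cover member as a dictionary of judgments and checks the gluing condition by comparing shared keys. The `Section` type, the judgment strings, and both helper functions are hypothetical and much simpler than a real judgment sheaf.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Section = Dict[str, str]  # hypothetical local data: symbol name -> judgment


def conflicts(sections: Dict[str, Section]) -> List[Tuple[str, str, str]]:
    """List (member_a, member_b, symbol) triples where two local sections disagree."""
    found = []
    for (a, sec_a), (b, sec_b) in combinations(sorted(sections.items()), 2):
        for symbol in sorted(sec_a.keys() & sec_b.keys()):
            if sec_a[symbol] != sec_b[symbol]:
                found.append((a, b, symbol))
    return found


def glue(sections: Dict[str, Section]) -> Optional[Section]:
    """Merge local sections into one global section, or None if they conflict."""
    if conflicts(sections):
        return None
    merged: Section = {}
    for section in sections.values():
        merged.update(section)
    return merged


compatible = {
    "mymod.validate": {"Record": "dataclass"},
    "mymod.transform": {"Record": "dataclass", "Out": "str"},
}
incompatible = dict(compatible, **{"mymod.emit": {"Out": "bytes"}})

print(glue(compatible))         # {'Record': 'dataclass', 'Out': 'str'}
print(conflicts(incompatible))  # [('mymod.emit', 'mymod.transform', 'Out')]
```

Each pairwise comparison is one equation, and a finite cover yields finitely many of them. A non-empty conflict list plays the role of an obstruction to gluing.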

📚

Further reading. The full categorical treatment of semantic sites, including proofs that the three coverage axioms generate a genuine Grothendieck topology, appears in the Semantic Sites monograph. The relationship to the étale site is explored in the Hypercovers monograph.