Semantic Sites
A semantic site models a Python codebase as a category equipped with a Grothendieck topology — the foundation that lets JuGeo reason about local properties and assemble them into global guarantees via sheaf descent.
The Underlying Category
A semantic site is a pair \((\mathcal{C}, J)\) where \(\mathcal{C}\) is a small category whose objects are semantic coordinates (code entities) and whose morphisms are semantic relationships (imports, calls, inheritance, …), and \(J\) is a Grothendieck topology on \(\mathcal{C}\).
The category is constructed automatically by JuGeo's static analyser when you call jugeo.build_site() on a module or package. The analyser traverses the AST, resolves imports, infers types from annotations, and registers all detected relationships as morphisms.
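To give a feel for what the analyser's first pass involves, here is a minimal sketch that uses only the standard-library ast module to pull containment and call relationships out of a source string. It is not JuGeo's analyser and does not reflect the internals of jugeo.build_site(). extract_morphisms and SOURCE are hypothetical names, and the sketch assumes every bare-name call refers to a module-level definition (attribute calls such as self.method_a() are ignored).

```python
import ast

# Hypothetical toy module used as analyser input.
SOURCE = '''
def validate(x):
    return x

def transform(x):
    return validate(x) * 2

class MyClass:
    def method_a(self):
        return transform(1)
'''


def extract_morphisms(module_name, source):
    """Return (containment, calls) as lists of (source, target) qualified names.

    Simplification: a call `f(...)` is resolved to `<module_name>.f`, i.e. it
    is assumed to target a module-level definition.
    """
    tree = ast.parse(source)
    containment = []  # (inner, outer): inner is defined inside outer
    calls = []        # (caller, callee)

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualname = f"{scope}.{child.name}"
                containment.append((qualname, scope))
                visit(child, qualname)  # definitions open a new scope
            else:
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.append((scope, f"{module_name}.{child.func.id}"))
                visit(child, scope)  # keep searching inside expressions

    visit(tree, module_name)
    return containment, calls


containment, calls = extract_morphisms("mymod", SOURCE)
print(containment)
print(calls)
```

A production analyser would also resolve imports and annotations; the point here is only that containment and call morphisms fall out of a single tree walk.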
Finite and computable. Real codebases are finite, so \(\mathcal{C}_{\text{code}}\), the category \(\mathcal{C}\) built from a given codebase, is a finite category. The Grothendieck topology can therefore be given as an explicit list of covering sieves for each object, computed from the dependency graph, and no set-theoretic size issues arise.
Coordinate Kinds
Every object of \(\mathcal{C}\) has a kind drawn from the CoordinateKind enumeration. The kind determines which morphisms are available and how covering families are generated.
Kinds are partially ordered by containment: PACKAGE > MODULE > CLASS > METHOD (and MODULE > FUNCTION > VARIABLE). This partial order generates the containment morphisms that underpin restriction maps in the sheaf.
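The containment order on kinds can be checked mechanically by taking the transitive closure of the relations listed above. The sketch below is illustrative only: Kind is a hypothetical stand-in for CoordinateKind, and only the six kinds named in the order above are modelled.

```python
from enum import Enum


class Kind(Enum):  # hypothetical stand-in for CoordinateKind
    PACKAGE = "package"
    MODULE = "module"
    CLASS = "class"
    METHOD = "method"
    FUNCTION = "function"
    VARIABLE = "variable"


# Direct containment relations (the Hasse diagram) stated in the text.
HASSE = {
    (Kind.PACKAGE, Kind.MODULE),
    (Kind.MODULE, Kind.CLASS),
    (Kind.CLASS, Kind.METHOD),
    (Kind.MODULE, Kind.FUNCTION),
    (Kind.FUNCTION, Kind.VARIABLE),
}


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


STRICTLY_CONTAINS = transitive_closure(HASSE)


def may_contain(outer, inner):
    """True if a coordinate of kind `outer` can (transitively) contain one of kind `inner`."""
    return (outer, inner) in STRICTLY_CONTAINS


print(may_contain(Kind.PACKAGE, Kind.METHOD))
print(may_contain(Kind.CLASS, Kind.FUNCTION))
```

Note that METHOD and FUNCTION are incomparable in this order, which is what makes it a partial rather than a total order.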
Morphisms
Morphisms in \(\mathcal{C}_{\text{code}}\) capture semantic relationships. JuGeo tracks five kinds of morphism:
- Containment — \(f \hookrightarrow m\) (function f is defined inside module m)
- Call — \(f \xrightarrow{\text{call}} g\) (function f calls function g)
- Import — \(m \xrightarrow{\text{import}} n\) (module m imports from module n)
- Inheritance — \(C \xrightarrow{\text{inh}} D\) (class C inherits from class D)
- Type annotation — \(f \xrightarrow{\text{type}} T\) (f's return type is T)
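Morphisms compose by chaining: for example, a method contained in a class that is itself contained in a module yields a composite containment morphism from the method to the module. JuGeo stores morphisms in a sparse adjacency structure and indexes them by kind for efficient lookup during cover generation and descent.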
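A sparse, kind-indexed store of this sort might look like the following. This is a hedged sketch, not JuGeo's data structure: Arrow, MorphismStore and compose are hypothetical names, and labelling a mixed-kind composite as "composite" is an assumption made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrow:  # hypothetical; not JuGeo's Morphism class
    src: str
    dst: str
    kind: str  # e.g. "containment", "call", "import"


class MorphismStore:
    """Sparse adjacency (source -> outgoing arrows) plus an index by kind."""

    def __init__(self):
        self.out = defaultdict(set)      # src -> {Arrow}
        self.by_kind = defaultdict(set)  # kind -> {Arrow}

    def add(self, arrow):
        self.out[arrow.src].add(arrow)
        self.by_kind[arrow.kind].add(arrow)

    def of_kind(self, kind):
        # .get() avoids creating empty entries for unknown kinds
        return frozenset(self.by_kind.get(kind, set()))


def compose(first, second):
    """Composite of first: A -> B followed by second: B -> C."""
    if first.dst != second.src:
        raise ValueError("arrows are not composable")
    # Assumption: same-kind arrows compose to that kind; mixed kinds are tagged "composite".
    kind = first.kind if first.kind == second.kind else "composite"
    return Arrow(first.src, second.dst, kind)


store = MorphismStore()
method_in_class = Arrow("mymod.MyClass.method_a", "mymod.MyClass", "containment")
class_in_module = Arrow("mymod.MyClass", "mymod", "containment")
store.add(method_in_class)
store.add(class_in_module)
store.add(Arrow("mymod.process", "mymod.validate", "call"))
print(compose(method_in_class, class_in_module))
```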
The Grothendieck Topology
A Grothendieck topology \(J\) on \(\mathcal{C}\) assigns to each object \(U\) a collection \(J(U)\) of covering sieves — sieves that "cover" \(U\) in the sense that local data on the cover can be glued to global data on \(U\).
JuGeo defines the topology declaratively through coverage axioms. The default topology uses three axiom schemas:
- Containment coverage — A module is covered by the set of all its top-level definitions. A class is covered by the set of all its methods. Formally, the sieve generated by \(\{f_i \hookrightarrow U\}_i\) belongs to \(J(U)\).
- Import coverage — A package is covered by the set of all modules it imports. This lets module-level judgments be assembled into package-level guarantees.
- Call-graph coverage — A function is covered by the set of functions it directly calls (and transitively, if configured). This supports interprocedural analysis.
Together, the three schemas generate a Grothendieck topology: the covering sieves they produce satisfy maximality (the maximal sieve covers), stability under base change (pullback), and transitivity.
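For reference, these are the standard Grothendieck topology axioms, written with \(t_U\) for the maximal sieve on \(U\) and \(h^{*}S\) for the pullback of a sieve \(S\) along \(h\):

```latex
\begin{aligned}
&\text{(Maximality)} && t_U \in J(U), \qquad t_U = \{\, h \mid \operatorname{cod}(h) = U \,\} \\
&\text{(Stability)} && S \in J(U),\; h : V \to U \;\Longrightarrow\; h^{*}S := \{\, g \mid \operatorname{cod}(g) = V,\ h \circ g \in S \,\} \in J(V) \\
&\text{(Transitivity)} && S \in J(U),\; R \text{ a sieve on } U,\; h^{*}R \in J(V) \text{ for all } (h : V \to U) \in S \;\Longrightarrow\; R \in J(U)
\end{aligned}
```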
Custom topologies. You can extend or replace the default topology by subclassing GrothendieckTopology and overriding covering_sieves(obj). This is useful for domain-specific analyses, for example when coverage should be based on test files rather than call graphs.
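The override pattern can be illustrated with a self-contained sketch of test-file coverage. It deliberately does not import JuGeo, because the real GrothendieckTopology constructor and return types are not documented here. Topology is a hypothetical stand-in base class, CoverageByTestsTopology and the tests_for mapping are hypothetical, and each returned family is a plain list of domain names standing for the sieve it generates.

```python
from abc import ABC, abstractmethod


class Topology(ABC):  # hypothetical stand-in for GrothendieckTopology
    @abstractmethod
    def covering_sieves(self, obj):
        """Return covering families for `obj`, each a list of names mapping into `obj`."""


class CoverageByTestsTopology(Topology):
    """Covers a function by the tests that call it (hypothetical domain-specific topology)."""

    def __init__(self, tests_for):
        # tests_for: function name -> names of test functions that call it
        self.tests_for = tests_for

    def covering_sieves(self, obj):
        # The identity family [obj] generates the maximal sieve, which always covers.
        families = [[obj]]
        tests = self.tests_for.get(obj, [])
        if tests:
            # A test calls `obj`, so each test gives an arrow test -> obj.
            families.append(sorted(tests))
        return families


topology = CoverageByTestsTopology(
    {"mymod.transform": ["tests.test_transform", "tests.test_pipeline"]}
)
print(topology.covering_sieves("mymod.transform"))
print(topology.covering_sieves("mymod.emit"))
```

Always including the identity family keeps the maximality axiom satisfied even for functions that no test exercises.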
Covering Families
A covering family \(\{U_i \to U\}_{i \in I}\) is the data JuGeo uses when it needs to prove a property of \(U\) by proving it locally on each \(U_i\) and then gluing. The covering family is the principal data of the topology — a sieve is just the closure of a family under pre-composition.
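Closing a family under pre-composition is easy to compute when the category is a poset, because there is then at most one arrow between any two objects. The sketch below assumes, for illustration only, that coordinates are tuples of name segments and that there is an arrow \(v \to w\) exactly when w's path is a prefix of v's (containment). With that assumption, a sieve on the base can be represented by the set of its domains. leq and generated_sieve are hypothetical helpers.

```python
def leq(v, w):
    """Containment poset: an arrow v -> w exists iff w's path is a prefix of v's."""
    return v[:len(w)] == w


def generated_sieve(objects, family, base):
    """Domains of all arrows v -> base that factor through some member u -> base."""
    if not all(leq(u, base) for u in family):
        raise ValueError("every family member must map into the base")
    # v -> base factors through u -> base iff there is an arrow v -> u.
    return {v for v in objects if any(leq(v, u) for u in family)}


OBJECTS = [
    ("mymod",),
    ("mymod", "f1"),
    ("mymod", "f2"),
    ("mymod", "MyClass"),
    ("mymod", "MyClass", "__init__"),
]

sieve = generated_sieve(OBJECTS, [("mymod", "f1"), ("mymod", "MyClass")], ("mymod",))
print(sorted(sieve))
```

The nested ("mymod", "MyClass", "__init__") joins the sieve even though it is not a family member, because its arrow to the module factors through MyClass.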
In practice, covering families look like:
cover(mymod)         → [mymod.f1, mymod.f2, mymod.MyClass]
cover(MyClass)       → [MyClass.__init__, MyClass.method_a, MyClass.method_b]
cover(mymod.process) → [mymod.validate, mymod.transform, mymod.emit]
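The first two families follow the containment schema and the third follows the call-graph schema. A rough sketch of how such lists could be produced from toy relationship tables is shown below. CONTAINS, CALLS, containment_cover and call_graph_cover are hypothetical, and mymod.helper is an invented callee added only to exercise the transitive option. Families are returned as name lists rather than morphisms.

```python
# Hypothetical toy relationship tables mirroring the covers above.
CONTAINS = {
    "mymod": ["mymod.f1", "mymod.f2", "mymod.MyClass"],
    "MyClass": ["MyClass.__init__", "MyClass.method_a", "MyClass.method_b"],
}
CALLS = {
    "mymod.process": ["mymod.validate", "mymod.transform", "mymod.emit"],
    "mymod.transform": ["mymod.helper"],  # invented, to show transitive mode
}


def containment_cover(obj):
    """Containment schema: an entity is covered by everything it directly contains."""
    return list(CONTAINS.get(obj, []))


def call_graph_cover(obj, transitive=False):
    """Call-graph schema: direct callees, optionally followed transitively (breadth-first)."""
    seen, frontier = [], list(CALLS.get(obj, []))
    while frontier:
        callee = frontier.pop(0)
        if callee in seen:  # guards against recursion cycles
            continue
        seen.append(callee)
        if transitive:
            frontier.extend(CALLS.get(callee, []))
    return seen


print(containment_cover("mymod"))
print(call_graph_cover("mymod.process"))
print(call_graph_cover("mymod.process", transitive=True))
```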
Building Sites in Python
from jugeo.geometry.site import (
    Coordinate,
    CoordinateKind,
    CoveringFamily,
    Morphism,
    MorphismKind,
    SiteBuilder,
)

# One module coordinate and three function coordinates defined inside it.
module = Coordinate(("mymod",), CoordinateKind.MODULE)
validate = Coordinate(("mymod", "validate"), CoordinateKind.FUNCTION)
transform = Coordinate(("mymod", "transform"), CoordinateKind.FUNCTION)
emit = Coordinate(("mymod", "emit"), CoordinateKind.FUNCTION)

# Covering family for the module: one restriction morphism per member.
cover = CoveringFamily(
    base=module,
    members=[
        Morphism(validate, module, MorphismKind.RESTRICTION, label="validate→module"),
        Morphism(transform, module, MorphismKind.RESTRICTION, label="transform→module"),
        Morphism(emit, module, MorphismKind.RESTRICTION, label="emit→module"),
    ],
    label="module-cover",
)

# Register coordinates, morphisms and the covering family, then build the site.
site = (
    SiteBuilder("demo")
    .add_coordinates([module, validate, transform, emit])
    .add_morphisms(cover.members)
    .add_covering_family(cover)
    .build()
)

print(site.coordinate_count())  # 4
print(site.morphism_count())  # 3
print(site.topology.is_covering(cover))  # True
Coordinates are paths of name segments, so they also support hierarchy queries, and topologies expose axiom checks:
from jugeo.geometry.site import Coordinate, CoordinateKind, GrothendieckTopology

module = Coordinate(("mypackage",), CoordinateKind.MODULE)
helper = Coordinate(("mypackage", "utils", "helper"), CoordinateKind.FUNCTION)
api = Coordinate(("mypackage", "api"), CoordinateKind.MODULE)

# Hierarchy queries on coordinate paths
print(helper.parent().name)  # mypackage.utils
print(module.is_prefix_of(helper))  # True
print(module.children(["api", "db"]))

# The canonical topology and its identity-axiom check
canonical = GrothendieckTopology.canonical()
print(canonical.identity_axiom_check(module))  # True

# Nearest shared ancestor of two coordinates
print(api.common_ancestor(helper).name)  # mypackage
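The path queries above behave like ordinary tuple-prefix operations. The following standalone sketch reproduces the documented outputs under the assumption that a coordinate is identified by its tuple of name segments. The helper names are hypothetical and intentionally differ from JuGeo's methods, so they should not be read as its implementation.

```python
def parent_path(path):
    """Drop the last segment: the enclosing coordinate's path."""
    return path[:-1]


def is_prefix(a, b):
    """True if path `a` is a (non-strict) prefix of path `b`."""
    return b[:len(a)] == a


def common_prefix(a, b):
    """Longest shared leading run of segments, i.e. the nearest common ancestor."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return tuple(shared)


def dotted(path):
    return ".".join(path)


helper = ("mypackage", "utils", "helper")
api = ("mypackage", "api")
print(dotted(parent_path(helper)))
print(is_prefix(("mypackage",), helper))
print(dotted(common_prefix(api, helper)))
```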
Analogy with the Étale Site
Algebraic geometers will recognise the structure: JuGeo's semantic site is deliberately modelled on the étale site of a scheme. The analogy is instructive:
Étale site of a scheme X
- Objects: étale maps U → X
- Morphisms: X-morphisms U → V
- Covers: jointly surjective families
- Sheaves: local systems, coherent sheaves
- Cohomology: étale cohomology (ℓ-adic coefficients, Galois representations)
Semantic site of a codebase
- Objects: semantic coordinates (MODULE, FUNCTION, …)
- Morphisms: import / call / containment
- Covers: containment / import / call-graph families
- Sheaves: judgment sheaves, type sheaves
- Cohomology: H¹ obstruction classes (unresolvable conflicts)
The key difference is that the étale site is defined over a geometric object, whereas JuGeo's site is a finite category built from combinatorial data. As a result, every sheaf condition reduces to a finite system of equations and is therefore computable. There is no need for the machinery of derived categories: the relevant structure is captured by the explicit graph of semantic relationships.
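To make "finite systems of equations" concrete, here is a deliberately simplified model. Each local section is a dict of judgments, such as inferred types keyed by name, and the overlap of two sections is their set of shared keys. The sheaf condition then becomes a finite agreement check: either the local sections glue into a unique global one, or the check reports where they disagree. glue is a hypothetical helper. In a real site overlaps are fibre products, and treating a disagreement as an H¹ obstruction is only an informal analogy.

```python
def glue(sections):
    """Glue local sections (dicts) if they agree on every overlap (shared key).

    Returns (global_section, conflicts). Each conflict is (key, i, j), where
    sections i and j assign different values to key. If any conflict exists,
    no global section exists and None is returned in its place.
    """
    glued, owner, conflicts = {}, {}, []
    for i, section in enumerate(sections):
        for key, value in section.items():
            if key not in glued:
                glued[key] = value
                owner[key] = i
            elif glued[key] != value:
                # Pairwise agreement on a key is equivalent to agreeing with its first owner.
                conflicts.append((key, owner[key], i))
    return (None if conflicts else glued), conflicts


print(glue([{"x": "int", "y": "str"}, {"y": "str", "z": "bytes"}]))
print(glue([{"y": "str"}, {"y": "bytes"}]))
```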
Further reading. The full categorical treatment of semantic sites, including proofs that the three coverage axioms generate a genuine Grothendieck topology, appears in the Semantic Sites monograph. The relationship to the étale site is explored in the Hypercovers monograph.