Introduction
Python is the world's most popular programming language. It powers scientific computing, machine learning, web backends, financial systems, and critical infrastructure. Yet Python has essentially no formal verification story. Type checkers like mypy catch some bugs, but they cannot prove that your sorting function actually sorts, that your financial calculation never loses a cent, or that your concurrent server is free of data races.
Deppy changes this. It is a proof-oriented verification system for Python that lets you state and prove rich properties about existing code — without modifying that code.
What Is Python with Dependent Types?
Proof-oriented programming (POP), as pioneered by F*, is a paradigm where programs and proofs coexist in the same language. The compiler checks both your code and your proofs, rejecting programs that don't satisfy their specifications.
But F* requires you to write in F*. Dafny requires you to write in Dafny. For the vast majority of working programmers, rewriting production code in a new language is not feasible.
Deppy's answer is proof-oriented verification: you keep your
Python code exactly as it is, and write specifications and proofs beside
it in .deppy sidecar files. The Deppy checker reads both your
Python source and the sidecar proofs, and verifies that the code satisfies
the specification.
Traditional POP (F*)
(* You must write in F* *)
val factorial: nat -> Tot nat
let rec factorial n =
if n = 0 then 1
else n * factorial (n - 1)
Deppy (Verify Existing Python)
# Your existing Python — untouched
def factorial(n: int) -> int:
if n <= 0:
return 1
return n * factorial(n - 1)
A Capsule Summary of Deppy
Deppy combines six key ingredients:
-
Refinement types — types refined by logical predicates, like
refined(PyIntType(), "x >= 0", "x")(or justNat()) for natural numbers. These are the bread-and-butter of specification. - Z3 SMT solver — automatically discharges arithmetic, boolean, and data-structure obligations. Most simple properties verify with zero manual proof effort.
- Cubical homotopy type theory — path types model type equivalences, transport moves proofs between equivalent types, and Čech decomposition breaks complex proofs into composable local patches.
-
The cubical heap — instead of separation logic's
★, Deppy models mutation as paths in a world category. Mutations bump an epoch counter, aliases areDuckPaths at the cell level, and reads after mutation require explicitTransportProof. See Part 6.1. -
PSDL — Python-syntax tactics — the proof body is
ordinary Python. Walrus binding, postfix
.symm/.trans, operator overloadingp @ q/~p/p ** k, comprehensions as induction, andassert P, "qed"as proof closer. See Part 5.5. -
Sidecar metadata in two equivalent forms —
.deppyfiles (best for verifying third-party libraries) or proof decorators (@foundation,@axiom,@verify,@psdl_proof) inline with Python (best when you're writing the module yourself). Both feed the same kernel pipeline. See Part 4.2.
How Deppy Differs from Other Systems
vs. F*
F* is a full programming language with dependent types that extracts to C, OCaml, or Rust. You write programs in F*. Deppy does not replace Python — it verifies Python. F*'s effect system (Tot, Div, ST, etc.) maps to Deppy's verification levels. F*'s SMT-backed automation is mirrored by Deppy's Z3 integration. But Deppy adds homotopy methods that F* does not have, because Python's dynamic typing demands them.
vs. Dafny
Dafny is closest in spirit: it uses Z3 and has a familiar syntax. But Dafny is its own language (compiling to C#, Go, Java, etc.), and it has no concept of duck typing, dynamic types, or decorators. Deppy handles all of these. Dafny also has no sidecar proof mechanism — you must write in Dafny.
vs. Liquid Haskell
Liquid Haskell adds refinement types to Haskell via annotations in comments. This is the closest precedent to Deppy's non-invasive approach. But Liquid Haskell is limited to Haskell (a statically typed, purely functional language), and its annotations must still go inside the source file. Deppy's sidecar proofs are fully external, and its homotopy-theoretic foundations handle Python's dynamic features that have no Haskell equivalent.
vs. mypy / pyright
Type checkers like mypy verify type correctness — that you pass an
int where an int is expected. Deppy verifies
functional correctness — that your function returns the right value,
maintains invariants, terminates, and is safe for concurrent use. Type checking
is a subset of what Deppy can express.
The Key Innovation: Homotopy Type Theory for Dynamic Languages
Why homotopy type theory? Consider Python's duck typing. A function that calls
x.read() works with any object that has a read method —
files, StringIO, BytesIO, network sockets, mock objects. In traditional type
theory, these are unrelated types. In homotopy type theory, they are connected
by paths:
# These types are "path-equivalent" in Deppy:
# They all support .read() → str
path : TextIOWrapper ≃ StringIO # via {read, write, close}
path : StringIO ≃ MockFile # via {read, write}
# Transport: a proof about TextIOWrapper.read()
# can be *transported* to StringIO.read()
transport(path, proof_about_textio) : Proof[StringIO]
This is not just an analogy. Deppy implements cubical path types where:
- Two types are path-equivalent if they share the same structural interface (duck typing)
- Transport carries proofs from one type to an equivalent type
- Čech decomposition breaks a complex module into overlapping "patches" that can be verified independently and glued together
- Fibrations model dependent structures like decorated functions, where the output type depends on the decorator applied
Worked Example: Factorial with Full Verification
Let's walk through a complete verification from start to finish. We'll verify a simple factorial function, proving it correct, terminating, and non-negative.
Step 1: The Python Code
This is existing code that we will not modify:
# math_utils.py — existing production code
from __future__ import annotations
def factorial(n: int) -> int:
"""Compute n! for non-negative integers."""
if n <= 0:
return 1
return n * factorial(n - 1)
Step 2: The Sidecar Specification
We create a sidecar file with three proof obligations:
# math_utils.deppy — sidecar proof file (YAML-shaped)
module: math_utils
foundation Nat_zero_succ: "factorial(0) == 1; factorial(n) == n * factorial(n-1)"
spec math_utils.factorial:
guarantee: "factorial(n) >= 1 when n >= 0"
pure: true
verify factorial_nonneg:
function: math_utils.factorial
property: "factorial(n) >= 1"
via: "foundation Nat_zero_succ"
binders: { n: Nat }
psdl_proof: |
if n <= 0:
assert 1 >= 1, "z3"
else:
induction(n)
assert n * factorial(n-1) >= 1, "z3"
Equivalently, in decorator form if you wrote
math_utils.py yourself:
from deppy.proofs.sidecar_decorators import (
foundation, verify, psdl_proof,
)
@foundation
def Nat_zero_succ():
"""factorial(0) == 1; factorial(n) == n * factorial(n-1)"""
@verify(
property="factorial(n) >= 1",
via="foundation Nat_zero_succ",
binders={"n": "Nat"},
)
@psdl_proof("""
if n <= 0:
assert 1 >= 1, "z3"
else:
induction(n)
assert n * factorial(n-1) >= 1, "z3"
""")
def factorial(n: int) -> int:
if n <= 0:
return 1
return n * factorial(n - 1)
Step 3: Run Verification
$ python -m deppy.pipeline.run_verify math_utils.deppy --out math_utils.lean
…
ProofTerm subclasses (C1–C7 audit batch):
constructed / kernel-accepted: N / N
``verify`` blocks (.deppy → kernel.verify):
accepted by ProofKernel.verify: N
Lean export round-trip: OK
────────────────────────────────────────────────
✓ CERTIFIED — top-level certify_or_die: PASS
The pipeline emits a Lean certificate at the --out
path and a JSON sidecar next to it. Both are reproducible — running
again produces byte-identical output assuming the same inputs.
Step 4: Understanding the Proof
How Deppy Verified factorial
math_utils.py,
extracts the AST for factorial, and binds it to the spec in
math_utils.deppy.
decreases(n) annotation tells
Deppy that n strictly decreases on each recursive call. Since
Nat is bounded below by 0, this is a well-founded measure. Z3
checks: n > 0 ⟹ n - 1 < n ∧ n - 1 ≥ 0. ✓
n ≤ 0,
the function returns 1. Z3 checks: 1 ≥ 1. ✓
n > 0,
the function returns n * factorial(n - 1). By the inductive
hypothesis, factorial(n - 1) ≥ 1. Z3 checks:
n ≥ 1 ∧ factorial(n-1) ≥ 1 ⟹ n * factorial(n-1) ≥ 1. ✓
Pos, i.e.,
int with r ≥ 1. This follows directly from the
postcondition. ✓
Verification Levels
Not all proofs are created equal. Deppy assigns every verified obligation a trust level indicating how it was checked:
| Level | Badge | Meaning | Trust |
|---|---|---|---|
LEAN_KERNEL_VERIFIED |
🟢 | Proof exported to Lean 4 and checked by its kernel | Highest — machine-checked by an independent proof assistant |
Z3_VERIFIED |
🟡 | Discharged by Z3 SMT solver | High — Z3 is trusted and widely used, but its kernel is not independently verified |
LLM_CHECKED |
🟠 | An LLM oracle judged the property correct with confidence ≥ threshold | Medium — useful for rapid prototyping, but should be upgraded to Z3 or Lean |
UNTRUSTED |
🔴 | No verification performed yet | None — specification exists but hasn't been checked |
The verification levels form a lattice: LEAN_KERNEL_VERIFIED ⊒
Z3_VERIFIED ⊒ LLM_CHECKED ⊒ UNTRUSTED.
You can always upgrade a proof to a higher trust level without changing the
specification.
# For a .deppy sidecar, the run_verify pipeline always exports
# Lean and runs ``lean`` against it on every invocation.
# When the round-trip succeeds, every verify block reaches
# LEAN_KERNEL_VERIFIED automatically:
$ python -m deppy.pipeline.run_verify math_utils.deppy --out math_utils.lean
Lean export round-trip: OK
✓ CERTIFIED — top-level certify_or_die: PASS
# For a Python module that uses inline @guarantee/@ensures decorators,
# the ``deppy upgrade`` subcommand re-runs the safety pipeline aiming
# at a stricter level:
$ python -m deppy upgrade math_utils.py --to Z3_VERIFIED --json
{"target_level": "Z3_VERIFIED", "upgraded": 5, "failed": 0, ...}
LLM_CHECKED proofs are not formal proofs. They are
high-confidence heuristic checks useful during development. Production code
should target Z3_VERIFIED or LEAN_KERNEL_VERIFIED.
The Sidecar Approach
The sidecar approach is Deppy's defining architectural choice. Instead of
embedding specifications in your Python source (like type hints or docstrings),
proofs live in separate .deppy files:
# Project layout with sidecar proofs
myproject/
├── myproject/
│ ├── __init__.py
│ ├── math_utils.py # your code — untouched
│ ├── math_utils.deppy # sidecar proofs
│ ├── data_pipeline.py # your code — untouched
│ └── data_pipeline.deppy # sidecar proofs
├── tests/
│ └── test_math.py
└── pyproject.toml
This design has several advantages:
- Zero modification. Your Python code is never changed. This means no risk of introducing bugs through annotation, no merge conflicts with verification artifacts, and no dependency on Deppy at runtime.
- Library verification. You can write sidecar proofs for code you don't own — pip-installed packages, standard library modules, even C extensions (via their Python interface).
- Incremental adoption. Add sidecar proofs one function at a time. Unverified code coexists with verified code. No big-bang rewrite needed.
- Separation of concerns. Domain experts write Python; verification engineers write proofs. The two can work in parallel on different files.
How Sidecar Binding Works
A sidecar file uses @verify("module.function") to bind a
specification to a target function. Deppy resolves the target by:
- Parsing the Python source to extract the function's AST
- Matching the spec's parameter names and types against the target
- Generating verification conditions from the spec's postconditions, refinements, and proof terms
- Discharging the conditions using Z3 (or Lean, or the LLM oracle)
# Sidecar file can verify multiple functions from the same module
from deppy import verify, refined
@verify("data_pipeline.clean_data")
def clean_data_spec(
df: DataFrame
) -> refined(DataFrame, lambda r: not r.isnull().any().any()):
"""No nulls in the output."""
z3.auto()
@verify("data_pipeline.normalize")
def normalize_spec(
values: list[float]
) -> refined(list[float], lambda r: all(0 <= v <= 1 for v in r)):
"""All values in [0, 1]."""
assume(len(values) > 0)
z3.auto()
What Comes Next
With the big picture in mind, you're ready to start verifying Python code. The next chapter, Getting Started, walks you through installing Deppy, setting up your first project, and verifying your first function.
Here's a roadmap of what you'll learn:
- Part 1 teaches refinement types and Z3 — enough to verify pure functions, data structures, and algorithms like quicksort.
- Part 2 adds advanced type-level reasoning: generics, dependent types, and metatheory proofs.
- Part 3 handles Python's unique features: duck typing as path equivalence, decorators as fibrations, and dynamic typing as transport.
- Part 4 develops the sidecar proof methodology and demonstrates it on a real library (SymPy).
- Part 5 dives deep into homotopy type theory: path types, transport, Čech decomposition, and fibrations.
- Part 6 tackles mutable state: separation logic, references, arrays, GIL-aware threading, and async/await.
- Why can't mypy verify that a sorting function actually sorts?
- What is a refinement type, and how does it differ from a type hint?
- Why does duck typing require homotopy type theory rather than simple subtyping?
- What is the advantage of sidecar proofs over inline annotations?