Trust & Verification Levels
Not all proofs are created equal. A theorem checked by the Lean kernel is more trustworthy than one an LLM says "looks right." Deppy makes this distinction explicit with graded trust lattices (seven kernel trust levels plus six Lean export labels) that track exactly how much confidence you can place in each verified property.
Kernel Trust Levels
Deppy has two trust axes, each with its own label space. The kernel trust level records how the deppy kernel verified a single proof term. The Lean trust level records how (or whether) the exported Lean 4 source was checked by Lean itself. A proof can be `KERNEL`-trusted by deppy and `LEAN_EXPORTED` by Lean — those are independent dimensions.
The seven kernel levels
| Level | Badge | What It Means | How You Get It |
|---|---|---|---|
| `KERNEL` | 🟢 | Every sub-proof went through kernel inference rules (Refl / Sym / Trans / Cong / PathAbs / PathApp / induction constructors). No Z3 leaf, no axiom invocation, no structural pinky-promise. | Build a proof term out of the inference-rule constructors only. The kernel returns this trust level from `ProofKernel.verify`. |
| `Z3_VERIFIED` | 🟡 | Proved by the Z3 SMT solver: Z3 returned `unsat` on the negation of the goal. Z3 is mature but not constructive; soundness depends on Z3's correctness. | Use a `Z3Proof` proof term, or the `T.z3` / `T.z3_auto` tactics. |
| `STRUCTURAL_CHAIN` | 🟡 | The kernel assembled the proof's structure, but at least one leaf is `Structural` (an unchecked pinky-promise). Use `ProofCertificate.kernel_verified` to reject chains that contain such a leaf. | A proof whose composition is kernel-checked but whose atoms are declarative structural witnesses. |
| `LLM_CHECKED` | 🟠 | An LLM oracle says the property is correct with confidence ≥ the configured threshold. Useful for natural-language specs and heuristic checks; not a formal proof. | Oracle query returns `{"correct": true, "confidence": ≥ threshold}`. |
| `AXIOM_TRUSTED` | 🟠 | The proof cites a registered axiom. Trust is only as good as the axiom's faithfulness to the source. For code-derived axioms (those produced by `axiomize()`) the axiom IS the source, which is usually what you want. | `AxiomInvocation(name, module)` — or any tactic built from one, e.g., `T.axiom` / `T.cases` / `T.unfold`. |
| `EFFECT_ASSUMED` | 🟠 | An effect witness was accepted at face value. Used by the effect pipeline when a function declares effects but the discharge is deferred. | `EffectWitness` proof terms. |
| `UNTRUSTED` | 🔴 | No verification. Default for any obligation without a proof. Also emitted as a failure state when a discharge path (e.g. a missing `verify_fn`, an `Unfold` without sub-proof) can't run. | Default starting point for all new specs. |
The six Lean export labels
`compile_to_lean(fn)` emits Lean 4 source without invoking the Lean compiler. To actually run Lean, call `cert.verify_with_lean()` on the returned `LeanCertificate`. The following labels record where the certificate is on that trajectory:
| Label | What It Means | How You Get It |
|---|---|---|
| 🟢 `LEAN_KERNEL_VERIFIED` | The `lean` binary was invoked on the rendered file and returned success. This is the gold standard — equivalent to a Lean kernel-accepted proof. | `cert.verify_with_lean()` succeeds; requires `lean` on PATH. |
| 🟡 `LEAN_SYNTAX_COMPLETE` | No `sorry`, no raw axioms, but Lean has NOT been run. The file is syntactically complete — a necessary but not sufficient condition for acceptance. | `compile_to_lean(fn)` with a spec that has a full proof strategy and no `sorry` insertions. |
| 🟠 `LEAN_EXPORTED` | Source emitted; one or more `sorry`s remain. This is the default when a spec's translator can't produce a full proof. | Default when a spec fails to be fully translated; also the default on the initial `compile_to_lean`. |
| 🟠 `ASSUMPTION_DEPENDENT` | The emitted file declares raw `axiom` statements. Lean will accept them unconditionally, so any theorem routed through one inherits the assumption. | When a spec needs an axiom deppy can't derive from source (e.g., a classical-logic axiom like LEM). |
| 🔴 `LEAN_REJECTED` | `verify_with_lean()` ran `lean` and Lean rejected the file. `cert.lean_check_stderr` carries the diagnostic. | A bug in the translator, a stale Mathlib import, or a tactic Lean can't prove. |
| ⚪ `LEAN_UNAVAILABLE` | The `lean` binary was not found on PATH. No kernel check was attempted. | Install Lean 4 (see the Lean setup guide) to enable kernel verification. |
To gate on deppy-kernel trust, use `ProofCertificate.kernel_verified` — it returns `True` only when the proof succeeded AND no `Structural` leaf appears in the proof tree. For Lean integration, use `cert.is_kernel_verified` — `True` only after `verify_with_lean()` accepted.
- The 7-element deppy-kernel lattice (`TrustLevel`) is a total order: `KERNEL > Z3_VERIFIED > STRUCTURAL_CHAIN > LLM_CHECKED > AXIOM_TRUSTED > EFFECT_ASSUMED > UNTRUSTED`. The kernel takes the `min_trust(...)` of all sub-proofs to compute the overall verdict.
- The 6-element Lean export lattice (`LeanTrustLevel`) is a parallel total order: `LEAN_KERNEL_VERIFIED > LEAN_SYNTAX_COMPLETE > LEAN_EXPORTED > ASSUMPTION_DEPENDENT > LEAN_REJECTED > LEAN_UNAVAILABLE`. This lattice describes how much of the Lean toolchain has accepted the proof; it is tracked separately because Lean acceptance is a different guarantee than deppy-kernel acceptance (Lean has stronger metatheory but a smaller axiom budget).

`LEAN_KERNEL_VERIFIED` is not greater than `KERNEL`; they certify different things.
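The total order and the min-combination rule can be modeled with a plain `IntEnum`. This is an illustrative sketch only — the enum name matches the labels above, but the numeric values and the `min_trust` helper are assumptions, not Deppy's actual implementation:

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    # Higher value = more trustworthy; mirrors the total order above.
    UNTRUSTED = 0
    EFFECT_ASSUMED = 1
    AXIOM_TRUSTED = 2
    LLM_CHECKED = 3
    STRUCTURAL_CHAIN = 4
    Z3_VERIFIED = 5
    KERNEL = 6

def min_trust(levels):
    # A composite proof is only as trustworthy as its weakest sub-proof.
    return min(levels, default=TrustLevel.UNTRUSTED)

# A kernel-checked composition with one Z3 leaf degrades to Z3_VERIFIED:
overall = min_trust([TrustLevel.KERNEL, TrustLevel.Z3_VERIFIED])
```

Because `IntEnum` values compare as integers, `min` over sub-proof levels is exactly the meet of the total order.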
When Each Level Is Appropriate
🟢 LEAN_KERNEL_VERIFIED — The Gold Standard
Use when:
- Safety-critical code (medical, financial, aerospace)
- Core data structure invariants
- Cryptographic protocol correctness
- Foundational theorems your entire codebase depends on
```python
# A property so important it needs Lean verification
@guarantee("sorted list is a permutation of input", trust="LEAN_KERNEL_VERIFIED")
def sort(xs: list[int]) -> list[int]:
    # This generates a Lean proof obligation:
    # theorem sort_correct : ∀ xs, Sorted (sort xs) ∧ Perm (sort xs) xs
    ...
```
LEAN_KERNEL_VERIFIED is expensive. The Python → Lean translation adds
development time. Reserve it for properties that must be correct.
🟡 Z3_VERIFIED — The Pragmatic Choice
Use when:
- Arithmetic bounds and overflow checks
- Array index safety
- Precondition verification at call sites
- Linear arithmetic properties
- Most sidecar precondition proofs
```python
# Z3 handles these naturally
@guarantee("index is within bounds", trust="Z3_VERIFIED")
def safe_get(arr: list[int], i: int) -> int:
    assume(i >= 0 and i < len(arr))
    return arr[i]

# Z3 checks: 0 ≤ i < len(arr) at every call site
```
🟠 LLM_CHECKED — The Frontier
Use when:
- Natural-language specifications that can't be formalized easily
- Heuristic correctness checks during development
- Semantic properties ("this function returns user-friendly error messages")
- Quick smoke-testing before investing in formal proofs
```python
# Properties that resist formalization
@guarantee("error message is helpful and mentions the invalid field",
           trust="LLM_CHECKED", confidence=0.85)
def validate_config(config: dict) -> str | None:
    # Oracle checks: does the returned string mention the bad field?
    ...
```
LLM_CHECKED is inherently unreliable. The oracle can hallucinate. Use
this level as a stepping stone — flag properties as LLM_CHECKED to
remind yourself they need formal proofs later.
🔴 UNTRUSTED — The Starting Point
Use when:
- You've just written a spec and haven't verified it yet
- External dependencies with no sidecar
- Properties you plan to verify but haven't gotten to
```python
# Every spec starts here
@about("requests.get")
class RequestsGetSpec(ExternalSpec):
    trust_level = "UNTRUSTED"  # TODO: add real verification

    @assume
    def postcondition(self, url, result):
        return hasattr(result, 'status_code')
```
Trust Propagation
The most important rule in Deppy's trust system: trust flows downward. A function's trust level is bounded by the minimum trust of everything it depends on.
The Rule
If function A calls function B, then `trust(A) ≤ trust(B)`.

Formally: `trust(f) = min(own_proof_level(f), min(trust(g) for g in deps(f)))`
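The rule is a minimum over the call graph, which can be sketched as a short recursion. Everything here is a made-up model — the graph, the per-function proof levels, and the integer ranks are hypothetical, and real Deppy also has to handle cycles and unresolved externals:

```python
from functools import lru_cache

# Hypothetical proof levels, ranked UNTRUSTED=0 ... KERNEL=6.
OWN_LEVEL = {"validate": 6, "process": 5, "log": 3, "main": 5}
CALLS = {"main": ["process", "log"], "process": ["validate"],
         "validate": [], "log": []}

@lru_cache(maxsize=None)
def trust(f: str) -> int:
    # trust(f) = min(own proof level, trust of every function f calls)
    return min([OWN_LEVEL[f]] + [trust(g) for g in CALLS[f]])

# main is dragged down to log's level, even though its own proof is stronger.
```

Note that `main` inherits the weakest level among its transitive dependencies, not just its direct callees.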
Example: Trust Propagation Chain
```python
# trust_example.py
import math
from decimal import Decimal

@guarantee("result is positive")
def compute_distance(x: float, y: float) -> float:
    # Calls math.sqrt — trust depends on math.deppy
    return math.sqrt(x * x + y * y)

@guarantee("normalized vector has magnitude 1")
def normalize(x: float, y: float) -> tuple[float, float]:
    # Calls compute_distance — trust ≤ trust(compute_distance)
    d = compute_distance(x, y)
    return (x / d, y / d)

@guarantee("angle is in [0, 2π)")
def angle_between(x1, y1, x2, y2) -> float:
    # Calls normalize — trust ≤ trust(normalize)
    nx1, ny1 = normalize(x1, y1)
    nx2, ny2 = normalize(x2, y2)
    return math.atan2(ny2 - ny1, nx2 - nx1) % (2 * math.pi)
```
```console
$ python -m deppy check trust_example.py --json
{
  "module_path": "trust_example.py",
  "aggregate_trust": "UNTRUSTED",      # pinned by math.atan2
  "functions": {
    "compute_distance": { "trust": "Z3_VERIFIED" },
    "normalize": { "trust": "Z3_VERIFIED" },
    "angle_between": { "trust": "UNTRUSTED",
                       "unaddressed": ["math.atan2"] }
  }
}
# The aggregate is min(every function's trust), so adding
# a sidecar spec for math.atan2 lifts angle_between → Z3_VERIFIED
# and the module aggregate climbs to Z3_VERIFIED.
```
Mixing Trust Levels in a Project
Real projects will have a mix of trust levels. This is expected and healthy. Deppy visualizes the trust landscape:
```console
# A small project shell loop that aggregates trust across modules:
$ for mod in src/core/types.py src/api/handlers.py src/utils/formatting.py; do
    python -m deppy check "$mod" --json | jq -r '.aggregate_trust + " " + .module_path'
  done
LEAN_KERNEL_VERIFIED src/core/types.py
Z3_VERIFIED src/api/handlers.py
UNTRUSTED src/utils/formatting.py

# Or, when you have a sidecar:
$ python -m deppy.pipeline.run_verify sympy_geometry.deppy --out sympy_geometry.lean
$ jq '.section_counters' sympy_geometry.json | head -8
{
  "foundation_count": 14,
  "axiom_count": 38,
  "verify_total": 11,
  "verify_kernel_verified": 11,
  ...
}
```
Gradual Verification
Gradual verification is the practice of incrementally raising your project's trust level over time. Deppy is designed for this workflow.
The Gradual Verification Ladder
1. Wire `python -m deppy.pipeline.run_verify your.deppy` (or `--from-module yourmodule`) into CI and let the JSON report be the baseline. The `section_counters.verify_total` field starts at 0.
2. Add `@guarantee` to your critical functions. Coverage climbs to 30%. Many are 🟠 LLM_CHECKED.
3. Raise the trust floor for critical modules, e.g. "Z3_VERIFIED for `src/core/`."
CI Integration
Deppy integrates with CI/CD pipelines to enforce trust levels and track verification coverage over time.
GitHub Actions Example
```yaml
# .github/workflows/verify.yml
name: Deppy Verification
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Deppy
        run: |
          pip install git+https://github.com/thehalleyyoung/deppy.git
          pip install z3-solver
      - name: Run verification
        run: |
          for d in sidecars/*.deppy; do
            python -m deppy.pipeline.run_verify "$d" --out "out/$(basename "$d" .deppy).lean"
          done
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: verification-report
          path: verification-report.json
      - name: Fail on regression
        run: |
          python - <<'PY'
          import json, sys
          for path in ("out/foo.json", "out/bar.json"):
              j = json.load(open(path))
              assert j["certified"], f"{path} did not certify: {j['failures']}"
              c = j["section_counters"]
              assert c["verify_kernel_verified"] == c["verify_total"], \
                  f"{path}: {c['verify_total']-c['verify_kernel_verified']} verify blocks unverified"
          print("All sidecars certified.")
          PY
```
Trust as a CI Gate
The pipeline writes a top-level certified boolean and a
failures list to each JSON sidecar. CI can fail the
build whenever certified drops to false:
```console
$ python -m deppy.pipeline.run_verify sidecar/auth.deppy --out out/auth.lean
$ jq '.certified, .failures, .section_counters.verify_kernel_verified' out/auth.json
false
[
  "1 counterexample(s) found for admitted foundations"
]
4
# → CI script greps the boolean and exits non-zero.
```
Tracking Coverage in CI
The section_counters dict in each JSON sidecar gives
you per-sidecar verification coverage. A small project script can
fold them into one rollup:
```console
$ jq -s '
  {
    foundations: [.[].section_counters.foundation_count] | add,
    z3_verified: [.[].section_counters.foundations_z3_verified] | add,
    verify_total: [.[].section_counters.verify_total] | add,
    verify_kernel_verified:
      [.[].section_counters.verify_kernel_verified] | add,
    counterexamples_found:
      [.[].section_counters.counterexamples_found] | add
  }
' out/*.json
{
  "foundations": 34,
  "z3_verified": 9,
  "verify_total": 20,
  "verify_kernel_verified": 20,
  "counterexamples_found": 0
}
```
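When `jq` isn't available, the same fold fits in a few lines of stdlib Python. This is a sketch under the assumption that each report carries the `section_counters` dict shown above; the function names and the `out/*.json` glob pattern are illustrative:

```python
import glob
import json
from collections import Counter

def rollup(reports):
    # Sum every section_counters field across a list of sidecar reports.
    total = Counter()
    for report in reports:
        total.update(report.get("section_counters", {}))
    return dict(total)

def rollup_files(pattern="out/*.json"):
    # Same fold over the JSON sidecar files on disk.
    return rollup(json.load(open(path)) for path in glob.glob(pattern))

# Example with two in-memory reports:
merged = rollup([
    {"section_counters": {"verify_total": 11, "verify_kernel_verified": 11}},
    {"section_counters": {"verify_total": 9, "verify_kernel_verified": 9}},
])
```

`Counter.update` with a mapping adds counts rather than replacing them, which is exactly the per-field sum the `jq` recipe computes.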
Comparison with F*'s --admit and sorry
Other verification systems have mechanisms for incomplete proofs:
| Feature | Deppy | F* | Lean 4 |
|---|---|---|---|
| Skip a proof | `trust_level = "UNTRUSTED"` | `admit()` | `sorry` |
| Track skipped proofs | ✅ Trust report, CI gates | ⚠️ Manual `--admit_smt_queries` | ⚠️ `#check @[sorry]` |
| Gradual trust levels | ✅ 7 levels, lattice | ❌ Binary (proved/admit) | ❌ Binary (proved/sorry) |
| Trust propagation | ✅ Automatic | ⚠️ Manual dependency analysis | ⚠️ Manual |
| CI coverage tracking | ✅ Built-in | ❌ External tooling | ❌ External tooling |
| Verify external libs | ✅ Sidecar proofs | ❌ Must modify source | ❌ Must write bindings |
F*: `admit()`

```fstar
(* F*: binary skip — loses all info *)
let compute_hash (data: bytes) =
  admit (); (* black hole — no tracking *)
  sha256 data

(* No way to know WHY it was admitted,
   or what depends on this admit. *)
```
Deppy: Trust Levels

```python
# Deppy: tracked, graded, reported
@guarantee("SHA-256 hash",
           trust="LLM_CHECKED")
def compute_hash(data: bytes):
    return sha256(data)

# Trust report shows: LLM_CHECKED
# Dependents are bounded by 🟠
# CI tracks it for future upgrade
```
Configuring Trust Levels
Deppy doesn't currently parse a project-level config file. Trust
levels are configured programmatically through
deppy.config.DeppyConfig and per-module via the
--level argument to deppy upgrade:
```python
from deppy.config import DeppyConfig

cfg = DeppyConfig(
    trust_level="Z3_VERIFIED",   # floor for the safety pipeline
    z3_timeout=10000,            # ms per Z3 query
    z3_rlimit=10_000_000,        # deterministic resource budget
    parallel=True,
    lean_export=True,
)

# CI gating today is project-side: enumerate the .deppy files
# or .py modules you care about, run them through the deppy
# CLIs, and grep ``certified`` / ``aggregate_trust`` from each
# JSON output. See the GitHub Actions example above.
```
Upgrading Trust Levels
Deppy provides tooling to help you upgrade from one trust level to the next.
The deppy upgrade CLI
The `upgrade` subcommand re-runs the safety pipeline on a target module aiming at a specific trust level: it tries to promote whatever is currently below that floor:
```console
$ python -m deppy upgrade src/core/validation.py --to Z3_VERIFIED --json
{
  "target": "src/core/validation.py",
  "target_level": "Z3_VERIFIED",
  "per_function": {
    "validate_age": { "before": "UNTRUSTED", "after": "Z3_VERIFIED" },
    "validate_email": {
      "before": "UNTRUSTED",
      "after": "UNTRUSTED",
      "reason": "regex matching not in Z3 fragment"
    }
  },
  "upgraded": 1,
  "failed": 1
}

# For Lean (the strongest kernel):
$ python -m deppy export src/core/crypto.py --output crypto.lean --run-lean
$ # --run-lean makes the deppy CLI invoke ``lean`` against the output
$ # and sets LEAN_KERNEL_VERIFIED on success.
```
Exercises
1. Consider a call graph rooted at `main()`: `main()` calls `process()` and `log()`; `process()` is 🟡 Z3_VERIFIED and calls `validate()`; `validate()` is 🟢 LEAN_KERNEL_VERIFIED; `log()` is 🟠 LLM_CHECKED. What is `main()`'s trust level? Which functions must be upgraded for `main()` to reach `LEAN_KERNEL_VERIFIED`? Which can stay at `Z3_VERIFIED`? How would you express the trust floor using the JSON output of `python -m deppy.pipeline.run_verify`?
2. Suppose a function calls `random.randint()`, which is 🔴 UNTRUSTED. Write a sidecar spec for `random.randint` that would allow the function to be at least 🟡 Z3_VERIFIED. What can you prove about `randint(a, b)`? (Hint: `a ≤ result ≤ b`.)
3. The propagation rule says `trust(A) ≤ trust(B)` when A calls B. But what about higher-order functions? If `map(f, xs)` is 🟢 LEAN_KERNEL_VERIFIED but `f` is 🔴 UNTRUSTED, what is `map(f, xs)`'s trust level? How should Deppy handle this? Propose a rule and justify it.
4. Write a CI workflow that (a) runs `python -m deppy.pipeline.run_verify`, (b) enforces a trust gate of Z3_VERIFIED for `src/core/`, (c) posts a coverage report as a PR comment, (d) prevents trust regressions.