Trust & Verification Levels
Not all proofs are created equal. A theorem checked by the Lean kernel is more trustworthy than one an LLM says "looks right." Deppy makes this distinction explicit with graded trust lattices (seven kernel trust levels plus six Lean export labels) that track exactly how much confidence you can place in each verified property.
Kernel Trust Levels
Deppy has two trust axes, each with its own label space. The kernel trust level records how the deppy kernel verified a single proof term. The Lean trust level records how (or whether) the exported Lean 4 source was checked by Lean itself. A proof can be `KERNEL`-trusted by deppy and `LEAN_EXPORTED` by Lean — those are independent dimensions.
The seven kernel levels
| Level | Badge | What It Means | How You Get It |
|---|---|---|---|
| `KERNEL` | 🟢 | Every sub-proof went through kernel inference rules (Refl / Sym / Trans / Cong / PathAbs / PathApp / induction constructors). No Z3 leaf, no axiom invocation, no structural pinky-promise. | Build a proof term out of the inference-rule constructors only. The kernel returns this trust level from `ProofKernel.verify`. |
| `Z3_VERIFIED` | 🟡 | Proved by the Z3 SMT solver: Z3 returned `unsat` on the negation of the goal. Z3 is mature but not constructive; soundness depends on Z3's correctness. | Use a `Z3Proof` proof term, or the `T.z3` / `T.z3_auto` tactics. |
| `STRUCTURAL_CHAIN` | 🟡 | The kernel assembled the proof's structure, but at least one leaf is `Structural` (an unchecked pinky-promise). Use `ProofCertificate.kernel_verified` to reject chains that contain such a leaf. | A proof whose composition is kernel-checked but whose atoms are declarative structural witnesses. |
| `LLM_CHECKED` | 🟠 | An LLM oracle says the property is correct with confidence ≥ the configured threshold. Useful for natural-language specs and heuristic checks; not a formal proof. | Oracle query returns `{"correct": true, "confidence": ≥ threshold}`. |
| `AXIOM_TRUSTED` | 🟠 | The proof cites a registered axiom. Trust is only as good as the axiom's faithfulness to the source. For code-derived axioms (those produced by `axiomize()`) the axiom IS the source, which is usually what you want. | `AxiomInvocation(name, module)` — or any tactic built from one, e.g., `T.axiom` / `T.cases` / `T.unfold`. |
| `EFFECT_ASSUMED` | 🟠 | An effect witness was accepted at face value. Used by the effect pipeline when a function declares effects but the discharge is deferred. | `EffectWitness` proof terms. |
| `UNTRUSTED` | 🔴 | No verification. Default for any obligation without a proof. Also emitted as a failure state when a discharge path (e.g. a missing `verify_fn`, an `Unfold` without sub-proof) can't run. | Default starting point for all new specs. |
The six Lean export labels
`compile_to_lean(fn)` emits Lean 4 source without invoking the Lean compiler. To actually run Lean, call `cert.verify_with_lean()` on the returned `LeanCertificate`. The following labels record where the certificate is on that trajectory:
| Label | What It Means | How You Get It |
|---|---|---|
| 🟢 `LEAN_KERNEL_VERIFIED` | The `lean` binary was invoked on the rendered file and returned success. This is the gold standard — equivalent to a Lean kernel-accepted proof. | `cert.verify_with_lean()` succeeds; requires `lean` on PATH. |
| 🟡 `LEAN_SYNTAX_COMPLETE` | No `sorry`, no raw axioms, but Lean has NOT been run. The file is syntactically complete — a necessary but not sufficient condition for acceptance. | `compile_to_lean(fn)` with a spec that has a full proof strategy and no `sorry` insertions. |
| 🟠 `LEAN_EXPORTED` | Source emitted; one or more `sorry`s remain. This is the default when a spec's translator can't produce a full proof. | Default when a spec fails to be fully translated; also the default on the initial `compile_to_lean`. |
| 🟠 `ASSUMPTION_DEPENDENT` | The emitted file declares raw `axiom` statements. Lean will accept them unconditionally, so any theorem routed through one inherits the assumption. | When a spec needs an axiom deppy can't derive from source (e.g., a classical-logic axiom like LEM). |
| 🔴 `LEAN_REJECTED` | `verify_with_lean()` ran `lean` and Lean rejected the file. `cert.lean_check_stderr` carries the diagnostic. | A bug in the translator, a stale Mathlib import, or a tactic Lean can't prove. |
| ⚪ `LEAN_UNAVAILABLE` | The `lean` binary was not found on PATH. No kernel check was attempted. | Install Lean 4 (see the Lean setup guide) to enable kernel verification. |
To gate on deppy-kernel trust, use `ProofCertificate.kernel_verified` — it returns `True` only when the proof succeeded AND no `Structural` leaf appears in the proof tree. For Lean integration, use `cert.is_kernel_verified` — `True` only after `verify_with_lean()` accepted.
- The 7-element deppy-kernel lattice (`TrustLevel`) is a total order: `KERNEL > Z3_VERIFIED > STRUCTURAL_CHAIN > LLM_CHECKED > AXIOM_TRUSTED > EFFECT_ASSUMED > UNTRUSTED`. The kernel takes the `min_trust(...)` of all sub-proofs to compute the overall verdict.
- The 6-element Lean export lattice (`LeanTrustLevel`) is a parallel total order: `LEAN_KERNEL_VERIFIED > LEAN_SYNTAX_COMPLETE > LEAN_EXPORTED > ASSUMPTION_DEPENDENT > LEAN_REJECTED > LEAN_UNAVAILABLE`. This lattice describes how much of the Lean toolchain has accepted the proof; it is tracked separately because Lean acceptance is a different guarantee than deppy-kernel acceptance (Lean has stronger metatheory but a smaller axiom budget).

`LEAN_KERNEL_VERIFIED` is not greater than `KERNEL`; they certify different things.
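The total order and the min-combination rule can be modeled with a plain `IntEnum`. This is an illustrative sketch only — the enum name matches the labels above, but the numeric values and the `min_trust` helper are assumptions, not Deppy's actual implementation:

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    # Higher value = more trustworthy; mirrors the total order above.
    UNTRUSTED = 0
    EFFECT_ASSUMED = 1
    AXIOM_TRUSTED = 2
    LLM_CHECKED = 3
    STRUCTURAL_CHAIN = 4
    Z3_VERIFIED = 5
    KERNEL = 6

def min_trust(levels):
    # A composite proof is only as trustworthy as its weakest sub-proof.
    return min(levels, default=TrustLevel.UNTRUSTED)

# A kernel-checked composition with one Z3 leaf degrades to Z3_VERIFIED:
overall = min_trust([TrustLevel.KERNEL, TrustLevel.Z3_VERIFIED])
```

Because `IntEnum` values compare as integers, `min` over sub-proof levels is exactly the meet of the total order.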
When Each Level Is Appropriate
🟢 LEAN_KERNEL_VERIFIED — The Gold Standard
Use when:
- Safety-critical code (medical, financial, aerospace)
- Core data structure invariants
- Cryptographic protocol correctness
- Foundational theorems your entire codebase depends on
```python
# A property so important it needs Lean verification
@guarantee("sorted list is a permutation of input", trust="LEAN_KERNEL_VERIFIED")
def sort(xs: list[int]) -> list[int]:
    # This generates a Lean proof obligation:
    # theorem sort_correct : ∀ xs, Sorted (sort xs) ∧ Perm (sort xs) xs
    ...
```
LEAN_KERNEL_VERIFIED is expensive. The Python → Lean translation adds
development time. Reserve it for properties that must be correct.
🟡 Z3_VERIFIED — The Pragmatic Choice
Use when:
- Arithmetic bounds and overflow checks
- Array index safety
- Precondition verification at call sites
- Linear arithmetic properties
- Most sidecar precondition proofs
```python
# Z3 handles these naturally
@guarantee("index is within bounds", trust="Z3_VERIFIED")
def safe_get(arr: list[int], i: int) -> int:
    assume(i >= 0 and i < len(arr))
    return arr[i]

# Z3 checks: 0 ≤ i < len(arr) at every call site
```
🟠 LLM_CHECKED — The Frontier
Use when:
- Natural-language specifications that can't be formalized easily
- Heuristic correctness checks during development
- Semantic properties ("this function returns user-friendly error messages")
- Quick smoke-testing before investing in formal proofs
```python
# Properties that resist formalization
@guarantee("error message is helpful and mentions the invalid field",
           trust="LLM_CHECKED", confidence=0.85)
def validate_config(config: dict) -> str | None:
    # Oracle checks: does the returned string mention the bad field?
    ...
```
LLM_CHECKED is inherently unreliable. The oracle can hallucinate. Use
this level as a stepping stone — flag properties as LLM_CHECKED to
remind yourself they need formal proofs later.
🔴 UNTRUSTED — The Starting Point
Use when:
- You've just written a spec and haven't verified it yet
- External dependencies with no sidecar
- Properties you plan to verify but haven't gotten to
```python
# Every spec starts here
@about("requests.get")
class RequestsGetSpec(ExternalSpec):
    trust_level = "UNTRUSTED"  # TODO: add real verification

    @assume
    def postcondition(self, url, result):
        return hasattr(result, 'status_code')
```
Trust Propagation
The most important rule in Deppy's trust system: trust flows downward. A function's trust level is bounded by the minimum trust of everything it depends on.
The Rule
If function A calls function B, then `trust(A) ≤ trust(B)`.

Formally: `trust(f) = min(own_proof_level(f), min(trust(g) for g in deps(f)))`
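The rule is a minimum over the call graph, which can be sketched as a short recursion. Everything here is a made-up model — the graph, the per-function proof levels, and the integer ranks are hypothetical, and real Deppy also has to handle cycles and unresolved externals:

```python
from functools import lru_cache

# Hypothetical proof levels, ranked UNTRUSTED=0 ... KERNEL=6.
OWN_LEVEL = {"validate": 6, "process": 5, "log": 3, "main": 5}
CALLS = {"main": ["process", "log"], "process": ["validate"],
         "validate": [], "log": []}

@lru_cache(maxsize=None)
def trust(f: str) -> int:
    # trust(f) = min(own proof level, trust of every function f calls)
    return min([OWN_LEVEL[f]] + [trust(g) for g in CALLS[f]])

# main is dragged down to log's level, even though its own proof is stronger.
```

Note that `main` inherits the weakest level among its transitive dependencies, not just its direct callees.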
Example: Trust Propagation Chain
```python
# trust_example.py
import math
from decimal import Decimal

@guarantee("result is positive")
def compute_distance(x: float, y: float) -> float:
    # Calls math.sqrt — trust depends on math.deppy
    return math.sqrt(x * x + y * y)

@guarantee("normalized vector has magnitude 1")
def normalize(x: float, y: float) -> tuple[float, float]:
    # Calls compute_distance — trust ≤ trust(compute_distance)
    d = compute_distance(x, y)
    return (x / d, y / d)

@guarantee("angle is in [0, 2π)")
def angle_between(x1, y1, x2, y2) -> float:
    # Calls normalize — trust ≤ trust(normalize)
    nx1, ny1 = normalize(x1, y1)
    nx2, ny2 = normalize(x2, y2)
    return math.atan2(ny2 - ny1, nx2 - nx1) % (2 * math.pi)
```
```console
$ python -m deppy check trust_example.py --json
{
  "module_path": "trust_example.py",
  "aggregate_trust": "UNTRUSTED",      # pinned by math.atan2
  "functions": {
    "compute_distance": { "trust": "Z3_VERIFIED" },
    "normalize": { "trust": "Z3_VERIFIED" },
    "angle_between": { "trust": "UNTRUSTED",
                       "unaddressed": ["math.atan2"] }
  }
}
# The aggregate is min(every function's trust), so adding
# a sidecar spec for math.atan2 lifts angle_between → Z3_VERIFIED
# and the module aggregate climbs to Z3_VERIFIED.
```
Mixing Trust Levels in a Project
Real projects will have a mix of trust levels. This is expected and healthy. Deppy visualizes the trust landscape:
```console
# A small project shell loop that aggregates trust across modules:
$ for mod in src/core/types.py src/api/handlers.py src/utils/formatting.py; do
    python -m deppy check "$mod" --json | jq -r '.aggregate_trust + " " + .module_path'
  done
LEAN_KERNEL_VERIFIED src/core/types.py
Z3_VERIFIED src/api/handlers.py
UNTRUSTED src/utils/formatting.py

# Or, when you have a sidecar:
$ python -m deppy.pipeline.run_verify sympy_geometry.deppy --out sympy_geometry.lean
$ jq '.section_counters' sympy_geometry.json | head -8
{
  "foundation_count": 14,
  "axiom_count": 38,
  "verify_total": 11,
  "verify_kernel_verified": 11,
  ...
}
```
Gradual Verification
Gradual verification is the practice of incrementally raising your project's trust level over time. Deppy is designed for this workflow.
The Gradual Verification Ladder
1. Wire `python -m deppy.pipeline.run_verify your.deppy` (or `--from-module yourmodule`) into CI and let the JSON report be the baseline. The `section_counters.verify_total` field starts at 0.
2. Add `@guarantee` to your critical functions. Coverage climbs to 30%. Many are 🟠 LLM_CHECKED.
3. Raise the trust floor for critical modules, e.g. "Z3_VERIFIED for `src/core/`."
CI Integration
Deppy integrates with CI/CD pipelines to enforce trust levels and track verification coverage over time.
GitHub Actions Example
```yaml
# .github/workflows/verify.yml
name: Deppy Verification
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Deppy
        run: |
          pip install git+https://github.com/thehalleyyoung/deppy.git
          pip install z3-solver
      - name: Run verification
        run: |
          for d in sidecars/*.deppy; do
            python -m deppy.pipeline.run_verify "$d" --out "out/$(basename "$d" .deppy).lean"
          done
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: verification-report
          path: verification-report.json
      - name: Fail on regression
        run: |
          python - <<'PY'
          import json, sys
          for path in ("out/foo.json", "out/bar.json"):
              j = json.load(open(path))
              assert j["certified"], f"{path} did not certify: {j['failures']}"
              c = j["section_counters"]
              assert c["verify_kernel_verified"] == c["verify_total"], \
                  f"{path}: {c['verify_total']-c['verify_kernel_verified']} verify blocks unverified"
          print("All sidecars certified.")
          PY
```
Trust as a CI Gate
The pipeline writes a top-level certified boolean and a
failures list to each JSON sidecar. CI can fail the
build whenever certified drops to false:
```console
$ python -m deppy.pipeline.run_verify sidecar/auth.deppy --out out/auth.lean
$ jq '.certified, .failures, .section_counters.verify_kernel_verified' out/auth.json
false
[
  "1 counterexample(s) found for admitted foundations"
]
4
# → CI script greps the boolean and exits non-zero.
```
Tracking Coverage in CI
The section_counters dict in each JSON sidecar gives
you per-sidecar verification coverage. A small project script can
fold them into one rollup:
```console
$ jq -s '
  {
    foundations: [.[].section_counters.foundation_count] | add,
    z3_verified: [.[].section_counters.foundations_z3_verified] | add,
    verify_total: [.[].section_counters.verify_total] | add,
    verify_kernel_verified:
      [.[].section_counters.verify_kernel_verified] | add,
    counterexamples_found:
      [.[].section_counters.counterexamples_found] | add
  }
' out/*.json
{
  "foundations": 34,
  "z3_verified": 9,
  "verify_total": 20,
  "verify_kernel_verified": 20,
  "counterexamples_found": 0
}
```
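When `jq` isn't available, the same fold fits in a few lines of stdlib Python. This is a sketch under the assumption that each report carries the `section_counters` dict shown above; the function names and the `out/*.json` glob pattern are illustrative:

```python
import glob
import json
from collections import Counter

def rollup(reports):
    # Sum every section_counters field across a list of sidecar reports.
    total = Counter()
    for report in reports:
        total.update(report.get("section_counters", {}))
    return dict(total)

def rollup_files(pattern="out/*.json"):
    # Same fold over the JSON sidecar files on disk.
    return rollup(json.load(open(path)) for path in glob.glob(pattern))

# Example with two in-memory reports:
merged = rollup([
    {"section_counters": {"verify_total": 11, "verify_kernel_verified": 11}},
    {"section_counters": {"verify_total": 9, "verify_kernel_verified": 9}},
])
```

`Counter.update` with a mapping adds counts rather than replacing them, which is exactly the per-field sum the `jq` recipe computes.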
Comparison with F*'s --admit and sorry
Other verification systems have mechanisms for incomplete proofs:
| Feature | Deppy | F* | Lean 4 |
|---|---|---|---|
| Skip a proof | `trust_level = "UNTRUSTED"` | `admit()` | `sorry` |
| Track skipped proofs | ✅ Trust report, CI gates | ⚠️ Manual `--admit_smt_queries` | ⚠️ `#check @[sorry]` |
| Gradual trust levels | ✅ 7 levels, lattice | ❌ Binary (proved/admit) | ❌ Binary (proved/sorry) |
| Trust propagation | ✅ Automatic | ⚠️ Manual dependency analysis | ⚠️ Manual |
| CI coverage tracking | ✅ Built-in | ❌ External tooling | ❌ External tooling |
| Verify external libs | ✅ Sidecar proofs | ❌ Must modify source | ❌ Must write bindings |
F*: `admit()`

```fstar
(* F*: binary skip — loses all info *)
let compute_hash (data: bytes) =
  admit (); (* black hole — no tracking *)
  sha256 data

(* No way to know WHY it was admitted,
   or what depends on this admit. *)
```
Deppy: Trust Levels

```python
# Deppy: tracked, graded, reported
@guarantee("SHA-256 hash",
           trust="LLM_CHECKED")
def compute_hash(data: bytes):
    return sha256(data)

# Trust report shows: LLM_CHECKED
# Dependents are bounded by 🟠
# CI tracks it for future upgrade
```
Configuring Trust Levels
Deppy doesn't currently parse a project-level config file. Trust
levels are configured programmatically through
deppy.config.DeppyConfig and per-module via the
--level argument to deppy upgrade:
```python
from deppy.config import DeppyConfig

cfg = DeppyConfig(
    trust_level="Z3_VERIFIED",   # floor for the safety pipeline
    z3_timeout=10000,            # ms per Z3 query
    z3_rlimit=10_000_000,        # deterministic resource budget
    parallel=True,
    lean_export=True,
)

# CI gating today is project-side: enumerate the .deppy files
# or .py modules you care about, run them through the deppy
# CLIs, and grep ``certified`` / ``aggregate_trust`` from each
# JSON output. See the GitHub Actions example above.
```
Upgrading Trust Levels
Deppy provides tooling to help you upgrade from one trust level to the next.
The deppy upgrade CLI
The `upgrade` subcommand re-runs the safety pipeline on a target module aiming at a specific trust level: it tries to promote whatever is currently below that floor:
```console
$ python -m deppy upgrade src/core/validation.py --to Z3_VERIFIED --json
{
  "target": "src/core/validation.py",
  "target_level": "Z3_VERIFIED",
  "per_function": {
    "validate_age": { "before": "UNTRUSTED", "after": "Z3_VERIFIED" },
    "validate_email": {
      "before": "UNTRUSTED",
      "after": "UNTRUSTED",
      "reason": "regex matching not in Z3 fragment"
    }
  },
  "upgraded": 1,
  "failed": 1
}

# For Lean (the strongest kernel):
$ python -m deppy export src/core/crypto.py --output crypto.lean --run-lean
$ # --run-lean makes the deppy CLI invoke ``lean`` against the output
$ # and sets LEAN_KERNEL_VERIFIED on success.
```
Exercises
1. Consider a call graph rooted at `main()`: `main()` calls `process()` and `log()`; `process()` is 🟡 Z3_VERIFIED and calls `validate()`; `validate()` is 🟢 LEAN_KERNEL_VERIFIED; `log()` is 🟠 LLM_CHECKED. What is `main()`'s trust level? Which functions must be upgraded for `main()` to reach `LEAN_KERNEL_VERIFIED`? Which can stay at `Z3_VERIFIED`? How would you express the trust floor using the JSON output of `python -m deppy.pipeline.run_verify`?
2. Suppose a function calls `random.randint()`, which is 🔴 UNTRUSTED. Write a sidecar spec for `random.randint` that would allow the function to be at least 🟡 Z3_VERIFIED. What can you prove about `randint(a, b)`? (Hint: `a ≤ result ≤ b`.)
3. The propagation rule says `trust(A) ≤ trust(B)` when A calls B. But what about higher-order functions? If `map(f, xs)` is 🟢 LEAN_KERNEL_VERIFIED but `f` is 🔴 UNTRUSTED, what is `map(f, xs)`'s trust level? How should Deppy handle this? Propose a rule and justify it.
4. Write a CI workflow that (a) runs `python -m deppy.pipeline.run_verify`, (b) enforces a trust gate of Z3_VERIFIED for `src/core/`, (c) posts a coverage report as a PR comment, (d) prevents trust regressions.