Equivalence Checking
Automatically proving two functions equivalent, finding counterexamples when they differ, and checking that implementations adhere to specifications.
One of Deppy's most practical features is automatic equivalence checking: given two Python functions with the same signature, Deppy can determine whether they always produce the same output — and if not, give you a concrete counterexample. No annotations, no proof terms, no decorators. Just two functions.
from deppy import check_equiv
def f(x: int) -> int:
return x * 2
def g(x: int) -> int:
return x + x
result = check_equiv(f, g)
print(result)
# EquivResult(EQUIVALENT, method='z3')
Deppy proved these functions identical for all integers — not by testing a
handful of inputs, but by compiling both bodies to Z3 symbolic expressions and
asking the SMT solver to find any input where they differ. Z3 reports UNSAT:
no such input exists.
How It Works
Deppy's equivalence engine uses three strategies, tried in order:
- Z3 symbolic — Compile both function bodies to Z3 expressions.
Check
∃x. f(x) ≠ g(x). If UNSAT → equivalent (proven). If SAT → counterexample (proven). If UNKNOWN → fall through. - Property-based testing — Generate 1,000 random inputs (seeded for reproducibility), run both functions, compare outputs. High confidence but not a proof.
- Specification comparison — If both functions have
@guaranteespecs, compare the specs structurally.
The result always tells you which method was used and at what confidence level:
result = check_equiv(f, g)
result.equivalent # True / False / None
# True — *proven* by Z3 or reflexivity
# False — concrete counterexample found
# None — no proof AND no counterexample;
# testing-based checks always end here
# (see note below)
result.method # 'z3', 'testing', 'spec', 'spec+z3', 'inconclusive'
result.confidence # 1.0 for z3/refl; <1 for testing-based evidence
result.counterexample # dict of {param: value} when inequivalent
result.likely_equivalent # True iff proven OR testing confidence >= 0.9
# — use this for candidate ranking,
# never for soundness gates
None, not True.
Testing samples inputs — 1000, 10000, however many — and checks
that the two functions agree on each. Even agreement on every
trial does not prove the functions are equal; the
(N+1)-th input might disagree. So
_testing_check_equiv returns
equivalent=None with a confidence score
computed from the number of trials.
equivalent is True is reserved for real proofs (Z3
unsat, or reflexivity). Use likely_equivalent when
you want "this is probably equivalent" for ranking; use
equivalent is True when you need soundness.
verify_composition follows the same rule.
The testing-only path returns verified=False now
(the old behaviour of verified = (checked > 0) was
wrong for the same reason above). To obtain
verified=True the caller must supply a spec the Z3
path can discharge.
@guarantee string — or two
guarantees that Z3 proves logically equivalent —
check_equiv returns equivalent=None with a
spec_info payload. Two functions can satisfy
"result >= 0" while computing entirely different
non-negative values; a matching spec is necessary but not
sufficient for code equivalence.
Tier 1: Fully Automatic, Annotation-Free
🟢 Zero annotations needed — just pass two functions
For functions over int, float, and bool
that use arithmetic, comparisons, branching, abs/min/max,
and simple assignments, Deppy handles everything automatically via Z3.
You write no decorators, no specs, no proof terms.
Example 1: Algebraic identity
def square_diff(a: int, b: int) -> int:
return (a + b) * (a - b)
def explicit_diff(a: int, b: int) -> int:
return a * a - b * b
result = check_equiv(square_diff, explicit_diff)
# EquivResult(EQUIVALENT, method='z3') ✓
Z3 proves (a+b)·(a−b) = a²−b² for all integers.
No annotations needed.
Example 2: Branching refactors
def clamp_v1(x: int) -> int:
if x < 0:
return 0
if x > 100:
return 100
return x
def clamp_v2(x: int) -> int:
return max(0, min(x, 100))
result = check_equiv(clamp_v1, clamp_v2)
# EquivResult(EQUIVALENT, method='z3') ✓
The if-chain and the max/min expression are compiled to
nested z3.If trees. Z3 proves they produce the same value for every
integer x.
Example 3: Detecting a bug — counterexample
def safe_div(a: int, b: int) -> int:
if b == 0:
return 0
return a // b
def unsafe_div(a: int, b: int) -> int:
return a // b if b != 0 else a # bug: returns a, not 0
result = check_equiv(safe_div, unsafe_div)
print(result)
# EquivResult(INEQUIVALENT, method='z3',
# counterexample={'a': 1, 'b': 0})
Z3 found that when b=0, safe_div returns 0
but unsafe_div returns a. The counterexample is concrete —
you can replay it yourself.
Example 4: Conditional assignment patterns
def relu_v1(x: int) -> int:
if x < 0:
x = 0
return x
def relu_v2(x: int) -> int:
return x if x >= 0 else 0
result = check_equiv(relu_v1, relu_v2)
# EquivResult(EQUIVALENT, method='z3') ✓
The "mutating if" pattern (if x < 0: x = 0 with no return and
no else) is compiled to x = z3.If(x < 0, 0, x) — a conditional
assignment. Deppy recognizes this Pythonic pattern automatically.
Example 5: Multi-parameter polynomial
def poly_factored(x: int, y: int) -> int:
return (x + y) ** 2
def poly_expanded(x: int, y: int) -> int:
return x * x + 2 * x * y + y * y
result = check_equiv(poly_factored, poly_expanded)
# EquivResult(EQUIVALENT, method='z3') ✓
Small-exponent ** (up to 6) is expanded to repeated multiplication,
so (x+y)**2 becomes (x+y)*(x+y) in Z3.
Example 6: Augmented assignment
def accumulate(x: int, y: int) -> int:
result = x
result += y
result += y
return result
def direct(x: int, y: int) -> int:
return x + 2 * y
result = check_equiv(accumulate, direct)
# EquivResult(EQUIVALENT, method='z3') ✓
What's covered automatically
| Construct | Handled by Z3? | Example |
|---|---|---|
Arithmetic (+, -, *, //, %) | ✅ Yes | x * 2 vs x + x |
Comparisons (<, <=, ==, !=, etc.) | ✅ Yes | x >= 0 |
Boolean ops (and, or, not) | ✅ Yes | x > 0 and x < 100 |
if/elif/else | ✅ Yes | branching → z3.If |
Ternary (x if c else y) | ✅ Yes | x if x > 0 else -x |
Conditional assignment (if c: x = v) | ✅ Yes | mutating-if pattern |
abs, min, max | ✅ Yes | abs(x) → If(x≥0, x, -x) |
Small powers (x**2 through x**6) | ✅ Yes | expanded to multiplication |
+=, -=, *= | ✅ Yes | augmented assignment |
Local variables (r = x + 1) | ✅ Yes | tracked in symbolic env |
| List/string operations | ❌ → testing | xs.append(x) |
Loops (for, while) | ❌ → testing | for x in xs: |
| Function calls (non-builtin) | ❌ → testing | helper(x) |
| Classes / method calls | ❌ → testing | obj.method() |
method='testing' and
confidence=0.95. It's a strong signal, but not a proof.
Specification Adherence (Also Automatic)
Beyond comparing two functions, Deppy can check whether a function adheres to its specification — fully automatically for arithmetic specs.
from deppy import check_adherence, guarantee
@guarantee("result >= 0")
def abs_val(x: int) -> int:
if x >= 0:
return x
return -x
results = check_adherence(abs_val)
print(results[0])
# AdherenceResult(adheres=True, spec='result >= 0', method='z3')
Deppy symbolically executes the body, extracts the result expression per path, and checks that the specification holds on every path. If it finds a violation:
@guarantee("result > 0") # wrong! fails at x=0
def abs_val_buggy(x: int) -> int:
if x >= 0:
return x
return -x
results = check_adherence(abs_val_buggy)
print(results[0])
# AdherenceResult(adheres=False, spec='result > 0', method='z3',
# counterexample={'x': 0})
The counterexample tells you exactly which input violates the spec:
x=0 gives result=0, which does not satisfy result > 0.
Preconditions narrow the domain
from deppy import guarantee, requires
@requires("n > 0")
@guarantee("result > 0")
def double_pos(n: int) -> int:
return n * 2
results = check_adherence(double_pos)
print(results[0])
# AdherenceResult(adheres=True, spec='result > 0', method='z3')
# Z3 checks: ∀n. (n > 0) → (n * 2 > 0) ✓
The precondition n > 0 is added as a Z3 hypothesis, so Deppy only
checks the guarantee for valid inputs.
Tier 2: Annotation-Guided Equivalence
🟡 Some annotations needed — specs help Deppy decide
When the function body is too complex for Z3 (lists, loops, higher-order functions), but you have specifications, Deppy can compare the specs to determine equivalence.
Example: Sorting implementations
from deppy import guarantee, check_equiv
@guarantee("result is sorted and is a permutation of input")
def my_sort_a(xs: list[int]) -> list[int]:
# bubble sort
xs = list(xs)
for i in range(len(xs)):
for j in range(len(xs) - 1):
if xs[j] > xs[j+1]:
xs[j], xs[j+1] = xs[j+1], xs[j]
return xs
@guarantee("result is sorted and is a permutation of input")
def my_sort_b(xs: list[int]) -> list[int]:
return sorted(xs)
result = check_equiv(my_sort_a, my_sort_b)
print(result)
# EquivResult(EQUIVALENT, method='testing', confidence=1.00)
Z3 can't handle the nested loops, so Deppy falls back to testing. Both functions
produce the same output on all 1,000 random test inputs. The result has
confidence≈1.0 — a strong signal, but not a formal proof.
However, if you want a proven result, you can tell Deppy that the functions satisfy the same spec, and use a proof term:
from deppy import Proof, by_ext
# Prove equivalence via shared spec
proof = (Proof("my_sort_a(xs) == my_sort_b(xs)")
.by_ext("xs",
lambda xs: by_axiom("sorted_unique",
"a sorted permutation of a list is unique"))
.qed())
Axiom Conflict Detection
When you build an AxiomRegistry from many sources
(multiple sidecar files, several modules' axioms, a mix of
user-supplied and auto-derived), the registry can flag pairs of
axioms that are *syntactically* in conflict — same LHS shape, different
RHS shape:
from deppy.proofs.pipeline import AxiomRegistry
reg = AxiomRegistry()
reg.add_all(axiomize(my_module, module="demo"), source="demo")
# ... add more axioms ...
for a, b, head in reg.conflicts():
print(f"⚠ axioms {a} and {b} both define {head} differently")
conflicts() is purely syntactic. It
compares axiom endpoints by their string representations
(repr). This means:
- False positives — two RHSes that are
provably equal but written differently
(
1 + xvsx + 1) will be flagged as conflicting. - False negatives — two LHSes that are alpha-
equivalent but written with different variable names
(
f(a)vsf(b)) will not match.
For semantic conflict detection, escalate the candidate pair to
check_equiv (which routes through Z3) or
check_spec_equiv (which compares spec strings). The
registry's job is just to surface a candidate set fast enough for
interactive use.
Tier 3: Manual Proof Terms
🔴 Full proof terms — for properties beyond automation
Some equivalences require genuine mathematical reasoning: induction over data structures, appeal to domain-specific lemmas, or transport across type equivalences.
Example: Proving a recursive identity by induction
from deppy import guarantee, lemma, induction, Proof
@guarantee("result == n * (n + 1) // 2")
def sum_to_loop(n: int) -> int:
total = 0
for i in range(n + 1):
total += i
return total
@guarantee("result == n * (n + 1) // 2")
def sum_to_formula(n: int) -> int:
return n * (n + 1) // 2
# Z3 can prove the formula satisfies the spec:
from deppy import check_adherence
check_adherence(sum_to_formula)
# AdherenceResult(adheres=True, method='z3')
# But the loop version needs induction:
@lemma
@induction("n")
def sum_loop_correct(n: int):
"""sum_to_loop(n) == n * (n + 1) // 2"""
# Base case: sum_to_loop(0) = 0 = 0 * 1 // 2 ✓
# Inductive step: assume sum_to_loop(k) = k*(k+1)//2
# then sum_to_loop(k+1) = sum_to_loop(k) + (k+1)
# = k*(k+1)//2 + (k+1)
# = (k+1)*(k+2)//2 ✓
pass
Example: Transport across duck-type equivalence
from deppy import path, by_duck_type, transport_proof, guarantee
class ArrayStack:
@guarantee("result >= 0")
def size(self) -> int: ...
def push(self, item): ...
def pop(self): ...
class LinkedStack:
@guarantee("result >= 0")
def size(self) -> int: ...
def push(self, item): ...
def pop(self): ...
# Build a path: ArrayStack ≃ LinkedStack (same Stack protocol)
@path(ArrayStack, LinkedStack)
def stack_path():
return by_duck_type("Stack", evidence={
"size": "same signature and guarantee",
"push": "same signature",
"pop": "same signature",
})
# Transport: any proof about ArrayStack.size
# transfers to LinkedStack.size
@transport_proof(stack_path, "size returns non-negative")
def linked_size_nonneg():
pass # proof transferred automatically
This is not just testing — it's a genuine proof that any property proved about one implementation applies to the other, by virtue of their shared protocol structure.
Proving Inequivalence with Counterexamples
When Z3 finds that two functions are not equivalent, it provides a concrete counterexample — specific input values where the functions diverge. Deppy replays the counterexample to confirm:
def triple(x: int) -> int:
return x * 3
def triple_approx(x: int) -> int:
return x * 3 + 1
result = check_equiv(triple, triple_approx)
print(result.equivalent) # False
print(result.method) # 'z3'
print(result.counterexample) # {'x': 0}
# triple(0) = 0, triple_approx(0) = 1 — different!
The counterexample is always replayed — Deppy executes both functions with the Z3-provided inputs and verifies the outputs really differ. This guards against Z3 model artifacts.
More counterexample examples
# Off-by-one
def f(x: int) -> int:
return x + 1
def g(x: int) -> int:
return x + 2
check_equiv(f, g)
# INEQUIVALENT, counterexample={'x': 0}
# f(0)=1, g(0)=2
# Branch difference
def sign_a(x: int) -> int:
if x > 0: return 1
if x < 0: return -1
return 0
def sign_b(x: int) -> int:
if x >= 0: return 1 # bug: 0 → 1 instead of 0
return -1
check_equiv(sign_a, sign_b)
# INEQUIVALENT, counterexample={'x': 0}
# sign_a(0)=0, sign_b(0)=1
Ad-Hoc Spec Checking
You don't even need @guarantee — pass a spec string directly:
from deppy import check_adherence
def clamp(x: int) -> int:
if x < 0: return 0
if x > 100: return 100
return x
# Check any property you like, without decorating:
check_adherence(clamp, spec="result >= 0")[0]
# AdherenceResult(adheres=True, spec='result >= 0', method='z3')
check_adherence(clamp, spec="result <= 100")[0]
# AdherenceResult(adheres=True, spec='result <= 100', method='z3')
check_adherence(clamp, spec="result >= 0 and result <= 100")[0]
# AdherenceResult(adheres=True, spec='result >= 0 and result <= 100', method='z3')
# Try a false property:
check_adherence(clamp, spec="result == x")[0]
# AdherenceResult(adheres=False, spec='result == x', method='z3',
# counterexample={'x': -1}) — clamp(-1)=0 ≠ -1
Summary: The Three Tiers
| Tier | What you write | What Deppy does | Guarantee |
|---|---|---|---|
| Tier 1 | Two functions — nothing else | Z3 symbolic execution | Mathematical proof or counterexample. confidence=1.0 |
| Tier 2 | Functions + @guarantee specs |
Spec comparison + testing | High-confidence testing. confidence≈1.0 |
| Tier 3 | Functions + specs + proof terms | Kernel-checked proof | Machine-checked, exportable to Lean. KERNEL trust |
The key insight: Tier 1 requires zero annotations. For the decidable fragment (integer/float arithmetic with branching), Deppy is a fully automatic equivalence prover — you just hand it two functions and get a proven verdict.
pip install git+https://github.com/thehalleyyoung/deppy.git
pip install z3-solver
python3 -c "
from deppy import check_equiv
def f(x: int) -> int: return x * x - 1
def g(x: int) -> int: return (x - 1) * (x + 1)
print(check_equiv(f, g))
"