Merkle Trees
Verified hash-based data structures — integrity proofs, membership, and tamper detection.
What Is a Merkle Tree?
A Merkle tree is a binary tree where every leaf holds a data block and every internal node holds the cryptographic hash of its children. Any change to a leaf cascades up, changing the root hash — giving you tamper detection for free.
Merkle trees underpin Git, Bitcoin, Ethereum, certificate transparency, and IPFS. In this chapter we build one in Python, then prove its security properties using Deppy's path types.
Data Structure
from __future__ import annotations
from dataclasses import dataclass
from typing import TypeVar, Generic, Union
import hashlib
from deppy import guarantee, check, assume
from deppy.types import PathType
T = TypeVar('T')
def sha256(data: bytes) -> str:
return hashlib.sha256(data).hexdigest()
@dataclass(frozen=True)
class Leaf(Generic[T]):
"""A leaf node containing data and its hash."""
data: T
hash: str
@dataclass(frozen=True)
class Branch(Generic[T]):
"""An internal node: hash of its two children."""
left: MerkleTree[T]
right: MerkleTree[T]
hash: str
# The inductive type
MerkleTree = Union[Leaf[T], Branch[T]]
Building a Merkle Tree
@guarantee("root hash reflects all leaves")
def build_leaf(data: bytes) -> Leaf[bytes]:
return Leaf(data=data, hash=sha256(data))
@guarantee("branch hash == sha256(left.hash + right.hash)")
def build_branch(
left: MerkleTree[T],
right: MerkleTree[T],
) -> Branch[T]:
combined = sha256(
left.hash.encode() + right.hash.encode()
)
return Branch(left, right, combined)
@guarantee("produces a balanced Merkle tree")
def build_tree(blocks: list[bytes]) -> MerkleTree[bytes]:
assume("len(blocks) > 0")
leaves = [build_leaf(b) for b in blocks]
# Pad to power of 2 if needed
while len(leaves) > 1:
next_level: list[MerkleTree[bytes]] = []
for i in range(0, len(leaves), 2):
if i + 1 < len(leaves):
next_level.append(build_branch(leaves[i], leaves[i+1]))
else:
next_level.append(leaves[i])
leaves = next_level
return leaves[0]
The Hash Integrity Invariant
The fundamental property of a Merkle tree is its hash integrity invariant: every node's stored hash equals the hash recomputed from its children.
@guarantee("all hashes in the tree are consistent")
def verify_integrity(tree: MerkleTree[bytes]) -> bool:
match tree:
case Leaf(data, h):
return h == sha256(data)
case Branch(left, right, h):
expected = sha256(
left.hash.encode() + right.hash.encode()
)
return (
h == expected
and verify_integrity(left)
and verify_integrity(right)
)
verify_integrity structurally: the recursive calls
decrease on the tree structure, guaranteeing termination, and the
match is exhaustive over Leaf | Branch.
Membership Proofs
A Merkle proof (also called an audit path) proves that a specific leaf is part of the tree, using only O(log n) hashes.
@dataclass(frozen=True)
class MerkleProof:
"""A path from a leaf to the root."""
leaf_hash: str
siblings: list[tuple[str, str]] # (sibling_hash, "left"|"right")
root_hash: str
@guarantee("proof is valid iff the hash chain reaches the root")
def verify_proof(proof: MerkleProof) -> bool:
current = proof.leaf_hash
for sibling_hash, side in proof.siblings:
if side == "left":
current = sha256(
sibling_hash.encode() + current.encode()
)
else:
current = sha256(
current.encode() + sibling_hash.encode()
)
return current == proof.root_hash
Path Types: Hash Chains as Paths
Here is where Deppy's homotopy foundations shine. A Merkle proof is a
path in the tree — a chain of hash equalities connecting the
leaf to the root. We can represent this as a PathType:
from deppy.types import PathType, Refl, Trans, Cong
# A Merkle proof IS a path in the hash space:
# PathType(HashSpace, leaf_hash, root_hash)
# Each step in the proof is a path segment:
# Trans(current_path, sha256_step)
@guarantee("constructs a valid path from leaf to root")
def proof_as_path(
proof: MerkleProof,
) -> PathType[str, "proof.leaf_hash", "proof.root_hash"]:
"""
Each sibling step is a congruence:
Cong(sha256, step_i) : sha256(h_i) = sha256(h_{i+1})
The full proof is their transitive composition.
"""
path = Refl(proof.leaf_hash) # start: leaf = leaf
current = proof.leaf_hash
for sibling, side in proof.siblings:
if side == "left":
step = Cong(sha256, sibling, current)
else:
step = Cong(sha256, current, sibling)
path = Trans(path, step)
current = step.target
return path
Trans) corresponds to following
the chain step by step.
Security: Tamper Detection
The core security theorem: if any leaf is modified, the root hash changes. We prove this by contradiction.
@guarantee("modifying any leaf changes the root hash")
def tamper_detection_theorem(
tree: MerkleTree[bytes],
original_root: str,
) -> None:
"""
Proof sketch:
1. Assume tree has integrity (verify_integrity(tree) == True).
2. Modify leaf data → leaf.hash changes (collision resistance).
3. Parent hash = sha256(left.hash + right.hash) → parent changes.
4. By induction up the tree, root hash changes.
5. Therefore root_hash ≠ original_root.
"""
assume("sha256 is collision-resistant")
assume("verify_integrity(tree) is True")
# The proof proceeds by structural induction on the tree.
# Deppy discharges each step to Z3 + sidecar.
check("any modification to a leaf produces a different root")
Proof: Tamper detection
data' ≠ data, then
sha256(data') ≠ sha256(data) by collision resistance.
sha256(left.hash + right.hash) changes by collision resistance.
root_hash(modified_tree) ≠ root_hash(original_tree). QED.
Comparison with F*
F*
type merkle_tree =
| MLeaf : data:bytes
-> hash:hash_t
-> merkle_tree
| MBranch : merkle_tree
-> merkle_tree
-> hash:hash_t
-> merkle_tree
Deppy
MerkleTree = Union[
Leaf[T],
Branch[T],
]
# Pattern match with
# match / case syntax
Real-World Application: Verifying Blockchain Data
In Bitcoin and Ethereum, each block header contains a Merkle root of all transactions. A light client can verify that a transaction is included without downloading the entire block:
@guarantee("transaction is in block iff proof verifies against block root")
def verify_tx_inclusion(
tx: bytes,
proof: MerkleProof,
block_root: str,
) -> bool:
check(proof.leaf_hash == sha256(tx), "proof starts at tx")
check(proof.root_hash == block_root, "proof targets block root")
return verify_proof(proof)
sha256 were
broken (i.e., two different inputs could produce the same hash), tamper
detection would fail. Deppy models this as an assume —
the proof is conditional on the cryptographic assumption.
Exercises
generate_proof(tree, leaf_index) → MerkleProof
that generates a membership proof for a leaf at a given index.
Prove that verify_proof(generate_proof(tree, i)) always returns True
for any valid index i.