Merkle Trees

Verified hash-based data structures — integrity proofs, membership, and tamper detection.

What Is a Merkle Tree?

A Merkle tree is a binary tree where every leaf holds a data block and every internal node holds the cryptographic hash of its children. Any change to a leaf cascades up, changing the root hash — giving you tamper detection for free.

Merkle trees underpin Git, Bitcoin, Ethereum, certificate transparency, and IPFS. In this chapter we build one in Python, then prove its security properties using Deppy's path types.

Data Structure

from __future__ import annotations
from dataclasses import dataclass
from typing import TypeVar, Generic, Union
import hashlib
from deppy import guarantee, check, assume
from deppy.types import PathType

T = TypeVar('T')

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass(frozen=True)
class Leaf(Generic[T]):
    """A leaf node containing data and its hash."""
    data: T
    hash: str

@dataclass(frozen=True)
class Branch(Generic[T]):
    """An internal node: hash of its two children."""
    left:  MerkleTree[T]
    right: MerkleTree[T]
    hash:  str

# The inductive type
MerkleTree = Union[Leaf[T], Branch[T]]

Building a Merkle Tree

@guarantee("root hash reflects all leaves")
def build_leaf(data: bytes) -> Leaf[bytes]:
    return Leaf(data=data, hash=sha256(data))

@guarantee("branch hash == sha256(left.hash + right.hash)")
def build_branch(
    left: MerkleTree[T],
    right: MerkleTree[T],
) -> Branch[T]:
    combined = sha256(
        left.hash.encode() + right.hash.encode()
    )
    return Branch(left, right, combined)

@guarantee("produces a balanced Merkle tree")
def build_tree(blocks: list[bytes]) -> MerkleTree[bytes]:
    assume("len(blocks) > 0")
    leaves = [build_leaf(b) for b in blocks]
    # Pad to power of 2 if needed
    while len(leaves) > 1:
        next_level: list[MerkleTree[bytes]] = []
        for i in range(0, len(leaves), 2):
            if i + 1 < len(leaves):
                next_level.append(build_branch(leaves[i], leaves[i+1]))
            else:
                next_level.append(leaves[i])
        leaves = next_level
    return leaves[0]

The Hash Integrity Invariant

The fundamental property of a Merkle tree is its hash integrity invariant: every node's stored hash equals the hash recomputed from its children.

@guarantee("all hashes in the tree are consistent")
def verify_integrity(tree: MerkleTree[bytes]) -> bool:
    match tree:
        case Leaf(data, h):
            return h == sha256(data)
        case Branch(left, right, h):
            expected = sha256(
                left.hash.encode() + right.hash.encode()
            )
            return (
                h == expected
                and verify_integrity(left)
                and verify_integrity(right)
            )
Deppy verifies verify_integrity structurally: the recursive calls decrease on the tree structure, guaranteeing termination, and the match is exhaustive over Leaf | Branch.

Membership Proofs

A Merkle proof (also called an audit path) proves that a specific leaf is part of the tree, using only O(log n) hashes.

@dataclass(frozen=True)
class MerkleProof:
    """A path from a leaf to the root."""
    leaf_hash: str
    siblings:  list[tuple[str, str]]  # (sibling_hash, "left"|"right")
    root_hash: str

@guarantee("proof is valid iff the hash chain reaches the root")
def verify_proof(proof: MerkleProof) -> bool:
    current = proof.leaf_hash
    for sibling_hash, side in proof.siblings:
        if side == "left":
            current = sha256(
                sibling_hash.encode() + current.encode()
            )
        else:
            current = sha256(
                current.encode() + sibling_hash.encode()
            )
    return current == proof.root_hash

Path Types: Hash Chains as Paths

Here is where Deppy's homotopy foundations shine. A Merkle proof is a path in the tree — a chain of hash equalities connecting the leaf to the root. We can represent this as a PathType:

from deppy.types import PathType, Refl, Trans, Cong

# A Merkle proof IS a path in the hash space:
#   PathType(HashSpace, leaf_hash, root_hash)
# Each step in the proof is a path segment:
#   Trans(current_path, sha256_step)

@guarantee("constructs a valid path from leaf to root")
def proof_as_path(
    proof: MerkleProof,
) -> PathType[str, "proof.leaf_hash", "proof.root_hash"]:
    """
    Each sibling step is a congruence:
      Cong(sha256, step_i) : sha256(h_i) = sha256(h_{i+1})
    The full proof is their transitive composition.
    """
    path = Refl(proof.leaf_hash)  # start: leaf = leaf
    current = proof.leaf_hash
    for sibling, side in proof.siblings:
        if side == "left":
            step = Cong(sha256, sibling, current)
        else:
            step = Cong(sha256, current, sibling)
        path = Trans(path, step)
        current = step.target
    return path
This is a preview of the Equality & Path Types chapter. The key insight: a hash chain is literally a path in the space of hash values, and path composition (Trans) corresponds to following the chain step by step.

Security: Tamper Detection

The core security theorem: if any leaf is modified, the root hash changes. We prove this by contradiction.

@guarantee("modifying any leaf changes the root hash")
def tamper_detection_theorem(
    tree: MerkleTree[bytes],
    original_root: str,
) -> None:
    """
    Proof sketch:
    1. Assume tree has integrity (verify_integrity(tree) == True).
    2. Modify leaf data → leaf.hash changes (collision resistance).
    3. Parent hash = sha256(left.hash + right.hash) → parent changes.
    4. By induction up the tree, root hash changes.
    5. Therefore root_hash ≠ original_root.
    """
    assume("sha256 is collision-resistant")
    assume("verify_integrity(tree) is True")
    # The proof proceeds by structural induction on the tree.
    # Deppy discharges each step to Z3 + sidecar.
    check("any modification to a leaf produces a different root")

Proof: Tamper detection

1
Base case (Leaf): If data' ≠ data, then sha256(data') ≠ sha256(data) by collision resistance.
2
Inductive step (Branch): If a child's hash changes, then sha256(left.hash + right.hash) changes by collision resistance.
3
Propagation: By induction, the change propagates from the modified leaf up to the root.
4
Conclusion: root_hash(modified_tree) ≠ root_hash(original_tree). QED.

Comparison with F*

F*

type merkle_tree =
  | MLeaf : data:bytes
    -> hash:hash_t
    -> merkle_tree
  | MBranch : merkle_tree
    -> merkle_tree
    -> hash:hash_t
    -> merkle_tree

Deppy

MerkleTree = Union[
    Leaf[T],
    Branch[T],
]
# Pattern match with
# match / case syntax

Real-World Application: Verifying Blockchain Data

In Bitcoin and Ethereum, each block header contains a Merkle root of all transactions. A light client can verify that a transaction is included without downloading the entire block:

@guarantee("transaction is in block iff proof verifies against block root")
def verify_tx_inclusion(
    tx: bytes,
    proof: MerkleProof,
    block_root: str,
) -> bool:
    check(proof.leaf_hash == sha256(tx), "proof starts at tx")
    check(proof.root_hash == block_root, "proof targets block root")
    return verify_proof(proof)
The collision-resistance assumption is critical. If sha256 were broken (i.e., two different inputs could produce the same hash), tamper detection would fail. Deppy models this as an assume — the proof is conditional on the cryptographic assumption.

Exercises

Exercise 2.7: Implement generate_proof(tree, leaf_index) → MerkleProof that generates a membership proof for a leaf at a given index. Prove that verify_proof(generate_proof(tree, i)) always returns True for any valid index i.
Exercise 2.8: Extend the Merkle tree to support updates: replace a leaf and recompute only the affected path. Prove that the new root hash is consistent with the updated data.
Exercise 2.9 (Challenge): Implement a Merkle Patricia Trie (as used in Ethereum). How does the path-type representation change when keys are arbitrary byte strings rather than positional indices?