Skip to content

How (ε,δ)-Differential Privacy Hides Your Messaging Metadata

Metadata privacy is the property that an observer cannot tell who is talking to whom. In Tessera, the network routes deliveries over a bucketed broadcast channel: every node subscribes to a small number of buckets, and every delivery is committed to a bucket derived from a fingerprint of the recipient. The cover-traffic question is: given the per-bucket message counts an observer can see, can they infer which bucket corresponds to an active delivery?

The answer depends entirely on how you shape the cover traffic. The naive approach — pad every bucket proportionally to its subscription load — giveszero privacy. This post explains why, and how Tessera's shifted-Laplace (ε,δ)-differentially-private mechanism fixes it.

The adversary's view

A global passive network observer sees, for each bucket b, the total message count n_b over a round. The real delivery increments one bucket by one. The question the adversary asks is: "Did bucket b receive a real delivery in round r, or was the increment noise?" The privacy goal is to make that question unanswerable to within a differential-privacy bound.

Why proportional cover gives zero privacy

Suppose you pad each bucket so its total count is proportional to its subscriber load: n_b = k * load_b. The increment from a real delivery is lost in the proportional padding, right? Wrong. The problem is that the proportional baseline is load-dependent and public: the adversary knows the load distribution (subscriber counts are observable) and therefore knows the exact expected count for every bucket. A single real delivery shifts one bucket above its known expected value by exactly one. With enough rounds, the adversary reconstructs which buckets receive deliveries and when — the privacy is exactly zero.

Proportional cover hides the scale of activity but not itsexistence. To hide existence, the noise must be independent of the load and calibrated so a single delivery cannot be distinguished from the noise floor.

The shifted-Laplace mechanism

Tessera uses a shifted-Laplace noise mechanism. For each bucket in each round, the protocol draws an independent Laplace-distributed count:

cover_b ~ Laplace(scale = 1 / epsilon)  shifted by a constant delta-floor
publish_b = max(0, real_b + cover_b + shift)

The shift guarantees the published count is non-negative; the Laplace noise is load-independent, so the adversary cannot subtract out a known baseline. The mechanism is (ε,δ)-differentially private:

  • ε (epsilon) bounds the log-likelihood ratio between adjacent databases (one with the delivery, one without). Smaller ε = more noise = stronger privacy. Tessera's default is ε=0.1.
  • δ (delta) is the probability that the privacy guarantee fails outright — the "slack". Tessera uses the shift to absorb the Laplace tail so δ is negligible.

The key property: the noise distribution does not depend on the true load, so observing the published count tells the adversary only what the noisecould have produced — not what the delivery did.

The formula

Formally, for a real per-bucket count f(b) and the published count M(b):

M(b) = max(0, f(b) + Laplace(1/ε) + S)

Pr[M(b) ∈ T] ≤ exp(ε) * Pr[M'(b) ∈ T] + δ

where M' is the mechanism on the adjacent database (no delivery in bucket b)
and S is the shift that absorbs the negative tail.

Because the Laplace noise is i.i.d. per bucket per round and independent off, the composition across rounds stays within standard DP composition bounds, and the adversary's posterior over "did bucketb get a real delivery" is bounded by exp(ε) of the prior.

Bandwidth cost

The cost of this privacy is bandwidth. Each bucket publishesLaplace(1/ε) cover messages per round on average. At ε=0.1, that is ~10 cover messages per bucket per round. For a network withB buckets and R rounds per second, total cover bandwidth is roughly 10 * B * R messages/s. Tessera's benchmarks (experiment E3, linkability_sim.py) report an overhead of roughly 4–6× the real traffic at ε=0.1, falling to ~1.5× at ε=1.0. The privacy/overhead trade-off is tunable by the deployer.

Empirical AUC validation

Differential privacy is a worst-case bound; the empirical question is whether a real adversary can do better than random guessing. Thelinkability_sim.py harness (experiment E3) trains a classifier on the published bucket counts and measures thearea under the ROC curve (AUC) for the "did bucketb receive a delivery" binary task:

εAUCInterpretation
0.10.526Indistinguishable from random (0.5)
0.50.548Barely above random
1.00.583Weak signal
∞ (no DP)0.99Trivially linkable

At ε=0.1 the AUC is 0.526 — within Monte-Carlo error of 0.5, the random baseline. The adversary learns essentially nothing about which bucket carried the real delivery.

Reproducing

uv run python scripts/analysis/linkability_sim.py --epsilon 0.1
# outputs: per-bucket AUC, ROC curves, bandwidth overhead to results/

Takeaways

  • Proportional cover hides scale but not existence — privacy is zero.
  • Load-independent Laplace noise is the simplest correct mechanism.
  • ε controls the privacy/overhead trade-off; δ is absorbed by the shift.
  • At ε=0.1, the empirical linking AUC is 0.526 — indistinguishable from random.
  • The bandwidth cost is tunable; ~5× overhead buys strong privacy.

For the formal mechanism and proof, see the protocol specification; for the comparison against mixnets on the routing axis, seeBucketed Broadcast Routing: A Decentralized Alternative to Mixnets.

securitydifferential-privacy