# Provenance & verifiability

> A protocol-level commitment that anchors a data point's provenance on-chain, while the proof itself stays off-chain and pluggable.

Some applications need more than data: they need a **proof of where it came from**, for example a rule such as "eligible data carries a proof it came from service X." The protocol provides a standardized place to express that: a **provenance commitment** for a data point.

The design is deliberately minimal so that it stays flexible. The protocol **anchors a fingerprint (a hash) on-chain**; the actual proof and metadata live **off-chain**. A buyer fetches the proof and checks it against the on-chain hash. The protocol stays cheap, private, and **not opinionated about what the proof is**. In some cases the proof is a zero-knowledge (ZK) proof of a specific attribute of the data, for example that a given file contains a minimum number of conversations, or that a given video meets a quality requirement.

<Info>**Coming soon.** This is being added to the data registry so the protocol has a standardized place to express verifiability. It anchors provenance; the off-chain attestation is what establishes origin.</Info>

## How it works

```mermaid theme={null}
flowchart TB
    subgraph onchain["On-chain — protocol anchors hashes only"]
      anchor["dataCommitment · metadataHash · metadataURI"]:::protocol
    end
    subgraph offchain["Off-chain — pluggable, protocol not opinionated"]
      proof["Proof / attestation<br/>signed by an attestation service today → zkTLS / decentralized later"]:::neutral
      meta["Provenance metadata"]:::neutral
    end
    proof --> meta
    meta -->|hashed into| anchor
    buyer["Buyer / builder"]:::neutral -->|① fetch via metadataURI| meta
    buyer -->|② recompute hash, check vs anchor| anchor
    buyer -->|③ recompute dataCommitment over the bytes received| anchor

    classDef protocol fill:#DCE4FF,stroke:#4141FC,color:#11104a;
    classDef neutral fill:#EEF1F5,stroke:#9AA4B2,color:#1f2937;
```

The protocol stores **only hashes** — no data, no metadata, and no file location on-chain. Each data type defines its own metadata schema off-chain; the protocol never parses formats.
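The buyer-side checks in the diagram can be sketched as follows. This is a minimal illustration, not the protocol's implementation: the page does not specify a hash function or a metadata encoding, so SHA-256, the canonical JSON encoding, the `verify_against_anchor` helper, and the example URI are all assumptions.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest. The hash function is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj: dict) -> bytes:
    """Deterministic JSON encoding so publisher and buyer hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_against_anchor(anchor: dict, metadata: dict, received: bytes) -> bool:
    """Steps 2 and 3 of the diagram: check the metadata hash, then the data commitment."""
    metadata_ok = sha256_hex(canonical_json(metadata)) == anchor["metadataHash"]
    data_ok = sha256_hex(received) == anchor["dataCommitment"]
    return metadata_ok and data_ok


# Publisher side: compute the values that would be anchored on-chain.
data = b"example export bytes"
metadata = {"source": "service X", "attestation": "placeholder-signature"}
anchor = {
    "dataCommitment": sha256_hex(data),
    "metadataHash": sha256_hex(canonical_json(metadata)),
    "metadataURI": "https://example.invalid/metadata.json",  # hypothetical location
}

# Buyer side: step 1 (fetching via metadataURI) is simulated by reusing `metadata`.
print(verify_against_anchor(anchor, metadata, data))         # True
print(verify_against_anchor(anchor, metadata, data + b"!"))  # False: bytes differ
```

Both checks must pass: a metadata mismatch means the proof was altered, and a data mismatch means the bytes received are not the ones that were committed to.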

## What the protocol anchors

The commitment consists of a few fields on the (scope-aware) data registry:

| Field                      | What it commits to                                                                                       |
| -------------------------- | -------------------------------------------------------------------------------------------------------- |
| `dataCommitment`           | A commitment to the **exact data the buyer receives**, recomputable by the buyer over the bytes they get |
| `metadataHash`             | Hash of the off-chain metadata, including the provenance attestation                                     |
| `metadataURI` *(optional)* | Where to fetch that metadata                                                                             |

Metadata can live in a service provided by a third party, on personal servers, or in any other off-chain store. Only authorized buyers fetch it and verify it against the on-chain hash.
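As a rough model of the table above, the three fields could be represented like this. The class name `ProvenanceAnchor`, the snake_case attribute names, and the 32-byte digest size are assumptions made for illustration; the page does not state the on-chain types.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

DIGEST_SIZE = 32  # assumption: both commitments are 32-byte digests


@dataclass(frozen=True)
class ProvenanceAnchor:
    """Hypothetical in-memory mirror of the anchored fields."""

    data_commitment: bytes               # commits to the exact bytes the buyer receives
    metadata_hash: bytes                 # hash of the off-chain metadata and attestation
    metadata_uri: Optional[str] = None   # optional pointer to that metadata

    def __post_init__(self) -> None:
        # Reject values that cannot be digests of the assumed size.
        for name in ("data_commitment", "metadata_hash"):
            value = getattr(self, name)
            if len(value) != DIGEST_SIZE:
                raise ValueError(f"{name} must be {DIGEST_SIZE} bytes, got {len(value)}")


anchor = ProvenanceAnchor(
    data_commitment=hashlib.sha256(b"data bytes").digest(),
    metadata_hash=hashlib.sha256(b'{"source":"service X"}').digest(),
)
print(anchor.metadata_uri)  # None: the URI is optional
```

Note that nothing in the record holds the data, the metadata, or a file location; only the two digests and an optional metadata pointer are present.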

## Pluggable proofs

The protocol is **not opinionated about the proof**. The on-chain anchor is the same regardless of how provenance is actually established, so the proof system can evolve **with no contract change**:

* **Today:** an **attestation signed by a Vana verification service** — "this data came from service X."
* **Later:** a **decentralized attester network**, or **zkTLS**-style proofs, or any other scheme a data type wants.

Attesters are deliberately not fixed. Depending on its requirements, a data type can rely on a **centralized provider**, a **decentralized attester network**, or a **trusted execution environment** (TEE) to write the attestation, and different data types can make different choices at the same time. This is also the answer to attester trust: a single signed attestation means trusting that signer, so where that assumption is too strong, a data type can choose a decentralized or TEE-based attester instead. Because the protocol only stores a hash of whatever proof exists, these implementations coexist and evolve behind the same fields.
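One way a buyer-side library might handle pluggable proofs is a registry that dispatches on a proof type recorded in the off-chain metadata. Everything named here is hypothetical: the `proof.type` field, the `"signed-attestation"` and `"tee"` labels, the trusted sets, and the verifier bodies, which are stand-ins rather than real cryptographic checks.

```python
from typing import Callable, Dict

# A verifier receives the proof object and the anchored dataCommitment (hex).
Verifier = Callable[[dict, str], bool]
VERIFIERS: Dict[str, Verifier] = {}

TRUSTED_SIGNERS = {"example-verification-service"}  # hypothetical identifier
TRUSTED_ENCLAVES = {"example-enclave-measurement"}  # hypothetical identifier


def register(proof_type: str) -> Callable[[Verifier], Verifier]:
    """Decorator that registers a verifier for one proof type."""
    def wrap(fn: Verifier) -> Verifier:
        VERIFIERS[proof_type] = fn
        return fn
    return wrap


@register("signed-attestation")
def verify_signed(proof: dict, data_commitment: str) -> bool:
    # Stand-in: a real verifier would check a cryptographic signature here.
    return (proof.get("signer") in TRUSTED_SIGNERS
            and proof.get("dataCommitment") == data_commitment)


@register("tee")
def verify_tee(proof: dict, data_commitment: str) -> bool:
    # Stand-in: a real verifier would validate a TEE attestation report here.
    return (proof.get("enclave") in TRUSTED_ENCLAVES
            and proof.get("dataCommitment") == data_commitment)


def verify_proof(metadata: dict, data_commitment: str) -> bool:
    """Dispatch to whichever verifier the data type's metadata names."""
    proof = metadata["proof"]
    verifier = VERIFIERS.get(proof["type"])
    if verifier is None:
        raise ValueError(f"no verifier registered for proof type {proof['type']!r}")
    return verifier(proof, data_commitment)


commitment = "ab" * 32
signed = {"proof": {"type": "signed-attestation",
                    "signer": "example-verification-service",
                    "dataCommitment": commitment}}
print(verify_proof(signed, commitment))  # True
```

Adding a new scheme means registering another verifier off-chain; the anchored fields that the buyer checks first are unchanged.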

## Binding to the data

A provenance proof is only meaningful if it is about **the exact data the buyer receives**. Otherwise a buyer could verify a real, valid attestation that is not actually about the bytes they were handed (proof substitution).

That is what `dataCommitment` is for, and why it is distinct from `metadataHash`:

* `dataCommitment` binds the **data**; `metadataHash` binds the **proof/metadata**.
* A future attestation must sign over or reference `dataCommitment`, so "service X produced this" is provably about *this* committed data, not a free-floating claim.
* The binding handle has to exist in the fields from day one — otherwise the "harden to zkTLS with no contract change" property is lost.
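The sketch below illustrates why the attestation has to cover `dataCommitment`. It uses an HMAC from Python's standard library as a stand-in for a real signature, and SHA-256 as the commitment; the key, the message layout, and the `attest` and `verify` helpers are assumptions for illustration only.

```python
import hashlib
import hmac

ATTESTER_KEY = b"demo-key"  # hypothetical; a real attester would use a signing key pair


def commit(data: bytes) -> str:
    """Commitment over the exact bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def attest(claim: str, data_commitment: str) -> dict:
    """Attester side: authenticate the claim together with the data commitment."""
    message = f"{claim}|{data_commitment}".encode("utf-8")
    tag = hmac.new(ATTESTER_KEY, message, hashlib.sha256).hexdigest()
    return {"claim": claim, "dataCommitment": data_commitment, "tag": tag}


def verify(attestation: dict, received: bytes) -> bool:
    """Buyer side: the attestation must be authentic and about the received bytes."""
    message = f"{attestation['claim']}|{attestation['dataCommitment']}".encode("utf-8")
    expected = hmac.new(ATTESTER_KEY, message, hashlib.sha256).hexdigest()
    authentic = hmac.compare_digest(expected, attestation["tag"])
    bound = attestation["dataCommitment"] == commit(received)
    return authentic and bound


genuine = b"conversation export from service X"
other = b"unrelated bytes handed to the buyer"
attestation = attest("came from service X", commit(genuine))

print(verify(attestation, genuine))  # True
print(verify(attestation, other))    # False: authentic attestation, wrong data
```

The second call is the proof-substitution case: the attestation itself is valid, but because it covers a commitment that does not match the delivered bytes, the check fails.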

## What this gives you

A standardized, cheap, private slot to commit to a data point's provenance, and an upgrade path from a signed attestation to stronger proofs (zkTLS, decentralized attesters) without touching the contract. The on-chain anchor commits to the data; the strength of the origin guarantee comes from the off-chain attestation, which buyers fetch and verify against the anchor. Freshness and revocation can be layered on as a data type requires.

<Info>**Status.** The fields are being added to the new data registry so verifiability has a standardized place to live; the attestation layer (and zkTLS hardening) follows.</Info>
