Proof of Contribution
A guide to Proof of Contribution (PoC), the core Vana framework for ensuring data integrity through verifiable, onchain assertions.
Proof of Contribution (PoC) is a verifiable set of assertions that a Data Validator makes about a piece of contributed data by running a DataDAO's registered PoC function. It is the core mechanism that underpins the integrity, quality, and ultimate value of any dataset aggregated by a DataDAO.
A well-designed PoC is one of the most critical components for a successful DataDAO, as it ensures that every data point measurably adds value to the collective dataset.
A PoC essentially maps each data point, which is non-fungible, to a fungible score that can be used to distribute VRC-20 data tokens in proportion to how much the data point adds to the dataset.
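As a minimal sketch of this mapping, a PoC function can score a data point along several assertion dimensions and combine them into one fungible number. The dimension names and weights below are illustrative, not part of any Vana specification:

```python
def proof_of_contribution(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension assertion scores (each in [0, 1]) into a single
    weighted score that can drive proportional token distribution."""
    total_weight = sum(weights.values())
    return sum(scores[d] * weights[d] for d in weights) / total_weight

# Hypothetical contribution: fully authentic, mostly unique, moderate quality.
poc_score = proof_of_contribution(
    {"authenticity": 1.0, "uniqueness": 0.8, "quality": 0.6},
    {"authenticity": 0.5, "uniqueness": 0.3, "quality": 0.2},
)
```

With these example weights, the contribution scores 0.86; the DataDAO's actual weighting is a design decision made by its creator.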
A Framework for Data Integrity
A strong PoC is built on verifiable assertions. The specific assertions will vary for every DataDAO, but they often address common dimensions of data integrity. The following are foundational examples to use as a starting point for your design.
- Authenticity: Is the data what it claims to be and from the source it claims to be from? This is crucial for datasets with a ground truth, like financial records or verified usage data.
- Ownership: Does the contributor have the legitimate right to contribute this data? This is typically verified through wallet signatures and control over the source account.
- Uniqueness: Is this data novel, or is it a duplicate or minor variation of existing data? This prevents spam and ensures each contribution adds new value.
- Quality: Is the data useful and relevant? This is the most use-case-specific dimension, measuring everything from the coherence of text to the resolution of an image.
A DataDAO may use some, all, or none of these, and may invent entirely new dimensions. For example, a dataset of self-submitted user surveys has no "authenticity" to check against a ground truth, but its quality and uniqueness are paramount.
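To illustrate the survey example, here is a hedged sketch of two dimension checks a validator might run when authenticity has no ground truth. The function names, the entry format, and the length heuristic are all assumptions for illustration:

```python
import hashlib
import json

def check_uniqueness(entry: dict, seen_hashes: set[str]) -> bool:
    """Reject exact duplicates by hashing a canonical JSON serialization
    of the survey entry and comparing against previously seen digests."""
    digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

def check_quality(entry: dict) -> float:
    """Toy quality heuristic: the fraction of answers that are non-trivial
    (longer than a few characters). A real DataDAO would use richer signals."""
    answers = entry.get("answers", [])
    substantive = [a for a in answers if len(a.strip()) > 3]
    return len(substantive) / max(len(answers), 1)
```

A second submission of an identical entry fails `check_uniqueness`, while `check_quality` penalizes surveys padded with empty or one-word answers.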
Case Study: PoC for a ChatGPT DataDAO
To make these concepts concrete, let's examine how a DataDAO for ChatGPT conversations might apply this framework.
Goal: To verify that a user's uploaded ChatGPT data export is authentic, owned, unique, and of high quality.
1. Asserting Authenticity & Ownership
A contributor provides their exported .zip file and the unique download link from their "export is ready" email from OpenAI. The Data Validator verifies authenticity by comparing the checksum of the user-uploaded file with the checksum of the file downloaded directly from OpenAI's link. Ownership is implicitly proven by the user's ability to provide this unique, account-specific link.
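The checksum comparison described above can be sketched with standard hashing; this is a minimal illustration, not the validator's actual implementation:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def is_authentic(uploaded: bytes, downloaded: bytes) -> bool:
    """The upload is considered authentic when it hashes to the same digest
    as the file fetched directly from the source's download link."""
    return sha256_of(uploaded) == sha256_of(downloaded)
```

Any tampering with the uploaded export changes its digest, so the comparison fails even for a one-byte modification.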
2. Asserting Uniqueness
To prevent duplicate submissions, the validator can "fingerprint" the data. One advanced technique is to generate a feature vector from the conversation data, a method explored in research on model influence functions. This fingerprint is then compared against a vector store of all previously submitted data to flag similarities. For a practical implementation of a similarity check, see this example.
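The comparison step can be sketched as a cosine-similarity scan over stored fingerprints. How the feature vector itself is produced is out of scope here; the threshold value is an assumption:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_novel(fingerprint: list[float], store: list[list[float]],
             threshold: float = 0.95) -> bool:
    """Flag a submission as a duplicate if any stored fingerprint is
    too similar (illustrative threshold)."""
    return all(cosine(fingerprint, v) < threshold for v in store)
```

A production validator would query a proper vector store with approximate-nearest-neighbor search rather than scanning every stored vector.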
3. Asserting Quality
The validator can assess quality by taking sample conversations from the data and evaluating them with another language model. The model can score the conversations for metrics like coherence, relevance, and length, producing a final quality score for the entire contribution.
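The sample-and-score flow can be sketched as follows. The heuristic scorer stands in for the language-model judge described above, since a real LLM call is environment-specific; the constants are illustrative:

```python
import random

def sample_conversations(convs: list[str], k: int = 3, seed: int = 0) -> list[str]:
    """Take a reproducible random sample of conversations to score."""
    rng = random.Random(seed)
    return rng.sample(convs, min(k, len(convs)))

def heuristic_quality(conv: str) -> float:
    """Stand-in for an LLM judge: longer, multi-turn conversations score
    higher, capped at 1.0."""
    turns = conv.count("\n") + 1
    return min(1.0, (len(conv) / 500) * 0.5 + (turns / 10) * 0.5)

def contribution_quality(convs: list[str]) -> float:
    """Average the per-sample scores into one quality score for the upload."""
    samples = sample_conversations(convs)
    return sum(heuristic_quality(c) for c in samples) / len(samples)
```

Swapping `heuristic_quality` for an LLM call that scores coherence, relevance, and length would recover the approach described in the text while keeping the same sampling and averaging structure.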
Designing Your PoC
The PoC for every DataDAO will be different because the definition of "important data" is different. The core task of the DataDAO creator is to think deeply about what assertions matter for their dataset and then design a programmatic, verifiable process for a Data Validator to enforce them.