Vana uses a Proof-of-Contribution (PoC) system to validate data submitted to the network. The PoC system functions to ensure the integrity and quality of data within Data Liquidity Pools (DLPs). Everyone's data is different, so to enable data liquidity, data must be mapped to some fungible asset.
Each DLP implements its own Proof-of-Contribution function based on its particular dataset. For example, r/datadao measured contributions by amount of karma and included an ownership check that had users post a code in their Reddit profile. The right check depends on the goals of the data liquidity pool and the best way to measure data contributions.
The proof-of-contribution function defines success for your data liquidity pool. If you do not want a particular kind of data in your DLP, but it passes or is rewarded by your proof-of-contribution function, then your proof-of-contribution function is not complete.
To validate data submissions, DLP Validators scan through the data transactions and assign a score using the DLP's contribution function. The function takes into account various data characteristics, such as completeness, accuracy, and relevance to the DLP’s purpose.
Each function depends on the constraints imposed by the DLP that receives the data contributions. As such, DLP Validators may impose their own unique functions to incentivize the type and quality of data they collect. This flexibility ensures efficient evaluation of data for each DLP while ensuring that data contributions are accurately evaluated.
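As a rough illustration of such a contribution function, the sketch below combines the three characteristics named above into one score. The weights, the `Characteristics` type, and the assumption that each characteristic is already measured on a 0 to 1 scale are all hypothetical; a real DLP would choose its own inputs and weighting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristics:
    """Per-submission measurements, each assumed to lie in [0, 1]."""
    completeness: float
    accuracy: float
    relevance: float


# Hypothetical weights; each DLP would pick its own.
WEIGHTS = {"completeness": 0.3, "accuracy": 0.4, "relevance": 0.3}


def contribution_score(c: Characteristics, weights: dict = WEIGHTS) -> float:
    """Return the weighted average of the characteristics, in [0, 1]."""
    for name in weights:
        value = getattr(c, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    return sum(weights[n] * getattr(c, n) for n in weights) / total_weight
```

A validator could then compare the score against a DLP-specific acceptance threshold, or use it directly to scale the contributor's reward.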
One recommended implementation for DLP Proof-of-Contribution is to run a model influence function, which measures exactly how much new information a given data point teaches the AI model.
Great care has gone into protecting the privacy of data contributions. Validators can act as a trusted party and securely run PoC on user data. Read more about how Validators protect data in Data Privacy.
The PoC system supports zero-knowledge proofs. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data, as well as its contribution to the DLP, without revealing its full contents. Read more about it in Zero-Knowledge Proof of Contribution.
Submit test data to see how your crypto wallet can manage permissions and to establish yourself as an early user. Choose a DLP, submit test data, and observe the network in action.
To participate in the Testnet, we encourage users to create new test accounts for their activities. This approach keeps the data valid for testing purposes while minimizing the risks associated with using personal information.
Save your real data for the Vana mainnet launch.
Do Not Upload Personal Data
Please do not upload personal data. Create a new test account instead to ensure data validity and protect your privacy.
Non-Migratory Data
No testnet data will be migrated to mainnet. Do not expect your testnet data to be available on the mainnet.
Testing Purposes Only
Uploading data is for testing purposes only. No tokens have value on testnet.
Testnet participants can access DLPs and contribute their individual test data. Each DLP has specific requirements for the type of data needed, how to contribute it, and the rewards users will receive.
Some DLPs offer a frontend interface for data submission, while others require contributors to manually write their data contribution transactions.
Please join our Discord to get an overview of all existing DLPs and how to access them.
If you need some initial VANA token to pay for testnet transaction fees, a faucet is available at: https://faucet.vana.org/
Vana turns data into currency. Data submitted to the network becomes a digital asset that is transferable throughout Vana’s open data economy. Each data transaction includes the following metadata:
Data type
Format
Data properties
Reference to encrypted data storage location
Attestations about the validity of the data
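One way to picture the metadata listed above is as a small record that can be serialized. This is only a sketch: the class name, field names, and JSON encoding are assumptions made for illustration, not the network's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataTransactionMetadata:
    """Hypothetical container mirroring the metadata fields listed above."""
    data_type: str          # e.g. "chat-export"
    data_format: str        # e.g. "zip"
    properties: dict        # free-form data properties
    storage_url: str        # reference to the encrypted data storage location
    attestations: list = field(default_factory=list)  # validity attestations

    def to_json(self) -> str:
        """Serialize deterministically so the same metadata yields the same string."""
        return json.dumps(asdict(self), sort_keys=True)
```

Note that only a reference to the encrypted storage location appears in the record; the data itself stays wherever the user chose to store it.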
Users choose how they store their (encrypted) data and, at all times, maintain control over the encryption keys of their data. However, Vana allows users to delegate control and access of their data to trusted data custodians within the network. Data Custodians act as intermediaries that host and store user data following strict data protection regulations. Data Custodians can lessen the burden for users when it comes to managing and transacting with their data assets.
When files are added to the Data Registry Contract, a small gas fee must be paid to write that information onchain.
This is a draft and should not be relied upon as a legal promise or guarantee for future implementations.
Data Liquidity Pools (DLPs) transform raw data into valuable onchain assets, playing a crucial role in Vana's architecture. Given their central importance, it is logical for DLPs to be at the heart of community governance.
Combined with the , DLP Governance ensures the sustainability of the ecosystem by aligning governance with contributions that support the network's growth.
The DLP Rewards apply to the top 16 DLP slots, prioritizing quality over quantity. These DLPs are selected through a staking mechanism where VANA token holders stake their tokens with DLPs they believe will perform well. The top 16 DLPs, ranked by total staked tokens, qualify for rewards, which they .
This competitive system ensures only the highest quality data feeds into Vana. Each DLP must demonstrate a robust "Proof of Contribution" system, validating the value of their collected data. This focus on data quality is crucial for Vana's long-term goal of building a user-owned AI model capable of outperforming advanced systems like GPT-6. Leading DLPs are set out in the .
DLP Staking in the Vana network allows users to stake their VANA tokens in support of their preferred DLPs, directly influencing which DLPs are eligible for block rewards.
Regular periodic snapshots are taken to ensure that the top 16 DLPs, determined by the amount of VANA staked, continue to reflect the community's choice.
This system ensures that DLP rewards go to the DLPs that add value for those most invested in network growth (and therefore the largest VANA holders). It also provides clear incentives for DLPs to continuously create value. At the same time, it fosters competition to drive high-quality data by allowing new entrants, with community support, to overtake incumbents.
The parameter of the top 16 slots earning rewards is governable, and if the community votes to increase slots, it can be expanded to allow more DLPs to earn rewards for onboarding data onto the network. Vana is a permissionless network, so any new DLP can join the network, but only the top 16 DLPs by stake earn rewards.
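The slot selection described above amounts to ranking DLPs by total stake at snapshot time and keeping the first `slots` entries. The sketch below assumes stake totals are available as a simple mapping and breaks ties by name; the tie-break rule and the function name are assumptions, since the source does not specify them.

```python
def select_rewarded_dlps(stakes: dict[str, int], slots: int = 16) -> list[str]:
    """Return the DLPs that qualify for rewards, highest stake first.

    `slots` is a parameter because the number of rewarded slots is governable.
    Ties are broken alphabetically, which is an illustrative choice only.
    """
    ranked = sorted(stakes.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:slots]]
```

Running this at each snapshot lets a newly popular DLP displace an incumbent as soon as its total stake is higher.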
At launch, the initial DLP slots are allocated based on performance on Satori and Moksha Testnets and key criteria that reflect their contribution to the network. DLPs are encouraged to publicly launch earlier to ensure that they can start driving these metrics sooner.
DLPs are selected and maintained by a system of governance that aligns with Vana's vision to prioritize community participation. To incentivize competition and ensure high-quality data within the DLPs, Data Liquidity Rewards are distributed based on weighted, normalized performance across multiple metrics.
Exact metrics and their weights will be defined by the Vana DAO participants.
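To make "weighted, normalized performance" concrete, here is one possible reading: each metric is normalized to a share across DLPs, the shares are combined using the metric weights, and the result is a fraction of the reward pool per DLP. The metric names and weights in the usage are invented, since the actual ones are left to the Vana DAO.

```python
def reward_shares(
    metrics: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> dict[str, float]:
    """Map each DLP to its fraction of the reward pool.

    metrics: {dlp_name: {metric_name: raw_value}}
    weights: {metric_name: weight}; weights need not sum to 1.
    """
    shares = {dlp: 0.0 for dlp in metrics}
    for metric, weight in weights.items():
        metric_total = sum(values[metric] for values in metrics.values())
        if metric_total == 0:
            continue  # no DLP scored on this metric, so it contributes nothing
        for dlp, values in metrics.items():
            # Normalize the raw value to a share of the metric, then weight it.
            shares[dlp] += weight * values[metric] / metric_total
    total = sum(shares.values())
    if total == 0:
        return shares
    return {dlp: share / total for dlp, share in shares.items()}
```

Because every metric is converted to a share before weighting, metrics measured in different units (for example file counts and token amounts) can be combined without one dominating purely because of its scale.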
This comprehensive approach ensures that DLPs are evaluated on their overall contribution to the network's growth and sustainability.
The following sections provide more details about the key elements of the Vana Network, with a focus on:
and how data is transformed into digital assets
to validate data
for DLP creators and Propagators to promote data liquidity and security
for holders to influence key decisions
Each DLP implements its own Proof of Contribution function based on its particular dataset. As an example, the ChatGPT DLP handles Proof of Contribution via the four categories below.
The authenticity check aims to prove that the data submitted is authentic and not tampered with. The attack vector this aims to mitigate is submitting altered data to the DLP. For example, a malicious data contributor may add synthetically generated conversation history to their chats, making the data seem more valuable than it actually is. They may also alter their personal information, such as their birthday or when the account was created.
In the ChatGPT DLP, we rely on the email from OpenAI linking the user to their export to verify the authenticity of the data.
User requests a data export of their ChatGPT data.
Once they receive the "Your export is ready" email, they download the zip file and copy the download link from the email.
In gptdatadao.org, along with uploading their zip file, they are asked to provide the download link. Both are encrypted such that only a DLP validator can see them.
The DLP validator receives the encrypted file and download link. They download and decrypt the file from the user's storage, as well as the one provided in the link. They calculate a checksum of both files and ensure they match, ensuring the zip that's uploaded to the user's storage has not been tampered with.
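The final comparison step can be sketched as follows. Downloading and decryption are left out; the function names are hypothetical, and SHA-256 is assumed as the checksum because the source does not name one.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_authentic(uploaded_zip: bytes, export_from_link: bytes) -> bool:
    """True when the decrypted upload matches the export fetched via the link."""
    return hmac.compare_digest(
        sha256_hex(uploaded_zip), sha256_hex(export_from_link)
    )
```

Any change to the uploaded zip, even a single byte, produces a different checksum and fails the check.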
The ownership check aims to prove that the data contributor indeed owns the data they are submitting. The attack vector this prevents is a data contributor contributing someone else's data.
Specifically for the ChatGPT DLP, ownership is covered by the authenticity check, because it is difficult to fake a unique link to download a ChatGPT export.
The quality check aims to prove that the data submitted is of high quality. If a data contributor submits a data export for a newly created account, the data will still be authentic and rightfully owned by the contributor, but it is probably not very useful.
We leverage an LLM and sample conversations to determine the quality of the data.
When data is submitted to a validator, the validator takes a few randomly sampled conversations and sends them to an LLM (OpenAI in this case), which is prompted to assess the coherence and relevance of each conversation and score it from 0 to 100.
The scores from different conversations are then averaged, giving an idea of the quality of the data.
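The sampling and averaging can be sketched like this. The LLM call is represented by a caller-supplied `score_conversation` function, which is a placeholder rather than a real API; the sample size of 5 is likewise an assumption.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence


def quality_score(
    conversations: Sequence[str],
    score_conversation: Callable[[str], float],
    sample_size: int = 5,
    rng: Optional[random.Random] = None,
) -> float:
    """Average the 0-100 scores of a random sample of conversations."""
    if not conversations:
        return 0.0
    rng = rng or random.Random()
    count = min(sample_size, len(conversations))
    sample = rng.sample(list(conversations), count)
    # Clamp each score so a misbehaving scorer cannot leave the 0-100 range.
    scores = [min(100.0, max(0.0, float(score_conversation(c)))) for c in sample]
    return mean(scores)
```

Sampling keeps the number of LLM calls per submission bounded while still giving a rough estimate of the overall quality of the export.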
The uniqueness check aims to prove that the data submitted is unique. Similar to the authenticity check, this proof aims to thwart malicious data contributors who may submit the same data multiple times to the DLP.
We implement a model influence function that fingerprints a data point and compares it to other data points on the network.
The validator calculates a feature vector of the zip file by first getting a deterministic string representation of the file and then converting it to a feature vector. This is the fingerprint of that data point. If a slightly altered file is run through the same process, it will produce a very similar fingerprint, unlike a hash, which will be vastly different even if a single bit of the underlying data is changed.
The validator then records this on-chain so other validators are aware of the fingerprints of other data points in the network. They then build a local vector store of all existing data points.
After the fingerprint is calculated, the validator inserts it into the local vector store and checks how similar it is to the other fingerprints in the store. If it is too similar to an existing one, the validator rejects the data point.
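A minimal sketch of the fingerprint-and-compare idea follows. Hashed character trigrams stand in for the feature extraction, cosine similarity is the comparison, and 0.95 is the rejection threshold; all three are illustrative assumptions rather than what the ChatGPT DLP actually uses. The sketch compares a new fingerprint against the store before adding it, so that a submission never matches itself.

```python
import hashlib
import math


def fingerprint(text: str, n: int = 3, dim: int = 256) -> list[float]:
    """Hash character n-grams into a fixed-size, unit-length feature vector."""
    vec = [0.0] * dim
    for i in range(max(len(text) - n + 1, 1)):
        gram = text[i:i + n].encode("utf-8")
        # hashlib gives a deterministic bucket; built-in hash() is salted per run.
        bucket = int.from_bytes(hashlib.sha256(gram).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class FingerprintStore:
    """Local store of fingerprints that rejects near-duplicates."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.fingerprints: list[list[float]] = []

    def submit(self, text: str) -> bool:
        """Return True and record the fingerprint if the data point is new enough."""
        fp = fingerprint(text)
        if any(cosine(fp, other) >= self.threshold for other in self.fingerprints):
            return False
        self.fingerprints.append(fp)
        return True
```

The linear scan over stored fingerprints is fine for a sketch; a dedicated vector store would replace it once the number of data points grows.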
While Proof-of-contribution is different for different DLPs, some ideas outlined here can be applied to other DLPs. By checking authenticity, ownership, quality and uniqueness, the DLP creator can be sure that their data DAO consists of high-quality, meaningful data while preventing attackers who submit low-quality data.
The Vana network relies on several key smart contracts to facilitate data liquidity.
The data registry contract is the central repository for managing all data within the network, acting as a comprehensive file catalog. It allows users to add new files to the system, with each file receiving a unique identifier for future reference.
The contract manages access control for these files, enabling file owners to grant specific addresses permission to access their files. It also handles the storage of file metadata, including any offchain proofs or attestations related to file validation, which can include various metrics such as authenticity, ownership, and quality scores. Users can retrieve detailed information about any file in the registry using its unique identifier, including its permissions and associated proofs.
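The behaviour described above can be modelled off-chain with a small in-memory class. This is a conceptual sketch only: the method names and record layout are hypothetical and do not reflect the contract's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    owner: str
    url: str
    permissions: set = field(default_factory=set)  # addresses allowed to read
    proofs: list = field(default_factory=list)     # offchain proofs/attestations


class DataRegistry:
    """Toy model of a file catalog with access control and proofs."""

    def __init__(self) -> None:
        self._files: dict[int, FileRecord] = {}
        self._next_id = 1

    def add_file(self, owner: str, url: str) -> int:
        """Register a file and return its unique identifier."""
        file_id = self._next_id
        self._files[file_id] = FileRecord(owner=owner, url=url)
        self._next_id += 1
        return file_id

    def grant_permission(self, file_id: int, caller: str, grantee: str) -> None:
        """Let the file owner grant another address access to the file."""
        record = self._files[file_id]
        if caller != record.owner:
            raise PermissionError("only the file owner can grant access")
        record.permissions.add(grantee)

    def add_proof(self, file_id: int, proof: dict) -> None:
        """Attach a proof, e.g. authenticity, ownership, and quality scores."""
        self._files[file_id].proofs.append(proof)

    def get_file(self, file_id: int) -> FileRecord:
        """Look up a file, its permissions, and its proofs by identifier."""
        return self._files[file_id]
```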
Moksha:
Satori:
The TEE Pool contract manages and coordinates the and serves as an escrow for holding fees associated with validation tasks. Users pay a fee to submit data for validation, and the contract ensures that the validators process the data and provide proof of validation. The contract also allows the owner to add or remove validators, and it securely holds and disburses the fees related to these validation services.
Moksha:
Satori:
The DLP Root contract manages the registration and reward distribution for Data Liquidity Pools (DLPs) in the Vana ecosystem. It operates on an epoch-based system, where the top 16 most staked DLPs and their stakers receive rewards at the end of each epoch. The contract allows users to stake VANA tokens as guarantors for DLPs, with rewards distributed based on the staking position at the beginning of each epoch.
To prevent exploitation, the contract implements a minimum staking period and requires stakers to claim their rewards manually. DLP owners can set custom reward percentages to attract more stakers, potentially securing a position in the top 16. The system also allows for multi-DLP staking and requires an initial minimum stake from DLP owners for registration.
Moksha:
This is a draft and should not be relied upon as a legal promise or guarantee for future implementations.
Data Liquidity Pools (DLPs) are critical to Vana, as they incentivize and verify data coming into the network. Our core strategy is to gather the highest quality data, build the best AI models, and monetize them, providing a programmable way to work with user-owned data and build the frontiers of decentralized AI.
For two years following the Token Generation Event (TGE), DLP rewards are 17.5% of the Fully Diluted Valuation (FDV), distributed over twenty-four months:
DLP Rewards | % of FDV | Tokens |
---|---|---|
Year 1 | 8.75% | 10.5 million |
Year 2 | 8.75% | 10.5 million |
On average, each of the top 16 DLPs will earn about 0.547% of FDV per year (8.75% split across 16 slots), with top DLPs earning upwards of 1% of FDV. Vana has a slightly deflationary supply of 120 million tokens, similar to Ethereum.
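The figures above follow from simple arithmetic, reproduced here as a check. The calculation assumes that a percentage of FDV converts to tokens by applying it to the 120 million token supply, and that rewards are spread evenly over the two years.

```python
from fractions import Fraction

TOTAL_SUPPLY = 120_000_000              # tokens
REWARD_SHARE = Fraction(175, 1000)      # 17.5% of FDV over two years
YEARS = 2
SLOTS = 16

yearly_share = REWARD_SHARE / YEARS             # 8.75% of FDV per year
yearly_tokens = yearly_share * TOTAL_SUPPLY     # tokens paid out per year
avg_share_per_dlp = yearly_share / SLOTS        # average per rewarded DLP per year
avg_tokens_per_dlp = yearly_tokens / SLOTS

print(f"yearly share: {float(yearly_share):.4%}")
print(f"yearly tokens: {int(yearly_tokens):,}")
print(f"average share per DLP: {float(avg_share_per_dlp):.4%}")
print(f"average tokens per DLP: {int(avg_tokens_per_dlp):,}")
```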
Top DLPs can earn upwards of 1% of FDV
These rewards are designed to incentivize Data Liquidity Contributions, similar to rewards for early Ethereum miners. We believe that to scale crypto adoption, it's important to invite new participants. Proof of stake networks have the drawback of only making the rich richer. Proof of work and proof of contribution networks are incredibly powerful, as they allow anyone to contribute to the network.
Participating in a data liquidity pool on Vana is like being an early Ethereum miner, and we've structured the rewards similarly.
DLPs control how they distribute rewards to the DLP creator, data token holders, and DLP stakers. This allows new DLPs to break into the top 16 by incentivizing stakers and issuing rewards to their token holders before directly monetizing the data.
Here's an example breakdown for a top DLP earning 1% of FDV in the first year:
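Role in DLP | % of rewards | % of FDV | Tokens | Description |
---|---|---|---|---|
DLP Creator | 40% | 0.4% | 480,000 | Implements proof of contribution and a method for data contributors |
DLP Stakers | 20% | 0.2% | 240,000 | Puts stake behind the DLP based on its data value to Vana |
DLP Token Holders, including Data Contributors and Validators | 40% | 0.4% | 480,000 | Includes data contributors, validators, and token purchasers |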
The DLP creator chooses the initial rewards split. After launch, the split is decided by DLP token holders, forming a data DAO.
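The example breakdown for a DLP earning 1% of FDV can be recomputed with a small helper. The function name and role labels are illustrative, and the 120 million token supply is used to convert 1% of FDV into 1.2 million tokens.

```python
def split_rewards(total_tokens: int, split_percent: dict[str, int]) -> dict[str, int]:
    """Divide a DLP's reward among roles according to whole-number percentages."""
    if sum(split_percent.values()) != 100:
        raise ValueError("split percentages must sum to 100")
    return {role: total_tokens * pct // 100 for role, pct in split_percent.items()}


# 1% of a 120 million token supply.
dlp_reward = 120_000_000 // 100

example_split = split_rewards(
    dlp_reward,
    {"creator": 40, "stakers": 20, "token_holders": 40},
)
```

A DLP trying to break into the top 16 might raise the stakers' percentage here, at the expense of the creator's share, to attract more stake.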
L1 Validators rely on Proof of Stake (PoS) to validate data transactions and maintain network security.
To participate, L1 Validators must stake a minimum amount of VANA tokens, which are held as collateral to ensure honest behaviour. L1 Validators are randomly selected to propose and validate blocks, with the selection probability proportional to their staked amount.
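Stake-proportional selection can be sketched with a weighted random draw. This is a simplified illustration: the minimum stake value is a placeholder, and a real consensus client derives its randomness from the protocol rather than from a local generator.

```python
import random


def pick_proposer(
    stakes: dict[str, int],
    min_stake: int,
    rng: random.Random,
) -> str:
    """Pick one validator with probability proportional to its stake.

    Validators below `min_stake` are not eligible at all.
    """
    eligible = {v: s for v, s in stakes.items() if s >= min_stake}
    if not eligible:
        raise ValueError("no validator meets the minimum stake")
    validators = sorted(eligible)  # fixed order so a seeded rng is reproducible
    weights = [eligible[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]
```

With stakes of 3,000 and 1,000, the first validator should be chosen in roughly three out of four draws.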
This process includes proposing new blocks, having them attested by other propagators, reaching consensus through a supermajority, and ensuring decentralization and fairness. Successful L1 Validators earn VANA rewards, while those who act maliciously may be penalized.
A zero-knowledge proof (ZKP) is a cryptographic method by which one party (the prover) can prove to another party (the verifier) that they know a value without conveying any information apart from the fact that they know the value. This means the verifier learns nothing about the value itself, only that the prover knows it.
To protect the privacy of data contributions, a DLP can implement a Proof of Contribution using ZKP. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data and its contribution to the DLP without revealing its full contents.
To illustrate, imagine a DLP for ChatGPT data exports. The DLP considers a data point "valid" if the number of conversations inside the zip file exceeds 50. We can generate a cryptographic proof that a file meets this requirement without revealing its contents (or even the exact number of conversations in the file).
To protect against tampering with the proof generation while maintaining privacy and ensuring the data doesn't leave the user's browser unencrypted, the proof is generated in a WebAssembly environment, which is much harder to tamper with than generating proofs in the browser in plain JavaScript.
This method offers an efficient way to check similarity against all other files in the network. If you'd like to use this in your DLP, see for an example.
Satori:
A detailed breakdown of the DLP Selection process can be found in the section.
Year 1 | 8.75% | 10.5 million |
Year 2 | 8.75% | 10.5 million |
DLP Creator | 40% | 0.4% | 480,000 | Implements proof of contribution and a method for data contributors |
DLP Stakers | 20% | 0.2% | 240,000 | Puts stake behind the DLP based on its data value to Vana |
DLP Token Holders, including Data Contributors and Validators | 40% | 0.4% | 480,000 | Includes data contributors, validators, and token purchasers |