Vana turns data into currency. Data submitted to the network becomes a digital asset that is transferable throughout Vana’s open data economy. Each data transaction includes the following metadata:
Data type
Format
Data properties
Reference to encrypted data storage location
Attestations about the validity of the data
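For illustration, such a metadata record might look like the following sketch. The field names and values here are hypothetical, not the network's actual onchain schema:

```python
# Illustrative only: field names and values are hypothetical,
# not Vana's actual onchain schema.
data_transaction_metadata = {
    "data_type": "chat_history",
    "format": "application/zip",
    "properties": {"conversation_count": 312, "language": "en"},
    "storage_ref": "ipfs://<encrypted-file-cid>",  # encrypted data location
    "attestations": [
        {"validator": "0x...", "check": "authenticity", "passed": True},
    ],
}
```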
Users choose how they store their (encrypted) data and, at all times, maintain control over the encryption keys of their data. However, Vana allows users to delegate control and access of their data to trusted data custodians within the network. Data Custodians act as intermediaries that host and store user data following strict data protection regulations. Data Custodians can lessen the burden for users when it comes to managing and transacting with their data assets.
When files are added to the Data Registry Contract, a small gas fee must be paid to write that information onchain.
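As a rough sketch of that flow, assuming web3.py and a registry method named `addFile(url)` (the RPC URL, registry address, ABI file, and method name are illustrative assumptions, not the actual contract interface):

```python
# Hedged sketch with web3.py. The RPC URL, registry address, ABI file, and
# the addFile(url) method are illustrative assumptions; consult the actual
# Data Registry Contract for the real interface.
import json
import os

from eth_account import Account
from web3 import Web3

w3 = Web3(Web3.HTTPProvider(os.environ["VANA_RPC_URL"]))
account = Account.from_key(os.environ["PRIVATE_KEY"])
with open("data_registry_abi.json") as f:
    registry_abi = json.load(f)
registry = w3.eth.contract(
    address=Web3.to_checksum_address(os.environ["REGISTRY_ADDRESS"]),
    abi=registry_abi,
)

# Registering the encrypted file's location onchain is what incurs
# the small gas fee mentioned above.
tx = registry.functions.addFile("ipfs://<encrypted-file-cid>").build_transaction({
    "from": account.address,
    "nonce": w3.eth.get_transaction_count(account.address),
})
signed = account.sign_transaction(tx)
tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)
print("file registered in tx", tx_hash.hex())
```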
The following sections provide more detail about the key elements of the Vana network, with a focus on:
Data Transaction and how data is transformed into digital assets
Proof of Contribution to validate data
Incentives for DLP creators and Propagators to promote data liquidity and security
Governance for VANA holders to influence key decisions
Submit test data to see how your crypto wallet can manage permissions and to establish yourself as an early user. Choose a DLP, submit test data, and observe the network in action.
To participate in the Testnet, we encourage users to create new test accounts for their activities. This keeps the data valid for testing purposes while minimizing the risks associated with using personal information.
Save your real data for the Vana mainnet launch.
Do Not Upload Personal Data
Please do not upload personal data. Create a new test account instead to ensure data validity and protect your privacy.
Non-Migratory Data
Testnet data will not be migrated to mainnet. Do not expect your testnet data to be available on the mainnet.
Testing Purposes Only
Uploading data is for testing purposes only. No tokens have value on testnet.
Testnet participants can access DLPs and contribute their individual test data. Each DLP has specific requirements for the type of data needed, how to contribute it, and the rewards users will receive.
Some DLPs offer a frontend interface for data submission, while others require contributors to manually write their data contribution transactions.
Please join our Discord for an overview of all existing DLPs and how to access them.
If you need initial VANA tokens to pay for testnet transaction fees, a faucet is available at: https://faucet.vana.org/
Vana uses a Proof-of-Contribution (PoC) system to validate data submitted to the network. The PoC system ensures the integrity and quality of data within Data Liquidity Pools (DLPs). Everyone's data is different, so to enable data liquidity, data must be mapped to a fungible asset.
Each DLP implements its own proof-of-contribution function based on its particular dataset. For example, r/datadao measured contributions by the amount of karma and included an ownership check that had users post a code in their Reddit profile to confirm ownership. The right proof-of-contribution check depends on the goals of the data liquidity pool and the best way to measure data contributions.
The proof-of-contribution function defines success for your data liquidity pool. If you do not want a particular kind of data in your DLP, but that data passes, or is rewarded by, your proof-of-contribution function, then your proof-of-contribution function is not complete.
To validate data submissions, DLP Validators scan through the data transactions and assign a score using the DLP's contribution function. The function takes into account various data characteristics, such as completeness, accuracy, and relevance to the DLP’s purpose.
Each function depends on the constraints imposed by the DLP that receives the data contributions, so DLP Validators may impose their own unique functions to incentivize the type and quality of data they collect. This flexibility lets each DLP evaluate data efficiently while ensuring that contributions are assessed accurately.
One recommended implementation for DLP Proof-of-Contribution is to run a model influence function, which measures how much new information a given data point teaches the AI model.
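A minimal sketch of the idea, using a simple scikit-learn model as a stand-in for a real AI model (the model, data shapes, and loss metric are illustrative assumptions): influence is approximated as the drop in validation loss when the candidate data point is added to training.

```python
# Minimal sketch of a model influence measure: how much does adding one
# data point improve validation loss? Model and metric are illustrative
# stand-ins, not a prescribed implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def influence(X_train, y_train, x_new, y_new, X_val, y_val):
    base = LogisticRegression().fit(X_train, y_train)
    loss_before = log_loss(y_val, base.predict_proba(X_val))

    # Retrain with the candidate data point included.
    X_aug = np.vstack([X_train, x_new])
    y_aug = np.append(y_train, y_new)
    augmented = LogisticRegression().fit(X_aug, y_aug)
    loss_after = log_loss(y_val, augmented.predict_proba(X_val))

    # Positive influence: the new point taught the model something useful.
    return loss_before - loss_after
```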
Great care has gone into protecting the privacy of data contributions. Validators can act as a trusted party and securely run PoC on user data. Read more about how Validators protect data in Data Privacy.
The PoC system supports zero-knowledge proofs. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data, as well as its contribution to the DLP, without revealing its full contents. Read more about it in Zero-Knowledge Proof of Contribution.
The authenticity check aims to prove that the data submitted is authentic and not tampered with. The attack vector this aims to mitigate is submitting altered data to the DLP. For example, a malicious data contributor may add synthetically generated conversation history to their chats, making the data seem more valuable than it actually is. They may also alter their personal information, such as their birthday or when the account was created.
In the ChatGPT DLP, we rely on the email from OpenAI linking the user to their export to verify the authenticity of the data.
User requests a data export of their ChatGPT data.
Once they receive the "Your export is ready" email, they download the zip file and copy the download link from the email.
On gptdatadao.org, along with uploading their zip file, users are asked to provide the download link. Both are encrypted such that only a DLP validator can see them.
The DLP validator receives the encrypted file and download link. They download and decrypt the file from the user's storage, as well as the one provided in the link. They then calculate a checksum of both files and ensure they match, confirming that the zip uploaded to the user's storage has not been tampered with.
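The checksum comparison at the heart of this step might look like the following sketch (download and decryption are stand-ins for the DLP's actual machinery):

```python
# Sketch of the validator-side authenticity check: compare checksums of the
# user-uploaded zip and the zip fetched from the OpenAI download link.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_authentic(uploaded_zip: str, zip_from_email_link: str) -> bool:
    # Matching checksums mean the uploaded archive was not tampered with.
    return sha256_of(uploaded_zip) == sha256_of(zip_from_email_link)
```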
The ownership check aims to prove that the data contributor indeed owns the data they are submitting. The attack vector this prevents is a data contributor contributing someone else's data.
Specifically for the ChatGPT DLP, ownership is covered by the authenticity check, because it's difficult to fake a unique link to download a ChatGPT export.
The quality check aims to prove that the data submitted is of high quality. If a data contributor submits a data export for a newly created account, the data will still be authentic and rightfully owned by the contributor; however, it is probably not very useful.
We leverage an LLM and sample conversations to determine the quality of the data.
When data is submitted, the validator takes a few randomly sampled conversations, sends them to an LLM (OpenAI in this case), and prompts it to rate the coherence and relevance of each conversation on a scale of 0-100.
The scores from different conversations are then averaged, giving an idea of the quality of the data.
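A sketch of this scoring loop, assuming the OpenAI Python SDK (the prompt wording, model name, and sample size are illustrative choices, not fixed by the DLP):

```python
# Sketch of the quality check: sample conversations, ask an LLM to score
# each from 0-100, and average. Prompt and model are illustrative.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def quality_score(conversations: list[str], sample_size: int = 5) -> float:
    sample = random.sample(conversations, min(sample_size, len(conversations)))
    scores = []
    for convo in sample:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": "Rate the coherence and relevance of this "
                           "conversation from 0 to 100. Reply with only "
                           "the number.\n\n" + convo,
            }],
        )
        scores.append(float(resp.choices[0].message.content.strip()))
    # Averaging across sampled conversations approximates overall quality.
    return sum(scores) / len(scores)
```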
The uniqueness check aims to prove that the data submitted is unique. Similar to the authenticity check, this proof aims to thwart malicious data contributors who may submit the same data multiple times to the DLP.
We implement a model influence function that fingerprints a data point and compares it to other data points on the network.
The validator calculates a feature vector of the zip file by first producing a deterministic string representation of the file and then converting it to a feature vector. This is the fingerprint of that data point. If a slightly altered file is run through the same process, it produces a very similar fingerprint, unlike a hash, which changes completely even if a single bit of the underlying data changes.
The validator then records this on-chain so other validators are aware of the fingerprints of other data points in the network. They then build a local vector store of all existing data points.
After calculating the fingerprint, the validator inserts it into the local vector store and checks how similar it is to the other fingerprints in the store. If it is too similar, the data point is rejected.
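A minimal sketch of fingerprinting and the similarity check (the vectorizer settings and rejection threshold are illustrative assumptions; a production DLP would tune both):

```python
# Sketch of the uniqueness check. HashingVectorizer gives a deterministic
# feature vector ("fingerprint"); cosine similarity against known
# fingerprints flags near-duplicates.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = HashingVectorizer(n_features=2**12, norm="l2")

def fingerprint(canonical_text: str):
    # canonical_text: a deterministic string representation of the zip file.
    # Slightly altered files yield similar vectors, unlike a hash.
    return vectorizer.transform([canonical_text])

def is_unique(candidate, known_fingerprints, threshold: float = 0.95) -> bool:
    for known in known_fingerprints:
        if cosine_similarity(candidate, known)[0, 0] > threshold:
            return False  # too similar to an existing data point: reject
    return True
```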
While proof of contribution differs from DLP to DLP, the ideas outlined here can be applied to other DLPs. By checking authenticity, ownership, quality, and uniqueness, a DLP creator can be confident that their data DAO consists of high-quality, meaningful data, while deterring attackers who would submit low-quality data.
As an example of this flexibility, the ChatGPT DLP described above handles Proof of Contribution via the four categories: authenticity, ownership, quality, and uniqueness. Its fingerprinting approach offers an efficient way to check similarity against all other files in the network; if you'd like to use it in your DLP, an example implementation is available.
A zero-knowledge proof (ZKP) is a cryptographic method by which one party (the prover) can prove to another party (the verifier) that they know a value without conveying any information apart from the fact that they know the value. This means the verifier learns nothing about the value itself, only that the prover knows it.
To protect the privacy of data contributions, a DLP can implement a Proof of Contribution using ZKP. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data and its contribution to the DLP without revealing its full contents.
To illustrate this example, imagine a DLP for ChatGPT data exports. The DLP considers a data point "valid" if the number of conversations inside the zip file exceeds 50. We can generate cryptographic proof that a file meets this requirement without revealing its contents (or even the exact number of conversations in the file).
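The statement itself is simple; what the zero-knowledge machinery adds is proving it without revealing the file. As a plain illustration of the predicate only (not the proof), assuming conversations live in a conversations.json inside the export zip:

```python
# The *statement* the proof attests to, shown in plain Python for clarity.
# A real ZKP circuit proves this holds without revealing the file.
# The conversations.json layout is an assumption about the export format.
import json
import zipfile

def is_valid_export(zip_path: str, min_conversations: int = 50) -> bool:
    with zipfile.ZipFile(zip_path) as zf:
        conversations = json.loads(zf.read("conversations.json"))
    return len(conversations) > min_conversations
```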
To protect against tampering with the proof generation while maintaining privacy and ensuring the data doesn't leave the user's browser unencrypted, the proof is generated in a WebAssembly environment, which is much harder to tamper with than generating proofs in the browser in plain JavaScript.
We provide an example here: https://zk-proof-poc.vercel.vana.com/
The source code is available here: https://github.com/vana-com/zk-proof-poc
This is a draft and should not be relied upon as a legal promise or guarantee for future implementations.
Data Liquidity Pools (DLPs) are critical to Vana, as they incentivize and verify data coming into the network. Our core strategy is to gather the highest quality data, build the best AI models, and monetize them, providing a programmable way to work with user-owned data and build the frontiers of decentralized AI.
For two years following the Token Generation Event (TGE), DLP rewards are 17.5% of the Fully Diluted Valuation (FDV), distributed over twenty-four months:
| DLP Rewards | % of FDV | Tokens |
|---|---|---|
| Year 1 | 8.75% | 10.5 million |
| Year 2 | 8.75% | 10.5 million |
On average, the top 16 DLPs will earn 0.547% of FDV, with top DLPs earning upwards of 1% of FDV. Vana has a slightly deflationary supply of 120 million tokens, similar to Ethereum.
Top DLPs can earn upwards of 1% of FDV
These rewards are designed to incentivize data liquidity contributions, much like the rewards for early Ethereum miners. We believe that scaling crypto adoption requires inviting in new participants. Proof-of-stake networks have the drawback of only making the rich richer; proof-of-work and proof-of-contribution networks are incredibly powerful because they allow anyone to contribute to the network.
Participating in a data liquidity pool on Vana is like being an early Ethereum miner, and we've structured the rewards similarly.
DLPs control how they distribute rewards to the DLP creator, data token holders, and DLP stakers. This allows new DLPs to break into the top 16 by incentivizing stakers and issuing rewards to their token holders before directly monetizing the data.
Here's an example breakdown for a top DLP earning 1% of FDV in the first year:
| Recipient | % of rewards | % of FDV | Tokens | Role in DLP |
|---|---|---|---|---|
| DLP Creator | 40% | 0.4% | 480,000 | Implements proof of contribution and a method for data contributors |
| DLP Stakers | 20% | 0.2% | 240,000 | Puts stake behind the DLP based on its data value to Vana |
| DLP Token Holders, including Data Contributors and Validators | 40% | 0.4% | 480,000 | Includes data contributors, validators, and token purchasers |
The DLP creator chooses the initial rewards split. After launch, the split is decided by DLP token holders, forming a data DAO.
Propagators rely on Proof of Stake (PoS) to validate data transactions and maintain network security.
To participate, Propagators must stake a minimum amount of VANA tokens, which are held as collateral to ensure honest behaviour. Propagators are randomly selected to propose and validate blocks, with the selection probability proportional to their staked amount.
This process includes proposing new blocks, having them attested by other propagators, reaching consensus through a supermajority, and ensuring decentralization and fairness. Successful Propagators earn VANA rewards, while those who act maliciously may be penalized.
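The stake-proportional selection can be illustrated with a short sketch. This only shows the weighting; real consensus adds verifiable randomness, attestation committees, and slashing:

```python
# Sketch of stake-weighted proposer selection: probability of being chosen
# is proportional to staked VANA. Names and stake amounts are illustrative.
import random

def select_proposer(stakes: dict[str, float]) -> str:
    propagators = list(stakes)
    weights = [stakes[p] for p in propagators]
    return random.choices(propagators, weights=weights, k=1)[0]

# A propagator with 2x the stake is selected roughly 2x as often.
stakes = {"propagator_a": 10_000, "propagator_b": 20_000, "propagator_c": 5_000}
print(select_proposer(stakes))
```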
DLP Staking in the Vana network allows users to stake their VANA tokens in support of their preferred DLPs, directly influencing which DLPs are eligible for block rewards.
Periodic snapshots ensure that the top 16 DLPs, ranked by the amount of VANA staked, continue to reflect the community's choice.
This system ensures that DLPs that add value for those most invested in network growth (the largest VANA holders) receive DLP rewards. It also gives DLPs a clear incentive to keep creating value, while fostering competition for high-quality data by allowing new entrants, with community support, to overtake incumbents.
The number of reward-earning slots (currently 16) is governable: if the community votes to increase it, more DLPs can earn rewards for onboarding data onto the network. Vana is a permissionless network, so any new DLP can join, but only the top 16 DLPs by stake earn rewards.
The initial DLP slots at launch are allocated based on performance on Satori Testnet and the following criteria:
Total Transaction Fees (Gas Costs) created by the DLP
Total Number of Unique wallets with Verified Data Uploads
Amount Staked in the DLP
DLPs are encouraged to publicly launch earlier to ensure that they can start driving these metrics sooner.
To incentivize competition and ensure high-quality data within the DLPs, Data Liquidity Rewards are distributed based on weighted, normalized performance across the below metrics. Note that DLP Revenue and Trading Volume are currently set to 0% but will be adjusted as the Network continues to grow.
Total Transaction Fees (Gas Costs) created by the DLP (TFC) - 20%
Total Number of Unique wallets with Verified Data Uploads (VDU) - 20%
Amount Staked (AST) - 60%
DLP Revenue in $VANA (REV) - 0%
2 week avg DLP Trading Volume (DTV) - 0%
The score $S_i$ for each Data Liquidity Pool (DLP) $i$ is calculated by summing the weighted contributions of three metrics: TFC, VDU, and AST. Each metric is normalized by dividing it by the total of that metric across all DLPs and then multiplied by its respective weight, as follows:
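Written out with the current weights (20% TFC, 20% VDU, 60% AST), this gives:

$$
S_i = 0.20 \cdot \frac{\mathrm{TFC}_i}{\sum_j \mathrm{TFC}_j} + 0.20 \cdot \frac{\mathrm{VDU}_i}{\sum_j \mathrm{VDU}_j} + 0.60 \cdot \frac{\mathrm{AST}_i}{\sum_j \mathrm{AST}_j}
$$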
The DLP Rewards apply to the top 16 DLP slots, prioritizing quality over quantity. These DLPs are selected through a staking mechanism where VANA token holders stake their tokens with DLPs they believe will perform well. The top 16 DLPs, ranked by total staked tokens, qualify for rewards, which they distribute among their participants.
A detailed breakdown of the DLP selection process can be found in the DLP Staking section.
Data Liquidity Pools (DLPs) transform raw data into valuable onchain assets, playing a crucial role in Vana's architecture. Given their central importance, it is logical for DLPs to be at the heart of community governance, allowing holders to directly influence key decisions.
Combined with the DLP reward mechanism, DLP Governance ensures the sustainability of the ecosystem by aligning governance with contributions that support the network's growth.
As described above, these rewards apply to the top 16 DLP slots, which are selected through staking: VANA token holders stake their tokens with the DLPs they believe will perform well, and the top 16 by total stake qualify for rewards.
This competitive system ensures that only the highest quality data feeds into Vana. Each DLP must demonstrate a robust "Proof of Contribution" system validating the value of the data it collects. This focus on data quality is crucial for Vana's long-term goal of building a user-owned AI model capable of outperforming advanced systems like GPT-6. Leading DLPs are surfaced through the staking rankings described above.
DLPs are selected and maintained by a staking mechanism that aligns with Vana's vision of prioritizing community participation.