Our north star is to empower users to own their data and the value it creates. We believe data will power the AI economic shift over the next decade. Giving users true ownership of their data opens up walled gardens and pushes AI progress forward through data abundance.
We apply the sovereign, decentralized technology that powers bitcoin and ethereum to personal data, shifting power from monopolistic big tech and distributing it back into the hands of the users who created the data. Vana provides the infrastructure to generate user-owned datasets that can replicate and supersede the datasets that big tech companies are today selling for hundreds of millions of dollars.
We live under data serfdom: we create valuable data but see no economic upside from the value we help create.
OpenAI, Meta, and Google all train models on publicly scraped data from the searchable internet, and are starting to buy up private data as they need access to more training data. Reddit, for example, earns $200M from selling user-generated content as AI training data. As AI plays a larger and larger economic role in society, the economic impact of this data will grow.
We risk heading towards a future where AI models trained on our data displace us, with the economic gains flowing to a small set of shareholders. At the same time, we hold all the power through our data, but wielding it requires collective action.
We believe that you should own a piece of the AI models that your data helps create.
With Vana, users and developers can incentivize global data contribution and accelerate the development of user-owned data applications, AI models, and data liquidity pools. These use cases have guided our architecture:
Incentivize 100 million people to export their Google, Facebook, Instagram, and Reddit data to create the first user-owned data treasury.
Vana enables non-custodial data storage, attributes voting rights based on data contributions, and verifies the legitimacy of data to ensure quality.
Each user adds their data to their personal server and grants access to a trusted verifier. Users then contribute their data to a collective server by encrypting it with the server's public key. The collective server operates according to rules set by the data contributors.
Build a model owned and governed by 100 million data-contributing users.
Vana stores model weights in a non-custodial way, secures distributed training on private data, allows users to earn through model usage, and enables collective governance of the model.
Users train a piece of the model on their personal servers and grant access to the foundation model DAO to merge all individual pieces. The foundation model DAO evaluates the value each person's data contributes and rewards them with a model-specific token. Developers interact with the model API by burning this token.
Vana is a decentralized data liquidity network designed to establish the first trustless open data economy.
Vana turns data into currency to push the frontiers of decentralized AI. It is a layer one blockchain designed for private, user-owned data. It allows users to collectively own, govern, and earn from the AI models trained on their data. For more context on why we built Vana, see this blog post.
At its core, Vana is a data liquidity network. It makes data liquid by solving the double spend problem for data, ensuring that data can be used like a financial asset in flexible, modular ways. This is achieved through two mechanisms:
Proof-of-contribution, which verifies the value of private data in a privacy-preserving manner
Non-custodial data, which ensures that the data is only used for approved operations
These mechanisms create a trustless environment where data can be securely tokenized, traded, and utilized for AI training without compromising user privacy or control. This paradigm shift not only democratizes AI development but also introduces a new economic model where data contributors become active stakeholders in the AI value chain.
Vana aligns incentives between data owners, developers, and data consumers. It creates a data-powered economy owned by its participants rather than centralized entities.
Learn more about the core concepts of the Vana Network by exploring these sections:
To participate in the network, you can:
Here is an overview of the main topics:
How to establish ownership of the AI models created using our data
Empower users to own their data and the value it creates through decentralized technology
Build a User-Owned Data Treasury and a User-Owned Foundation Model
Understand the core building blocks of the Vana ecosystem
Explore the different participants and their roles in the Vana network
Understand how data is transformed and validated, and how incentives work
Build your own DLP based on provided templates and deploy to the Vana Network
Start the validation of data for specific DLPs on your own hardware
Submit your data to a DLP and observe your contribution onchain
The Data Liquidity Layer is where data is contributed, validated, and recorded to the network into data liquidity pools (DLPs). Here, DLP creators deploy DLP smart contracts with specific data contribution objectives, including the DLP’s purpose, validation method, and data contribution parameters.
In the Data Liquidity Layer, data contributors and custodians submit data to DLPs for validation and receive both governance rights and rewards for their data contributions based on parameters set out in the DLP validation process.
The purpose of the Data Liquidity Layer is to bring data onchain and facilitate data transactions among Data Contributors, Custodians, and DLP Validators. This network layer organizes all data collection and management for user and developer accessibility throughout the ecosystem.
Vana is a decentralized network enabling global data creation and advancing AI innovation within an open data economy.
The network relies on three key concepts.
Vana consists of a Data Liquidity Layer, designed to introduce data on-chain as transferable digital assets, and an Application Layer (the so-called Data Portability Layer) for user-owned data applications powered by verified data. The universal connectome is a permissionless, real-time map of data flowing through the ecosystem.
Collective, pooled data is much more valuable than individual data as it can be used as AI training data. The Data Liquidity Layer facilitates the contribution, validation, and recording of data into Data Liquidity Pools (DLPs) via smart contracts.
DLP creators set specific objectives, including the purpose, validation method, and contribution parameters. Data contributors and custodians submit data for validation, receiving governance rights and rewards based on the DLP's parameters.
The primary goal is to bring data onchain, enabling transactions among Data Contributors, Custodians, and Validators, thereby organizing data collection and management for user and developer accessibility within the ecosystem.
Data Liquidity Pools (DLPs) aggregate similar types of data submitted to the data liquidity layer by Data Contributors and Custodians. Each DLP is an individual peer-to-peer network that uses DLP Validators to ensure data integrity through Vana’s proof-of-contribution system. DLPs provide trustless, private, and attributable data liquidity pools from which users and developers can:
Govern the use of their data in model training and application development.
Build community-focused applications in Vana’s application layer.
Sell data liquidity access to Data Consumers for model training.
The Data Portability Layer, or Application Layer, is a collaborative space for Data Contributors and developers to build applications using data from DLPs. It provides the infrastructure for training user-owned models and developing AI dApps. This layer acts as a hub where online communities and developers create economic value from data, fostering an interactive ecosystem. Data contributors benefit from the network effects and value generated by the intelligence derived from their data.
The Connectome is a decentralized ledger that records real-time data transactions in Vana’s ecosystem using proof-of-stake consensus. It ensures valid DLP token transfers and enables cross-DLP data access for user-owned applications. External parties can view and monitor these transactions. The Connectome is EVM-compatible, allowing interoperability with other EVM networks, protocols, and DeFi applications.
The Data Portability Layer or Application Layer is an open data playground for Data Contributors and developers to collaborate and build applications with the data liquidity amassed by DLPs. As the Data Liquidity Layer verifiably brings data onchain, the Application Layer provides the infrastructure for the distributed training of user-owned foundation models and the development of novel AI dApps.
The Application Layer functions as an active data hub, where online communities can collaborate with developers to create real economic value from their data. This fosters an interactive data creation ecosystem, where data contributors benefit from the downstream network effects and value emerging from the intelligence that their data helps to create.
Vana makes data non-custodial by running operations in a personal server
Vana uses a personal server architecture to allow users and data collectives to store private information off-chain and make it non-custodial. You may be familiar with this architecture through the Solid Project. The basic idea is that you have some off-chain compute environment where you can execute code that works with private data. You can think of this as a safe little home to execute code that requires your private data, one that follows the data permission rules you have laid out.
There is no expectation that every user runs their own personal server - it is simply a secure environment where you can work with private data. It can be secure because it's on your MacBook, or secure because you trust the third-party provider, similar to Infura for Ethereum. You can also use someone's browser as a lightweight personal server, generating proofs or encrypting data client-side so it is fully secured as soon as it leaves their device.
Personal servers exist for users and for collectives. For example, I have a personal server that runs on my MacBook. It has all of my private platform data - my messages, my journal entries, my emails, my writing - and allows me to run personalized inference on the MacBook and return the output to a third-party application without the data leaving my machine.
Personal servers follow a set of rules set out in a data permissions smart contract. As a simple example, here is a data permissions contract with a whitelist.
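The contract itself lives onchain; as an illustration, here is a minimal Python sketch of the whitelist logic such a contract would encode. All names are hypothetical, and a real deployment would implement this in a smart contract language rather than Python.

```python
# Hypothetical sketch of the whitelist logic a data-permissions
# contract encodes; names are illustrative, not the real contract.
class DataPermissions:
    def __init__(self, owner: str):
        self.owner = owner
        self.whitelist: set[str] = set()

    def grant(self, caller: str, grantee: str) -> None:
        # Only the data owner may extend the whitelist.
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self.whitelist.add(grantee)

    def revoke(self, caller: str, grantee: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the owner can revoke access")
        self.whitelist.discard(grantee)

    def can_execute(self, requester: str) -> bool:
        # The personal server consults this before running any code
        # against the owner's private data.
        return requester == self.owner or requester in self.whitelist


permissions = DataPermissions(owner="0xAlice")
permissions.grant("0xAlice", "0xTrustedApp")
assert permissions.can_execute("0xTrustedApp")
assert not permissions.can_execute("0xStranger")
```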
Bring your data and models across applications
With Vana, you can bring your own models and private data across applications. You can also use any models or data on collective servers that you have access to.
For example, here is the OAuth screen for Chirper AI. By logging in with Vana, you can bring your models and data to the application. If you prefer, you can authenticate using a crypto wallet (experimental). You choose the level of access to grant the application. You can grant access to only inference of your personal model, keeping the underlying data private, or you can grant full access to allow the app to have a copy of your data.
If you are a developer who would like to add support for Vana in your application, please see the API Integration docs for detailed steps.
Today, we offer two options for a personal server:
Self-host on a MacBook (M1 chip or better required)
Hosted application in partnership with Replicate
Vana supports a non-custodial architecture in which users and data collectives can store private information off-chain. Personal servers allow users to store personal data and train models on their data in a secure local environment. Urbit and the Solid Project were the first to pioneer similar architectures; Vana uses the pattern specifically for private, personal data storage. To run your personal server, get started below.
The code to self-host your personal server is provided on our GitHub. Alternatively, you can run a pre-built executable also found there.
We have partnered with Replicate to offer a hosted version of the personal server. It contains a personal vector database with your private information, an audio model trained on your voice, and an image model trained on your photos. You can create it through our consumer-friendly application at app.vana.com. Note that there is a paywall to cover compute costs.
If you are a developer, you can use the development version at https://development-app.vana.com/, and bypass the paywall using the test card number 4242 4242 4242 4242, choosing any future date for the expiry and any three digits for the CVC code. The development environment will be slower, as fewer GPUs are assigned to it.
Store private data off-chain while permissioning it onchain
How can we ensure private data is stored off-chain while managing permissions onchain?
Depending on its tokenomics, a DLP may want to hold user data in escrow. Other DLPs may use liveness checks and rewards to ensure that user data remains connected, even without holding an encrypted copy of it.
For DLPs and other user-owned data apps that require access to a collective dataset or model weights, Vana uses a secured compute node with asymmetric encryption. Users contribute their data by encrypting it client-side with the node's public key. It is decrypted with the corresponding private key held securely on the node.
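As a concrete illustration of this pattern, here is a minimal sketch using libsodium sealed boxes via PyNaCl. The actual node implementation is not specified here; only the asymmetric-encryption flow is shown.

```python
# Sketch of the contribute/decrypt flow using libsodium sealed boxes
# (PyNaCl). Illustrative only: it shows the asymmetric-encryption
# pattern, not the node's real implementation.
from nacl.public import PrivateKey, SealedBox

# Generated once on the secure compute node; the private key never
# leaves the node.
node_private_key = PrivateKey.generate()
node_public_key = node_private_key.public_key

# Client side: a contributor encrypts data with the node's public key.
contribution = b"user's private platform data"
encrypted = SealedBox(node_public_key).encrypt(contribution)

# Node side: only the holder of the private key can decrypt.
decrypted = SealedBox(node_private_key).decrypt(encrypted)
assert decrypted == contribution
```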
The trusted compute node can only run code that is approved by the collective. With a community-owned dataset, for example, for a company to pay for access to train on the data, the company (or someone on their behalf) puts up a pull request with the code that they need to run on the secure node. This could include code to transmit the data to them, or code to train the model. Note that the trusted compute node does not have access to a GPU, so to train larger models, the data requester must set up a secure compute node with its own private key and request that an encrypted copy be sent to that heavy compute node. The data requester then submits a proposal to the DAO describing what they would like to do.
If the DAO approves the proposal (and code), the pull request is merged and deployed to the node. Here is the code running on the node. However, this still requires trusting that the node operator(s) will adhere to the approved code and not introduce any vulnerabilities.
Our current approach is a step towards decentralization, but still relies on trust in the compute node and code approval process. If the node operator, which at this point is only Vana, were to deploy malicious code to the server, or otherwise compromise the private key that can decrypt the data, then the node operator would be able to access the underlying data.
We are eager to add more secure and decentralized options as fully homomorphic encryption, distributed training, and other privacy-preserving technologies mature. We looked at ways to reduce the trust required by splitting model weights or splitting the dataset across many machines in a way that preserves privacy, but have not found a satisfactory solution, so we rely on a trusted compute node today. For example, petals.dev is doing great work letting many users load a small part of a model for distributed inference.
The Connectome is a decentralized ledger of real-time data transactions throughout Vana’s ecosystem. Using proof-of-stake consensus, transactions propagate through the network. This ensures that DLP token transfers are valid and allows for cross-DLP data access by user-owned data applications.
The Connectome allows external parties to view and monitor data transactions throughout the network. Moreover, the Connectome is EVM-compatible, allowing for interoperability with other EVM networks, protocols, and DeFi applications.
Each participant in the Vana network plays a vital role in maintaining Vana’s open data ecosystem. These participants include:
Data Contributors,
Data Custodians,
DLP Validators,
Propagators, and
Data Consumers.
The Vana network provides building blocks that can be used to create a Data Liquidity Pool (DLP) tailored for collecting any kind of meaningful data.
Formally, a DLP is a smart contract registered with and approved by the root network.
DLPs must meet the root network's standard to be activated. To do so, each DLP onboards a class of data assets to create liquidity and guarantees that the data is kept secure and private.
Typically, validators uphold the integrity of a DLP by evaluating the data for its usefulness and validity. Data comes in all shapes and sizes, so DLPs have a lot of freedom to operate in a way that best fits the data specific to that pool.
The root network is responsible for governing the DLPs in the network. A DLP owner must register their DLP with the root network before it is considered active on the network. If the DLP is in the top 16, the root network distributes rewards to these DLPs.
Users can also stake in a specific DLP, which helps the DLP secure a spot in the top 16. Users that stake in a top 16 DLP will receive a portion of the rewards earned by that DLP.
Validators are critical for maintaining the integrity, value, and trustworthiness of data within the network. Depending on the DLP's architecture, different validators may be required.
TEE Validators - the recommended approach, a group of confidential validators that can perform validation for any DLP
DLP Validators - a group of validators specific to a DLP, useful when TEE validators cannot be used
No Validators - some DLPs may not require validators at all
For most DLPs, TEE Validators are the way to go. They offer a simpler DLP architecture, one where DLPs focus on building out their proof-of-contribution. TEE Validators are DLP agnostic and can run PoC from any DLP.
In specific instances, when running PoC is not possible within the limits of the TEE Validators, a DLP can opt to deploy its own network of DLP validators.
Some DLPs may not require validators; for example, a DLP that generates the proof-of-contribution entirely on the client side using ZK Proofs.
Propagators are essential for ensuring the composability and security of the network.
Consolidate and write data transactions onto the Connectome
Verify the validity of network transactions.
Maintain network state liveness through accurate and timely block creation.
The Moksha testnet will be released as a Proof-of-Stake (PoS) network for propagators to ensure the testnet remains stable for data liquidity pools.
To run a reliable, performant node, we suggest that the node’s hardware profile should match or exceed the following specifications:
Recommended:
8-core CPU
32 GB RAM
1.2 TB high-speed SSD
x86-64 architecture
Minimum:
2-core CPU
8 GB RAM
100 GB high-speed SSD
x86-64 architecture
These hardware requirements are rough guidelines, and each node operator should monitor their node to ensure good performance for the intended task.
We will release more information on how to run a propagator node soon.
Propagators earn transaction fees and block rewards in the form of VANA for processing network transactions and finalizing blocks.
The Satori testnet currently runs as a Proof-of-Authority (PoA) chain. More information will be shared as the testnet progresses. Please join our Discord for the latest updates.
Data Consumers pay for access to DLPs and user-owned data apps within the ecosystem. They create demand for data contribution and ecosystem growth.
Provide demand for DLP creation and data collection.
Pay fees to access DLP data banks for external services.
Propose ideas and data needs to DLP Creators.
We expect big tech companies will fill the role of Data Consumers and request access to niche data on the Vana network (if the users allow) to push the frontiers of their models. We also expect the Vana architecture to enable new model creators to finance and aggregate data for their projects.
Data Custodians increase the efficiency of data contributions within the Vana network by aggregating data from data contributors before it is submitted to a DLP. DLPs might opt to leverage Data Custodians to lower the barrier of entry for Data Contributors to share their data with DLPs.
Act as intermediaries for Data Contributors to host their data.
Simplify the process for Data Contributors to contribute data to DLPs.
Ensure the safekeeping and accessibility of data and provide encryption to protect data from exploits.
Comply with data protection regulations (GDPR, CCPA), ensuring data is hosted, stored, and accessed within authorized jurisdictions only.
Generate zk-proofs for validators to verify the authenticity, integrity, and contribution of the data.
Data Custodians provide an optional service to simplify the data contribution process. Data Contributors share revenue with Custodians when they subscribe to their services.
Data Contributors are the foundation of the system, supplying the valuable data on which the network is built.
Submit data to network DLPs for token rewards and governance rights.
Allow personal data to be written onto the Vana network for use across different DLPs and user-owned data apps.
Pay transaction fees in VANA and additional fees to validators to maintain network stability.
Data Contributors provide their data to the network in return for DLP token rewards and ownership. Data contributors can also directly use their verified data in a dApp.
The following sections provide more details about the key elements of the Vana Network, with a focus on:
Data Transactions and how data is transformed into digital assets
Proof of Contribution to validate data
Incentives for DLP creators and Propagators to promote data liquidity and security
Governance for VANA holders to influence key decisions
Submit test data to see how your crypto wallet can manage permissions and to establish yourself as an early user. Choose a DLP, submit test data, and observe the network in action.
In order to participate in the Testnet, we encourage users to create new test accounts for their activities. This approach ensures that the data remains valid for testing purposes but minimizes potential risks associated with the use of personal information.
Save your real data for the Vana mainnet launch.
Do Not Upload Personal Data
Please do not upload personal data. Create a new test account instead to ensure data validity and protect your privacy.
Non-Migratory Data
All testnet data will not be migrated to mainnet. Do not expect your testnet data to be available on the mainnet.
Testing Purposes Only
Uploading data is for testing purposes only. No tokens have value on testnet.
Testnet participants can access DLPs and contribute their individual test data. Each DLP has specific requirements for the type of data needed, how to contribute it, and the rewards users will receive.
Some DLPs offer a frontend interface for data submission, while others require contributors to manually write their data contribution transactions.
Please join our Discord to get an overview of all existing DLPs and how to access them.
If you need initial VANA tokens to pay for testnet transaction fees, a faucet is available at: https://faucet.vana.org/
Vana turns data into currency. Data submitted to the network becomes a digital asset that is transferable throughout Vana’s open data economy. Each data transaction includes the following metadata:
Data type
Format
Data properties
Reference to encrypted data storage location
Attestations about the validity of the data
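For illustration, a hypothetical record carrying these fields might look like the following; the keys and values are examples, not a normative schema.

```python
# Hypothetical data-transaction metadata record illustrating the
# fields listed above; keys and values are examples, not a schema.
data_transaction = {
    "data_type": "chatgpt_export",
    "format": "zip",
    "properties": {"conversation_count": 132, "language": "en"},
    "storage_url": "https://storage.example.com/exports/abc123.zip.enc",
    "attestations": [
        {
            "validator": "0xValidator1",
            "checks": {"authenticity": True, "quality_score": 87},
        },
    ],
}
```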
Users choose how they store their (encrypted) data and, at all times, maintain control over the encryption keys of their data. However, Vana allows users to delegate control and access of their data to trusted data custodians within the network. Data Custodians act as intermediaries that host and store user data following strict data protection regulations. Data Custodians can lessen the burden for users when it comes to managing and transacting with their data assets.
When files are added to the Data Registry Contract, a small gas fee must be paid to write that information onchain.
The authenticity check aims to prove that the data submitted is authentic and not tampered with. The attack vector this aims to mitigate is submitting altered data to the DLP. For example, a malicious data contributor may add synthetically generated conversation history to their chats, making the data seem more valuable than it actually is. They may also alter their personal information, such as their birthday or when the account was created.
In the ChatGPT DLP, we rely on the email from OpenAI linking the user to their export to verify the authenticity of the data.
User requests a data export of their ChatGPT data.
Once they receive the "Your export is ready" email, they download the zip file and copy the download link from the email.
On gptdatadao.org, along with uploading their zip file, they are asked to provide the download link. Both are encrypted such that only a DLP validator can see them.
The DLP validator receives the encrypted file and download link. They download and decrypt the file from the user's storage, as well as the one provided in the link. They calculate a checksum of both files and ensure they match, ensuring the zip that's uploaded to the user's storage has not been tampered with.
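The checksum comparison itself is straightforward; a minimal sketch, assuming the plaintext bytes of both archives have already been downloaded and decrypted:

```python
# Sketch of the validator's tamper check: hash both copies of the
# export and compare. Download and decryption steps are elided.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def matches(uploaded_zip: bytes, zip_from_openai_link: bytes) -> bool:
    # Identical checksums imply the uploaded archive is byte-for-byte
    # the export OpenAI produced, i.e. it has not been tampered with.
    return checksum(uploaded_zip) == checksum(zip_from_openai_link)
```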
The ownership check aims to prove that the data contributor indeed owns the data they are submitting. The attack vector this prevents is a data contributor contributing someone else's data.
Specifically for the ChatGPT DLP, ownership is covered by the authenticity check, because it is difficult to fake a unique link to download a ChatGPT export.
The quality check aims to prove that the data submitted is of high quality. If a data contributor submits a data export for a newly created account, the data will still be authentic and rightfully owned by the contributor, however, it is probably not very useful.
We leverage an LLM and sample conversations to determine the quality of the data.
When data is submitted, the validator takes a few randomly sampled conversations and sends them to an LLM (OpenAI in this case), prompting it to determine the coherence and relevance of each conversation and score it from 0-100.
The scores from different conversations are then averaged, giving an idea of the quality of the data.
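A minimal sketch of this sampling-and-averaging step, assuming the OpenAI chat completions API and a hypothetical prompt (the exact model and prompt used by validators are not specified here):

```python
# Sketch of the sampling-and-scoring step. Assumes the OpenAI chat
# completions API; the prompt, model, and sample size are
# illustrative placeholders.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_conversation(conversation_text: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Rate the coherence and relevance of the "
                "following conversation from 0 to 100. "
                "Reply with the number only.",
            },
            {"role": "user", "content": conversation_text},
        ],
    )
    return int(response.choices[0].message.content.strip())

def quality_score(conversations: list[str], sample_size: int = 5) -> float:
    # Randomly sample a few conversations and average their scores.
    sample = random.sample(conversations, min(sample_size, len(conversations)))
    return sum(score_conversation(c) for c in sample) / len(sample)
```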
The uniqueness check aims to prove that the data submitted is unique. Similar to the authenticity check, this proof aims to thwart malicious data contributors who may submit the same data multiple times to the DLP.
We implement a model influence function that fingerprints a data point and compares it to other data points on the network.
The validator calculates a feature vector of the zip file by first getting a deterministic string representation of the file and converting it to a feature vector. This is the fingerprint of that data point. If a slightly altered file is run through this same process, it will produce a very similar fingerprint, unlike a hash, which will differ vastly even if a single bit of the underlying data changes.
The validator then records this fingerprint on-chain so other validators are aware of the fingerprints of other data points in the network. Each validator builds a local vector store of all existing data points.
After calculating the fingerprint, the validator inserts it into the local vector store and checks how similar it is to other fingerprints in the store. If it is too similar, the data point is rejected.
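A toy illustration of the fingerprint-and-compare flow, using hashed character n-grams as the feature vector and cosine similarity for the comparison; the production feature extraction is more sophisticated, so only the shape of the check is shown:

```python
# Toy illustration of fingerprint-based uniqueness checking: hash
# character n-grams of a file's string representation into a
# fixed-size vector, so near-duplicate files yield similar vectors.
import hashlib
import numpy as np

DIM = 512

def fingerprint(text: str, n: int = 5) -> np.ndarray:
    vec = np.zeros(DIM)
    for i in range(len(text) - n + 1):
        gram = text[i : i + n].encode()
        # md5 gives a deterministic bucket, unlike Python's salted hash().
        vec[int(hashlib.md5(gram).hexdigest(), 16) % DIM] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def is_unique(candidate: np.ndarray,
              store: list[np.ndarray],
              threshold: float = 0.95) -> bool:
    # Cosine similarity against every known fingerprint; reject the
    # data point if anything in the store is too similar.
    return all(float(candidate @ existing) < threshold for existing in store)
```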
While Proof-of-contribution is different for different DLPs, some ideas outlined here can be applied to other DLPs. By checking authenticity, ownership, quality and uniqueness, the DLP creator can be sure that their data DAO consists of high-quality, meaningful data while preventing attackers who submit low-quality data.
Vana uses a Proof-of-Contribution (PoC) system to validate data submitted to the network. The PoC system functions to ensure the integrity and quality of data within Data Liquidity Pools (DLPs). Everyone's data is different, so to enable data liquidity, data must be mapped to some fungible asset.
Each DLP implements their own proof of contribution function based on their particular dataset. For example, r/datadao measured contributions based on amount of karma, and included an ownership check having users post a code in their reddit profile to confirm ownership. This proof-of-contribution check depends on the goals of the data liquidity pool and the best way to measure data contributions.
The proof-of-contribution function defines success for your data liquidity pool. If you do not want a particular kind of data in your DLP, but it passes or is rewarded by your proof-of-contribution function, then your proof-of-contribution function is not complete.
To validate data submissions, DLP Validators scan through the data transactions and assign a score using the DLP's contribution function. The function takes into account various data characteristics, such as completeness, accuracy, and relevance to the DLP’s purpose.
Each function depends on the constraints imposed by the DLP that receives the data contributions. As such, DLP Validators may impose their own unique functions to incentivize the type and quality of data they collect. This flexibility ensures efficient evaluation of data for each DLP while ensuring that data contributions are accurately evaluated.
Each DLP implements their own proof-of-contribution function based on their particular dataset. As an example, the ChatGPT DLP handles proof of contribution via the four categories below.
This method offers an efficient way to check similarity against all other files in the network. If you'd like to use this in your DLP, see the ChatGPT DLP for an example.
One recommended implementation for DLP proof-of-contribution is to run a model influence function, which measures exactly how much new information a given data point teaches the AI model.
To protect the privacy of data contributions, great care has gone into protecting the user's data. Validators can act as a trusted party and securely run PoC on user data. Read more about how validators protect data in Data Privacy.
The PoC system supports zero-knowledge proofs. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data, as well as its contribution to the DLP, without revealing its full contents. Read more about it in the zero-knowledge proof section below.
A zero-knowledge proof (ZKP) is a cryptographic method by which one party (the prover) can prove to another party (the verifier) that they know a value without conveying any information apart from the fact that they know the value. This means the verifier learns nothing about the value itself, only that the prover knows it.
To protect the privacy of data contributions, a DLP can implement a Proof of Contribution using ZKP. When a Data Contributor or Custodian submits data to the DLP, they generate a zero-knowledge proof that verifies the authenticity and integrity of the data and its contribution to the DLP without revealing its full contents.
To illustrate this example, imagine a DLP for ChatGPT data exports. The DLP considers a data point "valid" if the number of conversations inside the zip file exceeds 50. We can generate cryptographic proof that a file meets this requirement without revealing its contents (or even the exact number of conversations in the file).
To protect against tampering with the proof generation while maintaining privacy and ensuring the data doesn't leave the user's browser unencrypted, the proof is generated in a WebAssembly environment, which is much harder to tamper with than generating proofs in the browser in plain JavaScript.
We provide an example here: https://zk-proof-poc.vercel.vana.com/
The source code is available here: https://github.com/vana-com/zk-proof-poc
This is a draft and should not be relied upon as a legal promise or guarantee for future implementations.
Data Liquidity Pools (DLPs) are critical to Vana, as they incentivize and verify data coming into the network. Our core strategy is to gather the highest quality data, build the best AI models, and monetize them, providing a programmable way to work with user-owned data and build the frontiers of decentralized AI.
For two years following the Token Generation Event (TGE), DLP rewards are 17.5% of the Fully Diluted Valuation (FDV), distributed over twenty-four months:
On average, the top 16 DLPs will earn 0.547% of FDV, with top DLPs earning upwards of 1% of FDV. Vana has a slightly deflationary supply of 120 million tokens, similar to Ethereum.
Top DLPs can earn upwards of 1% of FDV
These rewards are designed to incentivize data liquidity contributions, much like the rewards for early Ethereum miners. We believe that to scale crypto adoption, it's important to invite new participants. Proof-of-stake networks have the downfall of only making the rich richer. Proof-of-work and proof-of-contribution networks are incredibly powerful, as they allow anyone to contribute to the network.
Participating in a data liquidity pool on Vana is like being an early Ethereum miner, and we've structured the rewards similarly.
DLPs control how they distribute rewards to the DLP creator, data token holders, and DLP stakers. This allows new DLPs to break into the top 16 by incentivizing stakers and issuing rewards to their token holders before directly monetizing the data.
Here's an example breakdown for a top DLP earning 1% of FDV in the first year:
The DLP creator chooses the initial rewards split. After launch, the split is decided by DLP token holders, forming a data DAO.
The DLP Rewards apply to the top 16 DLP slots, prioritizing quality over quantity. These DLPs are selected through a staking mechanism where VANA token holders stake their tokens with DLPs they believe will perform well. The top 16 DLPs, ranked by total staked tokens, qualify for rewards, which they share with their stakers.
A detailed breakdown of the DLP Selection process can be found in the DLP Governance section.
Propagators rely on Proof of Stake (PoS) to validate data transactions and maintain network security.
To participate, Propagators must stake a minimum amount of VANA tokens, which are held as collateral to ensure honest behaviour. Propagators are randomly selected to propose and validate blocks, with the selection probability proportional to their staked amount.
This process includes proposing new blocks, having them attested by other propagators, reaching consensus through a supermajority, and ensuring decentralization and fairness. Successful Propagators earn VANA rewards, while those who act maliciously may be penalized.
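A minimal sketch of stake-proportional selection, where a propagator's chance of proposing the next block equals its share of the total stake (real PoS selection uses verifiable randomness, so this is purely illustrative):

```python
# Minimal sketch of stake-proportional proposer selection. Stake
# amounts are hypothetical; real PoS uses verifiable randomness.
import random

stakes = {"propagator_a": 5_000, "propagator_b": 2_000, "propagator_c": 3_000}

def select_proposer(stakes: dict[str, int]) -> str:
    # Probability of selection is proportional to staked amount.
    return random.choices(list(stakes), weights=list(stakes.values()), k=1)[0]

# propagator_a proposes ~50% of blocks, propagator_c ~30%, propagator_b ~20%.
proposer = select_proposer(stakes)
```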
| DLP Rewards | % of FDV | Tokens |
| --- | --- | --- |
| Year 1 | 8.75% | 10.5 million |
| Year 2 | 8.75% | 10.5 million |

| | % of rewards | % of FDV | Tokens | Role in DLP |
| --- | --- | --- | --- | --- |
| DLP Creator | 40% | 0.4% | 480,000 | Implements proof of contribution and a method for data contributors |
| DLP Stakers | 20% | 0.2% | 240,000 | Puts stake behind the DLP based on its data value to Vana |
| DLP Token Holders, including Data Contributors and Validators | 40% | 0.4% | 480,000 | Includes data contributors, validators, and token purchasers |
To interact with the Vana Network, please refer to the instructions on setting up the network, managing hot and cold keys, and setting up a wallet.
In this section, you will find information on:
Instructions on how to set up the network
Using the Block explorer
Obtaining test tokens from the faucet
In this section, you will find details about:
Key pairings
Cold keys
Hot keys
In this section, you will find instructions on how to:
set up a wallet in the browser
set up a local wallet with CLI
This is a draft and should not be relied upon as a legal promise or guarantee for future implementations.
Data Liquidity Pools (DLPs) transform raw data into valuable onchain assets, playing a crucial role in Vana's architecture. Given their central importance, it is logical for DLPs to be at the heart of community governance, allowing VANA holders to directly influence key decisions.
Combined with the DLP Rewards, DLP Governance ensures the sustainability of the ecosystem by aligning governance with contributions that support the network's growth.
The DLP Rewards apply to the top 16 DLP slots, prioritizing quality over quantity. These DLPs are selected through a staking mechanism where VANA token holders stake their tokens with DLPs they believe will perform well. The top 16 DLPs, ranked by total staked tokens, qualify for rewards, which they share with their stakers.
This competitive system ensures only the highest quality data feeds into Vana. Each DLP must demonstrate a robust "proof of contribution" system, validating the value of their collected data. This focus on data quality is crucial for Vana's long-term goal of building a user-owned AI model capable of outperforming advanced systems like GPT-6. Leading DLPs are set out in the DLP Leaderboard.
DLP Staking in the Vana network allows users to stake their VANA tokens in support of their preferred DLPs, directly influencing which DLPs are eligible for block rewards.
Periodic snapshots are taken to ensure that the top 16 DLPs, determined by the amount of VANA staked, continue to reflect the community's choice.
This system ensures that DLP rewards flow to the DLPs that add value for those most invested in network growth (and therefore the largest VANA holders). It also provides clear incentives for DLPs to continuously create value. At the same time, it fosters competition that drives high-quality data by allowing new entrants, with community support, to overtake incumbents.
The parameter of the top 16 slots earning rewards is governable, and if the community votes to increase slots, it can be expanded to allow more DLPs to earn rewards for onboarding data onto the network. Vana is a permissionless network, so any new DLP can join the network, but only the top 16 DLPs by stake earn rewards.
The initial DLP slots at launch are allocated based on performance on Satori Testnet and the following criteria:
Total Transaction Fees (Gas Costs) created by the DLP
Total Number of Unique wallets with Verified Data Uploads
Amount Staked in the DLP
DLPs are encouraged to publicly launch earlier to ensure that they can start driving these metrics sooner.
DLPs are selected and maintained by a system of governance that aligns with Vana’s vision to prioritize community participation.
To incentivize competition and ensure high-quality data within the DLPs, Data Liquidity Rewards are distributed based on weighted, normalized performance across the below metrics. Note that DLP Revenue and Trading Volume are currently set to 0% but will be adjusted as the Network continues to grow.
Total Transaction Fees (Gas Costs) created by the DLP (TFC) - 20%
Total Number of Unique wallets with Verified Data Uploads (VDU) - 20%
Amount Staked (AST) - 60%
DLP Revenue in $VANA (REV) - 0%
2 week avg DLP Trading Volume (DTV) - 0%
The score $S_i$ for each Data Liquidity Pool (DLP) $i$ is calculated by summing the weighted contributions of three metrics: TFC, VDU, and AST. Each metric is normalized by dividing it by the total of that metric across all DLPs, and then multiplied by its respective weight to reflect its importance, as follows:

$$S_i = 0.20 \cdot \frac{TFC_i}{\sum_j TFC_j} + 0.20 \cdot \frac{VDU_i}{\sum_j VDU_j} + 0.60 \cdot \frac{AST_i}{\sum_j AST_j}$$
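The same computation expressed in code, using the weights listed above and hypothetical metric values:

```python
# Score computation for DLP rewards, using the 20/20/60 weights
# listed above. The metric values for each DLP are hypothetical.
WEIGHTS = {"TFC": 0.20, "VDU": 0.20, "AST": 0.60}

dlps = {
    "dlp_1": {"TFC": 120.0, "VDU": 4_000, "AST": 900_000},
    "dlp_2": {"TFC": 80.0, "VDU": 6_000, "AST": 600_000},
}

def dlp_scores(dlps: dict) -> dict[str, float]:
    # Normalize each metric by its total across all DLPs, then apply weights.
    totals = {m: sum(d[m] for d in dlps.values()) for m in WEIGHTS}
    return {
        name: sum(WEIGHTS[m] * d[m] / totals[m] for m in WEIGHTS)
        for name, d in dlps.items()
    }

# Scores across all DLPs sum to 1.0, since each metric is normalized.
scores = dlp_scores(dlps)
```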
The Vana network is a fully EVM-compatible L1 designed for private data. Currently in testnet, it is a testing ground for data liquidity pool creators and data contributors.
Add a new network with these details within your preferred wallet (e.g. MetaMask). The video below demonstrates how to do this.
Moksha is our current testnet, running on Proof-of-Stake consensus. This is the recommended testnet for active development.
Satori is our first testnet, running on Proof-of-Authority consensus. While we don't recommend this testnet for active development, it is available for historical purposes.
A quick video on setting up the Satori Testnet and performing a simple data transaction.
You will also find a button on the bottom left in the block explorer to add the network with one click to your MetaMask. If using mobile, you will need to use the MetaMask browser to see this button.
A block explorer is a web-based tool that allows users to view details about blocks, transactions, addresses, and other activities on a blockchain network. Block explorers provide a user-friendly interface for accessing blockchain data, making it easier to track transactions, monitor network activity, and verify information without needing to run a full node.
To begin testing on Satori, use the faucet to send free tokens to your wallet so you can register a DLP or a validator node. The faucet is rate-limited, but if you need more tokens to test, send your wallet address in discord and the community will help you out.
Our testnet is under active development. You may encounter bugs, performance issues, and other disruptions. Data on the testnet may be wiped without warning. We appreciate your feedback and contributions to help resolve any issues. Please join our Discord for the most up-to-date information.
Please make sure to carefully read the disclaimer before you participate in any activities on Satori Testnet.
Moksha Testnet:

| RPC URL | https://rpc.moksha.vana.org |
| Chain ID | 14800 |
| Network name | Vana Moksha Testnet |
| Currency | VANA |
| Block Explorer | https://moksha.vanascan.io |

Satori Testnet:

| RPC URL | https://rpc.satori.vana.org |
| Chain ID | 14801 |
| Network name | Vana Satori Testnet |
| Currency | VANA |
| Block Explorer | https://satori.vanascan.io |
This is an overview of how to create a Vana wallet, and associated keys. A Vana wallet holds the core ownership of assets on the Vana network, acting as the identity for all operations.
The Vana network is EVM-compatible and supports Ethereum-compatible addresses. Any wallet that supports EVM chains can be used to create a wallet that can send and receive VANA, including hardware wallets. Some recommended wallet applications are MetaMask, Rabby, and Trust Wallet.
Network operators like DLP validators can use the CLI tool that comes with the Vana framework to manage their wallets.
The Vana framework supports wallets that each contain:
A coldkey: an address representing the owner of a service running in the network.
A hotkey: an address representing the service running in the network.
Coldkeys are secure keys stored encrypted offline, used for critical or infrequent transactions. A hotkey allows a validator to call a DLP smart contract and must be loaded into the live service environment.
Each of these is a pair of separate cryptographic keys. A coldkey has a private key and a public key, as does a hotkey.
The coldkey is synonymous with the wallet name. For example, the `--wallet.name` option in a `vanacli` command accepts the coldkey as its value, while `--wallet.hotkey` accepts the hotkey. One coldkey can have multiple hotkeys.
Storage: Holds VANA.
Delegation: For delegating and undelegating VANA.
DLP Creation: Used for creating a DLP.
Security: Provides the highest level of security; encrypted at rest.
You can create multiple hotkeys paired with a single coldkey. In a DLP, you are identified by your hotkey, keeping your coldkey secure. The same hotkey cannot be used for two nodes in the same DLP but can be used in different DLPs.
Transactions: Signing transactions.
Operations: Registering and running DLP nodes.
Delegation: VANA holders can delegate their VANA to a validator’s hotkey.
Security: Less secure, generally unencrypted, used for regular operational tasks.
Create a local wallet using the `vanacli` command line tool on your computer, so it can be used to create or participate in a DLP.
Keep your mnemonic safe
When a wallet is created, a mnemonic is generated that can be used to recover your wallet. Anyone who knows the mnemonic for your wallet account can access your VANA tokens, so you must always keep this mnemonic in a safe and secure place, known only to you. Importantly, if you lose access to your wallet, you can use its mnemonic (stored away in safekeeping) to restore the wallet.
To create a wallet using the CLI:
Clone the vana-framework repository and follow the steps in Getting Started to install the CLI tool. Use the `wallet create` command to start the process of creating a wallet.
You will be prompted to enter the wallet name (aka coldkey name), hotkey name, and a password to encrypt your wallet with.
Local wallets are stored on your machine under `~/.vana/wallets`.
The Vana network is EVM-compatible and supports Ethereum-compatible addresses. A Vana wallet holds the core ownership of assets on the Vana network, acting as the identity for all operations. Your wallet is also used to derive different encryption keys to secure your data.
Network participants like DLP validators can use the CLI tool that comes with the Vana framework to manage their wallets.
This guide explains how to work with Vana wallet keys. For instructions on creating a Vana wallet, see Creating a Wallet.
A Vana wallet consists of a coldkey and a hotkey, used for different operations in the Vana ecosystem.
Each key is a pair of separate cryptographic keys. A coldkey has a private key and a public key, as does a hotkey.
The coldkey is synonymous with the wallet name. For example, the `--wallet.name` option in a `vanacli` command accepts the coldkey as its value, while `--wallet.hotkey` accepts the hotkey. One coldkey can have multiple hotkeys.
Storage: Holds VANA.
Delegation: For delegating and undelegating VANA.
DLP Creation: Used for creating a DLP.
Security: Provides the highest level of security; always encrypted.
You can create multiple hotkeys paired with a single coldkey. In a DLP, you are identified by your hotkey, keeping your coldkey secure. The same hotkey cannot be used for two nodes in the same DLP but can be used in different DLPs.
Transactions: Signing transactions.
Operations: Registering and running DLP nodes.
Delegation: VANA holders can delegate their VANA to a validator’s hotkey.
Coldkey: Highly secure and always encrypted, used for storing and managing VANA securely.
Hotkey: Less secure, generally unencrypted, used for regular operational tasks.
The Vana network relies on several key smart contracts to facilitate data liquidity.
The data registry contract serves as a central repository for managing all data within the network, functioning as a comprehensive file catalog. It allows users to add new files to the system, with each file receiving a unique identifier for future reference.
The contract manages access control for these files, enabling file owners to grant specific addresses permission to access their files. It also handles the storage of file metadata, including any offchain proofs or attestations related to file validation, which can include various metrics such as authenticity, ownership, and quality scores. Users can retrieve detailed information about any file in the registry using its unique identifier, including its permissions and associated proofs.
The TEE Pool contract manages and coordinates the TEE Validators and serves as an escrow for holding fees associated with validation tasks. Users pay a fee to submit data for validation, and the contract ensures that the validators process the data and provide proof of validation. The contract also allows the owner to add or remove validators, and it securely holds and disburses the fees related to these validation services.
The DLP Root contract manages the registration and reward distribution for Data Liquidity Pools (DLPs) in the Vana ecosystem. It operates on an epoch-based system, where the top 16 most staked DLPs and their stakers receive rewards at the end of each epoch. The contract allows users to stake VANA tokens as guarantors for DLPs, with rewards distributed based on the staking position at the beginning of each epoch.
To prevent exploitation, the contract implements a minimum staking period and requires stakers to claim their rewards manually. DLP owners can set custom reward percentages to attract more stakers, potentially securing a position in the top 16. The system also allows for multi-DLP staking and requires an initial minimum stake from DLP owners for registration.
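To illustrate the epoch mechanics, here is a hedged sketch that ranks DLPs by stake, keeps the top 16, and splits a hypothetical epoch reward in proportion to stake. The live network weights several metrics when distributing rewards (see DLP Rewards), so this is a simplification:

```python
# Illustrative epoch logic for the DLP Root contract: rank DLPs by
# stake, keep the top 16, and split a hypothetical epoch reward pro
# rata. The real distribution weights multiple metrics and also
# routes shares to each DLP's stakers.
EPOCH_REWARD = 1_000.0  # hypothetical VANA per epoch
TOP_SLOTS = 16

def epoch_rewards(stakes_by_dlp: dict[str, float]) -> dict[str, float]:
    top = sorted(stakes_by_dlp.items(), key=lambda kv: kv[1], reverse=True)
    top = top[:TOP_SLOTS]
    total_top_stake = sum(stake for _, stake in top)
    return {dlp: EPOCH_REWARD * stake / total_top_stake for dlp, stake in top}
```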
The Vana network utilizes independent Trusted Execution Environment (TEE) validators to perform data validation requests for DLPs, simplifying their architecture by eliminating the need for DLPs to operate their own validator nodes. We refer to these highly confidential compute nodes as the Satya Validators.
This setup ensures that neither the node operators nor any other parties can access the interior of the processor or the data being processed. The TEE environment guarantees that deployed programs within it are immutable and private, maintaining operational integrity and preventing unauthorized changes while processing user data.
Verify user data by running Proof-of-Contribution docker images to assess data legitimacy and value.
Attest to the validity of the data and write the attestation back on-chain.
Claim rewards for completing a validation job.
Satya Validators are a work in progress. Code samples and technical documentation will be added soon.
Creating a DLP is the best way for highly motivated builders to get involved and rewarded. It takes a few hours to deploy a template, and 1-2 days to modify the template for your chosen data source.
The Data Liquidity Layer is where data is contributed, validated, and recorded to the network into data liquidity pools (DLPs). Here, DLP creators deploy DLP smart contracts with specific data contribution objectives, including the DLP’s purpose, validation method, and data contribution parameters.
The top DLPs earn data liquidity pool rewards, so DLP slots are competitive. There are 16 slots available on mainnet. This limit is intended to incentivize quality over quantity.
A data liquidity pool consists of the following components:
An optional UI for data contributors to add data to a liquidity pool.
Validators, which use the `vana` framework to interact with all the components of the Vana network. A starting point can be found here:
Data value and validation depends on the data source. DLPs have the flexibility to validate data according to their unique requirements. Regardless of the specific verification methods employed, a DLP should strive to ensure that each data point passes the following criteria, which is implemented through a proof-of-contribution function:
Meaningfulness: The data should contribute novel and valuable insights to the DLP's overall dataset. It should enhance the richness and diversity of the available information.
Validity: The data must adhere to a schema that is consistent with the existing data within the DLP. Conforming to a standardized structure ensures compatibility and facilitates efficient data processing.
Authenticity and Contributor's rights: The data must be genuine and, if required, originate from a legitimate account. This ensures the integrity and reliability of the information. The data contributor submitting the data must have the necessary rights and permissions to do so.
This proof-of-contribution function checks data validity and maps the diverse data submissions to a score that can be used to reward fungible DLP-specific tokens.
A DLP smart contract is responsible for rewarding data contributors with DLP-specific tokens for their data. After proof-of-contribution is run, the file is given a score from 0-1, which can be used to determine how valuable the data is and to convert that value into DLP-specific tokens (example: $TDAT for Twitter data).
The template below can be used, swapping out the DLP token to something specific to your DLP (example: $TDAT for Twitter data).
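As an illustration of the score-to-token mapping, here is a hedged sketch using a linear reward curve; the constant and the curve itself are design choices for each DLP, not part of the template:

```python
# Hypothetical reward mapping: scale a 0-1 proof-of-contribution
# score into DLP-specific token units. A linear curve is shown for
# simplicity; each DLP chooses its own curve and constants.
TOKENS_PER_FULL_SCORE = 100  # e.g. 100 $TDAT for a perfect score

def reward_for_file(poc_score: float) -> float:
    if not 0.0 <= poc_score <= 1.0:
        raise ValueError("proof-of-contribution score must be in [0, 1]")
    return poc_score * TOKENS_PER_FULL_SCORE

assert reward_for_file(0.85) == 85.0
```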
Some DLPs use a UI for data contributors to upload their data to a DLP. It's up to the DLP to decide what is best for the DLP's contributors to be able to write data transactions onchain. A starting point for a UI, which implements client-side encryption, is here:
Other DLPs are using Chrome extensions or scrapers for users to contribute data.
Now that you're familiar with the components of a DLP, here's a conceptual step-by-step guide to launch a DLP of your own.
Choose a valuable data source.
Current community requests: Stack Overflow, Telegram, YouTube, email, Google Drive
Tell users how to access that data source.
For example, through a GDPR or CCPA data export request, through scraping their own data, or through an API.
Optionally, build a UI that makes it easy for data contributors to add their data via your smart contract.
Build a UI, CLI tool, browser extension, mobile app, etc to allow data contributors to upload data to the network.
Data validation and value depends on the data source.
Once decided, implement your incentives and validation checks.
We recommend rolling this out iteratively in phases to test the incentives.
Deploy your DLP smart contract and DLP Token contract.
Congrats, you're up and running!
To qualify for block rewards, get voted into the top 16 DLP slots based on data value and participation.
Vana Testnet is for Testing Purposes Only
Ensure you create a different configuration for data validation on testnet versus mainnet. Data privacy cannot be guaranteed in testnet.
Once a DLP Smart Contract is deployed to the Vana network, reach out to us and we will register this contract as a DLP. We will be manually approving DLP registration at the beginning of the testnet before moving to voting-based approval.
Each Satya validator node is a sealed, encrypted virtual machine running on an independently operated server with a bare-metal install of an advanced hardware-level isolation technology.
A detailed data flow diagram of how TEE Validators operate is available in our documentation.
To register your DLP or provide updates to be added to the leaderboard, submit a request.
To see other DLPs, view the DLP Leaderboard.
Join our Discord server to present your DLP and chat with other devs creating DLPs.
A DLP smart contract that issues rewards and communicates back to the root network.
A proof-of-contribution function to measure data quality. This can be run by the validators.
A validator runs a proof-of-contribution function that measures data quality. There are two kinds of validators that DLPs can use: TEE Validators (recommended) and DLP Validators.
Vana framework with core components:
See the ChatGPT DLP for a reference implementation.
If using Satya validators, use this as a starting point.
The Smart Contract repo contains a DLP token template: an ERC20 token with additional features such as minting permissions, an admin role, and a blocklist for addresses. It leverages OpenZeppelin's ERC20, ERC20Permit, ERC20Votes, and Ownable modules for extended functionality.
Generic UI for uploading data to DLPs:
GPT data DAO:
Reddit data DAO:
Follow the tutorial for a step-by-step guide on creating a DLP.
To see what others are building, visit the DLP Leaderboard.
Implement a proof-of-contribution function to validate and value data.
Register your DLP with the root network. This requires meeting a minimum staking threshold.
A DLP smart contract can accept new nodes into the DLP. To register a new node, create a wallet for it and call the registration function in the DLP smart contract. More information can be found in the smart contract repository.
In the Vana network, data is stored encrypted off-chain in a storage solution of the DLP's choice, providing flexibility and control over their data. This approach allows data contributors to utilize familiar platforms such as Dropbox, Google Drive, or decentralized options like IPFS.
Vana's system only requires two key pieces of information: a URL pointing to the data's location and an optional identifier that changes when the data is modified (e.g., an ETag or last-modified date). This ensures data at a particular location has not changed since it was uploaded there.
By keeping data off-chain but accessible through these identifiers, Vana maintains a balance between data privacy, user control, and cost efficiency.
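A small sketch of the change-detection idea, recording the ETag at upload time and comparing it later via a standard HTTP HEAD request:

```python
# Sketch of change detection via the standard HTTP ETag header:
# record the ETag at upload time, then compare against the current
# ETag to detect modification at the storage location.
import requests

def current_etag(url: str) -> str | None:
    return requests.head(url, allow_redirects=True).headers.get("ETag")

def unchanged_since_upload(url: str, recorded_etag: str) -> bool:
    return current_etag(url) == recorded_etag
```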
To make data discoverable in the Vana network, it must be written onchain using the Data Registry contract. The data contributor first uploads an encrypted file to a storage provider of their choice, then writes a pointer to that file (the URL) and an optional content integrity hash to the registry.
To add data to the Vana Network:
Generate a signature, the encryption_key, by asking the data contributor to sign a message, the encryption_seed.
Symmetrically encrypt the data using the encryption_key. Code samples are available in Data Privacy.
Upload the encrypted data to a location of your choice. This can be a Web2 storage solution like Google Cloud or Dropbox, or a Web3 solution like IPFS.
Get the storage URL of the uploaded file.
Add the file to the data registry contract by calling addFile(encrypted_data_url); a sketch of this call follows the list.
The data registry returns a file_id, which can be used later to look up the file.
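To make the last two steps concrete, here is a minimal web3.py sketch. The RPC URL, registry address, ABI fragment, and the exact addFile signature are assumptions for illustration; use the deployed Data Registry's actual values.

```python
from web3 import Web3

# Minimal sketch of registering an uploaded file with the Data Registry.
# The RPC URL, registry address, and ABI fragment below are placeholders.
w3 = Web3(Web3.HTTPProvider("https://rpc.example.org"))
account = w3.eth.account.create()  # use the contributor's actual wallet in practice

registry = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder address
    abi=[{
        "name": "addFile",
        "type": "function",
        "stateMutability": "nonpayable",
        "inputs": [{"name": "url", "type": "string"}],
        "outputs": [{"name": "fileId", "type": "uint256"}],
    }],
)

encrypted_data_url = "https://storage.example.org/my-encrypted-file"
tx = registry.functions.addFile(encrypted_data_url).build_transaction({
    "from": account.address,
    "nonce": w3.eth.get_transaction_count(account.address),
})
signed = account.sign_transaction(tx)
tx_hash = w3.eth.send_raw_transaction(signed.rawTransaction)
# The returned file_id can be read from the transaction receipt's event logs.
```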
DLPs may choose to rely on a network of DLP Validators to run their DLP's proof-of-contribution (PoC). After running PoC, these validators form a consensus with each other and write the proof-of-contribution assessment back on-chain. In this model, DLPs are responsible for deploying and maintaining their validators. DLP Validators earn DLP token rewards for accurate and consistent data evaluations.
Verify user data within DLPs according to standards set by DLP owners.
Use Vana’s Proof-of-Contribution system to assess data legitimacy and value.
Attest to the validity of the data and write the attestation back on-chain
Participate in the Nagoya Consensus to ensure consistent and accurate data scoring.
Perform accurate and consistent data evaluations and disincentivize bad actors through slashing.
Evaluate the performance of other validators to maintain network integrity.
Back evaluations with personal stake to ensure accuracy and reliability.
Respond to queries from data consumers, including decrypting data and validating results.
Each DLP owner is responsible for deploying a smart contract specific to the DLP's needs. We provide contract templates that offer a starting point for registering DLP validators, recording and verifying data transactions written on-chain, and reaching agreement through Nagoya Consensus. We also provide a template implementation for the corresponding validators.
The provided templates include:
A smart contract: https://github.com/vana-com/vana-dlp-smart-contracts
A sample validator node (transacts with contract): https://github.com/vana-com/vana-dlp-hotdog
The Vana framework (used by validators): https://github.com/vana-com/vana-framework
The Vana Framework is a library designed to streamline the process of building a DLP.
The framework provides an object that encapsulates interactions with the blockchain.
The state contains information about the current state of the DLP, including the nodes in the network (and how they can be reached), node scores, the last sync time, the current block number, and more.
The Vana framework provides a node abstraction that simplifies the creation and management of a peer-to-peer network that operates the DLP.
A node is a network participant responsible for validating, querying, scoring, or performing any arbitrary task necessary for the DLP to perform proof-of-contribution. A node can be a validator tasked with ensuring a data point belongs to the data contributor and is not fraudulent. A node can also be a miner responsible for aggregating data points to respond to a data query. A DLP is responsible for defining who the DLP participants are, and how they're incentivized for good behavior and penalized for bad.
Nodes can communicate with each other by encapsulating information in a Message object, and sending that object back and forth using a client-server relationship over HTTP.
A NodeClient is responsible for building the inputs of a Message object, and sending it to one or more NodeServers.
The NodeServer runs a FastAPI server that is responsible for responding to API requests sent from a NodeClient. It will perform a task, then fill the outputs of the Message object and send it back to the NodeClient that requested it.
The Message object is sent back and forth between nodes, serving as the vehicle for communication. It wraps the inputs and outputs of an exchange between nodes.
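To make the pattern concrete, here is a hypothetical sketch of the exchange. The class names, route, and payload shape are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass, field

import requests
from fastapi import FastAPI

@dataclass
class Message:
    # The NodeClient fills inputs; the NodeServer fills outputs.
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)

class NodeClient:
    def query(self, server_url: str, message: Message) -> Message:
        # Send the message inputs to a NodeServer over HTTP, collect outputs.
        resp = requests.post(f"{server_url}/message", json={"inputs": message.inputs})
        message.outputs = resp.json().get("outputs", {})
        return message

# A NodeServer responds to requests, performs its task, and fills the outputs.
app = FastAPI()

@app.post("/message")
def handle_message(payload: dict) -> dict:
    # e.g., run proof-of-contribution on the referenced data point
    return {"outputs": {"score": 0.97}}
```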
Writer's note: We had to design a new consensus mechanism to handle the fuzziness of data contributions. For example, if I believe your data deserves a score of 100 and another validator believes it deserves 102, we could both be pretty much right; neither of us is acting maliciously or incorrectly. But crypto consensus mechanisms are generally designed for exact consensus only. Bittensor proposed an early version of this fuzzy consensus, which we have modified to work for private data and proof of contribution.
To reach agreement on data contributions and disincentivize malicious validators, the Proof-of-Contribution system employs Nagoya Consensus. In Nagoya Consensus, each DLP validator expresses their perspective on the quality and value of a data contribution as a rating. Validators then score one another based on these ratings, weighted by stake.
Nagoya Consensus rewards validators for producing data contribution scores that are in agreement with the evaluations of other validators. This disincentivizes divergence from the consensus majority while incentivizing validators to converge on honest assessments of data contribution.
By requiring validators to put stake behind their evaluations and rewarding convergence weighted by stake, Nagoya Consensus makes it economically unfavorable for even a significant minority of validators to collude and manipulate the state of the DLP. As long as an honest majority of stake-weighted validators participate, the system can come to consensus on data contribution scores that accurately reflect the quality and value of data in the DLP.
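To illustrate, here is a toy numeric sketch of the stake-weighted convergence idea; it is illustrative only, not the on-chain algorithm.

```python
import numpy as np

# Toy sketch of stake-weighted convergence; illustrative only.
# Four validators score the same data contribution; the last one diverges.
stakes = np.array([100.0, 80.0, 120.0, 50.0])
scores = np.array([100.0, 102.0, 99.0, 10.0])

weights = stakes / stakes.sum()
consensus_score = np.average(scores, weights=weights)

# Reward closeness to the stake-weighted consensus: honest validators whose
# scores roughly agree earn the most; the divergent validator earns little.
closeness = 1.0 / (1.0 + np.abs(scores - consensus_score))
rewards = closeness * weights
rewards /= rewards.sum()

print(f"consensus score: {consensus_score:.1f}")
print("reward shares:", np.round(rewards, 3))
```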
Validate user data for specific Data Liquidity Pools
Please join our Discord to get an overview of all existing DLPs and how to access them.
If a DLP has minimum staking requirements you need to meet, please reach out to the community in our #testnet-help channel for support.
You can run a validator on your own hardware or on a cloud provider like GCP or AWS, ensuring the quality of data in the pool and earning rewards accordingly.
Minimum hardware requirements: 1 CPU, 8GB RAM, 10GB free disk space
See an example integration of a validator here.
The following outlines the high-level steps to run a DLP validator.
Choose the DLP you'd like to run a validator for.
You can run validators in multiple DLPs
Register as a validator with the DLP via its smart contract.
You must meet the minimum staking requirements for the DLP
Wait for your registration request to be approved by the DLP.
Run the validator node specific to the DLP. Confirm that your validator is running correctly by checking its logs; the exact output varies by DLP.
See DLP-specific instructions for running a validator node.
Congratulations, your validator is up and running! You can keep track of your stats and trust score by looking onchain.
Once a DLP validator has been built, it is ready to deploy. DLP nodes form a peer-to-peer network and can be deployed to any infrastructure of your choice (e.g., AWS, Google Cloud Platform, Azure). Ideally, nodes have a static IP so they can reach each other easily. Each node must have a wallet used to register it with the DLP. Once a node is running and registered, it can begin serving proof-of-contribution requests within the DLP.
Clone the DLP smart contract from a template, modify it for the needs of your specific DLP, and deploy it to the network following the readme.
The Data Liquidity Pool smart contract is designed to manage the registration of validators. This contract ensures that validators can participate in maintaining the network's integrity and security while earning rewards for their contributions. To register a new node, create a wallet for it and call the registration function in the DLP smart contract.
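A minimal registration sketch using web3.py might look like the following; registerValidator is a hypothetical function name, since the actual registration function and its arguments depend on your DLP contract.

```python
from web3 import Web3

# Hypothetical node-registration sketch; the function name, ABI, and
# addresses are assumptions. Consult your DLP contract for the real ones.
w3 = Web3(Web3.HTTPProvider("https://rpc.example.org"))  # placeholder RPC

# Create a fresh wallet for the new node.
node_wallet = w3.eth.account.create()

dlp = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder
    abi=[{
        "name": "registerValidator",  # hypothetical registration function
        "type": "function",
        "stateMutability": "nonpayable",
        "inputs": [{"name": "validator", "type": "address"}],
        "outputs": [],
    }],
)

tx = dlp.functions.registerValidator(node_wallet.address).build_transaction({
    "from": node_wallet.address,
    "nonce": w3.eth.get_transaction_count(node_wallet.address),
})
signed = node_wallet.sign_transaction(tx)
w3.eth.send_raw_transaction(signed.rawTransaction)
```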
Validators earn rewards for validating uploaded files. For a given data validation request, each validator scores data based on metrics that are relevant to the data type. The scores are aggregated and written onchain, but how does the DLP decide how to reward its validators?
Every 1800 blocks (~3 hours), a DLP epoch concludes and the DLP contract sends a chunk of rewards to its validators. The precise amount a given validator receives is determined by the Nagoya consensus process.
In Nagoya consensus, each validator submits a score for every other validator to the DLP smart contract. Validators score each other based on the quality of their assessments and their operational performance. For instance, a validator that gives a high score to an uploaded file other validators consider fraudulent or low quality may itself receive low scores from its peers.
Somewhat more formally: a validator's emissions are based on two quantities, rank and consensus, multiplied together. Rank captures the peer's individual valuation by the network; consensus captures the collective agreement on that valuation. Multiplying the two ensures that emissions reflect both the quality of contributions and the degree of communal support for those contributions.
When an epoch concludes, the consensus process converts the most recent validator scores into a distribution for that epoch's emissions, and the rewards are distributed to the validators accordingly. Low-ranking validators quickly realize fewer rewards for their contributions, ensuring that an honest majority of validators is able to out-earn dishonest actors and therefore uphold the integrity of the DLP over time.
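Under assumed shapes, a toy sketch of this rank-times-consensus rule might look like the following; the agreement proxy below is an assumption, not the network's actual formula.

```python
import numpy as np

# Toy sketch of the rank-times-consensus emission rule; illustrative only.
stakes = np.array([100.0, 80.0, 120.0])
weights = stakes / stakes.sum()

# peer_scores[i][j]: the score validator i assigns to validator j
# (validators do not score themselves, so the diagonal is zero).
peer_scores = np.array([
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.7],
    [0.8, 0.9, 0.0],
])

# rank: stake-weighted valuation of each validator by its peers.
rank = weights @ peer_scores

# consensus: a rough proxy for how much peers agree on that valuation
# (1 minus the stake-weighted standard deviation of each column).
consensus = 1.0 - np.sqrt(weights @ (peer_scores - rank) ** 2)

emissions = rank * consensus
emissions /= emissions.sum()  # the epoch's reward distribution
```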
To run PoC in the Satya Network, a DLP builder must implement a simple proof-of-contribution function using this template.
The following flow explains how this PoC template works:
The data contributor requests a validation job, paying a small fee. Once a Satya node is available to run the job, they connect directly to the node and send it the encryption key and the proof-of-contribution docker image that needs to run on the data to validate it.
The Satya node receives the key, downloads the encrypted file, and decrypts it.
The Satya node places the decrypted file in a temporary, shielded* location. The node operator cannot see the contents of this location.
The Satya node downloads and initializes a docker container to run the specified proof-of-contribution, and mounts the input and output volumes. The PoC container will have access to the decrypted file.
The Satya node reads the output and generates the proof.
The Satya node writes the proof onchain and claims the fee as a reward for completing the work.
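To give a feel for what a PoC image itself might do, here is a hypothetical container entrypoint. The mount paths, input format, and output schema are assumptions; the actual template defines its own layout.

```python
import json
import os

# Hypothetical proof-of-contribution entrypoint for a PoC docker image.
INPUT_DIR = "/input"    # assumed: the Satya node mounts the decrypted file here
OUTPUT_DIR = "/output"  # assumed: the node reads attestation fields from here

def run() -> None:
    results = {"valid": False}
    for name in os.listdir(INPUT_DIR):
        with open(os.path.join(INPUT_DIR, name)) as f:
            export = json.load(f)  # assume the DLP expects a JSON data export
        conversations = export.get("conversations", [])
        results = {
            "valid": len(conversations) > 10,
            "quality": min(len(conversations) / 100.0, 1.0),
            "attributes": {"numberOfConversations": len(conversations)},
        }
    with open(os.path.join(OUTPUT_DIR, "results.json"), "w") as f:
        json.dump(results, f)

if __name__ == "__main__":
    run()
```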
More information on integrating with the Satya network for data validation is coming soon.
Data attestations live offchain; a URL to a data point's attestation is written onchain alongside the data itself.
The attestation of a data point must follow a spec. Attestations surface relevant information about how the data was evaluated: proof-of-contribution scores, integrity checksums, and custom metadata relevant to a specific DLP.
An example of when this is useful: consider a ChatGPT DLP that accepts GDPR exports from chatgpt.com. Say the DLP considers an export high quality when it contains more than 10 conversations. This DLP can insert numberOfConversations: xxx in the attestation when Proof of Contribution is run, and anyone can see how valuable that encrypted data point is.
signed_fields: Contains the main data fields that are signed by the prover.
  subject: Information about the datapoint being attested for.
    url: URL where the encrypted file lives.
    owner_address: Wallet address of the file owner.
    decrypted_file_checksum: Checksum of the decrypted file for integrity verification.
    encrypted_file_checksum: Checksum of the encrypted file for integrity verification.
    encryption_seed: The message that was signed by the owner to retrieve the encryption key.
  prover: Information about the prover.
    type: Type of the prover. A satya prover is one of the confidential TEE nodes in the Satya network; proofs can also be self-signed, where the data owner generates the proof.
    address: Wallet address of the prover.
    url: URL or address where the prover service is hosted.
  proof: Details about the generated proof.
    image_url: Docker image URL from which the proof-generation instructions are downloaded.
    created_at: Timestamp of when the proof was created.
    duration: Duration of the proof generation process, in seconds.
    valid: Boolean indicating whether the subject is valid.
    score: Overall score of the subject, from 0-1.
    authenticity: Authenticity score of the subject, from 0-1.
    ownership: Ownership score of the subject, from 0-1.
    quality: Quality score of the subject, from 0-1.
    uniqueness: Uniqueness score of the subject, from 0-1.
    attributes: Additional key/value pairs that will be available on the public proof. These can be used to quickly view properties of the encrypted subject.
signature: Generated by the prover signing a stringified representation of signed_fields, sorted by key name. To verify it, take the signature and the stringified representation and recover the address that signed it, which should match prover.address.
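Putting the spec together, here is a minimal hypothetical attestation and a verification sketch using eth_account; all values are examples, and the exact serialization the prover uses is an assumption here.

```python
import json
from eth_account import Account
from eth_account.messages import encode_defunct

# Hypothetical attestation shaped like the spec above; all values are examples.
attestation = {
    "signed_fields": {
        "subject": {
            "url": "https://storage.example.org/encrypted-file",
            "owner_address": "0x0000000000000000000000000000000000000001",
            "decrypted_file_checksum": "abc123",
            "encrypted_file_checksum": "def456",
            "encryption_seed": "Please sign to retrieve your encryption key",
        },
        "prover": {
            "type": "satya",
            "address": "0x0000000000000000000000000000000000000002",
            "url": "https://prover.example.org",
        },
        "proof": {
            "image_url": "docker.example.org/poc:latest",
            "created_at": 1717000000,
            "duration": 12.5,
            "valid": True,
            "score": 0.83,
            "authenticity": 0.9,
            "ownership": 1.0,
            "quality": 0.7,
            "uniqueness": 0.72,
            "attributes": {"numberOfConversations": 42},
        },
    },
    "signature": "0x00",  # placeholder; a real attestation carries the prover's signature
}

def verify(att: dict) -> bool:
    # Stringify signed_fields sorted by key name, as described above,
    # then recover the signer and compare against prover.address.
    message = json.dumps(att["signed_fields"], sort_keys=True)
    recovered = Account.recover_message(
        encode_defunct(text=message), signature=att["signature"]
    )
    return recovered == att["signed_fields"]["prover"]["address"]
```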
The Vana network strives to ensure personal data remains private and is only shared with trusted parties.
Vana uses a patented non-custodial encryption technique to encrypt personal data. Data does not leave the user's browser unencrypted. A user's file is symmetrically encrypted with their encryption key, and if the encryption key is shared with another party, the key is encrypted with that party's public key so only the intended recipient can decrypt the key and the data.
The steps are as follows:
The user uploads a decrypted file (F).
They are prompted to sign a fixed message (the encryption seed) with their wallet, creating a unique signature that can only be recreated by signing that same message with that same wallet.
The generated signature is used as the encryption key (EK) to encrypt the file F using a symmetric encryption technique, creating an encrypted file EF.
The encryption key EK is then encrypted with the trusted party's public key, making an encrypted encryption key (EEK).
The encrypted file EF and the encrypted encryption key EEK can be safely shared with the intended recipient.
Once the data has been encrypted, it can be decrypted by either a dApp or trusted party.
The dApp prompts the user to sign the same fixed message, retrieving the same EK as above.
The EK is used to decrypt the encrypted file EF and retrieve F.
The trusted party receives the encrypted file EF and the encrypted encryption key EEK.
They decrypt the EEK using their private key, retrieving the encryption key EK.
The EK is used to decrypt the encrypted file EF and retrieve F.
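A minimal sketch of this flow, assuming eth_account for signing and Fernet for the symmetric cipher; the production browser implementation may use different primitives and a different key-derivation step.

```python
import base64
import hashlib

from cryptography.fernet import Fernet
from eth_account import Account
from eth_account.messages import encode_defunct

# Sketch of the flow above; the seed text and key derivation are assumptions.
encryption_seed = "Please sign to retrieve your encryption key"

wallet = Account.create()  # stand-in for the user's wallet

# Signing the fixed seed yields a deterministic signature (eth_account uses
# RFC 6979, so the same wallet plus message always reproduces the signature).
signature = wallet.sign_message(encode_defunct(text=encryption_seed)).signature

# EK: derive the symmetric key from the signature (Fernet expects a
# 32-byte urlsafe base64 key, so we hash and encode here).
ek = base64.urlsafe_b64encode(hashlib.sha256(signature).digest())

# EF: encrypt the file F with EK.
f_bytes = b"example file contents"
ef = Fernet(ek).encrypt(f_bytes)

# Decryption path: re-sign the same seed to recover EK, then decrypt EF.
assert Fernet(ek).decrypt(ef) == f_bytes
```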
Data DAOs can use the encryption technique described above to efficiently encrypt large files (up to several gigabytes in size) in the browser.
Vana validates data submitted to the network through proof-of-contribution. "Valid" means something different in each DLP, because different DLPs value data differently.
The recommended way of validating data securely in the Vana Network is the Satya Network, a group of highly confidential nodes that run on special hardware. At a high level, the data contributor adds unverified data and requests a proof-of-contribution job from the Satya Network, paying a small fee to have their data validated. Once validated, the Satya validator writes the proof onchain.
The PoC template works as follows:
The data contributor adds their encrypted data onchain via the Data Registry contract.
The PoC container runs its validations on the decrypted data and outputs the attestation. More information can be found in the Data Attestation section.
* A shielded container is a specialized type of container that leverages the Gramine library OS to run applications in a secure, isolated environment, typically utilizing hardware-based trusted execution environments (TEEs) like Intel SGX.
Anyone can submit data to the Vana network. However, for data to be considered valid by a DLP, it must be attested to by a trusted party, i.e., a validator in the DLP. These trusted parties issue an attestation to prove that the data is authentic, high-quality, unique, and has whatever other properties the DLP values in its data contributions.
dlp_id: DLP ID from the DLP registry; used to tie the proof to a DLP.
If the encryption key EK needs to be shared with a trusted party, it can be encrypted with their public key. To generate a new public/private keypair, see the example below.
The trusted party can now receive and decrypt the EEK, recovering EK, which can then be used to decrypt the user file F. A Python sketch of this flow is shown below.
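This sketch assumes an RSA keypair and OAEP padding via the cryptography library; the network's actual asymmetric scheme may differ.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Sketch of wrapping and unwrapping EK for a trusted party; RSA-OAEP is an
# assumption here, used only to show the shape of the exchange.

# The trusted party generates a keypair and publishes the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Stand-in for the signature-derived EK from the previous sketch.
ek = Fernet.generate_key()

# EEK: the data owner encrypts EK with the trusted party's public key.
eek = public_key.encrypt(ek, oaep)

# The trusted party decrypts EEK with their private key to recover EK,
# which can then decrypt the user's encrypted file EF.
recovered_ek = private_key.decrypt(eek, oaep)
assert recovered_ek == ek
```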
Definitions of all relevant terms within the Vana ecosystem
A network overlay validating real-time data transactions within Vana’s ecosystem, ensuring cross-DLP data access and interoperability with other networks.
$DAT or Data Autonomy Token is now $VANA, the native token of the Vana ecosystem.
An entity that queries DLPs to access high-quality datasets for purposes such as AI model training, paying with pool tokens that are burned upon transaction.
A user who submits data to DLPs, adhering to set standards, and receives rewards for valuable data contributions.
An entity that hosts and facilitates data contributions, ensuring compliance with regulations and providing encryption to protect data privacy.
The layer where data is contributed, validated, and recorded into data liquidity pools (DLPs), facilitating data transactions.
An aggregation of similar data assets within the Vana network, operating as a peer-to-peer network with validators ensuring data integrity and quality.
Submissions to the Vana network including metadata and encrypted storage locations, ensuring contributors retain control over their data.
Contracts deployed within DLPs that handle the validation and consensus process for data contributions.
A function measuring the impact of a data contribution on a machine learning model's performance, used within the Proof of Contribution system.
A mechanism incentivizing validators to agree on data quality and value within a DLP, rewarding those whose evaluations align with others.
A feature allowing users to move their data and models across applications without interference, ensuring privacy and censorship resistance.
A user-operated server storing private data off-chain, enabling secure local training of models without exposing raw data.
A system validating the integrity and quality of data within DLPs using validators who evaluate and score data contributions.
A network participant responsible for writing data transactions to the blockchain, maintaining network consensus, and earning fees for their work.
The governance layer overseeing DLPs, distributing rewards, and ensuring network compliance.
The current information about the DLP, including node status, scores, and block numbers.
A collaborative model governed by data-contributing users, where each user trains part of the model on their personal server and is rewarded based on their contribution.
A participant who verifies data authenticity and quality within a DLP, rewarded for accurate evaluations and governed by the Nagoya Consensus mechanism.
A decentralized data liquidity network enabling users to own and monetize their data through a trustless open data economy.
The native token of the Vana Network is called $VANA. The token is used to pay for transaction fees (so-called gas fees) and is issued through the creation of new blocks.
$VANA is not available on the Satori Testnet. Instead, Vana Points are used to fuel network activity. VANA POINTS ON TESTNET HOLD NO VALUE, ARE NOT CONVERTIBLE TO $VANA, AND ARE NOT INDICATIVE OF A FUTURE AIRDROP.
Existing crypto infrastructure has enabled digital ownership for assets like art and collectibles, but private data poses unique challenges:
Data is non-excludable and can be copied once made public (the "data double spend problem").
Data is non-fungible but must be aggregated to be valuable as AI training data.
Vana addresses these challenges through:
Non-custodial data allows users to use their data in an application or to train an AI model while keeping it fully under their control.
Proof-of-contribution allows groups of users to pool their data while ensuring everyone is rewarded fairly. This enables data liquidity.
Data ownership unlocks better AI by allowing access to new training datasets. Proper attribution and incentives enable frontier AI models collectively owned and governed by contributors, trained on datasets that would otherwise be locked in walled gardens. Data portability levels the playing field by allowing builders to access cross-platform data.
Previous data ownership projects have taken too ideological an approach and remained academic or esoteric. Vana believes in a pragmatic, full-stack approach to building infrastructure that gets real adoption. We started with a data portability API, helped build viral user-owned data apps that onboarded over 1M users, and created the infrastructure for the world's first Data DAO, which attracted over 140k participants in under a week.
A data DAO is a specific form of a data liquidity pool that uses a dataset-specific token for governance. Some data liquidity pools may not create a new token and instead use a stablecoin or existing token for payments to data contributors.
Earning a slot on Vana's mainnet for DLPs is designed to be competitive. Factors affecting competitiveness include data quality, community support, and testnet performance metrics. DLPs are elected by native token holders on mainnet.
Building a DLP can be rewarding, with benefits such as block rewards, priority access to mainnet slots, fundraising support, and the ability to bootstrap AI projects using token incentives.
Yes, you can develop and test your DLP on the testnet without a live token before moving to token integration and mainnet deployment.
DLPs will be evaluated based on performance metrics such as total transactions, transaction fees, verified data uploads, and unique wallet interactions. On mainnet, it's up to the native token holders to vote.
Vana is an L1 to ensure users maintain data privacy and control. Existing L2 sequencer approaches are too centralized and would be subject to data regulations that would not be possible to fulfill in a blockchain context.
Vana is the first decentralized network designed for private data. Data is encrypted with user-controlled keys, and access is granted to run operations in a secure environment, including from the user's device.
AI researchers can train models on data without seeing the underlying data, through a secure compute environment and, eventually, distributed training. This "data renting" model prevents data copying while keeping data under the owner's control.