The Data Portability Layer, or Application Layer, is an open data playground for Data Contributors and developers to collaborate and build applications with the data liquidity amassed by DLPs. As the Data Liquidity Layer verifiably brings data onchain, the Application Layer provides the infrastructure for the distributed training of user-owned foundation models and the development of novel AI dApps.
The Application Layer functions as an active data hub, where online communities can collaborate with developers to create real economic value from their data. This fosters an interactive data creation ecosystem, where data contributors benefit from the downstream network effects and value emerging from the intelligence that their data helps to create.
Vana makes data non-custodial by running operations in a personal server
Vana uses a personal server architecture to allow users and data collectives to store private information off-chain and keep it non-custodial. You may be familiar with this architecture from the Solid Project. The basic idea is that you have an off-chain compute environment where you can execute code that works with private data. You can think of it as a safe little home for running code that requires your private data, one that follows the data permission rules you have laid out.
There is no expectation that every user runs their own personal server - it is simply a secure environment where you can work with private data. It can be secure because it runs on your MacBook, or because you trust a third-party provider, similar to Infura for Ethereum. You can also use a user's browser as a lightweight personal server, generating proofs or encrypting data client-side so the data is fully secured as soon as it leaves the device.
Personal servers exist for users and for collectives. For example, I have a personal server that runs on my MacBook. It holds all of my private platform data - my messages, my journal entries, my emails, my writing - and allows me to run personalized inference locally and return the output to a third-party application without the data ever leaving my machine.
Personal servers follow a set of rules set out in a data permissions smart contract. As a simple example, consider a data permissions contract with a whitelist.
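To make the whitelist idea concrete, here is a minimal sketch of the policy such a contract could encode, modeled in plain Python rather than an actual smart contract language. All names here (`DataPermissions`, `grant`, `is_allowed`) are illustrative assumptions, not Vana's real contract interface.

```python
# Illustrative sketch only: a whitelist-style data permissions policy.
# On Vana this logic would live onchain in a smart contract; a personal
# server would consult it before running code against private data.

class DataPermissions:
    def __init__(self, owner: str):
        self.owner = owner
        self.whitelist: set[str] = set()  # apps allowed to use the data

    def grant(self, caller: str, app: str) -> None:
        # Only the data owner may modify the whitelist.
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self.whitelist.add(app)

    def revoke(self, caller: str, app: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the owner can revoke access")
        self.whitelist.discard(app)

    def is_allowed(self, app: str) -> bool:
        # The personal server checks this before serving an application.
        return app in self.whitelist

perms = DataPermissions(owner="alice")
perms.grant("alice", "example.app")
assert perms.is_allowed("example.app")
assert not perms.is_allowed("unknown.app")
```

The personal server enforces the contract's answer locally: an application not on the whitelist never sees the data.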
Vana supports a non-custodial architecture in which users and data collectives can store private information off-chain. Personal servers allow users to store personal data and train models on that data in a secure local environment. Urbit and the Solid Project pioneered similar architectures; Vana uses the pattern specifically for private, personal data storage. To run your personal server, get started here.
Today, we offer two options for a personal server:
Self-host on a MacBook (M1 chip or better required)
Hosted application in partnership with Replicate
The code to self-host your personal server is provided on our GitHub. Alternatively, you can run a pre-built executable also found there.
Bring your data and models across applications
With Vana, you can bring your own models and private data across applications. You can also use any models or data on collective servers that you have access to.
For example, here is the OAuth screen for Chirper AI. By logging in with Vana, you can bring your models and data to the application. If you prefer, you can authenticate using a crypto wallet (experimental). You choose the level of access to grant the application: you can grant access only to inference on your personal model, keeping the underlying data private, or you can grant full access, allowing the app to keep a copy of your data.
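The scoped-access idea can be sketched as a standard OAuth authorization request. The endpoint and scope names below (`model:inference`, `data:full`) are hypothetical placeholders, not Vana's actual API; see the API Integration docs for the real values.

```python
# Hypothetical sketch of building an OAuth authorization URL with a
# scoped access request. Endpoint and scopes are made-up placeholders.
from urllib.parse import urlencode

def build_authorize_url(client_id: str, redirect_uri: str, scopes: list[str]) -> str:
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        # "model:inference" keeps the raw data private, while
        # "data:full" would let the app copy the underlying data.
        "scope": " ".join(scopes),
    }
    return "https://auth.example.invalid/authorize?" + urlencode(params)

url = build_authorize_url("demo-app", "https://app.example.invalid/callback",
                          ["model:inference"])
```

The key design point is that the narrowest scope exposes only model outputs, never the data the model was trained on.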
If you are a developer who would like to add support for Vana in your application, please see the API Integration docs for detailed steps.
Store private data off-chain while permissioning it onchain
How can we ensure private data is stored off-chain while managing permissions onchain?
Depending on a DLP's tokenomics, it may want to hold user data in escrow. Other DLPs may use liveness checks and rewards to ensure that user data remains connected, even without holding an encrypted copy of it.
For DLPs and other user-owned data apps that require access to a collective dataset or model weights, Vana uses a secure compute node with asymmetric encryption. Users contribute their data by encrypting it client-side with the node's public key; it is decrypted with the corresponding private key, held securely on the node.
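The contribution flow above can be sketched end to end. This toy uses textbook RSA with tiny fixed primes purely to keep the example self-contained and readable; a real deployment would use a vetted hybrid scheme (e.g. an RSA-OAEP or ECIES-style construction), and all keys and values here are made up.

```python
# Toy illustration of the flow: the user encrypts with the node's public
# key, and only the node (holding the private key) can decrypt.
# Textbook RSA with tiny primes -- NEVER use parameters like this in practice.

p, q = 61, 53          # toy primes
n = p * q              # public modulus
e = 17                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent, kept on the node

def encrypt_client_side(m: int) -> int:
    """User encrypts their contribution with the node's public key (n, e)."""
    return pow(m, e, n)

def decrypt_on_node(c: int) -> int:
    """Only the secure compute node, which holds d, can recover the data."""
    return pow(c, d, n)

m = 42                     # stand-in for a user's data
c = encrypt_client_side(m)
assert c != m              # ciphertext reveals nothing directly
assert decrypt_on_node(c) == m
```

Because encryption happens client-side, the data is already unreadable to intermediaries the moment it leaves the user's device; only the node's private key can open it.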
The trusted compute node can only run code that has been approved by the collective. With a community-owned dataset, for example, a company that pays for access to train on the data (or someone acting on its behalf) opens a pull request with the code it needs to run on the secure node. This could include code to train a model or code to transmit the data to the company. Note that the trusted compute node does not have access to a GPU, so to train larger models, the data requester must set up a secure heavy compute node with its own private key and request that an encrypted copy of the data be sent to it. The data requester then submits a proposal to the DAO describing what they would like to do.
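One simple way to enforce "only approved code runs" is for the node to check a hash of the submitted code against a registry populated when the DAO merges a pull request. This is a sketch of that idea, not Vana's actual implementation; the function names and registry are assumptions.

```python
# Sketch: a compute node refusing to execute code whose hash the DAO
# has not approved. Registry and names are illustrative only.
import hashlib

def code_hash(source: str) -> str:
    return hashlib.sha256(source.encode()).hexdigest()

# Populated when the DAO approves and merges a pull request.
approved_hashes: set[str] = set()

def dao_approve(source: str) -> None:
    approved_hashes.add(code_hash(source))

def run_on_node(source: str) -> str:
    if code_hash(source) not in approved_hashes:
        raise PermissionError("code not approved by the DAO")
    # ...here the node would execute `source` against the decrypted data...
    return "executed"

training_job = "train(model, dataset)"
dao_approve(training_job)
assert run_on_node(training_job) == "executed"
```

Any change to the code, however small, changes the hash and causes the node to reject it, so the DAO's review binds exactly the bytes that were approved.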
Our current approach is a step toward decentralization, but it still relies on trust in the compute node and the code approval process. If the node operator (at this point, only Vana) were to deploy malicious code to the server, or otherwise compromise the private key that decrypts the data, it would be able to access the underlying data.
We have partnered with Replicate to offer a hosted version of the personal server. It contains a personal vector database with your private information, an audio model trained on your voice, and an image model trained on your photos. You can create it through our consumer-friendly application at . Note that there is a paywall to cover compute costs.
If you are a developer, you can use the development version at , and bypass the paywall with the test card number 4242 4242 4242 4242, any future expiry date, and any three digits for the CVC code. The development environment will be slower, as fewer GPUs are assigned to it.
If the DAO approves the proposal (and code), the pull request is merged and deployed to the node. is the code running on the node. However, this still requires trusting that the node operator(s) will adhere to the approved code and not introduce any vulnerabilities.
We are eager to add more secure and decentralized options as fully homomorphic encryption, distributed training, and other privacy-preserving technologies mature. We explored ways to reduce the required trust by splitting model weights or the dataset across many machines in a privacy-preserving way, but have not found a satisfactory solution, so we rely on a trusted compute node today. For example, is doing great work letting many users each load a small part of a model for distributed inference.