The Vana network strives to ensure personal data remains private and is only shared with trusted parties.
Vana uses a patented non-custodial encryption technique to encrypt personal data. Data does not leave the user's browser unencrypted. A user's file is symmetrically encrypted with their encryption key, and if the encryption key is shared with another party, the key is encrypted with that party's public key so only the intended recipient can decrypt the key and the data.
The steps are as follows:
The user uploads a decrypted file (F).
They are prompted to sign a fixed message (the encryption seed) with their wallets, creating a unique signature that can only be recreated by signing that same message using that same wallet.
The generated signature is used as the encryption key (EK) to encrypt the file F using a symmetric encryption technique, creating an encrypted file EF.
The encryption key EK is then encrypted with the trusted party's public key, making an encrypted encryption key (EEK).
The encrypted file EF and encrypted encryption key EEK can be safely shared with the intended receipient.
Once the data has been encrypted, it can be decrypted by either a dApp or trusted party.
The dApp prompts the user to sign the same fixed message, to retreive the same EK as above
The EK is used to decrypt the encrypted file EF and retreive F
The trusted party receives the encrypted file EF and the encrypted encryption key EEK
They decrypt the EEK using their private key, to retreive the encryption key EK
The EK is used to decrypt the encrypted file EF and retreive F
Data DAOs can use the encryption technique described above to efficiently encrypt large files (up to several gigabytes in size) in the browser.
If the encryption key EK needs to be shared with a trusted party, it can be encrypted with their public key. To generate a new public/private keypair, see the instructions here.
The trusted party can now receive and decrypt EEK, resulting in EK which can then be used to decrypt the user file F. A Python code example of this is shown here.
In the Vana network, data is stored encrypted off-chain in a storage solution of the DLP's choice, providing flexibility and control over their data. This approach allows data contributors to utilize familiar platforms such as Dropbox, Google Drive, or decentralized options like IPFS.
By keeping data off-chain but accessible through these identifiers, Vana maintains a balance between data privacy, user control, and cost efficiency.
To add data to the Vana Network:
Generate a signature, the encryption_key
, by asking the data contributor to sign a message, the encryption_seed
Upload the encrypted data to a location of your choice. This can be a Web2 storage solution like Google Cloud, Dropbox, etc, or a Web3 solution like IPFS
Get the storage URL of the uploaded file
The data registry returns a file_id
, which can be used later to look up the file
Vana's system only requires two key pieces of information: a URL pointing to the data's location and an optional identifier that changes when the data is modified (e.g., an or last modified date). This ensures data at a particular location has not changed since it was uploaded there.
To make data discoverable in the Vana network, it must be written onchain using the . The data contributor first uploads an to a storage provider of their choice, then writes a pointer to that file (the URL) and an optional content integrity hash to the registry.
Symmetrically encrypt the data using the encryption_key
. Code samples available in .
Add file to the : addFile(encrypted_data_url)
Anyone can submit data to the Vana network. However, for data to be considered valid by a DLP, it must be attested for by a trusted party, i.e. Validators in a DLP. These trusted parties issue an attestation to prove that this data is, in fact, authentic, high-quality, unique, and has whatever other properties DLPs value in its data contributions.
Data attestations live offchain, and a URL to a data's attestation is written onchain alongside the data itself.
The attestation of a data point must follow a spec. Attestations show relevant information about how the data was evaluated, proof-of-contribution scores, integrity checksums, and custom metadata relevant to a specific DLP.
An example of when this would be useful: consider a ChatGPT DLP that accepts GDPR exports from chatgpt.com. Say the DLP considers the export to be high quality when the number of conversations in the export exceeds 10. This DLP can insert numberOfConversations: xxx
in the attestation when Proof of Contribution is run, and anyone can see how valuable that encrypted data point is.
signed_fields
Contains the main data fields that are signed by the prover.
subject
Information about the datapoint being attested for.
url
URL where the encrypted file lives.
owner_address
Wallet address of the file owner.
decrypted_file_checksum
Checksum of the decrypted file for integrity verification.
encrypted_file_checksum
Checksum of the encrypted file for integrity verification.
encryption_seed
The message that was signed by the owner to retrieve the encryption key.
prover
Information about the prover.
type
Type of the prover, satya
is one of the confidential TEE nodes in the Satya network. Proofs can also be self-signed
where the data owner generates the proof.
address
Wallet address of the prover.
url
URL or address where the prover service is hosted.
proof
Details about the generated proof.
image_url
Docker image URL of where the instructions to generate the proof is downloaded from
created_at
Timestamp of when the proof was created.
duration
Duration of the proof generation process, in seconds.
dlp_id
DLP ID from the Root Network Contract, this is used to tie the proof to a DLP.
valid
Boolean indicating if the subject is valid.
score
Overall score of the subject, from 0-1.
authenticity
Authenticity score of the subject, from 0-1.
ownership
Ownership score of the subject, from 0-1.
quality
Quality score of the subject, from 0-1.
uniqueness
Uniqueness score of the subject, from 0-1.
attributes
Additional key/value pairs that will be available on the public proof. These can be used to quickly view properties about the encrypted subject.
signature
Generated by the prover signing a stringified representation of signed_fields
, sorted by the key name. To verify it, we can take the signature and the striningifed representation, and extract the address that signed it, which should match the prover.address
.
Vana uses a Proof of Contributionsystem to validate data submitted to the network. "Valid" means something different in each DLP, because different DLPs value data differently.
The recommended way of validating data securely in the Vana Network is by using the Satya Network, a group of highly confidential nodes that run on special hardware. At a high level, the data contributor adds unverified data, and requests a proof-of-contribution job from the Satya Validators (and pay a small fee to have their data validated). Once validated, the Satya validator will write the proof on chain.
To run PoC in the Satya Network, a DLP builder must implement a simple proof-of-contribution function using this template.
PoC Template: https://github.com/vana-com/vana-satya-proof-template
The diagram below explains how this PoC template works.
The data contributor adds their encrypted data onchain, via the Data Registry.
They request a validation job, paying a small fee. Once a Satya node is available to run the job, they connect directly to the node, and send them the encryption key and the proof-of-contribution docker image that needs to run on the data to validate it.
The Satya node receives the key, and downloads the encrypted file, and decrypts it
The Satya node places the decrypted file in a temporary, shielded* location. The node operator cannot see the contents of this location.
The Satya node downloads and initializes a docker container to run the specified proof-of-contribution, and mounts the input and output volumes. The PoC container will have access to the decrypted file.
The PoC container runs its validations on the decrypted data, and outputs the attestation. More information on data attestation can be found here: Data Attestation.
The Satya node reads the output, and generates the proof.
The Satya node writes the proof onchain, and claims the fee as a reward for completing that work.
* A Gramine shielded container is a specialized type of container that leverages the Gramine library OS to run applications in a secure, isolated environment, typically utilizing hardware-based trusted execution environments (TEEs) like Intel SGX.
More information on integrating with the Satya network for data validation is coming soon.