Querying pooled data
Use this flow when you need queryable, aggregate data from a Data Liquidity Pool (e.g. for training or analytics) rather than individual user data via the data portability path. You’ll discover datasets, get permission from the DLP owner, and run queries in a TEE so raw data stays protected. The flow uses QueryEngine, ComputeEngine, and DataRefinerRegistry — addresses and ABIs are in Contract addresses. You need: a wallet, permission from a DLP/DataDAO owner for the dataset and compute instruction you want, and the target refinerId (or a way to discover it).
Overview
- Discover a dataset — Use DataRefinerRegistry to find refiner IDs and schema definitions for the datasets you want to query.
- Request and verify access — The owner grants you data access (for a specific refinerId) and compute access (for a specific computeInstructionId). Verify both onchain before submitting jobs.
- Submit and execute a job — Pre-pay on ComputeEngine, submit a job onchain, then trigger execution via the TEE API with your query and signatures.
- Retrieve results — Poll job status and download artifacts (e.g. query results) from the TEE.
The default compute instruction returns query results as a database file; its ID per network:
| Network | computeInstructionId |
|---|---|
| Mainnet | 3 |
| Moksha (testnet) | 40 |
1. Discover a dataset
The DataRefinerRegistry contract stores refiner types: each refiner has an off-chain schema definition and a refinement/processing image. Call refiners(refinerId) to get the schema and metadata for the dataset you want. The schema defines the queryable structure (e.g. SQLite tables); use it to write valid SQL for your job.
Contract: DataRefinerRegistry · Function: refiners(uint256 refinerId)
An example schema shape (simplified) is sketched below, together with a refiners() read call.
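A minimal sketch with web3.py; the RPC endpoint, contract address, and ABI are placeholders from the Contract addresses page, and the schema fields shown are assumptions about its shape, not a confirmed format:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://rpc.moksha.vana.org"))  # placeholder RPC endpoint
registry = w3.eth.contract(
    address="0x...",                # DataRefinerRegistry address (see Contract addresses)
    abi=DATA_REFINER_REGISTRY_ABI,  # loaded from the published ABI
)

refiner = registry.functions.refiners(12).call()  # refinerId = 12, for illustration
# The refiner record points at an off-chain schema definition shaped roughly like:
example_schema = {
    "name": "user_data",
    "dialect": "sqlite",
    "tables": [
        {
            "name": "users",
            "columns": [
                {"name": "user_id", "type": "TEXT"},
                {"name": "created_at", "type": "INTEGER"},
            ],
        }
    ],
}
```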
2. Request and verify access
To query a DLP’s data, you need two permissions from the pool owner; verify both as in the sketch below this list.
- Data access — Permission to query a specific dataset (refinerId). QueryEngine records this. Verify with getPermissions(uint256 refinerId, address grantee), using your app’s or wallet’s grantee address.
- Compute access — Permission to run a specific compute instruction on that DLP’s data. ComputeInstructionRegistry records this. Verify with isApproved(uint256 instructionId, uint256 dlpId).
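A verification sketch with web3.py; the addresses and ABIs are placeholders, and the example IDs and the truthiness checks are assumptions about the return shapes:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://rpc.moksha.vana.org"))  # placeholder RPC endpoint

query_engine = w3.eth.contract(address="0x...", abi=QUERY_ENGINE_ABI)
instruction_registry = w3.eth.contract(address="0x...", abi=COMPUTE_INSTRUCTION_REGISTRY_ABI)

GRANTEE = "0xYourAppOrWalletAddress"  # your app’s or wallet’s grantee address

# Data access: did the DLP owner grant this grantee access to the refiner?
permissions = query_engine.functions.getPermissions(12, GRANTEE).call()  # refinerId = 12

# Compute access: is the instruction approved for this DLP?
approved = instruction_registry.functions.isApproved(40, 7).call()  # instructionId=40, dlpId=7 (examples)

assert permissions and approved, "request access from the DLP owner before submitting jobs"
```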
3. Submit and execute a job
Pre-pay
Deposit funds on the ComputeEngine: deposit(address token, uint256 amount). Use token address 0x0 for VANA on the target network.
Submit job
Call ComputeEngine submitJob(uint80 maxTimeout, bool gpuRequired, uint256 computeInstructionId). Example: maxTimeout: 300, gpuRequired: false, computeInstructionId from the table above. This returns a jobId and a tee-url; both steps are sketched below.
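A submission sketch covering the deposit and the job (web3.py). The address, ABI, and key are placeholders, and passing the amount as msg.value for native VANA is an assumption to confirm against the ComputeEngine ABI:

```python
from web3 import Web3
from eth_account import Account

w3 = Web3(Web3.HTTPProvider("https://rpc.moksha.vana.org"))  # placeholder RPC endpoint
acct = Account.from_key(PRIVATE_KEY)  # placeholder key for the submitting wallet
compute_engine = w3.eth.contract(address="0x...", abi=COMPUTE_ENGINE_ABI)

ZERO_ADDRESS = "0x0000000000000000000000000000000000000000"  # token address 0x0 = native VANA
amount = w3.to_wei(1, "ether")

def send(fn, value=0):
    """Build, sign, and send a contract call; wait for the receipt."""
    tx = fn.build_transaction({
        "from": acct.address,
        "nonce": w3.eth.get_transaction_count(acct.address),
        "value": value,
    })
    signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
    tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)  # .rawTransaction on web3.py v6
    return w3.eth.wait_for_transaction_receipt(tx_hash)

# Pre-pay (assumption: native VANA also rides along as msg.value)
send(compute_engine.functions.deposit(ZERO_ADDRESS, amount), value=amount)

# Submit the job: maxTimeout=300, gpuRequired=False, default instruction on Moksha
receipt = send(compute_engine.functions.submitJob(300, False, 40))
# The jobId and tee-url come back via the transaction’s event logs (see the ComputeEngine ABI).
```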
Sign
With the wallet that submitted the job, generate two signatures (sketched below):
- Job ID signature — Sign the jobId as a 32-byte hex string. Send it in the x-job-id-signature header.
- Query signature — Sign the raw SQL query string (e.g. SELECT * FROM users LIMIT 10). Send it in the request body.
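A signing sketch with eth_account. Using EIP-191 personal_sign (encode_defunct) over the plain strings is an assumption; confirm the expected message encoding against the TEE API reference:

```python
from eth_account import Account
from eth_account.messages import encode_defunct

acct = Account.from_key(PRIVATE_KEY)  # must be the wallet that submitted the job

job_id = 42  # from your submitJob transaction
job_id_hex = "0x" + job_id.to_bytes(32, "big").hex()  # jobId as a 32-byte hex string

# Assumption: both signatures are EIP-191 personal_sign over the raw strings
job_id_signature = acct.sign_message(encode_defunct(text=job_id_hex)).signature.hex()

query = "SELECT * FROM users LIMIT 10"
query_signature = acct.sign_message(encode_defunct(text=query)).signature.hex()
```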
Trigger execution
POST to https://{tee-url}/job/{job-id}/ with the x-job-id-signature header, and the query plus query signature in the body, as in the sketch below.
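A request sketch with requests; the body field names are assumptions beyond what this page states, so check them against the TEE API reference:

```python
import requests

tee_url = "tee.example.vana.org"  # returned by submitJob; placeholder here

resp = requests.post(
    f"https://{tee_url}/job/{job_id}/",
    headers={"x-job-id-signature": job_id_signature},
    json={
        # Field names below are illustrative, not confirmed
        "query": query,
        "query_signature": query_signature,
    },
    timeout=30,
)
resp.raise_for_status()
```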
4. Retrieve results
Poll job status: GET https://{tee-url}/job/{job-id}/ with header x-job-id-signature. When status is success, the response includes an artifacts array. Download each artifact with GET https://{tee-url}/job/{job-id}/artifacts/{artifact-id} (same header). The file is typically a database (e.g. query_results.db) or JSON.
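A polling-and-download sketch; the status values and the fields on each artifact object are assumptions beyond what this page states:

```python
import time

import requests

headers = {"x-job-id-signature": job_id_signature}

# Poll until the job reaches a terminal state (status values assumed)
while True:
    job = requests.get(f"https://{tee_url}/job/{job_id}/", headers=headers, timeout=30).json()
    if job.get("status") in ("success", "failed"):
        break
    time.sleep(5)

# Download each artifact (the "id"/"name" fields are illustrative)
for artifact in job.get("artifacts", []):
    r = requests.get(
        f"https://{tee_url}/job/{job_id}/artifacts/{artifact['id']}",
        headers=headers,
        timeout=60,
    )
    with open(artifact.get("name", f"artifact-{artifact['id']}"), "wb") as f:
        f.write(r.content)
```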
Next steps
Use the downloaded artifact in your app, data pipeline, or AI agent. To process results with custom logic (e.g. embeddings, model training), create a custom compute instruction as described below.
Custom compute instructions
A compute instruction is a Docker image that runs inside Vana’s Compute Engine. It defines how query results are processed before you receive them. Vana provides a default instruction that returns results as a database file; creating your own enables use cases like data normalization, embedding extraction, or AI model training.
Before you begin: This guide assumes you have Docker installed and can build and push images to a public registry (e.g. Docker Hub or GitHub Container Registry).
Step 1: Start with the template
Use the Python job template to handle setup and receive query results from the Query Engine (see the worker sketch after this list):
- Template: vana-compute-job-template-py
- Edit worker.py to load the input data, run your logic, and write outputs as artifacts.
- Input: The SQL query results are available in the container as query_results.db.
- Output: Write any files you want to retrieve to /mnt/output/.
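A minimal worker sketch under the template’s stated contract (read query_results.db, write artifacts to /mnt/output/); exactly how the template invokes worker.py is documented in the template repo:

```python
import sqlite3
from pathlib import Path

DB_PATH = "query_results.db"   # SQL query results provided inside the container
OUT_DIR = Path("/mnt/output")  # anything written here is returned as an artifact

def run() -> None:
    conn = sqlite3.connect(DB_PATH)
    # Table and column names come from the refiner schema; "users" is illustrative
    (row_count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    (OUT_DIR / "summary.txt").write_text(f"row_count={row_count}\n")

if __name__ == "__main__":
    run()
```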
Step 2: Build and publish the image
Build your Docker image and push it to a public container registry. The template repo includes a GitHub Actions workflow to automate this; you can also build and push manually. The image must be publicly pullable by the Compute Engine.
Step 3: Generate the image checksum
Compute the SHA256 digest of your image. This checksum is registered on-chain so the system can verify the image each time it runs.
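One way to obtain a digest, assuming the registry digest (RepoDigests) is the checksum Vana expects — confirm the exact format against the registry docs before registering:

```python
import subprocess

IMAGE = "docker.io/your-username/your-image:latest"  # your pushed image

# Assumption: the on-chain checksum is the image's registry digest (sha256:...)
digest = subprocess.run(
    ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", IMAGE],
    check=True, capture_output=True, text=True,
).stdout.strip()

checksum = digest.split("@sha256:")[-1]
print(checksum)  # pass this as the `hash` argument in step 4
```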
Step 4: Register the instruction on-chain
Register your compute instruction with the ComputeInstructionRegistry contract to get a unique computeInstructionId. Contract address: Contract addresses.
Function: addComputeInstruction(string calldata hash, string calldata url)
- hash — The SHA256 checksum from step 3.
- url — The public URL of your Docker image (e.g. docker.io/your-username/your-image:latest).
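A registration sketch with web3.py; the address, ABI, and key are placeholders:

```python
from web3 import Web3
from eth_account import Account

w3 = Web3(Web3.HTTPProvider("https://rpc.moksha.vana.org"))  # placeholder RPC endpoint
acct = Account.from_key(PRIVATE_KEY)  # placeholder key
registry = w3.eth.contract(address="0x...", abi=COMPUTE_INSTRUCTION_REGISTRY_ABI)

tx = registry.functions.addComputeInstruction(
    checksum,                                     # SHA256 checksum from step 3
    "docker.io/your-username/your-image:latest",  # public image URL
).build_transaction({
    "from": acct.address,
    "nonce": w3.eth.get_transaction_count(acct.address),
})
signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)  # .rawTransaction on web3.py v6
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
# The new computeInstructionId is emitted in the receipt's event logs (see the ABI).
```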
Step 5: Use your new instruction
Your new computeInstructionId is ready. Get it approved by each DataDAO whose data you want to process (they call the same registry to approve the instruction for their DLP). Then use this ID when you submit jobs instead of the default (3 mainnet, 40 Moksha).