
Querying pooled data

Use this flow when you need queryable, aggregate data from a Data Liquidity Pool (e.g. for training or analytics) rather than individual user data via the data portability path. You’ll discover datasets, get permission from the DLP owner, and run queries in a TEE so raw data stays protected. The flow uses QueryEngine, ComputeEngine, and DataRefinerRegistry — addresses and ABIs are in Contract addresses. You need: a wallet, permission from a DLP/DataDAO owner for the dataset and compute instruction you want, and the target refinerId (or a way to discover it).

Overview

  1. Discover a dataset — Use DataRefinerRegistry to find refiner IDs and schema definitions for the datasets you want to query.
  2. Request and verify access — The owner grants you data access (for a specific refinerId) and compute access (for a specific computeInstructionId). Verify both onchain before submitting jobs.
  3. Submit and execute a job — Pre-pay on ComputeEngine, submit a job onchain, then trigger execution via the TEE API with your query and signatures.
  4. Retrieve results — Poll job status and download artifacts (e.g. query results) from the TEE.
Default compute instruction (returns query results as a database file):
Network              computeInstructionId
Mainnet              3
Moksha (testnet)     40
For custom processing (e.g. embeddings, normalization), see Custom compute instructions below. Contract addresses: Contract addresses.

1. Discover a dataset

The DataRefinerRegistry contract stores refiner types: each refiner has an off-chain schema definition and a refinement/processing image. Call refiners(refinerId) to get the schema and metadata for the dataset you want. The schema defines the queryable structure (e.g. SQLite tables); use it to write valid SQL for your job.
Contract: DataRefinerRegistry · Function: refiners(uint256 refinerId)
Example schema shape (simplified):
{
  "name": "spotify",
  "dialect": "sqlite",
  "schema": "CREATE TABLE IF NOT EXISTS \"albums\"(...); CREATE TABLE IF NOT EXISTS \"artists\"(...);"
}
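A minimal read sketch with web3.py; the RPC URL and registry address are placeholders, and the output struct in the ABI fragment is an assumption, so take the real values and full ABI from Contract addresses:

from web3 import Web3

# Placeholder RPC endpoint and registry address: substitute the real values
# from Contract addresses for your target network.
w3 = Web3(Web3.HTTPProvider("https://rpc.example-vana-network.org"))
REGISTRY_ADDRESS = "0x0000000000000000000000000000000000000000"

# Assumed minimal ABI fragment; the real refiners(uint256) output struct may
# differ, so use the published ABI.
REGISTRY_ABI = [{
    "name": "refiners",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "refinerId", "type": "uint256"}],
    "outputs": [
        {"name": "dlpId", "type": "uint256"},
        {"name": "owner", "type": "address"},
        {"name": "name", "type": "string"},
        {"name": "schemaDefinitionUrl", "type": "string"},
        {"name": "refinementInstructionUrl", "type": "string"},
    ],
}]

registry = w3.eth.contract(address=REGISTRY_ADDRESS, abi=REGISTRY_ABI)
refiner = registry.functions.refiners(12).call()  # refinerId 12 is illustrative
print(refiner)  # follow the schema definition it points to for the CREATE TABLE statements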

2. Request and verify access

To query a DLP’s data, you need two permissions from the pool owner:
  • Data access — Permission to query a specific dataset (refinerId). QueryEngine records this. Verify: getPermissions(uint256 refinerId, address grantee) — use your app’s or wallet’s grantee address.
  • Compute access — Permission to run a specific compute instruction on that DLP’s data. ComputeInstructionRegistry records this. Verify: isApproved(uint256 instructionId, uint256 dlpId).
Access is usually agreed off-chain (e.g. terms, Discord), then the owner grants the permissions onchain. If either check fails, request access from the DLP/DataDAO owner before submitting jobs.
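Both checks can be scripted. A verification sketch in the same web3.py style, where the addresses are placeholders and both ABI fragments are assumptions (the real getPermissions return shape comes from the published QueryEngine ABI):

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://rpc.example-vana-network.org"))  # placeholder

QUERY_ENGINE = "0x0000000000000000000000000000000000000000"          # from Contract addresses
INSTRUCTION_REGISTRY = "0x0000000000000000000000000000000000000000"  # from Contract addresses

# Assumed minimal ABI fragments; check the published ABIs for the exact shapes.
QUERY_ENGINE_ABI = [{
    "name": "getPermissions",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "refinerId", "type": "uint256"},
               {"name": "grantee", "type": "address"}],
    "outputs": [{"name": "", "type": "uint256[]"}],  # assumed return type
}]
INSTRUCTION_REGISTRY_ABI = [{
    "name": "isApproved",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "instructionId", "type": "uint256"},
               {"name": "dlpId", "type": "uint256"}],
    "outputs": [{"name": "", "type": "bool"}],
}]

query_engine = w3.eth.contract(address=QUERY_ENGINE, abi=QUERY_ENGINE_ABI)
instructions = w3.eth.contract(address=INSTRUCTION_REGISTRY, abi=INSTRUCTION_REGISTRY_ABI)

grantee = "0x0000000000000000000000000000000000000000"  # your app's or wallet's address
permissions = query_engine.functions.getPermissions(12, grantee).call()  # refinerId 12
approved = instructions.functions.isApproved(3, 1).call()  # instructionId 3, dlpId 1 (illustrative)
if not permissions or not approved:
    print("Request access from the DLP/DataDAO owner before submitting jobs")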

3. Submit and execute a job

Pre-pay

Deposit funds on the ComputeEngine: deposit(address token, uint256 amount). Use token address 0x0 for native VANA on the target network.
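A deposit sketch with web3.py, assuming deposit is payable for native VANA so the amount also rides along as the transaction value (confirm against the published ComputeEngine ABI); the key, RPC URL, and contract address are placeholders:

from web3 import Web3
from eth_account import Account

w3 = Web3(Web3.HTTPProvider("https://rpc.example-vana-network.org"))  # placeholder
acct = Account.from_key("0x...your-private-key...")                   # placeholder

COMPUTE_ENGINE = "0x0000000000000000000000000000000000000000"  # from Contract addresses
COMPUTE_ENGINE_ABI = [{
    "name": "deposit",
    "type": "function",
    "stateMutability": "payable",  # assumption; check the published ABI
    "inputs": [{"name": "token", "type": "address"},
               {"name": "amount", "type": "uint256"}],
    "outputs": [],
}]
engine = w3.eth.contract(address=COMPUTE_ENGINE, abi=COMPUTE_ENGINE_ABI)

amount = w3.to_wei(1, "ether")  # 1 VANA (illustrative)
tx = engine.functions.deposit(
    "0x0000000000000000000000000000000000000000",  # token 0x0 = native VANA
    amount,
).build_transaction({
    "from": acct.address,
    "value": amount,  # assumption: native deposits send the amount as value
    "nonce": w3.eth.get_transaction_count(acct.address),
})
signed = acct.sign_transaction(tx)
tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)  # web3.py v7 attribute name
w3.eth.wait_for_transaction_receipt(tx_hash)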

Submit job

Call ComputeEngine submitJob(uint80 maxTimeout, bool gpuRequired, uint256 computeInstructionId). Example: maxTimeout: 300, gpuRequired: false, computeInstructionId from the table above. This returns a jobId and a tee-url.
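Continuing the pre-pay sketch (same w3, acct, and COMPUTE_ENGINE), with the ABI fragment again assumed from the signature above; decode the jobId and tee-url from the receipt with the full published ABI:

SUBMIT_ABI = [{
    "name": "submitJob",
    "type": "function",
    "stateMutability": "nonpayable",  # assumption; check the published ABI
    "inputs": [{"name": "maxTimeout", "type": "uint80"},
               {"name": "gpuRequired", "type": "bool"},
               {"name": "computeInstructionId", "type": "uint256"}],
    "outputs": [],
}]
engine = w3.eth.contract(address=COMPUTE_ENGINE, abi=SUBMIT_ABI)

tx = engine.functions.submitJob(300, False, 3).build_transaction({
    "from": acct.address,
    "nonce": w3.eth.get_transaction_count(acct.address),
})
signed = acct.sign_transaction(tx)
receipt = w3.eth.wait_for_transaction_receipt(
    w3.eth.send_raw_transaction(signed.raw_transaction)
)
# The jobId and assigned tee-url surface in receipt.logs; decode them with the
# full ComputeEngine ABI, since the event name and shape are defined there.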

Sign

With the wallet that submitted the job, generate two signatures (a sketch follows the list):
  1. Job ID signature — Sign the jobId as a 32-byte hex string. Send it in the x-job-id-signature header.
  2. Query signature — Sign the raw SQL query string (e.g. SELECT * FROM users LIMIT 10). Send it in the request body.
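A signing sketch with eth_account, assuming the TEE expects EIP-191 (personal_sign) signatures over the strings described above; confirm the exact message encoding against the TEE API docs:

from eth_account import Account
from eth_account.messages import encode_defunct

acct = Account.from_key("0x...your-private-key...")  # must be the job submitter's wallet

job_id = 123  # the jobId returned when you submitted the job
job_id_hex = "0x" + job_id.to_bytes(32, "big").hex()  # 32-byte hex string
# Assumption: the hex string is signed as text under EIP-191.
job_id_sig = acct.sign_message(encode_defunct(text=job_id_hex)).signature.hex()

query = "SELECT id, locale FROM users LIMIT ?"
query_sig = acct.sign_message(encode_defunct(text=query)).signature.hex()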

Trigger execution

POST to https://{tee-url}/job/{job-id}/ with header x-job-id-signature and body:
{
  "input": {
    "query": "SELECT id, locale FROM users LIMIT ?",
    "query_signature": "0x...",
    "refinerId": 12,
    "params": [10]
  }
}
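A request sketch with the requests library, reusing job_id, job_id_sig, and query_sig from the signing sketch; the tee-url is whatever your job was assigned:

import requests

tee_url = "https://tee.example.org"  # placeholder: the tee-url assigned to your job

resp = requests.post(
    f"{tee_url}/job/{job_id}/",
    headers={"x-job-id-signature": job_id_sig},
    json={"input": {
        "query": query,  # must match the string you signed
        "query_signature": query_sig,
        "refinerId": 12,
        "params": [10],
    }},
    timeout=30,
)
resp.raise_for_status()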

4. Retrieve results

Poll job status: GET https://{tee-url}/job/{job-id}/ with header x-job-id-signature. When status is success, the response includes an artifacts array:
{
  "job_id": "123",
  "status": "success",
  "artifacts": [
    {
      "id": "art-9643cb38bea94261b5d2d2bba701bd2b",
      "url": "https://{tee-url}/job/100/artifacts/art-9643cb38bea94261b5d2d2bba701bd2b",
      "file_name": "stats.json",
      "status": "available"
    }
  ]
}
Download an artifact: GET https://{tee-url}/job/{job-id}/artifacts/{artifact-id} (same header). The file is typically a database (e.g. query_results.db) or JSON.
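A polling and download sketch continuing from the trigger step; the polling cadence and retry budget are illustrative, and the handling of non-success statuses is an assumption:

import time
import requests

def wait_for_job(tee_url, job_id, job_id_sig, interval=10, attempts=60):
    # Poll until the job reports success; interval/attempts are illustrative.
    headers = {"x-job-id-signature": job_id_sig}
    for _ in range(attempts):
        resp = requests.get(f"{tee_url}/job/{job_id}/", headers=headers, timeout=30)
        resp.raise_for_status()
        job = resp.json()
        if job["status"] == "success":
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not report success after {attempts} polls")

job = wait_for_job(tee_url, job_id, job_id_sig)
for artifact in job.get("artifacts", []):
    resp = requests.get(
        f"{tee_url}/job/{job_id}/artifacts/{artifact['id']}",
        headers={"x-job-id-signature": job_id_sig},
        timeout=60,
    )
    resp.raise_for_status()
    with open(artifact["file_name"], "wb") as f:  # e.g. query_results.db or stats.json
        f.write(resp.content)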

Next steps

Use the downloaded artifact in your app, data pipeline, or AI agent. To process results with custom logic (e.g. embeddings, model training), create a custom compute instruction; see Custom compute instructions below.

Custom compute instructions

A compute instruction is a Docker image that runs inside Vana’s Compute Engine. It defines how query results are processed before you receive them. Vana provides a default instruction that returns results as a database file; creating your own enables use cases like data normalization, embedding extraction, or AI model training.
Before you begin: This guide assumes you have Docker installed and can build and push images to a public registry (e.g. Docker Hub or GitHub Container Registry).

Step 1: Start with the template

Use the Python job template to handle setup and receive query results from the Query Engine. Edit worker.py to load the input data, run your logic, and write outputs as artifacts (a minimal sketch follows the list below).
  • Input: The SQL query results are available in the container as query_results.db.
  • Output: Write any files you want to retrieve to /mnt/output/.
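A minimal worker.py sketch under those two conventions; the users table and the stats.json output are purely illustrative:

import json
import sqlite3
from pathlib import Path

# Input: the Query Engine delivers the SQL results as query_results.db.
conn = sqlite3.connect("query_results.db")
rows = conn.execute('SELECT * FROM "users"').fetchall()  # illustrative table
conn.close()

# Output: anything written to /mnt/output/ is collected as a job artifact.
out_dir = Path("/mnt/output")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "stats.json").write_text(json.dumps({"row_count": len(rows)}))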

Step 2: Build and publish the image

Build your Docker image and push it to a public container registry. The template repo includes a GitHub Actions workflow to automate this; you can also build and push manually. The image must be publicly pullable by the Compute Engine.

Step 3: Generate the image checksum

Compute the SHA256 digest of your image. This checksum is registered onchain so the system can verify the image each time it runs.
docker pull your-username/your-image:latest
docker save your-username/your-image:latest | sha256sum
Copy the full SHA256 output.

Step 4: Register the instruction onchain

Register your compute instruction with the ComputeInstructionRegistry contract to get a unique computeInstructionId; a call sketch follows the parameter list below. Contract address: Contract addresses.
Function: addComputeInstruction(string calldata hash, string calldata url)
  • hash — The SHA256 checksum from step 3.
  • url — The public URL of your Docker image (e.g. docker.io/your-username/your-image:latest).
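A registration sketch in the same web3.py style as the earlier calls; the ABI fragment is inferred from the signature above, the address and key are placeholders, and the digest shown is illustrative:

from web3 import Web3
from eth_account import Account

w3 = Web3(Web3.HTTPProvider("https://rpc.example-vana-network.org"))  # placeholder
acct = Account.from_key("0x...your-private-key...")                   # placeholder

INSTRUCTION_REGISTRY = "0x0000000000000000000000000000000000000000"  # from Contract addresses
REGISTRY_ABI = [{
    "name": "addComputeInstruction",
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "hash", "type": "string"},
               {"name": "url", "type": "string"}],
    "outputs": [],
}]
registry = w3.eth.contract(address=INSTRUCTION_REGISTRY, abi=REGISTRY_ABI)

tx = registry.functions.addComputeInstruction(
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",  # SHA256 from step 3 (illustrative)
    "docker.io/your-username/your-image:latest",
).build_transaction({
    "from": acct.address,
    "nonce": w3.eth.get_transaction_count(acct.address),
})
signed = acct.sign_transaction(tx)
receipt = w3.eth.wait_for_transaction_receipt(
    w3.eth.send_raw_transaction(signed.raw_transaction)
)
# Your new computeInstructionId is emitted in receipt.logs; decode it with the
# full registry ABI.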

Step 5: Use your new instruction

Your new computeInstructionId is ready. Get it approved by each DataDAO whose data you want to process (they call the same registry to approve the instruction for their DLP). Then use this ID when you submit jobs instead of the default (3 on mainnet, 40 on Moksha).

Support

Contract ABIs and addresses: Contract addresses. TEE API details: official Vana deployment and builder docs. To find DataDAOs or get help: Community & Discord and Datahub to explore available pools.