How to access data on Vana?

Your step-by-step guide to querying and processing data from DataDAOs using Vana’s Data Access Layer (v0)

📘 Notice

Vana Data Access is in v0, meaning things will change and improve based on feedback from early adopters like you.

For detailed documentation, check out Data Access Layer.


🔁 Workflow Overview

As an application builder, your typical workflow will be:

  1. Discover – Explore what datasets are available through published schemas
  2. Define – Choose what kind of compute job you want to run (default or custom)
  3. Get Approval – Request access to specific data and get your compute job approved by the DataDAO
  4. Execute – Submit the job and trigger execution via API
  5. (Optional) Monitor – Track job status while it runs
  6. Retrieve – Download and use the generated artifact in your application


1. 🧭 Discover Available Data

Every DataDAO publishes refined datasets via schemas called refiners. These are structured SQL views (usually using libSQL) that expose selected data from raw user contributions — cleaned, masked, and standardized by the DataDAO. You never touch raw data directly; queries always run against refiners.

Each refiner includes a name, schema definition, and a unique refinerId. You’ll use this ID in your query requests and compute jobs.

🧱 Call this contract:

function refiners(uint256 refinerId) external view returns (Refiner memory)

Contract: DataRefinerRegistry -> 0x93c3EF89369fDcf08Be159D9DeF0F18AB6Be008c
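
For example, you can read a refiner entry and fetch its published schema with a few lines of TypeScript (ethers v6). This is a minimal sketch: the RPC URL and the Refiner struct layout in the ABI fragment are assumptions, so take the real ABI from the Data Access Layer docs.

import { ethers } from "ethers";

// Vana JSON-RPC endpoint (assumed; swap in the network you target) and the
// DataRefinerRegistry address from above.
const provider = new ethers.JsonRpcProvider("https://rpc.moksha.vana.org");
const REFINER_REGISTRY = "0x93c3EF89369fDcf08Be159D9DeF0F18AB6Be008c";

// Minimal ABI fragment. The Refiner struct layout shown here is illustrative
// only; check the Data Access Layer docs for the real field list.
const registryAbi = [
  "function refiners(uint256 refinerId) external view returns (tuple(uint256 dlpId, address owner, string name, string schemaDefinitionUrl))",
];

async function getRefiner(refinerId: number) {
  const registry = new ethers.Contract(REFINER_REGISTRY, registryAbi, provider);
  const refiner = await registry.refiners(refinerId);
  console.log("name:", refiner.name);

  // Fetch the published schema definition (a JSON document like the example below).
  const schema = await fetch(refiner.schemaDefinitionUrl).then((r) => r.json());
  console.log(schema.dialect, schema.schema);
}

getRefiner(12).catch(console.error);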

Example schema that can be accessed via schemaDefinitionUrl:

{
  "name": "spotify",
  "version": "0.0.1",
  "description": "Schema for storing music-related data",
  "dialect": "sqlite",
  "schema": "CREATE TABLE IF NOT EXISTS \"albums\"(\n    [AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\n    [Title] NVARCHAR(160)  NOT NULL,\n    [ArtistId] INTEGER  NOT NULL,\n    FOREIGN KEY ([ArtistId]) REFERENCES \"artists\" ([ArtistId]) \n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\n);\nCREATE TABLE IF NOT EXISTS \"artists\"(\n    [ArtistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\n    [Name] NVARCHAR(120)\n);\n"
}

Keep this refinerId handy: you'll use it to reference the dataset in the next steps.



2. ⚙️ Define What Happens to the Data

Each compute job defines what your app wants to do with the data — run stats, format it into a CSV, generate embeddings, train a model, etc.

Jobs run inside a secure enclave using a Docker image you register.

You can either use the default compute image or register your own Docker-based compute instruction.

For most applications we recommend creating your own compute instruction, but for early testing the default image is fine.

Once you’ve chosen your compute logic, you’ll get a computeInstructionId. You’ll send this to the DataDAO in the next step for approval.



3. 🔐 Ask For Access

DataDAOs are the owners of their data. To work with any dataset, you need their approval — both for the data you want to query and the compute logic you want to run on it.

Sometimes access is free and granted globally to all grantees. In most cases, though, you'll need to reach out off-chain and get explicit approval.

🧾 What DataDAOs approve:

  1. Data access – which schemas, tables, or columns you're allowed to query
  2. Compute logic – the specific compute instruction (Docker image) you're using

🗣 Request flow:

  1. Find the DataDAO owner (Telegram, Discord, etc.)
  2. Share:
    • The data you want (full schema, specific table/column)
    • Your computeInstructionId (see previous step)
  3. Agree on terms — price per query in $VANA (or free)
  4. DataDAO approves both on-chain. If you’re using the default compute job, many DataDAOs will recognize it and approve instantly

🔍 Check your permissions

function getPermissions(uint256 refinerId, address grantee)

Contract: QueryEngine -> 0xd25Eb66EA2452cf3238A2eC6C1FD1B7F5B320490

🔍 Check if your compute job is approved

function isApproved(uint256 instructionId, uint256 dlpId) external view returns (bool)

Contract: ComputeInstructionRegistry -> 0x5786B12b4c6Ba2bFAF0e77Ed30Bf6d32805563A5
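
For example, the compute-approval check is a single read call (a sketch with ethers v6; the RPC URL is an assumption):

import { ethers } from "ethers";

const provider = new ethers.JsonRpcProvider("https://rpc.moksha.vana.org"); // assumed RPC URL
const INSTRUCTION_REGISTRY = "0x5786B12b4c6Ba2bFAF0e77Ed30Bf6d32805563A5";

const registryAbi = [
  "function isApproved(uint256 instructionId, uint256 dlpId) external view returns (bool)",
];

// Returns true once the DataDAO (identified by its dlpId) has approved your
// compute instruction. QueryEngine.getPermissions can be read the same way,
// but its return type is not shown above, so it is left out of this sketch.
async function isInstructionApproved(instructionId: number, dlpId: number): Promise<boolean> {
  const registry = new ethers.Contract(INSTRUCTION_REGISTRY, registryAbi, provider);
  return registry.isApproved(instructionId, dlpId);
}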

Once both approvals are granted, you're ready to execute.



4. 🚀 Submit the Job

With access and an approved compute job, you're ready to run. First you prepay in $VANA, then you register the job on-chain and trigger execution via API.

The TEE will automatically fetch your permissions, run the query, and pass the results to your compute image.

First, fund your account

function deposit(address token, uint256 amount)

Pass the zero address (0x0) as token to deposit $VANA.

Then register the job:

function submitJob(
  uint80 maxTimeout,
  bool gpuRequired,
  uint256 computeInstructionId
) external payable returns (uint256 jobId)
  • maxTimeout – maximum job duration in seconds. The current hard cap is 300, so passing 300 gives your job the maximum allowed time.
  • gpuRequired – set to false for most jobs, including the default compute job
  • computeInstructionId – the Docker-based compute job ID (see earlier steps)

Contract: ComputeEngine -> 0xb2BFe33FA420c45F1Cf1287542ad81ae935447bd
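
Putting the two calls together with ethers v6 might look like the sketch below. It assumes deposit and submitJob both live on the ComputeEngine contract above and that a native $VANA deposit sends the amount as msg.value; confirm both details in the Data Access Layer docs.

import { ethers } from "ethers";

const provider = new ethers.JsonRpcProvider("https://rpc.moksha.vana.org"); // assumed RPC URL
const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);

const COMPUTE_ENGINE = "0xb2BFe33FA420c45F1Cf1287542ad81ae935447bd";
const computeEngineAbi = [
  "function deposit(address token, uint256 amount) external payable",
  "function submitJob(uint80 maxTimeout, bool gpuRequired, uint256 computeInstructionId) external payable returns (uint256)",
];

async function fundAndSubmit(computeInstructionId: number) {
  const engine = new ethers.Contract(COMPUTE_ENGINE, computeEngineAbi, wallet);

  // 1. Prepay in $VANA. The zero address selects the native token; sending the
  //    same amount as msg.value is an assumption of this sketch.
  const amount = ethers.parseEther("1");
  await (await engine.deposit(ethers.ZeroAddress, amount, { value: amount })).wait();

  // 2. Register the job: 300s timeout (the current hard cap), no GPU required.
  const tx = await engine.submitJob(300, false, computeInstructionId);
  const receipt = await tx.wait();
  console.log("job submitted in tx", receipt.hash);
  // The jobId is the function's return value; off-chain you typically recover it
  // from an event in this receipt (the event name depends on the contract).
}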


Kick off the job via API:

When you call submitJob(), it returns a jobId. The assigned TEE and its URL will be selected automatically — you'll get the tee-url from the job registration response.

Then, use this tee-url to trigger the job execution.

Your actual SQL query should be included inside the query field in the body — this is the concrete SQL that will be executed by the Query Engine.

You must generate x-job-id-signature by signing the jobId (as a 32-byte hex) using the wallet that submitted the job. This proves you're authorized to run or fetch results for this job.

The query_signature must be created by signing the raw string content of the SQL query using your wallet. This ensures the query is authenticated and hasn’t been tampered with during transmission.

POST https://{tee-url}/job/{job-id}/
Headers:
  x-job-id-signature: "0x..."
Body:
{
  "input": {
    "query": "SELECT id, locale FROM users LIMIT ?",
    "query_signature": "0x...",
    "refinerId": 12,
    "params": [10]
  }
}
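
A sketch of producing both signatures and triggering the run with ethers v6 and fetch. Treating both signatures as EIP-191 personal_sign messages (ethers' signMessage) is an assumption here; confirm the exact signing scheme in the Data Access Layer docs.

import { ethers } from "ethers";

const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!); // the wallet that submitted the job

async function triggerJob(teeUrl: string, jobId: number, refinerId: number) {
  const query = "SELECT id, locale FROM users LIMIT ?";

  // Sign the jobId as a 32-byte hex value and the raw SQL string as-is.
  const jobIdSignature = await wallet.signMessage(ethers.getBytes(ethers.toBeHex(jobId, 32)));
  const querySignature = await wallet.signMessage(query);

  const res = await fetch(`${teeUrl}/job/${jobId}/`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-job-id-signature": jobIdSignature,
    },
    body: JSON.stringify({
      input: { query, query_signature: querySignature, refinerId, params: [10] },
    }),
  });
  console.log(await res.json());
}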


5. 🧪 (Optional) Monitor Job Status

The TEE acts as an isolated container that executes your logic privately. It pulls data from the Query Engine, runs your job, and can produce output without ever exposing raw data to your app or infrastructure.

This is what makes Vana composable and privacy-preserving.

Monitor job status:

GET https://{tee-url}/job/{job-id}/
Headers:
  x-job-id-signature: "0x..."

Possible statuses:

  • not_found: Job not registered on-chain
  • no_runs: Job registered but no runs found
  • pending: Awaiting query results
  • queued: Queued for processing
  • running: In progress
  • query_failed: Query failed (possibly due to insufficient funds)
  • success: Completed successfully
  • cancelled: Cancelled by owner
  • failed: Execution error
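
A simple polling loop over this endpoint might look like this (reusing the x-job-id-signature from the previous step):

// Poll the TEE until the job reaches a terminal status.
async function waitForJob(teeUrl: string, jobId: number, jobIdSignature: string) {
  const terminal = new Set(["success", "failed", "cancelled", "query_failed", "not_found"]);
  for (;;) {
    const res = await fetch(`${teeUrl}/job/${jobId}/`, {
      headers: { "x-job-id-signature": jobIdSignature },
    });
    const job = await res.json();
    console.log("status:", job.status);
    if (terminal.has(job.status)) return job;
    await new Promise((resolve) => setTimeout(resolve, 5000)); // check every 5 seconds
  }
}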


6. 📥 Get Your Output

When the job completes, it produces one or more artifacts — .json, .csv, charts, etc. These are downloadable directly from the Compute Engine API.

Artifacts are ephemeral, so make sure to grab them before they expire.

When status is success, the job returns artifacts.

Example artifact:

{
    "job_id": "123",
    "run_id": "123-1b6dc6acbeb84f5ea50f79e7b081e9e6",
    "status": "success",
    "artifacts": [
        {
            "id": "art-9643cb38bea94261b5d2d2bba701bd2b",
            "url": "https://{tee-url}/job/100/artifacts/art-9643cb38bea94261b5d2d2bba701bd2b",
            "size": 72,
            "mimetype": "application/json",
            "expires_at": "2025-05-07T08:46:55.629878",
            "status": "available",
            "file_name": "stats.json",
            "file_extension": ".json"
        }
    ],
    "usage": {
        "cpu_time_ms": 1234891,
        "memory_mb": 12.3,
        "duration_ms": 2458312
    }
}

Download:

GET https://{tee-url}/job/{job-id}/artifacts/{artifact-id}
Headers:
  x-job-id-signature: "0x..."
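
For example, a small Node/TypeScript sketch that saves every artifact from a successful job response to disk:

import { writeFile } from "node:fs/promises";

// Download every artifact listed on a successful job response and save it locally.
async function downloadArtifacts(
  teeUrl: string,
  jobId: number,
  jobIdSignature: string,
  artifacts: Array<{ id: string; file_name: string }>,
) {
  for (const artifact of artifacts) {
    const res = await fetch(`${teeUrl}/job/${jobId}/artifacts/${artifact.id}`, {
      headers: { "x-job-id-signature": jobIdSignature },
    });
    const data = Buffer.from(await res.arrayBuffer());
    await writeFile(artifact.file_name, data);
    console.log(`saved ${artifact.file_name} (${data.length} bytes)`);
  }
}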

Artifacts expire, so store them if you need long-term use.



🧠 That’s It

You’ve accessed a decentralized dataset in a privacy-preserving way — and can now plug results into your app, your pipeline, or your AI agent framework.

Need help or want intros to DataDAO owners? Join the Vana Builder Discord or explore DataDAOs directly on Datahub.