Querying Data on Vana

A step-by-step tutorial for discovering datasets, requesting access, and executing a query on the Vana network.

This tutorial will walk you through the end-to-end process of submitting a job to query data from a DataDAO. We will use the default compute instruction, which simply returns the query results as a database file.

Step 1: Discover a Dataset

The DataRefinerRegistry contract holds the list of all available data refiner types. Refiners store references to the off-chain schema definitions and the Docker image used to process the data. Interact with the refiners function to find the refinerId of the dataset you want to query.

  • Contract: DataRefinerRegistry (0x93c...)
  • Function: refiners(uint256 refinerId)

An example schema definition returned by this function might look like this:

{
  "name": "spotify",
  "version": "0.0.1",
  "description": "Schema for storing music-related data",
  "dialect": "sqlite",
  "schema": "CREATE TABLE IF NOT EXISTS \"albums\"(\n   [AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\n   [Title] NVARCHAR(160)  NOT NULL,\n   [ArtistId] INTEGER  NOT NULL,\n   FOREIGN KEY ([ArtistId]) REFERENCES \"artists\" ([ArtistId]) \n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\n);\nCREATE TABLE IF NOT EXISTS \"artists\"(\n   [ArtistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\n   [Name] NVARCHAR(120)\n);\n"
}
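The lookup above can be sketched with Ethers.js. Note that the ABI fragment and the fields returned by `refiners` are assumptions for illustration; check the deployed contract for the exact interface. The schema-parsing helper follows the JSON shape shown above.

```javascript
// Sketch: look up a refiner in the DataRefinerRegistry (ABI fragment and
// return shape are assumptions; verify against the deployed contract).
async function getRefiner(rpcUrl, registryAddress, refinerId) {
  // ethers is a third-party dependency, loaded lazily here.
  const { JsonRpcProvider, Contract } = await import("ethers");
  const provider = new JsonRpcProvider(rpcUrl);
  const abi = [
    "function refiners(uint256 refinerId) view returns (uint256 dlpId, address owner, string name, string schemaDefinitionUrl, string refinementInstructionUrl)",
  ];
  const registry = new Contract(registryAddress, abi, provider);
  return registry.refiners(refinerId);
}

// The schema definition itself is JSON, as in the example above.
function parseSchemaDefinition(json) {
  const schema = JSON.parse(json);
  return { name: schema.name, dialect: schema.dialect, ddl: schema.schema };
}
```

The `dialect` field tells you which SQL flavor (here, SQLite) your query in Step 3 must use.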

Step 2: Request & Verify Access

To query a dataset, you need permission from the DataDAO that owns it. This process typically involves an off-chain agreement on terms (e.g., in Discord), followed by the DataDAO granting you on-chain permissions.

You need approval for two things:

  1. Data Access: Permission to query the specific dataset (refinerId).
  2. Compute Access: Permission to use a specific computeInstructionId with that DataDAO's data. For this tutorial, we will use the default instruction:
    • Mainnet computeInstructionId: 3
    • Moksha Testnet computeInstructionId: 40

How to Verify Your Permissions

Before submitting a job, you can check on-chain to see if your wallet already has the necessary permissions.

  1. Check Data Access Permission:
    The QueryEngine contract is the gateway for permissioning data access. DataDAOs use it to approve or revoke access requests for specific datasets.

    • Contract: QueryEngine (0xd25...)
    • Function: getPermissions(uint256 refinerId, address grantee)
  2. Check Compute Access Permission:
    The ComputeInstructionRegistry holds the list of approved compute instructions that can be run on a DataDAO's data.

    • Contract: ComputeInstructionRegistry (0x578...)
    • Function: isApproved(uint256 instructionId, uint256 dlpId)
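Both checks can be sketched together with Ethers.js. The ABI fragments and boolean return types are assumptions based on the function signatures above; verify them against the deployed contracts.

```javascript
// Sketch: verify data access and compute access before submitting a job.
async function checkAccess(rpcUrl, addrs, refinerId, grantee, instructionId, dlpId) {
  const { JsonRpcProvider, Contract } = await import("ethers"); // third-party dep
  const provider = new JsonRpcProvider(rpcUrl);

  const queryEngine = new Contract(
    addrs.queryEngine,
    ["function getPermissions(uint256 refinerId, address grantee) view returns (bool)"],
    provider,
  );
  const instructionRegistry = new Contract(
    addrs.computeInstructionRegistry,
    ["function isApproved(uint256 instructionId, uint256 dlpId) view returns (bool)"],
    provider,
  );

  const dataOk = await queryEngine.getPermissions(refinerId, grantee);
  const computeOk = await instructionRegistry.isApproved(instructionId, dlpId);
  return bothGranted(dataOk, computeOk);
}

// You can only proceed when both permissions are in place.
function bothGranted(dataOk, computeOk) {
  return Boolean(dataOk) && Boolean(computeOk);
}
```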

If both checks pass, you are ready to proceed. If not, you will need to contact the DataDAO owner (e.g., via Discord) to request access for both the dataset and the default compute instruction ID.

Step 3: Submit and Execute Your Job

This is a multi-part process involving both on-chain transactions and off-chain API calls. The ComputeEngine contract manages job submissions, pre-deposited payments, and the status of your query.

3a. Pre-pay for the Job
Fund your account on the ComputeEngine contract to cover compute costs.

  • Contract: ComputeEngine (0xb2B...)
  • Function: deposit(address token, uint256 amount) (use the zero address, 0x0, as the token to deposit native $VANA).
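A minimal funding sketch follows. Whether deposit is payable (taking the native amount as msg.value) is an assumption; check the contract before sending funds.

```javascript
// Sketch: pre-fund the ComputeEngine with native $VANA.
const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

// Convert a whole number of VANA (18 decimals) to wei.
function vanaToWei(whole) {
  return BigInt(whole) * 10n ** 18n;
}

async function depositVana(rpcUrl, privateKey, computeEngineAddress, wholeVana) {
  const { JsonRpcProvider, Wallet, Contract } = await import("ethers"); // third-party dep
  const wallet = new Wallet(privateKey, new JsonRpcProvider(rpcUrl));
  const engine = new Contract(
    computeEngineAddress,
    ["function deposit(address token, uint256 amount) payable"], // payable is an assumption
    wallet,
  );
  const amount = vanaToWei(wholeVana);
  const tx = await engine.deposit(ZERO_ADDRESS, amount, { value: amount });
  return tx.wait();
}
```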

3b. Register the Job On-Chain
Call submitJob on the ComputeEngine contract. This returns a jobId that you will use in the next steps.

  • Function: submitJob(uint80 maxTimeout, bool gpuRequired, uint256 computeInstructionId)
    • maxTimeout: Use 300 (seconds).
    • gpuRequired: Set to false for the default job.
    • computeInstructionId: Use the default ID for your target network (e.g., 3 for Mainnet).
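The call can be sketched as follows. How the jobId is recovered from the transaction (for example, from an emitted event) is not shown here; inspect the contract's events or return value for the real mechanism.

```javascript
// Default compute instruction IDs from this tutorial.
function defaultInstructionId(network) {
  return network === "mainnet" ? 3 : 40; // 40 is the Moksha Testnet default
}

// Sketch: register the job on-chain with the parameters described above.
async function submitJob(signer, computeEngineAddress, network) {
  const { Contract } = await import("ethers"); // third-party dep
  const engine = new Contract(
    computeEngineAddress,
    ["function submitJob(uint80 maxTimeout, bool gpuRequired, uint256 computeInstructionId)"],
    signer,
  );
  const tx = await engine.submitJob(300, false, defaultInstructionId(network));
  const receipt = await tx.wait();
  return receipt; // inspect receipt.logs for the jobId
}
```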

3c. Prepare Signatures
You must generate two signatures with the wallet that submitted the job:

  1. Job ID Signature: Sign the jobId (as a 32-byte hex string). This is used in the x-job-id-signature header.
  2. Query Signature: Sign the raw string content of your SQL query (e.g., "SELECT * FROM users LIMIT 10"). This is used in the request body.

Example using Ethers.js:
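A minimal sketch with Ethers.js v6 follows. Whether the TEE expects the jobId signed as the hex string itself or as its raw bytes is an assumption; here we sign the hex string.

```javascript
// Encode a numeric jobId as a 0x-prefixed 32-byte hex string.
function jobIdToHex(jobId) {
  return "0x" + BigInt(jobId).toString(16).padStart(64, "0");
}

// Sketch: produce both signatures with the wallet that submitted the job.
async function signJobRequest(privateKey, jobId, query) {
  const { Wallet } = await import("ethers"); // third-party dep
  const wallet = new Wallet(privateKey);
  // 1. Sign the jobId as a 32-byte hex string -> x-job-id-signature header.
  const jobIdSignature = await wallet.signMessage(jobIdToHex(jobId));
  // 2. Sign the raw SQL string -> query_signature in the request body.
  const querySignature = await wallet.signMessage(query);
  return { jobIdSignature, querySignature };
}
```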

3d. Trigger Execution via API
The response from your submitJob transaction will include the tee-url. Make a POST request to this URL to start the job.

  • Endpoint: POST https://{tee-url}/job/{job-id}/
  • Header: x-job-id-signature: "0x..."
  • Body:
    {
      "input": {
        "query": "SELECT id, locale FROM users LIMIT ?",
        "query_signature": "0x...",
        "refinerId": 12,
        "params": [10]
      }
    }
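The request above can be sketched with Node's built-in fetch; the body-building helper follows the field names shown in the example body.

```javascript
// Build the request body in the shape shown above.
function buildJobInput(query, querySignature, refinerId, params) {
  return { input: { query, query_signature: querySignature, refinerId, params } };
}

// Sketch: trigger execution on the TEE.
async function triggerJob(teeUrl, jobId, jobIdSignature, body) {
  const res = await fetch(`https://${teeUrl}/job/${jobId}/`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-job-id-signature": jobIdSignature,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Job trigger failed: ${res.status}`);
  return res.json();
}
```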

Step 4: Monitor and Retrieve Results

You can poll the job status using its ID.

  • Endpoint: GET https://{tee-url}/job/{job-id}/
  • Header: x-job-id-signature: "0x..."
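A simple polling loop might look like this. The set of terminal statuses besides "success", and the polling interval, are assumptions.

```javascript
// Terminal statuses ("failed" is an assumed failure state).
function isTerminal(status) {
  return status === "success" || status === "failed";
}

// Sketch: poll the job until it reaches a terminal state.
async function waitForJob(teeUrl, jobId, jobIdSignature, intervalMs = 5000) {
  for (;;) {
    const res = await fetch(`https://${teeUrl}/job/${jobId}/`, {
      headers: { "x-job-id-signature": jobIdSignature },
    });
    const job = await res.json();
    if (isTerminal(job.status)) return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```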

Once the job status is success, the response will contain an artifacts array:

{
    "job_id": "123",
    "run_id": "123-1b6dc6acbeb84f5ea50f79e7b081e9e6",
    "status": "success",
    "artifacts": [
        {
            "id": "art-9643cb38bea94261b5d2d2bba701bd2b",
            "url": "https://{tee-url}/job/100/artifacts/art-9643cb38bea94261b5d2d2bba701bd2b",
            "size": 72,
            "mimetype": "application/json",
            "expires_at": "2025-05-07T08:46:55.629878",
            "status": "available",
            "file_name": "stats.json",
            "file_extension": ".json"
        }
    ],
    "usage": {
        "cpu_time_ms": 1234891,
        "memory_mb": 12.3,
        "duration_ms": 2458312
    }
}

To download your results, make a final GET request:

  • Endpoint: GET https://{tee-url}/job/{job-id}/artifacts/{artifact-id}
  • Header: x-job-id-signature: "0x..."
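The download step can be sketched as follows, using the endpoint pattern above.

```javascript
// Build the artifact URL from the endpoint pattern above.
function artifactUrl(teeUrl, jobId, artifactId) {
  return `https://${teeUrl}/job/${jobId}/artifacts/${artifactId}`;
}

// Sketch: download an artifact listed in the job response.
async function downloadArtifact(teeUrl, jobId, artifactId, jobIdSignature) {
  const res = await fetch(artifactUrl(teeUrl, jobId, artifactId), {
    headers: { "x-job-id-signature": jobIdSignature },
  });
  if (!res.ok) throw new Error(`Artifact download failed: ${res.status}`);
  // Return raw bytes; write them to disk (e.g. query_results.db) as needed.
  return Buffer.from(await res.arrayBuffer());
}
```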

The artifact (e.g., query_results.db) can now be used in your application.

Next Steps & Support

You have now successfully queried a decentralized dataset in a privacy-preserving way. You can plug the results from your downloaded artifact into an application, data pipeline, or AI agent framework.

Need help or want to find DataDAOs to work with?