Data Refinement & Indexing
The Vana Data Access Layer expands upon our existing data ingress processes to also include secure data refinement and indexing to support later permissioned query access by applications and application builders.
Data Refinement

Data Refinement Steps
Data refinement is the process of ensuring that ingested datasets meet verifiable quality and security standards for future access before decentralized storage in IPFS, directly integrated into a DataDAO's Proof-of-Contribution (PoC) mechanism.
The refinement process can be broken down into three clearly defined steps:
- Normalization to structure data to adhere to the on-chain schema definition and processing found in the DataRefinementRegistry contract.
- Masking of ingested data to optionally suppress any information DLP owners do not want to provide access to.
- Encryption of the end result to protect against unauthorized access with strictly defined access control mechanisms.
Once refined, the final data output is structured, masked (optional), and encrypted, forming a solid foundation for decentralized data commerce.
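
To make the three steps concrete, the sketch below chains them as pure functions over a single record. The schema, the masked fields, and the use of Fernet symmetric encryption are all assumptions for illustration, not the actual Vana refiner interface.

```python
"""Minimal sketch of the three refinement steps: normalize, mask, encrypt.
The schema, the masked fields, and Fernet encryption are assumptions for
illustration, not the actual Vana refiner interface."""
import json
from cryptography.fernet import Fernet

# Hypothetical schema; in practice it comes from the schema definition
# registered on-chain for the refiner.
SCHEMA = {"user_id": str, "country": str, "email": str, "activity_score": float}
MASKED_FIELDS = {"email"}  # fields the DLP owner chooses to suppress

def normalize(raw: dict) -> dict:
    """Step 1: coerce raw input into the schema's fields and types."""
    return {field: typ(raw[field]) for field, typ in SCHEMA.items()}

def mask(record: dict) -> dict:
    """Step 2 (optional): suppress fields the DataDAO will not expose."""
    return {k: ("***" if k in MASKED_FIELDS else v) for k, v in record.items()}

def encrypt(record: dict, key: bytes) -> bytes:
    """Step 3: encrypt the result so only permissioned access succeeds."""
    return Fernet(key).encrypt(json.dumps(record).encode())

key = Fernet.generate_key()
raw = {"user_id": 42, "country": "DE", "email": "a@b.co", "activity_score": "0.9"}
ciphertext = encrypt(mask(normalize(raw)), key)
```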
Why This Matters
- Standardized, structured datasets are easier to validate and integrate in end-user applications, increasing downstream value creation.
- Masking lets DataDAOs control which data is exposed to data consumers.
- Encryption ensures that even distributed datasets remain protected, improving overall ecosystem safety.
- Well-structured, indexed, privacy-preserving datasets support high-throughput access, making consuming applications more scalable and cost-effective.
Storage & Availability

Data Refinement & Storage
As part of a DataDAO's Proof-of-Contribution (PoC) process, refined data (normalized, optionally masked, and encrypted) is uploaded to a decentralized storage network chosen by the DataDAO, such as IPFS. The resulting content identifier (CID) is recorded on-chain within the corresponding file entity, tracking all file refinements across different schema versions.
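
For concreteness, the storage step might look like the sketch below, assuming a local IPFS daemon reachable via `ipfshttpclient` and a hypothetical `addRefinement(fileId, cid)` method on the registry contract; the actual contract interface may differ.

```python
"""Sketch: upload refined output to IPFS and record its CID on-chain.
Assumes a local IPFS daemon and a hypothetical addRefinement(fileId, cid)
method on the registry contract; the real ABI and method names may differ."""
import ipfshttpclient
from web3 import Web3
from web3.contract import Contract

def store_refinement(w3: Web3, registry: Contract, file_id: int, ciphertext: bytes) -> str:
    # Upload the encrypted, refined output; the IPFS daemon returns the CID.
    with ipfshttpclient.connect() as ipfs:
        cid = ipfs.add_bytes(ciphertext)
    # Record the CID against the file entity on-chain (hypothetical method).
    tx_hash = registry.functions.addRefinement(file_id, cid).transact()
    w3.eth.wait_for_transaction_receipt(tx_hash)
    return cid
```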
Recording the CID triggers an on-chain event, prompting the Query Engine to securely index the newly refined data into a centralized schema database running within a Trusted Execution Environment (TEE). The Query Engine then processes permissioned query requests, ensuring secure, controlled access to the refined dataset.
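
Conceptually, the indexing step is an event handler: fetch the ciphertext by CID, decrypt inside the TEE, and load the result into the schema database. In the sketch below, the event shape, the `fetch_from_ipfs` and `decrypt_in_tee` helpers, and SQLite standing in for the schema database are all illustrative assumptions.

```python
"""Sketch of the event-driven indexing step. The event shape and the
IPFS/decryption helpers are illustrative stand-ins, and SQLite stands in
for the TEE-hosted schema database."""
import json
import sqlite3

def fetch_from_ipfs(cid: str) -> bytes: ...          # placeholder: gateway fetch
def decrypt_in_tee(ciphertext: bytes) -> bytes: ...  # placeholder: TEE-held key

def handle_refinement_event(event: dict, db: sqlite3.Connection) -> None:
    # 1. Pull the encrypted, refined file referenced by the on-chain CID.
    ciphertext = fetch_from_ipfs(event["cid"])
    # 2. Decrypt inside the TEE; plaintext never leaves the enclave.
    record = json.loads(decrypt_in_tee(ciphertext))
    # 3. Insert into the schema database so permissioned queries can run.
    db.execute(
        "INSERT INTO activity (user_id, country, activity_score) VALUES (?, ?, ?)",
        (record["user_id"], record["country"], record["activity_score"]),
    )
    db.commit()
```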
This architecture guarantees:
- Immutable data integrity, ensuring tamper-proof, verifiable storage through IPFS and decentralized solutions.
- Granular data access control with strictly enforced on-chain permissioning mechanisms.
- Real-time data availability with event-driven indexing and aggregation for up-to-date, query-ready datasets.
- Resilient, fault-tolerant storage, providing high availability and failover protection through decentralized redundancy.
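
To make the access-control guarantee concrete, the Query Engine can consult the on-chain permissioning contract before executing anything. The `hasPermission` view below is an assumed interface for illustration, not a documented Vana contract method.

```python
"""Sketch: enforce on-chain permissions before running a query.
hasPermission is an assumed contract view; the real permissioning
interface is defined by the on-chain contracts."""
import sqlite3
from web3.contract import Contract

def run_permissioned_query(
    permissions: Contract, consumer: str, schema_id: int, sql: str, db: sqlite3.Connection
) -> list:
    # Check the consumer's on-chain grant for this schema before touching data.
    if not permissions.functions.hasPermission(consumer, schema_id).call():
        raise PermissionError(f"{consumer} has no access grant for schema {schema_id}")
    # Only permissioned requests ever reach the TEE-hosted schema database.
    return db.execute(sql).fetchall()
```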
DataDAO Data Refiners

Data Refiner Creation
Data refiners, defined by DataDAOs and DLP owners, serve as the primary processing units responsible for determining how data is normalized, as well as the structure of the resulting output.
There are two core components to this:
- The data refiner image, which, when executed as part of the Proof-of-Contribution (PoC) step, performs the data refinement on the raw input data. This is documented on-chain as the data refiner's "refinement instruction url" in the Data Refiner Registry contract.
- The schema definition, which is uploaded to IPFS and documents the structure of the refined dataset as a queryable database. The IPFS content identifier (CID) is documented on-chain as the data refiner's schema in the Data Refiner Registry contract.
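
As an illustration of the second component, a schema definition might describe the tables and columns of the queryable database before being pinned to IPFS. The JSON layout below is an assumption for the sketch; Vana's actual schema format may differ.

```python
"""Illustrative schema definition of the kind a refiner might publish.
The JSON layout is an assumption; the actual format is defined by the
Data Refiner Registry tooling."""
import json

schema_definition = {
    "name": "example_dlp_activity",  # hypothetical dataset name
    "version": "1.0.0",
    "dialect": "sqlite",
    "tables": {
        "activity": {
            "user_id": "TEXT",
            "country": "TEXT",
            "activity_score": "REAL",
        }
    },
}
# Serialized and pinned to IPFS; the resulting CID is what gets recorded
# as the refiner's schema in the Data Refiner Registry contract.
schema_json = json.dumps(schema_definition, indent=2)
```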