KYVE Fundamentals Article #3: Simplifying Access to Trustless Data

As more and more data enters the Web3 space, retrieving any specific piece of it becomes increasingly difficult, especially when the pieces are computationally dependent on one another.

One major impact is on synchronization times when nodes first join a blockchain network. On Ethereum, for example, synchronizing a full archival node can take weeks, as it requires syncing every block and transaction and recomputing the entire chain state. Ethereum’s switch to Proof of Stake also removed the incentive for nodes to hold the entire chain state. Without intervention, syncing threatens to become a multi-year endeavor.

However, synchronizing nodes is just one of the many areas affected by overcrowded data. One must also factor in the challenges of bringing off-chain data on-chain in a secure manner, or of communicating with Web2 data that’s locked in home-grown or closed-source solutions.

To ease the overall experience of accessing and working with data, we need a decentralized data-sourcing solution with built-in validation and access tooling around it. This is exactly what KYVE provides.

KYVE is a decentralized data lake that allows anyone to create customized pools of data. Validators in its network fetch the requested data, store it on Web3 storage providers like Arweave, make it public, and validate it in a decentralized way.

In the end, KYVE cuts out both tedious querying and the worry of whether the data you sourced is truly valid. To tap into these pools of valid testnet data, KYVE provides a no-code solution via Data Pipeline and, for those more familiar with code, a REST API.

No-Code Data Access

To allow anyone to access and implement trustless data, no matter their coding skills, KYVE has introduced Data Pipeline. This solution came about while analyzing the issue of overcrowded data, especially when it comes to sourcing data across Web2 and Web3.

For example, data analysts who need weather data, sports results, or Web3 data like transactions and token prices typically have to source from multiple APIs and sort through thousands of blocks, or rely on a querying service, whether in-house, an oracle, or an indexer. This makes fetching and implementing the data very time-consuming, and even then, they can’t be sure the data they sourced is fully accurate.

To ease this process, users can turn to Data Pipeline, which enables anyone to import KYVE’s trustless data into top data backends such as MongoDB, Google BigQuery, and SQL databases within just a few clicks! Data analysts and engineers alike can then easily implement KYVE data without worrying about its validity or reliability.

How does it work? Data Pipeline is a customized data source, or “connector,” built on Airbyte. It follows an ELT approach: users extract the KYVE data set of their choice, load it onto their preferred backend, and then transform it in whatever ways their use case requires. Find out more via our docs.

Overall, this upcoming product opens even more doors to new use cases for KYVE data, especially in Web2, bringing more simplicity to those working with data across the globe. Those who prefer to code their own solution for sourcing KYVE data can do so via the REST API.

Data Access Via the REST API

The REST API is a resource model that outlines the major details you need to know about a given piece of data, for example, where and how it’s stored. KYVE’s data is available through the REST API exposed by its chain nodes. This integration is a native, completely trustless way to access data: you code your own solution to get KYVE data (currently on testnet) into your own application and/or database. To get started, a list of REST endpoints can be found here.

Since KYVE stores data on specialized providers rather than natively on its own chain, developers must fetch the underlying data directly from the storage provider. The KYVE REST API holds the proof that the data on a storage provider is valid and can be retrieved trustlessly. A list of valid bundles is available at the kyve/query/v1beta1/finalized_bundles/[pool_id] path.
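As a rough sketch, here is how a developer might query that path from a KYVE chain node. The API host placeholder and the `finalized_bundles` response field name are assumptions for illustration; check the endpoint list in the docs for the current values.

```python
import json
import urllib.request


def finalized_bundles_url(api_base: str, pool_id: int) -> str:
    """Build the finalized-bundles query URL for a given pool."""
    return f"{api_base}/kyve/query/v1beta1/finalized_bundles/{pool_id}"


def parse_bundles(payload: str) -> list:
    """Pull the bundle records out of a finalized_bundles response.

    The 'finalized_bundles' field name is an assumption; verify it
    against the actual response shape.
    """
    return json.loads(payload).get("finalized_bundles", [])


# Usage (host is illustrative, not a confirmed endpoint):
# url = finalized_bundles_url("https://<kyve-node-api>", 0)
# with urllib.request.urlopen(url) as resp:
#     bundles = parse_bundles(resp.read().decode())
```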

From this, the developer gets the storage_provider_id, which points to a storage provider. In combination with the storage_id, developers can retrieve the uploaded data.

Some integrations might compress data before storing it on the storage provider. The compression_id indicates which compression method was used.

The following displays the general workflow of this process:

After the data has been retrieved from the storage provider and decompressed, developers can use it in their applications.
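The retrieve-and-decompress step can be sketched in a few lines. Note that the gateway mapping and the gzip compression id below are illustrative assumptions for this sketch, not KYVE’s canonical registry values:

```python
import gzip

# Illustrative mappings -- assumptions for this sketch, not on-chain values.
STORAGE_GATEWAYS = {1: "https://arweave.net"}  # storage_provider_id -> gateway
GZIP_COMPRESSION_ID = 1                        # assumed id for gzip


def bundle_url(storage_provider_id: int, storage_id: str) -> str:
    """Resolve the download URL for a bundle from its two ids."""
    return f"{STORAGE_GATEWAYS[storage_provider_id]}/{storage_id}"


def decompress(raw: bytes, compression_id: int) -> bytes:
    """Undo the compression applied before upload, if any."""
    if compression_id == GZIP_COMPRESSION_ID:
        return gzip.decompress(raw)
    return raw  # no known compression applied
```

With the URL resolved, the bundle can be downloaded with any HTTP client and passed through `decompress` before use.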

More Solutions to Come

KYVE’s goal is to help build a secure, scalable, and trustless data infrastructure for all, so our team is always looking for new ways to ease the data access and usage experience. With new data risks and KYVE use cases popping up each day, KYVE is just getting started on introducing new solutions that make the most of its decentralized data lake. Stay tuned for what else might come!

Interested in testing out the Beta version of KYVE’s Data Pipeline? Feel free to check it out. As always, we love hearing feedback from our community on how we can continue to improve the user experience, so don’t hesitate to let us know on our Discord.

Blog Author: Margaux, KYVE Head of Marketing