Introducing KYVE’s Data Pipeline: Web3’s No-Code Data Gateway

We’re thrilled to officially introduce KYVE’s upcoming new product: Data Pipeline, the no-code solution for providing Web2 and Web3 with trustless data.

Now officially in public beta, Data Pipeline allows KYVE data to be imported into any data backend supported by Airbyte within just a few clicks. Via a customizable ELT format, data analysts and engineers alike can easily work with KYVE data in a trustless way, no longer worrying about the validity or reliability of the data they source.

In this article, we’re going to break down all the unique elements of this product and how you can start testing it out today!

Introducing Data Pipeline

Connecting Web2 and Web3 Data

Today, there is a lack of communication between Web2 and Web3, causing roadblocks when bringing Web2 data on-chain, sourcing Web3 data for Web2, and even when exchanging data between Web3 projects themselves. It’s extremely tedious for indexers to query for the data they need, sourcing from multiple APIs and sorting through thousands of blocks, and even after all that effort, you can’t be 100% sure that the data gathered is truly accurate or up-to-date.

This inability to easily and reliably access trustless data in both Web2 and Web3 is a growing problem, increasing data manipulation and hacking risks within blockchains and dApps, skewing data analyses, and undermining the overall stability of data infrastructures. Having just one source for an entire bank of decentralized, validated data that can be imported into any data warehouse or backend would make working with data easier and more secure for all. This is where KYVE and its new Data Pipeline come in!

KYVE’s data lake provides customized pools of data where validators fetch the requested data, store it on Web3 storage providers like Arweave, and validate it in a decentralized way, ensuring that all data sourced from KYVE is truly reliable. There are a few ways to tap into this decentralized data lake, the easiest being Data Pipeline.

Data Pipeline is a no-code solution that enables anyone to pull data from KYVE’s data lake and import it into their preferred data warehouse, such as MongoDB, Google BigQuery, SQL Databases, and more. It is built with an ELT structure, allowing for the data to be customizable once loaded onto the desired data backend. Overall, Data Pipeline offers an easy and customizable access point for developers, researchers, and more to use KYVE’s trustless data, no longer needing to worry about reliably accessing the true data they need.

“Our goal was to make it as easy as possible for developers to integrate with KYVE. By creating this no-code solution, it makes it extremely straightforward for developers to onboard KYVE and integrate it into their local tech stack,” said Fabian Riewe, Co-Founder and CEO of KYVE.

Two major elements behind Data Pipeline’s ease of use are its ELT process and its use of Airbyte.

What Do We Mean By ELT?

A common way for developers, analysts, and others to source and implement data is via a set of processes called ETL (extract, transform, load). However, this order of operations limits its users: the data must be transformed before it is loaded onto their data backends, so any time a different transformation is needed, the entire process has to start over from the beginning.

Since KYVE’s data lake stores and validates all types of raw data that can be used in many different ways, it makes sense to go for an ELT (extract, load, transform) approach instead. This lets users keep the original data in their database and transform it in as many ways as they need, making it very flexible to their requirements.
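
The difference is easiest to see in code. Below is a minimal, illustrative ELT sketch in Python using an in-memory SQLite database as the “warehouse.” The records and schema are invented for the example and are not KYVE’s actual data format: raw data is loaded untouched first, and transformations happen afterwards, inside the database.

```python
import json
import sqlite3

# 1. Extract: raw records as they might arrive from a data source.
raw_blocks = [
    {"height": 1, "tx_count": 3},
    {"height": 2, "tx_count": 5},
]

conn = sqlite3.connect(":memory:")

# 2. Load: store the untouched raw JSON first, so the original
#    data remains available for any future transformation.
conn.execute("CREATE TABLE raw_data (record TEXT)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?)",
    [(json.dumps(b),) for b in raw_blocks],
)

# 3. Transform: shape the data on demand, inside the warehouse.
conn.execute("""
    CREATE VIEW block_summary AS
    SELECT json_extract(record, '$.height')   AS height,
           json_extract(record, '$.tx_count') AS tx_count
    FROM raw_data
""")
total = conn.execute("SELECT SUM(tx_count) FROM block_summary").fetchone()[0]
print(total)  # 8
```

Because the raw records stay in the database, switching to a different transformation only means defining a new view; nothing has to be re-extracted.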

Choosing Airbyte

Our goal in searching for a framework was to work with a tech stack that stays fully trustless and lets developers use their local machines, maintaining full control over what they can do with the data rather than relying on a third party, in line with our goal of providing decentralized data products. Airbyte, an open-source data-integration platform where you can easily set up a path for moving data from one place to another, provides exactly this, making it an easy choice on our side.

We decided not to build our own pipeline because we wanted to leverage the benefits that Airbyte already provides as well as tap into their already vast ecosystem. Thanks to this, KYVE’s data can easily be imported into all top data backends, such as Snowflake, BigQuery, S3, MongoDB, and more.

How It Works

Data Pipeline was designed to be as easy and fluid as possible for anyone to source and use the data they need in a trustless, efficient way. Whether you are technically skilled or not, this solution applies to developers, data analysts, researchers, and more, since it requires no code to implement. The product is built on Airbyte as a custom data source, so KYVE pools can be accessed through a connector set up by any user.

To use Data Pipeline, simply visit KYVE’s GitHub, download the code, and start it on your local machine. Once that’s ready, you can select a custom source pulling from one of KYVE’s data lake pools, customize the sync settings to best fit your needs, and let Data Pipeline do the rest!
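
For illustration, setting up the KYVE source in Airbyte comes down to filling in a small configuration. The field names, values, and endpoint below are hypothetical, shown only to convey the shape of such a setup, not the connector’s actual specification:

```json
{
  "pool_id": "0",
  "start_key": "0",
  "url_base": "https://api.kyve.network"
}
```

Once the source is configured, the destination (e.g. BigQuery or MongoDB) and the sync schedule are chosen in the same Airbyte interface.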

Visit our step-by-step guide to find out more. Further resources and use case examples will be released in the following weeks. Meanwhile, our Co-Founder and CEO, Fabian Riewe, gave a demo detailing these steps at Arweave Demo Day in Lisbon this November. If you want to learn more or have any questions, feel free to contact the KYVE team via our social media channels.

Projected Use Cases

There are countless ways Data Pipeline can be useful for data users and providers. Since anyone can access and implement it within a few clicks, users range from data analysts, researchers, software developers, and data engineers to companies that want to provide more streamlined, customizable access to their data for their teams or partners. Let’s break down a few of these examples…

Data analysts needing to source weather data, sports results, or even Web3 data like transactions, token pricing data, and so on can rely on Data Pipeline to provide them with constant, updated access to these data sets while assuring their validity in a trustless way thanks to KYVE’s decentralized data lake.

Web3 developers building a blockchain or dApp needing access to cross-chain data, for example, can rely on KYVE’s data lake to bring this data on-chain in a trustless way, making it easily accessible via Data Pipeline to implement into their preferred data backends/warehouses for more streamlined, secure building.

You can even imagine a global company with a large database it wants to make available to all of its partnered projects, helping boost the projects being built around it. For example, a company with a large base of climate data could provide it to partnered projects building climate-neutral solutions. This data can easily become a trustless data set on KYVE that can be sourced by all, in the format they need, via Data Pipeline.

Test Data Pipeline Yourself!

Data Pipeline is currently in public beta, connected to KYVE’s testnet, and open for anyone to test out! Please keep in mind that since KYVE is on testnet, we cannot guarantee the quality of the data. Data Pipeline’s official launch and the display of its full capabilities will follow KYVE’s mainnet launch.

Go ahead and test out Data Pipeline in its beta version today. As always, we love hearing feedback from our community on how we can continue to improve the user experience, so don’t hesitate to let us know on our Discord.

About KYVE

KYVE, the Web3 data lake solution, is a protocol that enables data providers to standardize, validate, and permanently store blockchain data streams. By leveraging permanent data storage solutions like Arweave, KYVE’s Cosmos SDK chain creates permanent backups and ensures the scalability, immutability, and availability of these resources over time.

KYVE’s network is powered by decentralized uploaders and validators funded by $KYVE tokens and aims to operate as a DAO in the near future. This past year, KYVE has gained major support and is currently backed by top VCs, including Hypersphere Ventures, Coinbase Ventures, Distributed Global, Mechanism Capital, CMS Holdings, IOSG Ventures, and blockchains such as Arweave, Avalanche, Solana, Interchain, and NEAR.

Join KYVE’s community: Twitter | Discord | Telegram
