Why Trustless Data and Robust Infrastructure Are Key to Web3's Success

Web3 has introduced groundbreaking solutions to Web2 challenges, such as decentralization, open-source frameworks, and trustless systems. However, even these innovations have overlooked pain points, most notably a failure to rely entirely on trustless, verified data, which leaves data feeds vulnerable to malicious attacks.

The Problem with Current Data Practices in Web3

Today, anyone can bring data on-chain and claim it’s true, creating a significant risk for those who might implement it into their project. The idea of a trustless environment emerged from the need to eliminate these untrustworthy sources, instead relying on network participants with distributed trust and incentivization to only bring forward valid data. In theory, this is a great process; however, improper implementation in the node infrastructure and overlooked details can easily lead to its own risk factors.

Take the Ronin Bridge attack of March 2022, for example. A hacker compromised five of the network's nine validator nodes, gaining the majority needed to approve a withdrawal of over 173.6k ETH and 25.5M USDC from the bridge. The Ronin Bridge team suspected nothing, as there was no process for questioning the true incentives of each node. The attack highlighted a vulnerability that remains relevant today: weak node infrastructure that enables takeovers.

This begs the question: how exactly can data be brought into future Web3 builds in a truly trustless way? And once on-chain, how can we validate this data in a decentralized way, ensuring no further risk of tampering or manipulation?

Vulnerabilities When Introducing Data

When projects source off-chain data, oracles are their go-to tool, as they provide an easy access point to deterministic Web2 data. However, whether or not an oracle claims to verify its data, there is an underlying issue: a lack of trustlessness. How can we be sure the data the oracle fetched came from the correct source and hasn't been tampered with along the way?

Why Oracles Aren’t Fully Trustless:

  • Data can be incorrect or manipulated before being fetched
  • Dependency on a single source creates a critical point of failure
  • Limited transparency in how data is verified and relayed

If a project automatically ingests incorrect data provided by an oracle without double-checking it, it becomes highly vulnerable to attacks, losses, improper pricing, and data-processing discrepancies, as it opens itself up to an attack vector from a single source.
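One common mitigation is to aggregate quotes from several independent feeds and take the median, so no single manipulated source can move the result. Below is a minimal sketch in Go; it is illustrative only and not any specific oracle's API:

```go
package main

import (
	"fmt"
	"sort"
)

// medianPrice aggregates quotes from several independent sources and
// returns the median, so a single manipulated feed cannot move the result.
// The second return value reports whether any quotes were supplied.
func medianPrice(quotes []float64) (float64, bool) {
	if len(quotes) == 0 {
		return 0, false
	}
	sorted := append([]float64(nil), quotes...)
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid], true
	}
	return (sorted[mid-1] + sorted[mid]) / 2, true
}

func main() {
	// Three honest feeds plus one manipulated outlier (250):
	// the median ignores the outlier entirely.
	price, ok := medianPrice([]float64{24, 25, 26, 250})
	fmt.Println(price, ok) // 25.5 true
}
```

A single compromised feed would need to corrupt a majority of sources to shift the median, which is exactly the property a lone oracle cannot offer.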

Mango Markets, a trading platform on Solana, for example, suffered a targeted attack in October 2022 that exploited oracle discrepancies. Two malicious accounts took an outsized position in MNGO-PERP, driving a 5–10x swing in the MNGO price. Two oracles then updated their MNGO benchmark accordingly, causing a mark-to-market increase in the position's value, all from unrealized profit.

As the Mango Markets team noted, “…neither oracle providers have any fault here. The oracle price reporting worked as it should have.” This clearly shows how limited oracles can be when it comes to providing “valid” data.

Another typical attack occurs when a hacker targets the majority of validator nodes in a network to approve certain actions, known as a 51% attack. This can happen if a network's node infrastructure lacks strong enough security or, simply, if there aren't enough nodes in general. Since all projects are vulnerable to this type of attack, it's crucial that they ensure proper decentralization within their node infrastructure.
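The arithmetic behind such an attack is simple: the attacker only needs the compromised validators' combined voting power to exceed half the network total. A minimal sketch (function and field names here are illustrative, not any specific chain's API):

```go
package main

import "fmt"

// hasMajority reports whether the combined stake of a set of compromised
// validators exceeds half of the network's total stake -- the condition a
// 51% attacker must reach to approve arbitrary actions.
func hasMajority(compromised []uint64, total uint64) bool {
	var sum uint64
	for _, s := range compromised {
		sum += s
	}
	return sum*2 > total
}

func main() {
	// Nine equally weighted validators, as in the Ronin example:
	// controlling five of them is enough, four is not.
	fmt.Println(hasMajority([]uint64{1, 1, 1, 1, 1}, 9)) // true
	fmt.Println(hasMajority([]uint64{1, 1, 1, 1}, 9))    // false
}
```

With few, equally weighted validators, the attack threshold is a handful of machines; raising the node count and distributing stake raises that bar.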

There are many ways to reduce the risk of a 51% attack. KYVE is heavily focused on this topic today, combining the right recipe of incentivization, high stake requirements, weighted voting power, and more to create a secure, fully trustless environment for introducing data.

Once this is achieved, the next hurdle comes: How can we make sure the data introduced into the space is truly correct?

Validation in a Decentralized Way

Since anyone can upload data and claim it's true, multiple conflicting sources of truth are a likely outcome. How do we ensure data accuracy in a trustless environment? The answer lies in decentralization.

Decentralization is a key pillar of the Web3 ethos, distributing power, trust, and responsibility among stakeholders and network participants. In general, there is no one generic way to determine whether a piece of data is valid; instead, developers create custom validation methods per data set. What's lacking is a way to manage these different runtimes and ensure that all data sets are properly sourced and validated quickly and efficiently.
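As an illustration, per-data-set validation logic can sit behind a common interface so that many runtimes are managed uniformly. This is a hypothetical sketch in Go; the interface and names are illustrative, not an actual runtime API:

```go
package main

import "fmt"

// Runtime is a hypothetical contract that every data-set-specific
// validation module implements, letting the network drive many
// different runtimes through one uniform interface.
type Runtime interface {
	Name() string
	// Validate compares a proposed piece of data against an
	// independently re-fetched copy.
	Validate(proposed, fetched []byte) bool
}

// blockRuntime validates raw block data by exact comparison
// with a re-fetched copy.
type blockRuntime struct{}

func (blockRuntime) Name() string { return "block-data" }

func (blockRuntime) Validate(proposed, fetched []byte) bool {
	return string(proposed) == string(fetched)
}

func main() {
	var r Runtime = blockRuntime{}
	fmt.Println(r.Name(), r.Validate([]byte("abc"), []byte("abc"))) // block-data true
}
```

Other data sets (price feeds, event logs, state snapshots) would supply their own `Validate` logic behind the same interface.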

Introducing Trustless Validation: KYVE's Approach

Enter KYVE, the decentralized data hub built to ensure all types of on- and off-chain data are validated, truly decentralized, and continuously updated, providing the tooling developers need to write these custom solutions. 

KYVE enables projects to store blockchain data in a decentralized way, organizing it into data pools whose contents are uploaded and verified by validators before use.

Here’s how it works:

  • Data Pool Creation: The process begins when participants, through KYVE’s governance system, initiate the creation of a data pool.
  • Data Upload, Fetching, and Bundling: Validators are selected through an algorithm to act as "uploaders." These uploaders fetch a specific range of blockchain data, bundle it together, and securely upload it to decentralized storage platforms such as Arweave or Filecoin. This process ensures that data is efficiently gathered and securely stored in a decentralized manner, making it accessible for future verification and use.
  • Storage Hash Sharing: Once the data is stored, the uploader generates a SHA256 checksum and a storage ID, which the other validators use to download the full bundle and compare it. The storage hash and other metadata are also validated.
  • Cross-Validation and Voting: The other protocol validators independently fetch the same range of data and verify it against both the uploader’s hash and the uploaded data, then vote on whether the bundled data is accurate. If consensus is reached, meaning the majority agree the data is correct, the bundle is finalized: the storage ID pointing to the correct data’s location, the storage hash, and other metadata are recorded permanently on the KYVE blockchain.
  • Incentives and Penalties: Validators that vote incorrectly—those who do not align with the majority—are penalized and face slashing of their stake. Similarly, uploaders who provide incorrect data bundles are also slashed. This entire process operates on a Delegated Proof of Stake (DPoS) system, ensuring that the network remains secure and data integrity is maintained.
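The hash-sharing and cross-validation steps above can be sketched in a few lines of Go. This is a simplified illustration of the checksum comparison, not KYVE's actual protocol code:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bundleChecksum returns the SHA256 hex digest of a raw data bundle --
// the value an uploader would publish alongside the storage ID.
func bundleChecksum(bundle []byte) string {
	sum := sha256.Sum256(bundle)
	return hex.EncodeToString(sum[:])
}

// verifyBundle is what a voting validator does: re-fetch the bundle
// from storage, recompute the digest, and compare it to the
// uploader's claimed checksum.
func verifyBundle(fetched []byte, claimed string) bool {
	return bundleChecksum(fetched) == claimed
}

func main() {
	bundle := []byte(`{"fromHeight":100,"toHeight":199}`)
	claimed := bundleChecksum(bundle)

	fmt.Println(verifyBundle(bundle, claimed))                  // true: vote valid
	fmt.Println(verifyBundle([]byte("tampered data"), claimed)) // false: vote invalid
}
```

Because SHA256 is collision-resistant, a validator who recomputes a matching digest over independently fetched data has strong evidence the uploader stored exactly what it claimed.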

In each pool on the KYVE data lake, one node is responsible for uploading the data, while the rest are accountable for voting on whether that data is valid. Once the vote is final, the uploading responsibility passes to another randomly selected node. This combats the risk of centralization: if only one node uploaded data at all times, it would be a far easier target for an attack.

Below you can see KYVE’s current code for evaluating the vote distribution:

func (k Keeper) GetVoteDistribution(ctx sdk.Context, poolId uint64) (voteDistribution types.VoteDistribution) {
	bundleProposal, found := k.GetBundleProposal(ctx, poolId)
	if !found {
		return
	}

	// get $KYVE voted for valid
	for _, voter := range bundleProposal.VotersValid {
		// if the valaccount was found, the voter is active in the pool
		if _, foundValaccount := k.stakerKeeper.GetValaccount(ctx, poolId, voter); foundValaccount {
			delegation := k.delegationKeeper.GetDelegationAmount(ctx, voter)
			voteDistribution.Valid += delegation
		}
	}

	// get $KYVE voted for invalid
	for _, voter := range bundleProposal.VotersInvalid {
		// if the valaccount was found, the voter is active in the pool
		if _, foundValaccount := k.stakerKeeper.GetValaccount(ctx, poolId, voter); foundValaccount {
			delegation := k.delegationKeeper.GetDelegationAmount(ctx, voter)
			voteDistribution.Invalid += delegation
		}
	}

	// get $KYVE voted for abstain
	for _, voter := range bundleProposal.VotersAbstain {
		// if the valaccount was found, the voter is active in the pool
		if _, foundValaccount := k.stakerKeeper.GetValaccount(ctx, poolId, voter); foundValaccount {
			delegation := k.delegationKeeper.GetDelegationAmount(ctx, voter)
			voteDistribution.Abstain += delegation
		}
	}

	voteDistribution.Total = k.delegationKeeper.GetDelegationOfPool(ctx, poolId)

	if voteDistribution.Total == 0 {
		// if total voting power is zero, no quorum can be reached
		voteDistribution.Status = types.BUNDLE_STATUS_NO_QUORUM
	} else if voteDistribution.Valid*2 > voteDistribution.Total {
		// if more than 50% of stake voted valid, quorum is reached
		voteDistribution.Status = types.BUNDLE_STATUS_VALID
	} else if voteDistribution.Invalid*2 >= voteDistribution.Total {
		// if 50% or more of stake voted invalid, quorum is reached
		voteDistribution.Status = types.BUNDLE_STATUS_INVALID
	} else {
		// if neither valid nor invalid reached the threshold, no quorum was reached
		voteDistribution.Status = types.BUNDLE_STATUS_NO_QUORUM
	}

	return
}

Lastly, to incentivize good node behavior and maintain a proper flow of valid data, we introduced specific pool economics. Put simply, those who require direct, easy access to trustless data act as “funders”, supplying $KYVE tokens as rewards for well-behaved pool participants. There are also “delegators”, who delegate their tokens to support nodes in exchange for token rewards; however, if a node misbehaves, the tokens backing it get slashed.
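These pool economics can be sketched as a toy model: funders top up a reward pot, honest participants draw from it, and misbehaving participants lose a fraction of their stake. The amounts and the 10% slash fraction below are illustrative, not KYVE's actual parameters:

```go
package main

import "fmt"

// pool is a toy model of funding, rewards, and slashing.
type pool struct {
	funds  uint64            // $KYVE supplied by funders
	stakes map[string]uint64 // stake (own plus delegated) per participant
}

// fund lets a funder top up the reward pot.
func (p *pool) fund(amount uint64) { p.funds += amount }

// reward pays an honest participant out of the funders' pot,
// capped by what the pot actually holds.
func (p *pool) reward(addr string, amount uint64) {
	if amount > p.funds {
		amount = p.funds
	}
	p.funds -= amount
	p.stakes[addr] += amount
}

// slash burns a fraction of a misbehaving participant's stake.
func (p *pool) slash(addr string, fraction float64) {
	cut := uint64(float64(p.stakes[addr]) * fraction)
	p.stakes[addr] -= cut
}

func main() {
	p := &pool{stakes: map[string]uint64{"honest": 1000, "cheater": 1000}}
	p.fund(500)
	p.reward("honest", 50)
	p.slash("cheater", 0.10)
	fmt.Println(p.funds, p.stakes["honest"], p.stakes["cheater"]) // 450 1050 900
}
```

The design intent is symmetry: valid work steadily earns from the funders' pot, while invalid work destroys stake, so honest participation is the profitable strategy.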

To further improve the funding experience and enable more collaboration, the multi-coin funding update now allows data pools to be funded in $KYVE and other tokens, letting other projects fund their integrations on KYVE with their own tokens.

With such initiatives, KYVE is constantly envisioning new ways to keep the KYVE Network and its overall infrastructure highly incentivized and fully decentralized, in service of its mission: providing truly trustless data for building secure and scalable Web3 projects.

Looking Forward

In 2024, the global user base for digital currencies reached 562 million people, up from 420 million in 2023 (Source: Triple A). With global Web3 adoption soaring, there's no doubt we have a long way to go, and that is exactly why it's time to focus on creating a secure baseline for all the projects to come.

Building with trustless data validated in a decentralized way is a necessity for developers sourcing data for their dApp and/or blockchain. Doing so decreases a project's risk of data manipulation and attacks and contributes to improving Web3's data foundation.

Follow KYVE’s journey in taking a lead role in this movement, enabling everyone to easily access trustless data validated in a decentralized way via our data lake protocol, eliminating data doubt and heavy lifting for builders, node runners, and more.

For more information on KYVE, including how to get started and technical resources, visit the KYVE documentation.

Join KYVE’s community: Twitter | Discord | Telegram

Blog Author: Margaux, KYVE Head of Marketing