Data Availability is Not Data Storage

Recall this grade school experience: you raise your hand and ask, “Can I go to the bathroom?” Your teacher responds, “I don’t know. Can you?” It might seem far-fetched, but this is a perfect entry point to understanding the difference between data availability and data storage.

Let's bring this analogy closer to the subject at hand and say Google Drive is acting like your teacher. You upload a photo, and the next day, you want to show that photo to a friend. You ask Google, “Can you show me the photo I uploaded yesterday?”

Imagine if Google responded, “I mean, I have it available. I caaaan show you that photo,” and then just sent you a face cutout from the photo as proof.

You would rightfully be a bit confused. You asked to download your photo, not proof from Google that they have your photo.

The thing is, that’s the core function data availability blockchains perform. All we ask is that they provide us with proof that the chain has the data available if we need it. We don’t actually want to download all of the data from them unless we have to.

Data availability chains like Avail allow users (other blockchains) to upload data, and at a later date, simply check that all their data is available without actually retrieving the contents of the data itself.


This is a very different task from what decentralized storage networks like Arweave, IPFS, Filecoin, and Sia are asked to perform.

Where decentralized storage networks like Arweave let end users store and retrieve files, Avail is designed to let other chains publish their chain's activity to the Avail blockchain and later check that it is available.

Light clients benefit the most from using Avail. Their goal is to avoid downloading data unless they have to. The more data they need to download, the more resource-intensive it is to run a light client.


Avail can provide a mathematical proof that "the data you're looking for is still here if you need it."

While that explains the differences between storage and availability, the question remains: why would you want just a guarantee of availability at all? The answer is security.

Proof that the data is available is enough for light clients to be confident that no one is hiding any suspicious activity. If it's available, it is by definition not hidden. Knowing it’s not hidden is all these light clients are looking for, because hidden data is what allows for "data withholding attacks".

What Are Data Withholding Attacks?

Data withholding attacks describe a scenario where malicious validators vote to add a block containing invalid or missing transactions to a chain. While full nodes can immediately see that the block contains an error, light clients can be fooled, since they look only at block headers, which are written in part by the validators.

One fix would be for light clients to download all of a block's data in order to verify correctness. But this would turn the light client into a full node, increasing resource requirements to participate in the network.

A better fix? Blockchains can upload their transaction data to Avail. Avail processes uploaded data using techniques like erasure coding and KZG commitments. Because of this processing, light clients are very likely to detect missing data just by requesting a few random kilobytes from each block.
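To make erasure coding a little more concrete, here is a toy sketch in Python. It is not Avail's actual scheme: it uses a single XOR parity chunk rather than the more capable codes production systems rely on, and the helper names (`encode`, `recover`) are made up for illustration. What it shows is the basic idea that added redundancy lets the original data be rebuilt even when a piece goes missing, so hiding one small piece is not enough to hide anything.

```python
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d0: bytes, d1: bytes) -> List[bytes]:
    """Extend two data chunks with one parity chunk (toy code, hypothetical helper)."""
    if len(d0) != len(d1):
        raise ValueError("chunks must have equal length")
    return [d0, d1, xor_bytes(d0, d1)]


def recover(shares: List[Optional[bytes]]) -> Tuple[bytes, bytes]:
    """Rebuild both data chunks from any two of the three shares."""
    if len(shares) != 3:
        raise ValueError("expected exactly three shares")
    if sum(s is None for s in shares) > 1:
        raise ValueError("too much data withheld to recover")
    d0, d1, parity = shares
    if d0 is None:
        d0 = xor_bytes(d1, parity)   # d1 XOR (d0 XOR d1) == d0
    elif d1 is None:
        d1 = xor_bytes(d0, parity)   # d0 XOR (d0 XOR d1) == d1
    return d0, d1


shares = encode(b"tx-A", b"tx-B")
shares[0] = None                     # one chunk is withheld
print(recover(shares))               # (b'tx-A', b'tx-B')
```

In this toy version, withholding any single chunk achieves nothing. Only withholding two of the three makes the data unrecoverable, and a withholding that large is much easier to catch by random sampling.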

The process of sampling those few random kilobytes can be thought of as light clients checking to make sure Avail is not lying when it says it has the data available. By sampling, they ask, “Do you have all the transaction data available if I were to need it?” If those samples all come back positive, the light client can be statistically confident that the rest of the data is there if needed.
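How few samples is "a few"? Here is a back-of-the-envelope sketch. It assumes (this is our illustration, not a published Avail parameter) that the erasure coding forces an attacker to withhold at least half of the extended data to make a block unrecoverable, and that each sample is picked independently at random. Under those assumptions, each sample has at most a 50% chance of landing on a piece that is still there, so the chance that every one of k samples succeeds while data is being withheld shrinks as 0.5^k. The function names are hypothetical.

```python
import math


def availability_confidence(samples: int, withheld_fraction: float = 0.5) -> float:
    """Chance that at least one of `samples` random queries hits withheld data.

    `withheld_fraction` is the assumed minimum share of the extended data an
    attacker must hide (0.5 is an illustrative assumption).
    """
    if samples < 0:
        raise ValueError("samples must be non-negative")
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be between 0 and 1")
    return 1.0 - (1.0 - withheld_fraction) ** samples


def samples_needed(target_confidence: float, withheld_fraction: float = 0.5) -> int:
    """Smallest number of samples that reaches the target confidence."""
    if not 0.0 < target_confidence < 1.0:
        raise ValueError("target_confidence must be strictly between 0 and 1")
    if not 0.0 < withheld_fraction < 1.0:
        raise ValueError("withheld_fraction must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_confidence) / math.log(1.0 - withheld_fraction))


print(availability_confidence(7))   # 0.9921875
print(samples_needed(0.99))         # 7
```

Under these assumptions, seven successful samples already put the light client above 99% confidence, and every extra sample halves the remaining doubt.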

This lets light clients reach guarantees of data availability all on their own without the need to trust validators, and without making themselves subject to data withholding attacks.

Contrast Avail’s use with decentralized storage. Users of storage services ask, "Hey, I want to see my photo," and they expect to have all of that data explicitly retrieved and returned.

All that is to say that Avail does not compete with decentralized storage providers like Arweave, IPFS, or Filecoin.


We hope you’re as excited about the modular blockchain future as we are. If you want to learn more about Avail, or just want to ask us a question directly, we would love to hear from you. Check out our repository or join our Discord server.

This article was originally published on Polygon's official blog.