Replacing HTTP: A Brief Summary of IPFS

Before investing heavily in the IPFS infrastructure, you’ll need to wade through the misconceptions and the FUD (Fear, Uncertainty, and Doubt). I’m here to help you. It’s important to know what it really means when you see phrases like “permanently stored” and “decentralized”, and I hope to help you understand how IPFS (InterPlanetary File System) works in general.

This way, you’ll be able to make the best decision regarding the stack that runs your services or application, and you’ll even learn some interesting tidbits on how IPFS can be used.

A Brief Summary of IPFS

There are a few moving parts that are included under the IPFS umbrella. The term “umbrella” is used because it isn’t simply about storage. The umbrella actually contains a collection of other protocols that aim to replace HTTP. I’ll explain some of the key parts below:

Gateway

A gateway is an instance that various contributors often provide to the public. It serves as part of the backbone of the “decentralized” aspect of the network. The gateway routes the client to the node(s) that have the content, and it also acts as a cache that sits in front of the nodes themselves. Requests made through a gateway are answered from its cache when possible and are otherwise retrieved from a node; a client that runs its own node can request content from peers directly.

In CDN terms, a gateway can be thought of as an edge router.
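To make the caching role concrete, here is a minimal sketch of a gateway as a cache in front of a node. It is a toy model, not how a real gateway is implemented: `ToyGateway` and `fetch_from_nodes` are names I made up for illustration, and routing is reduced to a single lookup function.

```python
from typing import Callable, Dict, Optional


class ToyGateway:
    """Toy model of a gateway: a cache sitting in front of content-holding nodes."""

    def __init__(self, fetch_from_nodes: Callable[[str], Optional[bytes]]) -> None:
        # fetch_from_nodes is a hypothetical stand-in for routing a request to a node.
        self._fetch_from_nodes = fetch_from_nodes
        self._cache: Dict[str, bytes] = {}

    def get(self, content_hash: str) -> Optional[bytes]:
        # Serve from the cache when possible, so no node has to be contacted.
        if content_hash in self._cache:
            return self._cache[content_hash]
        data = self._fetch_from_nodes(content_hash)
        if data is not None:
            self._cache[content_hash] = data
        return data


origin = {"hash-of-hello": b"hello"}   # content held by a node
gateway = ToyGateway(origin.get)
first = gateway.get("hash-of-hello")   # fetched from the node, then cached
origin.clear()                         # the node goes offline
second = gateway.get("hash-of-hello")  # still served, from the gateway cache
print(first == second)  # True
```

The second request succeeds only because the first one populated the cache. Content that was never requested through the gateway would be unreachable once the node is gone.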

Node

A node is basically a self-hosted instance that you own and that is connected to the network. It allows users to access files that you’ve made available through your node via the add/publish commands. For your files to be accessible, the node has to be online. Caching (via gateways) lets users still reach your content when a copy is cached, by requesting it from other peers that may have it.

You can think of this as how BitTorrent works, in a sense. If another peer has the content you’ve made available to the network, others can grab it from them. However, if there are no peers with that content, then no one can access it if your node is offline.

Since this can still produce a single point of failure, though, IPFS can fall into the “not really decentralized” bucket.
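The availability rule described above can be sketched in a few lines. This is a simplified model under my own assumptions: `Peer` and `is_available` are hypothetical names, and real peer discovery is far more involved than checking a list.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Peer:
    name: str
    online: bool = True
    hashes: Set[str] = field(default_factory=set)  # content this peer holds


def is_available(content_hash: str, peers: Iterable[Peer]) -> bool:
    """Content is reachable only if at least one online peer holds a copy."""
    return any(p.online and content_hash in p.hashes for p in peers)


mine = Peer("my-node", hashes={"abc"})
other = Peer("other-peer")

print(is_available("abc", [mine, other]))  # True: my node is online
mine.online = False
print(is_available("abc", [mine, other]))  # False: nobody else has a copy
other.hashes.add("abc")
print(is_available("abc", [mine, other]))  # True: another peer now holds it
```

The middle case is the single point of failure: with one holder, the content is only as available as that one node.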

Hash

There are two different types of hashes to take note of: IPFS and IPNS.

IPFS Hash

This is the hash of the content you’ve added to your node. It’s also used to access the file through a gateway or from your node. The hash is derived from the contents of the file, so it is unique to that content. If the file hasn’t been modified, the hash remains the same. If it has been changed, a new hash is generated.
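Here is a small sketch of that behavior using a plain SHA-256 digest as a stand-in. A real IPFS hash encodes more than a raw digest, so the values below will not match what your node prints; the point is only that the hash is a function of the bytes. `content_hash` is a helper name I chose for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex SHA-256 digest as a simplified stand-in for an IPFS hash."""
    return hashlib.sha256(data).hexdigest()


original = content_hash(b"hello world")
unchanged = content_hash(b"hello world")
modified = content_hash(b"hello world!")

print(original == unchanged)  # True: same bytes, same hash
print(original == modified)   # False: changed bytes, new hash
```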

This also means that users will need to know the new hash to be able to access the updated file. A lot of tutorials cover pointing directly to a hash, but there isn’t a lot written about accessing updated data.

That is what IPNS is for.

IPNS Hash

This is the hash that’s associated with your node. Each node instance has its own IPNS hash, which is also its peer identity hash. This is generated upon initialization, when first setting up the node. What is this typically used for, you might ask? It’s so that you can actually have a namespace on the IPFS network that allows you to access published content and always have it pointed to the latest version of the files.

The two differ in how they handle change. An IPFS hash is regenerated whenever the content’s data changes, which means the new hash has to be provided before anyone can access the update. This is also how you avoid grabbing stale content. An IPNS hash, by contrast, is a single “static” hash that can be updated to point to the latest published content.

In technical terms, an IPNS Hash can be described as both a static address, since it is configured to always point to a specific node, and a unique address, allocated to always point to content published to it.
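One way to picture this is as a lookup table from a stable name to the latest content hash. The sketch below is a toy model only: `ToyNameSystem` is a hypothetical name, the hashes are placeholders, and it leaves out how real IPNS records are signed and distributed.

```python
from typing import Dict


class ToyNameSystem:
    """Toy model of IPNS: one stable name per node, pointing at the latest content hash."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def publish(self, node_name: str, content_hash: str) -> None:
        # Re-publishing overwrites the pointer; the name itself never changes.
        self._records[node_name] = content_hash

    def resolve(self, node_name: str) -> str:
        return self._records[node_name]


names = ToyNameSystem()
names.publish("my-node-id", "hash-v1")
names.publish("my-node-id", "hash-v2")  # content changed, so a new hash is published
print(names.resolve("my-node-id"))  # hash-v2
```

Anyone holding `my-node-id` keeps using the same address and lands on the newest hash, which is exactly what a bare IPFS hash cannot offer.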

“Permanently Stored” and “Decentralized”

Here’s the thing. You’re going to find a lot of information about how to host your data on IPFS “forever” and for “free”. Don’t fall into the trap of hype and buzzwords.

Your IPFS content can be considered truly decentralized only if multiple sources have it pinned and stored on their IPFS nodes. Otherwise, the only “data” that is decentralized is the IPFS hash(es) of the files added to the network, and not the actual content itself.

The metadata that constitutes the network, which is built up of hashes, is distributed across the available nodes and is thus decentralized.

This metadata is used to look up which node contains the content so that it can be served to the client that requests it. It is similar to a blockchain in that everyone who joins the network holds a part of the “chain” (the metadata, in our case) and shares that data with other peers connected to the network.

What Is Pinning?

The IPFS protocol has a concept of “garbage collection”, which removes content that isn’t pinned from a node’s storage. This affects how content is stored on the IPFS network because, after all, resources are finite.

When content is “pinned”, it lets the IPFS instance know that it shouldn’t be removed from its storage when a garbage collection runs. It is important to know that this only affects the node(s) that the action is performed on.

This means that any node(s) that didn’t explicitly pin content would have that content purged from its storage during the next garbage collection.

You can run into a state where, if there are no nodes that have the content pinned, the content is no longer available.
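The interaction between pinning and garbage collection can be sketched as follows. This is a simplified model with names of my own choosing (`ToyNode`, `collect_garbage`); it is not the actual IPFS implementation, and it treats each piece of content as a single entry.

```python
from typing import Dict, Set


class ToyNode:
    """Toy model of one node's storage with pinning and garbage collection."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.pinned: Set[str] = set()

    def store(self, content_hash: str, data: bytes) -> None:
        self.blocks[content_hash] = data

    def pin(self, content_hash: str) -> None:
        if content_hash not in self.blocks:
            raise KeyError(content_hash)
        self.pinned.add(content_hash)

    def collect_garbage(self) -> None:
        # Only this node's unpinned content is dropped; other nodes are unaffected.
        self.blocks = {h: d for h, d in self.blocks.items() if h in self.pinned}


a, b = ToyNode(), ToyNode()
for node in (a, b):
    node.store("abc", b"data")
a.pin("abc")  # pinned on node a only

a.collect_garbage()
b.collect_garbage()
print("abc" in a.blocks, "abc" in b.blocks)  # True False
```

If node `a` had not pinned the content either, both copies would be gone after garbage collection and the hash would point at nothing.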

A Word of Caution

IPFS is configured by default to scan the local network to discover additional nodes/gateways. Depending on your hosting provider, they may conclude that your server is infected and is maliciously sending random packets on the network or scanning for ports.

This happened in my case, and it almost led to my provider taking down the server.

You should always be careful when setting up your own instance and choose saner defaults so that you don’t run into this issue. IPFS comes with a handful of configuration profiles that can be applied for your specific use case.

For what it’s worth, local-discovery is enabled by default if you run the ipfs init command without any additional flags. This isn’t something most people mention in tutorials and it’s a gotcha I ran into while looking into configuration options.
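As a rough illustration of what turning this off looks like, the sketch below flips the local discovery switch in a config loaded as a dictionary. It assumes the setting lives under `Discovery.MDNS.Enabled` in the node’s JSON config, which is how I understand the go-ipfs config to be laid out; check the configuration docs for your version before relying on it. The helper name `disable_local_discovery` is mine. In practice, initializing with the server profile (`ipfs init --profile server`) should achieve this without editing the config by hand.

```python
import copy
import json
from typing import Any, Dict


def disable_local_discovery(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with local (mDNS) discovery turned off."""
    updated = copy.deepcopy(config)
    # Assumed key layout: Discovery -> MDNS -> Enabled.
    updated.setdefault("Discovery", {}).setdefault("MDNS", {})["Enabled"] = False
    return updated


# A stripped-down sample config; a real one has many more sections.
sample = {"Discovery": {"MDNS": {"Enabled": True}}}
print(json.dumps(disable_local_discovery(sample), indent=2))
```

The function returns a modified copy instead of editing in place, so you can compare the result against the original before writing anything back to disk.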

Conclusion

There isn’t anything wrong with IPFS, or using it. It’s just that there’s quite a bit of misleading information out there on what it does and how it works.

This makes it hard to understand at a cursory glance. Hopefully this somewhat high-level explanation gives you enough detail to either make an informed decision or gather more knowledge about the workings of IPFS.

As always, it’s recommended to take a look at the IPFS documentation, since the protocol is still considered alpha and is in active development.
