Permissioned blockchain frameworks share one common challenge: data handling. How you handle data is the key factor in getting a blockchain solution running and scaling. It is therefore important to plan up front which data you want to store on the ledger and which off it. In the blockchain world, we distinguish between on-chain and off-chain data. Note that we are talking about data handling here; don't confuse this with off-chain transactions.
In this article, I will discuss four points that should be clarified during the requirements analysis of a use case, before you start implementing something that leads to a dead end. But first, let us look at the difference between on-chain and off-chain data.
On-chain data is any information that is saved directly to the ledger and shared across the blockchain network. This is the data that is transparent and immutable in the system and that each transaction produces as its output. The off-chain database holds data that belongs to the ledger data as additional attributes but is neither saved in the blockchain nor shared across the network. For example, you save the serial number of an airplane spare part on-chain, but the price agreed with the supplier is saved in an off-chain database and is visible only to you and your supplier.
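To make the split concrete, here is a minimal Python sketch of the spare-part example. The record layout, the field names (such as `part_type`), and the `split_record` helper are hypothetical; in a real solution the on-chain part would be submitted through your framework's SDK and the off-chain part written to your own database, and both of those steps are left out here.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical record types for the spare-part example; field names are illustrative.
@dataclass(frozen=True)
class OnChainPart:
    """Attributes written to the shared ledger (visible to all participants)."""
    serial_number: str
    part_type: str


@dataclass
class OffChainPartDetails:
    """Attributes kept in your own database, linked via the serial number."""
    serial_number: str
    agreed_price_eur: float
    supplier: str


def split_record(record: dict) -> tuple:
    """Split one business record into its on-chain and off-chain parts."""
    on_chain = OnChainPart(record["serial_number"], record["part_type"])
    off_chain = OffChainPartDetails(
        record["serial_number"], record["agreed_price_eur"], record["supplier"]
    )
    return on_chain, off_chain


record = {
    "serial_number": "SN-4711",
    "part_type": "hydraulic pump",
    "agreed_price_eur": 12500.0,
    "supplier": "Example Supplier Ltd",
}
on_chain, off_chain = split_record(record)

# Only this JSON would be submitted in a transaction.
print(json.dumps(asdict(on_chain)))
# The off-chain part stays in your own database, keyed by the same serial number.
print(off_chain)
```

Keeping the serial number in both parts is what lets you join the ledger history with your private attributes later.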
The following points cover how to decide which data belongs on the ledger and which does not. They can be adapted to any blockchain use case to support the data modeling.
1. Size always matters
Does size really matter? Yes, it does. This obvious point shouldn't surprise you if you have already worked with blockchain solutions.
Regardless of the blockchain technology (Hyperledger, Ethereum, Corda, etc.), choosing an adequate block size is key to reaching good performance in the network. If you put all your data into the blockchain, as you would in a classical database solution, it will get slow, consensus between the participants will take longer, and the overall cost of a transaction will increase. In short, it will break your neck.
So I suggest you reduce the amount of information in each transaction. Stick to the goal of your use case and identify which information really needs to be in the blockchain. If you start storing documents, pictures, or other large files on the ledger, this will block the scalability of your solution. Keep small, simple JSON objects on-chain and move larger files off-chain.
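One common way to keep a large file off the ledger while still being able to prove later that it was not altered is to store only its SHA-256 digest on-chain. The sketch below assumes a simple in-memory dict as a stand-in for your off-chain storage; `anchor_document`, `verify_document`, and the document ID are hypothetical names for illustration, not part of any blockchain SDK.

```python
import hashlib
import json

# Hypothetical stand-in for off-chain storage (a database or object store you operate).
off_chain_files = {}


def anchor_document(doc_id: str, content: bytes) -> dict:
    """Store the file off-chain and return a small JSON-serialisable on-chain record."""
    off_chain_files[doc_id] = content
    return {
        "doc_id": doc_id,
        # 64 hex characters, no matter how large the file is
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def verify_document(on_chain_record: dict) -> bool:
    """Check that the off-chain copy still matches the hash anchored on-chain."""
    content = off_chain_files[on_chain_record["doc_id"]]
    return hashlib.sha256(content).hexdigest() == on_chain_record["sha256"]


# A 5 MB placeholder standing in for a scanned certificate or picture.
certificate = b"\x00" * (5 * 1024 * 1024)
record = anchor_document("cert-SN-4711", certificate)

payload = json.dumps(record)
print(len(payload), "bytes on-chain vs", len(certificate), "bytes off-chain")
print(verify_document(record))  # True while the off-chain file is untouched
```

If someone later modifies the off-chain file, `verify_document` returns `False`, so the ledger still gives you tamper evidence without carrying the file itself.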
2. Do you really want immutability?
One of the main characteristics of a blockchain is that it makes your data immutable. In short, no party can change the data within the blockchain, and the history of each asset remains available as long as the blockchain is running.
Based on the use case you are trying to solve, focus on the information that needs to be immutable on-chain. This also means there will be information for which immutability is not necessary. Move that information out of the ledger into your off-chain database to improve the performance of your solution. Don't waste processing power on data that doesn't have to be immutable.
3. Sharing is caring
Remember, all information in the blockchain is shared with every participant. Some frameworks offer privacy concepts, such as Private Data Collections in Hyperledger Fabric, but the basic behavior is the same for all. At this point, you have to figure out which information should be shared with others.
You and your competitors may well participate in the same blockchain network. You should therefore move attributes such as product prices, supplier information, and other internal information to an off-chain database that you manage yourself. You don't want data like that on your competitor's copy of the ledger, do you?
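Once the price lives off-chain, you still need to control who can read it, for example only you and your supplier. Here is a minimal sketch of an access-controlled off-chain store, assuming a single organisation operates it and each entry carries an explicit reader list; `OffChainStore` and the participant names are hypothetical, and a production system would rely on your database's own authentication and authorisation.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateEntry:
    value: dict
    readers: frozenset  # participants allowed to read this entry


@dataclass
class OffChainStore:
    """Hypothetical store run by one organisation; it is never replicated to the network."""
    entries: dict = field(default_factory=dict)

    def put(self, key: str, value: dict, readers: set) -> None:
        self.entries[key] = PrivateEntry(value, frozenset(readers))

    def get(self, key: str, requester: str) -> dict:
        entry = self.entries[key]
        if requester not in entry.readers:
            raise PermissionError(f"{requester} may not read {key}")
        return entry.value


store = OffChainStore()
store.put("SN-4711", {"agreed_price_eur": 12500.0}, readers={"AirlineA", "SupplierX"})

print(store.get("SN-4711", "SupplierX"))  # the supplier can read the agreed price
try:
    store.get("SN-4711", "AirlineB")  # a competitor on the same network cannot
except PermissionError as err:
    print(err)
```

The ledger can still carry the serial number as the shared key, while the reader list decides who sees the commercial details.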
4. Don't underestimate regulations
The last point I want to share is that you have to consider the data protection laws in the places where your solution runs now and where it will run when you scale. In Europe, we have the General Data Protection Regulation (GDPR), which gives individuals the right to request the erasure of their personal data.
You can imagine how painful this is for a technology such as blockchain, where transparency and provenance are among the main benefits. When this law came into force, information handling within blockchains changed dramatically. Personal data can no longer be included in the ledger, and this also applies to data from which personal data can be inferred. For example, an employee number shouldn't be part of the ledger, because it can still be linked to a person even after you have deleted their other personal data.
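A simple safeguard is to reject any transaction payload that contains fields classified as personal data before it is submitted. The sketch below assumes a hypothetical `PERSONAL_FIELDS` list produced by your own data protection review; it only checks top-level field names, so it cannot catch personal data hidden in free-text values and does not solve the deletion problem by itself.

```python
# Hypothetical list of fields your data protection review classified as personal data
# or as data that can be linked back to a person (e.g. an employee number).
PERSONAL_FIELDS = {"employee_number", "name", "email", "phone"}


def assert_no_personal_data(payload: dict) -> dict:
    """Reject a transaction payload before submission if it contains personal data fields."""
    found = PERSONAL_FIELDS.intersection(payload)  # compares against the payload's keys
    if found:
        raise ValueError(f"personal data must stay off-chain: {sorted(found)}")
    return payload


assert_no_personal_data({"serial_number": "SN-4711", "status": "installed"})  # passes

try:
    assert_no_personal_data({"serial_number": "SN-4711", "employee_number": "E-1234"})
except ValueError as err:
    print(err)
```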
So how can you use the blockchain at all? Any ideas? You need to store the personal data off-chain, but this is not enough, because the linked history of transactions remains in the system even after you delete the personal data from the off-chain database. If you have a solution, let me know in the comments, or wait for my next articles; this topic deserves its own entry.
On-chain versus off-chain data is always one of the first discussions you have with a client. With these four points, you should be able to split the information between these two persistence layers. Keep the information shared within the blockchain as minimal as possible and move as much as you can into an off-chain database.
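The four questions can be condensed into a simple placement rule. The sketch below is a hypothetical helper, not part of any blockchain framework; the 1 KB threshold and the example field profiles are assumptions for illustration. It defaults to off-chain and only puts a field on the ledger when it passes all four checks, in line with the advice to keep the shared ledger minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldProfile:
    """Answers to the four questions for one attribute (values are illustrative)."""
    name: str
    size_bytes: int           # 1. Size always matters
    needs_immutability: bool  # 2. Do you really want immutability?
    share_with_network: bool  # 3. Sharing is caring
    is_personal_data: bool    # 4. Don't underestimate regulations


MAX_ON_CHAIN_BYTES = 1024  # assumed threshold; tune it for your network


def placement(f: FieldProfile) -> str:
    """Return 'on-chain' only if the field passes all four checks, else 'off-chain'."""
    if f.is_personal_data:
        return "off-chain"
    if f.size_bytes > MAX_ON_CHAIN_BYTES:
        return "off-chain"
    if not f.needs_immutability or not f.share_with_network:
        return "off-chain"
    return "on-chain"


fields = [
    FieldProfile("serial_number", 16, True, True, False),
    FieldProfile("agreed_price_eur", 8, True, False, False),
    FieldProfile("inspection_photo", 3_000_000, True, True, False),
    FieldProfile("technician_employee_number", 8, True, True, True),
]
for f in fields:
    print(f"{f.name}: {placement(f)}")
```

In this example only the serial number lands on-chain; the large photo could still be anchored on the ledger through its hash while the file itself stays off-chain.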