We focus on how data is stored in Bitcoin’s database and on the structure of the transactions that update this database.
Disclaimer: We avoid the nitty-gritty of the exact implementation as it will confuse a new reader.
Bitcoin transaction
Figure 1 highlights a bitcoin transaction. It can have one or more inputs, and one or more outputs:
Transaction output. Creates a new row in the database. It is associated with a set of coins and a script.
Transaction input. Identifies a row in the database and provides evidence that should satisfy the spending script. If valid, the row in the database is deleted.
UTXO stands for Unspent Transaction Output, and the database is simply the set of all unspent transaction outputs.
Bitcoin’s database
Figure 2 highlights columns and rows of the Bitcoin database:
ID. A unique identifier for the transaction output that created this entry in the database. It is the concatenation of the transaction hash and output number.
Script. A Forth-like script that must be satisfied for the coins to be spent.
Coins. Number of bitcoins (BTC) associated with this database entry.
At the time of this article, there are approximately 79m database rows, where each row is associated with a set of coins.
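To make the three columns concrete, here is a minimal sketch of the database as a Python dictionary. The names `Row`, `row_id` and `utxo_set` are our own, the transaction hash is a made-up value, and amounts are held in satoshis (1 BTC = 100,000,000 satoshis). It is an illustration of the table's shape, not of how a Bitcoin node stores it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    script: str  # spending condition, shown here as readable placeholder text
    coins: int   # amount in satoshis


def row_id(tx_hash: bytes, output_number: int) -> bytes:
    """ID column: the transaction hash followed by the output number."""
    return tx_hash + output_number.to_bytes(4, "little")


# Hypothetical transaction hash, for illustration only.
tx_hash = hashlib.sha256(b"example transaction").digest()

# One row per unspent output of that transaction.
utxo_set = {
    row_id(tx_hash, 0): Row(script="<script for Alice>", coins=150_000_000),
    row_id(tx_hash, 1): Row(script="<script for Bob>", coins=50_000_000),
}

# Total coins held in this toy database.
total = sum(row.coins for row in utxo_set.values())
```

Two outputs of the same transaction share a transaction hash, so it is the output number that keeps their IDs distinct.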
Updating the database
Figure 3 presents an example of how a Bitcoin transaction is processed to check its validity and update the database. Key points:
Digital signatures. All meaningful scripts require a digital signature from the user.
Outputs can have different scripts. Coins can be sent to one or more parties within the same transaction.
Coins can be split amongst the outputs. Coins can be split up and combined in a transaction.
Sender may require a “change” output. The sum of coins from all inputs must be spent in this transaction. If the input coins exceed what the signer wants to spend, then they need to return the remaining coins to a script under their control.
Transaction fee is implicit. Our example rewards the block producer with 0.2 BTC and it is left as an exercise for the reader to work out how.
UTXO management. Unlike in an account-based system, a user may have one or more unspent transaction outputs (a new UTXO is created for every payment received).
Bitcoin transactions can be treated as a scripting environment. There is no stateful account system or shared state. A transaction simply consumes scripts and produces new scripts. A user may have multiple scripts under their control and scripts are associated with coins in the database.
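The processing steps above can be sketched in a few lines of Python. This is a simplified model under our own assumptions: `TxIn`, `TxOut`, `Tx` and `apply_transaction` are hypothetical names, and script execution is replaced by a `satisfies` callback that stands in for real signature checks. The amounts are unrelated to Figure 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TxOut:
    coins: int   # satoshis
    script: str  # spending condition (opaque placeholder)


@dataclass
class TxIn:
    spends: Tuple[str, int]  # (transaction hash, output number) of the row
    evidence: str            # e.g. a signature


@dataclass
class Tx:
    tx_hash: str
    inputs: List[TxIn]
    outputs: List[TxOut]


def apply_transaction(utxos, tx, satisfies):
    """Validate tx against utxos, update utxos in place, return unclaimed coins."""
    seen = set()
    total_in = 0
    for txin in tx.inputs:
        if txin.spends in seen:
            raise ValueError("row referenced twice in one transaction")
        seen.add(txin.spends)
        row = utxos.get(txin.spends)
        if row is None:
            raise ValueError("input does not identify an unspent output")
        if not satisfies(row.script, txin.evidence):
            raise ValueError("evidence does not satisfy the script")
        total_in += row.coins
    total_out = sum(out.coins for out in tx.outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for txin in tx.inputs:                  # delete the spent rows
        del utxos[txin.spends]
    for n, out in enumerate(tx.outputs):    # create one new row per output
        utxos[(tx.tx_hash, n)] = out
    return total_in - total_out


# Placeholder for script execution: NOT a real signature check.
def toy_satisfies(script, evidence):
    return evidence == "signed-by-" + script


utxos = {("aa", 0): TxOut(coins=100_000_000, script="alice")}
payment = Tx(
    tx_hash="bb",
    inputs=[TxIn(spends=("aa", 0), evidence="signed-by-alice")],
    outputs=[
        TxOut(coins=70_000_000, script="bob"),    # payment
        TxOut(coins=29_000_000, script="alice"),  # change back to the sender
    ],
)
unclaimed = apply_transaction(utxos, payment, toy_satisfies)
```

All checks run before any row is touched, so an invalid transaction leaves the database unchanged.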
The UTXO model has some quirks:
Bitcoin address standards. A bitcoin address is the script’s hash. Introducing a new standard script for users on the network results in a new style of Bitcoin address. This has led to wallet recovery issues as users are unaware of the subtle differences in address types and what their wallet software supports.
Coin-selection algorithms. The user will have one or more sets of coins. It is their software which decides which coins to spend when sending funds to another user. If care is not taken, the strategy can leak the user’s privacy on the network. You can find out more here.
Reconcile UTXO trade-off. A UTXO is only worth spending if the network fee for spending it does not exceed its value. If the user is not careful, they can run into the same issue as Coinbase, which had 265 bitcoins spread across 1.5m UTXOs that were no longer economical to spend. This raises the question: is it the fault of Coinbase (technical incompetence) or a fault with the UTXO model?
Parallel execution. A script does not execute on shared state, so all inputs of a transaction can be validated independently of each other.
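The coin-selection and reconciliation quirks can be illustrated together. The sketch below is one simple strategy (largest coins first), not what any particular wallet does, and it makes no attempt to protect privacy. `ASSUMED_INPUT_BYTES` is an assumed size for a single input, the fee rate is a made-up figure, and the fee for the rest of the transaction is ignored.

```python
ASSUMED_INPUT_BYTES = 148  # assumption: rough size of one input, in bytes


def cost_to_spend(fee_rate):
    """Fee (in satoshis) paid just for including one input at this fee rate."""
    return ASSUMED_INPUT_BYTES * fee_rate


def select_coins(utxo_values, target, fee_rate):
    """Pick UTXOs, largest first, until the target is covered.

    Returns (selected, uneconomical). A UTXO is uneconomical when the fee
    for spending it is at least as large as its value.
    """
    input_fee = cost_to_spend(fee_rate)
    economical = [v for v in utxo_values if v > input_fee]
    uneconomical = [v for v in utxo_values if v <= input_fee]

    selected = []
    effective = 0  # value gathered so far, net of each input's own fee
    for value in sorted(economical, reverse=True):
        if effective >= target:
            break
        selected.append(value)
        effective += value - input_fee
    if effective < target:
        raise ValueError("not enough economically spendable coins")
    return selected, uneconomical


# Hypothetical wallet contents in satoshis, at 10 satoshis per byte.
wallet = [50_000, 1_000, 200_000, 500, 30_000]
selected, uneconomical = select_coins(wallet, target=220_000, fee_rate=10)
```

At this fee rate the two smallest coins cost more to spend than they are worth, which is the Coinbase problem in miniature. If the fee rate falls, the same coins become spendable again.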
What does a script look like in practice?
Bitcoin script is a Forth-like scripting language. A small stack is built up and operated on as the script executes. The top item on the stack must be true when execution ends for the script to be considered valid.
Figure 4 provides an example of the “pay to pubkey hash” script. It requires the owner of a public key to provide a digital signature to claim their coins. The full script has two components:
Redeem script (input). Evidence that will satisfy the UTXO script. In Bitcoin terminology this is the scriptSig.
Spending script (previous output). The script associated with coins in the database. In Bitcoin terminology this is the scriptPubKey.
The input script is run before the script stored in the unspent transaction output. In the example script, the input fills the stack with a signature and the public key. The output script hashes the public key on the stack and checks that the result matches the public key hash stored in the UTXO. If so, it verifies the signature against that public key.
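The execution order can be followed step by step with a toy stack machine. This is a sketch for intuition only. The opcode names are Bitcoin's, but `hash160_stand_in` uses truncated SHA-256 in place of the real RIPEMD160(SHA256(x)), and `toy_checksig` is a placeholder that provides no security at all. All function names are our own.

```python
import hashlib


def hash160_stand_in(data):
    # Stand-in only: the real OP_HASH160 computes RIPEMD160(SHA256(data)).
    return hashlib.sha256(data).digest()[:20]


def toy_sign(pubkey, message):
    # NOT a real signature: anyone who knows the public key can compute it.
    return hashlib.sha256(b"toy-sig" + pubkey + message).digest()


def toy_checksig(sig, pubkey, message):
    return sig == toy_sign(pubkey, message)


def run(script, stack, message):
    """Execute one script on the shared stack. Returns False on a failed verify."""
    for op in script:
        if isinstance(op, bytes):        # data push
            stack.append(op)
        elif op == "OP_DUP":
            stack.append(stack[-1])
        elif op == "OP_HASH160":
            stack.append(hash160_stand_in(stack.pop()))
        elif op == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False
        elif op == "OP_CHECKSIG":
            pubkey = stack.pop()
            sig = stack.pop()
            stack.append(toy_checksig(sig, pubkey, message))
        else:
            raise ValueError("unsupported opcode: %r" % (op,))
    return True


def validate(input_script, output_script, message):
    stack = []
    # The input script runs first, then the output script on the same stack.
    if not run(input_script, stack, message):
        return False
    if not run(output_script, stack, message):
        return False
    return bool(stack) and stack[-1] is True


message = b"the spending transaction"  # what the signature commits to
owner_pubkey = b"owner-public-key"     # hypothetical key bytes
output_script = ["OP_DUP", "OP_HASH160", hash160_stand_in(owner_pubkey),
                 "OP_EQUALVERIFY", "OP_CHECKSIG"]

good_input = [toy_sign(owner_pubkey, message), owner_pubkey]
other_pubkey = b"someone-elses-key"
bad_input = [toy_sign(other_pubkey, message), other_pubkey]

ok = validate(good_input, output_script, message)
rejected = validate(bad_input, output_script, message)
```

The second attempt fails at OP_EQUALVERIFY: the other key hashes to a different value from the one stored in the output, so the signature is never checked.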
This brings up some questions that we leave for homework:
Why does the input need to have a copy of the public key if the hash is already stored in the UTXO?
Why does the UTXO store a hash of the public key and not just the public key?
Is it possible to reveal the entire spending condition script in the input of a transaction as well?
I hope you enjoyed this basic introduction to the UTXO model. We will not attempt to implement a Bitcoin script during the course as there is really no good tooling for doing so.
There is a fun joke that a developer only has four basic tools in Bitcoin script:
Verify signature
Verify pre-image of a hash
If statement
Perform action before or after time T
There are no loops and the scripting language is highly restrictive. The philosophy in the community is to move computation off-chain. But still, even with these basic tools, you can implement something like the Lightning Network.
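To see how far the four tools go, here is a simplified hash time-locked condition of the kind payment channels build on, written as a plain Python predicate rather than Bitcoin script. Real Lightning scripts are considerably more involved. The function name, its parameters, and the use of boolean flags in place of signature checks are all our own simplifications.

```python
import hashlib


def htlc_can_spend(payment_hash, timeout, current_time,
                   preimage=None, receiver_signed=False, sender_signed=False):
    """Return True if the coins may be spent under a simplified hash time-lock.

    Branch 1: the receiver signs and reveals the pre-image of payment_hash.
    Branch 2: the sender signs and takes a refund once the timeout has passed.
    """
    if preimage is not None:                                   # if statement
        hash_matches = hashlib.sha256(preimage).digest() == payment_hash
        return receiver_signed and hash_matches                # signature + pre-image
    return sender_signed and current_time >= timeout           # signature + time T


secret = b"hypothetical secret"
payment_hash = hashlib.sha256(secret).digest()

claim = htlc_can_spend(payment_hash, timeout=1_000, current_time=500,
                       preimage=secret, receiver_signed=True)
early_refund = htlc_can_spend(payment_hash, timeout=1_000, current_time=500,
                              sender_signed=True)
late_refund = htlc_can_spend(payment_hash, timeout=1_000, current_time=1_000,
                             sender_signed=True)
```

Each branch uses a signature check, and between them they cover the hash pre-image, the if statement and the time condition. No loop is needed.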
I recommend reading about the SegWit and Taproot upgrades, as well as the proposed covenant upgrades.