User:Justmoon/IMTUO
Prior Work
This proposal is an extension of Greg Maxwell's proposal of storing a merkle tree root of unspent transactions in a coinbase.
The name is based on Alberto Torres' write-up called MTUT, more precisely Elden Tyrell's comment that MTUO is more correct.
Overview
We propose a protocol and algorithms for a new type of Bitcoin node which uses a so-called Incremental Merkle Tree of Unspent Outputs (IMTUO).
IMTUO nodes do not store old transactions at all (neither inputs nor outputs nor hashes), yet they can fully verify transactions, create new blocks including updating the IMTUO root and bootstrap each other. IMTUO nodes can function completely independently of full nodes, however we expect full nodes to survive for archival purposes and to provides indices.
The proposal aims to solve two main problems:
- Reduce Bitcoin's long-term storage requirements
- Remove the need for a MAX_BLOCK_SIZE through better intrinsic incentives
The proposal does not require any change to the block chain except for the addition of the IMTUO root hash as part of the coinbase. For more information on backwards compatibility and interoperability see the section Deployment.
Verification Information
The basic difference to the classic Bitcoin network is that in an IMTUO Bitcoin network, the burden is on the publisher/relayer of a transaction to prove its validity - rather than on the verifier to look it up.
In order to do that, the publisher creates a piece of data for each output - the so-called verification information or vinfo.
| Field Size | Description | Data type | Comments | 
|---|---|---|---|
| 1 | merkle branch length | uint8_t | The number of entries in the merkle branch | 
| 20+ | merkle branch | char[20] | Merkle hashes that form the heads of the side branches between the leaf and root nodes | 
| ? | merkle branch direction | uchar[] | Bitfield indicating where the side branches attach (0 - prefix; 1 - suffix); length is ceil(merkle branch length / 8) | 
| 1+ | pubkey script length | var_int | Length of the scriptPubKey | 
| ? | pubkey script | uchar[] | The script from the output that is being spent (the length is implied by the header length) | 
| 8 | value | uint64_t | Output value, i.e. amount in Bitcoins | 
If the transaction being spent is contained in a recent block, the vinfo is not needed and can be provided as a 0x00 byte. This is called a "null vinfo".
If the output is a standard P2SH output, the script length is given as 0 and the script deduced from the input.
The reason we include the script and the value is so that we can verify that those are correct, purely based on confirming that this hash is indeed an unspent output. The reason we include the outpoint structure is to ensure the resulting hash is globally unique and we use the outpoint for this because the outpoint is already available to us in the spending input.
Transaction Transmission
Transactions are always sent with their vinfo included.
The vinfo can expire if the MTUO root it connects to moves out of the recent block list or - for null vinfos - if the transation it spends moves out of the recent blocks list. In that case the creator of the transaction or anyone with a vested interest in getting it included can choose to rebroadcast the transaction with an updated vinfo.
Block Transmission
Currently, Bitcoin sends all transactions again when a block is transmitted, even though the receiving node may know most of them already. We propose that blocks be transmitted including a list of transaction hashes and missing transactions be requested as needed.
MTUO
The merkle tree used for this proposal consists of unspent outputs, the leafs of this tree are the hashes as per the following format:
RIPEMD160( SHA256( [outpoint] [pubkey script] [value] ) )
The specific data to be hashed is:
| Field Size | Description | Data type | Comments | 
|---|---|---|---|
| 36 | outpoint | char[36] | The OutPoint structure pointing to this output in the block chain | 
| ? | pubkey script | uchar[] | The script from the output that is being spent (the length is implied by the header length) | 
| 8 | value | uint64_t | Output value, i.e. amount in Bitcoins | 
Client Storage Requirements
A full client stores:
- Block headers
- Coinbases
- Merkle branch connecting coinbase tx to merkle root
- Recent blocks (full blocks including vinfo)
A lightweight client stores:
- Recent block headers
- Recent MTUO roots
There should be a network-wide lower bound for what is considered "recent" that all nodes should agree on. This lower bound is called RECENT_BLOCK_INTERVAL and is chosen by the following criteria:
- Chance of a reorganization beyond that point is negligible
- Leaves enough blocks to initialize the IMTUO insertion spots cache (see section Incremental MTUO - Insertion)
- Leaves enough blocks to trust the oldest MTUO root in the history without being able to verify it
- Large enough that transactions' vinfos don't expire too quickly
- Small enough that clients don't have to store too much data
For the purposes of this proposal we will assume that RECENT_BLOCK_INTERVAL is chosen to be 1024.
Transaction Verification
To verify an input, we follow this algorithm:
- Calculate the hash of the vinfo header.
- Using the result from 1. and the merkle branch provided in the vinfo, calculate the merkle root.
- Reject if merkle root does not match any MTUO root in the recent blocks history
- If it matched a MTUO root other than the current one, translate branch forward and test against latest MTUO root
- (We now know that the previous output exists and is unspent.)
- Verify script
Other checks such as confirming the transaction hash, DoS checks, etc. work the same on a IMTUO node as they do on a classic node.
Protocol Changes
IMTUO entails adding two commands to the Bitcoin protocol:
- txvinfo: Requests a transaction including its verification information; also works for transactions in blocks
- blocktxlist: Requests a block including the list of transaction hashes
// TODO: Specify exactly the new message format
Incremental MTUO
Rather than being regenerated by a node possessing the full blockchain, we propose the tree be incrementally updated by any miner knowing only the new transactions he wishes to include and their corresponding verification information.
Consider the following merkle tree:
A    B  C    D  E    F  G    H
 \  /    \  /    \  /    \  /
  AB      CD      EF      GH
    \    /          \    /
     ABCD            EFGH
         \         /
           ABCDEFGH
Deletion
We want to remove the hash C from the merkle tree and calculate the new root using only the merkle branch verification information.
First consider the verification information for C:
C ---> CD ---> ABCD ---> ABCDEFGH / / / D AB EFGH
It's easy to see that with C removed this can be recalculated as:
-> D ---> ABD ---> ABDEFGH / / / D AB EFGH
We will use the following notation to illustrate this removal:
C ---> CD ---> ABCD ---> ABCDEFGH              C    -> 0       
   /       /         /                         CD   -> D       
  D      AB       EFGH                         ABCD -> ABD     
                                                               
       D       ABD       ABDEFGH                               
The table on the right is called the translation table. As we make more changes to the merkle tree we have to replace any occurrence of one of the hashes on the left hand side of the table with the corresponding value on the right hand side.
So if we now wanted to remove B we would do similarly:
B ---> AB ---> ABCD ---> ABCDEFGH              B    -> 0
   /       /         /                         AB   -> A
  A     (CD)      EFGH                         ABCD -> AD
         D                                 
                                           
       A       AD        ACHEFGH
Note that we had to replace the side branch CD with D as per our translation table.
Here is another example, now we also remove E:
E ---> EF ---> EFGH ---> ABCDEFGH              E    -> 0
   /       /         /                         EF   -> F
  F      GH      (ABCD)                        EFGH -> FGH
                  AD                       
                                           
       F       FGH       ADFGH  
Insertion
To insert, we can recycle one of the vinfo packets and simply reconnect the new hash at a location where another hash was previously removed.
For example, to add a new hash I, we could use the branch information of the removed hash B and simply apply it to I instead:
I ---> AI ---> AICD ---> AICDEFGH              B    -> I
   /       /         /                         AB   -> AI
  A     (CD)     (EFGH)                        ABCD -> AID
         D        FGH                      
                                           
       AI      AID       AIDFGH
We can add another transaction J the same way, be reusing the branch information for the previously removed hash E:
J ---> JF ---> JFGH ---> ABCDJFGH              E    -> J
   /       /         /                         EF   -> JF
  F      GH      (ABCD)                        EFGH -> JFGH
                  AID
       JF      JFGH      AIDJFGH
If we have replaced all the removed hashes with new ones and we still need to add more, we can create new branches in place of any existing hash. For example to add yet another transaction to the branch currently occupied by J, we once again employ E's original branch information:
K ---> EK ---> EKF ---> EKFGH ---> ABCDEFKGH   E    -> JK
   /       /        /          /               EF   -> JKF
 (E)      F       GH       (ABCD)              EFGH -> JKFGH
  J                         AID
       JK      JKF      JKFGH      AIDJKFGH
Based on our translation table, we know that E is now J and ABCD is now AID which leads us to calculate the correct final merkle root AIDJKFGH.
Balancing the tree
Each client keeps a cache of the lowest-degree insertion and removal points it has seen. For any new insertion it selects the oldest, lowest degree point available. This way the tree will balance itself over time.
Clients will also reject insertion points that are too far above the best known insertion point. This means that even a malicious attacker will never be able to arbitrarily unbalance the tree. As soon as the tree starts to be unbalanced, block containing badly chosen insertion points will start to be rejected.
Honest miner will never be affected by this because they learn their insertion points from the same data (recent blocks) of which the limits are calculated, so they will always have numerous insertion points available below the limit.
If knowledge of a good insertion point is somehow lost it only needs to be references once by any miner and it will immediately be picked up by the miners following and any short branch in the tree will tend to be immediately filled up. However given that in any 1024 block interval there are likely to be at least a few blocks made by honest miners it is unlikely that good insertion points will ever be forgotten to begin with.
Wallets
One of the main differences between a IMTUO node and a classic node is that a classic node has access to the full block chain archive and can rescan it to find outputs that match a specific public key for example.
An IMTUO node can only scan the recent blocks. Also, if it is offline for more than 1024 blocks it will be unable to update its wallet on its own.
However, indexing and storing large amounts of data for querying is a well understood and solved problem. It is easy to imagine a DHT that can be queried by public key hash or script hash to retrieve transaction and that is kept up-to-date by Bitcoin clients. Merely storing and indexing the information for occasional querying is not a difficult problem. This proposal however addresses the more difficult issue that verifying nodes need a full list of unspent outputs.
Resource Usage
IMTUO is a storage/network tradeoff. We save most of the storage in exchange for increased network usage.
Changed Incentives
There is currently no way within the Bitcoin protocol to make a rogue miner liable for the costs he incurs for other miners by publishing a very large block. In order to mitigate this problem Bitcoin currently uses a hard-coded maximum block size limit MAX_BLOCK_SIZE.
In an IMTUO network, the storage costs are negligible, since blocks are only stored temporarily. Instead, the network transmission costs are increased. Any miner is interested in getting its newly created block spread across the network so that other miners will build on it. Large blocks will spread more slowly and are therefore disadvantaged. IMTUO adds verification information and therefore increases this effect
To create a block a miner will always have a certain cost to create the proof-of-work. The risk that a too large block will not ultimately make it into the chain is a natural incentive for miners to stay within reasonable limits.
If desired, this can be exacerbated further by adding protocol rules that cause blocks not to be relayed and/or built upon unless a certain spread has been reached. This would but the burden on the creator to spread his (unusually large) block and/or pay other nodes to relay it.
Deployment
IMTUO is a separate network from Bitcoin. Gateways have to run both a regular Bitcoin full node and store a full indexed copy of the IMTUO merkle tree (a 10-50% overhead over running a normal full node.)
// TODO: How to handle the IMTUO root while IMTUO have a minority of mining power.