User:Gmaxwell/block network coding

From Bitcoin Wiki
Jump to: navigation, search

We'd like to be able to efficiently transmit blocks without sending the data we've already sent a node, but we can't be sure which of that data it still remembers. We could use an interactive protocol where first block where only presumed-missing transactions are sent along with a list, but then we take on additional latency from interaction to obtain the missing data. As a bonus it would be nice if our efficient low latency protocol supported parallel data fetching.

First assume we have a fast function that takes an array of bytes and permutes it according to a key where different key values, at a minimum, make the first 32 bits depend on different parts of the word. It must also have a fast inverse. E.g. AES in backwards-CBC would meet this defintion, but so would a function that swapped bits in the first 32 bits with random bits in the rest of the the word. We'll call this permute(data,key).

When we receive a block we make the list of tx hashes in the block and append their sizes. We then take the hash of the block and run permute() on the hashes using the block hash as the key.

Then we take the first 32 bits of the permuted hashes in block order and code them with a 32-bit error correcting code such that there are now (virtually) at least 2^32-1 32 bit words such that n_tx of them are required to recover the complete set of transaction hashes. We use a random seed to select a fraction of n_tx these words at random, with the peer controlling the fraction. If our peer has no other peers it will set its fraction to 1. We send the block header, seed and the fraction to our peer.

We then apply the same error correcting code to the rest of the hashes in 32 bit planes such the there is a separate error correcting code for each remaining 32 bits of the tx hashes which spans the whole block-hash-list. We select a random fraction, using the prior seed, of the resulting words minus all the systematic words (the words equal to the original message), where the fraction is equal to a conservative estimate of what fraction of the transaction we believe the peer already knows (based on what we recently sent it and what it recently INVed to us), times a peer controlled multiplier. We send the result to the peer.

We then take the entire block and code the entire thing under a single error correcting code. We select a fraction of the words, again excluding all the systematic words, equal to our estimate of how much of the block data the peer already has (similar to the above tx count estimate but not identical, both because tx are not constant size and because a peer can be expected to keep more txids than data), times a correction factor that depends on the distribution of transaction sizes, times a peer controlled multiplier. We send the result to the peer.

If all goes well, the peer can immediately decode the list of the first 32 bits of the permuted transaction hashes.

Given the first 32 bits of data it can match up the subsequent hashes for all the txids that it knows. If there is a collision it just ignores that codeword. If it has received a few more words than strictly needed it can also go search for undetected collisions. The permutations make it infeasible for a malicious party to produce txid that will collide in the first 32 bits. It fills in the rest of the known data, and that allows it to recover the rest of the txid list using only the number of words it didn't already know.

Once the txhashes and sizes are reconstructed they can be unpermuted and the known parts of the block can be dropped into place. Now the node only needs the number of missing words from the block to complete the reconstruction. Because a single missing transaction can cause some additional missing words due to alignment a little extra data is required (though it might work out better to just pad all transactions up to a multiple of 32 bits for the purpose of this coding, or to code the uneven ends separately in another higher rate code). Presumably the peer(s) have sent enough data and now the whole block can be reconstructed.

If there was missing data at any step the node it asks peers for more data and they send more random words. It also increases the fraction it asks for in the future. Alas, low latency is lost in this case.

This works for parallel fetching for multiple nodes. Additionally, nodes which haven't yet received enough data to fully reconstruct a block can immediately begin relaying block fragments though if multiple peers do send early block data to you, you'll likely get duplicate data. So it would be better if each node asked only one peer to send early data (though certain locally decodable codes can be recoded with partial data, none are super efficient for smaller block sizes and bitcoin blocks may be too small for them to be helpful). This could be extended further by splitting the early data into several colors and each node would have only a single parent for each color.

The end result is that a node can receive close to the size of the block data it doesn't know (plus 32 bits per transaction) from one or more uncoordinated peers and reconstruct the whole block without additional round trips with high probability.

What error correcting code to use? The obvious thing to use is an RS code, since they're optimal... but RS codes can be moderately slow: E.g. even using an 8 bit field they might only achieve speeds of 100mbit/sec for encoding. Plus RS codes have their maximum rate limited by the word size, and for this we really want an infinite rate code so that data from different peers has a negligible probability of being redundant. One solution is raptor codes (the concatenation of a LT code and a lower level code) but there are some patent complications. Another good solution would be LDPC block codes which are almost as efficient and very fast. The small inefficiency of a few percent from using a non-optimal scheme is easily offset by the savings from not retransmitting already transmitted data.

To further reduce overhead, in all places a txid would be used a strong cryptographic permutation of the transaction could be used instead. This would make data spent transmitting the txid also contribute to transmitting the transaction.

another try at expressing a variation of the idea assuming set reconciliation for IDs

First notion, a split transaction:

Take the transaction apply a keyed cryptographic permutation to it, keyed with the prior block hash. (E.g. AES in a large block mode; so changing any bit of the txn changes the whole output).

Prefix this data with an efficiently encoded length.

Take the first 63 bits. This is the transaction "ID". It is functionally similar to a normal transaction ID (e.g. collision resistant) but has the advantage that sending both it and the rest of the transaction creates no redundancy.

The rest of the bits are the "payload".

Lay out all payload in ID-numerical order.

Encode the non-deterministic part of the order of the transactions, given their content (inputs), efficiently, append that to the payload. (E.g. take the set N of transactions whos inputs aren't in the block itself, using a range coder code [0..N), remove the selected txn from the set, add any dependents which can now be included, update N and code a value in the range [0..N), and repeat.)


Take the set of IDs as the roots of a polynomial over GF(2^64), interpolate values at positions 2^63+ which are never zero because they're not roots, because the input is too small.

Transmit, in round robbin fashion to peers the header, payload size, and count of transactions,

Transmit, in round robbin fashion to peers the ID interpolations.

Transmit, in round robbin fashion to peers a rateless (or very high redundancy) FEC computed over the payload data.

Peers can recover the IDs when they've received data equal to the size of the difference in their sets plus one. Polynomial set reconciliation is basically optimal (you need only one more term than the symmetric difference size). The rational interpolation isn't super fast, but its on a small amount of data. (Apparently an implementation in OCAML is fast enough for production use in the SKS PGP key servers over the whole list of PGP IDs)

Given the IDs, the receiver can layout the payload and substitute in all the data they know.

They can now recover all the payload after receiving as many codewords as they have missing (NOT THE DIFFERENCE), exactly as many if a RS code is used.. somewhat more for less efficient infinite rate codes. (the data from txn they don't know about are just erasures)

Before a peer has reconstructed anything they can still forward on what they received. If you color your peering links you can ensure you never get sent duplicate codewords. Everything runs in parallel.

Once a peer has recovered the whole block they can generate more codewords beyond what they received, to help further neighbors finish their recovery.

Result is that in the time it takes a single message to proceed depth first from the source to the horizon of the network, the entire network can have recovered the block. (with some assumptions about cross section bandwidth).

Bandwidth cost is primarily the missing members, no huge penalty for knowing about 'too many' transactions, or need to gratuitously force mempool consistency.

Some implementation care is needed when getting data from the same block from multiple peers. Unless blocks commit to the codewords, you could receive bad data. If you cannot recover, request more data from a single peer. Recover, and ban the peer(s) that gave you bad data. Some codes, like RS codes can recover from unknown errors but they require considerably more data to to so.