P2SH²

From Bitcoin Wiki
Jump to: navigation, search

P2SH² (also written P2SH^2) is a proposal to make it difficult for users to store arbitrary data in parts of the block chain that cannot be safely pruned.

The problem: storing data in hashes

Bitcoin allows users (wallets) to receive bitcoins to a public key and then spend those bitcoins by providing a cryptographic signature made by the same private key that generated the public key. However, for several reasons, almost all wallets instead receive bitcoins to a series of 20 or 32 bytes called a hash that uniquely identifies all the public keys and other information used to secure the payment. A Bitcoin address is just this hash wrapped up with some metadata in a format that's easy to copy and paste.

The code that generates the unique identifiers (hashes) can only work in one direction: you can turn public keys and other information into a hash, but you can't turn a hash back into public keys or other information. This is why it's called a hash: the original data is chopped up (hashed) into tiny pieces, mixed up, and squeezed together (compressed) to produce something that doesn't look like the original but which still uniquely identifies it.

To spend bitcoins secured by a hash, you simply need to provide a copy of the public keys and other information used to create that hash and then provide the cryptographic signatures or other necessary spending information. This is what the overwhelming majority of Bitcoin transactions have done so far.

However, when you pay Bitcoins to an address (hash), full nodes and miners don't perform any checks to ensure you or someone else has a copy of the public keys or other data used to generate the hash. This lack of validation has been abused in the past to store data in the block chain by placing it where we'd normally expect a hash.

For example, in block 138,725 (30 July 2011),[1] cryptographer Dan Kaminsky paid what looked like 78 twenty-byte hashes a total of 0.78 BTC to render some ASCII art in the block chain[2] (see the end of this section for a partial reproduction of the art). Given the technical properties of the hash function used here, we can reliably guess that Kaminsky didn't generate these pseudo-hashes using public keys. Instead, he just arranged the bytes of the pseudo-hashes the way he wanted and paid bitcoins to them knowing that nobody knew the public key (or private key) necessary to spend those bitcoins.

Most Bitcoin users don't consider it a problem if people like Kaminsky want to destroy their bitcoins. But what is a problem is that the Bitcoin protocol has no way to determine that the bitcoins spent to pseudo-hashes are impossible to spend, so every full node treats these payments as valid and spendable. Because they seem spendable, nodes can't prune these payments to save disk space, which raises the cost of operating a full node to help keep Bitcoin decentralized. Again, because they seem spendable, miners (in particular) need to store information about these payments in a high-performance database so that they can quickly validate any blocks spending those payments, which raises the fixed costs of mining and creates economies of scale that favor larger miners over smaller miners.

For this reason, many concerned Bitcoin users try to discourage this type of hash abuse and instead recommend that users who want to store data in the block chain use the OP_RETURN (nulldata) technique to create a provably-unspendable transaction output that modern full nodes know how to prune and which doesn't need to be stored in a database.

However, for small amounts of data the cost of storing data in a hash field is about the same as storing data in an OP_RETURN output. Worse, it's easier for most current Bitcoin wallets to pay pseudo-hashes than OP_RETURN outputs, and for some clients it's easier to retrieve data from the block chain if it's stored in pseudo-hashes than in an OP_RETURN output. So it would still be nice if the Bitcoin protocol could otherwise discourage people from storing data in hash fields.

Below is a partial reproduction of the data Kaminsky stored in the block chain using psudeo-hashes:[3]


   ASCII BERNANKE   
:'::.:::::.:::.::.: 
: :.: ' ' ' ' : :': 
:.:     _.__    '.: 
:   _,^"   "^x,   : 
'  x7'        `4,   
 XX7            4XX 
 XX              XX 
 Xl ,xxx,   ,xxx,XX 
( ' _,+o, | ,o+,"   
 4   "-^' X "^-'" 7 
 l,     ( ))     ,X 
 :Xx,_ ,xXXXxx,_,XX 
  4XXiX'-___-`XXXX' 
   4XXi,_   _iXX7'  
  , `4XXXXXXXXX^ _, 
  Xx,  ""^^^XX7,xX  
W,"4WWx,_ _,XxWWX7' 
Xwi, "4WW7""4WW7',W 
TXXWw, ^7 Xk 47 ,WH 
:TXXXWw,_ "), ,wWT: 
::TTXXWWW lXl WWT:  

The P2SH² solution

In April 2013, Bitcoin developer Gregory Maxwell proposed a solution[4] as a thought experiment,[5] giving it the name P2SH^2. The idea is simple: when someone gives you a Bitcoin address (hash), you hash it again and then add that hash (rather than the address hash) to the transaction which gets included in the block chain. The second ("outside") hash uniquely references the first ("inside") hash, which itself uniquely references the public key and other information that secure the payment.

For example, if someone were to submit the following pseudo-hash as the inside hash,

   ASCII BERNANKE

The outside hash that would be added to the block chain would be,

r�I�i��E�6

Which, in case it doesn't display for you, is random gibberish that ruins any sophisticated ASCII art and also most other schemes to add arbitrary non-hash data to the block chain in fields meant for address-type hashes.

In Maxwell's proposal, nodes would only relay an unconfirmed transaction if it contained both the outer hash (as part of the transaction) and the inner hash (as separate metadata), proving that the outer hash intended for the block chain was not selected to hold data (or at least, not much data, as it is possible to add some data to the outer hash by brute-force manipulating the inner hash).

Miner circumvention and its mitigations

Although relay nodes bear the uncompensated extra costs of being permanently unable to prune payments to pseudo-hashes from their disk drives, and so are likely to generally enforce a P2SH^2-style policy, a miner can be paid through transaction fees to circumvent the P2SH^2 policy. It would generally be worth it for the miner to accept competitive fees for the circumvention because adding a single pseudo-hash payment to the block chain isn't going to cause much harm. Instead, it's the aggregate damage of many miners each adding many pseudo-hash payments to the block chain over a long period of time that would slow down the payment validation database to which miners need fast access in order to effectively remain decentralized.

Maxwell speculated in his proposal that if miners were often seen circumventing P2SH^2, then nodes that received blocks containing many transactions for which the inner hash was unknown could delay acceptance of those blocks to give complaint blocks created by other miners an advantage.

This could be controversial as it could lead to temporary consensus divergence between different nodes, particularly high-uptime nodes and nodes that just came back online (or which use -blocksonly mode or other relay-limiting features).

Updated proposal

All Bitcoin addresses today (and this is expected to continue) already contain an inner and outer hash, although this is mostly a legacy of an early design choice made by Satoshi Nakamoto. This creates an opportunity[6] to introduce a slight variant of the P2SH^2 proposal as an unenforced add-on to the existing system and then transition to the enforced system later after everyone has adopted the add-on.

  1. Introduce a new address type that contains only the inner hash
  2. Have wallets accepting the new address type generate the outer hash and pay that
  3. Later, allow nodes to relay the inner hash along with transactions that pay the outer hash (with the inner hash not meant to be included in the block chain)
  4. Much later, when almost all relayed transactions contain both the inner and outer hashes, begin to discourage relay of transactions without the inner hash
  5. Even later, maybe discourage blocks from containing too many transactions for which the inner hash was not previously seen

Current status

Although the original P2SH^2 proposal dates back to early 2013, it has so far only been a discussion topic. It has not yet been proposed as a BIP.

Related information

  • Applied cryptographer Peter Todd has discussed P2SH^2 in the context of whether or not proof-of-publication via the Bitcoin block chain can effectively be made expensive.[7]
  • Bitcoin developer Luke Dashjr suggested that P2SH^2-like logic could be added to the new bech32 address format.[6] Note: Dashjr's proposal was made after bech32's specification, BIP173, moved from draft to proposed status, and after several wallets had already implemented the draft proposal, which may have been too late for such a significant change.
  • Gregory Maxwell has described how P2SH^2 could be part of an improved Namecoin design, and how he also has a scheme that uses "pairing cryptography that allows you to have a value which is provably a hash."[8][9]

References

  1. Answer: In which Block was Len Sassaman memorialised?, Chris Moore, Bitcoin StackExchange, 7 April 2012, retrieved 2018-01-05
  2. A Tribute to Len "rabbi" Sassama, BitcoinTalk.org discussion thread, retrieved 2018-01-05
  3. Text of Dan Kaminsky's "A Tribute to Len "rabbi" Sassama", Pastebin, retrieved 2018-01-05
  4. To prevent arbitrary data storage in txouts---The Ultimate Solution, Bitcoin-Development mailing list (via Archive.org), 2013-04-10
  5. Re: Bech32 and P2SH², Gregory Maxwell, bitcoin-dev mailing list, 6 January 2018, retrieved 2017-01-06h
  6. 6.0 6.1 Bech32 and P2SH², Luke Dashjr, Bitcoin-Dev mailing list, 2018-01-04, retrieved 2018-01-05
  7. Setting the record straight on Proof-of-Publication, Peter Todd, PeterTodd.org, 12 December 2014, retrieved 2017-01-05
  8. Namecoin that sucks less, Gregory Maxwell, Bitcoin Wiki, 2 March 2014, retrieved 2017-01-05
  9. #bitcoin-dev IRC log 7 January 2014, Gregory Maxwell (gmaxwell), IRC