BIP 0050

From Bitcoin Wiki
Revision as of 16:52, 20 March 2013 by Gavinandresen (talk | contribs)

This page describes a BIP (Bitcoin Improvement Proposal).
Please see BIP 2 for more information about BIPs and creating them. Please do not just create a wiki page.

  BIP: 50
  Title: March 2013 Chain Fork Post-Mortem
  Author: Gavin Andresen <gavinandresen@gmail.com>
  Status: Draft
  Type: Informational
  Created: 20-03-2013

What went wrong

A block that had a larger number of total transaction inputs than previously seen was mined and broadcast. Bitcoin 0.8 nodes were able to handle this, but some Bitcoin 0.7 nodes rejected it, causing an unexpected hard fork of the chain. The 0.7-incompatible chain at that point had around 60% of the hash power, ensuring the split did not automatically resolve.

In order to restore a canonical chain as soon as possible, BTCGuild and Slush downgraded their Bitcoin 0.8 nodes to 0.7 so their pools would also reject the larger block. This placed the majority of the hash power on the chain without the larger block.

During this time there was at least one large double spend. However, it was done by someone experimenting to see if it was possible and was not intended to be malicious.

What went right

  • The split was detected very quickly.
  • The right people were online and available in IRC or could be raised via Skype.
  • Marek Palatinus and Michael Marsee quickly downgraded their nodes to restore a 0.7 chain as canonical, despite the fact that this caused them to sacrifice significant amounts of money and they were the ones running the bug-free version.
  • Deposits to the major exchanges and payments via BitPay were also suspended (and then un-suspended) very quickly.
  • Fortunately, the only attack on a merchant was carried out by someone who did not intend to actually steal money.

Root cause

Bitcoin 0.7 configures an insufficient number of Berkeley DB locks to process large but technically valid blocks. Berkeley DB locks have to be manually configured by API users depending on anticipated load. The manual says this:

The recommended algorithm for selecting the maximum number of locks, lockers, and lock objects is to run the application under stressful conditions and then review the lock system's statistics to determine the maximum number of locks, lockers, and lock objects that were used. Then, double these values for safety.

Because max-sized blocks had been successfully processed on the testnet, it did not occur to anyone that there could be blocks that were smaller but required more locks than were available.

Bitcoin 0.8 does not use Berkeley DB. It uses LevelDB instead, which does not require this kind of pre-configuration. Therefore it was able to process the forking block successfully.

Note that BDB locks are also required during processing of re-organizations. Versions prior to 0.8 may be unable to process some valid re-orgs.

This would be an issue even if the entire network was running version 0.7.2. It is theoretically possible for one 0.7.2 node to create a block that others are unable to validate, or for 0.7.2 nodes to create block re-orgs that peers cannot validate, because the contents of each node's blkindex.dat database are not identical, and the number of locks required depends on the exact arrangement of blkindex.dat on disk (locks are acquired per page).
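
The per-page dependence can be illustrated with a toy model. The sketch below is not Berkeley DB code: the `locks_needed` helper, the page capacity of 50 records, and the two key-to-page layouts are all hypothetical. It only shows that the same set of touched records can need a different number of page locks depending on how the records happen to be arranged on disk.

```python
def locks_needed(touched_keys, key_to_page):
    """Toy model: one lock per distinct page that holds a touched key."""
    return len({key_to_page[k] for k in touched_keys})


def dense_layout(keys, records_per_page):
    """Hypothetical layout: keys packed onto pages in sorted order."""
    return {k: i // records_per_page for i, k in enumerate(sorted(keys))}


def scattered_layout(keys, records_per_page):
    """Hypothetical layout: the same keys, interleaved across all pages."""
    ordered = sorted(keys)
    n_pages = -(-len(ordered) // records_per_page)  # ceiling division
    return {k: i % n_pages for i, k in enumerate(ordered)}


all_keys = range(1000)       # stand-in for index records
touched = list(range(100))   # records that one block needs to update

node_a = dense_layout(all_keys, records_per_page=50)
node_b = scattered_layout(all_keys, records_per_page=50)

print(locks_needed(touched, node_a))  # prints 2
print(locks_needed(touched, node_b))  # prints 20
```

In this model two nodes holding identical data disagree by a factor of ten on the locks needed for the same block, which is the kind of divergence described above.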

Action items

Immediately

Done: Release a version 0.8.1, forked directly from 0.8.0, that, for the next two months, has the following new rules:

  1. Reject blocks that could cause more than 10,000 locks to be taken.
  2. Limit the maximum block-size created to 500,000 bytes.
  3. Release a patch for older versions that implements the same rules, but also increases the maximum number of locks to 120,000.
  4. Create a web page on bitcoin.org that will urge users to upgrade to 0.8.1, but will tell them how to set DB_CONFIG to 120,000 locks if they absolutely cannot.
  5. Over the next 2 months, send a series of alerts to users of older versions, pointing to the web page.
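
The numeric limits in the list above can be sketched as simple checks. This is an illustration only, not the 0.8.1 implementation: the function names are hypothetical, and the lock estimate assumes a fixed cost of 2 locks per distinct transaction id touched, which is an assumption made for this example. `set_lk_max_locks` is the Berkeley DB `DB_CONFIG` directive for the maximum number of locks.

```python
MAX_LOCKS_PER_BLOCK = 10_000        # rule 1 above
MAX_CREATED_BLOCK_BYTES = 500_000   # rule 2 above
PATCHED_MAX_LOCKS = 120_000         # items 3 and 4 above
LOCKS_PER_TXID = 2                  # ASSUMPTION, for illustration only


def estimate_locks(block_txids, spent_txids):
    """Upper-bound estimate: each distinct txid touched costs LOCKS_PER_TXID."""
    touched = set(block_txids) | set(spent_txids)
    return len(touched) * LOCKS_PER_TXID


def acceptable_block(block_txids, spent_txids):
    """Rule 1: reject blocks that could take more than 10,000 locks."""
    return estimate_locks(block_txids, spent_txids) <= MAX_LOCKS_PER_BLOCK


def creatable_block(size_bytes):
    """Rule 2: do not create blocks larger than 500,000 bytes."""
    return size_bytes <= MAX_CREATED_BLOCK_BYTES


def db_config_text(max_locks=PATCHED_MAX_LOCKS):
    """Contents of a DB_CONFIG file that raises the lock limit."""
    return f"set_lk_max_locks {max_locks}\n"


small = acceptable_block([f"tx{i}" for i in range(10)], ["prev0", "prev1"])
large = acceptable_block([f"tx{i}" for i in range(3000)],
                         [f"prev{i}" for i in range(3000)])
print(small, large)
print(db_config_text(), end="")
```

The estimate deliberately errs on the high side: a block is rejected if it could exceed the limit, whether or not it would on a particular node.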

Alert system

Done: Review who has access to the alert system keys, make sure they all have contact information for each other, and get good timezone overlap among the people with access to the keys.

Implement a new bitcoind feature so services can get timely notification of alerts:

  -alertnotify=<command>  Run command when an AppliesToMe() alert is received.
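
A service might hook a small script to this option. The sketch below is one possible handler, not part of bitcoind. It assumes the alert text reaches the script as its first command-line argument (the proposal above does not say how the text is passed), and the flag-file path and function names are hypothetical.

```python
import sys
import tempfile
from pathlib import Path


def handle_alert(message, flag_path):
    """Append the alert text to a flag file that other services can poll."""
    flag_path = Path(flag_path)
    with flag_path.open("a", encoding="utf-8") as fh:
        fh.write(message.strip() + "\n")
    return flag_path


def payouts_suspended(flag_path):
    """A payment service would refuse to pay out while the flag file exists."""
    return Path(flag_path).exists()


if __name__ == "__main__" and len(sys.argv) > 1:
    # ASSUMPTION: the alert text is passed as argv[1].
    default_flag = Path(tempfile.gettempdir()) / "bitcoin_alert.flag"
    handle_alert(sys.argv[1], default_flag)
```

Using a flag file keeps the handler trivial and leaves the decision to resume in human hands: an operator removes the file once the alert has been reviewed.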

Pre-generate 52 test alerts, and set a time every week when they are broadcast on -testnet (so -alertnotify scripts can be tested in as-close-to-real-world conditions as possible).

Idea from Michael Gronager: encourage merchants/exchanges (and maybe pools) to run new code behind a bitcoind running the network-majority version.

Safe mode

Perhaps trigger an alert if a long enough side chain is detected, even if it is not the main chain. Pools could use this to automatically suspend payouts if a long side-chain suddenly appeared out of nowhere (it’s hard for an attacker to mine such a thing).
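
A minimal sketch of such a check follows. It assumes each chain is available as a list of block hashes starting at the genesis block; the function names and the threshold of 6 blocks are hypothetical choices for this example, not values from the proposal.

```python
def side_chain_length(main_chain, side_chain):
    """Blocks on side_chain after the last block it shares with main_chain."""
    shared = 0
    for main_hash, side_hash in zip(main_chain, side_chain):
        if main_hash != side_hash:
            break
        shared += 1
    return len(side_chain) - shared


def should_alert(main_chain, known_chains, threshold=6):
    """True if any competing chain has at least `threshold` blocks of its own."""
    return any(side_chain_length(main_chain, chain) >= threshold
               for chain in known_chains)


main = ["g", "a1", "a2", "a3", "a4", "a5", "a6", "a7"]
stale = ["g", "a1", "c1"]                           # one-block side chain
fork = ["g", "b1", "b2", "b3", "b4", "b5", "b6"]    # six-block side chain

print(should_alert(main, [stale]))        # prints False
print(should_alert(main, [stale, fork]))  # prints True
```

The check looks only at the length of the competing branch, not at whether it is ahead of the main chain, which matches the idea of alerting even when the side chain is not the main chain.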

Testing

Start running bots on the testnet that grab some coins from a testnet faucet, generate large numbers of random transactions that split/recombine them and then send them back to the faucet. Randomized online testing on the testnet might have revealed the pathological block type earlier.
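
The split/recombine behaviour of such a bot can be prototyped offline before it is pointed at the testnet. The sketch below is a pure simulation under stated assumptions: outputs are plain integer amounts, fees are ignored, and `random_step` and its `max_parts` parameter are hypothetical. It does not talk to any node or faucet.

```python
import random


def random_step(outputs, rng, max_parts=4):
    """Randomly merge several outputs into one, or split one into several."""
    outputs = list(outputs)
    if len(outputs) > 1 and rng.random() < 0.5:
        # Recombine: spend k outputs into a single new output.
        k = rng.randint(2, len(outputs))
        rng.shuffle(outputs)
        merged, rest = outputs[:k], outputs[k:]
        return rest + [sum(merged)]
    # Split: spend one output into 2..max_parts smaller outputs.
    value = outputs.pop(rng.randrange(len(outputs)))
    if value < 2:
        return outputs + [value]  # too small to split
    parts = rng.randint(2, min(max_parts, value))
    cuts = sorted(rng.sample(range(1, value), parts - 1))
    pieces = [hi - lo for lo, hi in zip([0] + cuts, cuts + [value])]
    return outputs + pieces


rng = random.Random(2013)   # fixed seed so a failing run can be replayed
outputs = [1_000_000]       # amount received from a hypothetical faucet
for _ in range(200):
    outputs = random_step(outputs, rng)

# Nothing is created or destroyed, so everything can go back to the faucet.
assert sum(outputs) == 1_000_000
```

A fixed seed matters for this kind of randomized testing: if a generated sequence triggers a failure, the same sequence can be regenerated and investigated.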

Double spending

A double spend attack was successful, despite the fact that both sides of the chain heard about the transactions in the same order. The reason is most likely that the memory pools were cleared when the mining pool nodes were downgraded. A solution is for nodes to sync their mempools with each other at startup; however, this requires a memory pool expiry policy to be implemented, because node restarts are currently the only way for unconfirmed transactions to be evicted from the system.
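
The relationship between startup sync and expiry can be sketched with a small in-memory model. This is an illustration, not bitcoind code: the `ExpiringMempool` class, its method names, and the age-based eviction rule are assumptions made for this example.

```python
import time


class ExpiringMempool:
    """Toy mempool: unconfirmed transactions are evicted after max_age seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._entries = {}  # txid -> (first_seen, raw_tx)

    def add(self, txid, raw_tx, now=None):
        now = time.time() if now is None else now
        # Keep the original first-seen time if the txid is already known.
        self._entries.setdefault(txid, (now, raw_tx))

    def expire(self, now=None):
        """Evict and return the txids older than max_age."""
        now = time.time() if now is None else now
        stale = [txid for txid, (seen, _) in self._entries.items()
                 if now - seen > self.max_age]
        for txid in stale:
            del self._entries[txid]
        return stale

    def snapshot(self):
        """Transactions to hand to a peer that asks for our mempool."""
        return {txid: raw for txid, (_, raw) in self._entries.items()}

    def sync_from(self, peer_snapshot, now=None):
        """Adopt a peer's unconfirmed transactions, e.g. right after startup."""
        for txid, raw in peer_snapshot.items():
            self.add(txid, raw, now)


peer = ExpiringMempool(max_age_seconds=60)
peer.add("tx1", b"raw1", now=0)
peer.add("tx2", b"raw2", now=50)
peer.expire(now=100)                 # tx1 is older than 60 seconds

restarted = ExpiringMempool(max_age_seconds=60)
restarted.sync_from(peer.snapshot(), now=100)
print(sorted(restarted.snapshot()))  # prints ['tx2']
```

Note that in this model a synced transaction gets a fresh first-seen time on the receiving node, so repeated syncing between peers could keep a transaction alive indefinitely; a real policy would need to address that.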