Difference between revisions of "BIP 0039"

From Bitcoin Wiki
Jump to: navigation, search
(Added seed stretching algo)
(Update BIP text with latest version from https://github.com/bitcoin/bips/blob/19c429ee2831d898/bip-0039.mediawiki)
(10 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{bip}}
 
{{bip}}
 +
{{BipMoved|bip-0039.mediawiki}}
  
 
<pre>
 
<pre>
   BIP:     BIP-0039
+
   BIP: 39
   Title:   Mnemonic code for generating deterministic keys
+
  Layer: Applications
   Author: Pavol Rusnak <stick@gk2.sk>
+
   Title: Mnemonic code for generating deterministic keys
          Marek Palatinus <info@bitcoin.cz>
+
   Author: Marek Palatinus <slush@satoshilabs.com>
          Aaron Voisine <voisine@gmail.com>
+
          Pavol Rusnak <stick@satoshilabs.com>
   Status: Draft
+
          Aaron Voisine <voisine@gmail.com>
   Type:   Standards Track
+
          Sean Bowe <ewillbefull@gmail.com>
   Created: 10-09-2013
+
  Comments-Summary: Unanimously Discourage for implementation
 +
  Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0039
 +
   Status: Proposed
 +
   Type: Standards Track
 +
   Created: 2013-09-10
 
</pre>
 
</pre>
  
 
==Abstract==
 
==Abstract==
  
This BIP proposes a scheme for translating binary data (usually master seeds
+
This BIP describes the implementation of a mnemonic code or mnemonic sentence --
for deterministic keys, but it can be applied to any binary data) into a group
+
a group of easy to remember words -- for the generation of deterministic wallets.
of easy to remember words also known as mnemonic code or mnemonic sentence.
+
 
 +
It consists of two parts: generating the mnemonic and converting it into a
 +
binary seed. This seed can be later used to generate deterministic wallets using
 +
BIP-0032 or similar methods.
  
 
==Motivation==
 
==Motivation==
  
Such mnemonic code or mnemonic sentence is much easier to work with than working
+
A mnemonic code or sentence is superior for human interaction compared to the
with the binary data directly (or its hexadecimal interpretation). The sentence
+
handling of raw binary or hexadecimal representations of a wallet seed. The
could be writen down on paper (e.g. for storing in a secure location such as
+
sentence could be written on paper or spoken over the telephone.
safe), told over telephone or other voice communication method, or memorized
+
 
in ones memory (this method is called brainwallet).
+
This guide is meant to be a way to transport computer-generated randomness with
 +
a human-readable transcription. It's not a way to process user-created
 +
sentences (also known as brainwallets) into a wallet seed.
 +
 
 +
==Generating the mnemonic==
  
==Backwards Compatibility==
+
The mnemonic must encode entropy in a multiple of 32 bits. With more entropy
 +
security is improved but the sentence length increases. We refer to the
 +
initial entropy length as ENT. The allowed size of ENT is 128-256 bits.
  
As this BIP is written, only one Bitcoin client (Electrum) implements mnemonic
+
First, an initial entropy of ENT bits is generated. A checksum is generated by
codes, but it uses a different wordlist than the proposed one.
+
taking the first <pre>ENT / 32</pre> bits of its SHA256 hash. This checksum is
 +
appended to the end of the initial entropy. Next, these concatenated bits
 +
are split into groups of 11 bits, each encoding a number from 0-2047, serving
 +
as an index into a wordlist. Finally, we convert these numbers into words and
 +
use the joined words as a mnemonic sentence.
  
For compatibility reasons we propose adding a checkbox to Electrum, which will
+
The following table describes the relation between the initial entropy
allow user to indicate if the legacy code is being entered during import or
+
length (ENT), the checksum length (CS), and the length of the generated mnemonic
it is a new one that is BIP-0039 compatible. For exporting, only the new format
+
sentence (MS) in words.
will be used, so this is not an issue.
+
 
 +
<pre>
 +
CS = ENT / 32
 +
MS = (ENT + CS) / 11
 +
 
 +
|  ENT  | CS | ENT+CS |  MS  |
 +
+-------+----+--------+------+
 +
|  128  |  4 |  132  |  12  |
 +
|  160  |  5 |  165  |  15  |
 +
|  192  |  6 |  198  |  18  |
 +
|  224  |  7 |  231  |  21  |
 +
|  256  |  8 |  264  |  24  |
 +
</pre>
  
==Rationale==
+
==Wordlist==
  
Our proposal is inspired by implementation used in Electrum, but we enhanced
+
An ideal wordlist has the following characteristics:
the wordlist and algorithm so it meets the following criteria:
 
  
 
a) smart selection of words
 
a) smart selection of words
   - wordlist is created in such way that it's enough to type just first four
+
   - the wordlist is created in such a way that it's enough to type the first four
 
     letters to unambiguously identify the word
 
     letters to unambiguously identify the word
  
 
b) similar words avoided
 
b) similar words avoided
   - words as "build" and "built", "woman" and "women" or "quick" or "quickly"
+
   - word pairs like "build" and "built", "woman" and "women", or "quick" and "quickly"
     not only make remembering the sentence difficult, but are also more error
+
     not only make remembering the sentence difficult but are also more error
     prone and more difficult to guess (see point below)
+
     prone and more difficult to guess
  - we avoid these words by carefully selecting them during addition
 
  
 
c) sorted wordlists
 
c) sorted wordlists
   - wordlist is sorted which allow more efficient lookup of the code words
+
   - the wordlist is sorted which allows for more efficient lookup of the code words
     (i.e. implementation can use binary search instead of linear search)
+
     (i.e. implementations can use binary search instead of linear search)
   - this also allows trie (prefix tree) to be used, e.g. for better compression
+
   - this also allows trie (a prefix tree) to be used, e.g. for better compression
 +
 
 +
The wordlist can contain native characters, but they must be encoded in UTF-8
 +
using Normalization Form Compatibility Decomposition (NFKD).
  
d) localized wordlists
+
==From mnemonic to seed==
  - we would like to allow localized wordlists, so it is easier for users
 
    to remember the code in their native language
 
  - by using wordlists with no colliding words among languages, it's easy to
 
    determine which language was used just by checking the first word of
 
    the sentence
 
  
e) mnemonic checksum
+
A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
  - this leads to better user experience, because user can be notified
+
present, an empty string "" is used instead.
    if the mnemonic sequence is wrong, instead of showing the confusing
 
    data generated from the wrong sequence.
 
  
f) seed stretching
+
To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic
  - before the encoding and after the decoding the input binary sequence is
+
sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again
    stretched using a symmetric cipher (Blowfish) in order to prevent
+
in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as
    brute-force attacks in case some of the mnemonic words are leaked
+
the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).
  
==Specification==
+
This seed can be later used to generate deterministic wallets using BIP-0032 or
 +
similar methods.
  
<pre>
+
The conversion of the mnemonic sentence to a binary seed is completely independent
Our proposal implements two methods - "encode" and "decode".
+
from generating the sentence. This results in a rather simple code; there are no
 +
constraints on sentence structure and clients are free to implement their own
 +
wordlists or even whole sentence generators, allowing for flexibility in wordlists
 +
for typo detection or other purposes.
  
The first method takes a binary data which have to length (L) in bytes divisable
+
Although using a mnemonic not generated by the algorithm described in "Generating the
by four and returns a sentence that consists of (L/4*3) words from the wordlist.
+
mnemonic" section is possible, this is not advised and software must compute a
 +
checksum for the mnemonic sentence using a wordlist and issue a warning if it is
 +
invalid.
  
The second method takes sentences generated by first method (number of words in
+
The described method also provides plausible deniability, because every passphrase
the sentence has to be divisable by 3) and reconstructs the original binary data.
+
generates a valid seed (and thus a deterministic wallet) but only the correct one
 +
will make the desired wallet available.
  
Words can repeat in the sentence more than one time.
+
==Wordlists==
  
Wordlist contains 2048 words (instead of 1626 words in Electrum), allowing
+
Since the vast majority of BIP39 wallets supports only the English wordlist,
the code to compute the checksum of the whole mnemonic sequence.
+
it is '''strongly discouraged''' to use non-English wordlists for generating
Each 32 bits of input data add 1 bit of checksum.
+
the mnemonic sentences.
  
See the following table for relation between input lengths, output lengths and
+
If you still feel your application really needs to use a localized wordlist,
checksum sizes for the most common usecases:
+
use one of the following instead of inventing your own.
  
+--------+---------+---------+----------+
+
* [[bip-0039/bip-0039-wordlists.md|Wordlists]]
| input  |  input  | output  | checksum |
 
| (bits) | (bytes) | (words) |  (bits)  |
 
+--------+---------+---------+----------+
 
|  128  |    16  |    12  |    4    |
 
|  192  |    24  |    18  |    6    |
 
|  256  |    32  |    24  |    8    |
 
+--------+---------+---------+----------+
 
</pre>
 
  
===Algorithm:===
+
==Test vectors==
  
<pre>
+
The test vectors include input entropy, mnemonic and seed. The
Encoding:
+
passphrase "TREZOR" is used for all vectors.
1. Read input data (I).
 
2. Make sure its length (L) is divisable by 64 bits.
 
3. Encrypt input data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
 
4. Compute the length of the checkum (LC). LC = L/32
 
5. Split I into chunks of LC bits (I1, I2, I3, ...).
 
6. XOR them altogether and produce the checksum C. C = I1 xor I2 xor I3 ... xor In.
 
7. Concatenate I and C into encoded data (E). Length of E is divisable by 33 bits.
 
8. Keep taking 11 bits from E until there are none left.
 
9. Treat them as integer W, add word with index W to the output.
 
  
Decoding:
+
https://github.com/trezor/python-mnemonic/blob/master/vectors.json
1. Read input mnemonic (M).
 
2. Make sure its wordcount is divisable by 6.
 
3. Figure out word indexes in a dictionary and output them as binary stream E.
 
4. Length of E (L) is divisable by 33 bits.
 
5. Split E into two parts: B and C, where B are first L/33*32 bits, C are last L/33 bits.
 
6. Make sure C is the checksum of B (using the step 5 from the above paragraph).
 
7. If it's not we have invalid mnemonic code.
 
8. Treat B as binary data.
 
9. Decrypt this data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
 
10. Return the result as output.
 
</pre>
 
  
==Test vectors==
+
Also see https://github.com/bip32JP/bip32JP.github.io/blob/master/test_JP_BIP39.json
  
See https://github.com/trezor/python-mnemonic/blob/master/vectors.json
+
(Japanese wordlist test with heavily normalized symbols as passphrase)
  
 
==Reference Implementation==
 
==Reference Implementation==
Line 139: Line 142:
  
 
http://github.com/trezor/python-mnemonic
 
http://github.com/trezor/python-mnemonic
 +
 +
==Other Implementations==
 +
 +
Go:
 +
* https://github.com/tyler-smith/go-bip39
 +
 +
Elixir:
 +
* https://github.com/aerosol/mnemo
 +
 +
Objective-C:
 +
* https://github.com/nybex/NYMnemonic
 +
 +
Haskell:
 +
* https://github.com/haskoin/haskoin
 +
 +
.NET C# (PCL):
 +
* https://github.com/Thashiznets/BIP39.NET
 +
 +
.NET C# (PCL):
 +
* https://github.com/NicolasDorier/NBitcoin
 +
 +
JavaScript:
 +
* https://github.com/bitpay/bitcore-mnemonic
 +
* https://github.com/bitcoinjs/bip39 (used by [[https://github.com/blockchain/My-Wallet-V3/blob/v3.8.0/src/hd-wallet.js#L121-L146|blockchain.info]])
 +
 +
Java:
 +
* https://github.com/bitcoinj/bitcoinj/blob/master/core/src/main/java/org/bitcoinj/crypto/MnemonicCode.java
 +
 +
Ruby:
 +
* https://github.com/sreekanthgs/bip_mnemonic
 +
 +
Rust:
 +
* https://github.com/maciejhirsz/tiny-bip39/
 +
 +
Swift:
 +
* https://github.com/CikeQiu/CKMnemonic
 +
* https://github.com/yuzushioh/WalletKit
 +
* https://github.com/pengpengliu/BIP39
 +
* https://github.com/matter-labs/web3swift/blob/develop/Sources/web3swift/KeystoreManager/BIP39.swift
 +
* https://github.com/zcash-hackworks/MnemonicSwift
 +
 +
C++:
 +
* https://github.com/libbitcoin/libbitcoin-system/blob/master/include/bitcoin/system/wallet/mnemonic.hpp
 +
 +
C (with Python/Java/Javascript bindings):
 +
* https://github.com/ElementsProject/libwally-core

Revision as of 21:20, 12 February 2021

This page describes a BIP (Bitcoin Improvement Proposal).
Please see BIP 2 for more information about BIPs and creating them. Please do not just create a wiki page.

Please do not modify this page. This is a mirror of the BIP from the source Git repository here.

  BIP: 39
  Layer: Applications
  Title: Mnemonic code for generating deterministic keys
  Author: Marek Palatinus <slush@satoshilabs.com>
          Pavol Rusnak <stick@satoshilabs.com>
          Aaron Voisine <voisine@gmail.com>
          Sean Bowe <ewillbefull@gmail.com>
  Comments-Summary: Unanimously Discourage for implementation
  Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0039
  Status: Proposed
  Type: Standards Track
  Created: 2013-09-10

Abstract

This BIP describes the implementation of a mnemonic code or mnemonic sentence -- a group of easy to remember words -- for the generation of deterministic wallets.

It consists of two parts: generating the mnemonic and converting it into a binary seed. This seed can be later used to generate deterministic wallets using BIP-0032 or similar methods.

Motivation

A mnemonic code or sentence is superior for human interaction compared to the handling of raw binary or hexadecimal representations of a wallet seed. The sentence could be written on paper or spoken over the telephone.

This guide is meant to be a way to transport computer-generated randomness with a human-readable transcription. It's not a way to process user-created sentences (also known as brainwallets) into a wallet seed.

Generating the mnemonic

The mnemonic must encode entropy in a multiple of 32 bits. With more entropy security is improved but the sentence length increases. We refer to the initial entropy length as ENT. The allowed size of ENT is 128-256 bits.

First, an initial entropy of ENT bits is generated. A checksum is generated by

taking the first
ENT / 32
bits of its SHA256 hash. This checksum is

appended to the end of the initial entropy. Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0-2047, serving as an index into a wordlist. Finally, we convert these numbers into words and use the joined words as a mnemonic sentence.

The following table describes the relation between the initial entropy length (ENT), the checksum length (CS), and the length of the generated mnemonic sentence (MS) in words.

CS = ENT / 32
MS = (ENT + CS) / 11

|  ENT  | CS | ENT+CS |  MS  |
+-------+----+--------+------+
|  128  |  4 |   132  |  12  |
|  160  |  5 |   165  |  15  |
|  192  |  6 |   198  |  18  |
|  224  |  7 |   231  |  21  |
|  256  |  8 |   264  |  24  |

Wordlist

An ideal wordlist has the following characteristics:

a) smart selection of words

  - the wordlist is created in such a way that it's enough to type the first four
    letters to unambiguously identify the word

b) similar words avoided

  - word pairs like "build" and "built", "woman" and "women", or "quick" and "quickly"
    not only make remembering the sentence difficult but are also more error
    prone and more difficult to guess

c) sorted wordlists

  - the wordlist is sorted which allows for more efficient lookup of the code words
    (i.e. implementations can use binary search instead of linear search)
  - this also allows trie (a prefix tree) to be used, e.g. for better compression

The wordlist can contain native characters, but they must be encoded in UTF-8 using Normalization Form Compatibility Decomposition (NFKD).

From mnemonic to seed

A user may decide to protect their mnemonic with a passphrase. If a passphrase is not present, an empty string "" is used instead.

To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).

This seed can be later used to generate deterministic wallets using BIP-0032 or similar methods.

The conversion of the mnemonic sentence to a binary seed is completely independent from generating the sentence. This results in a rather simple code; there are no constraints on sentence structure and clients are free to implement their own wordlists or even whole sentence generators, allowing for flexibility in wordlists for typo detection or other purposes.

Although using a mnemonic not generated by the algorithm described in "Generating the mnemonic" section is possible, this is not advised and software must compute a checksum for the mnemonic sentence using a wordlist and issue a warning if it is invalid.

The described method also provides plausible deniability, because every passphrase generates a valid seed (and thus a deterministic wallet) but only the correct one will make the desired wallet available.

Wordlists

Since the vast majority of BIP39 wallets supports only the English wordlist, it is strongly discouraged to use non-English wordlists for generating the mnemonic sentences.

If you still feel your application really needs to use a localized wordlist, use one of the following instead of inventing your own.

Test vectors

The test vectors include input entropy, mnemonic and seed. The passphrase "TREZOR" is used for all vectors.

https://github.com/trezor/python-mnemonic/blob/master/vectors.json

Also see https://github.com/bip32JP/bip32JP.github.io/blob/master/test_JP_BIP39.json

(Japanese wordlist test with heavily normalized symbols as passphrase)

Reference Implementation

Reference implementation including wordlists is available from

http://github.com/trezor/python-mnemonic

Other Implementations

Go:

Elixir:

Objective-C:

Haskell:

.NET C# (PCL):

.NET C# (PCL):

JavaScript:

Java:

Ruby:

Rust:

Swift:

C++:

C (with Python/Java/Javascript bindings):