|
|
Line 12: |
Line 12: |
| </pre> | | </pre> |
| | | |
− | ==Abstract== | + | ==Draft== |
| | | |
− | This BIP proposes a scheme for translating binary data (usually master seeds
| + | See https://github.com/trezor/python-mnemonic/blob/master/README |
− | for deterministic keys, but it can be applied to any binary data) into a group
| |
− | of easy to remember words also known as mnemonic code or mnemonic sentence.
| |
− | | |
− | ==Motivation==
| |
− | | |
− | Such mnemonic code or mnemonic sentence is much easier to work with than working
| |
− | with the binary data directly (or its hexadecimal interpretation). The sentence
| |
− | could be writen down on paper (e.g. for storing in a secure location such as
| |
− | safe), told over telephone or other voice communication method, or memorized
| |
− | in ones memory (this method is called brainwallet).
| |
− | | |
− | ==Backwards Compatibility==
| |
− | | |
− | As this BIP is written, only one Bitcoin client (Electrum) implements mnemonic
| |
− | codes, but it uses a different wordlist than the proposed one.
| |
− | | |
− | For compatibility reasons we propose adding a checkbox to Electrum, which will
| |
− | allow user to indicate if the legacy code is being entered during import or
| |
− | it is a new one that is BIP-0039 compatible. For exporting, only the new format
| |
− | will be used, so this is not an issue.
| |
− | | |
− | ==Rationale==
| |
− | | |
− | Our proposal is inspired by implementation used in Electrum, but we enhanced
| |
− | the wordlist and algorithm so it meets the following criteria:
| |
− | | |
− | a) smart selection of words
| |
− | - wordlist is created in such way that it's enough to type just first four
| |
− | letters to unambiguously identify the word
| |
− | | |
− | b) similar words avoided
| |
− | - words as "build" and "built", "woman" and "women" or "quick" or "quickly"
| |
− | not only make remembering the sentence difficult, but are also more error
| |
− | prone and more difficult to guess (see point below)
| |
− | - we avoid these words by carefully selecting them during addition
| |
− | | |
− | c) sorted wordlists
| |
− | - wordlist is sorted which allow more efficient lookup of the code words
| |
− | (i.e. implementation can use binary search instead of linear search)
| |
− | - this also allows trie (prefix tree) to be used, e.g. for better compression
| |
− | | |
− | d) localized wordlists
| |
− | - we would like to allow localized wordlists, so it is easier for users
| |
− | to remember the code in their native language
| |
− | - by using wordlists with no colliding words among languages, it's easy to
| |
− | determine which language was used just by checking the first word of
| |
− | the sentence
| |
− | | |
− | e) mnemonic checksum
| |
− | - this leads to better user experience, because user can be notified
| |
− | if the mnemonic sequence is wrong, instead of showing the confusing
| |
− | data generated from the wrong sequence.
| |
− | | |
− | f) seed stretching
| |
− | - before the encoding and after the decoding the input binary sequence is
| |
− | stretched using a symmetric cipher (Blowfish) in order to prevent
| |
− | brute-force attacks in case some of the mnemonic words are leaked
| |
− | | |
− | ==Specification==
| |
− | | |
− | <pre>
| |
− | Our proposal implements two methods - "encode" and "decode".
| |
− | | |
− | The first method takes a binary data which have to length (L) in bytes divisable
| |
− | by four and returns a sentence that consists of (L/4*3) words from the wordlist.
| |
− | | |
− | The second method takes sentences generated by first method (number of words in
| |
− | the sentence has to be divisable by 3) and reconstructs the original binary data.
| |
− | | |
− | Words can repeat in the sentence more than one time.
| |
− | | |
− | Wordlist contains 2048 words (instead of 1626 words in Electrum), allowing
| |
− | the code to compute the checksum of the whole mnemonic sequence.
| |
− | Each 32 bits of input data add 1 bit of checksum.
| |
− | | |
− | See the following table for relation between input lengths, output lengths and
| |
− | checksum sizes for the most common usecases:
| |
− | | |
− | +--------+---------+---------+----------+
| |
− | | input | input | output | checksum |
| |
− | | (bits) | (bytes) | (words) | (bits) |
| |
− | +--------+---------+---------+----------+
| |
− | | 128 | 16 | 12 | 4 |
| |
− | | 192 | 24 | 18 | 6 |
| |
− | | 256 | 32 | 24 | 8 |
| |
− | +--------+---------+---------+----------+
| |
− | </pre>
| |
− | | |
− | ===Algorithm:===
| |
− | | |
− | <pre>
| |
− | Encoding:
| |
− | 1. Read input data (I).
| |
− | 2. Make sure its length (L) is divisable by 64 bits.
| |
− | 3. Encrypt input data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
| |
− | 4. Compute the length of the checkum (LC). LC = L/32
| |
− | 5. Split I into chunks of LC bits (I1, I2, I3, ...).
| |
− | 6. XOR them altogether and produce the checksum C. C = I1 xor I2 xor I3 ... xor In.
| |
− | 7. Concatenate I and C into encoded data (E). Length of E is divisable by 33 bits.
| |
− | 8. Keep taking 11 bits from E until there are none left.
| |
− | 9. Treat them as integer W, add word with index W to the output.
| |
− | | |
− | Decoding:
| |
− | 1. Read input mnemonic (M).
| |
− | 2. Make sure its wordcount is divisable by 6.
| |
− | 3. Figure out word indexes in a dictionary and output them as binary stream E.
| |
− | 4. Length of E (L) is divisable by 33 bits.
| |
− | 5. Split E into two parts: B and C, where B are first L/33*32 bits, C are last L/33 bits.
| |
− | 6. Make sure C is the checksum of B (using the step 5 from the above paragraph).
| |
− | 7. If it's not we have invalid mnemonic code.
| |
− | 8. Treat B as binary data.
| |
− | 9. Decrypt this data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
| |
− | 10. Return the result as output.
| |
− | </pre>
| |
− | | |
− | ==Test vectors==
| |
− | | |
− | See https://github.com/trezor/python-mnemonic/blob/master/vectors.json | |
| | | |
| ==Reference Implementation== | | ==Reference Implementation== |