Satoshi Client Node Discovery: Difference between revisions
→DNS Addresses: Add reference to source. |
→DNS Addresses: Updating line number for seed nodes and month to now. |
||
(15 intermediate revisions by 6 users not shown) | |||
Line 5: | Line 5: | ||
# Nodes discover their own external address by various methods. | # Nodes discover their own external address by various methods. | ||
# Nodes receive the callback address of remote nodes that connect to them. | # Nodes receive the callback address of remote nodes that connect to them. | ||
# Nodes makes DNS request to receive IP addresses. | # Nodes makes DNS request to receive IP addresses. | ||
# Nodes can use addresses hard coded into the software. | # Nodes can use addresses hard coded into the software. | ||
Line 14: | Line 13: | ||
A timestamp is kept for each address to keep track of when the node | A timestamp is kept for each address to keep track of when the node | ||
address was last seen. The AddressCurrentlyConnected in net.cpp handles | address was last seen. The AddressCurrentlyConnected in [https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp net.cpp] handles | ||
updating the timestamp whenever a message is received from a node. | updating the timestamp whenever a message is received from a node. | ||
Timestamps are only updated on an address and saved to the database | Timestamps are only updated on an address and saved to the database | ||
Line 26: | Line 25: | ||
timestamps, it will become more clear why timestamps are kept the way | timestamps, it will become more clear why timestamps are kept the way | ||
they are for each of the different ways an address is discovered. | they are for each of the different ways an address is discovered. | ||
==Handling Message "getaddr"== | ==Handling Message "getaddr"== | ||
Line 45: | Line 43: | ||
===Local Client's External Address=== | ===Local Client's External Address=== | ||
The client uses | The client uses public web services which return the information to determine its own external, routable IP address. | ||
public web services which return the information | |||
The client | The client runs a thread called ThreadGetMyExternalIP (in [https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp net.cpp]) | ||
which attempts to determine the client's IP address as seen from the outside | which attempts to determine the client's IP address as seen from the outside | ||
world | world. | ||
First, it attempts to connect to 91.198.22.70 port 80, which should be | First, it attempts to connect to 91.198.22.70 port 80, which should be | ||
the checkip.dyndns.org server. If connection fails, a DNS request is made | the checkip.dyndns.org server. If connection fails, a DNS request is made | ||
for checkip.dyndns.org and a connection is attempted to that address. | for checkip.dyndns.org and a connection is attempted to that address. | ||
Next, it | Next, it attempts to connect to 74.208.43.192 port 80, which should be | ||
the www.showmyip.com server. If connection fails, a DNS request is made | the www.showmyip.com server. If connection fails, a DNS request is made | ||
for www.showmyip.com and a connection is attempted to that address. | for www.showmyip.com and a connection is attempted to that address. | ||
Line 71: | Line 59: | ||
send a HTTP request, read the appropriate response line, and parse the | send a HTTP request, read the appropriate response line, and parse the | ||
IP address from it. | IP address from it. | ||
If this succeeds, the IP is returned, it is advertised to any connected | If this succeeds, the IP address is returned, it is advertised to any connected | ||
nodes, and then the thread finishes (without proceeding to the next | nodes, and then the thread finishes (without proceeding to the next | ||
address). | address). | ||
===Connect Callback Address=== | |||
When a node receives an initial "version" message, and that node initiated the connection, then the node advertises its address to the remote so that it can connect back to the local node if it wants to. | |||
After sending its own address, it sends a "getaddr" request message to the remote node to learn about more addresses, if the remote node version is recent or if the local node does not yet have 1000 addresses. | |||
After sending its own address, it sends a "getaddr" request message | |||
to the remote node to learn about more addresses, if the remote node | |||
version is recent or if the local node does not yet have 1000 addresses. | |||
===IRC Addresses=== | ===IRC Addresses=== | ||
As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default. This documentation below is accurate for most prior versions. | As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default, and as of version 0.8.2 support for IRC bootstrapping has been removed completely. This documentation below is accurate for most prior versions. | ||
In addition to learning and sharing its own address, the node learned about other node addresses via an IRC channel. See [https://github.com/bitcoin/bitcoin/blob/847593228de8634bf6ef5933a474c7e63be59146/src/irc.cpp irc.cpp]. | |||
After learning its own address, a node encoded its own address into a string to be used as a nickname. Then, it randomly joined an IRC channel named between #bitcoin00 and #bitcoin99. Then it issued a WHO command. The thread read the lines as they appeared in the channel and decoded the IP addresses of other nodes in the channel. It did this in a loop, forever, until the node was shutdown. | |||
After learning its own address, a node | |||
to be used as a nickname. Then, it randomly | |||
between #bitcoin00 and #bitcoin99. Then it | |||
The thread | |||
the IP addresses of other nodes in the channel. It | |||
a loop, forever, until the node | |||
When the client | When the client discovered an address from IRC, it set the timestamp on the address to the current time, but it used a "penalty" of 51 minutes, which means it looked like it was actually seen almost an hour earlier. | ||
on the address to the current time, but it | |||
of 51 minutes, which means it | |||
almost an hour | |||
===DNS Addresses=== | ===DNS Addresses=== | ||
Upon startup, if peer node discovery is needed, the client then issues DNS requests to learn about the addresses of other peer nodes. The client includes a list of | Upon startup, if peer node discovery is needed, the client then issues DNS requests to learn about the addresses of other peer nodes. The client includes a list of host names for DNS services that are seeded. As of December, 2017 the list (from chainparams.cpp<ref>[https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L128 bitcoin/src/chainparams.cpp at master]</ref>) includes: | ||
* | * seed.bitcoin.sipa.be | ||
* dnsseed.bluematt.me | * dnsseed.bluematt.me | ||
* dnsseed.bitcoin.dashjr.org | * dnsseed.bitcoin.dashjr.org | ||
* seed.bitcoinstats.com | |||
* seed.bitcoin.jonasschnelli.ch | |||
* seed.btc.petertodd.org | |||
A DNS reply can contain multiple IP addresses for a requested name. | A DNS reply can contain multiple IP addresses for a requested name. | ||
Line 117: | Line 94: | ||
===Hard Coded "Seed" Addresses=== | ===Hard Coded "Seed" Addresses=== | ||
The client contains hard coded IP addresses that represent bitcoin nodes | The client contains hard coded IP addresses that represent bitcoin nodes. | ||
There is code is move away from seed nodes when possible. The presumption | These addresses are only used as a last resort, if no other method has produced any addresses at all. When the loop in the connection handling thread ThreadOpenConnections2() sees an empty address map, it uses the "seed" IP addresses as backup. | ||
is that this is to avoid overloading those nodes. Once the local node has | |||
enough addresses (presumably learned from the seed nodes), the | There is code is move away from seed nodes when possible. The presumption is that this is to avoid overloading those nodes. Once the local node has enough addresses (presumably learned from the seed nodes), the connection thread will close seed node connections. | ||
connection thread will close seed node connections. | |||
Seed Addresses are initially given a zero timestamp, | Seed Addresses are initially given a zero timestamp, | ||
therefore they are not advertised in response to a "getaddr" request. | therefore they are not advertised in response to a "getaddr" request. | ||
===Ongoing "addr" advertisements=== | ===Ongoing "addr" advertisements=== | ||
Nodes may receive addresses in an "addr" message after having | Nodes may receive addresses in an "addr" message after having sent a "getaddr" request, or "addr" messages may arrive unsolicited, because nodes advertise addresses gratuitously when they relay addresses (see below), when they advertise their own address periodically, and when a connection is made. | ||
sent a "getaddr" request, or "addr" messages may arrive | |||
unsolicited, because nodes advertise addresses gratuitously | |||
when they relay addresses (see below), when they advertise | |||
their own address periodically, and when a connection is made. | |||
If the address is from a really old version, it is ignored; if from | If the address is from a really old version, it is ignored; if from a not-so-old version, it is ignored if we have 1000 addresses already. | ||
a not-so-old version, it is ignored if we have 1000 addresses already. | |||
If the sender sent over 1000 addresses, they are all ignored. | If the sender sent over 1000 addresses, they are all ignored. | ||
Addresses received from an "addr" message have a timestamp, but the | Addresses received from an "addr" message have a timestamp, but the timestamp is not necessarily honored directly. | ||
timestamp is not necessarily honored directly. | |||
Line 153: | Line 117: | ||
# We subtract 2 hours from the timestamp and add the address. | # We subtract 2 hours from the timestamp and add the address. | ||
Note that when any address is added, for any reason, the code that calls | Note that when any address is added, for any reason, the code that calls AddAddress() does not check to see if it already exists. The AddAddresss() function in [https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp net.cpp] will do that, and if the address already exists, further processing is done to update the address record. If the advertised services of the address have changed, that is updated and stored. | ||
AddAddress() does not check to see if it already exists. The AddAddresss() | |||
function in net.cpp will do that, and if the address already exists, further | If the address has been seen in the last 24 hours and the timestamp is currently over 60 minutes old, then it is updated to 60 minutes ago. | ||
processing is done to update the address record. If the advertised services | |||
of the address have changed, that is updated and stored. | If the address has NOT been seen in the last 24 hours, and the timestamp is currently over 24 hours old, then it is updated to 24 hours ago. | ||
If the address has been seen in the last 24 hours and the timestamp is | |||
currently over 60 minutes old, then it is updated to 60 minutes ago. | |||
If the address has NOT been seen in the last 24 hours, and the timestamp is | |||
currently over 24 hours old, then it is updated to 24 hours ago. | |||
====Address Relay==== | ====Address Relay==== | ||
Once addresses are added from an "addr" message (see above), they then | Once addresses are added from an "addr" message (see above), they then may be relayed to the other nodes. First, the following criteria must be set [9]: | ||
may be relayed to the other nodes. First, the following criteria | |||
must be set [9]: | |||
#The address timestamp, after processing, is within 60 minutes of the current time | #The address timestamp, after processing, is within 60 minutes of the current time | ||
Line 174: | Line 132: | ||
#The address must be routable. | #The address must be routable. | ||
For every address that meets the above criteria, the node hash the address, the current day (in the form of an integer), and a random 256 bit value (generated at client startup). The node takes the two addresses with the lowest hash values and relays "addr" messages to them. This ensures that each node only relays "addr" messages to two other clients at any given time, that the two other clients are randomly selected, and that the random selection starts over at least once every 24 hours. | |||
and the two | |||
the | |||
====Self broadcast==== | ====Self broadcast==== | ||
Every 24 hours, the node advertises its own address to all connected nodes. | Every 24 hours, the node advertises its own address to all connected nodes. | ||
It also clears the list of the addresses we think the remote node has, which | |||
will trigger a refresh of sends to nodes. This code is in SendMessages() | It also clears the list of the addresses we think the remote node has, which will trigger a refresh of sends to nodes. This code is in SendMessages() in [https://github.com/bitcoin/bitcoin/blob/master/src/main.cpp main.cpp]. | ||
in main.cpp. | |||
====Old Address Cleanup==== | ====Old Address Cleanup==== | ||
In SendMessages() in main.cpp, there is code to remove old addresses. | In SendMessages() in [https://github.com/bitcoin/bitcoin/blob/master/src/main.cpp main.cpp], there is code to remove old addresses. | ||
This is done every ten minutes, as long as there are 3 active connections. | This is done every ten minutes, as long as there are 3 active connections. | ||
The node erases messages that have not been used in 14 days as long as there are at least 1000 addresses in the map, and as long as the erasing process has not taken more than 20 seconds. | |||
===Addresses stored in the Database=== | ===Addresses stored in the Database=== | ||
Addresses are stored in the database when AddAddress() is called. | Addresses are stored in the database when AddAddress() is called. | ||
Addresses are read on startup when AppInit2() calls LoadAddresses(), | Addresses are read on startup when AppInit2() calls LoadAddresses(), which is located in [https://github.com/bitcoin/bitcoin/blob/master/src/db.cpp db.cpp]. | ||
which is located in db.cpp. | |||
Currently, it appears all addresses are stored all at once whenever any address is stored or updated<ref>[http://bitcointalk.org/index.php?topic=26436.0 Lots of disk activity on Bitcoin startup: easy fix?]</ref>. Indeed, AddAddress is seen to take over .01 seconds in various testing and is typically called tens of thousands of times in the initial 12 hours of running the client. | |||
===Command Line Provided Addresses=== | ===Command Line Provided Addresses=== | ||
The user can specify nodes to connect to with the -addnode <ip> | The user can specify nodes to connect to with the | ||
-addnode <ip> | |||
command line argument. Multiple nodes may be specified. | command line argument. Multiple nodes may be specified. | ||
Addresses provided on the command line are initially given a zero | Addresses provided on the command line are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request. | ||
timestamp, therefore they are not advertised in response to a "getaddr" | |||
request. | |||
The user can also specify an address to connect to with the -connect <ip> | The user can also specify an address to connect to with the -connect <ip> | ||
command line argument. Multiple nodes may be specified. | command line argument. Multiple nodes may be specified. | ||
The -connect argument differs from -addnode in that -connect addresses are not added to the address database and when -connect is specified, only those addresses are used. | |||
===Text File Provided Addresses=== | |||
The client will automatically read a file named "addr.txt" in the bitcoin data directory and will add any addresses it finds in there as node addresses. These nodes are given no special preference over other addresses. They are just added to the pool. | |||
The client will automatically read a file named "addr.txt" in the | |||
bitcoin data directory and will add any addresses it finds in there | |||
as node addresses. These nodes are given no special preference over | |||
other addresses. They are just added to the pool. | |||
Addresses loaded from the text file are initially given a zero timestamp, | Addresses loaded from the text file are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request. | ||
therefore they are not advertised in response to a "getaddr" request. | |||
Line 241: | Line 179: | ||
* [[Network#Bootstrapping|Network - Boostrapping]] | * [[Network#Bootstrapping|Network - Boostrapping]] | ||
== | ==References== | ||
<references /> | |||
[[Category:Developer]] | [[Category:Developer]] | ||
[[Category:Technical]] | [[Category:Technical]] |
Latest revision as of 02:29, 19 December 2017
Overview
The Satoshi client discovers the IP address and port of nodes in several different ways.
- Nodes discover their own external address by various methods.
- Nodes receive the callback address of remote nodes that connect to them.
- Nodes makes DNS request to receive IP addresses.
- Nodes can use addresses hard coded into the software.
- Nodes exchange addresses with other nodes.
- Nodes store addresses in a database and read that database on startup.
- Nodes can be provided addresses as command line arguments
- Nodes read addresses from a user provided text file on startup
A timestamp is kept for each address to keep track of when the node address was last seen. The AddressCurrentlyConnected in net.cpp handles updating the timestamp whenever a message is received from a node. Timestamps are only updated on an address and saved to the database when the timestamp is over 20 minutes old.
See the Node Connectivity article for information on which type of addresses take precedence when actually connecting to nodes.
In the first section we will cover how a node handles a request for addresses via the "getaddr" message. By understanding the role of timestamps, it will become more clear why timestamps are kept the way they are for each of the different ways an address is discovered.
Handling Message "getaddr"
When a node receives a "getaddr" request, it first figures out how many addresses it has that have a timestamp in the last 3 hours. Then it sends those addresses, but if there are more than 2500 addresses seen in the last 3 hours, it selects around 2500 out of the available recent addresses by using random selection.
Now lets look at the ways a node finds out about node addresses.
Discovery Methods
Local Client's External Address
The client uses public web services which return the information to determine its own external, routable IP address.
The client runs a thread called ThreadGetMyExternalIP (in net.cpp) which attempts to determine the client's IP address as seen from the outside world.
First, it attempts to connect to 91.198.22.70 port 80, which should be the checkip.dyndns.org server. If connection fails, a DNS request is made for checkip.dyndns.org and a connection is attempted to that address. Next, it attempts to connect to 74.208.43.192 port 80, which should be the www.showmyip.com server. If connection fails, a DNS request is made for www.showmyip.com and a connection is attempted to that address.
For each address attempted above, the client attempts to connect, send a HTTP request, read the appropriate response line, and parse the IP address from it. If this succeeds, the IP address is returned, it is advertised to any connected nodes, and then the thread finishes (without proceeding to the next address).
Connect Callback Address
When a node receives an initial "version" message, and that node initiated the connection, then the node advertises its address to the remote so that it can connect back to the local node if it wants to.
After sending its own address, it sends a "getaddr" request message to the remote node to learn about more addresses, if the remote node version is recent or if the local node does not yet have 1000 addresses.
IRC Addresses
As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default, and as of version 0.8.2 support for IRC bootstrapping has been removed completely. This documentation below is accurate for most prior versions.
In addition to learning and sharing its own address, the node learned about other node addresses via an IRC channel. See irc.cpp.
After learning its own address, a node encoded its own address into a string to be used as a nickname. Then, it randomly joined an IRC channel named between #bitcoin00 and #bitcoin99. Then it issued a WHO command. The thread read the lines as they appeared in the channel and decoded the IP addresses of other nodes in the channel. It did this in a loop, forever, until the node was shutdown.
When the client discovered an address from IRC, it set the timestamp on the address to the current time, but it used a "penalty" of 51 minutes, which means it looked like it was actually seen almost an hour earlier.
DNS Addresses
Upon startup, if peer node discovery is needed, the client then issues DNS requests to learn about the addresses of other peer nodes. The client includes a list of host names for DNS services that are seeded. As of December, 2017 the list (from chainparams.cpp[1]) includes:
- seed.bitcoin.sipa.be
- dnsseed.bluematt.me
- dnsseed.bitcoin.dashjr.org
- seed.bitcoinstats.com
- seed.bitcoin.jonasschnelli.ch
- seed.btc.petertodd.org
A DNS reply can contain multiple IP addresses for a requested name.
Addresses discovered via DNS are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.
Hard Coded "Seed" Addresses
The client contains hard coded IP addresses that represent bitcoin nodes.
These addresses are only used as a last resort, if no other method has produced any addresses at all. When the loop in the connection handling thread ThreadOpenConnections2() sees an empty address map, it uses the "seed" IP addresses as backup.
There is code is move away from seed nodes when possible. The presumption is that this is to avoid overloading those nodes. Once the local node has enough addresses (presumably learned from the seed nodes), the connection thread will close seed node connections.
Seed Addresses are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.
Ongoing "addr" advertisements
Nodes may receive addresses in an "addr" message after having sent a "getaddr" request, or "addr" messages may arrive unsolicited, because nodes advertise addresses gratuitously when they relay addresses (see below), when they advertise their own address periodically, and when a connection is made.
If the address is from a really old version, it is ignored; if from a not-so-old version, it is ignored if we have 1000 addresses already.
If the sender sent over 1000 addresses, they are all ignored.
Addresses received from an "addr" message have a timestamp, but the timestamp is not necessarily honored directly.
For every address in the message:
- If the timestamp is too low or too high, it is set to 5 days ago.
- We subtract 2 hours from the timestamp and add the address.
Note that when any address is added, for any reason, the code that calls AddAddress() does not check to see if it already exists. The AddAddresss() function in net.cpp will do that, and if the address already exists, further processing is done to update the address record. If the advertised services of the address have changed, that is updated and stored.
If the address has been seen in the last 24 hours and the timestamp is currently over 60 minutes old, then it is updated to 60 minutes ago.
If the address has NOT been seen in the last 24 hours, and the timestamp is currently over 24 hours old, then it is updated to 24 hours ago.
Address Relay
Once addresses are added from an "addr" message (see above), they then may be relayed to the other nodes. First, the following criteria must be set [9]:
- The address timestamp, after processing, is within 60 minutes of the current time
- The "addr" message contains 10 addresses or less
- And fGetAddr is not set on the node. fGetAddr starts false, is set to true when we request addresses from a node, and it is cleared when we receive less than 1000 addresses from a node.
- The address must be routable.
For every address that meets the above criteria, the node hash the address, the current day (in the form of an integer), and a random 256 bit value (generated at client startup). The node takes the two addresses with the lowest hash values and relays "addr" messages to them. This ensures that each node only relays "addr" messages to two other clients at any given time, that the two other clients are randomly selected, and that the random selection starts over at least once every 24 hours.
Self broadcast
Every 24 hours, the node advertises its own address to all connected nodes.
It also clears the list of the addresses we think the remote node has, which will trigger a refresh of sends to nodes. This code is in SendMessages() in main.cpp.
Old Address Cleanup
In SendMessages() in main.cpp, there is code to remove old addresses.
This is done every ten minutes, as long as there are 3 active connections.
The node erases messages that have not been used in 14 days as long as there are at least 1000 addresses in the map, and as long as the erasing process has not taken more than 20 seconds.
Addresses stored in the Database
Addresses are stored in the database when AddAddress() is called.
Addresses are read on startup when AppInit2() calls LoadAddresses(), which is located in db.cpp.
Currently, it appears all addresses are stored all at once whenever any address is stored or updated[2]. Indeed, AddAddress is seen to take over .01 seconds in various testing and is typically called tens of thousands of times in the initial 12 hours of running the client.
Command Line Provided Addresses
The user can specify nodes to connect to with the
-addnode <ip>
command line argument. Multiple nodes may be specified.
Addresses provided on the command line are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.
The user can also specify an address to connect to with the -connect <ip> command line argument. Multiple nodes may be specified.
The -connect argument differs from -addnode in that -connect addresses are not added to the address database and when -connect is specified, only those addresses are used.
Text File Provided Addresses
The client will automatically read a file named "addr.txt" in the bitcoin data directory and will add any addresses it finds in there as node addresses. These nodes are given no special preference over other addresses. They are just added to the pool.
Addresses loaded from the text file are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.