Satoshi Client Node Discovery

From Bitcoin Wiki
Jump to: navigation, search

Overview

The Satoshi client discovers the IP address and port of nodes in several different ways.

  1. Nodes discover their own external address by various methods.
  2. Nodes receive the callback address of remote nodes that connect to them.
  3. Nodes makes DNS request to receive IP addresses.
  4. Nodes can use addresses hard coded into the software.
  5. Nodes exchange addresses with other nodes.
  6. Nodes store addresses in a database and read that database on startup.
  7. Nodes can be provided addresses as command line arguments
  8. Nodes read addresses from a user provided text file on startup

A timestamp is kept for each address to keep track of when the node address was last seen. The AddressCurrentlyConnected in net.cpp handles updating the timestamp whenever a message is received from a node. Timestamps are only updated on an address and saved to the database when the timestamp is over 20 minutes old.

See the Node Connectivity article for information on which type of addresses take precedence when actually connecting to nodes.

In the first section we will cover how a node handles a request for addresses via the "getaddr" message. By understanding the role of timestamps, it will become more clear why timestamps are kept the way they are for each of the different ways an address is discovered.

Handling Message "getaddr"

When a node receives a "getaddr" request, it first figures out how many addresses it has that have a timestamp in the last 3 hours. Then it sends those addresses, but if there are more than 2500 addresses seen in the last 3 hours, it selects around 2500 out of the available recent addresses by using random selection.


Now lets look at the ways a node finds out about node addresses.


Discovery Methods

Local Client's External Address

The client uses public web services which return the information to determine its own external, routable IP address.

The client runs a thread called ThreadGetMyExternalIP (in net.cpp) which attempts to determine the client's IP address as seen from the outside world.

First, it attempts to connect to 91.198.22.70 port 80, which should be the checkip.dyndns.org server. If connection fails, a DNS request is made for checkip.dyndns.org and a connection is attempted to that address. Next, it attempts to connect to 74.208.43.192 port 80, which should be the www.showmyip.com server. If connection fails, a DNS request is made for www.showmyip.com and a connection is attempted to that address.

For each address attempted above, the client attempts to connect, send a HTTP request, read the appropriate response line, and parse the IP address from it. If this succeeds, the IP address is returned, it is advertised to any connected nodes, and then the thread finishes (without proceeding to the next address).

Connect Callback Address

When a node receives an initial "version" message, and that node initiated the connection, then the node advertises its address to the remote so that it can connect back to the local node if it wants to.

After sending its own address, it sends a "getaddr" request message to the remote node to learn about more addresses, if the remote node version is recent or if the local node does not yet have 1000 addresses.


IRC Addresses

As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default, and as of version 0.8.2 support for IRC bootstrapping has been removed completely. This documentation below is accurate for most prior versions.

In addition to learning and sharing its own address, the node learned about other node addresses via an IRC channel. See irc.cpp.

After learning its own address, a node encoded its own address into a string to be used as a nickname. Then, it randomly joined an IRC channel named between #bitcoin00 and #bitcoin99. Then it issued a WHO command. The thread read the lines as they appeared in the channel and decoded the IP addresses of other nodes in the channel. It did this in a loop, forever, until the node was shutdown.

When the client discovered an address from IRC, it set the timestamp on the address to the current time, but it used a "penalty" of 51 minutes, which means it looked like it was actually seen almost an hour earlier.

DNS Addresses

Upon startup, if peer node discovery is needed, the client then issues DNS requests to learn about the addresses of other peer nodes. The client includes a list of host names for DNS services that are seeded. As of December, 2017 the list (from chainparams.cpp[1]) includes:

  • seed.bitcoin.sipa.be
  • dnsseed.bluematt.me
  • dnsseed.bitcoin.dashjr.org
  • seed.bitcoinstats.com
  • seed.bitcoin.jonasschnelli.ch
  • seed.btc.petertodd.org

A DNS reply can contain multiple IP addresses for a requested name.

Addresses discovered via DNS are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

Hard Coded "Seed" Addresses

The client contains hard coded IP addresses that represent bitcoin nodes.

These addresses are only used as a last resort, if no other method has produced any addresses at all. When the loop in the connection handling thread ThreadOpenConnections2() sees an empty address map, it uses the "seed" IP addresses as backup.

There is code is move away from seed nodes when possible. The presumption is that this is to avoid overloading those nodes. Once the local node has enough addresses (presumably learned from the seed nodes), the connection thread will close seed node connections.

Seed Addresses are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

Ongoing "addr" advertisements

Nodes may receive addresses in an "addr" message after having sent a "getaddr" request, or "addr" messages may arrive unsolicited, because nodes advertise addresses gratuitously when they relay addresses (see below), when they advertise their own address periodically, and when a connection is made.

If the address is from a really old version, it is ignored; if from a not-so-old version, it is ignored if we have 1000 addresses already.

If the sender sent over 1000 addresses, they are all ignored.

Addresses received from an "addr" message have a timestamp, but the timestamp is not necessarily honored directly.


For every address in the message:

  1. If the timestamp is too low or too high, it is set to 5 days ago.
  2. We subtract 2 hours from the timestamp and add the address.

Note that when any address is added, for any reason, the code that calls AddAddress() does not check to see if it already exists. The AddAddresss() function in net.cpp will do that, and if the address already exists, further processing is done to update the address record. If the advertised services of the address have changed, that is updated and stored.

If the address has been seen in the last 24 hours and the timestamp is currently over 60 minutes old, then it is updated to 60 minutes ago.

If the address has NOT been seen in the last 24 hours, and the timestamp is currently over 24 hours old, then it is updated to 24 hours ago.

Address Relay

Once addresses are added from an "addr" message (see above), they then may be relayed to the other nodes. First, the following criteria must be set [9]:

  1. The address timestamp, after processing, is within 60 minutes of the current time
  2. The "addr" message contains 10 addresses or less
  3. And fGetAddr is not set on the node. fGetAddr starts false, is set to true when we request addresses from a node, and it is cleared when we receive less than 1000 addresses from a node.
  4. The address must be routable.

For every address that meets the above criteria, the node hash the address, the current day (in the form of an integer), and a random 256 bit value (generated at client startup). The node takes the two addresses with the lowest hash values and relays "addr" messages to them. This ensures that each node only relays "addr" messages to two other clients at any given time, that the two other clients are randomly selected, and that the random selection starts over at least once every 24 hours.

Self broadcast

Every 24 hours, the node advertises its own address to all connected nodes.

It also clears the list of the addresses we think the remote node has, which will trigger a refresh of sends to nodes. This code is in SendMessages() in main.cpp.

Old Address Cleanup

In SendMessages() in main.cpp, there is code to remove old addresses.

This is done every ten minutes, as long as there are 3 active connections.

The node erases messages that have not been used in 14 days as long as there are at least 1000 addresses in the map, and as long as the erasing process has not taken more than 20 seconds.

Addresses stored in the Database

Addresses are stored in the database when AddAddress() is called.

Addresses are read on startup when AppInit2() calls LoadAddresses(), which is located in db.cpp.

Currently, it appears all addresses are stored all at once whenever any address is stored or updated[2]. Indeed, AddAddress is seen to take over .01 seconds in various testing and is typically called tens of thousands of times in the initial 12 hours of running the client.

Command Line Provided Addresses

The user can specify nodes to connect to with the

-addnode <ip> 

command line argument. Multiple nodes may be specified.

Addresses provided on the command line are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

The user can also specify an address to connect to with the -connect <ip> command line argument. Multiple nodes may be specified.

The -connect argument differs from -addnode in that -connect addresses are not added to the address database and when -connect is specified, only those addresses are used.

Text File Provided Addresses

The client will automatically read a file named "addr.txt" in the bitcoin data directory and will add any addresses it finds in there as node addresses. These nodes are given no special preference over other addresses. They are just added to the pool.

Addresses loaded from the text file are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.


See Also

References