Satoshi Client Node Discovery: Difference between revisions

From Bitcoin Wiki
Sgornick (talk | contribs)
→‎DNS Addresses: Add reference to source.
Sgornick (talk | contribs)
Formatting.

Revision as of 21:39, 17 May 2012

Overview

The Satoshi client discovers the IP address and port of nodes in several different ways.

  1. Nodes discover their own external address by various methods.
  2. Nodes receive the callback address of remote nodes that connect to them.
  3. Nodes connect to IRC to receive addresses.
  4. Nodes make DNS requests to receive IP addresses.
  5. Nodes can use addresses hard coded into the software.
  6. Nodes exchange addresses with other nodes.
  7. Nodes store addresses in a database and read that database on startup.
  8. Nodes can be provided addresses as command line arguments.
  9. Nodes read addresses from a user provided text file on startup.

A timestamp is kept for each address to keep track of when the node address was last seen. The AddressCurrentlyConnected() function in net.cpp updates the timestamp whenever a message is received from a node. The timestamp on an address is only updated and saved to the database when the existing timestamp is over 20 minutes old.

See the Node Connectivity article for information on which type of addresses take precedence when actually connecting to nodes.

In the first section we will cover how a node handles a request for addresses via the "getaddr" message. Understanding the role of timestamps makes it clearer why timestamps are kept the way they are for each of the different ways an address is discovered.


Handling Message "getaddr"

When a node receives a "getaddr" request, it first figures out how many addresses it has that have a timestamp in the last 3 hours. Then it sends those addresses, but if there are more than 2500 addresses seen in the last 3 hours, it selects around 2500 out of the available recent addresses by using random selection.
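
The selection described above can be sketched as follows. This is a minimal illustration and not the client's actual code: the Addr struct, the SelectForGetAddr name, and the seed parameter are assumptions made for the example, and the sketch caps the reply at exactly 2500 entries where the article only says "around 2500".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, simplified address record; the real client stores more fields.
struct Addr {
    std::uint32_t ip;     // IPv4 address as an integer (illustrative only)
    std::int64_t  nTime;  // last-seen timestamp, seconds since the epoch
};

// Return the addresses whose timestamp falls within the last 3 hours.
// If more than maxCount qualify, keep a uniformly random subset of maxCount.
std::vector<Addr> SelectForGetAddr(const std::vector<Addr>& known,
                                   std::int64_t now,
                                   std::size_t maxCount = 2500,
                                   unsigned seed = std::random_device{}())
{
    const std::int64_t nSince = now - 3 * 60 * 60;
    std::vector<Addr> recent;
    for (const Addr& a : known)
        if (a.nTime > nSince)
            recent.push_back(a);

    if (recent.size() > maxCount) {
        std::mt19937 rng(seed);
        std::shuffle(recent.begin(), recent.end(), rng);
        recent.resize(maxCount);
    }
    return recent;
}
```

Under this sketch an address with a zero timestamp never passes the 3 hour filter, which matches the article's remark that such addresses are not advertised in response to "getaddr".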


Now let's look at the ways a node finds out about node addresses.


Discovery Methods

Local Client's External Address

The client uses two methods to determine its own external, routable IP address: it uses IRC, preferably, and if that does not succeed, it uses public web services which return the information.

From a thread created for this work (called ThreadIRCSeed in irc.cpp), the client makes an IRC connection to 92.243.23.21 on port 6667. If the direct IP connection fails, it connects to the host name irc.lfnet.org instead. See the addrConnect assignment in ThreadIRCSeed2() in irc.cpp.

If the connection succeeds, the client issues a USERHOST command to the IRC server in order to get its own IP address. See GetIPFromIRC() in irc.cpp.

The client also runs a thread called ThreadGetMyExternalIP (in net.cpp) which attempts to determine the client's IP address as seen from the outside world. It gives the IRC thread a chance to discover the IP address first, sleeping and checking periodically for 2 minutes, and then it proceeds if the IRC method did not succeed.

First, it attempts to connect to 91.198.22.70 port 80, which should be the checkip.dyndns.org server. If the connection fails, a DNS request is made for checkip.dyndns.org and a connection is attempted to that address. Next, it attempts to connect to 74.208.43.192 port 80, which should be the www.showmyip.com server. If the connection fails, a DNS request is made for www.showmyip.com and a connection is attempted to that address.

For each address attempted above, the client attempts to connect, send an HTTP request, read the appropriate response line, and parse the IP address from it. If this succeeds, the IP is returned, it is advertised to any connected nodes, and then the thread finishes (without proceeding to the next address).


Connect Callback Address

When a node receives an initial "version" message, and that node initiated the connection, then the node advertises its address to the remote so that it can connect back to the local node if it wants to. See the "Advertise our address" code in ProcessMessage() in main.cpp, where strCommand == "version".

After sending its own address, it sends a "getaddr" request message to the remote node to learn about more addresses, if the remote node version is recent or if the local node does not yet have 1000 addresses.
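
The decision to send "getaddr" can be written as a small predicate. This is a sketch under stated assumptions: the function name is made up for illustration, and nRecentVersion is a hypothetical cutoff, since the article says only that the remote version must be "recent" without giving a number.

```cpp
#include <cstddef>

// Decide whether to send "getaddr" after advertising our own address.
// nRecentVersion is a hypothetical threshold supplied by the caller.
bool ShouldRequestAddresses(int nRemoteVersion,
                            int nRecentVersion,
                            std::size_t nKnownAddresses)
{
    const std::size_t kEnoughAddresses = 1000;
    return nRemoteVersion >= nRecentVersion
        || nKnownAddresses < kEnoughAddresses;
}
```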


IRC Addresses

As of version 0.6.x the Bitcoin client no longer uses IRC bootstrapping by default. This documentation below is accurate for most prior versions.

In addition to learning and sharing its own address, the node learns about other node addresses via an IRC channel. See irc.cpp.

After learning its own address, a node encodes its own address into a string to be used as a nickname. Then, it randomly joins an IRC channel named between #bitcoin00 and #bitcoin99. Then it issues a WHO command. The thread reads the lines as they appear in the channel and decodes the IP addresses of other nodes in the channel. It does this in a loop, forever, until the node is shut down.

When the client discovers an address from IRC, it sets the timestamp on the address to the current time, but it uses a "penalty" of 51 minutes, which means it looks like it was actually seen almost an hour ago.
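
As a rough illustration of the channel choice and the timestamp penalty, the following sketch may help. The function names are invented for the example, and std::mt19937 stands in for whatever random source the client actually uses.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Build a channel name in the range #bitcoin00 .. #bitcoin99.
// n is expected to be non-negative; it is reduced modulo 100.
std::string ChannelName(int n)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "#bitcoin%02d", n % 100);
    return buf;
}

// Pick one of the 100 channels uniformly at random.
std::string RandomChannel(std::mt19937& rng)
{
    std::uniform_int_distribution<int> dist(0, 99);
    return ChannelName(dist(rng));
}

// Timestamp recorded for an address learned from IRC:
// the current time minus the 51 minute penalty.
std::int64_t IrcTimestamp(std::int64_t now)
{
    return now - 51 * 60;
}
```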

DNS Addresses

Upon startup, if peer node discovery is needed, the client then issues DNS requests to learn about the addresses of other peer nodes. The client includes a list of host names for DNS services that are seeded. As of May 17, 2012 the list (from net.cpp[1]) includes:

  • bitseed.xf2.org
  • dnsseed.bluematt.me
  • seed.bitcoin.sipa.be
  • dnsseed.bitcoin.dashjr.org

A DNS reply can contain multiple IP addresses for a requested name.

Addresses discovered via DNS are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

Hard Coded "Seed" Addresses

The client contains hard coded IP addresses that represent bitcoin nodes (see pnSeed in net.cpp).

These addresses are only used as a last resort, if no other method has produced any addresses at all. When the loop in the connection handling thread ThreadOpenConnections2() sees an empty address map, it uses the "seed" IP addresses as backup.

There is code to move away from seed nodes when possible. The presumption is that this is to avoid overloading those nodes. Once the local node has enough addresses (presumably learned from the seed nodes), the connection thread will close seed node connections.
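
The two seed decisions can be sketched as predicates modeled on the conditions in ThreadOpenConnections2() in net.cpp. The function names and the flattened parameter lists are assumptions made for illustration; the client reads these values from its own globals (mapAddresses, pnSeed, fTOR, fTestNet).

```cpp
#include <cstddef>
#include <cstdint>

// Fall back to the hard coded seeds only when no addresses are known,
// and either 60 seconds have passed since startup or TOR is in use.
// Seeds are not used on the test network.
bool ShouldUseSeeds(std::size_t nKnownAddresses,
                    std::int64_t now,
                    std::int64_t nStart,
                    bool fTOR,
                    bool fTestNet)
{
    return nKnownAddresses == 0
        && (now - nStart > 60 || fTOR)
        && !fTestNet;
}

// Disconnect seed nodes once the address map has grown to more than
// the number of seeds plus 100.
bool ShouldDropSeeds(bool fSeedUsed,
                     std::size_t nKnownAddresses,
                     std::size_t nSeedCount)
{
    return fSeedUsed && nKnownAddresses > nSeedCount + 100;
}
```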

Seed Addresses are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

Ongoing "addr" advertisements

Nodes may receive addresses in an "addr" message after having sent a "getaddr" request, or "addr" messages may arrive unsolicited, because nodes advertise addresses gratuitously when they relay addresses (see below), when they advertise their own address periodically, and when a connection is made.

If the address is from a really old version, it is ignored; if from a not-so-old version, it is ignored if we have 1000 addresses already.

If the sender sent over 1000 addresses, they are all ignored.

Addresses received from an "addr" message have a timestamp, but the timestamp is not necessarily honored directly.


For every address in the message:

  1. If the timestamp is too low or too high, it is set to 5 days ago.
  2. We subtract 2 hours from the timestamp and add the address.
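
The two steps above can be sketched as a single function. The article does not say what counts as "too low" or "too high", so nMinValid and nMaxFuture are hypothetical parameters supplied by the caller, and the function name is made up for the example.

```cpp
#include <cstdint>

// Normalize the timestamp of an address received in an "addr" message.
// nMinValid:  timestamps at or below this are treated as too low (assumption).
// nMaxFuture: seconds a timestamp may lie ahead of now (assumption).
std::int64_t NormalizeAddrTime(std::int64_t nTime,
                               std::int64_t now,
                               std::int64_t nMinValid,
                               std::int64_t nMaxFuture)
{
    // Step 1: implausible timestamps are replaced with "5 days ago".
    if (nTime <= nMinValid || nTime > now + nMaxFuture)
        nTime = now - 5 * 24 * 60 * 60;

    // Step 2: a 2 hour penalty is applied before the address is added.
    return nTime - 2 * 60 * 60;
}
```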

Note that when any address is added, for any reason, the code that calls AddAddress() does not check to see if it already exists. The AddAddress() function in net.cpp will do that, and if the address already exists, further processing is done to update the address record. If the advertised services of the address have changed, that is updated and stored.

If the address has been seen in the last 24 hours and the timestamp is currently over 60 minutes old, then it is updated to 60 minutes ago.

If the address has NOT been seen in the last 24 hours, and the timestamp is currently over 24 hours old, then it is updated to 24 hours ago.

Address Relay

Once addresses are added from an "addr" message (see above), they then may be relayed to the other nodes. First, the following criteria must be met (see the corresponding check in ProcessMessage() in main.cpp, where strCommand == "addr"):

  1. The address timestamp, after processing, is within 60 minutes of the current time
  2. The "addr" message contains 10 addresses or fewer
  3. And fGetAddr is not set on the node. fGetAddr starts false, is set to true when we request addresses from a node, and it is cleared when we receive fewer than 1000 addresses from a node.
  4. The address must be routable.

When they meet the above criteria, the node hashes all the eligible node IP addresses, as well as the current day in the form of an integer, and the two nodes with the lowest hash value are chosen to have the address relayed to them.
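
The relay criteria and the peer choice can be sketched as follows. This is an illustration only: the function names and the use of strings for peer IPs are assumptions, and std::hash is swapped in for the client's own hash function, so the peers chosen here will not match what a real node would choose.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// The four relay criteria from the list above.
bool IsRelayCandidate(std::int64_t nAddrTime,
                      std::int64_t now,
                      bool fGetAddr,
                      std::size_t nAddrsInMessage,
                      bool fRoutable)
{
    const std::int64_t nSince = now - 60 * 60;  // within the last 60 minutes
    return nAddrTime > nSince
        && !fGetAddr
        && nAddrsInMessage <= 10
        && fRoutable;
}

// Pick up to two peers with the lowest hash of (peer IP, current day).
// std::hash is a stand-in for the client's real hash function.
std::vector<std::string> PickRelayPeers(const std::vector<std::string>& peerIps,
                                        std::int64_t now)
{
    const std::int64_t day = now / (24 * 60 * 60);
    std::vector<std::pair<std::size_t, std::string>> scored;
    for (const std::string& ip : peerIps) {
        const std::string key = ip + "/" + std::to_string(day);
        scored.emplace_back(std::hash<std::string>{}(key), ip);
    }
    std::sort(scored.begin(), scored.end());

    std::vector<std::string> chosen;
    for (std::size_t i = 0; i < scored.size() && i < 2; ++i)
        chosen.push_back(scored[i].second);
    return chosen;
}
```

Because the day number is part of the hashed key, the same two peers are chosen for a whole day and the choice changes when the day rolls over.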

Self broadcast

Every 24 hours, the node advertises its own address to all connected nodes.

It also clears the list of the addresses we think the remote node has, which will trigger a refresh of sends to nodes. This code is in SendMessages() in main.cpp.

Old Address Cleanup

In SendMessages() in main.cpp, there is code to remove old addresses.

This is done every ten minutes, as long as there are 3 active connections.

The node erases addresses that have not been used in 14 days as long as there are at least 1000 addresses in the map, and as long as the erasing process has not taken more than 20 seconds.
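
A literal reading of the cleanup rule can be sketched like this. The container type, the function name, and the use of std::chrono for the 20 second budget are assumptions made for the example; the client works on its own address map in SendMessages().

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Erase entries not used in the last 14 days. Stop as soon as the map
// holds fewer than 1000 entries or the pass has run for over 20 seconds.
// Keys are address strings and values are last-used timestamps (assumption).
std::size_t EraseOldAddresses(std::map<std::string, std::int64_t>& mapLastUsed,
                              std::int64_t now)
{
    const std::int64_t nCutoff = now - 14 * 24 * 60 * 60;
    const auto start = std::chrono::steady_clock::now();
    std::size_t nErased = 0;

    for (auto it = mapLastUsed.begin(); it != mapLastUsed.end(); ) {
        if (mapLastUsed.size() < 1000)
            break;  // too few addresses left to keep pruning
        if (std::chrono::steady_clock::now() - start > std::chrono::seconds(20))
            break;  // time budget exhausted
        if (it->second < nCutoff) {
            it = mapLastUsed.erase(it);
            ++nErased;
        } else {
            ++it;
        }
    }
    return nErased;
}
```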


Addresses stored in the Database

Addresses are stored in the database when AddAddress() is called.

Addresses are read on startup when AppInit2() calls LoadAddresses(), which is located in db.cpp.

Currently, it appears that all addresses are written at once whenever any address is stored or updated (see the bitcointalk.org thread at https://bitcointalk.org/index.php?topic=26436.0). Indeed, AddAddress() has been observed to take over 0.01 seconds in testing and is typically called tens of thousands of times in the initial 12 hours of running the client.


Command Line Provided Addresses

The user can specify nodes to connect to with the -addnode <ip> command line argument. Multiple nodes may be specified.

Addresses provided on the command line are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.

The user can also specify an address to connect to with the -connect <ip> command line argument. Multiple nodes may be specified.

The -connect argument differs from -addnode in that -connect addresses are not added to the address database and when -connect is specified, only those addresses are used.

Text File Provided Addresses

The client will automatically read a file named "addr.txt" in the bitcoin data directory and will add any addresses it finds in there as node addresses. These nodes are given no special preference over other addresses. They are just added to the pool.

Addresses loaded from the text file are initially given a zero timestamp, therefore they are not advertised in response to a "getaddr" request.


See Also

References