DNS resolution of bootnode hostnames not working #31208

Open
powerslider opened this issue Feb 18, 2025 · 11 comments

@powerslider

System information

Geth version: v1.15.2
OS & Version: macOS Sequoia 15.2

Expected behaviour

When I start a geth node with bootnode addresses configured in the format enode://<public_key>@<hostname>:<port>, where the hostname is a valid, DNS-resolvable name, I expect the node to join the network and operate normally. NOTE: We are not specifying IPs because we want to rely on the dynamic DNS resolution functionality introduced in v1.15.0.

Actual behaviour

After I run the geth node with the above mentioned configured bootnodes using docker:

docker run --rm --volume=./config.toml:/tmp/config.toml ethereum/client-go:v1.15.2 --config=/tmp/config.toml

The node fails to start with:

Fatal: Error starting protocol stack: bad bootstrap node "enode://3116e85de20404db0c64a75b72afffa90e914b2e7c5e7141c445e03fd6702c3da986e23ea554b87c1b6feb58e2423a8588ec17b9a635f3fa0ab2c0b341bb0cf5@foo.com:30303": missing IP address

Steps to reproduce the behaviour

Given the following [Node.P2P] section of the config.toml file:

[Node.P2P]
NoDiscovery = false
DiscoveryV4 = true
DiscoveryV5 = true
BootstrapNodes = ["enode://3116e85de20404db0c64a75b72afffa90e914b2e7c5e7141c445e03fd6702c3da986e23ea554b87c1b6feb58e2423a8588ec17b9a635f3fa0ab2c0b341bb0cf5@foo.com:30303"]

[Node.HTTPTimeouts]
ReadTimeout = 30000000000
ReadHeaderTimeout = 30000000000
WriteTimeout = 30000000000
IdleTimeout = 120000000000

and running the node via docker:

docker run --rm --volume=./config.toml:/tmp/config.toml ethereum/client-go:v1.15.2 --config=/tmp/config.toml

I get:

Fatal: Error starting protocol stack: bad bootstrap node "enode://3116e85de20404db0c64a75b72afffa90e914b2e7c5e7141c445e03fd6702c3da986e23ea554b87c1b6feb58e2423a8588ec17b9a635f3fa0ab2c0b341bb0cf5@foo.com:30303": missing IP address

Backtrace

INFO [02-18|22:42:17.209] Starting Geth on Ethereum mainnet...
INFO [02-18|22:42:17.210] Bumping default cache on mainnet         provided=1024 updated=4096
INFO [02-18|22:42:17.219] Maximum peer count                       ETH=50 total=50
INFO [02-18|22:42:17.221] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
WARN [02-18|22:42:17.229] Sanitizing cache to Go's GC limits       provided=4096 updated=2612
INFO [02-18|22:42:17.230] Set global gas cap                       cap=50,000,000
INFO [02-18|22:42:17.230] Initializing the KZG library             backend=gokzg
INFO [02-18|22:42:17.290] Enabling metrics collection
INFO [02-18|22:42:17.291] Enabling stand-alone metrics HTTP endpoint address=127.0.0.1:6060
INFO [02-18|22:42:17.291] Starting metrics server                  addr=http://127.0.0.1:6060/debug/metrics
INFO [02-18|22:42:17.291] Allocated trie memory caches             clean=391.00MiB dirty=653.00MiB
INFO [02-18|22:42:17.292] Defaulting to pebble as the backing database
INFO [02-18|22:42:17.292] Allocated cache and file handles         database=/geth/geth/chaindata cache=1.28GiB handles=524,288
INFO [02-18|22:42:17.337] Opened ancient database                  database=/geth/geth/chaindata/ancient/chain readonly=false
INFO [02-18|22:42:17.338] State schema set to default              scheme=path
INFO [02-18|22:42:17.338] Initialising Ethereum protocol           network=1650 dbversion=<nil>
ERROR[02-18|22:42:17.339] Head block is not reachable
WARN [02-18|22:42:17.340] Sanitizing invalid node buffer size      provided=653.00MiB updated=256.00MiB
INFO [02-18|22:42:17.351] Opened ancient database                  database=/geth/geth/chaindata/ancient/state readonly=false
INFO [02-18|22:42:17.351] Initialized path database                cache=391.00MiB buffer=256.00MiB history=90000
INFO [02-18|22:42:17.351] Writing default main-net genesis block
INFO [02-18|22:42:17.559]
INFO [02-18|22:42:17.559] ---------------------------------------------------------------------------------------------------------------------------------------------------------
INFO [02-18|22:42:17.560] Chain ID:  1 (mainnet)
INFO [02-18|22:42:17.560] Consensus: Beacon (proof-of-stake), merged from Ethash (proof-of-work)
INFO [02-18|22:42:17.560]
INFO [02-18|22:42:17.560] Pre-Merge hard forks (block based):
INFO [02-18|22:42:17.560]  - Homestead:                   #1150000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md)
INFO [02-18|22:42:17.560]  - DAO Fork:                    #1920000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/dao-fork.md)
INFO [02-18|22:42:17.560]  - Tangerine Whistle (EIP 150): #2463000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md)
INFO [02-18|22:42:17.560]  - Spurious Dragon/1 (EIP 155): #2675000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)
INFO [02-18|22:42:17.560]  - Spurious Dragon/2 (EIP 158): #2675000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)
INFO [02-18|22:42:17.560]  - Byzantium:                   #4370000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md)
INFO [02-18|22:42:17.560]  - Constantinople:              #7280000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md)
INFO [02-18|22:42:17.560]  - Petersburg:                  #7280000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md)
INFO [02-18|22:42:17.560]  - Istanbul:                    #9069000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md)
INFO [02-18|22:42:17.560]  - Muir Glacier:                #9200000  (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/muir-glacier.md)
INFO [02-18|22:42:17.560]  - Berlin:                      #12244000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md)
INFO [02-18|22:42:17.560]  - London:                      #12965000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md)
INFO [02-18|22:42:17.560]  - Arrow Glacier:               #13773000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/arrow-glacier.md)
INFO [02-18|22:42:17.560]  - Gray Glacier:                #15050000 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/gray-glacier.md)
INFO [02-18|22:42:17.560]
INFO [02-18|22:42:17.560] Merge configured:
INFO [02-18|22:42:17.560]  - Hard-fork specification:    https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md
INFO [02-18|22:42:17.560]  - Network known to be merged
INFO [02-18|22:42:17.560]  - Total terminal difficulty:  58750000000000000000000
INFO [02-18|22:42:17.560]
INFO [02-18|22:42:17.560] Post-Merge hard forks (timestamp based):
INFO [02-18|22:42:17.560]  - Shanghai:                    @1681338455 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/shanghai.md)
INFO [02-18|22:42:17.560]  - Cancun:                      @1710338135 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/cancun.md)
INFO [02-18|22:42:17.560]
INFO [02-18|22:42:17.560] ---------------------------------------------------------------------------------------------------------------------------------------------------------
INFO [02-18|22:42:17.560]
INFO [02-18|22:42:17.561] Loaded most recent local block           number=0 hash=d4e567..cb8fa3 age=55y11mo1w
WARN [02-18|22:42:17.562] Failed to load snapshot                  err="missing or corrupted snapshot"
INFO [02-18|22:42:17.562] Rebuilding state snapshot
INFO [02-18|22:42:17.563] Initialized transaction indexer          range="last 2350000 blocks"
INFO [02-18|22:42:17.565] Resuming state snapshot generation       root=d7f897..0f0544 accounts=0 slots=0 storage=0.00B dangling=0 elapsed=2.269ms
INFO [02-18|22:42:17.597] Generated state snapshot                 accounts=8893 slots=0 storage=409.64KiB dangling=0 elapsed=35.117ms
INFO [02-18|22:42:17.619] Gasprice oracle is ignoring threshold set threshold=2
WARN [02-18|22:42:17.620] Engine API enabled                       protocol=eth
INFO [02-18|22:42:17.620] Starting peer-to-peer node               instance=Geth/TestJoinNetwork/v1.15.2-stable-c8c62daf/linux-amd64/go1.23.6
Fatal: Error starting protocol stack: bad bootstrap node "enode://3116e85de20404db0c64a75b72afffa90e914b2e7c5e7141c445e03fd6702c3da986e23ea554b87c1b6feb58e2423a8588ec17b9a635f3fa0ab2c0b341bb0cf5@foo.com:30303": missing IP address
@willianpaixao
Contributor

cc @0xVasconcelos

@willianpaixao
Contributor

@powerslider could you post a test of another docker container resolving that name? I just want to rule out the basics.

@atenjin

atenjin commented Feb 19, 2025

I hit the same issue, and I am confident it is caused by PR #29801.

@powerslider @fjl This must be a bug introduced by that PR. The code only parses the IP from the encoded string and does nothing else, and the check logic then exits if the node does not have an IP.

@fjl
Contributor

fjl commented Feb 19, 2025

In #30822, we changed the DNS logic so that resolution has to happen explicitly. It is only implemented for static nodes (admin_addPeer or config file) right now.

@atenjin

atenjin commented Feb 19, 2025

> In #30822, we changed the DNS logic so that resolution has to happen explicitly. It is only implemented for static nodes (admin_addPeer or config file) right now.

So do you mean that, for now, we cannot add bootnodes with DNS names on the command line? Do you have any plans to support this in the future?

@corverroos

The problem is that this is a regression. It worked in v1.14.x, where DNS resolution happened when the config was parsed. We have been running DNS-based bootnodes at Omni Network for a long time. We cannot upgrade to v1.15 due to this regression.

The root cause of the issue is in p2p/enode::Node.ValidateComplete, which returns an error when only a hostname is set (and not an IP).

Note that ValidateComplete is deprecated, with the comment: "Deprecated: don't use this method."

The solution is either to fix ValidateComplete so that it does not error when a hostname is set without an IP, or to call some other method in p2p/discover::Table.setFallbackNodes.
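
To make the difference between the two checks concrete, here is a minimal sketch. It does not use go-ethereum's types: `bootnode`, `newBootnode`, `validateStrict`, and `validateRelaxed` are hypothetical names invented for illustration, and the relaxed variant is only one way the first option could look.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// bootnode is a hypothetical stand-in for an enode record. It is NOT
// go-ethereum's enode.Node and only carries the fields this sketch needs.
type bootnode struct {
	IP       netip.Addr // zero value: no IP known yet
	Hostname string     // set when the URL named a host instead of an IP
	Port     uint16
}

// newBootnode stores host as an IP when it parses as one, otherwise as a hostname.
func newBootnode(host string, port uint16) bootnode {
	if ip, err := netip.ParseAddr(host); err == nil {
		return bootnode{IP: ip, Port: port}
	}
	return bootnode{Hostname: host, Port: port}
}

// validateStrict models the behaviour reported in this issue: a node
// without an IP is rejected, even when a hostname is present.
func validateStrict(n bootnode) error {
	if !n.IP.IsValid() {
		return errors.New("missing IP address")
	}
	if n.Port == 0 {
		return errors.New("missing port")
	}
	return nil
}

// validateRelaxed models the first proposed fix (an assumption about its
// shape): a hostname alone is acceptable and resolution happens later.
func validateRelaxed(n bootnode) error {
	if !n.IP.IsValid() && n.Hostname == "" {
		return errors.New("missing IP address or hostname")
	}
	if n.Port == 0 {
		return errors.New("missing port")
	}
	return nil
}

func main() {
	n := newBootnode("foo.com", 30303)
	fmt.Println("strict: ", validateStrict(n))
	fmt.Println("relaxed:", validateRelaxed(n))
}
```

With a relaxed check the node would start, but something downstream would still have to resolve the hostname before the bootnode can be contacted.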

@fjl
Contributor

fjl commented Feb 19, 2025

I looked into it and it's not so easy to fix. As @corverroos mentions, we used to resolve the hostname at parsing time but this is not done anymore. The resolve was moved to dial time for TCP connections because it will keep retrying the resolve that way.

For bootstrap nodes, I also think it'd be good to retry the resolve occasionally.
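
As a rough illustration of what occasional re-resolution could look like, here is a standalone sketch. It is not geth code: `hostTracker`, `refresh`, `run`, and `scriptedLookup` are hypothetical names, and keeping the last known address when a lookup fails is an assumption about the desired behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// lookupFunc has the same signature as net.LookupIP, so real DNS can be
// swapped for a fake in demos and tests.
type lookupFunc func(host string) ([]net.IP, error)

// hostTracker (hypothetical) remembers the last address a hostname resolved to.
type hostTracker struct {
	host   string
	lookup lookupFunc
	last   string // last successfully resolved address, "" before the first success
}

// refresh re-resolves the hostname. On failure it keeps the previous
// address and returns the error, so a temporary DNS outage does not
// discard a bootnode that was reachable before.
func (t *hostTracker) refresh() (addr string, changed bool, err error) {
	ips, err := t.lookup(t.host)
	if err == nil && len(ips) == 0 {
		err = errors.New("no addresses returned")
	}
	if err != nil {
		return t.last, false, err
	}
	addr = ips[0].String()
	changed = addr != t.last
	t.last = addr
	return addr, changed, nil
}

// run refreshes immediately and then once per interval until ctx is
// cancelled, calling onChange whenever the resolved address changes.
func (t *hostTracker) run(ctx context.Context, every time.Duration, onChange func(addr string)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		if addr, changed, err := t.refresh(); err == nil && changed {
			onChange(addr)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// scriptedLookup returns canned answers in order; an empty string
// simulates a failed lookup. It exists only for the offline demo.
func scriptedLookup(answers ...string) lookupFunc {
	i := 0
	return func(string) ([]net.IP, error) {
		if i >= len(answers) {
			return nil, errors.New("script exhausted")
		}
		ip := net.ParseIP(answers[i])
		i++
		if ip == nil {
			return nil, errors.New("lookup failed")
		}
		return []net.IP{ip}, nil
	}
}

func main() {
	t := &hostTracker{host: "foo.com", lookup: scriptedLookup("203.0.113.7", "", "203.0.113.8")}
	for i := 0; i < 3; i++ {
		addr, changed, err := t.refresh()
		fmt.Println(addr, changed, err)
	}
}
```

For real DNS, `net.LookupIP` can be passed as the `lookup` field, and `run` can be started in a goroutine with a cancellable context.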

@corverroos

@fjl What about a simple workaround to manually resolve bootstrap enode DNS to IP on startup? That is, calling logic similar to dnsResolveHostname in p2p/discover::Table.setFallbackNodes before calling ValidateComplete. This at least retains the previous behaviour.

Re-resolving bootstrap DNS definitely sounds better, but that isn't a blocker for us at the moment.
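
A resolve-once-at-startup step could look roughly like the sketch below. It uses only the Go standard library and runs outside geth, rewriting an enode URL before it is handed to the node. `resolveEnodeURL`, `lookupFunc`, and `staticLookup` are hypothetical names, and preferring IPv4 when several addresses come back is an assumption, not something decided in this thread.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// lookupFunc has the same signature as net.LookupIP, so real DNS can be
// swapped for a fake in demos and tests.
type lookupFunc func(host string) ([]net.IP, error)

// staticLookup returns a lookupFunc backed by a fixed table (offline use only).
func staticLookup(table map[string]string) lookupFunc {
	return func(host string) ([]net.IP, error) {
		ip := net.ParseIP(table[host])
		if ip == nil {
			return nil, fmt.Errorf("no address for %q", host)
		}
		return []net.IP{ip}, nil
	}
}

// resolveEnodeURL replaces the hostname in an enode URL with an IP address.
// URLs that already contain an IP are returned unchanged.
func resolveEnodeURL(raw string, lookup lookupFunc) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "enode" {
		return "", fmt.Errorf("not an enode URL: %q", raw)
	}
	host := u.Hostname()
	if host == "" {
		return "", fmt.Errorf("enode URL has no host: %q", raw)
	}
	if net.ParseIP(host) != nil {
		return raw, nil // already an IP, nothing to do
	}
	ips, err := lookup(host)
	if err != nil {
		return "", fmt.Errorf("resolving %q: %w", host, err)
	}
	if len(ips) == 0 {
		return "", fmt.Errorf("resolving %q: no addresses", host)
	}
	ip := ips[0]
	for _, cand := range ips { // assumption: prefer IPv4 when available
		if cand.To4() != nil {
			ip = cand
			break
		}
	}
	switch port := u.Port(); {
	case port != "":
		u.Host = net.JoinHostPort(ip.String(), port)
	case ip.To4() == nil:
		u.Host = "[" + ip.String() + "]" // bare IPv6 needs brackets in a URL
	default:
		u.Host = ip.String()
	}
	return u.String(), nil
}

func main() {
	// Offline demo with a fixed table; pass net.LookupIP for real DNS.
	lookup := staticLookup(map[string]string{"foo.com": "203.0.113.7"})
	out, err := resolveEnodeURL("enode://3116e85de20404db0c64a75b72afffa90e914b2e7c5e7141c445e03fd6702c3da986e23ea554b87c1b6feb58e2423a8588ec17b9a635f3fa0ab2c0b341bb0cf5@foo.com:30303", lookup)
	fmt.Println(out, err)
}
```

The rewritten URL can then be written into BootstrapNodes in config.toml. Because the lookup happens only once, the address goes stale if the DNS record changes while the node is running.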

@sebastianst
Contributor

When we updated our op-geth dependency on go-ethereum and lost the DNS resolution, we added it back for our consensus client op-node with ethereum-optimism/optimism@f571e33 - however this only works for old enode:// URLs.

@Melvillian

In order to get around this at Unichain we've had to add some logic in our startup scripts which manually resolves domain names using dig, which is pretty hacky.

@fjl you mention the change was introduced to help in containerized environments, which makes sense. Our nodes run in a containerized environment, but we now need to add this extra logic to implement DNS resolution for all of our static peers.

I don't know if it's related or not, but we found that admin_addPeer also required the manual DNS resolution; otherwise our nodes would not add any peers that were specified using DNS names.

@fjl
Contributor

fjl commented Feb 28, 2025

For admin_addPeer (and static peers in configuration), DNS names are resolved whenever the TCP connection is attempted.

sebastianst added a commit to ethereum-optimism/op-geth that referenced this issue Mar 3, 2025
- removed in ethereum/go-ethereum#30822 in favor of on-demand runtime dialling
- reported to have removed bootnodes DNS resolution at ethereum/go-ethereum#31208
- possibly broke DNS resolution for other methods of adding peers
corverroos added a commit to omni-network/omni that referenced this issue Mar 6, 2025
Bump geth to v1.15.5.

Note that geth `config.toml` now supports the previously flags-only
`--nat` and `--metrics`:
```
[Node.P2P]
NAT = any|extip:1.2.3.4

[Metrics]
Enabled = true
```

This geth version is required to query pectra chains (holesky/sepolia).

Note the
[regression](ethereum/go-ethereum#31208) and
workaround of using IPs for geth bootnodes.

issue: none

---------

Co-authored-by: Christian Müller <christian@omni.network>