[bug]: LND falls out of sync when Bitcoin Core's IP address changes #9353

kallerosenbaum · 2024-12-12T13:20:41Z

Background

We run two LND nodes in kubernetes, and after restarting the backing Bitcoin Core node, we notice that LND falls out of sync with the blockchain.

This happens because, in our kubernetes environment, the IP address of Bitcoin Core changes when it is restarted. synced_to_chain will become false and no new blocks will be received.

Your environment

version of lnd: v0.18.2-beta
which operating system (uname -a on *Nix):
Linux lnd-routing-0 6.8.0-1018-aws #19~22.04.1-Ubuntu SMP Wed Oct 9 17:10:38 UTC 2024 aarch64 Linux
and Linux 9db991b293cb 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 Linux
version of btcd, bitcoind, or other backend: Bitcoin Core 27.0
any other relevant environment details: We run our stack in kubernetes

Steps to reproduce

I'll show how I reproduce it in regtest, but we get the same issue in production (running in kubernetes) too.

We run LND with the following config in docker-compose:

        --listen=0.0.0.0:9735
        --externalip=lnd-0
        --rpclisten=0.0.0.0:10009
        --bitcoin.active
        --bitcoin.node=bitcoind
        --bitcoin.regtest
        --bitcoind.rpcuser=test
        --bitcoind.rpcpass=password
        --bitcoind.rpchost=bitcoin:18443
        --bitcoind.zmqpubrawblock=tcp://bitcoin:18501
        --bitcoind.zmqpubrawtx=tcp://bitcoin:18502
        --norest
        --protocol.wumbo-channels

When running this, bitcoin resolves to 172.18.0.2.

1. Build some blocks and make sure LND is in sync by running lncli -network=regtest getinfo and checking that synced_to_chain is true.
2. Stop Bitcoin Core and restart it, but this time make sure it gets a new IP address, so that from now on bitcoin resolves to e.g. 172.18.0.6.
3. Build a block.
4. Run lncli -network=regtest getinfo. synced_to_chain will be false, but block_height and block_hash will show the most recent block.

After this, LND will not receive any new blocks, but it has apparently reconnected (presumably through RPC) to get the latest block hash. My guess is that ZMQ stops working due to the IP address change.
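
The symptom can also be checked mechanically. This is only a minimal sketch, assuming lncli getinfo prints JSON that contains the synced_to_chain, block_height and block_hash fields mentioned in the steps; the function name chain_status and the sample values are hypothetical:

```python
import json


def chain_status(getinfo_json: str) -> dict:
    """Extract the chain-sync fields from the JSON printed by `lncli getinfo`."""
    info = json.loads(getinfo_json)
    return {
        # Treat a missing field as "not synced" rather than guessing.
        "synced": bool(info.get("synced_to_chain", False)),
        "height": info.get("block_height"),
        "hash": info.get("block_hash"),
    }


# Hypothetical sample mirroring the symptom: a height and hash are reported,
# but synced_to_chain is false.
sample = '{"synced_to_chain": false, "block_height": 102, "block_hash": "00ab"}'
status = chain_status(sample)
print(status["synced"], status["height"])
```

In a real setup the JSON string would come from the lncli call itself, for example by capturing its standard output with subprocess.run.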

Expected behaviour

After reconnecting to the node it should eventually show "synced_to_chain": true. Alternatively (if it's a ZMQ connection issue) I'd expect LND to scream pretty loudly in the log.

Actual behaviour

"synced_to_chain": false indefinitely and we see no new logs of type

[INF] NTFN: New block: height=873198, sha=000000000000000000007b48042479e4f07ce2d6ae9a79c2a3ef5223dc78dd5c


Roasbeef · 2024-12-12T13:51:11Z

Are you running with the health check system on? It's meant to catch failures like this, then cause a restart of lnd. It seems like you expect that lnd will resolve the bitcoind host again automatically, but atm we do the resolution once, then use the IP from there on.

Here're the health check params I'm referring to:

; The number of times we should attempt to query our chain backend before
; gracefully shutting down. Set this value to 0 to disable this health check.
; healthcheck.chainbackend.attempts=3

; The amount of time we allow a call to our chain backend to take before we fail
; the attempt. This value must be >= 1s.
; healthcheck.chainbackend.timeout=30s

; The amount of time we should backoff between failed attempts to query chain
; backend. This value must be >= 1s.
; healthcheck.chainbackend.backoff=2m

; The amount of time we should wait between chain backend health checks. This
; value must be >= 1m.
; healthcheck.chainbackend.interval=1m
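
To get a feel for how long a failing backend can go unnoticed with these parameters, here is a rough back-of-the-envelope sketch. It assumes that every attempt may run for the full timeout and that a backoff follows every failed attempt except the last; that reading of the parameter descriptions is an assumption, and the function name is hypothetical:

```python
import math


def worst_case_seconds(attempts: int, timeout_s: int, backoff_s: int) -> float:
    """Upper bound on the time from the first failing call to the shutdown request.

    Assumption: each attempt can take up to `timeout_s`, and a backoff of
    `backoff_s` follows every failed attempt except the last one.
    """
    if attempts <= 0:
        # attempts=0 disables the health check, so no shutdown is ever requested.
        return math.inf
    return attempts * timeout_s + (attempts - 1) * backoff_s


# Values from the sample config above: attempts=3, timeout=30s, backoff=2m.
print(worst_case_seconds(3, 30, 120))  # 330
```

The time until the first check runs (up to one interval) comes on top of this bound.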

kallerosenbaum · 2024-12-12T15:34:49Z

@Roasbeef yes, it's on, and in production we've set

--healthcheck.chainbackend.attempts=30

And we see the following from healthcheck after restart:


2024-12-04 09:55:59.568 [INF] HLCK: Health check: chain backend, call: 1 failed with: invalid http POST response (nil), method: uptime, id: 1215, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 09:58:22.107 [INF] HLCK: Health check: chain backend, call: 2 failed with: invalid http POST response (nil), method: uptime, id: 1216, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 10:00:44.648 [INF] HLCK: Health check: chain backend, call: 3 failed with: invalid http POST response (nil), method: uptime, id: 1217, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s

Then it succeeds in connecting to the RPC port (in spite of the IP address change). So at least RPC can handle an IP address change. My guess is that it's the ZMQ connection that stops working, and the health check doesn't verify that connection. So the health check doesn't help here.

Dominion5254 · 2025-03-17T21:49:16Z

I'm running into something similar but it seems LND is not able to recover and connect to the new container IP of bitcoind. This occurred after updating Bitcoin Core from 28.0 -> 28.1 resulting in a new container IP for Bitcoin Core. LND logs show the failed Health Check, but it never seems to recover.

2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.807 [CRT] SRVR: Health check: chain backend failed after 5 calls
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.807 [INF] SRVR: Sending request for shutdown
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.807 [INF] LTND: Received shutdown request.
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.808 [INF] LTND: Shutting down...
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.808 [INF] LTND: Gracefully shutting down.
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.808 [INF] NANN: Channel Status Manager shutting down...
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.818 [INF] HSWC: HTLC Switch shutting down...
2025-03-17T14:43:06-06:00  2025-03-17 20:43:06.828 [INF] NTFN: Cancelling epoch notification, epoch_id=6
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] HSWC: Onion processor shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] HSWC: Decaying hash log received shutdown request
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] NTFN: Cancelling epoch notification, epoch_id=11
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] INVC: InvoiceRegistry shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] INVC: InvoiceRegistry shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.530 [INF] NTFN: Cancelling epoch notification, epoch_id=10
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] HSWC: InterceptableSwitch shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] NTFN: Cancelling epoch notification, epoch_id=7
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] CRTR: Channel Router shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] CNCT: ChainArbitrator shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] NTFN: Cancelling epoch notification, epoch_id=8
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] FNDG: Funding manager shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] BRAR: Breach arbiter shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] UTXN: UTXO nursery shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] NTFN: Cancelling epoch notification, epoch_id=5
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] DISC: Authenticated gossiper shutting down...
2025-03-17T14:43:35-06:00  2025-03-17 20:43:35.531 [INF] NTFN: Cancelling epoch notification, epoch_id=9
2025-03-17T14:43:50-06:00  2025-03-17 20:43:50.071 [ERR] RPCS: [/lnrpc.Lightning/GetInfo]: unable to get best block info: invalid http POST response (nil), method: getblockchaininfo, id: 811, last error=Post "http://bitcoind.embassy:8332": dial tcp 172.18.0.68:8332: connect: no route to host

Bitcoin Core's RPC is accessible at bitcoind.embassy:8332, but LND is not able to connect to the host until LND is restarted. It would be desirable for LND to be resilient to changes in Bitcoin Core's container IP instead of requiring the user to restart LND themselves.

Env:
Bitcoin Core 28.0 -> 28.1
LND 0.18.5
StartOS 0.3.5.1
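
Until LND handles this itself, an external watchdog could do the restart that otherwise falls to the user. The following is only a sketch of the decision logic, not part of LND or StartOS: the class name, the threshold, and how the restart is actually triggered are all hypothetical and deployment-specific.

```python
from dataclasses import dataclass


@dataclass
class SyncWatchdog:
    """Counts consecutive polls in which synced_to_chain was false."""

    max_unsynced_polls: int = 5  # hypothetical threshold, tune per deployment
    unsynced_polls: int = 0

    def observe(self, synced_to_chain: bool) -> bool:
        """Record one poll result; return True when a restart looks warranted."""
        if synced_to_chain:
            # Any successful poll resets the counter.
            self.unsynced_polls = 0
            return False
        self.unsynced_polls += 1
        return self.unsynced_polls >= self.max_unsynced_polls


watchdog = SyncWatchdog(max_unsynced_polls=3)
decisions = [watchdog.observe(s) for s in (True, False, False, False)]
print(decisions)  # [False, False, False, True]
```

Since synced_to_chain is also legitimately false while a node is still catching up, the threshold should be generous enough not to restart a node that is merely syncing.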

guggero · 2025-03-18T21:38:42Z

@Dominion5254 how does Start9 configure the bitcoind host in the lnd settings? Does it give a container host name or does it resolve an IP and use that directly?
If it's the former, then it would be something lnd has to fix. If it's the latter then Start9 would need to fix that.

Dominion5254 · 2025-03-19T04:18:59Z

It is the former, bitcoind has a static hostname which LND uses.

guggero · 2025-03-20T16:40:44Z

Hmm, okay. Then I guess we need to make sure we re-resolve the IP address when reconnecting.
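
To illustrate the idea of re-resolving on reconnect (as opposed to resolving once and reusing the IP), here is a small Python sketch. It is not lnd's code, which is written in Go; the class and function names are hypothetical, and the resolver is injectable so the behaviour can be shown without real DNS:

```python
import socket
from typing import Callable, Optional, Tuple

Resolver = Callable[[str], str]


def system_resolver(host: str) -> str:
    """Resolve a hostname to an IPv4 address using the OS resolver."""
    return socket.gethostbyname(host)


class ReResolvingEndpoint:
    """Keeps the hostname and looks it up again before every connection attempt."""

    def __init__(self, host: str, port: int, resolve: Resolver = system_resolver):
        self.host = host
        self.port = port
        self._resolve = resolve
        self.last_ip: Optional[str] = None

    def next_address(self) -> Tuple[str, int]:
        """Return (ip, port) for the next dial, re-resolving the hostname."""
        ip = self._resolve(self.host)
        if self.last_ip is not None and ip != self.last_ip:
            # Make an address change visible instead of failing silently.
            print(f"{self.host} moved from {self.last_ip} to {ip}")
        self.last_ip = ip
        return ip, self.port


# Simulated DNS answers using the addresses from the reproduction steps.
answers = iter(["172.18.0.2", "172.18.0.6"])
endpoint = ReResolvingEndpoint("bitcoin", 18501, resolve=lambda _host: next(answers))
first = endpoint.next_address()
second = endpoint.next_address()
print(first, second)
```

The point of the design is that the hostname, not the resolved address, is what gets stored between connection attempts, so a reconnect after a restart picks up the new IP.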

kallerosenbaum added the labels "bug" (Unintended code behaviour) and "needs triage" on Dec 12, 2024

saubyk added this to the 0.20.0 milestone Dec 19, 2024

saubyk added the labels "P1" (MUST be fixed or reviewed) and "P2" (should be fixed if one has time), and removed the labels "needs triage" and "P1" (MUST be fixed or reviewed) on Dec 19, 2024
