
geth consumes all ram; drops blocks, peers #20963

Closed
nysxah opened this issue Apr 23, 2020 · 19 comments


nysxah commented Apr 23, 2020

Hi, we are running several Geth nodes.

Every week, at least one node that is synced to the chain tip 'loses' 4-10k blocks and begins re-syncing them. At the same time, all peers are dropped/disconnected.

We are seeing OOM errors around the time this happens.

RAM usage creeps up to 100%, then the blocks & peers are dropped.

We upgraded one node from 8GB to 16GB of RAM, and it slowly consumed the additional memory until the issue happened again.

What could this be related to, where should we look, and which flags could we modify to potentially resolve this issue?

System information

Geth version: 1.9.12-stable
OS & Version: Ubuntu 16.04.6 LTS

Expected behaviour

Node stays in sync with the network.

Actual behaviour

Geth eats up all available RAM (8-16 GB), drops 4-10k blocks, and begins re-syncing them.

Screenshot 2020-04-22 23 50 16

also drops peers

Screenshot 2020-04-23 00 13 26

Steps to reproduce the behaviour

Unclear; the node is running and answering RPC requests via WebSockets. Sometimes the issue coincides with a sudden influx of requests to the node, but not always.

@karalabe
Member

Could you please provide the command you use to run Geth?

The reason for the resync is that the recent state is kept in memory for garbage collection. This, however, means that a crash loses all of that in-memory state, so when you restart, you need to reprocess the lost blocks.

@karalabe
Member

Could you also provide a memory chart? It would be nice to see the consumption over time.


cp0k commented Apr 23, 2020

The command we are using to run Geth:

/usr/bin/geth --rpcapi eth,web3,debug,txpool,net,shh,db,admin,debug --rpc --ws --wsapi eth,web3,debug,txpool,net,shh,db,admin,debug --wsorigins localhost --gcmode full --rpcport=8547 --maxpeers 250

Memory / system charts for the node in question:

Screen Shot 2020-04-23 at 5 02 08 PM

FYI, we are observing this exact same behavior on a node that is 100% idle, with absolutely zero requests thrown at it.

In case you are wondering why the memory chart has a bunch of sudden drops: we also have a bash script that checks total memory in use, and if it exceeds 80%, Geth is restarted. This was put in place as a band-aid until we get to the root cause of the issue.
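A minimal sketch of such a watchdog, assuming Geth runs as a systemd service named "geth" (the threshold, service name, and log path below are placeholders):

#!/usr/bin/env bash
# Restart geth when overall memory usage crosses a threshold (placeholder: 80%).
THRESHOLD=80

# Percentage of total memory currently in use, taken from `free` (used/total).
USED=$(free | awk '/^Mem:/ {printf "%d", $3/$2*100}')

if [ "$USED" -ge "$THRESHOLD" ]; then
    echo "$(date): memory at ${USED}%, restarting geth" >> /var/log/geth-watchdog.log
    systemctl restart geth
fi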

@karalabe
Member

Which version of Go did you build it with? 1.14.0 and 1.14.1 had a GC bug (golang/go#37525) that caused Geth to explode in memory use. It was fixed in 1.14.2.

@karalabe
Member

Another thing that could help: when your node enters this strange high-memory state, dangerously close to being killed, please run debug.stacks() from a Geth console. That will create a dump of all the running goroutines. If you share that with us, we can check whether there is some leak that results in the memory accumulation.
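For reference, the dump can be taken by attaching a console to the running node, roughly like this (the IPC path below is the default one and may need adjusting to your --datadir):

# attach a console to the running node
geth attach ~/.ethereum/geth.ipc

# then, at the console prompt:
> debug.stacks()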


cp0k commented Apr 24, 2020

Which version of Go did you build it with? 1.14.0 and 1.14.1 had a GC bug (golang/go#37525) that caused Geth to explode in memory use. It was fixed in 1.14.2.

Looks like we are running go 1.13.8:

# geth version
Geth
Version: 1.9.12-stable
Git Commit: b6f1c8dcc058a936955eb8e5766e2962218924bc
Git Commit Date: 20200316
Architecture: amd64
Protocol Versions: [65 64 63]
Go Version: go1.13.8
Operating System: linux
GOPATH=
GOROOT=/home/travis/.gimme/versions/go1.13.8.linux.amd64

Another thing that could help: when your node enters this strange high-memory state, dangerously close to being killed, please run debug.stacks() from a Geth console. That will create a dump of all the running goroutines. If you share that with us, we can check whether there is some leak that results in the memory accumulation.

Thank you for the tip! I'll definitely get back to you with the debug.stacks() output as soon as possible.

@drhashes

Adding --cache 2048 or --cache 1024 to your command line will reduce RAM consumption.
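For example, the command posted earlier in this thread would look roughly like this with --cache 2048 appended (a sketch, not a tested configuration):

/usr/bin/geth --rpcapi eth,web3,debug,txpool,net,shh,db,admin --rpc --ws \
  --wsapi eth,web3,debug,txpool,net,shh,db,admin --wsorigins localhost \
  --gcmode full --rpcport=8547 --maxpeers 250 --cache 2048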

@mtbitcoin

We are running into similar memory-leak issues, first with 1.9.12, so we upgraded to 1.9.13, but we encountered the same issues on multiple production servers. No other changes were made.

karalabe (Member) commented May 6, 2020

@mtbitcoin What flags are you running with?

@mtbitcoin

@karalabe I am running further tests, but it appears it might be related to someone intentionally DoSing the nodes by running eth_call or gas estimation requests; applying --rpc.gascap appears to have helped.
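For reference, the cap is applied as a command-line flag, roughly like this (a sketch; the gas value here is only illustrative, and <existing flags> stands for whatever flags the node already uses):

# limit the gas available to eth_call / eth_estimateGas served over RPC
geth <existing flags> --rpc.gascap 25000000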


cp0k commented May 6, 2020

Another thing that could help: when your node enters this strange high-memory state, dangerously close to being killed, please run debug.stacks() from a Geth console. That will create a dump of all the running goroutines. If you share that with us, we can check whether there is some leak that results in the memory accumulation.

As requested - https://pastebin.com/1mGvk4SQ

@karalabe I am running further tests, but it appears it might be related to someone intentionally DoSing the nodes by running eth_call or gas estimation requests; applying --rpc.gascap appears to have helped.

Thanks for the heads up! Will definitely give it a try.

Adding --cache 2048 or --cache 1024 to your command line will reduce RAM consumption.

Thanks! I'll try that as well :)


cp0k commented May 26, 2020

Another thing that could help: when your node enters this strange high-memory state, dangerously close to being killed, please run debug.stacks() from a Geth console. That will create a dump of all the running goroutines. If you share that with us, we can check whether there is some leak that results in the memory accumulation.

Péter, I replied with the requested information a couple of weeks ago; can you please review it and let us know if anything stands out?

https://pastebin.com/1mGvk4SQ

My organization and I were hoping this would be fixed in Geth 1.9.14-stable, but unfortunately we are seeing the same issue.

Please let me know if you require any additional information from our end. Thanks!

nysxah (Author) commented Jun 8, 2020

@karalabe @drhashes Adding the --cache flag has not helped. The interim workaround is to restart Geth once memory reaches a high threshold. How else can we help you debug this?

holiman (Contributor) commented Aug 6, 2020

We've been looking into this today and can't find any obvious culprit. Does this issue still appear with the most recent version? Also, if it does, a new stack trace would be great, since the code lines have changed.

karalabe (Member) commented Aug 6, 2020

Please check with the latest Geth and the latest Go. What would really help is to try to minimize the moving components. Let's try a 16GB machine, idling without RPC calls, just syncing with the network. That is what we're running all the time, and it should not go OOM.

If that works stably, let's add RPC into the mix. It would really help if you could tell us what API calls you are making. There are very easy ways to make a node go boom with the "correct" RPC requests.
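A stripped-down invocation along those lines might look roughly like this (a sketch; the data directory path is a placeholder, and no HTTP/WS endpoints are enabled, so no external calls can reach the node):

# sync-only node: no --rpc / --ws flags, so nothing can query it externally
geth --syncmode fast --cache 4096 --datadir /data/ethereum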


mtbitcoin commented Aug 6, 2020

This is no longer an issue for us (archive and default sync nodes) with the latest version.

Edit: the gas cap helped on our end (from what we could see, someone figured out they could DoS the nodes and was sending in calls with high gas limits).

@YpsilonOmega

I've got the same issue with the following set-up:

Geth version: 1.9.25-stable
OS & Version: Ubuntu 20.10
Go Version: go1.15.6
Hardware: Raspberry Pi 4, 8GB

Geth eats up all my RAM after a long time running.
+600MB after 20min
+5GB after 10 days (actually more due to Swap)

I started with the line:
sudo geth --syncmode fast --cache 2048 --datadir /mnt/ssd/ethereum

Afterwards I tried the solution presented by @mtbitcoin:
sudo geth --syncmode fast --cache 2048 --rpc.gascap 500000 --datadir /mnt/ssd/ethereum

However, a gas cap of 500000 seemed to be too high, so I changed it to 300000.
Filling up the RAM was much slower this time; however, it quickly consumed more than 200MB extra within a few minutes after reaching the configured cache of 2048MB.

@mtbitcoin
How much gas did you use for the gas cap, and is this a long-term fix?


noahh40 commented Aug 24, 2021


Also interested in @mtbitcoin's reply to the above question.

@shreethejaBandit


Should there be a gas limit? Interested in finding out more about this issue.
