
etcdctl cluster-health and member list commands do not work correctly #2711

Closed
kelseyhightower opened this issue Apr 19, 2015 · 9 comments · Fixed by #3178

@kelseyhightower
Contributor

etcd server version

/opt/bin/etcd --version
etcd version 2.0.9

etcd client version

/usr/local/bin/etcdctl --version
etcdctl version 2.0.9

Start a 3-node etcd cluster

vmrun list
Total running VMs: 3
/Users/kelseyhightower/Documents/Virtual Machines.localized/core0.vmwarevm/core0.vmx
/Users/kelseyhightower/Documents/Virtual Machines.localized/core1.vmwarevm/core1.vmx
/Users/kelseyhightower/Documents/Virtual Machines.localized/core2.vmwarevm/core2.vmx
etcdctl cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy
member 987146e8925f10e5 is healthy
etcdctl member list
5ae3067007f7fb85: name=etcd2 peerURLs=http://192.168.12.52:2380 clientURLs=http://192.168.12.52:2379
7931e79c0d8b47c5: name=etcd0 peerURLs=http://192.168.12.50:2380 clientURLs=http://192.168.12.50:2379
987146e8925f10e5: name=etcd1 peerURLs=http://192.168.12.51:2380 clientURLs=http://192.168.12.51:2379

Power off one of the etcd members

vmrun stop /Users/kelseyhightower/Documents/Virtual\ Machines.localized/core0.vmwarevm/core0.vmx
vmrun list
Total running VMs: 2
/Users/kelseyhightower/Documents/Virtual Machines.localized/core1.vmwarevm/core1.vmx
/Users/kelseyhightower/Documents/Virtual Machines.localized/core2.vmwarevm/core2.vmx

The member list command fails

etcdctl -C http://192.168.12.50:2379,http://192.168.12.51:2379,http://192.168.12.52:2379 member list
context deadline exceeded

The cluster is reported as healthy, and no member is marked unhealthy, even though member 7931e79c0d8b47c5 is powered off.

etcdctl -C http://192.168.12.50:2379,http://192.168.12.51:2379,http://192.168.12.52:2379 cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy
member 987146e8925f10e5 is healthy
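
One way to narrow down which member is actually unreachable (a sketch, not part of the original report, reusing the client URLs listed above) is to query each endpoint separately instead of passing all of them at once:

for ep in http://192.168.12.50:2379 http://192.168.12.51:2379 http://192.168.12.52:2379; do
  echo "== $ep"
  etcdctl -C "$ep" cluster-health || echo "$ep did not respond"
done

The powered-off member's endpoint should time out, while the remaining members still answer.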
@barakmich barakmich modified the milestone: v2.1.0-alpha.1 Apr 24, 2015
@yichengq
Contributor

This is basically the same one as what we met in 2.0.3: #2340

@mariusgrigaitis

Can confirm this.

@barakmich barakmich modified the milestones: v2.2.0, v2.1.0-alpha.1 May 15, 2015
@xiang90 xiang90 added the etcdctl label Jun 6, 2015
@durzo

durzo commented Jun 24, 2015

Running v2.1.0-alpha.1: after shutting down a node, cluster-health still reports that all members are healthy.

2015/06/24 09:29:30 etcdserver: failed to reach the peerURL(http://etcd2:7001) of member 7a9767de17ea4500 (Get http://etcd2:7001/version: net/http: request canceled while waiting for connection)

root@etcd3:~$ etcdctl cluster-health
cluster is healthy
member 7a9767de17ea4500 is healthy
member cb1f485859524c11 is healthy
member d555fc8f72be9146 is healthy

@xiang90
Contributor

xiang90 commented Jun 24, 2015

/cc @yichengq

@yichengq
Contributor

@durzo Did it eventually change to unhealthy? For how long did you see this false healthy status?

So far, we know that the implementation has some delay (around minutes) in reporting the health status of a hard-killed machine, and we plan to improve it in 2.2. The internal detail is that etcd 2.0 sends MsgApp asynchronously over an HTTP stream, which cannot reflect whether the receiving side works.

yichengq added a commit to yichengq/etcd that referenced this issue Jul 24, 2015
This method uses the raft status exposed at /debug/varz to determine the health of the cluster. It uses whether the commit index increases to determine cluster health, and whether the match index increases to determine member health.

This could fix bug etcd-io#2711, where a follower's unhealthiness goes undetected, because it does not rely on whether a message on the long-polling connection was sent.

This health check is stricter than the old one and reflects whether followers are healthy from the leader's point of view. For example, a follower that is receiving a snapshot will show up as unhealthy because it does not move forward.

`etcdctl cluster-health` will reflect the health view at the raft level, while connectivity checks reflect the health view at the transport level.
yichengq added a commit to yichengq/etcd that referenced this issue Jul 28, 2015
yichengq added a commit to yichengq/etcd that referenced this issue Jul 29, 2015
yichengq added a commit to yichengq/etcd that referenced this issue Jul 30, 2015
junxu pushed a commit to junxu/etcd that referenced this issue Aug 7, 2015
yichengq added a commit to yichengq/etcd that referenced this issue Aug 21, 2015
mwitkow pushed a commit to mwitkow/etcd that referenced this issue Sep 29, 2015
@kerk1v

kerk1v commented Feb 6, 2018

Hello,

It seems the behaviour is still the same in etcd v3.2.15, so I see no way for a cluster operator to manually confirm the health of an etcd v3 cluster. Any ideas to help with this, or is there an etcdctl command other than 'etcdctl member list' to check cluster health in etcd3?

Edited:

It seems

ETCDCTL_API=3 etcdctl --cert=/etc/etcd_k8s/etcd.pem --key /etc/etcd_k8s/etcd-key.pem --insecure-skip-tls-verify=true --endpoints=[https://master-1:2379,https://master-2:2370,https://master-3:2379] endpoint health

will check each of the nodes and report

https://master-3:2379 is healthy: successfully committed proposal: took = 3.665702ms
https://master-1:2379 is healthy: successfully committed proposal: took = 3.202865ms
https://master-2:2370 is unhealthy: failed to connect: dial tcp 192.168.33.102:2370: getsockopt: no route to host
Error:  unhealthy cluster

when one of the etcd cluster members is down; however, this requires the operator to have at least some knowledge of the etcd cluster.
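
On newer etcdctl v3 releases (an assumption; the flag may not be available in 3.2.x), the endpoint list can be discovered from the cluster itself with --cluster, so the operator only needs one reachable endpoint, for example:

ETCDCTL_API=3 etcdctl --cert=/etc/etcd_k8s/etcd.pem --key=/etc/etcd_k8s/etcd-key.pem --insecure-skip-tls-verify=true --endpoints=https://master-1:2379 endpoint health --cluster

etcdctl then fetches the member list first and runs the health check against every advertised client URL.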

@dxlr8r

dxlr8r commented Oct 25, 2018

I have the same issue with:

etcdctl version: 3.2.22
API version: 3.2

I get "unhealthy cluster" as well when using "etcdctl member list" even though 2/3 is online.

I did, however, notice that when going from 2/3 to 1/3 there was no leader:

[foo1@bar ~]# etcdctl3 endpoint status
Failed to get the status of endpoint https://foo1:2379 (context deadline exceeded)
Failed to get the status of endpoint https://foo2:2379 (context deadline exceeded)
https://foo3:2379, 6074b97ec42826bg, 3.2.22, 16 MB, false, 760, 20792785

Note the false in the last line.
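
The leader column is easier to read in table form (a sketch reusing the endpoints from this comment; TLS flags are omitted but would be needed on a secured cluster):

ETCDCTL_API=3 etcdctl --endpoints=https://foo1:2379,https://foo2:2379,https://foo3:2379 endpoint status -w table

The IS LEADER column then shows directly whether any of the reachable members currently holds leadership.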

@knraju483

knraju483 commented May 11, 2020

Use the command below for v3.3.xx etcdctl:
etcdctl --endpoints=https://192.168.56.113:2379,https://192.168.56.118:2379,https://192.168.56.119:2379 --key-file="/etc/kubernetes/pki/etcd/client-key.pem" --cert-file="/etc/kubernetes/pki/etcd/client.pem" --ca-file="/etc/kubernetes/pki/etcd/ca.pem" member list -w table

Use the command below for v3.4.7 etcdctl:

etcdctl --endpoints=https://192.168.56.113:2379,https://192.168.56.118:2379,https://192.168.56.119:2379 --key="/etc/kubernetes/pki/etcd/client-key.pem" --cert="/etc/kubernetes/pki/etcd/client.pem" --cacert="/etc/kubernetes/pki/etcd/ca.pem" member list -w table
+------------------+---------+----------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  |   NAME   |          PEER ADDRS         |         CLIENT ADDRS        | IS LEARNER |
+------------------+---------+----------+-----------------------------+-----------------------------+------------+
| 29338f91dec951c0 | started | master01 | https://192.168.56.113:2380 | https://192.168.56.113:2379 |      false |
| 438679b543748ad8 | started | master02 | https://192.168.56.118:2380 | https://192.168.56.118:2379 |      false |
| 48544942dc6b8509 | started | master03 | https://192.168.56.119:2380 | https://192.168.56.119:2379 |      false |
+------------------+---------+----------+-----------------------------+-----------------------------+------------+

@bu3ny

bu3ny commented Jul 13, 2020

I'm using the latest release of etcd at the time of writing this comment (etcd-v3.4.9) and the following command works for me:

[root@master01 ~]#  etcdctl --endpoints=https://192.168.122.101:2379,https://192.168.122.102:2379,https://192.168.122.103:2379   --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member list -w table
+------------------+---------+----------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |   NAME   |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+----------+------------------------------+------------------------------+------------+
| 148f9f6172465414 | started | master02 | https://192.168.122.102:2380 | https://192.168.122.102:2379 |      false |
| 79ad015295a746a9 | started | master01 | https://192.168.122.101:2380 | https://192.168.122.101:2379 |      false |
| f857eddf41ed1741 | started | master03 | https://192.168.122.103:2380 | https://192.168.122.103:2379 |      false |
+------------------+---------+----------+------------------------------+------------------------------+------------+
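
member list only confirms that members are registered and started; to check that each member actually serves requests, the same connection flags can be combined with endpoint health (a sketch built from the commands already shown in this thread):

etcdctl --endpoints=https://192.168.122.101:2379,https://192.168.122.102:2379,https://192.168.122.103:2379 --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem endpoint health

This reports each endpoint as healthy or unhealthy individually, which is closer to what the old cluster-health command was intended to show.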
