etcdctl cluster-health and member list commands do not work correctly #2711
Comments
This is basically the same issue we ran into in 2.0.3: #2340
Can confirm this.
Running v2.1.0-alpha.1, shutting down a node still results in cluster-health reporting that all members are healthy.
/cc @yichengq
@durzo Did it eventually change to unhealthy? How long did you see the false healthy status? So far, we know that the implementation has a delay (on the order of minutes) before the health status reflects a hard-killed machine, and we plan to improve it in 2.2. The internal detail is that etcd 2.0 sends MsgApp asynchronously over an HTTP stream, which cannot reflect whether the receiving side is working.
This method uses the raft status exposed at /debug/varz to determine the health of the cluster. It treats the cluster as healthy if the commit index is increasing, and treats a member as healthy if its match index is increasing. This could fix bug etcd-io#2711, where an unhealthy follower goes undetected, because the check no longer relies on whether a message was sent on the long-polling connection. This health check is stricter than the old one and reflects whether followers are healthy from the leader's point of view. For example, a follower that is receiving a snapshot will be reported as unhealthy because it is not moving forward. `etcdctl cluster-health` will reflect health at the raft level, while connectivity checks reflect health at the transport level.
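The rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual etcdctl implementation: the `RaftSample` type, its field names, and `assess_health` are hypothetical, and the sketch assumes two snapshots of the leader's raft status have already been fetched some interval apart.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RaftSample:
    """One snapshot of the leader's raft status (hypothetical shape)."""
    commit: int             # leader's commit index
    match: Dict[str, int]   # member ID -> match index as seen by the leader


def assess_health(before: RaftSample, after: RaftSample) -> Tuple[bool, Dict[str, bool]]:
    """Compare two samples taken some interval apart.

    The cluster counts as healthy if the commit index increased.
    A member counts as healthy if its match index increased.
    A member missing from the second sample is treated as unhealthy.
    """
    cluster_healthy = after.commit > before.commit
    members = {
        member_id: after.match.get(member_id, prev) > prev
        for member_id, prev in before.match.items()
    }
    return cluster_healthy, members


# Example: member "c" stopped making progress between the two samples.
before = RaftSample(commit=100, match={"a": 100, "b": 100, "c": 90})
after = RaftSample(commit=105, match={"a": 105, "b": 105, "c": 90})
cluster_ok, member_ok = assess_health(before, after)
```

In this example the cluster is reported healthy because the commit index advanced, while member "c" is reported unhealthy because its match index did not move.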
Hello, it seems the behaviour is still the same in etcd v3.2.15, so a cluster operator has no way to manually confirm the health of an etcd v3 cluster. Any ideas that would help here, or is there an alternative etcdctl command for checking cluster health in etcd3 other than 'etcdctl member list'? Edited: It seems
will check each of the nodes and report
in case one of the etcd cluster members is down; however, this requires the operator to have at least some knowledge of the etcd cluster.
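Independently of whichever etcdctl command the comment refers to, the idea of probing every member separately can be sketched as below. This is only an illustration under stated assumptions: it assumes each member serves `GET /health` on its client URL with a JSON body such as `{"health": "true"}`, the helper names `http_probe` and `check_endpoints` are hypothetical, and TLS client certificates are not handled. As the comment notes, the operator still has to know the endpoint list in advance.

```python
import http.client
import json
import urllib.request
from typing import Callable, Dict, Iterable


def http_probe(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if the member answers /health and reports itself healthy.

    Assumption: the body looks like {"health": "true"}.
    Any connection, HTTP, or parse error counts as unhealthy.
    """
    try:
        with urllib.request.urlopen(f"{endpoint}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        return False
    return isinstance(body, dict) and str(body.get("health")).lower() == "true"


def check_endpoints(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every endpoint on its own so one dead member cannot hide behind the others."""
    return {endpoint: probe(endpoint) for endpoint in endpoints}


# Example with a stand-in probe (no network access): the third member is down.
down = {"http://10.0.0.3:2379"}
report = check_endpoints(
    ["http://10.0.0.1:2379", "http://10.0.0.2:2379", "http://10.0.0.3:2379"],
    probe=lambda endpoint: endpoint not in down,
)
```

Passing the probe as a parameter keeps the per-member loop separate from the transport, so the same loop could wrap a different check if `/health` is not available in a given deployment.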
I have the same issue with:
I get "unhealthy cluster" as well when using "etcdctl member list" even though 2/3 members are online. I did, however, notice that when going from 2/3 to 1/3 there was no leader:
Note the false in the last line.
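The loss of the leader when going from 2/3 to 1/3 is consistent with Raft's majority rule: a leader needs a strict majority of the configured members. A short worked example, with hypothetical helper names `quorum` and `has_quorum`:

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, online: int) -> bool:
    """True if enough members are online to elect or keep a leader."""
    return online >= quorum(cluster_size)


# A 3-member cluster needs 2 members online.
needed = quorum(3)                # 2
two_of_three = has_quorum(3, 2)   # True: a leader can exist
one_of_three = has_quorum(3, 1)   # False: no leader can be elected
```

So a 3-member cluster with 2 members online can still serve writes, but with only 1 member online it has no leader.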
Use the command below for etcdctl v3.3.XX:
Use the command below for etcdctl v3.4.7:
etcdctl --endpoints=https://192.168.56.113:2379,https://192.168.56.118:2379,https://192.168.56.119:2379 --key="/etc/kubernetes/pki/etcd/client-key.pem" --cert="/etc/kubernetes/pki/etcd/client.pem" --cacert="/etc/kubernetes/pki/etcd/ca.pem" member list -w table
I'm using the latest release of etcd at the time of writing this comment (etcd-v3.4.9), and the following command works for me:
etcd server version
etcd client version
Start a 3 node etcd cluster
Poweroff one of the etcd members
The member list command fails
The cluster is reported healthy, and no nodes are marked unhealthy, even though member 7931e79c0d8b47c5 is powered off.