Different topology view among cluster nodes #133
Comments
The third option was the one originally intended. Seems like a bug, need to investigate. Are you using an in-memory journal or a file-based journal? This issue can occur with the in-memory journal (because it is dropped after a process restart).
File-based journal. Maybe the issue is that logs are not replayed until at least one other node has connected?
That's possible. All committed log entries (less than `__raftCommitIndex`) should be replayed without waiting for other nodes.
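For context, here is a minimal sketch of the two journal modes being discussed; the file names and addresses are placeholders, not taken from the issue:

```python
from pysyncobj import SyncObj, SyncObjConf

# File-based journal: the Raft log is persisted and can be replayed after a restart.
persistent_conf = SyncObjConf(journalFile='journal.bin', fullDumpFile='dump.bin')

# In-memory journal (no journalFile): the log is dropped when the process exits.
volatile_conf = SyncObjConf()

# Placeholder addresses; in the issue the cluster has three nodes.
node = SyncObj('node1:4321', ['node2:4321', 'node3:4321'], conf=persistent_conf)
```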
Actually, it is not only about waiting for other nodes: the log is applied after the leader election, because it wants to figure out `__raftCommitIndex` on the majority of nodes:
https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L570-L579
But when the logs are applied, membership commands are skipped anyway:
https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L709-L710
Any idea how to fix it?
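To make the behaviour described above easier to follow, here is a rough, simplified sketch of the replay logic in question (illustrative only, not the actual PySyncObj code): entries are applied only up to the commit index established after a leader election, and membership (configuration-change) entries are skipped during that replay.

```python
# Simplified illustration of the behaviour discussed above; not the actual PySyncObj code.
MEMBERSHIP_COMMANDS = {'add_node', 'remove_node'}  # hypothetical command tags

def replay_journal(journal, commit_index, apply_command):
    """Apply committed journal entries up to commit_index, skipping membership commands."""
    for index, command in enumerate(journal, start=1):
        if index > commit_index:
            break  # uncommitted entries must not be applied
        if command['type'] in MEMBERSHIP_COMMANDS:
            # This skip is what the discussion is about: topology changes recorded in
            # the journal are not re-applied on restart, so a restarted node keeps the
            # topology from its fullDumpFile or its otherNodes list.
            continue
        apply_command(command)
```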
The code you quote is actually part of the Raft core.
This does not contradict the fact that we can apply all committed records after a restart. The case above is about non-committed records; once a record is committed, it can never be rolled back.
That was done to prevent new nodes (which already started with the new configuration) from changing the cluster. It can be fixed by ignoring configuration-change commands only when the cluster already has the new node added / deleted.
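A rough sketch of the fix proposed above, with illustrative names only: apply a replayed configuration-change command unless the change is already reflected in the current cluster configuration.

```python
# Illustrative sketch of the proposed fix; not the actual PySyncObj implementation.
def should_apply_membership_change(command, current_nodes):
    """Return False only if the replayed membership command is already in effect."""
    if command['type'] == 'add_node':
        return command['node'] not in current_nodes
    if command['type'] == 'remove_node':
        return command['node'] in current_nodes
    return True

def replay_entry(command, current_nodes, apply_command):
    if command['type'] in ('add_node', 'remove_node'):
        if not should_apply_membership_change(command, current_nodes):
            # Already applied by a node that started with the new configuration.
            return
    apply_command(command)
```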
It would be nice to have it fixed.
I'll fix it on the weekend.
Applying records after restart: #139
Sweet, thank you @bakwc!
I suppose it's not enough; we still need to apply configuration changes.
I know; at least I could check whether the logs are applied and the latest state is visible.
Hi @bakwc,
Sorry, I had little time; I'll try to fix it in the near future.
Ping. @bakwc, if you don't have time to work on it right now, maybe you could at least release a new version with what has already been fixed since the last release?
Ping.
Could you try the version with the latest fix?
It seems to help, thank you @bakwc!
Uploaded a new release, 0.3.8.
The `fullDumpFile` contains information about the cluster topology, and, more importantly, this information takes precedence over the list of `otherNodes` passed to the SyncObj class. That sort of makes sense, and Etcd, for example, does the same. But the full dump is generated only occasionally, and it is also possible to configure it to happen at a different time (`logCompactionSplit`).

If for some reason all (or some) nodes are restarted, we can end up in a very strange situation where the nodes have different opinions about the cluster topology.
For example, after a restart one of the nodes sees only its configured list (`otherNodes`), so it thinks that there are only two nodes in the cluster. The `syncobj_admin -status` output looks like:

*node3
There are a few different options for how to solve it:

- Use the `otherNodes` passed to the `SyncObj`. IMO, the worst possible option, although if it is configurable, it should probably be fine.
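For reference, a minimal sketch of the kind of setup the issue describes; the ports, file names, and the `dynamicMembershipChange` flag are illustrative assumptions, not taken verbatim from the report:

```python
from pysyncobj import SyncObj, SyncObjConf

conf = SyncObjConf(
    journalFile='node1-journal.bin',
    fullDumpFile='node1-dump.bin',   # topology recorded here wins over otherNodes after a restart
    dynamicMembershipChange=True,    # allows nodes to be added/removed at runtime
)

# otherNodes is only the initial view of the cluster; once a full dump exists,
# the topology stored in the dump takes precedence when the node restarts.
node1 = SyncObj('node1:4321', ['node2:4321', 'node3:4321'], conf=conf)
```

Each node's view of the topology can then be compared with something like `syncobj_admin -conn node1:4321 -status`, which is how the mismatch described above shows up.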