Different topology view among cluster nodes #133

Closed
CyberDem0n opened this issue Oct 27, 2020 · 18 comments

@CyberDem0n
Contributor

The fullDumpFile contains information about the cluster topology, and more importantly, this information takes precedence over the list of otherNodes passed to the SyncObj class. That sort of makes sense; etcd, for example, does the same.

But the full dump is generated only occasionally, and it is also possible to configure nodes to generate it at different times (logCompactionSplit).
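
For reference, a minimal sketch of such a setup (hedged: journalFile, fullDumpFile, dynamicMembershipChange and logCompactionSplit are SyncObjConf options, while the paths and the Counter class are purely illustrative):

from pysyncobj import SyncObj, SyncObjConf, replicated

# Minimal sketch, not taken from the cluster described below:
# a node with a file-based journal and a periodic full dump.
conf = SyncObjConf(
    journalFile='/var/lib/myapp/node2.journal',  # file-based Raft journal
    fullDumpFile='/var/lib/myapp/node2.dump',    # full dump, includes the topology
    dynamicMembershipChange=True,                # allow adding/removing nodes at runtime
    logCompactionSplit=True,                     # compact at different times on different nodes
)

class Counter(SyncObj):
    def __init__(self, self_addr, other_addrs):
        super(Counter, self).__init__(self_addr, other_addrs, conf=conf)
        self.__value = 0

    @replicated
    def incr(self):
        self.__value += 1
        return self.__value

# On restart, the topology stored in the full dump takes precedence over
# other_addrs passed here, which is exactly what this issue is about.
node2 = Counter('127.0.0.1:2223', ['127.0.0.1:2222', '127.0.0.1:2224'])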

If for some reason all (or some) nodes are restarted, we might end up in a very strange situation where nodes have different opinions about the cluster topology.
For example:

  1. Originally I had a cluster with node1 and node2. It was running for a while and log compaction happened.
  2. After that, node3 was added.
  3. Node2 was restarted, and after the restart (despite node3 being passed in otherNodes) it thinks there are only two nodes in the cluster.

The syncobj_admin -status output looks like:

  • node1
$ syncobj_admin -conn 127.0.0.1:2222 -status 
commit_idx: 525
enabled_code_version: 0
last_applied: 525
leader: 127.0.0.1:2222
leader_commit_idx: 525
log_len: 50
match_idx_count: 2
match_idx_server_127.0.0.1:2223: 525
match_idx_server_127.0.0.1:2224: 525
next_node_idx_count: 2
next_node_idx_server_127.0.0.1:2223: 526
next_node_idx_server_127.0.0.1:2224: 526
partner_node_status_server_127.0.0.1:2223: 2
partner_node_status_server_127.0.0.1:2224: 2
partner_nodes_count: 2
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2222
self_code_version: 0
state: 2
uptime: 1012
version: 0.3.7
  • node2
$ syncobj_admin -conn 127.0.0.1:2223 -status 
commit_idx: 540
enabled_code_version: 0
last_applied: 540
leader: 127.0.0.1:2222
leader_commit_idx: 540
log_len: 65
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_127.0.0.1:2222: 2
partner_nodes_count: 1
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2223
self_code_version: 0
state: 0
uptime: 842
version: 0.3.7

  • node3

$ syncobj_admin -conn 127.0.0.1:2224 -status 
commit_idx: 555
enabled_code_version: 0
last_applied: 555
leader: 127.0.0.1:2222
leader_commit_idx: 555
log_len: 132
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_127.0.0.1:2222: 2
partner_node_status_server_127.0.0.1:2223: 0
partner_nodes_count: 2
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2224
self_code_version: 0
state: 0
uptime: 974
version: 0.3.7

There are a few different options for how to solve it:

  1. Always prefer the otherNodes passed to the SyncObj. IMO the worst possible option, although if it is configurable it is probably fine.
  2. Force log compaction on every topology change (see the sketch after this list). This will mostly work, but there is still a chance that compaction doesn't finish before the restart.
  3. Restore the latest known topology by replaying the journal. For some reason this doesn't happen now. Could it be a bug?
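
A sketch of what option 2 could look like (hedged: addNodeToCluster() and forceLogCompaction() are existing SyncObj methods and require dynamicMembershipChange=True, but the wrapper below is hypothetical and does not close the race completely):

from pysyncobj import FAIL_REASON

def add_node_and_compact(sync_obj, new_node_addr):
    # Trigger a full dump right after a successful topology change so a
    # restarted node is more likely to see the latest membership. There is
    # still a window where the node restarts before the dump finishes.
    def on_done(result, error):
        if error == FAIL_REASON.SUCCESS:
            sync_obj.forceLogCompaction()  # schedules a full dump / log compaction

    sync_obj.addNodeToCluster(new_node_addr, callback=on_done)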
@bakwc
Owner

bakwc commented Oct 27, 2020

The third option was originally conceived. Seems like a bug; need to investigate. Are you using an in-memory journal or a file-based journal? This issue can occur with the in-memory journal (because it is dropped when the process restarts).

@CyberDem0n
Contributor Author

File-based journal.

Maybe the issue is that logs are not replayed until at least one other node is connected?

@bakwc
Owner

bakwc commented Oct 27, 2020

That's possible. All committed log entries (below __raftCommitIndex) should be replayed without waiting for other nodes.
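
Illustrative pseudocode of that intent (not the actual PySyncObj internals; all names are made up): on startup, every journal entry below the persisted commit index would be applied right away, without waiting for a leader or other nodes.

def replay_committed_entries(journal, commit_index, apply_entry):
    # journal: iterable of (command, index, term), ordered by index.
    # Entries below commit_index are known to be committed and can be
    # re-applied safely right after loading the journal from disk.
    for command, index, term in journal:
        if index >= commit_index:
            break                 # not known to be committed yet, wait for a leader
        apply_entry(command)      # committed entries are never rolled back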

@CyberDem0n
Contributor Author

Actually, it is not only waiting for other nodes: the log is applied only after the leader election, because it wants to figure out __raftCommitIndex on the majority of nodes: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L570-L579

But when the log is applied, membership commands are skipped anyway: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L709-L710

Any idea how to fix it?

@bakwc
Owner

bakwc commented Oct 27, 2020

Actually, it is not only waiting for other nodes: the log is applied only after the leader election, because it wants to figure out __raftCommitIndex on the majority of nodes: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L570-L579

The code you quoted is actually part of the Raft core:

  • It works only when a node becomes the leader.
  • It waits until the cluster majority confirms that they received a record; only then is the record considered "committed".

This does not contradict the fact that we can apply all committed records after a restart. The case above is about non-committed records; once a record is committed, it can never be rolled back.

But when the log is applied, membership commands are skipped anyway: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L709-L710

That was done to prevent new nodes (which were already started with a new configuration) from changing the cluster. It can be fixed by ignoring configuration change commands only when the cluster already has the corresponding node added / deleted.
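
A rough sketch of that idea (illustrative pseudocode, not the actual PySyncObj code): during replay, a membership command is skipped only when it would be a no-op, i.e. when the cluster already reflects the change.

def apply_membership_command(cluster_nodes, command):
    # command is e.g. ('add', '127.0.0.1:2224') or ('remove', '127.0.0.1:2223')
    op, node = command
    if op == 'add' and node not in cluster_nodes:
        cluster_nodes.add(node)
    elif op == 'remove' and node in cluster_nodes:
        cluster_nodes.discard(node)
    # otherwise the cluster already has this change applied: ignore it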

@CyberDem0n
Contributor Author

It would be nice to have it fixed.
Unfortunately I am not confident enough to dig that deep into SyncObj.

@bakwc
Owner

bakwc commented Oct 28, 2020

I'll fix it on the weekend.

@bakwc
Owner

bakwc commented Nov 2, 2020

Applying records after restart: #139

@bakwc bakwc added the bug label Nov 2, 2020
@CyberDem0n
Contributor Author

Sweet, thank you @bakwc!
I am gonna try it tomorrow.

@bakwc
Owner

bakwc commented Nov 2, 2020

I suppose it's not enough; we still need to apply configuration changes.

@CyberDem0n
Contributor Author

I know; at least I can check whether logs are applied and the latest state is visible.

@CyberDem0n
Contributor Author

Hi @bakwc,
Any further progress on it?
Maybe you can release a new version with what was already fixed?

@bakwc
Owner

bakwc commented Jan 15, 2021

Sorry, I've had little time; I'll try to fix it in the near future.

@CyberDem0n
Contributor Author

Ping.

@bakwc, if you don't have time to work on it right now, maybe you could at least release a new version with what has already been fixed since the last release?

@CyberDem0n
Contributor Author

Ping.

@bakwc

bakwc added a commit that referenced this issue Apr 25, 2021

@bakwc
Owner

bakwc commented Apr 25, 2021

Could you try the version with the latest fix?
It's merged to master; I will make a release if everything is fine.

bakwc added a commit that referenced this issue Apr 25, 2021
@CyberDem0n
Contributor Author

It seems to help, thank you @bakwc!

@bakwc
Owner

bakwc commented Apr 25, 2021

Uploaded a new release, 0.3.8.

@bakwc bakwc closed this as completed Apr 25, 2021