Different topology view among cluster nodes #133

Closed
CyberDem0n opened this issue Oct 27, 2020 · 18 comments

@CyberDem0n
Contributor

The fullDumpFile contains information about the cluster topology, and more importantly, this information takes precedence over the list of otherNodes passed to the SyncObj class. That sort of makes sense; etcd, for example, does the same.

But the full dump is generated only occasionally, and it is also possible to configure nodes to generate it at different times (logCompactionSplit).
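
For reference, a minimal sketch of such a setup (hedged: journalFile, fullDumpFile, dynamicMembershipChange and logCompactionSplit are SyncObjConf options, while the paths and the Counter class are purely illustrative):

from pysyncobj import SyncObj, SyncObjConf, replicated

# Minimal sketch, not taken from the cluster described below:
# a node with a file-based journal and a periodic full dump.
conf = SyncObjConf(
    journalFile='/var/lib/myapp/node2.journal',  # file-based Raft journal
    fullDumpFile='/var/lib/myapp/node2.dump',    # full dump, includes the topology
    dynamicMembershipChange=True,                # allow adding/removing nodes at runtime
    logCompactionSplit=True,                     # compact at different times on different nodes
)

class Counter(SyncObj):
    def __init__(self, self_addr, other_addrs):
        super(Counter, self).__init__(self_addr, other_addrs, conf=conf)
        self.__value = 0

    @replicated
    def incr(self):
        self.__value += 1
        return self.__value

# On restart, the topology stored in the full dump takes precedence over
# other_addrs passed here, which is exactly what this issue is about.
node2 = Counter('127.0.0.1:2223', ['127.0.0.1:2222', '127.0.0.1:2224'])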

If for some reason all (or some) nodes are restarted, we might end up in a very strange situation where nodes have different opinions about the cluster topology.
For example:

  1. Originally I had a cluster with node1 and node2. It was running for a while and log compaction happened.
  2. After that, node3 was added.
  3. Node2 was restarted, and after the restart (despite node3 being passed in otherNodes) it thinks there are only two nodes in the cluster.

The syncobj_admin -status output looks like:

  • node1
$ syncobj_admin -conn 127.0.0.1:2222 -status 
commit_idx: 525
enabled_code_version: 0
last_applied: 525
leader: 127.0.0.1:2222
leader_commit_idx: 525
log_len: 50
match_idx_count: 2
match_idx_server_127.0.0.1:2223: 525
match_idx_server_127.0.0.1:2224: 525
next_node_idx_count: 2
next_node_idx_server_127.0.0.1:2223: 526
next_node_idx_server_127.0.0.1:2224: 526
partner_node_status_server_127.0.0.1:2223: 2
partner_node_status_server_127.0.0.1:2224: 2
partner_nodes_count: 2
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2222
self_code_version: 0
state: 2
uptime: 1012
version: 0.3.7
  • node2
$ syncobj_admin -conn 127.0.0.1:2223 -status 
commit_idx: 540
enabled_code_version: 0
last_applied: 540
leader: 127.0.0.1:2222
leader_commit_idx: 540
log_len: 65
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_127.0.0.1:2222: 2
partner_nodes_count: 1
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2223
self_code_version: 0
state: 0
uptime: 842
version: 0.3.7

  • node3

$ syncobj_admin -conn 127.0.0.1:2224 -status 
commit_idx: 555
enabled_code_version: 0
last_applied: 555
leader: 127.0.0.1:2222
leader_commit_idx: 555
log_len: 132
match_idx_count: 0
next_node_idx_count: 0
partner_node_status_server_127.0.0.1:2222: 2
partner_node_status_server_127.0.0.1:2223: 0
partner_nodes_count: 2
raft_term: 1
readonly_nodes_count: 0
revision: deprecated
self: 127.0.0.1:2224
self_code_version: 0
state: 0
uptime: 974
version: 0.3.7

There are a few different options for how to solve it:

  1. Always prefer the otherNodes passed to the SyncObj. IMO the worst possible option, although if it is configurable it is probably fine.
  2. Force log compaction on every topology change (see the sketch after this list). This will mostly work, but there is still a chance that compaction doesn't finish before the restart.
  3. Restore the latest known topology by replaying the journal. For some reason this doesn't happen now. Could it be a bug?
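
A sketch of what option 2 could look like (hedged: addNodeToCluster() and forceLogCompaction() are existing SyncObj methods and require dynamicMembershipChange=True, but the wrapper below is hypothetical and does not close the race completely):

from pysyncobj import FAIL_REASON

def add_node_and_compact(sync_obj, new_node_addr):
    # Trigger a full dump right after a successful topology change so a
    # restarted node is more likely to see the latest membership. There is
    # still a window where the node restarts before the dump finishes.
    def on_done(result, error):
        if error == FAIL_REASON.SUCCESS:
            sync_obj.forceLogCompaction()  # schedules a full dump / log compaction

    sync_obj.addNodeToCluster(new_node_addr, callback=on_done)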
@bakwc
Owner

bakwc commented Oct 27, 2020

The third option was originally conceived. Seems like a bug; need to investigate. Are you using an in-memory journal or a file-based journal? This issue can occur with the in-memory journal (because it is dropped when the process restarts).

@CyberDem0n
Contributor Author

File-based journal.

Maybe the issue is that logs are not replayed until at least one other node is connected?

@bakwc
Owner

bakwc commented Oct 27, 2020

That's possible. All committed log entries (below __raftCommitIndex) should be replayed without waiting for other nodes.
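
Illustrative pseudocode of that intent (not the actual PySyncObj internals; all names are made up): on startup, every journal entry below the persisted commit index would be applied right away, without waiting for a leader or other nodes.

def replay_committed_entries(journal, commit_index, apply_entry):
    # journal: iterable of (command, index, term), ordered by index.
    # Entries below commit_index are known to be committed and can be
    # re-applied safely right after loading the journal from disk.
    for command, index, term in journal:
        if index >= commit_index:
            break                 # not known to be committed yet, wait for a leader
        apply_entry(command)      # committed entries are never rolled back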

@CyberDem0n
Contributor Author

Actually, it is not only waiting for other nodes: the log is applied only after the leader election, because it wants to figure out __raftCommitIndex on the majority of nodes: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L570-L579

But when the log is applied, membership commands are skipped anyway: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L709-L710

Any idea how to fix it?

@bakwc
Owner

bakwc commented Oct 27, 2020

Actually, it is not only waiting for other nodes: the log is applied only after the leader election, because it wants to figure out __raftCommitIndex on the majority of nodes: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L570-L579

The code you quoted is actually part of the Raft core:

  • It works only when a node becomes the leader.
  • It waits until the cluster majority confirms that they received a record; only then is the record considered "committed".

This does not contradict the fact that we can apply all committed records after a restart. The case above is about non-committed records; once a record is committed, it can never be rolled back.

But when the log is applied, membership commands are skipped anyway: https://github.com/bakwc/PySyncObj/blob/master/pysyncobj/syncobj.py#L709-L710

That was done to prevent new nodes (which were already started with a new configuration) from changing the cluster. It can be fixed by ignoring configuration change commands only when the cluster already has the corresponding node added / deleted.
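
A rough sketch of that idea (illustrative pseudocode, not the actual PySyncObj code): during replay, a membership command is skipped only when it would be a no-op, i.e. when the cluster already reflects the change.

def apply_membership_command(cluster_nodes, command):
    # command is e.g. ('add', '127.0.0.1:2224') or ('remove', '127.0.0.1:2223')
    op, node = command
    if op == 'add' and node not in cluster_nodes:
        cluster_nodes.add(node)
    elif op == 'remove' and node in cluster_nodes:
        cluster_nodes.discard(node)
    # otherwise the cluster already has this change applied: ignore it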

@CyberDem0n
Contributor Author

It would be nice to have it fixed.
Unfortunately I am not confident enough to dig that deep into SyncObj.

@bakwc
Owner

bakwc commented Oct 28, 2020

I'll fix it on the weekend.

@bakwc
Owner

bakwc commented Nov 2, 2020

Applying records after restart: #139

@bakwc bakwc added the bug label Nov 2, 2020
@CyberDem0n
Contributor Author

Sweet, thank you @bakwc!
I am gonna try it tomorrow.

@bakwc
Owner

bakwc commented Nov 2, 2020

I suppose it's not enough; we still need to apply configuration changes.

@CyberDem0n
Contributor Author

I know; at least I can check whether logs are applied and the latest state is visible.

@CyberDem0n
Contributor Author

Hi @bakwc,
Any further progress on it?
Maybe you can release a new version with what was already fixed?

@bakwc
Owner

bakwc commented Jan 15, 2021

Sorry, I've had little time; I'll try to fix it in the near future.

@CyberDem0n
Contributor Author

Ping.

@bakwc, if you don't have time to work on it right now, maybe you could at least release a new version with what has already been fixed since the last release?

@CyberDem0n
Contributor Author

Ping.

@bakwc

bakwc added a commit that referenced this issue Apr 25, 2021

@bakwc
Owner

bakwc commented Apr 25, 2021

Could you try the version with the latest fix?
It's merged to master; I will make a release if everything is fine.

bakwc added a commit that referenced this issue Apr 25, 2021
@CyberDem0n
Contributor Author

It seems to help, thank you @bakwc!

@bakwc
Owner

bakwc commented Apr 25, 2021

Uploaded a new release, 0.3.8.

@bakwc bakwc closed this as completed Apr 25, 2021