
test: flaky unit/swim.test test #5399

Closed
avtikhon opened this issue Oct 9, 2020 · 2 comments
Assignees: avtikhon
Labels: flaky test, qa (Issues related to tests or testing subsystem)

Comments

avtikhon (Contributor) commented Oct 9, 2020

Tarantool version:
Tarantool 2.6.0-142-g6a47a75e97
Target: Linux-x86_64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

OS version:
Linux (Debian *)

Bug description:
https://gitlab.com/tarantool/tarantool/-/jobs/771421662#L4457
https://gitlab.com/tarantool/tarantool/-/jobs/771888996#L4058
https://gitlab.com/tarantool/tarantool/-/jobs/822238802#L3007

artifacts.zip

The checksum of the results file is not usable, because the data in the results changes from run to run.
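A minimal Python sketch of why a checksum cannot pin down this failure. It assumes, purely for illustration, that a known failure is recognized by comparing an MD5 digest of the whole result output; the `result_checksum` helper and the two sample outputs are hypothetical and are not taken from test-run. Since the failure diagnostics can land at a different offset in the output on each run (compare the split `swim_tes` / `t_packet_loss` line in the log below), the same failure produces a different digest.

```python
import hashlib


def result_checksum(text: str) -> str:
    """Return the MD5 hex digest of a test's result output."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical outputs of the SAME failure from two runs. Only the position
# at which the stderr diagnostic lands inside the stdout stream differs.
run_a = (
    "\t*** swim_tes    #   Failed test 'S1 is still alive everywhere'\n"
    "t_packet_loss ***\n"
)
run_b = (
    "\t*** swim_test_packet_loss ***\n"
    "    #   Failed test 'S1 is still alive everywhere'\n"
)

# Digest recorded when the failure was first seen.
known = {result_checksum(run_a)}

# The repeated failure is not recognized: its digest differs.
print(result_checksum(run_b) in known)  # prints: False
```

In other words, a digest only identifies a failure whose output is byte-for-byte stable, which this test's output is not.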

[081] --- unit/swim.result	Fri Oct  2 12:54:07 2020
[081] +++ unit/swim.reject	Sun Oct  4 02:51:30 2020
[081] @@ -115,7 +115,12 @@
[081]      ok 2 - but it is never deleted due to the cfg option
[081]  ok 11 - subtests
[081]  	*** swim_test_undead: done ***
[081] -	*** swim_test_packet_loss ***
[081] +	*** swim_tes    #   Failed test 'S1 is still alive everywhere'
[081] +    #   in /builds/M4RrgQZ3/0/tarantool/tarantool/test/unit/swim.c at line 753
[081] +    # Looks like you failed 1 test of 2 run.
[081] +#   Failed test 'subtests'
[081] +#   in /builds/M4RrgQZ3/0/tarantool/tarantool/test/unit/unit.c at line 54
[081] +t_packet_loss ***
[081]      1..5
[081]      ok 1 - drop rate = 5.00, but the failure is disseminated
[081]      ok 2 - drop rate = 10.00, but the failure is disseminated
[081] @@ -171,9 +176,9 @@
[081]  	*** swim_test_payload_basic: done ***
[081]  	*** swim_test_indirect_ping ***
[081]      1..2
[081] -    ok 1 - S1 is still alive everywhere
[081] +    not ok 1 - S1 is still alive everywhere
[081]      ok 2 - as well as S2 - they communicated via S3
[081] -ok 17 - subtests
[081] +not ok 17 - subtests
[081]  	*** swim_test_indirect_ping: done ***
[081]  	*** swim_test_encryption ***
[081]      1..3
[081] @@ -221,7 +226,8 @@
[081]  	*** swim_test_generation: done ***
[081]  	*** swim_test_dissemination_speed ***
[081]      1..2
[081] -    ok 1 - dissemination work in log time even at the very start of a cluster
[081] +    # Looks like you failed 1 test of 23 run.
[081] +ok 1 - dissemination work in log time even at the very start of a cluster
[081]      ok 2 - dissemination can withstand an event storm
[081]  ok 22 - subtests
[081]  	*** swim_test_dissemination_speed: done ***
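Because the stderr diagnostics are interleaved with the TAP stream, the diff above is hard to read by eye. The Python sketch below pulls out only the `not ok` points that the reject file adds. The `failed_checks` helper is hypothetical (it is not part of test-run), and it assumes every log line carries a bracketed prefix such as `[081] ` followed by a unified-diff marker. Run on an excerpt of the hunk above, it shows that the failing check is point 1 of `swim_test_indirect_ping` and its enclosing subtest 17.

```python
import re

# An added ("+") diff line that reports a failed TAP point,
# e.g. "+    not ok 1 - S1 is still alive everywhere".
FAILED_TAP = re.compile(r"^\+\s*not ok (\d+) - (.*)$")
# The bracketed prefix seen on every log line, e.g. "[081] ".
LOG_PREFIX = re.compile(r"^\[\d+\] ")


def failed_checks(diff_lines):
    """Yield (number, description) for each TAP failure the diff introduces."""
    for line in diff_lines:
        line = LOG_PREFIX.sub("", line)
        match = FAILED_TAP.match(line)
        if match:
            yield int(match.group(1)), match.group(2)


excerpt = """\
[081] -    ok 1 - S1 is still alive everywhere
[081] +    not ok 1 - S1 is still alive everywhere
[081]      ok 2 - as well as S2 - they communicated via S3
[081] -ok 17 - subtests
[081] +not ok 17 - subtests
""".splitlines()

for num, desc in failed_checks(excerpt):
    print(num, desc)
# prints:
# 1 S1 is still alive everywhere
# 17 subtests
```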

Steps to reproduce:

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
avtikhon added the qa and flaky test labels Oct 9, 2020
avtikhon self-assigned this Oct 9, 2020
avtikhon added a commit that referenced this issue Oct 9, 2020
Added for tests with issues:

  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/huge_field_map_long.test.lua		gh-5375
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 9, 2020
Added for tests with issues:

  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 9, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 9, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 10, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 10, 2020
avtikhon added a commit that referenced this issue Oct 10, 2020
avtikhon added a commit that referenced this issue Oct 11, 2020
avtikhon added a commit that referenced this issue Oct 11, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 11, 2020
avtikhon added a commit that referenced this issue Oct 11, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 11, 2020
Added for tests with issues:

  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/autoboostrap.test.lua		gh-4933
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398

avtikhon added a commit that referenced this issue Oct 11, 2020
avtikhon added a commit that referenced this issue Oct 11, 2020
avtikhon added a commit that referenced this issue Oct 12, 2020
avtikhon added a commit that referenced this issue Oct 12, 2020
Added for tests with issues:

  box/access.test.lua				gh-5411
  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/autoboostrap.test.lua		gh-4933
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398

avtikhon added a commit that referenced this issue Oct 12, 2020
Added for tests with issues:

  app/socket.test.lua				gh-4978
  box/access.test.lua				gh-5411
  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/autoboostrap.test.lua		gh-4933
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5287-boot-anon.test.lua	gh-5412
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
avtikhon added a commit that referenced this issue Oct 12, 2020
avtikhon added a commit that referenced this issue Oct 12, 2020
avtikhon added a commit that referenced this issue Oct 13, 2020
avtikhon added a commit that referenced this issue Oct 13, 2020
avtikhon added a commit that referenced this issue Oct 13, 2020
kyukhin pushed a commit that referenced this issue Oct 13, 2020
kyukhin pushed a commit that referenced this issue Oct 13, 2020
kyukhin pushed a commit that referenced this issue Oct 13, 2020
Added for tests with issues:

  app/socket.test.lua				gh-4978
  box/access.test.lua				gh-5411
  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/autoboostrap.test.lua		gh-4933
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5287-boot-anon.test.lua	gh-5412
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
kyukhin pushed a commit that referenced this issue Oct 13, 2020
Added for tests with issues:

  app/socket.test.lua				gh-4978
  box/access.test.lua				gh-5411
  box/access_misc.test.lua			gh-5401
  box/gh-5135-invalid-upsert.test.lua		gh-5376
  box/hash_64bit_replace.test.lua test		gh-5410
  box/hash_replace.test.lua			gh-5400
  box/huge_field_map_long.test.lua		gh-5375
  box/net.box_huge_data_gh-983.test.lua		gh-5402
  replication/anon.test.lua			gh-5381
  replication/autoboostrap.test.lua		gh-4933
  replication/box_set_replication_stress.test.lua gh-4992
  replication/election_basic.test.lua		gh-5368
  replication/election_qsync.test.lua test	gh-5395
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua gh-5380
  replication/gh-3711-misc-no-restart-on-same-configuration.test.lua gh-5407
  replication/gh-5287-boot-anon.test.lua	gh-5412
  replication/gh-5298-qsync-recovery-snap.test.lua.test.lua gh-5379
  replication/show_error_on_disconnect.test.lua	gh-5371
  replication/status.test.lua			gh-5409
  swim/swim.test.lua				gh-5403
  unit/swim.test				gh-5399
  vinyl/gc.test.lua				gh-5383
  vinyl/gh-4864-stmt-alloc-fail-compact.test.lua test gh-5408
  vinyl/gh-4957-too-many-upserts.test.lua	gh-5378
  vinyl/gh.test.lua				gh-5141
  vinyl/quota.test.lua				gh-5377
  vinyl/snapshot.test.lua			gh-4984
  vinyl/stat.test.lua				gh-4951
  vinyl/upsert.test.lua				gh-5398
@Gerold103
Collaborator

I still can't reproduce it. But the output looks corrupted. For example, the line "*** swim_tes # Failed test 'S1 is still alive everywhere'" is impossible: "# Failed" is always printed on a new line. Also, "swim_tes" is truncated. Something is wrong with test-run here.

@Gerold103
Collaborator

Random seed that reproduces the failure 100% of the time: 1605651752. Put it into swim_run_test().
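A fixed seed replays a randomized run because, assuming the test draws its randomness from the C library's rand(), the whole sequence is determined by the value passed to srand(). The sketch below shows one way a harness could log its seed and accept an override. pick_test_seed(), seed_test_run() and check_replay() are hypothetical helpers written for illustration; they are not the actual body of swim_run_test().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: reuse a seed copied from a failed run's log if
 * one is given, otherwise derive a fresh seed from the current time.
 */
unsigned
pick_test_seed(const char *override, time_t now)
{
	if (override != NULL && *override != '\0')
		return (unsigned) strtoul(override, NULL, 10);
	return (unsigned) now;
}

/* Seed rand() and log the seed so that any failure can be replayed. */
unsigned
seed_test_run(const char *override)
{
	unsigned seed = pick_test_seed(override, time(NULL));
	srand(seed);
	fprintf(stderr, "random seed is %u\n", seed);
	return seed;
}

/* Re-seeding with the same value must replay the same sequence. */
void
check_replay(unsigned seed)
{
	srand(seed);
	int first = rand();
	int second = rand();
	srand(seed);
	assert(rand() == first);
	assert(rand() == second);
}
```

Calling seed_test_run(NULL) picks a new seed on every run; passing a logged value such as "1605651752" replays that run, provided nothing else in the test (for example, real wall-clock time) adds nondeterminism.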

Gerold103 self-assigned this Nov 17, 2020
Gerold103 added a commit that referenced this issue Nov 18, 2020
swim_test_indirect_ping() failed with random seed 1605651752.

The test created a cluster of 3 SWIM nodes and broke the network
connection between node-1 and node-2. Then it ran the cluster for
10 seconds and checked that both node-1 and node-2 are eventually
alive, even though they are sometimes suspected.

    node1 <-> node3 <-> node2

'Alive' means that a node is considered alive on all the other
nodes.
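The "alive" condition can be expressed as a check over every node's view of the cluster. This is a minimal sketch with a hypothetical, simplified status enum and view matrix (not Tarantool's SWIM member API); indices 0, 1 and 2 stand for node-1, node-2 and node-3.

```c
#include <assert.h>
#include <stdbool.h>

#define NODE_COUNT 3

/* Hypothetical, simplified view of one member as seen by another. */
enum view_status {
	VIEW_ALIVE,
	VIEW_SUSPECTED,
	VIEW_DEAD,
};

/*
 * view[observer][subject] is how `observer` currently sees `subject`.
 * A subject counts as alive only if every other node sees it as
 * alive; a node's view of itself is ignored.
 */
bool
is_alive_everywhere(enum view_status view[NODE_COUNT][NODE_COUNT],
		    int subject)
{
	assert(subject >= 0 && subject < NODE_COUNT);
	for (int observer = 0; observer < NODE_COUNT; ++observer) {
		if (observer == subject)
			continue;
		if (view[observer][subject] != VIEW_ALIVE)
			return false;
	}
	return true;
}
```

A single suspicion anywhere, for example node-2 suspecting node-1, is enough to make node-1 not alive in this sense.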

The test spun for 10 seconds, giving the nodes a chance to become
suspected. Then it checked that node-1 is either still alive, or is
suspected but will be restored within 3 seconds. The same was
checked for node-2. The two nodes were supposed to interact via
node-3.

The 3-second bound assumed that the worst case is node-1 being
suspected on node-3 from the very beginning of the three-second
interval, because node-2 suspected it and disseminated the
suspicion to node-3.

Then node-3 might need 1 second to finish its current
dissemination round by sending a ping to node-2, 1 more second to
start a new round that randomly begins with node-2 again, and a
third second to finally ping node-1. So 3 seconds in total.
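The 3-second figure is a worst case over the order in which node-3 pings its peers. A small brute-force model makes the arithmetic explicit, assuming one ping per 1-second heartbeat and a freshly shuffled round-robin order of peers in every round, as the commit message describes. ticks_until_ping() and worst_case_ticks() are hypothetical names, not SWIM functions.

```c
#include <assert.h>

/*
 * Heartbeats until a node pings one particular peer, given:
 *   members     - number of peers visited per round;
 *   target_now  - the target's position in the current round;
 *   last_sent   - index of the last ping already sent this round;
 *   target_next - the target's position in the next (reshuffled) round.
 */
int
ticks_until_ping(int members, int target_now, int last_sent,
		 int target_next)
{
	assert(members >= 1);
	if (target_now > last_sent)
		return target_now - last_sent;	/* Still ahead this round. */
	/* Finish the current round, then wait for the target in the next. */
	return (members - 1 - last_sent) + (target_next + 1);
}

/* Maximum over every possible position combination. */
int
worst_case_ticks(int members)
{
	int worst = 0;
	for (int now = 0; now < members; ++now) {
		for (int sent = 0; sent < members; ++sent) {
			for (int next = 0; next < members; ++next) {
				int t = ticks_until_ping(members, now, sent,
							 next);
				if (t > worst)
					worst = t;
			}
		}
	}
	return worst;
}
```

For node-3 with two peers, worst_case_ticks(2) is 3, which matches the 3 seconds estimated in the commit message. One more heartbeat for node-2 to pass its suspicion to node-3 gives the 4 seconds discussed in the next paragraph.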

But it could also happen that at the beginning of the
three-second interval node-1 is already suspected on node-2. On
the next step, node-2 shares the suspicion with node-3, and then
the scenario above happens. So the test case needed at least 4
seconds.

In fact, this could repeat indefinitely: while the test waits up
to 3 seconds for node-3 to refute the gossip about node-1, node-2
can suspect node-1 again, and so on.

Also, the test would pass even without indirect pings, because
node-3 can reach both node-1 and node-2. Even if, say, node-1
suspects node-2, it will tell node-3 about it. Node-3 will ping
node-2, get an ack, and refute the gossip. The refutation will then
be sent back to node-1. So indirect pings don't matter here.

The patch adds a new test which won't pass without indirect
pings. It uses the existing error injection ERRINJ_SWIM_FD_ONLY,
which turns off all the SWIM components except failure detection,
so only pings and acks are sent.

Without working indirect pings, node-1 and node-2 would then
suspect each other and eventually declare each other dead. The new
test checks that this does not happen.
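The effect of indirect pings in a pings-and-acks-only mode can be modeled with a reachability matrix. This sketch assumes that a node whose direct ping goes unanswered asks every other member to ping the target on its behalf and relay the ack (real SWIM picks a few random members), and it collapses the suspected and dead states into a single counter of consecutive misses. net_matrix, ping_acked(), stays_alive() and max_misses are hypothetical and only loosely mirror what ERRINJ_SWIM_FD_ONLY leaves running.

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 3

/* net[a][b]: can a packet travel from node a to node b? */
typedef bool net_matrix[NODES][NODES];

/*
 * Simplified failure detection: try a direct ping first; if it gets
 * no ack and indirect pings are enabled, ask each other node to ping
 * `to` on our behalf and relay the ack back.
 */
bool
ping_acked(net_matrix net, int from, int to, bool indirect)
{
	assert(from != to);
	if (net[from][to] && net[to][from])
		return true;
	if (!indirect)
		return false;
	for (int proxy = 0; proxy < NODES; ++proxy) {
		if (proxy == from || proxy == to)
			continue;
		/* Request to proxy, proxy's ping and ack, relayed ack. */
		if (net[from][proxy] && net[proxy][to] &&
		    net[to][proxy] && net[proxy][from])
			return true;
	}
	return false;
}

/*
 * Run `rounds` pings from `from` to `to`. Return false as soon as
 * `max_misses` consecutive pings go unacked (the target is declared
 * dead), true if the target survives every round.
 */
bool
stays_alive(net_matrix net, int from, int to, bool indirect,
	    int rounds, int max_misses)
{
	int misses = 0;
	for (int i = 0; i < rounds; ++i) {
		if (ping_acked(net, from, to, indirect))
			misses = 0;
		else if (++misses >= max_misses)
			return false;
	}
	return true;
}
```

With the topology from the test (indices 0, 1 and 2 for node-1, node-2 and node-3, and the node-1 <-> node-2 link broken in both directions), node-1 keeps node-2 alive only when indirect pings are enabled, which is the property the new test is meant to check.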

Closes #5399
Gerold103 added a commit that referenced this issue Nov 20, 2020
Closes #5399

(cherry picked from commit e23b14d)