Skip to content

test: flaky vinyl/gh-3395-read-prepared-uncommitted.test.lua test #202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
avtikhon opened this issue Jul 22, 2020 · 1 comment · Fixed by tarantool/tarantool#11150
Closed

Comments

@avtikhon
Copy link
Contributor

avtikhon commented Jul 22, 2020

Tarantool version:
Tarantool 2.6.0-7-g5a856023e8
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -fprofile-arcs -ftest-coverage -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -fprofile-arcs -ftest-coverage -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

OS version:
Linux (Debian 9)

Bug description:
1.
https://gitlab.com/tarantool/tarantool/-/jobs/647911034#L4541
https://gitlab.com/tarantool/tarantool/-/jobs/645672722#L4467
https://gitlab.com/tarantool/tarantool/-/jobs/759156959#L4849

results file checksum: 82156b1f64522ca82685c56e4803a3f7

[050] --- vinyl/gh-3395-read-prepared-uncommitted.result	Fri Jul 17 18:02:18 2020
[050] +++ vinyl/gh-3395-read-prepared-uncommitted.reject	Tue Jul 21 03:35:29 2020
[050] @@ -215,7 +215,8 @@
[050]  --
[050]  gen(param, state)
[050]   | ---
[050] - | - error: The read view is aborted
[050] + | - <iterator state>
[050] + | - [4, 20]
[050]   | ...
[050]  
[050]  fiber.sleep(0.1)
[050] @@ -224,6 +225,7 @@
[050]  sk:select{20}
[050]   | ---
[050]   | - - [2, 20]
[050] + |   - [3, 20]
[050]   |   - [4, 20]
[050]   | ...
[050]  
[050] 
[050] Last 15 lines of Tarantool Log file [Instance "vinyl"][/builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl.log]:
[050] 2020-07-21 03:35:27.446 [1349] main/108/vinyl.scheduler I> 515/1: dump completed
[050] 2020-07-21 03:35:27.446 [1349] main/108/vinyl.scheduler I> 515/0: dump started
[050] 2020-07-21 03:35:27.451 [1349] vinyl.dump.0/105/task I> writing `/builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl/515/0/00000000000000000064.run'
[050] 2020-07-21 03:35:27.704 [1349] vinyl.dump.0/105/task I> writing `/builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl/515/0/00000000000000000064.index'
[050] 2020-07-21 03:35:28.145 [1349] main/108/vinyl.scheduler I> 515/0: dump completed
[050] 2020-07-21 03:35:28.145 [1349] main/108/vinyl.scheduler I> dumped 196847 bytes in 1.4 s, rate 0.1 MB/s
[050] 2020-07-21 03:35:28.145 [1349] main/124/console/unix/: I> vinyl checkpoint completed
[050] 2020-07-21 03:35:28.962 [1349] main I> removed /builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl/00000000000000000000.snap
[050] 2020-07-21 03:35:28.966 [1349] wal I> removed /builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl/00000000000000000582.xlog
[050] 2020-07-21 03:35:28.993 [1349] wal/101/main xlog.c:1216 E> ER_INJECTION: Error injection 'xlog write injection'
[050] 2020-07-21 03:35:28.993 [1349] main/133/lua txn.c:876 E> ER_WAL_IO: Failed to write to disk
[050] 2020-07-21 03:35:28.993 [1349] main/133/lua txn.c:876 E> ER_WAL_IO: Failed to write to disk
[050] 2020-07-21 03:35:28.993 [1349] main/109/vinyl.vylog_flusher xlog.c:1216 E> ER_INJECTION: Error injection 'xlog write injection'
[050] 2020-07-21 03:35:28.993 [1349] main/109/vinyl.vylog_flusher vy_log.c:913 E> failed to flush vylog
[050] 2020-07-21 03:35:28.995 [1349] wal I> removed /builds/WHBm2_Sk/0/tarantool/tarantool/test/var/050_vinyl/vinyl/00000000000000000000.xlog
  1. https://gitlab.com/tarantool/tarantool/-/jobs/758902931#L4570

artifacts.zip

results file checksum: 6ab639ce38b94231c6f0be9a8380d2ff

[087] --- vinyl/gh-3395-read-prepared-uncommitted.result	Thu Sep 24 23:25:31 2020
[087] +++ vinyl/gh-3395-read-prepared-uncommitted.reject	Sat Sep 26 11:47:18 2020
[087] @@ -117,8 +117,7 @@
[087]  --
[087]  read_prepared_with_delay(true)
[087]   | ---
[087] - | - - [1, 2]
[087] - |   - [2, 2]
[087] + | - - [2, 2]
[087]   |   - [3, 2]
[087]   | ...
[087]  
[087] @@ -204,7 +203,7 @@
[087]  --
[087]  read_prepared_with_delay(true)
[087]   | ---
[087] - | - [3, 20]
[087] + | - [4, 20]
[087]   | ...
[087]  -- Give WAL thread time to catch up.
[087]  --
[087] 
[087] Last 15 lines of Tarantool Log file [Instance "vinyl"][/builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl.log]:
[087] 2020-09-26 11:47:17.765 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000128.index
[087] 2020-09-26 11:47:17.765 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000128.run
[087] 2020-09-26 11:47:17.776 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000124.index
[087] 2020-09-26 11:47:17.776 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000124.run
[087] 2020-09-26 11:47:17.777 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000126.index
[087] 2020-09-26 11:47:17.778 [29939] main/105/gc I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/525/0/00000000000000000126.run
[087] 2020-09-26 11:47:17.778 [29939] wal I> removed /builds/4BUsapPU/0/tarantool/tarantool/test/var/087_vinyl/vinyl/00000000000000000522.xlog
[087] 2020-09-26 11:47:17.783 [29939] wal/101/main xlog.c:1232 E> ER_INJECTION: Error injection 'xlog write injection'
[087] 2020-09-26 11:47:17.783 [29939] main/109/vinyl.vylog_flusher xlog.c:1232 E> ER_INJECTION: Error injection 'xlog write injection'
[087] 2020-09-26 11:47:17.783 [29939] main/109/vinyl.vylog_flusher vy_log.c:913 E> failed to flush vylog
[087] 2020-09-26 11:47:17.783 [29939] main/249/lua txn.c:688 E> ER_WAL_IO: Failed to write to disk
[087] 2020-09-26 11:47:17.783 [29939] main/249/lua txn.c:688 E> ER_WAL_IO: Failed to write to disk
[087] 2020-09-26 11:47:17.960 [29939] wal/101/main xlog.c:1232 E> ER_INJECTION: Error injection 'xlog write injection'
[087] 2020-09-26 11:47:17.986 [29939] main/251/lua txn.c:688 E> ER_WAL_IO: Failed to write to disk
[087] 2020-09-26 11:47:17.986 [29939] main/251/lua txn.c:688 E> ER_WAL_IO: Failed to write to disk

Steps to reproduce:

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
@avtikhon avtikhon self-assigned this Jul 22, 2020
avtikhon referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/http_client.test.lua			gh-5346
  box/lua.test.lua				gh-5351
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/qsync_advanced.test.lua		gh-5340
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  xlog/panic_on_wal_error.test.lua		gh-5348
avtikhon referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/hash_collation.test.lua		gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua			gh-5247
  box/update.test.lua			gh-5247
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348
avtikhon referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348
kyukhin referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348
kyukhin referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348

(cherry picked from commit 75ba744)
kyukhin referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348

(cherry picked from commit 75ba744)
kyukhin referenced this issue in tarantool/tarantool Sep 28, 2020
Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348

(cherry picked from commit 75ba744)
@avtikhon
Copy link
Contributor Author

avtikhon commented Jun 9, 2021

Not reproduced on MCS with SSD neither in Github Actions.

avtikhon referenced this issue in tarantool/tarantool Jun 16, 2021
Found that the root cause of the issues happened with vinyl tests were
backside effects of the not correct test 'vinyl/gh.test.lua' which
leaved Tarantool worker process in inconsistent state. After it any
other next test on the same Tarantool worker process could fail on
running testings with snapshots calls, like tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Either restarting Tarantool worker process could fail on stopping it,
like tarantool/test-run#261 and #5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to remove all vinyl tests from 'fragile' list except test
'gh.test.lua', which should be improved before, to be able to run it
with the other tests. And 'gh-5141-invalid-vylog-file.test.lua' test
which checks this issue and can be removed after the fix will be done.

Part of tarantool/tarantool-qa#97
Closes #4168
Closes #4309
Closes #4346
Closes #4572
Closes #4979
Closes #4984
Closes #4985
Closes #4993
Closes #5141
Closes #5197
Closes #5336
Closes #5338
Closes #5356
Closes #5377
Closes #5378
Closes #5383
Closes #5408
Closes #5539
Closes #5584
Closes #5586
avtikhon referenced this issue in tarantool/tarantool Jun 22, 2021
  #4309: tx_gap_lock.test.lua
  #4168: throttle.test.lua
    Perf test not reproduced on fast hosts.
  #4993: errinj_ddl.test.lua
  #5338: ddl.test.lua
  #5197: gh-3395-read-prepared-uncommitted.test.lua
  #4985: replica_rejoin.test.lua
  #5539: errinj_tx.test.lua

Closes #4309
Closes #4168
Closes #4993
Closes #5338
Closes #5197
Closes #4985
Closes #5539
avtikhon referenced this issue in tarantool/tarantool Jun 24, 2021
  #4309: tx_gap_lock.test.lua
  #4168: throttle.test.lua
    Perf test not reproduced on fast hosts.
  #4993: errinj_ddl.test.lua
  #5338: ddl.test.lua
  #5197: gh-3395-read-prepared-uncommitted.test.lua
  #4985: replica_rejoin.test.lua
  #5539: errinj_tx.test.lua

Closes #4309
Closes #4168
Closes #4993
Closes #5338
Closes #5197
Closes #4985
Closes #5539
@ylobankov ylobankov transferred this issue from tarantool/tarantool Apr 15, 2022
locker added a commit to locker/tarantool that referenced this issue Feb 14, 2025
….lua

The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/tarantoolgh-3395-read-prepared-uncommitted.test.l>               [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/tarantoolgh-3395-read-prepared-uncommitted.result	Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/tarantoolgh-3395-read-prepared-uncommitted.reject	Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY
and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay.
Usually, it should be enough for the WAL rollback to complete but it
looks like on aarch64 it isn't. As a result, the reader returns
a prepared tuple before it's rolled back. Instead of sleeping, let's
join the writer fiber to make sure the rollback is complete before
the reader resumes.

The patch isn't as simple as that because of the way the flag
is_tx_faster_than_wal is handled in read_prepared_with_delay.
It emulates the situation when the reader actually manages to read
a prepared tuple because of a slow WAL with another error injection
ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from
returning to the TX thread after receiving a WAL error, so we can't
just join it. Actually, using this error injection seems pointless.
Instead we could clear ERRINJ_WAL_DELAY after finishing the read,
and this is what this patch does.

A couple more notes:
 - Drop the channels used for starting a read fiber and getting the read
   result. We can get the result with fiber.join() while stalling the
   fiber at startup seems pointless - a yield should be enough.
 - Set the defer_deletes flag on the test space to make sure the writer
   doesn't yield on disk read, otherwise the test may hang. It didn't
   hang already, because the disk read was luckily reflected by the
   bloom filter. BTW deferred DELETEs were enabled for all spaces when
   the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing
@locker locker assigned locker and unassigned avtikhon Feb 14, 2025
locker added a commit to tarantool/tarantool that referenced this issue Feb 17, 2025
The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/gh-3395-read-prepared-uncommitted.test.l>               [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/gh-3395-read-prepared-uncommitted.result	Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/gh-3395-read-prepared-uncommitted.reject	Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY
and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay.
Usually, it should be enough for the WAL rollback to complete but it
looks like on aarch64 it isn't. As a result, the reader returns
a prepared tuple before it's rolled back. Instead of sleeping, let's
join the writer fiber to make sure the rollback is complete before
the reader resumes.

The patch isn't as simple as that because of the way the flag
is_tx_faster_than_wal is handled in read_prepared_with_delay.
It emulates the situation when the reader actually manages to read
a prepared tuple because of a slow WAL with another error injection
ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from
returning to the TX thread after receiving a WAL error, so we can't
just join it. Actually, using this error injection seems pointless.
Instead we could clear ERRINJ_WAL_DELAY after finishing the read,
and this is what this patch does.

A couple more notes:
 - Drop the channels used for starting a read fiber and getting the read
   result. We can get the result with fiber.join() while stalling the
   fiber at startup seems pointless - a yield should be enough.
 - Set the defer_deletes flag on the test space to make sure the writer
   doesn't yield on disk read, otherwise the test may hang. It didn't
   hang already, because the disk read was luckily reflected by the
   bloom filter. BTW deferred DELETEs were enabled for all spaces when
   the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing

(cherry picked from commit b801814)
locker added a commit to tarantool/tarantool that referenced this issue Feb 17, 2025
The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/gh-3395-read-prepared-uncommitted.test.l>               [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/gh-3395-read-prepared-uncommitted.result	Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/gh-3395-read-prepared-uncommitted.reject	Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY
and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay.
Usually, it should be enough for the WAL rollback to complete but it
looks like on aarch64 it isn't. As a result, the reader returns
a prepared tuple before it's rolled back. Instead of sleeping, let's
join the writer fiber to make sure the rollback is complete before
the reader resumes.

The patch isn't as simple as that because of the way the flag
is_tx_faster_than_wal is handled in read_prepared_with_delay.
It emulates the situation when the reader actually manages to read
a prepared tuple because of a slow WAL with another error injection
ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from
returning to the TX thread after receiving a WAL error, so we can't
just join it. Actually, using this error injection seems pointless.
Instead we could clear ERRINJ_WAL_DELAY after finishing the read,
and this is what this patch does.

A couple more notes:
 - Drop the channels used for starting a read fiber and getting the read
   result. We can get the result with fiber.join() while stalling the
   fiber at startup seems pointless - a yield should be enough.
 - Set the defer_deletes flag on the test space to make sure the writer
   doesn't yield on disk read, otherwise the test may hang. It didn't
   hang already, because the disk read was luckily reflected by the
   bloom filter. BTW deferred DELETEs were enabled for all spaces when
   the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing

(cherry picked from commit b801814)
locker added a commit to tarantool/tarantool that referenced this issue Feb 17, 2025
The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/gh-3395-read-prepared-uncommitted.test.l>               [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/gh-3395-read-prepared-uncommitted.result	Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/gh-3395-read-prepared-uncommitted.reject	Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY
and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay.
Usually, it should be enough for the WAL rollback to complete but it
looks like on aarch64 it isn't. As a result, the reader returns
a prepared tuple before it's rolled back. Instead of sleeping, let's
join the writer fiber to make sure the rollback is complete before
the reader resumes.

The patch isn't as simple as that because of the way the flag
is_tx_faster_than_wal is handled in read_prepared_with_delay.
It emulates the situation when the reader actually manages to read
a prepared tuple because of a slow WAL with another error injection
ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from
returning to the TX thread after receiving a WAL error, so we can't
just join it. Actually, using this error injection seems pointless.
Instead we could clear ERRINJ_WAL_DELAY after finishing the read,
and this is what this patch does.

A couple more notes:
 - Drop the channels used for starting a read fiber and getting the read
   result. We can get the result with fiber.join() while stalling the
   fiber at startup seems pointless - a yield should be enough.
 - Set the defer_deletes flag on the test space to make sure the writer
   doesn't yield on disk read, otherwise the test may hang. It didn't
   hang already, because the disk read was luckily reflected by the
   bloom filter. BTW deferred DELETEs were enabled for all spaces when
   the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing

(cherry picked from commit b801814)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants