test: flaky vinyl/gh-3395-read-prepared-uncommitted.test.lua test #202
avtikhon
referenced
this issue
in tarantool/tarantool
Sep 28, 2020
Added for tests with issues: app/fiber.test.lua gh-5341 app-tap/http_client.test.lua gh-5346 box/lua.test.lua gh-5351 replication/autobootstrap.test.lua gh-4533 replication/autobootstrap_guest.test.lua gh-4533 replication/ddl.test.lua gh-5337 replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940 replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343 replication/long_row_timeout.test.lua gh-4351 replication/on_replace.test.lua gh-5344, gh-5349 replication/qsync_advanced.test.lua gh-5340 replication/replicaset_ro_mostly.test.lua gh-5342 replication/wal_rw_stress.test.lua gh-5347 sql-tap/selectG.test.lua gh-5350 vinyl/ddl.test.lua gh-5338 vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197 vinyl/iterator.test.lua gh-5336 xlog/panic_on_wal_error.test.lua gh-5348
avtikhon
referenced
this issue
in tarantool/tarantool
Sep 28, 2020
Added for tests with issues: app/fiber.test.lua gh-5341 app-tap/debug.test.lua gh-5346 app-tap/http_client.test.lua gh-5346 app-tap/inspector.test.lua gh-5346 box/hash_collation.test.lua gh-5247 box/lua.test.lua gh-5351 box/net.box_on_schema_reload-gh-1904.test.lua gh-5354 box/protocol.test.lua gh-5247 box/update.test.lua gh-5247 replication/autobootstrap.test.lua gh-4533 replication/autobootstrap_guest.test.lua gh-4533 replication/ddl.test.lua gh-5337 replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940 replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357 replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343 replication/long_row_timeout.test.lua gh-4351 replication/on_replace.test.lua gh-5344, gh-5349 replication/qsync_advanced.test.lua gh-5340 replication/qsync_basic.test.lua gh-5355 replication/replicaset_ro_mostly.test.lua gh-5342 replication/wal_rw_stress.test.lua gh-5347 sql-tap/selectG.test.lua gh-5350 vinyl/ddl.test.lua gh-5338 vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197 vinyl/iterator.test.lua gh-5336 vinyl/write_iterator_rand.test.lua gh-5356 xlog/panic_on_wal_error.test.lua gh-5348
avtikhon
referenced
this issue
in tarantool/tarantool
Sep 28, 2020
Added for tests with issues: app/fiber.test.lua gh-5341 app-tap/debug.test.lua gh-5346 app-tap/http_client.test.lua gh-5346 app-tap/inspector.test.lua gh-5346 box/gh-2763-session-credentials-update.test.lua gh-5363 box/hash_collation.test.lua gh-5247 box/lua.test.lua gh-5351 box/net.box_connect_triggers_gh-2858.test.lua gh-5247 box/net.box_incompatible_index-gh-1729.test.lua gh-5360 box/net.box_on_schema_reload-gh-1904.test.lua gh-5354 box/protocol.test.lua gh-5247 box/update.test.lua gh-5247 box-tap/net.box.test.lua gh-5346 replication/autobootstrap.test.lua gh-4533 replication/autobootstrap_guest.test.lua gh-4533 replication/ddl.test.lua gh-5337 replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940 replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357 replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343 replication/long_row_timeout.test.lua gh-4351 replication/on_replace.test.lua gh-5344, gh-5349 replication/prune.test.lua gh-5361 replication/qsync_advanced.test.lua gh-5340 replication/qsync_basic.test.lua gh-5355 replication/replicaset_ro_mostly.test.lua gh-5342 replication/wal_rw_stress.test.lua gh-5347 replication-py/multi.test.py gh-5362 sql/prepared.test.lua test gh-5359 sql-tap/selectG.test.lua gh-5350 vinyl/ddl.test.lua gh-5338 vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197 vinyl/iterator.test.lua gh-5336 vinyl/write_iterator_rand.test.lua gh-5356 xlog/panic_on_wal_error.test.lua gh-5348
kyukhin
referenced
this issue
in tarantool/tarantool
Sep 28, 2020
Added for tests with issues: app/fiber.test.lua gh-5341 app-tap/debug.test.lua gh-5346 app-tap/http_client.test.lua gh-5346 app-tap/inspector.test.lua gh-5346 box/gh-2763-session-credentials-update.test.lua gh-5363 box/hash_collation.test.lua gh-5247 box/lua.test.lua gh-5351 box/net.box_connect_triggers_gh-2858.test.lua gh-5247 box/net.box_incompatible_index-gh-1729.test.lua gh-5360 box/net.box_on_schema_reload-gh-1904.test.lua gh-5354 box/protocol.test.lua gh-5247 box/update.test.lua gh-5247 box-tap/net.box.test.lua gh-5346 replication/autobootstrap.test.lua gh-4533 replication/autobootstrap_guest.test.lua gh-4533 replication/ddl.test.lua gh-5337 replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940 replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357 replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343 replication/long_row_timeout.test.lua gh-4351 replication/on_replace.test.lua gh-5344, gh-5349 replication/prune.test.lua gh-5361 replication/qsync_advanced.test.lua gh-5340 replication/qsync_basic.test.lua gh-5355 replication/replicaset_ro_mostly.test.lua gh-5342 replication/wal_rw_stress.test.lua gh-5347 replication-py/multi.test.py gh-5362 sql/prepared.test.lua test gh-5359 sql-tap/selectG.test.lua gh-5350 vinyl/ddl.test.lua gh-5338 vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197 vinyl/iterator.test.lua gh-5336 vinyl/write_iterator_rand.test.lua gh-5356 xlog/panic_on_wal_error.test.lua gh-5348
kyukhin
referenced
this issue
in tarantool/tarantool
Sep 28, 2020
Added for tests with issues: app/fiber.test.lua gh-5341 app-tap/debug.test.lua gh-5346 app-tap/http_client.test.lua gh-5346 app-tap/inspector.test.lua gh-5346 box/gh-2763-session-credentials-update.test.lua gh-5363 box/hash_collation.test.lua gh-5247 box/lua.test.lua gh-5351 box/net.box_connect_triggers_gh-2858.test.lua gh-5247 box/net.box_incompatible_index-gh-1729.test.lua gh-5360 box/net.box_on_schema_reload-gh-1904.test.lua gh-5354 box/protocol.test.lua gh-5247 box/update.test.lua gh-5247 box-tap/net.box.test.lua gh-5346 replication/autobootstrap.test.lua gh-4533 replication/autobootstrap_guest.test.lua gh-4533 replication/ddl.test.lua gh-5337 replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940 replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357 replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343 replication/long_row_timeout.test.lua gh-4351 replication/on_replace.test.lua gh-5344, gh-5349 replication/prune.test.lua gh-5361 replication/qsync_advanced.test.lua gh-5340 replication/qsync_basic.test.lua gh-5355 replication/replicaset_ro_mostly.test.lua gh-5342 replication/wal_rw_stress.test.lua gh-5347 replication-py/multi.test.py gh-5362 sql/prepared.test.lua test gh-5359 sql-tap/selectG.test.lua gh-5350 vinyl/ddl.test.lua gh-5338 vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197 vinyl/iterator.test.lua gh-5336 vinyl/write_iterator_rand.test.lua gh-5356 xlog/panic_on_wal_error.test.lua gh-5348 (cherry picked from commit 75ba744)
Not reproduced on MCS with SSD, nor in GitHub Actions.
avtikhon
referenced
this issue
in tarantool/tarantool
Jun 16, 2021
Found that the root cause of the vinyl test failures was side effects of the incorrect test 'vinyl/gh.test.lua', which left the Tarantool worker process in an inconsistent state. After it, any subsequent test on the same Tarantool worker process could fail when making snapshot calls, as in tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Alternatively, restarting the Tarantool worker process could fail while stopping it, as in tarantool/test-run#261 and #5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to remove all vinyl tests from the 'fragile' list except 'gh.test.lua', which must be improved first so that it can run alongside the other tests, and 'gh-5141-invalid-vylog-file.test.lua', which checks this issue and can be removed once the fix is done.

Part of tarantool/tarantool-qa#97

Closes #4168 Closes #4309 Closes #4346 Closes #4572 Closes #4979 Closes #4984 Closes #4985 Closes #4993 Closes #5141 Closes #5197 Closes #5336 Closes #5338 Closes #5356 Closes #5377 Closes #5378 Closes #5383 Closes #5408 Closes #5539 Closes #5584 Closes #5586
avtikhon
referenced
this issue
in tarantool/tarantool
Jun 22, 2021
#4309: tx_gap_lock.test.lua
#4168: throttle.test.lua
       Perf test not reproduced on fast hosts.
#4993: errinj_ddl.test.lua
#5338: ddl.test.lua
#5197: gh-3395-read-prepared-uncommitted.test.lua
#4985: replica_rejoin.test.lua
#5539: errinj_tx.test.lua

Closes #4309 Closes #4168 Closes #4993 Closes #5338 Closes #5197 Closes #4985 Closes #5539
avtikhon
referenced
this issue
in tarantool/tarantool
Jun 24, 2021
#4309: tx_gap_lock.test.lua
#4168: throttle.test.lua
       Perf test not reproduced on fast hosts.
#4993: errinj_ddl.test.lua
#5338: ddl.test.lua
#5197: gh-3395-read-prepared-uncommitted.test.lua
#4985: replica_rejoin.test.lua
#5539: errinj_tx.test.lua

Closes #4309 Closes #4168 Closes #4993 Closes #5338 Closes #5197 Closes #4985 Closes #5539
locker
added a commit
to locker/tarantool
that referenced
this issue
Feb 14, 2025
….lua

The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/gh-3395-read-prepared-uncommitted.test.l> [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/gh-3395-read-prepared-uncommitted.result Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/gh-3395-read-prepared-uncommitted.reject Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay. Usually, it should be enough for the WAL rollback to complete but it looks like on aarch64 it isn't. As a result, the reader returns a prepared tuple before it's rolled back. Instead of sleeping, let's join the writer fiber to make sure the rollback is complete before the reader resumes.

The patch isn't as simple as that because of the way the flag is_tx_faster_than_wal is handled in read_prepared_with_delay. It emulates the situation when the reader actually manages to read a prepared tuple because of a slow WAL with another error injection ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from returning to the TX thread after receiving a WAL error, so we can't just join it.

Actually, using this error injection seems pointless. Instead we could clear ERRINJ_WAL_DELAY after finishing the read, and this is what this patch does.

A couple more notes:

- Drop the channels used for starting a read fiber and getting the read result. We can get the result with fiber.join() while stalling the fiber at startup seems pointless - a yield should be enough.
- Set the defer_deletes flag on the test space to make sure the writer doesn't yield on disk read, otherwise the test may hang. It didn't hang already, because the disk read was luckily reflected by the bloom filter. BTW deferred DELETEs were enabled for all spaces when the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing
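The core idea in the commit message above, waiting for the writer to finish instead of sleeping for a fixed interval, can be illustrated outside Tarantool. The following is a minimal sketch in Python that uses threads as a stand-in for fibers. It is an analogy only, not the test's code: the function `run_reader` and the parameters `rollback_delay` and `reader_sleep` are hypothetical names, and the tuple values merely mirror those in the test output.

```python
import threading
import time


def run_reader(rollback_delay, reader_sleep=None):
    """Return what a reader sees while a writer rolls back tuple (1, 2).

    If reader_sleep is None, the reader joins the writer before reading
    (the approach the patch describes). Otherwise it sleeps for a fixed
    time and then reads, which races with the rollback.
    """
    # (1, 2) plays the role of the prepared tuple that must be rolled back.
    tuples = [(1, 2), (2, 2), (3, 2)]
    lock = threading.Lock()

    def writer():
        time.sleep(rollback_delay)  # stands in for a slow WAL rollback
        with lock:
            tuples.remove((1, 2))

    t = threading.Thread(target=writer)
    t.start()
    if reader_sleep is None:
        t.join()                  # deterministic: rollback has completed
    else:
        time.sleep(reader_sleep)  # racy: rollback may not have happened yet
    with lock:
        seen = list(tuples)
    t.join()  # make sure the writer is finished before returning
    return seen


# Sleep shorter than the rollback: the prepared tuple is still visible.
print(run_reader(0.2, reader_sleep=0.01))  # [(1, 2), (2, 2), (3, 2)]
# Join: the reader only runs after the rollback.
print(run_reader(0.2))                     # [(2, 2), (3, 2)]
```

With a fixed sleep the result depends on how fast the writer happens to be on a given host, which matches the description of the failure appearing only on slower runners. Joining removes that dependency.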
locker
added a commit
to tarantool/tarantool
that referenced
this issue
Feb 17, 2025
The test fails on aarch64 CI runners with errors like this:

```
[113] vinyl/gh-3395-read-prepared-uncommitted.test.l> [ fail ]
[113]
[113] Test failed! Result content mismatch:
[113] --- vinyl/gh-3395-read-prepared-uncommitted.result Wed Feb 12 10:13:31 2025
[113] +++ /tmp/t/rejects/vinyl/gh-3395-read-prepared-uncommitted.reject Wed Feb 12 10:24:09 2025
[113] @@ -119,17 +119,18 @@
[113]  --
[113]  read_prepared_with_delay(false)
[113]   | ---
[113] - | - - [2, 2]
[113] - |   - [3, 2]
[113] - | ...
[113] --- 2. Tuple is not rolled back so it is visible to all transactions.
[113] ---
[113] -read_prepared_with_delay(true)
[113] - | ---
[113]   | - - [1, 2]
[113]   |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
[113] +-- 2. Tuple is not rolled back so it is visible to all transactions.
[113] +--
[113] +read_prepared_with_delay(true)
[113] + | ---
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113] + |   - [3, 2]
[113] + | ...
[113]
[113]  -- Give WAL thread time to catch up.
[113]  --
[113] @@ -139,7 +140,8 @@
[113]
[113]  sk:select{2}
[113]   | ---
[113] - | - - [2, 2]
[113] + | - - [1, 2]
[113] + |   - [2, 2]
[113]   |   - [3, 2]
[113]   | ...
```

Here's why it happens. Note the sleep between clearing ERRINJ_WAL_DELAY and ERRINJ_VY_READ_PAGE_DELAY injections in read_prepared_with_delay. Usually, it should be enough for the WAL rollback to complete but it looks like on aarch64 it isn't. As a result, the reader returns a prepared tuple before it's rolled back. Instead of sleeping, let's join the writer fiber to make sure the rollback is complete before the reader resumes.

The patch isn't as simple as that because of the way the flag is_tx_faster_than_wal is handled in read_prepared_with_delay. It emulates the situation when the reader actually manages to read a prepared tuple because of a slow WAL with another error injection ERRINJ_RELAY_FASTER_THAN_TX, which prevents the writer fiber from returning to the TX thread after receiving a WAL error, so we can't just join it.

Actually, using this error injection seems pointless. Instead we could clear ERRINJ_WAL_DELAY after finishing the read, and this is what this patch does.

A couple more notes:

- Drop the channels used for starting a read fiber and getting the read result. We can get the result with fiber.join() while stalling the fiber at startup seems pointless - a yield should be enough.
- Set the defer_deletes flag on the test space to make sure the writer doesn't yield on disk read, otherwise the test may hang. It didn't hang already, because the disk read was luckily reflected by the bloom filter. BTW deferred DELETEs were enabled for all spaces when the test was introduced.

Closes tarantool/tarantool-qa#202

NO_DOC=testing
NO_CHANGELOG=testing

(cherry picked from commit b801814)
Tarantool version:
Tarantool 2.6.0-7-g5a856023e8
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -fprofile-arcs -ftest-coverage -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -fprofile-arcs -ftest-coverage -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
OS version:
Linux (Debian 9)
Bug description:
1.
https://gitlab.com/tarantool/tarantool/-/jobs/647911034#L4541
https://gitlab.com/tarantool/tarantool/-/jobs/645672722#L4467
https://gitlab.com/tarantool/tarantool/-/jobs/759156959#L4849
results file checksum: 82156b1f64522ca82685c56e4803a3f7
artifacts.zip
results file checksum: 6ab639ce38b94231c6f0be9a8380d2ff
Steps to reproduce:
Optional (but very desirable):