test: vinyl/errinj flaky fails on high loaded host on box.snapshot() #11

Closed
avtikhon opened this issue Jul 15, 2019 · 2 comments · Fixed by tarantool/tarantool#6136

Tarantool version:
master
OS version:
Ubuntu 18.04

Bug description:
Failed:

[016] --- vinyl/errinj.result	Mon Jul 15 07:18:57 2019
[016] +++ vinyl/errinj.reject	Mon Jul 15 17:00:21 2019
[016] @@ -86,7 +86,7 @@
[016]  ...
[016]  box.snapshot();
[016]  ---
[016] -- ok
[016] +- error: Error injection 'vinyl dump'
[016]  ...
[016]  num_rows = num_rows + range();
[016]  ---
[009] --- vinyl/errinj.result	Mon Jul 15 07:18:57 2019
[009] +++ vinyl/errinj.reject	Mon Jul 15 17:26:04 2019
[009] @@ -352,7 +352,7 @@
[009]  ...
[009]  box.snapshot();
[009]  ---
[009] -- ok
[009] +- error: Error injection 'xlog write injection'
[009]  ...
[009]  #s:select({1})
[009]  ---

All the failures share the following structure:

-- enabling the error injection
errinj.set("ERRINJ_VY_RUN_WRITE", true);
num_rows = num_rows + range();
-- fails due to error injection
box.snapshot();

-- disabling the error injection
errinj.set("ERRINJ_VY_RUN_WRITE", false);
fiber.sleep(0.06);
num_rows = num_rows + range();
-- FAILS HERE: still hits the error injection, although it should pass
box.snapshot();

Steps to reproduce:

l=0 ; while time ./test-run.py -j15 `for r in {1..15} ; do echo vinyl ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done
l=0 ; while time ./test-run.py -j10 `for r in {1..150} ; do echo vinyl/errinj.test.lua ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done
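
The one-liners above rerun the tests until a run fails, printing a counter after each successful pass. The same loop can be sketched in Python for readability. This is only an illustration: `run_until_failure` and its `max_runs` cap are hypothetical names introduced here, not part of test-run.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=1000):
    """Run cmd repeatedly and return how many runs passed before the first failure.

    max_runs is a hypothetical safety cap; the shell one-liners have no such limit.
    """
    passed = 0
    while passed < max_runs:
        # stderr is discarded, mirroring the 2>/dev/null in the one-liners.
        result = subprocess.run(cmd, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            break
        passed += 1
        print(f"======== {passed} =============")
    return passed


# Usage corresponding to the second one-liner (run from the test directory):
#   cmd = ["./test-run.py", "-j10"] + ["vinyl/errinj.test.lua"] * 150
#   run_until_failure(cmd)
```

Unlike the shell version, this sketch does not time each run; wrapping the call with `time.monotonic()` would be one way to add that.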

avtikhon commented:

On version 2.4.0-16-gcdf502c66 the original issue did not reproduce, but the following error appeared instead:

--- vinyl/errinj.result	Fri Jan 17 13:19:04 2020
+++ var/005_vinyl/errinj.result	Wed Jan 22 12:40:36 2020
@@ -963,7 +963,7 @@
 ...
 box.snapshot()
 ---
-- ok
+- error: 'Invalid VYLOG file: Slice 1079 deleted but not registered'
 ...
 -- Create another run file. This will trigger compaction
 -- as run_count_per_level is set to 1. Due to the error

avtikhon referenced this issue in tarantool/tarantool Dec 6, 2020
Added a test-run filter for the box.snapshot error message:

  'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'

so that the changing slice ID is not printed to the result file. This
keeps the file's checksum stable, so it can be used in the test-run
fragile list to rerun the test as a known flaky issue.

Needed for #4346
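
The effect of such a filter can be illustrated outside test-run. The following is a minimal sketch in Python, assuming the filter simply substitutes the matched text with a fixed string. The placeholder text, the function names, and the choice of MD5 as the checksum are assumptions made for illustration; this is not test-run's actual implementation.

```python
import hashlib
import re

# Pattern taken from the commit message.
SLICE_ERROR = re.compile(
    r"Invalid VYLOG file: Slice [0-9]+ deleted but not registered"
)

# Hypothetical replacement text; the real filter's replacement is not shown in this issue.
PLACEHOLDER = "Invalid VYLOG file: Slice <slice_id> deleted but not registered"


def normalize(line):
    """Replace the varying slice ID so the line is identical between runs."""
    return SLICE_ERROR.sub(PLACEHOLDER, line)


def checksum(text):
    """Checksum of the normalized text (MD5 chosen here as an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two runs that differ only in the slice ID reported by box.snapshot().
run_a = "- error: 'Invalid VYLOG file: Slice 1079 deleted but not registered'"
run_b = "- error: 'Invalid VYLOG file: Slice 115 deleted but not registered'"

same_raw = checksum(run_a) == checksum(run_b)
same_filtered = checksum(normalize(run_a)) == checksum(normalize(run_b))
print(same_raw, same_filtered)
```

Without the substitution the two outputs hash differently, so a single recorded checksum could not identify the failure; after it, both runs produce the same text.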
kyukhin referenced this issue in tarantool/tarantool Dec 11, 2020
(cherry picked from commit 718cba1)
avtikhon referenced this issue in tarantool/tarantool Dec 13, 2020
The previous commit to the vinyl/errinj.result file:

  7ca3512 ("test: add test filter for vinyl/errinj.test.lua")

contained a mistake: a single line was missed on cherry-pick. The
problem was found during testing of the 1.10 release branch:

  https://gitlab.com/tarantool/tarantool/-/jobs/907233724#L3994

This patch corrects the mistake.

Follows up #4346
Totktonada transferred this issue from tarantool/tarantool on Jan 15, 2021

avtikhon commented Jun 9, 2021

The failures shown above were caused by three other issues:

#126
tarantool/test-run#261
tarantool/tarantool#5436

avtikhon added a commit to tarantool/tarantool that referenced this issue Jun 23, 2021
Found that the root cause of the vinyl test failures was a side
effect of the incorrect test 'vinyl/gh.test.lua', which left the
Tarantool worker process in an inconsistent state. After that, any
subsequent test on the same Tarantool worker process could fail when
calling snapshots, as in tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Alternatively, restarting the Tarantool worker process could fail
while stopping it, as in tarantool/test-run#261 and #5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to remove all vinyl tests from the 'fragile' list except
'gh.test.lua', which must be improved first so that it can run
alongside the other tests, and 'gh-5141-invalid-vylog-file.test.lua',
which checks for this issue and can be removed once the fix is done.

The following issues were moved to tarantool/tarantool-qa repository:
  #4346 -> tarantool/tarantool-qa#11
  #5408 -> tarantool/tarantool-qa#73
  #5584 -> tarantool/tarantool-qa#21
  #5586 -> tarantool/tarantool-qa#19

Part of tarantool/tarantool-qa#97
Closes tarantool/tarantool-qa#11
Closes #4572
Closes #4979
Closes #4984
Closes #5336
Closes #5356
Closes #5377
Closes #5378
Closes #5383
Closes tarantool/tarantool-qa#73
Closes tarantool/tarantool-qa#21
Closes tarantool/tarantool-qa#19
kyukhin pushed a commit to tarantool/tarantool that referenced this issue Jun 23, 2021
(cherry picked from commit f0f53a3)