Skip to content

Crash in greenlet mostly happens on vinyl/update_optimize & box/gh-5422-broken_snapshot tests #260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
avtikhon opened this issue Dec 25, 2020 · 4 comments
Assignees
Milestone

Comments

@avtikhon
Copy link
Contributor

avtikhon commented Dec 25, 2020

Found crash in greenlet mostly happens on vinyl/update_optimize.test.lua test:
https://gitlab.com/tarantool/tarantool/-/jobs/923586259#L5011

[011] vinyl/update_optimize.test.lua                                  
[011] [Instance "vinyl"] Tarantool server failed to start
[011] 
[011] Last 15 lines of Tarantool Log file [Instance "vinyl"][/builds/WHBm2_Sk/0/tarantool/tarantool/test/var/011_vinyl/vinyl.log]:
[011] 2020-12-20 20:51:51.193 [12990] main C> got signal 15 - Terminated
[011] Starting instance vinyl...
[011] Run console at unix/:/builds/WHBm2_Sk/0/tarantool/tarantool/test/var/011_vinyl/vinyl.control
[011] tcp_server: remove dead UNIX socket: /builds/WHBm2_Sk/0/tarantool/tarantool/test/var/011_vinyl/vinyl.control
[011] started
[011] 2020-12-20 20:51:53.258 [23340] main/103/vinyl C> Tarantool 2.7.0-128-g6c094f9023
[011] 2020-12-20 20:51:53.258 [23340] main/103/vinyl C> log level 5
[011] 2020-12-20 20:51:53.259 [23340] main/103/vinyl I> mapping 536870912 bytes for memtx tuple arena...
[011] 2020-12-20 20:51:53.260 [23340] main/103/vinyl I> mapping 536870912 bytes for vinyl tuple arena...
[011] 2020-12-20 20:51:53.293 [23340] main/103/vinyl I> instance uuid 9abf23fd-0ab3-45de-ba4d-dcec55472e11
[011] 2020-12-20 20:51:53.294 [23340] main/103/vinyl I> instance vclock {0: 23, 1: 26395}
[011] 2020-12-20 20:51:53.294 [23340] iproto/101/main I> binary: bound to unix/:(socket)
[011] 2020-12-20 20:51:53.295 [23340] main/103/vinyl vy_log.c:1062 E> ER_MISSING_SNAPSHOT: Can't find snapshot
[011] 2020-12-20 20:51:53.295 [23340] main/103/vinyl F> can't initialize storage: Can't find snapshot
[011] 2020-12-20 20:51:53.295 [23340] main/103/vinyl F> can't initialize storage: Can't find snapshot
[011] 
[011] Worker "011_vinyl" received the following error; stopping...
[011] Traceback (most recent call last):
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/worker.py", line 316, in run_task
[011]     task, self.server, self.inspector)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/test_suite.py", line 261, in run_test
[011]     short_status, result_checksum = test.run(server)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/test.py", line 182, in run
[011]     self.execute(server)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/tarantool_server.py", line 391, in execute
[011]     ts.stop_nondefault(signal=signal.SIGKILL)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/preprocessor.py", line 442, in stop_nondefault
[011]     v.stop(silent=True, signal=signal)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/tarantool_server.py", line 1059, in stop
[011]     save_join(self.crash_detector)
[011]   File "/builds/WHBm2_Sk/0/tarantool/tarantool/test-run/lib/tarantool_server.py", line 52, in save_join
[011]     green_obj.get(timeout=timeout)
[011]   File "src/gevent/greenlet.py", line 654, in gevent._greenlet.Greenlet.get
[011]   File "src/gevent/greenlet.py", line 642, in gevent._greenlet.Greenlet.get
[011]   File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[011]   File "src/gevent/_greenlet_primitives.py", line 63, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[011]   File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
[011] TarantoolStartError
[011] Exception: 
@kyukhin kyukhin added this to the Q1-21 milestone Feb 12, 2021
@avtikhon avtikhon changed the title Crash in greenlet mostly happens on vinyl/update_optimize.test.lua test Crash in greenlet mostly happens on vinyl/update_optimize & box/gh-5422-broken_snapshot tests Feb 24, 2021
@Totktonada
Copy link
Member

The similar fail occurs on box/gh-5422-broken_snapshot.test.lua on the release_asan_clang11 job with 'can't initialize storage: tx checksum mismatch' message in the tarantool log.

https://github.com/tarantool/tarantool/runs/2021407153

Relevant log chunk
2021-03-03T12:12:06.1442097Z [011] box/gh-5422-broken_snapshot.test.lua                            
2021-03-03T12:12:06.1443061Z [011] [Instance "gh-5422-broken_snapshot"] Tarantool server failed to start
2021-03-03T12:12:06.1443632Z [011] 
2021-03-03T12:12:06.1444718Z [011] Last 15 lines of Tarantool Log file [Instance "gh-5422-broken_snapshot"][/__w/tarantool/tarantool/test/var/011_box/gh-5422-broken_snapshot.log]:
2021-03-03T12:12:06.1448052Z [011] tcp_server: remove dead UNIX socket: /__w/tarantool/tarantool/test/var/011_box/gh-5422-broken_snapshot.control
2021-03-03T12:12:06.1448915Z [011] started
2021-03-03T12:12:06.1449885Z [011] 2021-03-03 12:12:04.919 [17053] main/103/gh-5422-broken_snapshot C> Tarantool 2.8.0-101-g55daa055d
2021-03-03T12:12:06.1451355Z [011] 2021-03-03 12:12:04.919 [17053] main/103/gh-5422-broken_snapshot C> log level 5
2021-03-03T12:12:06.1453125Z [011] 2021-03-03 12:12:04.920 [17053] main/103/gh-5422-broken_snapshot I> mapping 268435456 bytes for memtx tuple arena...
2021-03-03T12:12:06.1454567Z [011] 2021-03-03 12:12:04.920 [17053] main/103/gh-5422-broken_snapshot I> Actual slab_alloc_factor calculated on the basis of desired slab_alloc_factor = 1.044274
2021-03-03T12:12:06.1455922Z [011] 2021-03-03 12:12:04.920 [17053] main/103/gh-5422-broken_snapshot I> mapping 134217728 bytes for vinyl tuple arena...
2021-03-03T12:12:06.1457220Z [011] 2021-03-03 12:12:04.942 [17053] main/103/gh-5422-broken_snapshot I> instance uuid 58bc2160-8fc2-4b8b-94d7-02490c3a4f87
2021-03-03T12:12:06.1458645Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot I> instance vclock {1: 10004}
2021-03-03T12:12:06.1459641Z [011] 2021-03-03 12:12:04.943 [17053] iproto/101/main I> binary: bound to unix/:(socket)
2021-03-03T12:12:06.1460706Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot I> recovery start
2021-03-03T12:12:06.1461996Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot I> recovering from `/__w/tarantool/tarantool/test/var/011_box/gh-5422-broken_snapshot/00000000000000010004.snap'
2021-03-03T12:12:06.1463436Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot xlog.c:1757 E> XlogError: tx checksum mismatch
2021-03-03T12:12:06.1464323Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot F> can't initialize storage: tx checksum mismatch
2021-03-03T12:12:06.1465220Z [011] 2021-03-03 12:12:04.943 [17053] main/103/gh-5422-broken_snapshot F> can't initialize storage: tx checksum mismatch
2021-03-03T12:12:06.1465691Z [011] 
2021-03-03T12:12:06.1466108Z [011] Worker "011_box" received the following error; stopping...
2021-03-03T12:12:06.1466609Z [011] Traceback (most recent call last):
2021-03-03T12:12:06.1467270Z [011]   File "/__w/tarantool/tarantool/test-run/lib/worker.py", line 316, in run_task
2021-03-03T12:12:06.1467849Z [011]     task, self.server, self.inspector)
2021-03-03T12:12:06.1468558Z [011]   File "/__w/tarantool/tarantool/test-run/lib/test_suite.py", line 261, in run_test
2021-03-03T12:12:06.1469158Z [011]     short_status, result_checksum = test.run(server)
2021-03-03T12:12:06.1469849Z [011]   File "/__w/tarantool/tarantool/test-run/lib/test.py", line 182, in run
2021-03-03T12:12:06.1470338Z [011]     self.execute(server)
2021-03-03T12:12:06.1471047Z [011]   File "/__w/tarantool/tarantool/test-run/lib/tarantool_server.py", line 391, in execute
2021-03-03T12:12:06.1471667Z [011]     ts.stop_nondefault(signal=signal.SIGKILL)
2021-03-03T12:12:07.3393169Z [011]   File "/__w/tarantool/tarantool/test-run/lib/preprocessor.py", line 442, in stop_nondefault
2021-03-03T12:12:07.3394110Z [011]     v.stop(silent=True, signal=signal)
2021-03-03T12:12:07.3395210Z [011]   File "/__w/tarantool/tarantool/test-run/lib/tarantool_server.py", line 1059, in stop
2021-03-03T12:12:07.3395953Z [011]     save_join(self.crash_detector)
2021-03-03T12:12:07.3396660Z [011]   File "/__w/tarantool/tarantool/test-run/lib/tarantool_server.py", line 52, in save_join
2021-03-03T12:12:07.3397239Z [011]     green_obj.get(timeout=timeout)
2021-03-03T12:12:07.3397841Z [011]   File "src/gevent/greenlet.py", line 654, in gevent._greenlet.Greenlet.get
2021-03-03T12:12:07.3398596Z [011]   File "src/gevent/greenlet.py", line 642, in gevent._greenlet.Greenlet.get
2021-03-03T12:12:07.3399735Z [011]   File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
2021-03-03T12:12:07.3401263Z [011]   File "src/gevent/_greenlet_primitives.py", line 63, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
2021-03-03T12:12:07.3402448Z [011]   File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
2021-03-03T12:12:07.3403103Z [011] TarantoolStartError

Maybe it is the same as tarantool/tarantool#261 (the traceback looks quite similar).

@Totktonada
Copy link
Member

The first run was on release_asan_clang11 too. And it occurs once again on release_asan_clang11.

@Totktonada
Copy link
Member

tarantool/tarantool#5862 is merged now, so I guess we'll need to revert it to reproduce the test-run side problem.

@kyukhin kyukhin modified the milestones: Q1-21, Q2-21 Apr 12, 2021
@avtikhon
Copy link
Contributor Author

avtikhon commented Jun 18, 2021

avtikhon added a commit that referenced this issue Jun 18, 2021
Found that the root cause of the issues happened with vinyl tests were
backside effects of the not correct test 'vinyl/gh.test.lua' which
leaved Tarantool worker process in inconsistent state. After it any
other next test on the same Tarantool worker process could fail on
running testings with snapshots calls, like tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Either restarting Tarantool worker process could fail on stopping it,
like #261 and tarantool/tarantool#5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to avoid of such situations for now and in the future all tests
must be run after Tarantool worker default server process was restarted.
This patch moves this server restart call from check if the test failed
to the common part of the tests run loop.

Part of tarantool/tarantool-qa#97
Fixes #260
Fixes #261
Needed for tarantool/tarantool#5089
Closes tarantool/tarantool-qa#126
avtikhon added a commit that referenced this issue Jun 18, 2021
Found that the root cause of the issues happened with vinyl tests were
backside effects of the not correct test 'vinyl/gh.test.lua' which
leaved Tarantool worker process in inconsistent state. After it any
other next test on the same Tarantool worker process could fail on
running testings with snapshots calls, like tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Either restarting Tarantool worker process could fail on stopping it,
like #261 and tarantool/tarantool#5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to avoid of such situations for now and in the future all tests
must be run after Tarantool worker default server process was restarted.
This patch moves this server restart call from check if the test failed
to the common part of the tests run loop.

Fixes #260
Fixes #261
Needed for tarantool/tarantool#5089
Closes tarantool/tarantool-qa#126
@avtikhon avtikhon self-assigned this Jun 18, 2021
Totktonada pushed a commit that referenced this issue Jun 21, 2021
Found that the root cause of the issues happened with vinyl tests were
backside effects of the not correct test 'vinyl/gh.test.lua' which
leaved Tarantool worker process in inconsistent state. After it any
other next test on the same Tarantool worker process could fail on
running testings with snapshots calls, like tarantool/tarantool-qa#126:

  error: Snapshot is already in progress

Either restarting Tarantool worker process could fail on stopping it,
like #261 and tarantool/tarantool#5141:

  E> failed to process vylog record: delete_slice{slice_id=115, }
  E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 115 deleted but not registered

Decided to avoid of such situations for now and in the future all tests
must be run after Tarantool worker default server process was restarted.
This patch moves this server restart call from check if the test failed
to the common part of the tests run loop.

Fixes #260
Fixes #261
Needed for tarantool/tarantool#5089
Closes tarantool/tarantool-qa#126
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants