This repository was archived by the owner on Jun 23, 2022. It is now read-only.

FAIL_POINT_INJECT_F may lose efficacy in replica_split.cpp #347

Closed
Smityz opened this issue Dec 3, 2019 · 0 comments
Labels
type/bug This issue reports a bug.

Smityz commented Dec 3, 2019

The replica_split unit test intermittently crashes with a core dump.
Unit test info:

W2019-12-03 17:21:58.205 (1575364918205004608 305a) unknown.io-thrd.12378: rpc_engine.cpp:516:start(): [replica] network server started at port 54321, channel = RPC_CHANNEL_TCP, ...
W2019-12-03 17:21:58.205 (1575364918205155193 305a) unknown.io-thrd.12378: rpc_engine.cpp:516:start(): [replica] network server started at port 54321, channel = RPC_CHANNEL_UDP, ...
W2019-12-03 17:21:58.205 (1575364918205239893 3070) replica.io-thrd.12400: task_worker.cpp:120:set_priority(): You may need priviledge to set thread priority. errno = 1
W2019-12-03 17:21:58.205 (1575364918205591377 3073) replica.io-thrd.12403: task_worker.cpp:120:set_priority(): You may need priviledge to set thread priority. errno = 1
W2019-12-03 17:21:58.205 (1575364918205615875 3074) replica.io-thrd.12404: task_worker.cpp:120:set_priority(): You may need priviledge to set thread priority. errno = 1
W2019-12-03 17:21:58.205 (1575364918205630808 3075) replica.io-thrd.12405: task_worker.cpp:120:set_priority(): You may need priviledge to set thread priority. errno = 1
W2019-12-03 17:21:58.206 (1575364918206121988 3079) replica.io-thrd.12409: task_worker.cpp:120:set_priority(): You may need priviledge to set thread priority. errno = 1
Note: Google Test filter = replica_split_test.add_child_succeed
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from replica_split_test
[ RUN      ] replica_split_test.add_child_succeed
got signal id: 11
[       OK ] replica_split_test.add_child_succeed (1 ms)
[----------] 1 test from replica_split_test (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1 ms total)
./testrep.sh: line 13: 12378 Segmentation fault      (core dumped) GTEST_FILTER=replica_split_test.add_child_succeed ./dsn.replica.test config-test.ini

Core dump info:

(gdb) bt
#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (__str=..., this=0x7fdebeb9f490) at /usr/include/c++/7/bits/basic_string.h:440
#1  std::_Head_base<1ul, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, false>::_Head_base (__h=..., this=0x7fdebeb9f490) at /usr/include/c++/7/tuple:126
#2  std::_Tuple_impl<1ul, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::_Tuple_impl (__head=..., this=0x7fdebeb9f490) at /usr/include/c++/7/tuple:361
#3  std::_Tuple_impl<0ul, std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::_Tuple_impl (__tail#0=..., __head=..., this=0x7fdebeb9f490)
    at /usr/include/c++/7/tuple:211
#4  std::tuple<std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::tuple<void, true> (__a2=..., __a1=..., this=0x7fdebeb9f490)
    at /usr/include/c++/7/tuple:948
#5  std::_Bind<void (dsn::replication::replica::*(std::_Placeholder<1>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_Bind<std::_Placeholder<1> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(void (dsn::replication::replica::*&&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::_Placeholder<1> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    (__f=<optimized out>, this=0x7fdebeb9f480) at /usr/include/c++/7/functional:534
#6  std::bind<void (dsn::replication::replica::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::_Placeholder<1> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&> (__f=<optimized out>) at /usr/include/c++/7/functional:879
#7  dsn::replication::replica::child_init_replica (this=0x55f56dc78400, parent_gpid=..., primary_address=..., init_ballot=<optimized out>)
    at /home/smilencer/code/pegasus/rdsn/src/dist/replication/lib/replica_split.cpp:103
#8  0x00007fded2d43801 in std::__invoke_impl<void, void (dsn::replication::replica::*&)(dsn::gpid, dsn::rpc_address, long), dsn::ref_ptr<dsn::replication::replica>&, dsn::gpid&, dsn::rpc_address&, long&>
    (__t=..., __f=
    @0x55f56df44300: (void (dsn::replication::replica::*)(dsn::replication::replica * const, dsn::gpid, dsn::rpc_address, long)) 0x7fded2d25ab0 <dsn::replication::replica::child_init_replica(dsn::gpid, dsn::rpc_address, long)>) at /usr/include/c++/7/bits/invoke.h:73
#9  std::__invoke<void (dsn::replication::replica::*&)(dsn::gpid, dsn::rpc_address, long), dsn::ref_ptr<dsn::replication::replica>&, dsn::gpid&, dsn::rpc_address&, long&> (__fn=
    @0x55f56df44300: (void (dsn::replication::replica::*)(dsn::replication::replica * const, dsn::gpid, dsn::rpc_address, long)) 0x7fded2d25ab0 <dsn::replication::replica::child_init_replica(dsn::gpid, dsn::rpc_address, long)>) at /usr/include/c++/7/bits/invoke.h:95
#10 std::_Bind<void (dsn::replication::replica::*(dsn::ref_ptr<dsn::replication::replica>, dsn::gpid, dsn::rpc_address, long))(dsn::gpid, dsn::rpc_address, long)>::__call<void, , 0ul, 1ul, 2ul, 3ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul>) (__args=..., this=0x55f56df44300) at /usr/include/c++/7/functional:467
#11 std::_Bind<void (dsn::replication::replica::*(dsn::ref_ptr<dsn::replication::replica>, dsn::gpid, dsn::rpc_address, long))(dsn::gpid, dsn::rpc_address, long)>::operator()<, void>() (
    this=0x55f56df44300) at /usr/include/c++/7/functional:551
#12 std::_Function_handler<void (), std::_Bind<void (dsn::replication::replica::*(dsn::ref_ptr<dsn::replication::replica>, dsn::gpid, dsn::rpc_address, long))(dsn::gpid, dsn::rpc_address, long)> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#13 0x00007fded36abab9 in dsn::task::exec_internal (this=0x55f56e046014) at /home/smilencer/code/pegasus/rdsn/src/core/core/task.cpp:180
#14 0x00007fded36c428a in dsn::task_worker::loop (this=0x55f56dc74600) at /home/smilencer/code/pegasus/rdsn/src/core/core/task_worker.cpp:211
#15 0x00007fded36c44a9 in dsn::task_worker::run_internal (this=0x55f56dc74600) at /home/smilencer/code/pegasus/rdsn/src/core/core/task_worker.cpp:191
#16 0x00007fded1d7466f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007fded20476db in start_thread (arg=0x7fdebeba3700) at pthread_create.c:463
#18 0x00007fded17cf88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

We found that this bug is caused by an unreliable FAIL_POINT_INJECT_F that is not always loaded correctly. When the fail point does not fire, the object created for the unit test falls through into the real (non-injected) code path.
At line 103 of replica_split.cpp, _app may already have been moved because the replica is closing, which causes the segmentation fault.
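
To make the failure mode concrete, here is a minimal sketch of what seems to happen. It is not the rDSN code: the `sketch` namespace, the registry, the fail point name `replica_child_init_replica`, and the `has_app` flag are all hypothetical stand-ins. It only shows that when the fail point registry is not active, the early return is skipped and execution continues into the real body, which then needs its own guard before touching state that may be gone.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a fail point registry; not rDSN's implementation.
namespace sketch {

struct registry {
    bool enabled = false;                                 // toggled by setup()/teardown()
    std::unordered_map<std::string, std::string> points;  // fail point name -> action
};

inline registry &global()
{
    static registry r;
    return r;
}

inline void setup() { global().enabled = true; }

inline void teardown()
{
    global().enabled = false;
    global().points.clear();
}

inline void cfg(const std::string &name, const std::string &action)
{
    global().points[name] = action;
}

// Returns true only if the registry is enabled AND the point is configured.
inline bool eval(const std::string &name)
{
    const registry &r = global();
    return r.enabled && r.points.count(name) > 0;
}

// Simplified replica: has_app models whether _app is still valid.
struct replica {
    enum class path { injected, real, guarded };
    bool has_app = false;

    path child_init()
    {
        // Plays the role of the injected early return in the test build.
        if (eval("replica_child_init_replica")) {
            return path::injected;
        }
        // The fail point did not fire: we are now in the real code path.
        // Guard against state that may already be released (assumed fix).
        if (!has_app) {
            return path::guarded;
        }
        return path::real;
    }
};

// Walks through the three cases; returns true if all behave as described.
inline bool demo()
{
    replica r;
    teardown();  // fail points not loaded: falls through to the real path
    bool ok = (r.child_init() == replica::path::guarded);
    setup();
    cfg("replica_child_init_replica", "return()");
    ok = ok && (r.child_init() == replica::path::injected);
    teardown();
    assert(ok);
    return ok;
}

} // namespace sketch
```

In this sketch, a test that forgets to call `setup()` (or runs after another test's `teardown()`) silently executes the real body. Without the `has_app` check, that is the point where a crash like the one above could occur.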
