DAOS-16585 tests: Improve NLT checking of ioil metrics. #15179

Merged on Oct 3, 2024 (2 commits)

ashleypittman
Contributor

Use the ioil statistics to verify read/write counts rather
than just checking for function name.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
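
As a rough illustration of the idea in this description, the sketch below checks per-call counters instead of searching the log for a function name. The exception name `NLTestIlZeroCall`, the `check_read`/`check_write` flags and the `fstat` counter appear in the traceback later in this thread; everything else (the `check_il_counts` helper, the `counts` dict, the `read`/`write` key names, the `extra` parameter) is hypothetical and is not the code in `node_local_test.py`.

```python
class NLTestIlZeroCall(Exception):
    """Raised when a counter that is expected to be non-zero is zero.

    The class name comes from the traceback in this thread; this body is
    an assumption for illustration only.
    """

    def __init__(self, counter):
        self.counter = counter
        super().__init__(f"zero stat count for {counter}")


def check_il_counts(counts, check_read=True, check_write=True, extra=()):
    """Verify that the interception library recorded the expected calls.

    counts is a hypothetical dict of counter name -> number of calls,
    assumed to have been parsed from the ioil statistics beforehand.
    """
    required = list(extra)
    if check_read:
        required.append("read")   # assumed counter name
    if check_write:
        required.append("write")  # assumed counter name
    for name in required:
        # A missing counter is treated the same as a zero count.
        if counts.get(name, 0) == 0:
            raise NLTestIlZeroCall(name)


if __name__ == "__main__":
    # A read-only command such as "cat": reads and fstat expected, no writes.
    check_il_counts({"read": 3, "fstat": 1}, check_write=False, extra=("fstat",))
    print("counts look sane")
```

Checking counts rather than names means a command that loads the library but never routes any I/O through it is reported as a failure instead of passing silently.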


Ticket title is 'NLT test failures under Ubuntu 22.04'
Status is 'Open'
Labels: 'google-cloud-daos'
https://daosio.atlassian.net/browse/DAOS-16585

@daosbuild1
Collaborator

Use the ioil statistics to verify read/write counts rather
than just checking for function name.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@ashleypittman ashleypittman marked this pull request as ready for review September 26, 2024 13:48
@ashleypittman ashleypittman requested review from a team as code owners September 26, 2024 13:48
@techbasset
Contributor

I just ran this on Ubuntu 22.04 and got errors:

Command "cat /tmp/dnt_dfuse_f2xv5xl4/dfuse_mount.f2kgk8n4/36defe5f-0874-4ab1-b79a-3f47a991ebae/test_file" has zero stat count for fstat
Command "cat /tmp/dnt_dfuse_f2xv5xl4/dfuse_mount.f2kgk8n4/36defe5f-0874-4ab1-b79a-3f47a991ebae/test_file" has zero stat count for fstat
Test failure object for NLT
Closed JSON file nlt-errors.json with 4 errors
Traceback (most recent call last):
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 6561, in <module>
    main()
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 6543, in main
    fatal_errors = run(wf, args)
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 6365, in run
    fatal_errors.add_result(run_dfuse(server, conf))
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 4951, in run_dfuse
    create_and_read_via_il(dfuse, cdir)
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 4871, in create_and_read_via_il
    dfuse.il_cmd(['cat', fname], check_write=False)
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 1539, in il_cmd
    log_test(self.conf, log_name, check_read=check_read, check_write=check_write,
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 4762, in log_timer_wrapper
    rc = func(*args, **kwargs)
  File "/home/ncmurphy_google_com/git/daos/utils/node_local_test.py", line 4852, in log_test
    raise NLTestIlZeroCall('fstat')
__main__.NLTestIlZeroCall: Command "cat /tmp/dnt_dfuse_f2xv5xl4/dfuse_mount.f2kgk8n4/36defe5f-0874-4ab1-b79a-3f47a991ebae/test_file" has zero stat count for fstat
Closed JSON file nlt-server-leaks.json with 1 errors

@techbasset
Contributor

Weird errors under Rocky 8:

Running PyDAOS container checker
DEBUG 2024/09/27 19:28:05.817897 mgmt_drpc.go:155: handling PoolFindByLabel: label:"NLT"
DEBUG 2024/09/27 19:28:05.818045 mgmt_drpc.go:168: GetPoolSvcResp: uuid:"a02ffd36-f5d6-49ef-9719-765289ac1177" svcreps:0
TRACE 2024/09/27 19:28:05.930925 auth_sys.go:287: pid: 248605 (python3) uid: 1419268536 (ncmurphy_google_com) gid: 1419268536 (ncmurphy_google_com): successfully signed credential
DEBUG 2024/09/27 19:28:05.946277 procmon.go:225: pid:248605 (python3), connect a02ffd36/056f1368
DEBUG 2024/09/27 19:28:06.916393 procmon.go:237: pid:248605 (python3), disconnect a02ffd36/056f1368
Running log_test on /tmp/dnt_pydaos_n6_lfmbg.log 15.8MiB
[ncmurphy-dev:253006:0:253006] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x40)
==== backtrace (tid: 253006) ====
 0  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(ucs_handle_error+0x294) [0x7f8b62d6ee84]
 1  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d03c) [0x7f8b62d6f03c]
 2  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d2e8) [0x7f8b62d6f2e8]
 3  /lib64/libpthread.so.0(+0x12d20) [0x7f8b8bcd8d20]
 4  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hash_table_traverse+0x16) [0x7f8b672d9b96]
 5  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hhash_traverse+0x34) [0x7f8b672dc2c4]
 6  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libdaos.so.2(daos_reinit+0x22) [0x7f8b84591862]
 7  /opt/daos/lib64/python3.6/site-packages/pydaos/pydaos_shim.so(+0x2709) [0x7f8b8494e709]
 8  /lib64/libc.so.6(+0x96430) [0x7f8b8b1fc430]
 9  /lib64/libc.so.6(__libc_fork+0x105) [0x7f8b8b26e1f5]
10  /usr/lib64/python3.6/lib-dynload/_posixsubprocess.cpython-36m-x86_64-linux-gnu.so(+0x2926) [0x7f8b88020926]
11  /lib64/libpython3.6m.so.1.0(+0x19c057) [0x7f8b8c082057]
12  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
13  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
14  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
15  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
16  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
17  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
18  /lib64/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x31a) [0x7f8b8bfe11ba]
19  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x70e) [0x7f8b8bfe1d9e]
20  /lib64/libpython3.6m.so.1.0(+0x10e040) [0x7f8b8bff4040]
21  /lib64/libpython3.6m.so.1.0(+0x1888e1) [0x7f8b8c06e8e1]
22  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallKeywords+0x482) [0x7f8b8c081902]
23  /lib64/libpython3.6m.so.1.0(+0x19c436) [0x7f8b8c082436]
24  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
25  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
26  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
27  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
28  /lib64/libpython3.6m.so.1.0(+0xf9ab4) [0x7f8b8bfdfab4]
29  /lib64/libpython3.6m.so.1.0(+0x19b0cf) [0x7f8b8c0810cf]
30  /lib64/libpython3.6m.so.1.0(PyObject_Call+0x4b) [0x7f8b8bfe8b6b]
31  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x236b) [0x7f8b8c084e1b]
32  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
33  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
34  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
35  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
36  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
37  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
38  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
39  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
40  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
41  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
42  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
43  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
44  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
45  /lib64/libpython3.6m.so.1.0(+0xf9ab4) [0x7f8b8bfdfab4]
46  /lib64/libpython3.6m.so.1.0(PyEval_EvalCode+0x23) [0x7f8b8bfe0e53]
47  /lib64/libpython3.6m.so.1.0(+0x208c62) [0x7f8b8c0eec62]
48  /lib64/libpython3.6m.so.1.0(PyRun_FileExFlags+0x97) [0x7f8b8bfc01e9]
49  /lib64/libpython3.6m.so.1.0(PyRun_SimpleFileExFlags+0x389) [0x7f8b8bfc53d3]
50  /lib64/libpython3.6m.so.1.0(+0xdfc4d) [0x7f8b8bfc5c4d]
51  python3(main+0x116) [0x557385351b96]
52  /lib64/libc.so.6(__libc_start_main+0xe5) [0x7f8b8b1a07e5]
53  python3(_start+0x2e) [0x557385351d1e]
=================================
Opcode State Transition Tally
    OPCODE    ALLOCATED    SUBMITTED    SENT    COMPLETED    DEALLOCATED
----------  -----------  -----------  ------  -----------  -------------
 0x1030004            2            4       4            0              0
 0x2060001            1            1       1            0              0
 0x3080002            1            1       1            0              0
 0x3080004            1            1       1            0              0
 0x3080005            1            1       1            0              0
 0x308000d            1            2       2            0              0
 0x3080011            1            4       4            0              0
 0x40a0000            4          104     104            0              0
 0x40a0001           74          207     207            0              0
 0x40a0002            5          129     129            0              0
 0x40a0003          937         1024    1024            0              0
0xff040001            4            4       4            0              0
ERROR: Opcode 0x1030004: Alloc'd Total = 2, Dealloc'd Total = 0
ERROR: Opcode 0x2060001: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x3080002: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x3080004: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x3080005: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x308000d: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x3080011: Alloc'd Total = 1, Dealloc'd Total = 0
ERROR: Opcode 0x40a0000: Alloc'd Total = 4, Dealloc'd Total = 0
ERROR: Opcode 0x40a0001: Alloc'd Total = 74, Dealloc'd Total = 0
ERROR: Opcode 0x40a0002: Alloc'd Total = 5, Dealloc'd Total = 0
ERROR: Opcode 0x40a0003: Alloc'd Total = 937, Dealloc'd Total = 0
ERROR: Opcode 0xff040001: Alloc'd Total = 4, Dealloc'd Total = 0
Memsize: Total:0 HWM:6,296,989 17914 allocations, 17914 frees 0 possible leaks
Pid 248605, 98625 lines total, 20744 trace (21.03%)
Parsed 98625 lines of logs
Most common logging locations
Logging used 6642 times at src/common/tse.c:397 (6.7%)
Logging used 6642 times at src/common/tse.c:518 (6.7%)
Logging used 2934 times at src/object/cli_obj.c:58 (3.0%)
Logging used 1918 times at src/common/tse.c:951 (1.9%)
Logging used 1918 times at src/common/tse.c:253 (1.9%)
Logging used 1726 times at src/object/obj_rpc.c:1251 (1.8%)
Logging used 1482 times at src/cart/crt_rpc.c:515 (1.5%)
Logging used 1482 times at src/cart/crt_rpc.c:534 (1.5%)
Logging used 1482 times at src/cart/crt_rpc.c:557 (1.5%)
Logging used 1482 times at src/cart/crt_rpc.c:1482 (1.5%)
Most common facilities
object: 24347 (24.7%)
client: 24025 (24.4%)
rpc: 23934 (24.3%)
hg: 13476 (13.7%)
grp: 4300 (4.4%)
common: 3981 (4.0%)
mem: 1840 (1.9%)
misc: 1015 (1.0%)
kv: 864 (0.9%)
crt: 309 (0.3%)
Most common levels
DBUG: 98579 (100.0%)
INFO: 46 (0.0%)
[ncmurphy-dev:253015:0:253015] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x40)
==== backtrace (tid: 253015) ====
 0  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(ucs_handle_error+0x294) [0x7f8b62d6ee84]
 1  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d03c) [0x7f8b62d6f03c]
 2  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d2e8) [0x7f8b62d6f2e8]
 3  /lib64/libpthread.so.0(+0x12d20) [0x7f8b8bcd8d20]
 4  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hash_table_traverse+0x16) [0x7f8b672d9b96]
 5  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hhash_traverse+0x34) [0x7f8b672dc2c4]
 6  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libdaos.so.2(daos_reinit+0x22) [0x7f8b84591862]
 7  /opt/daos/lib64/python3.6/site-packages/pydaos/pydaos_shim.so(+0x2709) [0x7f8b8494e709]
 8  /lib64/libc.so.6(+0x96430) [0x7f8b8b1fc430]
 9  /lib64/libc.so.6(__libc_fork+0x105) [0x7f8b8b26e1f5]
10  /usr/lib64/python3.6/lib-dynload/_posixsubprocess.cpython-36m-x86_64-linux-gnu.so(+0x2926) [0x7f8b88020926]
11  /lib64/libpython3.6m.so.1.0(+0x19c057) [0x7f8b8c082057]
12  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
13  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
14  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
15  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
16  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
17  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
18  /lib64/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x554) [0x7f8b8bfe13f4]
19  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x70e) [0x7f8b8bfe1d9e]
20  /lib64/libpython3.6m.so.1.0(+0x10e040) [0x7f8b8bff4040]
21  /lib64/libpython3.6m.so.1.0(+0x1888e1) [0x7f8b8c06e8e1]
22  /lib64/libpython3.6m.so.1.0(+0x13adb6) [0x7f8b8c020db6]
23  /lib64/libpython3.6m.so.1.0(PyObject_Call+0x4b) [0x7f8b8bfe8b6b]
24  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x236b) [0x7f8b8c084e1b]
25  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
26  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
27  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
28  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x10ca) [0x7f8b8c083b7a]
29  /lib64/libpython3.6m.so.1.0(+0xf9ab4) [0x7f8b8bfdfab4]
30  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
31  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
32  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x10ca) [0x7f8b8c083b7a]
33  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
34  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
35  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
36  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
37  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
38  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
39  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
40  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x10ca) [0x7f8b8c083b7a]
41  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
42  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
43  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
44  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
45  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
46  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
47  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
48  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
49  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
50  /lib64/libpython3.6m.so.1.0(+0xf9ab4) [0x7f8b8bfdfab4]
51  /lib64/libpython3.6m.so.1.0(PyEval_EvalCode+0x23) [0x7f8b8bfe0e53]
52  /lib64/libpython3.6m.so.1.0(+0x208c62) [0x7f8b8c0eec62]
53  /lib64/libpython3.6m.so.1.0(PyRun_FileExFlags+0x97) [0x7f8b8bfc01e9]
54  /lib64/libpython3.6m.so.1.0(PyRun_SimpleFileExFlags+0x389) [0x7f8b8bfc53d3]
55  /lib64/libpython3.6m.so.1.0(+0xdfc4d) [0x7f8b8bfc5c4d]
56  python3(main+0x116) [0x557385351b96]
57  /lib64/libc.so.6(__libc_start_main+0xe5) [0x7f8b8b1a07e5]
58  python3(_start+0x2e) [0x557385351d1e]
=================================
Signal received.  Caught interrupt; shutting down
flushing all open local pool handles on shutdown
DEBUG 2024/09/27 19:28:09.182414 start.go:187: shutdown complete in 189.921µs
fi   DBUG src/gurt/fault_inject.c:645 d_fault_inject_fini() Finalized.
DEBUG 2024/09/27 19:28:09.182687 drpc_server.go:66: Quitting listener
rc from agent is 0
running ['/opt/daos/bin/dmg', '--insecure', 'system', 'stop']
[ncmurphy-dev:253024:0:253024] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x40)
==== backtrace (tid: 253024) ====
 0  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(ucs_handle_error+0x294) [0x7f8b62d6ee84]
 1  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d03c) [0x7f8b62d6f03c]
 2  /opt/daos/prereq/release/ucx/lib64/libucs.so.0(+0x2d2e8) [0x7f8b62d6f2e8]
 3  /lib64/libpthread.so.0(+0x12d20) [0x7f8b8bcd8d20]
 4  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hash_table_traverse+0x16) [0x7f8b672d9b96]
 5  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libgurt.so.4(d_hhash_traverse+0x34) [0x7f8b672dc2c4]
 6  /opt/daos/lib64/python3.6/site-packages/pydaos/../../../../lib64/libdaos.so.2(daos_reinit+0x22) [0x7f8b84591862]
 7  /opt/daos/lib64/python3.6/site-packages/pydaos/pydaos_shim.so(+0x2709) [0x7f8b8494e709]
 8  /lib64/libc.so.6(+0x96430) [0x7f8b8b1fc430]
 9  /lib64/libc.so.6(__libc_fork+0x105) [0x7f8b8b26e1f5]
10  /usr/lib64/python3.6/lib-dynload/_posixsubprocess.cpython-36m-x86_64-linux-gnu.so(+0x2926) [0x7f8b88020926]
11  /lib64/libpython3.6m.so.1.0(+0x19c057) [0x7f8b8c082057]
12  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
13  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
14  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
15  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
16  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
17  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
18  /lib64/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x554) [0x7f8b8bfe13f4]
19  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x70e) [0x7f8b8bfe1d9e]
20  /lib64/libpython3.6m.so.1.0(+0x10e040) [0x7f8b8bff4040]
21  /lib64/libpython3.6m.so.1.0(+0x1888e1) [0x7f8b8c06e8e1]
22  /lib64/libpython3.6m.so.1.0(+0x13adb6) [0x7f8b8c020db6]
23  /lib64/libpython3.6m.so.1.0(PyObject_Call+0x4b) [0x7f8b8bfe8b6b]
24  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x236b) [0x7f8b8c084e1b]
25  /lib64/libpython3.6m.so.1.0(+0xfa426) [0x7f8b8bfe0426]
26  /lib64/libpython3.6m.so.1.0(+0x17a0f0) [0x7f8b8c0600f0]
27  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
28  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x10ca) [0x7f8b8c083b7a]
29  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
30  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
31  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
32  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
33  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
34  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
35  /lib64/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x122) [0x7f8b8bfe0fc2]
36  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x70e) [0x7f8b8bfe1d9e]
37  /lib64/libpython3.6m.so.1.0(+0x10e040) [0x7f8b8bff4040]
38  /lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x6ec) [0x7f8b8bfe1d7c]
39  /lib64/libpython3.6m.so.1.0(PyObject_CallFunctionObjArgs+0xe8) [0x7f8b8c003178]
40  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x2913) [0x7f8b8c0853c3]
41  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
42  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
43  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
44  /lib64/libpython3.6m.so.1.0(+0x179f08) [0x7f8b8c05ff08]
45  /lib64/libpython3.6m.so.1.0(+0x19c2f7) [0x7f8b8c0822f7]
46  /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x498) [0x7f8b8c082f48]
47  /lib64/libpython3.6m.so.1.0(+0xf9ab4) [0x7f8b8bfdfab4]
48  /lib64/libpython3.6m.so.1.0(PyEval_EvalCode+0x23) [0x7f8b8bfe0e53]
49  /lib64/libpython3.6m.so.1.0(+0x208c62) [0x7f8b8c0eec62]
50  /lib64/libpython3.6m.so.1.0(PyRun_FileExFlags+0x97) [0x7f8b8bfc01e9]
51  /lib64/libpython3.6m.so.1.0(PyRun_SimpleFileExFlags+0x389) [0x7f8b8bfc53d3]
52  /lib64/libpython3.6m.so.1.0(+0xdfc4d) [0x7f8b8bfc5c4d]
53  python3(main+0x116) [0x557385351b96]
54  /lib64/libc.so.6(__libc_start_main+0xe5) [0x7f8b8b1a07e5]
55  python3(_start+0x2e) [0x557385351d1e]
=================================
CompletedProcess(args=['/opt/daos/bin/dmg', '--insecure', 'system', 'stop'], returncode=-11, stdout=b'', stderr=b'')
CompletedProcess(args=['/opt/daos/bin/dmg', '--insecure', 'system', 'stop'], returncode=-11, stdout=b'', stderr=b'')
CompletedProcess(args=['/opt/daos/bin/dmg', '--insecure', 'system', 'stop'], returncode=-11, stdout=b'', stderr=b'')
AssertionError(CompletedProcess(args=['/opt/daos/bin/dmg', '--insecure', 'system', 'stop'], returncode=-11, stdout=b'', stderr=b''),)
Closed JSON file nlt-errors.json with 2 errors
Traceback (most recent call last):
  File "utils/node_local_test.py", line 6368, in run
    test_pydaos_kv_obj_class(server, conf)
  File "utils/node_local_test.py", line 5302, in test_pydaos_kv_obj_class
    cont = create_cont(conf, pool, ctype="PYTHON", label='pydaos_cont')
  File "utils/node_local_test.py", line 1811, in create_cont
    rc = _create_cont()
  File "utils/node_local_test.py", line 1807, in _create_cont
    cwd=cwd)
  File "utils/node_local_test.py", line 1746, in run_daos_cmd
    rc.json = json.loads(rc.stdout.decode('utf-8'))
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "utils/node_local_test.py", line 6561, in <module>
    main()
  File "utils/node_local_test.py", line 6543, in main
    fatal_errors = run(wf, args)
  File "utils/node_local_test.py", line 6377, in run
    fatal_errors.add_result(server.set_fi())
  File "utils/node_local_test.py", line 603, in __exit__
    rc = self._stop(self.wf)
  File "utils/node_local_test.py", line 931, in _stop
    assert rc.returncode == 0, rc
AssertionError: CompletedProcess(args=['/opt/daos/bin/dmg', '--insecure', 'system', 'stop'], returncode=-11, stdout=b'', stderr=b'')
Exception ignored in: <bound method DaosServer.__del__ of <__main__.DaosServer object at 0x7f8b8a947cc0>>
Traceback (most recent call last):
  File "utils/node_local_test.py", line 621, in __del__
  File "utils/node_local_test.py", line 889, in _stop
NameError: name 'open' is not defined
Closed JSON file nlt-server-leaks.json with 1 errors
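
For readers decoding the log above: Python's `subprocess` module reports a child killed by signal N as `returncode == -N`, so `returncode=-11` corresponds to the segmentation fault (signal 11) shown in the backtraces. The `JSONDecodeError` ("Expecting value: line 1 column 1 (char 0)") is what `json.loads` raises on empty input, presumably because the failing command produced no output. The sketch below shows one way to surface the signal before attempting to parse; `describe_returncode` and `parse_json_stdout` are hypothetical helpers, not functions from `node_local_test.py`.

```python
import json
import signal
import subprocess


def describe_returncode(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            # Signal number not known to this platform's signal module.
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def parse_json_stdout(rc):
    """Parse JSON from a CompletedProcess, failing early on a bad exit."""
    if rc.returncode != 0:
        raise RuntimeError(f"{rc.args!r} {describe_returncode(rc.returncode)}")
    return json.loads(rc.stdout.decode("utf-8"))


if __name__ == "__main__":
    # Hand-built result mimicking the crashed "dmg system stop" from the log.
    crashed = subprocess.CompletedProcess(
        args=["dmg", "--insecure", "system", "stop"],
        returncode=-signal.SIGSEGV, stdout=b"", stderr=b"")
    print(describe_returncode(crashed.returncode))
```

Checking the return code first reports the crash itself rather than the secondary JSON parsing error that follows from the empty output.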

@ashleypittman
Contributor Author

That Ubuntu stack trace is exactly what I'd expect; the code is still failing on Ubuntu, but at least it's now clear why. The second stack includes daos_reinit, which I know is an area of change in master to work across fork, so I'm not worried about that, but it is an indication that we should test pydaos on that distro again.

@techbasset
Contributor

techbasset commented Oct 3, 2024

> That Ubuntu stack trace is exactly what I'd expect; the code is still failing on Ubuntu, but at least it's now clear why. The second stack includes daos_reinit, which I know is an area of change in master to work across fork, so I'm not worried about that, but it is an indication that we should test pydaos on that distro again.

I'm confused what you want to do about the Rocky errors (which I still see, including if I cherry-pick commits to the google/2.6 branch)? They're going to crop up if we try to deploy the script to our local presubmits...can we either fix the errors or disable the relevant check and file a bug to re-enable? Or explain more if there's something I'm missing?

FYI: I'm running on Rocky 8.10.

@ashleypittman
Contributor Author

> I'm confused what you want to do about the Rocky errors (which I still see, including if I cherry-pick commits to the google/2.6 branch)? They're going to crop up if we try to deploy the script to our local presubmits...can we either fix the errors or disable the relevant check and file a bug to re-enable? Or explain more if there's something I'm missing?

daos_reinit is a new function that was added by @johannlombardi last week as part of #15125. From the stack trace, it is calling something that segfaults, but this isn't something we see in our Rocky testing, although we may be using an older release. Does the tip of master on 8.10 also fail for you in the same way, or are you saying this PR introduces it? Code-wise, this PR only changes one debug line. I think this probably warrants a ticket of its own, and possibly a second one to see whether our CI infrastructure needs updating.
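
For context on the fork path: in the backtraces above, `__libc_fork` leads into `pydaos_shim.so` and then `daos_reinit`, which is consistent with a handler that runs in the child process after a fork. The sketch below illustrates that general mechanism in pure Python using `os.register_at_fork`; it is only an analogy. The `reinit` callback is a hypothetical stand-in, the way pydaos actually registers its handler is not shown in this thread, and `os.register_at_fork` needs Python 3.7 or later, so it would not run on the 3.6 interpreter discussed here.

```python
import os


def run_in_child_with_handler():
    """Fork once; return (handler ran in child, handler ran in parent)."""
    read_fd, write_fd = os.pipe()
    state = {"reinit_called": False}

    def reinit():
        # Stand-in for per-process re-initialisation after fork.
        state["reinit_called"] = True

    # The callback runs only in the child, right after fork() returns there.
    os.register_at_fork(after_in_child=reinit)

    pid = os.fork()
    if pid == 0:
        # Child: report what we observed, then exit without cleanup handlers.
        os.close(read_fd)
        os.write(write_fd, b"1" if state["reinit_called"] else b"0")
        os._exit(0)

    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data == b"1", state["reinit_called"]


if __name__ == "__main__":
    print(run_in_child_with_handler())
```

Because such a handler runs on every fork, including the forks made by `subprocess`, a fault inside it would affect every child command a test script launches, which matches the repeated segfaults in the log.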

@techbasset
Contributor

> > I'm confused what you want to do about the Rocky errors (which I still see, including if I cherry-pick commits to the google/2.6 branch)? They're going to crop up if we try to deploy the script to our local presubmits...can we either fix the errors or disable the relevant check and file a bug to re-enable? Or explain more if there's something I'm missing?
>
> daos_reinit is a new function that was added by @johannlombardi last week as part of #15125. From the stack trace, it is calling something that segfaults, but this isn't something we see in our Rocky testing, although we may be using an older release. Does the tip of master on 8.10 also fail for you in the same way, or are you saying this PR introduces it? Code-wise, this PR only changes one debug line. I think this probably warrants a ticket of its own, and possibly a second one to see whether our CI infrastructure needs updating.

Aha, I just discovered I was running with Python 3.6, and it seems happier when I upgrade to 3.9. If I was running a too-old, unsupported Python version, my bad!

Contributor

@techbasset techbasset left a comment

Works on Ubuntu 22.04 and Rocky 8 for me. I was getting errors on the latter before upgrading from Python 3.6 to Python 3.9.

@jolivier23 jolivier23 merged commit f49f041 into master Oct 3, 2024
55 checks passed
@jolivier23 jolivier23 deleted the amd/nlt-il-stats branch October 3, 2024 21:22
@ashleypittman
Contributor Author

> Aha, I just discovered I was running with Python 3.6, and it seems happier when I upgrade to 3.9. If I was running a too-old, unsupported Python version, my bad!

Please do keep looking at this, as we still expect Python 3.6 to work. At the very least, file a ticket or speak to Johann about it.

@techbasset
Contributor

> > Aha, I just discovered I was running with Python 3.6, and it seems happier when I upgrade to 3.9. If I was running a too-old, unsupported Python version, my bad!
>
> Please do keep looking at this, as we still expect Python 3.6 to work. At the very least, file a ticket or speak to Johann about it.

https://daosio.atlassian.net/browse/DAOS-16658

techbasset pushed a commit that referenced this pull request Oct 15, 2024
Use the ioil statistics to verify read/write counts rather
than just checking for function name.

Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
@mjmac mjmac mentioned this pull request Mar 18, 2025