[BUG] Weird Swank Crash, Possibly Unrelated #8

Open
adlai opened this issue Nov 29, 2024 · 4 comments

Comments

@adlai
Owner

adlai commented Nov 29, 2024

While recovering from an indeterminate state in an image containing a long-running ScalpL cohort, one of Swank's threads stopped providing context information. The subsequent interrupt from Emacs was not handled properly, which disconnected SLIME and ultimately contributed to the loss of the image.

To Reproduce

The image reached its indeterminate state after thread pool tasks were killed and restarted several times, so it is not obvious how to recreate the environment in which an Emacs interrupt caused this crash.

Expected behavior

An Emacs interrupt should launch SLDB in a new buffer without losing the connection to SLIME or killing any threads.

Error Log (taken from *standard-output* after SLIME disconnection)

;; swank:close-connection:
;;  Interrupt thread failed:
;;   thread #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}> has exited.
;; Event history start:
dispatch-event: (:DEBUG-ACTIVATE 39 1 NIL)
encode-message
decode-message
dispatch-event:
(:EMACS-REX (SWANK:INVOKE-NTH-RESTART-FOR-EMACS 1 0) "SCALPL.QD" 39 1492)
send-event: #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}>
(:EMACS-REX (SWANK:INVOKE-NTH-RESTART-FOR-EMACS 1 0) "SCALPL.QD" 1492)
dispatch-event:
(:RETURN #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}>
 (:ABORT "NIL") 1492)
dispatch-event:
(:RETURN #<SB-THREAD:THREAD "worker" RUNNING {100DF5FE33}>
 (:OK (:TITLE "#<CACHING-GATE {10217E34B3}>"
 :ID 96 :CONTENT (("Class: "
                  (:VALUE "#<STANDARD-CLASS SCALPL.KRAKEN:CACHING-GATE>" 97) "
" "--------------------" "
" " Group slots by inheritance " (:ACTION "[ ]" 60) "
" " Sort slots alphabetically  " (:ACTION "[X]" 61) "
" "
" "All Slots:" "
" (:ACTION "[ ]" 62) "  " (:VALUE "ABBREV" 98) "          "
 " = " (:VALUE "NIL" 99) "
" (:ACTION "[ ]" 63) "  " (:VALUE "CACHE" 100) "           "
 " = " (:VALUE "@70=((\"TradesHistory\" (\"ofs\" . \"0\")
                      (\"start\" . \"TONQYM-6ESLP-H25J5W\")) ..)" 101) "
" (:ACTION "[ ]" 64) "  " (:VALUE "CONTROL" 102) "         "
 " = " (:VALUE "#<CHANNEL {1001587393}>" 103) "
" (:ACTION "[ ]" 65) "  " (:VALUE "DELEGATES" 104) "       "
 " = " (:VALUE "NIL" 105) "
" (:ACTION "[ ]" 66) "  " (:VALUE "EXCHANGE" 106) "        "
 " = " (:VALUE "#<KRAKEN>" 107) "
" (:ACTION "[ ]" 67) "  " (:VALUE "INPUT" 108) "           "
 " = " (:VALUE "#<CHANNEL {1021CC5F43}>" 109) "
" (:ACTION "[ ]" 68) "  " (:VALUE "NAME" 110) "            "
 " = " (:VALUE "\"RBh\"" 111) "
" (:ACTION "[ ]" 69) "  " (:VALUE "PUBKEY" 112) "          "
 " = " (:VALUE "\"REDACTEDLOLNICETRY\"" 113) "
" (:ACTION "[ ]" 70) "  " (:VALUE "RECENT-RESPONSES" 114) ""
 " = " (:VALUE "@2=#<HASH-TABLE :TEST EQUAL :COUNT 3 {10024F3403}>" 115) "
" (:ACTION "[ ]" 71) "  " (:VALUE "SECRET" 116) "          "
 " = " (:VALUE "REDACTEDLOLNICETRY" 117) "
" (:ACTION "[ ]" 72) "  " (:VALUE "TASKS" 118) "           "
 " = " (:VALUE "(#<TASK 23:4:8 RBh [ALIVE] {103F2246E3}>
                 #<TASK 17:42:13 RBh [PENDING] {100B19CB53}>)" 119) "
" "
" (:ACTION "[set value]" 73) "  " (:ACTION "[make unbound]" 74) "
") 96 0 500))) 1488)
encode-message
dispatch-event: (:DEBUG-RETURN 39 1 NIL)
encode-message
decode-message
dispatch-event: (:EMACS-REX (SWANK-REPL:LISTENER-EVAL
                             "(scalpl.navel::report-net-activity nil)
") "SCALPL.QD" :REPL-THREAD 1493)
send-event: #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}>
(:EMACS-REX (SWANK-REPL:LISTENER-EVAL "(scalpl.navel::report-net-activity nil)
") "SCALPL.QD" 1493)
dispatch-event: (:WRITE-STRING "
" NIL 39)
encode-message
decode-message
dispatch-event: (:WRITE-DONE 39)
send-event: #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}> (:WRITE-DONE)
wait-for-event: (:WRITE-DONE) NIL
dispatch-event: (:PRESENTATION-START 245 :REPL-RESULT)
encode-message
dispatch-event: (:PING 39 79)
encode-message
wait-for-event: (:EMACS-PONG 79) NIL
decode-message
dispatch-event: (:EMACS-PONG 39 79)
send-event: #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}>
(:EMACS-PONG 79)
dispatch-event:
(:WRITE-STRING "\"Net effect of 236 trades since Thu@22:05: $267.33
\"" :REPL-RESULT)
encode-message
dispatch-event: (:PRESENTATION-END 245 :REPL-RESULT)
dispatch-event:
(:RETURN #<SB-THREAD:THREAD "worker" FINISHED values: T {10167E4683}>
 (:OK (:COMPILATION-RESULT NIL T 0.068 NIL NIL)) 1491)
encode-message
dispatch-event: (:WRITE-STRING "
" :REPL-RESULT)
encode-message
dispatch-event: (:RETURN #<SB-THREAD:THREAD "repl-thread" RUNNING {10016B0233}>
                 (:OK NIL) 1493)
encode-message
dispatch-event: (:EMACS-INTERRUPT T)
wait-for-event: (COMMON-LISP:OR (:EMACS-REX . SWANK::_) (:SLDB-RETURN 2)) NIL
interrupt-worker-thread: T #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}>
close-connection:
 Interrupt thread failed:
  thread #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}> has exited. ...
;; Event history end.
;; Backtrace:
0: (SB-THREAD:INTERRUPT-THREAD
    #<SB-THREAD:THREAD "ChanL Old Worker" ABORTED {103FFC0153}>
    #<FUNCTION (LAMBDA NIL :IN SWANK::QUEUE-THREAD-INTERRUPT) {100E470BDB}>)
1: (SWANK::DISPATCH-LOOP #<SWANK::MULTITHREADED-CONNECTION {102668E753}>)
2: (SWANK::CONTROL-THREAD #<SWANK::MULTITHREADED-CONNECTION {102668E753}>)
3: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
4: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
5: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
6: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
7: (SB-THREAD::RUN)
8: ("foreign function: call_into_lisp_")
9: ("foreign function: funcall1")
;; Connection to Emacs lost. [
;;  condition: Interrupt thread failed:
;;   thread #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}> has exited.
;;  type: SB-THREAD:INTERRUPT-THREAD-ERROR
;;  style: :SPAWN]

Relevant System Information:

  • Hardware: x86_64 SMP
  • OS: Debian
  • Compiler: SBCL
  • Version: 2.2.9.debian
  • Thread Pool: old-thread-pool

edit - emphasized relevant aspect of Hardware information

@adlai
Owner Author

adlai commented Dec 11, 2024

wait-for-event: (COMMON-LISP:OR (:EMACS-REX . SWANK::_) (:SLDB-RETURN 2)) NIL
interrupt-worker-thread: T #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}>
close-connection:
Interrupt thread failed:
thread #<THREAD "ChanL Old Worker" ABORTED {103FFC0153}> has exited. ...
;; Event history end.
;; Backtrace:
0: (SB-THREAD:INTERRUPT-THREAD
#<SB-THREAD:THREAD "ChanL Old Worker" ABORTED {103FFC0153}>
#<FUNCTION (LAMBDA NIL :IN SWANK::QUEUE-THREAD-INTERRUPT) {100E470BDB}>)

I don't think it's a ChanL bug anymore; probably reproducible in vanilla SLIME connected to multithreaded Swank without anything beyond the minimal testcase.
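
A minimal testcase at the SBCL level might look like the sketch below. It bypasses SLIME and Swank entirely and only demonstrates the condition named in the log above: calling `sb-thread:interrupt-thread` on a thread that has already exited signals `sb-thread:interrupt-thread-error`. The function name `interrupt-exited-thread` is hypothetical, and whether this is the exact path Swank took in the crashed image is an assumption.

```lisp
;; SBCL-only sketch; requires a build with thread support.
;; INTERRUPT-EXITED-THREAD is a hypothetical name used for illustration.
(defun interrupt-exited-thread ()
  "Interrupt a thread that has already exited.
Return the condition that was signalled, or NIL if none was."
  (let ((thread (sb-thread:make-thread (lambda () :done)
                                       :name "short-lived worker")))
    ;; Wait until the worker has really finished before interrupting it.
    (sb-thread:join-thread thread)
    (handler-case
        (progn
          (sb-thread:interrupt-thread thread (lambda () (print :never-runs)))
          nil)
      ;; Same condition type as reported in the Swank log.
      (sb-thread:interrupt-thread-error (condition)
        condition))))
```

Evaluating `(interrupt-exited-thread)` at a REPL should return the condition object instead of entering the debugger, since the error is handled inside the function.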

@fstamour

I read this very quickly, but my first guess is that:

  • while slime is showing the debugger's UI,
  • the thread being debugged gets aborted
  • the user chooses a restart
  • slime evals something that calls interrupt-thread on the thread that was already aborted
    • swank surely keeps a handle on the thread that is currently "stuck" in the debugger

I probably had a very similar issue when I wrote this code: https://github.com/fstamour/breeze/blob/268cd0eb872e54f2c7091f36fd10106e81719396/src/command.lisp#L230
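
One way to guard against the scenario guessed at above is to treat "the thread has already exited" as an ordinary outcome instead of a fatal error. The sketch below is SBCL-specific and is not taken from Swank or breeze; the name `interrupt-thread-if-alive` is hypothetical. It handles the error after the fact because a prior `sb-thread:thread-alive-p` check would still leave a window in which the thread could exit before the interrupt is delivered.

```lisp
;; SBCL-only sketch; INTERRUPT-THREAD-IF-ALIVE is a hypothetical helper.
(defun interrupt-thread-if-alive (thread function)
  "Try to interrupt THREAD with FUNCTION.
Return T if the interrupt was queued, NIL if THREAD had already exited."
  (handler-case
      (progn
        (sb-thread:interrupt-thread thread function)
        t)
    ;; Signalled by SBCL when THREAD is no longer running.
    (sb-thread:interrupt-thread-error ()
      nil)))
```

A caller that gets NIL back can drop its stale handle on the thread and carry on, which is the behaviour one would hope for from the connection's dispatch loop.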

@fstamour

Any idea which version of:

  • swank
  • slime
  • sbcl

@adlai
Owner Author

adlai commented Dec 19, 2024

Any idea which version of:

  • swank
  • slime

both loaded by quicklisp-slime-helper; *swank-wire-protocol-version* = "2.28"

  • sbcl

2.2.9.debian (apt show sbcl reports 2:2.2.9-1)
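
For anyone else hitting this, the same details can be collected from the running image with something like the sketch below. The function name `report-versions` is hypothetical; the Swank variable is looked up at run time so that the code also loads in an image where Swank is absent, in which case that entry is NIL.

```lisp
;; REPORT-VERSIONS is a hypothetical helper, not part of Swank or SLIME.
(defun report-versions ()
  "Return a plist describing the running image.
The Swank entry is NIL when Swank is not loaded or the variable is unbound."
  (let* ((swank (find-package "SWANK"))
         (wire (and swank
                    (find-symbol "*SWANK-WIRE-PROTOCOL-VERSION*" swank))))
    (list :implementation (lisp-implementation-type)
          :version (lisp-implementation-version)
          :swank-wire-protocol (and wire
                                    (boundp wire)
                                    (symbol-value wire)))))
```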
