Fix a bug with cancelling "attach -w" after you have run a process previously #65822
Conversation
…eviously. The problem is that when the "attach" command is initiated, the ExecutionContext for the command has a process - it's the exited one from the previous run. But the `attach wait` creates a new process for the attach, and then errors out instead of interrupting when it finds that its process and the one in the command's ExecutionContext don't match. This change ensures that if we're returning a target from GetExecutionContext, we fill the context with its current process, not some historical one.
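As a rough illustration of the idea (a sketch only, not necessarily the committed patch; the names m_overriden_exe_contexts and GetSelectedExecutionContext come from the snippets quoted below, everything else is assumed):

ExecutionContext CommandInterpreter::GetExecutionContext() {
  ExecutionContext exe_ctx;
  if (!m_overriden_exe_contexts.empty()) {
    // During the course of a command, the target may have replaced the
    // process coming in with another. Refresh the process from the target
    // here so we never hand out a stale, already-exited process.
    exe_ctx = m_overriden_exe_contexts.top();
    if (exe_ctx.GetTargetSP())
      exe_ctx.SetProcessSP(exe_ctx.GetTargetSP()->GetProcessSP());
  } else {
    exe_ctx = m_debugger.GetSelectedExecutionContext();
  }
  return exe_ctx;
}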
ExecutionContext exe_ctx;
if (!m_overriden_exe_contexts.empty()) {
// During the course of a command, the target may have replaced the process
// coming in with another. If fix that here:
"If fix that here" -> is there a word missing?
Ah, not quite. "If" -> "I".
Idea seems good to me. I have similar concerns to Adrian but overall I think I'm on board with this idea. I'm kind of confused why we don't destroy the old Process object after it's exited though, wouldn't properly cleaning up after ourselves solve a lot of this issue? Or is there something I don't understand about the lifecycle of Process?
: m_debugger.GetSelectedExecutionContext();
ExecutionContext CommandInterpreter::GetExecutionContext() {
ExecutionContext exe_ctx;
if (!m_overriden_exe_contexts.empty()) {
Suggestion/Bikeshedding: Flip the condition and you can reduce one level of indentation in this method. That's just my preference though, doesn't make a difference either way.
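For illustration, the flipped version might look something like this (just a sketch of the suggestion, not code from the patch):

ExecutionContext CommandInterpreter::GetExecutionContext() {
  // Early return: no overridden context, fall back to the selected one.
  if (m_overriden_exe_contexts.empty())
    return m_debugger.GetSelectedExecutionContext();

  // Otherwise use the overridden context, refreshed as above.
  ExecutionContext exe_ctx = m_overriden_exe_contexts.top();
  if (exe_ctx.GetTargetSP())
    exe_ctx.SetProcessSP(exe_ctx.GetTargetSP()->GetProcessSP());
  return exe_ctx;
}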
This is in reply to Alex's comment about why we keep the process around, but quoting it in this reply didn't seem to work... After a process exits you might still want to ask it what its exit status was. Plus there might be other system accounting we've gathered for that process. So until you replace it with another process, you very well might want to ask questions of it. So we can't just destroy it as soon as it exits.
That definitely explains why we persist the previous process even after it's finished, but this doesn't sound like there's a technical limitation to actually removing the Process from the Target after it exits. To be more specific, you could either make it so the Process object in […]. That might be an alternative to this solution, but I'm not sure if that would be any better or more useful than what you're doing here. I'm mostly thinking aloud here (and trying to improve my knowledge of LLDB).
A different way to approach this fix is to just fix ExecutionContext so that it can never be out of date, by removing the "m_process_sp" member variable. It is not great that we can have an execution context that has an older process, and would we ever want an ExecutionContext to be able to have a mismatched process? If we remove m_process_sp, we can always get the process from the m_target_sp, and then we can never get it wrong that way, and we might avoid some of the changes in this patch. Let me know what you think, as it would be good to make sure our ExecutionContext variables can't get the wrong info.
It seems a bit weird to have an ExecutionContext with a thread and a stack frame but not the process they belong to. I don't know that it would really help either. Even if you removed the process so that you no longer had to check for that being invalid, if a thread or frame had been set in the execution context, you'd still have to check for them being from the wrong process. So I don't think we would make things significantly simpler by leaving out the process.
I think it's easier to deal with it when we hand them out.
Jim
I'm not sure this would make things easier. The model where you can just ask the dead process for info until you make another one is very straightforward. Also, the window where this is likely to cause problems is very small, and all internal to launching and attaching, so making the handling of the old process more complex to solve this problem seems like overkill.
Or we can have ExecutionContext have a single element so that it is always correct:
Since target, process, thread and frame all inherit from ExecutionContextScope, all we need to do is store one item. The ExecutionContextScope API has:
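(Roughly, from LLDB's ExecutionContext.h; method names are from memory, so the exact signatures may differ:)

class ExecutionContextScope {
public:
  // Each scope object can compute the objects "above" it in the hierarchy.
  virtual lldb::TargetSP CalculateTarget() = 0;
  virtual lldb::ProcessSP CalculateProcess() = 0;
  virtual lldb::ThreadSP CalculateThread() = 0;
  virtual lldb::StackFrameSP CalculateStackFrame() = 0;
  // Fill in an ExecutionContext with everything this scope can compute.
  virtual void CalculateExecutionContext(ExecutionContext &exe_ctx) = 0;
};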
So then everything would always be correct.
So if we use the ExecutionContextScope and init an ExecutionContext with a target, it can calculate the process and target. If you init it with a thread, it can calculate the thread, process and target. And if you init it with a frame, it can get the frame, thread, process and target.
The only time this will be incorrect in this way is if you make an execution context filling in the current target & process and then hold onto it while the process in the target changes. Internally, we only hold onto execution contexts while executing a command, and we return that to ourselves in the API I fixed. So we really don't have to fix this anywhere else to solve actual problems. This seems like a much wider-ranging change than is required to fix this bug. I'm all for rethinking the ExecutionContextScope/ExecutionContext/ExecutionContextRef nexus, but I don't think that's a trivial rethink, and I don't have the time to undertake that right now.
My only point in mentioning this is that many of the changed files in your patch go away if we fix ExecutionContext. The changes look fine to me if we don't want to change ExecutionContext, but I will let others do the final accept.
It looks like this is breaking buildbots:
https://lab.llvm.org/buildbot/#/builders/68/builds/60267
The test can't find main.cpp, and it looks like the immediately preceding test issues an os.chdir. It has a hook to reset it, but maybe there's some problem with it?
Good eye!
The cleanup actually does run but it was told to reset the directory to the "testdir" which is:
build/lldb-test-build.noindex/commands/process/attach/TestProcessAttach.test_attach_to_process_from_different_dir_by_id
but the tests are supposed to start in the source directory. I fixed that in:
74338bf
That'll teach me to just run the test I added!
Jim
…ocess previously (#65822)" This reverts commit 7265f79. The new test case is flaky on Linux AArch64 (https://lab.llvm.org/buildbot/#/builders/96) and more flaky on Windows on Arm (https://lab.llvm.org/buildbot/#/builders/219/builds/5735).
This is failing on AArch64 Linux and Windows; it finds some async error in the output, but the bots don't say what. I'll find out what it is.
break

self.dbg.DispatchInputInterrupt()
self.dbg.DispatchInputInterrupt()
Why is this done twice?
if "Cancelled async attach" in line: | ||
found_result = True | ||
break | ||
self.assertTrue(found_result, "Found async error in results") |
This fails if results is empty.
********************************
[]
********************************
FAIL: LLDB (/home/david.spickett/build-llvm-aarch64/bin/clang-aarch64) :: test_run_then_attach_wait_interrupt (TestProcessAttach.ProcessAttachTestCase)
<bound method SBProcess.Kill of SBProcess: pid = 0, state = exited, threads = 0, executable = ProcessAttach>: success
Restore dir to: /home/david.spickett/build-llvm-aarch64
======================================================================
FAIL: test_run_then_attach_wait_interrupt (TestProcessAttach.ProcessAttachTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/david.spickett/llvm-project/lldb/test/API/commands/process/attach/TestProcessAttach.py", line 195, in test_run_then_attach_wait_interrupt
self.assertTrue(found_result, "Found async error in results")
AssertionError: False is not True : Found async error in results
Config=aarch64-/home/david.spickett/build-llvm-aarch64/bin/clang
----------------------------------------------------------------------
When the test works the results contain:
********************************
['error: attach failed: Cancelled async attach.\n', '\n', '... Interrupted.\n']
********************************
Running it in a loop, it took ~40 runs to get a failing one.
I wonder if that is because the attach happens to finish a lot faster sometimes, so there's no time to cancel it? If that's down to the OS and machine load, I'm not sure how we'd make this predictable.
The ugly option is to say if the results are empty, pass the test and assume the other 39 runs will check for the bug.
The way interruption works in lldb is that the first time you dispatch an interrupt, it just raises the "voluntary interrupt" flag. Turns out the process attach part of lldb isn't written in a way that makes checking for the interrupt flag possible, so that doesn't work. The second time you issue the interrupt, since we've already raised the Debugger interrupt flag, we send the Process level interrupt to any running processes. That's the one that actually interrupts the attach attempt.
This is a variant of the scheme gdb used (at least back in the day), where the first ^C tries to interrupt the current command, and the second ^C tries to interrupt the process connection. In gdb, on the second ^C you would get a prompt "I see you pressed ^C twice, did you really want to interrupt the process". I didn't emulate that part as in my usage I pretty much never answered "no" to this question...
Jim
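In SB API terms, the pattern described above boils down to this (a C++ sketch of what the Python test does; `debugger` is assumed to be the SBDebugger running the command interpreter):

// The first call only raises the Debugger's voluntary-interrupt flag, which
// the attach code path can't poll. The second call, seeing the flag already
// set, sends a Process-level interrupt to the process doing the attach.
debugger.DispatchInputInterrupt();
debugger.DispatchInputInterrupt();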
The way the test works, we run one real process and let it exit. Then we do "attach -w -n noone_would_use_this_name" because we don't want the second attach attempt to be able to succeed. lldb should just stay stuck till the interrupt succeeds; there shouldn't be any other way to get out of this. It sounds like something else is kicking us out of the attach wait loop on these systems?
Jim
You are right, there is a racy bit here - I was also getting ~one failure per 50 runs.
The raciness is due to the time it takes between when we send the interrupt and when we finish up the command and produce the results. So we can end up reading from the command output before the command has written its result.
I can make this more deterministic by waiting for the process state to shift to eStateExited. But even with that it was still a little flakey because there's still some time between when the interrupt is noticed and causes us to switch the process state and when lldb makes the command return and prints that to stdout.
Unfortunately this is a test of command-line behavior; you don't get the same problem with the SB APIs directly. So I have to do it as a "run a command-line command" test, and that makes it harder to get direct signals about when to look at the result.
If we sent "command completed" events I could wait for that, then the test would be deterministic. That seems like a feature we should design, not jam in to fix a test problem..
However, if I just insert a time.sleep(1) between seeing the state flip to eStateExited and looking at the results, I can run this #200 and see no failures. That's just the time to back out of the command execution stack and write the result to stdout, so that shouldn't be as variable. If this still ends up failing intermittently, then I'll have to cook up some kind of "command completed" event that the test can wait on.
Jim
Interesting... I split the end stage of the test up into:
(a) wait for up to 20 seconds for the state to switch to eStateExited
(b) wait another second, then check the command output file
On macOS, as I said, that seemed to make the test quite robust, but when I checked the test in I got a bot failure at stage (a). That really sounds like the underlying feature of interrupting a process in this state doesn't work reliably on those systems. The patch this is testing just changed a bug in generic code that prevented us from TRYING to do this interrupt, but didn't change any of the underlying process interrupt machinery. So it looks like the test is uncovering underlying flakiness.
I changed the test to be macOS only for now. We should go back and make the feature work reliably on the other platforms when we get a chance.
Jim
That is, a Linux bot failure...
Jim
Thanks for looking into this. Well for now at least we know that users on Linux will only see the bug 1 in N times :) (the bots are still red but for unrelated reasons)