-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Use the correct way to check whether an actor task is running or not #51158
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
3d77e1b
to
9ef035b
Compare
This is to fix #51058 (comment). |
edoakes
approved these changes
Mar 7, 2025
elimelt
pushed a commit
to elimelt/ray
that referenced
this pull request
Mar 9, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com>
8 tasks
edoakes
pushed a commit
that referenced
this pull request
Mar 11, 2025
#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes #51182 Closes #45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
qinyiyan
pushed a commit
to qinyiyan/ray
that referenced
this pull request
Mar 13, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
park12sj
pushed a commit
to park12sj/ray
that referenced
this pull request
Mar 18, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com>
park12sj
pushed a commit
to park12sj/ray
that referenced
this pull request
Mar 18, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
8 tasks
jaychia
pushed a commit
to jaychia/ray
that referenced
this pull request
Mar 19, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com> Signed-off-by: Jay Chia <17691182+jaychia@users.noreply.github.com>
jaychia
pushed a commit
to jaychia/ray
that referenced
this pull request
Mar 19, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Signed-off-by: Jay Chia <17691182+jaychia@users.noreply.github.com>
jaychia
pushed a commit
to jaychia/ray
that referenced
this pull request
Mar 19, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com> Signed-off-by: Jay Chia <17691182+jaychia@users.noreply.github.com>
jaychia
pushed a commit
to jaychia/ray
that referenced
this pull request
Mar 19, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Signed-off-by: Jay Chia <17691182+jaychia@users.noreply.github.com>
Drice1999
pushed a commit
to Drice1999/ray
that referenced
this pull request
Mar 23, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com>
Drice1999
pushed a commit
to Drice1999/ray
that referenced
this pull request
Mar 23, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
dhakshin32
pushed a commit
to dhakshin32/ray
that referenced
this pull request
Mar 27, 2025
…or not (ray-project#51158) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Ray core worker processes change the process title by calling `setproctitle` before and after executing tasks. Then, `verify_tasks_running_or_terminated` uses the following logic to check whether the Ray actor task is running or not: ```python if psutil.pid_exists(pid) and task_name in psutil.Process(pid).name(): ``` However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of `setproctitle` and `psutil.Process(pid).name()`. * `setproctitle`: * It calls functions like [prctl](https://github.com/dvarrazzo/py-setproctitle/blob/cacf96fafa3da1cd1a5b131b4f8b9997c01518d5/src/spt_status.c#L390) based on the OS and arch under the hood. It updates `/proc/pid/task/tid/comm` of the calling thread. * If the thread is the leader thread (i.e. main thread in Python), `/proc/pid/comm` will also be updated. * `/proc/pid/cmdline` and `/proc/pid/task/tid/cmdline` will always be updated based on my observation. * Note that `comm` has a length limitation (15 chars) * `psutil.Process(pid).name()` * https://github.com/giampaolo/psutil/blob/a17550784b0d3175da01cdb02cee1bc6b61637dc/psutil/__init__.py#L664-L693 * Try to retrieve `/proc/pid/comm`. * If `/proc/pid/comm` is truncated, attempt to retrieve the first element of `/proc/pid/cmdline`; otherwise, return `/proc/pid/comm`. * If `cmdline[0]` begins with the contents of `/proc/pid/comm`, return `cmdline[0]`; otherwise, return the truncated `/proc/pid/comm`. If the actor task is not running on the main thread (in which case `setproctitle` will not update `/proc/pid/comm`), and if users pass an `options.name` to the actor that doesn't start with `/proc/pid/comm`, then `task_name in psutil.Process(pid).name()` will evaluate to false, which is incorrect. ```python def test_default_thread_options_name_long_comm_2(self, ray_start_regular): """ `/proc/PID/comm` is truncated to 15 characters, so the process title is "ray::VeryLongCo". `psutil.Process.name()` doesn't return `cmdline()[0]` because `cmdline()[0]` doesn't start with "ray::VeryLongCo". This is the implementation detail of `psutil`. """ task_name = "hello" @ray.remote(concurrency_groups={"io": 1}) class VeryLongCommActor: def check_long_comm(self): pid = os.getpid() assert _is_actor_task_running(pid, task_name) # The first 15 characters of "ray::VeryLongCommActor" assert psutil.Process(pid).name() == "ray::VeryLongCo" assert psutil.Process(pid).cmdline()[0] == f"ray::{task_name}" return pid a = VeryLongCommActor.remote() pid = ray.get(a.check_long_comm.options(name=task_name).remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, task_name)) ``` ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: kaihsun <kaihsun@anyscale.com> Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>
dhakshin32
pushed a commit
to dhakshin32/ray
that referenced
this pull request
Mar 27, 2025
…1229) ray-project#51158 adds tests for process titles. However, the definition of "title" varies across different operating systems, and the behavior of `psutil.Process().name()` differs as well. ``` [ref]: https://github.com/dvarrazzo/py-setproctitle > The process title is usually visible in files such as /proc/PID/cmdline, /proc/PID/status, /proc/PID/comm, depending on the operating system and kernel version. This information is used by user-space tools such as ps and top. ``` For example, in Linux, the `/proc/pid/comm` can't be more than 15 chars. However, MacOS doesn't have `/proc/pid/comm`. In the following test, the name of the process in Mac is "python3.9", but the name on Linux is "ray::A.check". ```python def test_main_thread_short_comm(self, ray_start_regular): """ Test that the main thread's comm is not truncated. """ @ray.remote class A: def check(self): pid = os.getpid() assert _is_actor_task_running(pid, "check") assert psutil.Process(pid).name() == "ray::A.check" assert psutil.Process(pid).cmdline()[0] == "ray::A.check" return pid a = A.remote() pid = ray.get(a.check.remote()) wait_for_condition(lambda: not _is_actor_task_running(pid, "check")) ``` ## Related issue number Closes ray-project#51182 Closes ray-project#45966 On my Mac M3 <img width="1728" alt="image" src="https://github.com/user-attachments/assets/c8c0addf-9c18-402d-bafd-135ff8f6ea7a" /> Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
Ray core worker processes change the process title by calling
setproctitle
before and after executing tasks. Then,verify_tasks_running_or_terminated
uses the following logic to check whether the Ray actor task is running or not:However, the logic is not correct in some cases. To understand the issue, we need to understand the behavior of
setproctitle
andpsutil.Process(pid).name()
.setproctitle
:/proc/pid/task/tid/comm
of the calling thread./proc/pid/comm
will also be updated./proc/pid/cmdline
and/proc/pid/task/tid/cmdline
will always be updated based on my observation.comm
has a length limitation (15 chars)psutil.Process(pid).name()
/proc/pid/comm
./proc/pid/comm
is truncated, attempt to retrieve the first element of/proc/pid/cmdline
; otherwise, return/proc/pid/comm
.cmdline[0]
begins with the contents of/proc/pid/comm
, returncmdline[0]
; otherwise, return the truncated/proc/pid/comm
.If the actor task is not running on the main thread (in which case
setproctitle
will not update/proc/pid/comm
), and if users pass anoptions.name
to the actor that doesn't start with/proc/pid/comm
, thentask_name in psutil.Process(pid).name()
will evaluate to false, which is incorrect.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.