About dcgmError_enum 94 - DCGM_FR_EUD_NON_ZERO_EXIT_CODE. Can anyone tell me what causes this error? #190

Closed
Pig255 opened this issue Sep 10, 2024 · 3 comments

Comments

Pig255 commented Sep 10, 2024

Details are as follows:

"warnings" : "GPU 4 EUD process exited with non-zero exit code. Please check the EUD logs in '/var/log/nvidia-dcgm/dcgm_eud.log', '/var/log/nvidia-dcgm/dcgm_eud_stdout.txt', and '/var/log/nvidia-dcgm/dcgm_eud_stderr.txt' Exit code: 139"

All GPUs report this error.

Pig255 commented Sep 11, 2024

Image

Pig255 commented Sep 11, 2024

/usr/share/nvidia/diagnostic/specializediag: line 12: Segmentation fault (core dumped)
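
For anyone reading along: the exit code 139 in the warning is consistent with this segfault. Under the common shell convention, a status above 128 means the process was killed by signal (status - 128), and 139 - 128 = 11, which is SIGSEGV on Linux. The snippet below is only a minimal sketch of that decoding; `decode_exit_code` is a hypothetical helper written for this thread, and it assumes the EUD wrapper reports its status using that shell convention.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a shell-style exit status (assumes 128 + N means 'killed by signal N')."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known to this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with status {code}"


# Exit code taken from the warning message in this issue.
print(decode_exit_code(139))
```

So the non-zero exit code is just the crash of the EUD binary itself, not a separate diagnostic result.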

@nikkon-dev
Collaborator

@Pig255,

The EUD component is currently not publicly available and is only accessible to a limited number of customers. If you have access to it, you should be able to report a bug at the NvOnline portal and provide all the aforementioned log files and details.

Please note that the DCGM team does not maintain the EUD component, so we are unable to provide significant assistance with its crashes.
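
If it helps with filing the report, the three log files named in the warning can be bundled into one archive before uploading. This is a rough sketch only: the paths are copied from the warning message in this issue, while `bundle_logs` and the archive name are hypothetical choices, not part of DCGM or the EUD. Reading `/var/log/nvidia-dcgm` may require elevated privileges.

```python
import tarfile
from pathlib import Path

# Paths copied from the warning message in this issue.
EUD_LOGS = [
    "/var/log/nvidia-dcgm/dcgm_eud.log",
    "/var/log/nvidia-dcgm/dcgm_eud_stdout.txt",
    "/var/log/nvidia-dcgm/dcgm_eud_stderr.txt",
]


def bundle_logs(paths, archive="dcgm_eud_logs.tar.gz"):
    """Pack the files from `paths` that exist into a gzip tarball.

    Returns the names of the files that were added; missing files are skipped.
    """
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in map(Path, paths):
            if path.is_file():
                # Store only the file name so the archive has a flat layout.
                tar.add(path, arcname=path.name)
                added.append(path.name)
    return added


# Example call (hypothetical archive name, written to the current directory):
# bundle_logs(EUD_LOGS)
```

Comparing the returned list against the three expected names shows which logs were missing on the node.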
