Describe your scenario
I'm testing Azure Container Storage with ElasticSAN in an AKS cluster (currently in preview). In total, this AKS cluster has 120 PVCs/PVs backed by an ElasticSAN volume across 7 nodes. Everything has been working fine and as expected over the last 4 weeks. However, on 3 or 4 occasions and during a node image upgrade, the pod azurecontainerstorage-azuresan-csi-driver failed to start up on the updated node - more precisely, it was the container driver on this pod. In the Azure portal, I could see OOMKilled events during this time related to this pod.
Feedback
Although I have not been able to fully investigate the issue, I'm not sure the memory specs for this container are reasonable:
Even during normal operations, the driver container uses up to 100MiB, while the memory request in the spec is only 20MiB. For such a critical pod/container, these numbers might be too low.
I have no data on memory consumption during the startup phase, but the 512MiB limit might have been too low as well, at least in some instances.
Unfortunately, I can't reproduce this issue reliably, since during most node upgrades everything just works fine.
If it happens, the only way to resolve the issue is to drain and delete the affected node and let the autoscaler create a new one. Restarting the pod does not help.
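For illustration, a resources stanza along these lines is what I'd consider more appropriate for the driver container. To be clear: the 20MiB request and 512MiB limit are the values I observed in my cluster, while the raised numbers below are only my suggestion based on the ~100MiB steady-state usage, not values validated by the product team (and since the add-on is managed, I don't expect to be able to edit this in place myself):

```yaml
# Hypothetical resources stanza for the "driver" container of
# azurecontainerstorage-azuresan-csi-driver. The raised values are a
# suggestion derived from observed usage, not shipped defaults.
resources:
  requests:
    memory: 128Mi   # was 20Mi; steady-state usage is ~100MiB
  limits:
    memory: 1Gi     # was 512Mi; startup spikes may exceed 512Mi
```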
The last log lines visible for the driver container before it was killed were:
Hi @arenk, thanks for reaching out. Sorry to hear you're experiencing issues using your Elastic SAN with Azure Container Storage. Could you share the following information, so I can take a look at the logs on our end to see what could be going on?
Subscription ID
AKS cluster name
Region SAN is deployed in
Elastic SAN name (should be prefixed with "acstor-managed")
Volumes: If you go to your Elastic SAN in the Azure Portal, you can navigate to the Volumes blade. Select all the volumes and click Connect. Expand the connect script and scroll down to the bottom, then copy all the lines that begin with "volume_data.append". That will provide the volume info I need to inspect the logs.
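If it helps, here is a small sketch for pulling those lines out of a locally saved copy of the connect script, so they can be pasted here in one go. The filename `connect_script.sh` is just an example; save the script from the portal under any name:

```python
import os

def extract_volume_lines(text: str) -> list[str]:
    """Return every line of the connect script that begins with
    'volume_data.append' (leading whitespace ignored)."""
    return [line.strip() for line in text.splitlines()
            if line.strip().startswith("volume_data.append")]

if __name__ == "__main__":
    # "connect_script.sh" is a hypothetical local path for the
    # connect script downloaded from the Azure Portal.
    if os.path.exists("connect_script.sh"):
        with open("connect_script.sh") as f:
            for line in extract_volume_lines(f.read()):
                print(line)
```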
I couldn't find any other issues in the logs of the other containers on this pod.