[Feedback] Azure Container Storage with ElasticSAN - csi driver issues during node upgrade #4863

arenk · 2025-03-14T10:49:08Z

Describe your scenario
I'm testing Azure Container Storage with ElasticSAN in an AKS cluster (currently in preview). In total, this AKS cluster has 120 PVCs/PVs backed by an ElasticSAN volume across 7 nodes. Everything has worked as expected over the last 4 weeks. However, on 3 or 4 occasions during a node image upgrade, the pod azurecontainerstorage-azuresan-csi-driver failed to start on the upgraded node; more precisely, it was the driver container in this pod that failed. In the Azure portal, I could see OOMKilled events for this pod during that time.

Feedback
Although I have not been able to fully investigate the issue, I'm not sure if the memory specs for this container are reasonable:

Even during normal operations, the driver container uses up to 100MiB, while the memory request in its spec is only 20MiB. For such a critical pod/container, these numbers might be too low.
I have no data on memory consumption during the startup phase, but the 512MiB limit might have been too low as well, at least in some instances.
Unfortunately, I can't reproduce this issue reliably, since most node upgrades complete without any problem.
If it happens, the only way to resolve the issue is to drain & delete this node and let the autoscaler create a new one. Restarting the pod does not help.
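To put the numbers above side by side, here is a minimal Python sketch that converts the three memory values into bytes and compares them. It assumes the figures map to the Kubernetes quantities "100Mi", "20Mi" and "512Mi"; the helper names `parse_memory` and `headroom` are hypothetical and only cover the binary suffixes needed here.

```python
# Binary suffixes used by Kubernetes memory quantities (subset only).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to a byte count."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as plain bytes


def headroom(usage: str, request: str, limit: str) -> dict:
    """Relate observed usage to the configured request and limit."""
    u, r, l = (parse_memory(q) for q in (usage, request, limit))
    return {
        "usage_over_request": u / r,    # >1 means usage exceeds the request
        "usage_share_of_limit": u / l,  # fraction of the limit already used
    }


# Values taken from the report: ~100MiB steady state, 20MiB request, 512MiB limit.
report = headroom("100Mi", "20Mi", "512Mi")
print(report)
```

With these inputs, steady-state usage is five times the request and roughly a fifth of the limit, so the OOMKilled events would have to come from a much larger spike during startup, which I have no measurements for.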

The last log lines visible for the driver container before it was killed were:

I0313 07:39:20.594194 158844 main.go:128] "msg"="Configured Azure API client" "logger"="setup" "subscription"="xxxxxxx"
I0313 07:39:20.594224 158844 main.go:153] "msg"="Reconfiguring multipath daemon" "logger"="setup" "numSessions"=32

I couldn't find any other issues in the logs of the other containers on this pod.


VybavaRamadoss · 2025-03-14T20:56:45Z

@yuemlu

jiarongjoyce · 2025-03-14T22:32:46Z

Hi @arenk, thanks for reaching out. Sorry to hear you're experiencing issues using your Elastic SAN with Azure Container Storage. Could you share the following information, so I can take a look at the logs on our end to see what could be going on?

Subscription ID
AKS cluster name
Region SAN is deployed in
Elastic SAN name (should be prefixed with "acstor-managed")
Volumes: If you go to your Elastic SAN in the Azure Portal, you can navigate to the Volumes blade. Select all the volumes and click connect. Expand the connect script and scroll down to the bottom. Copy all the lines that begin with "volume_data.append". That will provide the volume info I need to inspect the logs.
Estimated time frame in which these issues occurred.
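
If copying the lines by hand is tedious with 120 volumes, the filtering step can be sketched in Python, assuming the connect script has been saved as text. The function name `extract_volume_lines` and the sample content are hypothetical placeholders; the real script's contents are not reproduced here.

```python
def extract_volume_lines(script_text: str) -> list[str]:
    """Return every line that begins with 'volume_data.append'."""
    prefix = "volume_data.append"
    return [
        line.strip()
        for line in script_text.splitlines()
        if line.strip().startswith(prefix)  # ignore leading indentation
    ]


# Hypothetical placeholder text standing in for a saved connect script.
sample = """\
some_other_line = 1
volume_data.append(<placeholder-1>)
    volume_data.append(<placeholder-2>)
print("done")
"""

lines = extract_volume_lines(sample)
print("\n".join(lines))
```

In practice the sample string would be replaced by the contents of the saved script file.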

You can either share that information here, or email it to AskContainerStorage@microsoft.com.

Thanks!
Joyce

arenk added the Feedback General feedback label Mar 14, 2025

philwelz added the storage label Mar 14, 2025

microsoft-github-policy-service bot assigned AllenWen-at-Azure Mar 14, 2025
