[Feedback] Azure Container Storage with ElasticSAN - csi driver issues during node upgrade #4863

Open
arenk opened this issue Mar 14, 2025 · 2 comments
Assignees
Labels
Feedback General feedback storage

Comments

arenk commented Mar 14, 2025

Describe your scenario
I'm testing Azure Container Storage with ElasticSAN in an AKS cluster (currently in preview). In total, this AKS cluster has 120 PVCs/PVs backed by an ElasticSAN volume across 7 nodes. Everything has been working as expected over the last 4 weeks. However, on 3 or 4 occasions during a node image upgrade, the pod azurecontainerstorage-azuresan-csi-driver failed to start on the updated node; more precisely, it was the driver container in this pod that failed. In the Azure portal, I could see OOMKilled events for this pod during this time.

Feedback
Although I have not been able to fully investigate the issue, I'm not sure if the memory specs for this container are reasonable:

  • Even during normal operations, the container driver uses up to 100MiB while the request in the specs is only 20MiB. For such a critical pod/container, these numbers might be too low.
  • I have no data on memory consumption during the startup phase, but the 512MiB limit might have been too low as well, at least in some instances.
  • Unfortunately, I can't reproduce this issue reliably, since most node upgrades complete without any problem.
  • When it happens, the only way to resolve the issue is to drain and delete the node and let the autoscaler create a new one. Restarting the pod does not help.
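
To put the memory figures from the list above in proportion, here is a minimal Python sketch. It assumes the values are written in Kubernetes-style binary notation ("20Mi", "512Mi"); the helper names `to_bytes` and `headroom` are made up for illustration and are not part of any Azure or Kubernetes tooling.

```python
# Binary unit suffixes as used in Kubernetes resource quantities.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a binary-suffixed quantity such as '20Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def headroom(request: str, limit: str, observed: str) -> dict:
    """Relate observed memory usage to the configured request and limit."""
    req, lim, obs = map(to_bytes, (request, limit, observed))
    return {
        "observed_over_request": obs / req,    # how many times the request is exceeded
        "observed_share_of_limit": obs / lim,  # fraction of the limit in use
        "request_covers_usage": req >= obs,
    }


# Figures reported in this issue: 20MiB request, 512MiB limit, ~100MiB in normal operation.
report = headroom(request="20Mi", limit="512Mi", observed="100Mi")
print(report)
```

With the reported numbers, steady-state usage is about five times the request while staying at roughly a fifth of the limit. This says nothing about the startup phase, for which no data is available.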

The last log lines visible for the driver container before it was killed were:

I0313 07:39:20.594194 158844 main.go:128] "msg"="Configured Azure API client" "logger"="setup" "subscription"="xxxxxxx"
I0313 07:39:20.594224 158844 main.go:153] "msg"="Reconfiguring multipath daemon" "logger"="setup" "numSessions"=32

I couldn't find any other issues in the logs of the other containers on this pod.
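
When comparing several failed startups, it can help to turn log lines like the two above into structured records. The following is a rough Python sketch that assumes the klog-style layout seen in those lines (severity letter, month and day, time, a numeric id, source location, then quoted key=value pairs). The regular expressions and the name `parse_line` are illustrative assumptions, not part of the driver.

```python
import re

# Header layout assumed from the two lines above: I0313 07:39:20.594194 158844 main.go:128] ...
HEADER = re.compile(
    r'^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread_id>\d+) '
    r'(?P<source>[^\]]+)\] (?P<rest>.*)$'
)
# Structured part: "key"="quoted value" or "key"=bare_value
PAIR = re.compile(r'"(?P<key>[^"]+)"=(?P<value>"[^"]*"|\S+)')


def parse_line(line: str):
    """Return a dict for one log line, or None if the header does not match."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    rest = entry.pop("rest")
    entry["fields"] = {p["key"]: p["value"].strip('"') for p in PAIR.finditer(rest)}
    return entry


log_lines = [
    'I0313 07:39:20.594194 158844 main.go:128] "msg"="Configured Azure API client" "logger"="setup" "subscription"="xxxxxxx"',
    'I0313 07:39:20.594224 158844 main.go:153] "msg"="Reconfiguring multipath daemon" "logger"="setup" "numSessions"=32',
]
entries = [parse_line(line) for line in log_lines]
for entry in entries:
    print(entry["time"], entry["source"], entry["fields"]["msg"])
```

All values are kept as strings, so a field such as numSessions would need an explicit conversion before any numeric comparison.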

@VybavaRamadoss
Member

@yuemlu

jiarongjoyce commented Mar 14, 2025

Hi @arenk, thanks for reaching out. Sorry to hear you're experiencing issues using your Elastic SAN with Azure Container Storage. Could you share the following information, so I can take a look at the logs on our end to see what could be going on?

  • Subscription ID
  • AKS cluster name
  • Region the Elastic SAN is deployed in
  • Elastic SAN name (should be prefixed with "acstor-managed")
  • Volumes: If you go to your Elastic SAN in the Azure Portal, you can navigate to the Volumes blade. Select all the volumes and click Connect. Expand the connect script and scroll down to the bottom. Copy all the lines that begin with "volume_data.append". That will provide the volume info I need to inspect the logs.
  • Estimated time frame in which these issues occurred.

You can either share that information here, or email it to AskContainerStorage@microsoft.com.
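
If the connect script has been saved as text, the lines requested in the Volumes item can be pulled out with a few lines of Python. This is only a sketch: the function name `extract_volume_lines` is made up, the sample input is a placeholder rather than real connect-script content, and treating indented lines as matches is an assumption.

```python
def extract_volume_lines(script_text: str, prefix: str = "volume_data.append") -> list[str]:
    """Return every line that starts with the prefix, ignoring leading whitespace."""
    return [
        line.strip()
        for line in script_text.splitlines()
        if line.lstrip().startswith(prefix)
    ]


# Placeholder text only; the arguments of the real lines are deliberately elided.
sample = """\
# placeholder content, not a real connect script
volume_data = []
volume_data.append(...)
    volume_data.append(...)
print("done")
"""

lines = extract_volume_lines(sample)
print("\n".join(lines))
```

For a real script, the text would be read from the saved file and the printed lines pasted into the reply or email.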

Thanks!
Joyce

5 participants