
[BUG] Private cluster in spoke VNET with custom DNS in hub VNET tries to join the private DNS zone, linked to hub VNET, to the spoke one #4841

Open
xi4n opened this issue Mar 6, 2025 · 3 comments

xi4n commented Mar 6, 2025

Describe the bug
Let's say I have a classic hub-and-spoke network topology and want to create an AKS private cluster in a subnet of the spoke VNET s. I use a custom DNS server in the hub VNET h and, following Azure best practices, the private DNS zones are linked only to the hub VNET h, including the privatelink.<my_region>.azmk8s.io zone, which already exists prior to the creation of the cluster. The spoke VNET s is configured to use the custom DNS server in h to resolve DNS requests.

Now, when I create the private cluster by providing it the private DNS zone ID of privatelink.<my_region>.azmk8s.io, granting it the Private DNS Zone Contributor role on that private DNS zone, and granting it the Contributor role on the node pool subnets (not on the whole VNET s), all of which reside in the VNET s, the cluster creation fails with an error. I create the cluster with Terraform, but I don't believe the error is related to Terraform.
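For reference, the role assignments described above would look roughly like this in Terraform (a minimal sketch under my own naming assumptions; azurerm_user_assigned_identity.aks and the zone/subnet variables are placeholders, not my exact configuration):

# Sketch only: grant the cluster's control-plane UAMI the two permissions mentioned above.
resource "azurerm_role_assignment" "aks_private_dns_zone_contributor" {
  scope                = var.private_dns_zone_id          # privatelink.<my_region>.azmk8s.io
  role_definition_name = "Private DNS Zone Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_role_assignment" "aks_node_pool_subnets" {
  for_each             = toset(var.node_pool_subnet_ids)  # subnets in the spoke VNET s, not the whole VNET
  scope                = each.value
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}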

Expected behavior
The private cluster should accept the hub VNET as the single source of DNS resolution when it creates the private endpoint in the spoke VNET, because that is what Azure suggests we do.

Current behavior
The private cluster tries to link the private DNS zone I gave it to the spoke VNET, if it is not already linked. If it does not have enough permissions (because I only granted permissions at the subnet level, not at the VNET level in s), it throws an error.

A workaround
I could link the private DNS zone to both my hub and spoke VNETs, which would solve the problem during creation, but this is not how it should be, because the spoke VNET link will never be used.
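For illustration, the extra spoke link would look roughly like this in Terraform (a minimal sketch; the resource name and variables are placeholders I introduce here, not part of my actual configuration):

# Sketch only: link the existing private DNS zone to the spoke VNET, purely to let cluster creation succeed.
resource "azurerm_private_dns_zone_virtual_network_link" "spoke" {
  name                  = "aks-spoke-link"
  resource_group_name   = var.private_dns_zone_resource_group_name
  private_dns_zone_name = var.private_dns_zone_name   # privatelink.<my_region>.azmk8s.io
  virtual_network_id    = var.spoke_vnet_id            # VNET s; this link is never actually used for resolution
  registration_enabled  = false
}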

Did I use any preview features?
Some AKS preview features are enabled in the subscription of the spoke VNET. However, I was not using any preview features related to this bug; in particular, I was not using API server VNET integration, and AKS was supposed to create a private endpoint for me to access the control plane.

To Reproduce
You can just use the most recent Terraform azurerm_kubernetes_cluster resource with two user-assigned managed identities, one for the cluster and one for kubelet; the network is CNI overlay + Cilium. You need to provide:

dns_prefix_private_cluster = "something-you-like"
private_cluster_enabled = true
private_dns_zone_id = var.private_dns_zone_id # id of the private DNS zone `privatelink.<my_region>.azmk8s.io`
private_cluster_public_fqdn_enabled = false
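
Put together, a minimal cluster resource with these settings might look like the following (a hedged sketch, not my exact configuration; the name, VM size, identity resources, and subnet variable are assumptions for illustration):

resource "azurerm_kubernetes_cluster" "this" {
  name                = "aks-spoke-private"
  location            = var.location
  resource_group_name = var.resource_group_name

  dns_prefix_private_cluster          = "something-you-like"
  private_cluster_enabled             = true
  private_dns_zone_id                 = var.private_dns_zone_id   # privatelink.<my_region>.azmk8s.io
  private_cluster_public_fqdn_enabled = false

  default_node_pool {
    name           = "system"
    vm_size        = "Standard_D4s_v5"
    node_count     = 1
    vnet_subnet_id = var.system_subnet_id                         # node pool subnet in the spoke VNET s
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]        # cluster (control-plane) identity
  }

  kubelet_identity {
    client_id                 = azurerm_user_assigned_identity.kubelet.client_id
    object_id                 = azurerm_user_assigned_identity.kubelet.principal_id
    user_assigned_identity_id = azurerm_user_assigned_identity.kubelet.id
  }

  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"                               # CNI overlay
    network_data_plane  = "cilium"                                # Cilium data plane (azurerm 4.x attribute)
  }
}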

Environment (please complete the following information):

  • Terraform v1.9.4 with azurerm 4.20, but again, I don't think this is Terraform-related
  • Kubernetes version 1.31.5
@asifkd012020
Copy link

@xi4n - this is by-design behaviour; see the hub-and-spoke private AKS documentation: https://learn.microsoft.com/en-us/azure/aks/private-clusters?tabs=default-basic-networking%2Cazure-portal#hub-and-spoke-with-custom-dns

The private DNS zone is linked only to the VNet that the cluster nodes are attached to (3). This means that the private endpoint can only be resolved by hosts in that linked VNet. In scenarios where no custom DNS is configured on the VNet (default), this works without issue as hosts point at 168.63.129.16 for DNS that can resolve records in the private DNS zone because of the link.


xi4n commented Mar 13, 2025

Hi @asifkd012020,

I fully understand that linking the Private DNS Zone to the spoke VNET is the default behaviour when no custom DNS server is configured. I don't have a problem with that, but rather with the fact that even when I bring my own Private DNS Zone with another DNS solution, which should already work fine for the cluster, AKS will still forcibly try to link it to the spoke VNET. Let me cite the same documentation:

In scenarios where the VNet containing your cluster has custom DNS settings (4), cluster deployment fails unless the private DNS zone is linked to the VNet that contains the custom DNS resolvers (5). This link can be created manually after the private zone is created during cluster provisioning or via automation upon detection of creation of the zone using event-based deployment mechanisms (for example, Azure Event Grid and Azure Functions). To avoid cluster failure during initial deployment, the cluster can be deployed with the private DNS zone resource ID.

In the scenario of a BYO Private DNS Zone, this paragraph is somewhat misleading for a first-time reader, who would think: if I bring my own Private DNS Zone (as described in the last sentence), AKS will leave the VNET linking work (onto the hub VNET, as described in the first sentence) to me, and everything will pass. Unfortunately, that is not the case: as explained in the OP, AKS will still try to link the BYO Private DNS Zone to the spoke VNET and fail if it can't. This is, first, not necessary, and second, it leaves us with two options:

  1. Grant the cluster's UAMI the Contributor role on the whole VNET in which the cluster will live, so that the cluster can do the superfluous linking on its own (see the sketch after this list). This is not consistent with the principle of least privilege, because apart from this, all the cluster identity needs is the Contributor role on the subnets, not on the VNET.
  2. Link the Private DNS Zone both to the hub VNET (so that the Private DNS Zone really does its work) AND to the spoke VNET (just to make the cluster creation pass) beforehand. This doesn't sound like a solution but like a workaround.
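
For illustration, option 1 amounts to something like the following role assignment at VNET scope (a hedged sketch; the identity and VNET references are placeholders I introduce here):

# Sketch only: Contributor on the whole spoke VNET, far broader than the subnet-scoped roles the cluster actually needs.
resource "azurerm_role_assignment" "aks_spoke_vnet_contributor" {
  scope                = var.spoke_vnet_id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}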

In short, in the scenario of a BYO Private DNS Zone, I would expect AKS to leave the VNET linking work to the user as well, or at least make it optional, because after all, a client who brings their own Private DNS Zone is supposed to know what they are doing and is probably working with a hub-and-spoke network / multi-cluster setup.

@asifkd012020

@xi4n - yes, that's expected behaviour. We have multiple AKS clusters on multiple subnets in a spoke VNet. The first time, it does create a VNet link to the zone, even though we have a separate VNet link from the custom DNS servers' hosted zone that's connected to the vWAN hub. However, I agree that MS has not clearly specified this in the documentation. It should say: a virtual network link from the spoke VNet to the AKS private DNS zone is created to ensure that the AKS cluster's nodes can reliably resolve the private FQDN of the control plane, regardless of external DNS configurations like those in your hub VNet.
