
static ip address is already allocated, but that container is already deleted #25422


Closed
boerniee opened this issue Feb 28, 2025 · 8 comments · Fixed by containers/common#2341
Labels
kind/bug Categorizes issue or PR as related to a bug.
network Networking related issue or feature

Comments

@boerniee

boerniee commented Feb 28, 2025

Issue Description

After rebooting the server, a rootless container spawned by a systemd service fails with the error: IP address is already allocated.

This was already reported in #24915 and #15708, but for me it only happens when lingering is enabled for the rootless user.
If I disable lingering and restart the PC, the container starts again without this error and gets the static IP address.

Attached are two logs: one with linger enabled (container startup fails) and one with linger disabled (container starts successfully once the user logs in via SSH).
boot-linger-disabled.log
boot-linger-enabled.log

Steps to reproduce the issue


  1. Install podman
  2. Enable lingering for the rootless user
  3. Enable the attached quadlets
  4. Reboot the system (a rough command sketch of these steps follows below)
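
A minimal sketch of those steps on a Debian host, assuming the quadlet files attached below are placed for the rootless user (package source and paths are assumptions, the exact setup may differ):

sudo apt install podman
loginctl enable-linger $USER
mkdir -p ~/.config/containers/systemd
cp traefik.network traefik.container traefik.pod ~/.config/containers/systemd/
systemctl --user daemon-reload
sudo reboot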

Describe the results you received

The container fails to start after the reboot with the following error:
Error: starting container e1231c28dcddb6f10e4fde3e080ecc2e6db8bc0f89f293179354caef859bd58c: IPAM error: requested ip address 172.21.0.2 is already allocated to container ID 3c9

Describe the results you expected

I would expect the container to start after boot with the predefined static IP, because it is not assigned to any other container.

podman info output

host:
  arch: amd64
  buildahVersion: 1.39.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.12-4_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: unknown'
  cpuUtilization:
    idlePercent: 99.42
    systemPercent: 0.42
    userPercent: 0.17
  cpus: 12
  databaseBackend: sqlite
  distribution:
    codename: trixie
    distribution: debian
    version: unknown
  eventLogger: journald
  freeLocks: 2045
  hostname: raimund
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.0-31-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 40896307200
  memTotal: 41777610752
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns_1.12.2-2_amd64
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark_1.12.1-9_amd64
    path: /usr/lib/podman/netavark
    version: netavark 1.12.1
  ociRuntime:
    name: crun
    package: crun_1.20-1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.20
      commit: 9c9a76ac11994701dd666c4f0b869ceffb599a66
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt_0.0~git20250217.a1e48a0-1_amd64
    version: ""
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.1-1+b1_amd64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 1024454656
  swapTotal: 1024454656
  uptime: 0h 1m 54.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/bernhard/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/bernhard/.local/share/containers/storage
  graphRootAllocated: 123886837760
  graphRootUsed: 2909896704
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/bernhard/.local/share/containers/storage/volumes
version:
  APIVersion: 5.4.0
  Built: 1739713871
  BuiltTime: Sun Feb 16 14:51:11 2025
  GitCommit: ""
  GoVersion: go1.24.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

I have installed Debian 12.9 with the unstable repo enabled but its Pin-Priority set to 100, so that only podman is installed from the unstable repo, because the podman version in Debian stable does not support quadlets.

Additional information

Happens only when lingering is enabled

Quadlet files to reproduce:

traefik.network

[Network]
Driver=bridge
IPv6=true
Subnet=172.21.0.0/16
Subnet=fd00:dead:beef::/48

traefik.container

[Container]
Image=docker.io/traefik:latest
Pod=traefik.pod

traefik.pod

[Pod]
PublishPort=1050:80
Network=traefik.network:ip=172.21.0.2
PodmanArgs=--log-level debug

[Install]
WantedBy=multi-user.target default.target

@boerniee boerniee added the kind/bug Categorizes issue or PR as related to a bug. label Feb 28, 2025
@boerniee
Author

I first tried it with podman version 5.3.2 but have the same issue with 5.4.0. I adapted the first post with the new data for 5.4.0.

@Luap99
Member

Luap99 commented Feb 28, 2025

Do you know who the 3c95848ced852a3d5be5159bed348d0fbb8ff18e14d930a953cb72b2021f324a ID belonged to? Do you find that one anywhere in the journal?
Is this actually the first time the unit starts after boot? The IPAM db is on tmpfs, so it should be practically impossible for anything to already be allocated unless something else already ran before the pod was started in your log.
Is this pod the only user of the traefik network? If there is another container that has no static IP allocated, it will pick the first free address it finds in the network, which would be 172.21.0.2, so when the pod is started afterwards it would fail because of this.
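
One way to check which container currently holds the address (a rough sketch; <container> stands for the name or ID of each container attached to the network):

podman ps -a
podman inspect <container> --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'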

@Luap99 Luap99 added the network Networking related issue or feature label Feb 28, 2025
@boerniee
Author

boerniee commented Feb 28, 2025

Do you know who the 3c95848ced852a3d5be5159bed348d0fbb8ff18e14d930a953cb72b2021f324a ID belonged to?

I don't know what you mean by that. I can't find that one in the logs 🤔

Is this actually the first time the unit starts after boot?

Yes, I rebooted the machine. Once I log in I can already see that the status of the pod is failed, and from the timestamps in the logs I can see that the pod failed to start before I logged in.

Is this pod the only user of the traffic network?

Yes, during the tests this was the only pod in that network. I only used the quadlets attached above. I also suspected some other container got the IP instead of this one; that's why I tested with only a single container at all.

Also, it looks like if I enable linger first and then install podman, there is no problem. At least it has been running for a few hours and restarts without a problem.

@Luap99
Member

Luap99 commented Feb 28, 2025

Do you know who the 3c95848ced852a3d5be5159bed348d0fbb8ff18e14d930a953cb72b2021f324a ID belonged to?

I don't know what you mean by that. I can't find that one in the logs 🤔

Yeah, just check the full journal. Podman should log all events to the journal, so the ID from the error that is already assigned must exist (or have exited at some prior point). It would be good to know what kind of container that was, to rule out that there is another container requesting the address.

You can also read the events with:
podman events --since 5m --until 0s
Change the --since time to roughly before you booted the machine.

@boerniee
Author

Ahh, now I've got it.
In the journalctl log I find the error again:
Feb 28 19:02:50 raimund traefik-pod[1201]: Error: starting container f33cfc49f91e1eace5b7df4ae4b5bdad267eaf2e68bfc3e258f94795efce5be2: IPAM error: requested ip address 172.21.0.2 is already allocated to container ID cd028c1693e14c1e2d2f5c4fc55fad9ff8c673320c8f06657f021bfb94813ecb
And from the podman event log I can see that it tries to spin up that container several times after the reboot, and that the IP is assigned to the first container during that:

2025-02-28 19:02:20.848801245 +0100 CET pod stop 2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb (image=, name=systemd-traefik)
2025-02-28 19:02:21.700513855 +0100 CET container cleanup 424f524431ec155c11d60151c9d06a9529a71fde4fc24e239af3d5e22f5b44d1 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb, io.buildah.version=1.38.1, PODMAN_SYSTEMD_UNIT=traefik-pod.service)
2025-02-28 19:02:21.709004599 +0100 CET pod stop 2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb (image=, name=systemd-traefik)
2025-02-28 19:02:21.809290351 +0100 CET pod stop 2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb (image=, name=systemd-traefik)
2025-02-28 19:02:22.86438961 +0100 CET container remove 424f524431ec155c11d60151c9d06a9529a71fde4fc24e239af3d5e22f5b44d1 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:22.868384655 +0100 CET pod remove 2065722cb3b563b10a9013734b432e28b7b5ec8d44f10254c1ab2c7309e0e9fb (image=, name=systemd-traefik)
2025-02-28 19:02:50.041149988 +0100 CET system refresh
2025-02-28 19:02:50.163062391 +0100 CET container create cd028c1693e14c1e2d2f5c4fc55fad9ff8c673320c8f06657f021bfb94813ecb (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=f6c4fbf4bb3f989b7d4a641f7f702ec4ff2afe713ef47028070fc3014f8bcf98, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.166389538 +0100 CET pod create f6c4fbf4bb3f989b7d4a641f7f702ec4ff2afe713ef47028070fc3014f8bcf98 (image=, name=systemd-traefik)
2025-02-28 19:02:50.302480493 +0100 CET pod stop f6c4fbf4bb3f989b7d4a641f7f702ec4ff2afe713ef47028070fc3014f8bcf98 (image=, name=systemd-traefik)
2025-02-28 19:02:50.406200398 +0100 CET container remove cd028c1693e14c1e2d2f5c4fc55fad9ff8c673320c8f06657f021bfb94813ecb (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=f6c4fbf4bb3f989b7d4a641f7f702ec4ff2afe713ef47028070fc3014f8bcf98, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.413477199 +0100 CET pod remove f6c4fbf4bb3f989b7d4a641f7f702ec4ff2afe713ef47028070fc3014f8bcf98 (image=, name=systemd-traefik)
2025-02-28 19:02:50.51995611 +0100 CET container create b285ae815ca6309c6e5caf0c08da22a3bbf591832ca0867c83fa08d14aaaa182 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=0b05e1bc5810e09b955b515ee06828c81f3de0a8ddd92c92c519ae86ff4ee10a, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.523852293 +0100 CET pod create 0b05e1bc5810e09b955b515ee06828c81f3de0a8ddd92c92c519ae86ff4ee10a (image=, name=systemd-traefik)
2025-02-28 19:02:50.598741832 +0100 CET pod stop 0b05e1bc5810e09b955b515ee06828c81f3de0a8ddd92c92c519ae86ff4ee10a (image=, name=systemd-traefik)
2025-02-28 19:02:50.695611256 +0100 CET container remove b285ae815ca6309c6e5caf0c08da22a3bbf591832ca0867c83fa08d14aaaa182 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=0b05e1bc5810e09b955b515ee06828c81f3de0a8ddd92c92c519ae86ff4ee10a, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.707000765 +0100 CET pod remove 0b05e1bc5810e09b955b515ee06828c81f3de0a8ddd92c92c519ae86ff4ee10a (image=, name=systemd-traefik)
2025-02-28 19:02:50.804475722 +0100 CET container create f33cfc49f91e1eace5b7df4ae4b5bdad267eaf2e68bfc3e258f94795efce5be2 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=b2a431ad556b8f026d47e1bbe0fc5d107fd5f488c7b9d0890bdfbac5ae68dd2d, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.807763864 +0100 CET pod create b2a431ad556b8f026d47e1bbe0fc5d107fd5f488c7b9d0890bdfbac5ae68dd2d (image=, name=systemd-traefik)
2025-02-28 19:02:50.874889972 +0100 CET pod stop (image=, name=systemd-traefik)
2025-02-28 19:02:50.973270725 +0100 CET container remove f33cfc49f91e1eace5b7df4ae4b5bdad267eaf2e68bfc3e258f94795efce5be2 (image=localhost/podman-pause:5.3.2-1737979078, name=systemd-traefik-infra, pod_id=b2a431ad556b8f026d47e1bbe0fc5d107fd5f488c7b9d0890bdfbac5ae68dd2d, PODMAN_SYSTEMD_UNIT=traefik-pod.service, io.buildah.version=1.38.1)
2025-02-28 19:02:50.979721789 +0100 CET pod remove b2a431ad556b8f026d47e1bbe0fc5d107fd5f488c7b9d0890bdfbac5ae68dd2d (image=, name=systemd-traefik)

Now I also see the root cause for the failing start:
Feb 28 19:02:50 raimund pasta[1109]: Couldn't set IPv4 route(s) in guest: Invalid argument
Feb 28 19:02:50 raimund traefik-pod[1094]: Error: starting container cd028c1693e14c1e2d2f5c4fc55fad9ff8c673320c8f06657f021bfb94813ecb: setting up Pasta: pasta failed with exit code 1:
Feb 28 19:02:50 raimund traefik-pod[1094]: Couldn't set IPv4 route(s) in guest: Invalid argument

With this info I will now try the workaround mentioned in issue #22197. Thank you for your very fast reply and assistance in helping me find the root cause 🌝

@Luap99
Member

Luap99 commented Mar 3, 2025

The other issue is closed and it only describes the symptom of why an early start of the unit fails.

Here there clearly seems to be an issue with allocating an IP address but never freeing it again on the error path, which means the allocation got leaked forever. That is a real bug; the next restart should not fail because of this.

@boerniee
Author

boerniee commented Mar 5, 2025

Thank you for fixing it that fast @Luap99!

Besides the IP reassignment problem there seems to be another problem on my machine, because pasta is called with a wrong parameter. Most likely this isn't a bug, but I don't know what it is.
For now I've got it working on every restart by adding this to the files:
ExecStartPre=timeout 30 sh -c 'while [ "$(cat /sys/class/net/enp114s0/operstate)" != "up" ]; do sleep 1; done'
So for me it looks like the default added
Wants=podman-user-wait-network-online.service
After=podman-user-wait-network-online.service
is not working correctly.
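
A rough sketch of where such a line can go, using quadlet's [Service] passthrough in traefik.pod (enp114s0 is my host interface and needs to be adjusted for other machines):

[Service]
# Wait up to 30 seconds for the host interface to report "up" before the pod is started.
ExecStartPre=timeout 30 sh -c 'while [ "$(cat /sys/class/net/enp114s0/operstate)" != "up" ]; do sleep 1; done'
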
Do you have a tip for that? Done this way it feels like a dirty workaround 🙈

@Luap99
Member

Luap99 commented Mar 5, 2025

Unfortunately the workarounds are all horrible. Even that podman workaround is far from perfect, see #24796. Most likely in your case it means that network-online.target is ready even before your actual network is fully online.
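
To confirm that, you could compare when the wait service finished against when the pod unit started, for example (a sketch; both are user units here):

journalctl --user -b -u podman-user-wait-network-online.service
journalctl --user -b -u traefik-pod.service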
