Bind mounts in RUN statements are not propagating changes to their source correctly #5951

Closed
solacelost opened this issue Jan 28, 2025 · 17 comments

@solacelost

I have observed strange behavior lately with bind mounts misbehaving in RUN statements and it's causing major issues with some of my workflows. I created a minimal reproducer to highlight the issue.

$ ls -halF src
total 0
drwxr-xr-x. 2 james james 21 Jan 28 11:31 ./
drwxr-xr-x. 3 james james 38 Jan 28 11:34 ../
-rw-r--r--. 1 james james  0 Jan 28 11:31 outside
$ cat Containerfile
FROM registry.fedoraproject.org/fedora:41

RUN --mount=type=bind,src=./src,rw=true,target=/src \
    ls -halF /src && \
    touch /src/inside && \
    ls -halF /src

RUN --mount=type=bind,src=./src,rw=true,target=/src \
    ls -halF /src
$ podman build --security-opt=label=disable .
STEP 1/3: FROM registry.fedoraproject.org/fedora:41
STEP 2/3: RUN --mount=type=bind,src=./src,rw=true,target=/src     ls -halF /src &&     touch /src/inside &&     ls -halF /src
total 0
drwxr-xr-x. 1 root root  6 Jan 28 16:35 ./
dr-xr-xr-x. 1 root root 39 Jan 28 16:35 ../
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
total 0
drwxr-xr-x. 1 root root 20 Jan 28 16:35 ./
dr-xr-xr-x. 1 root root 39 Jan 28 16:35 ../
-rw-r--r--. 1 root root  0 Jan 28 16:35 inside
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
--> 18d41e10acde
STEP 3/3: RUN --mount=type=bind,src=./src,rw=true,target=/src     ls -halF /src
total 0
drwxr-xr-x. 1 root root  6 Jan 28 16:35 ./
dr-xr-xr-x. 1 root root 28 Jan 28 16:35 ../
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
COMMIT
--> 2e663a8d9c25
2e663a8d9c2524996127de6619b6b196c604274b3ec7bda1c283b558b877677a
$ ls -halF src
total 0
drwxr-xr-x. 2 james james 21 Jan 28 11:31 ./
drwxr-xr-x. 3 james james 38 Jan 28 11:34 ../
-rw-r--r--. 1 james james  0 Jan 28 11:31 outside

Within the RUN step that performs the bind mount, the changes are visible. Beyond that step, they are gone: neither the host directory nor any later step that re-mounts the same source sees them.

I tried to track down the cause, but rapidly ran out of my depth of understanding for how buildah handles bind mounts.

podman info output:

host:
  arch: amd64
  buildahVersion: 1.38.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-3.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 98.11
    systemPercent: 0.57
    userPercent: 1.32
  cpus: 32
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    version: "41"
  eventLogger: journald
  freeLocks: 2048
  hostname: ws.jharmison.com
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.12.10-200.fc41.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 54220419072
  memTotal: 67307999232
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc41.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.1-1.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.1
  ociRuntime:
    name: crun
    package: crun-1.19.1-1.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.19.1
      commit: 3e32a70c93f5aa5fea69b50256cca7fd4aa23c80
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20250121.g4f2c8e7-2.fc41.x86_64
    version: |
      pasta 0^20250121.g4f2c8e7-2.fc41.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 1h 51m 32.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /var/home/james/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/james/.local/share/containers/storage
  graphRootAllocated: 1955146625024
  graphRootUsed: 198258196480
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/james/.local/share/containers/storage/volumes
version:
  APIVersion: 5.3.2
  Built: 1737504000
  BuiltTime: Tue Jan 21 19:00:00 2025
  GitCommit: ""
  GoVersion: go1.23.4
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.2
@flouthoc
Collaborator

Hi @solacelost, I think this happens because buildah reused the previous layer. A recently merged PR, #5691, should fix it.

Could you try the code from upstream?

@solacelost
Author

The issue that I'm seeing is that the files actually do not change at all in the bind source, not that there's some invalid cache being used. Is buildah caching the bind-mounted directory, and only propagating those changes back to the source in the event of a change? That seems unexpected, compared to how a bind mount works on a running container instance.

@solacelost
Author

solacelost commented Jan 28, 2025

Freshly compiled buildah from main:

$ ../buildah/bin/buildah version
Version:         1.39.0-dev
Go Version:      go1.23.5
Image Spec:      1.1.0
Runtime Spec:    1.2.0
CNI Spec:        1.1.0
libcni Version:  v1.2.3
image Version:   5.34.1-dev
Git Commit:      042414a056bf3a632b0696fb25d48750578d1ed8
Built:           Tue Jan 28 13:20:15 2025
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64
$ ../buildah/bin/buildah bud --security-opt=label=disable .
STEP 1/3: FROM registry.fedoraproject.org/fedora:41
STEP 2/3: RUN --mount=type=bind,src=./src,target=/src,rw     ls -halF /src &&     touch /src/inside &&     ls -halF /src
total 0
drwxr-xr-x. 1 root root  6 Jan 28 18:22 ./
dr-xr-xr-x. 1 root root 39 Jan 28 18:22 ../
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
total 0
drwxr-xr-x. 1 root root 20 Jan 28 18:22 ./
dr-xr-xr-x. 1 root root 39 Jan 28 18:22 ../
-rw-r--r--. 1 root root  0 Jan 28 18:22 inside
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
STEP 3/3: RUN --mount=type=bind,src=./src,target=/src,rw     ls -halF /src
total 0
drwxr-xr-x. 1 root root  6 Jan 28 18:22 ./
dr-xr-xr-x. 1 root root 39 Jan 28 18:22 ../
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
COMMIT
Getting image source signatures
Copying blob c9636add9084 skipped: already exists
Copying blob 01796b5e4779 done   |
Copying config 723df8bde5 done   |
Writing manifest to image destination
--> 723df8bde5b6
723df8bde5b60448cca7754ee3508c9adcd89ff86a35200407cdbcaa8d4a1886

Edited to provide more version information

@flouthoc
Collaborator

I'll take a look at this issue.

@flouthoc flouthoc self-assigned this Jan 28, 2025
@solacelost
Author

Some additional context to show the kinds of things this will impact:
https://gitlab.com/fedora/bootc/base-images/-/blob/main/Containerfile?ref_type=heads

@flouthoc
Collaborator

@solacelost Okay, my bad, I misread the original post.

I think this is expected behavior. RUN --mount is supposed to be transient and to last only for one particular RUN instruction. Changes made under one RUN --mount instruction cannot be propagated to another layer directly; you need to copy them manually.

If you want changes on the mount to persist, use --volume with the :O flag. That way the changes will be available to all layers but will not propagate back to the original volume on the host.

@solacelost
Author

That is not true. A bind mount should not be a transient volume, it should be a bind mount of a directory from the host into the runtime. Setting it to rw should make that bind mount writable, meaning you can write to that directory and have the changes persist on the host. This was the behavior in buildah as recently as ~2 weeks ago, and that's why the Fedora bootc project relies on that behavior in the Containerfile I linked to be able to publish an OCI archive to the building host's disk, before consuming it in a later layer.

@solacelost
Author

Here, I've gone and recompiled 1.37.5 to demonstrate.

$ ../buildah/bin/buildah version
Version:         1.37.5
Go Version:      go1.23.5
Image Spec:      1.1.0
Runtime Spec:    1.2.0
CNI Spec:        1.1.0
libcni Version:  v1.2.3
image Version:   5.32.2
Git Commit:      5fd40b989860984a00f6fc1539ff53caceca1325
Built:           Tue Jan 28 18:32:12 2025
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64
$ ../buildah/bin/buildah bud --security-opt=label=disable .
STEP 1/3: FROM registry.fedoraproject.org/fedora:41
STEP 2/3: RUN --mount=type=bind,src=./src,target=/src,rw     ls -halF /src &&     touch /src/inside &&     ls -halF /src
total 0
drwxr-xr-x. 2 root root 21 Jan 28 16:31 ./
dr-xr-xr-x. 1 root root 39 Jan 28 23:33 ../
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
total 0
drwxr-xr-x. 2 root root 35 Jan 28 23:33 ./
dr-xr-xr-x. 1 root root 39 Jan 28 23:33 ../
-rw-r--r--. 1 root root  0 Jan 28 23:33 inside
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
STEP 3/3: RUN --mount=type=bind,src=./src,target=/src,rw     ls -halF /src
total 0
drwxr-xr-x. 2 root root 35 Jan 28 23:33 ./
dr-xr-xr-x. 1 root root 39 Jan 28 23:33 ../
-rw-r--r--. 1 root root  0 Jan 28 23:33 inside
-rw-r--r--. 1 root root  0 Jan 28 16:31 outside
COMMIT
Getting image source signatures
Copying blob c9636add9084 skipped: already exists
Copying blob 63b4685fd8dc done   |
Copying config d829ef4e9c done   |
Writing manifest to image destination
--> d829ef4e9c50
d829ef4e9c50a06e8a8a1896c794e9cb2cea34638a439ef2b8a0829e7b9e500c

This is the intended, functional behavior. It's almost as if in the current release buildah is using an overlayfs on top of my bind mount and preventing the changes from reaching the host filesystem, as they're supposed to.

@solacelost
Author

To make sure that we're clear here, I do not expect the container image layer to contain inside. I expect that file to be written to the host filesystem at the bind source, which is ./src relative to the build context directory, and I expect those changes to live beyond the layer. I expect the layer to have no filesystem changes (except for the other changes that I've come to expect, like /run as mentioned in #5950 ), but to only have the metadata of the layer using ls and touch in a RUN statement. In a multi-stage build, particularly one that uses container tooling to build an OCI archive (like rpm-ostree in the linked example), it doesn't matter that the layer has no changes - it does matter that those changes get written to the host filesystem, which is why we did a bind mount to it.

@solacelost
Author

I went looking for an exact reference for this behavior, since I and others have been relying on it for a while, but I couldn't find anything in the existing buildah docs that spells out what is expected. So I did what Dan always recommends and looked at docker/moby. Interestingly, their docs say this about the rw option for a bind mount in a RUN statement:

Allow writes on the mount. Written data will be discarded.

So... that would be in line with the buildah behavior as of 1.38.1 or so, but not in line with how it has been behaving previously.
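
To make the quoted semantics concrete, here is a rough simulation that uses only plain directories. It does not call buildah or Docker and says nothing about how either one implements the mount. `run_with_discarded_writes` is a made-up helper, and copying into a scratch directory merely stands in for whatever mechanism actually discards the writes.

```shell
# Simulation only: plain directories and cp stand in for the real mount handling.
work=$(mktemp -d)
mkdir "$work/src"
touch "$work/src/outside"

# Hypothetical helper: run a command against a private, throwaway copy of the
# source directory, mimicking "Allow writes on the mount. Written data will
# be discarded."
run_with_discarded_writes() {
    scratch=$(mktemp -d)
    cp -a "$work/src/." "$scratch/"
    (cd "$scratch" && "$@")
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# First "RUN": the write succeeds and is visible within the step.
run_with_discarded_writes sh -c 'touch inside && ls'
# Second "RUN": starts again from the unmodified source.
run_with_discarded_writes ls
# The host-side source never saw the write.
ls "$work/src"
```

The first step lists both files, while the second step and the final host-side listing show only `outside`, which matches the shape of the 1.38.1 output in the reproducer.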

It looks like a multi-stage build will be physically impossible with an intermediate OCI archive for rpm-ostree composes, then?

@flouthoc
Collaborator

@solacelost Yes, buildah should behave like docker/buildkit for these instructions. If you want the specific behavior you were relying on, we could treat it as a feature request and add a new flag, as in RUN --mount=type=bind,src="",target="",<newflag>. Alternatively, you can start using --volume with :O, which already exists.

@solacelost
Author

The overlay option flag won't persist the changes to the host source, though, will it? It's my understanding that using --volume this way is deliberately intended to provide an overlay upper, leaving the host directory lowerdir unmodified. That's not the intent with the way I was using it previously, which was to deliberately modify the host filesystem as part of the build.

Perhaps a new flag would be appropriate, or we can simply move away from multi-stage Containerfiles for all-in-one rpm-ostree composes and move to a multi-phased approach instead. I've already done so, since this change, with one Containerfile creating the OCI archive and then a pipeline-driven step to extract it from the image before using the archive in FROM in a dedicated, separate Containerfile.
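
A minimal sketch of that multi-phased approach, assuming the first Containerfile leaves the archive at /out.ociarchive inside the image. Every name here (Containerfile.compose, Containerfile.final, the image tags, the container name, and the archive path) is hypothetical and not taken from my actual pipeline. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# All file names, tags, and paths below are hypothetical placeholders.
compose_and_build() {
    # Phase 1: build an image whose filesystem contains the OCI archive.
    run podman build --security-opt=label=disable -t localhost/compose -f Containerfile.compose .
    # Phase 2: copy the archive out of that image onto the host.
    # An explicit command (true) is passed in case the image defines no CMD.
    run podman create --name compose-extract localhost/compose true
    run podman cp compose-extract:/out.ociarchive ./out.ociarchive
    run podman rm compose-extract
    # Phase 3: a separate Containerfile that starts with
    # FROM oci-archive:./out.ociarchive
    run podman build --security-opt=label=disable -t localhost/final -f Containerfile.final .
}

compose_and_build
```

The extraction step sits outside any Containerfile, so it does not depend on writes to a RUN bind mount reaching the host.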

I'm going to see what @cgwalters thinks :)

@cgwalters

> I've already done so, since this change, with one Containerfile creating the OCI archive and then a pipeline-driven step to extract it from the image before using the archive in FROM in a dedicated, separate Containerfile.

That's what's done in https://gitlab.com/fedora/bootc/base-images/-/blob/main/Containerfile?ref_type=heads and since it sounds like you're doing custom builds, be sure to track https://gitlab.com/fedora/bootc/tracker/-/issues/32

@solacelost
Author

solacelost commented Jan 29, 2025

The way it's working in that Containerfile will no longer work, as it doesn't modify the host filesystem in the bind mount to /buildcontext - meaning that the FROM oci-archive:./out.ociarchive line will fail, as that file will not exist on the host, but only in an overlay that is discarded following the layer in which rpm-ostree is called.

If you look at my reproducer Containerfile and the difference in behavior between 1.38.1 and 1.37.5 (and many earlier versions), you can see that it will break the workflow used to compose bootc bases. You can try it on a fully updated Fedora 41 system to see 1.38.1 in action.

@flouthoc
Collaborator

> The overlay option flag won't persist the changes to the host source, though, will it?

@solacelost --volume /src:/dest-in-container without :O should allow changes to persist on the host, and with :O it should allow changes to persist across layers in the build but not on the host.
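
As a rough sketch of the two variants, assuming the RUN instructions in the Containerfile read and write /src directly instead of declaring their own --mount. `build_with_volume` and `run` are made-up helpers for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: build with a host directory exposed through --volume.
# $1: host directory, $2: path in the container, $3: optional volume option such as O.
build_with_volume() {
    spec="$1:$2"
    if [ -n "${3:-}" ]; then
        spec="$spec:$3"
    fi
    run buildah bud --security-opt=label=disable --volume "$spec" .
}

# Without :O, writes made by RUN steps should reach the host directory.
build_with_volume "$PWD/src" /src
# With :O, writes should carry across build steps but leave the host directory untouched.
build_with_volume "$PWD/src" /src O
```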

@cgwalters

I believe this is a duplicate of #5952 (it was filed earlier, but there's a lot more in that issue).

@nalind
Member

nalind commented Feb 6, 2025

The discarding behavior should only affect locations mounted with type=bind: in Containerfiles, to bring it in line with the docs, and on the command line with --mount, in the hope that keeping it consistent with the Containerfile behavior will be less confusing going forward.
The behavior of volume mounts specified with the -v flag on the command line hasn't changed, so -v can still be used to expose the build context directory and other locations to RUN instructions in the general case. #5975 is an attempt to get the https://gitlab.com/fedora/bootc/base-images/-/blob/main/Containerfile case working again.
