Paper on reproducible Docker images: "Docker Does Not Guarantee Reproducibility"
Akihiro Suda
suda.kyoto at gmail.com
Wed Jan 28 03:10:33 UTC 2026
> - docker - currently unusable, messes up some timestamps when exporting
> layers, not respecting SDE on every level
This is not true.
You can use the rewrite-timestamp exporter option
(rewrite-timestamp=true) to export the layers with their file
timestamps rewritten to SOURCE_DATE_EPOCH:
https://docs.docker.com/build/exporters/image-registry/
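
A minimal sketch of such an invocation, assuming a builder whose
BuildKit is recent enough to know the option (v0.13 or later, as far
as I know) and that supports the image exporter options, e.g. the
docker-container driver. The image name, the build_reproducible helper,
the DRY_RUN switch and the choice of the last git commit time as epoch
are placeholders for illustration only:

```shell
#!/bin/sh
# Sketch only: IMAGE and the choice of epoch are placeholders (assumptions).
IMAGE="${IMAGE:-registry.example.com/app:latest}"

# Hypothetical helper wrapping a single `docker buildx build` call.
build_reproducible() {
    # Use the caller's SOURCE_DATE_EPOCH if set, else the last commit time.
    epoch="${SOURCE_DATE_EPOCH:-$(git log -1 --pretty=%ct)}"

    # Assemble the command line; rewrite-timestamp=true asks the exporter
    # to rewrite file timestamps in the layers to SOURCE_DATE_EPOCH.
    set -- docker buildx build \
        --build-arg "SOURCE_DATE_EPOCH=${epoch}" \
        --output "type=image,name=${IMAGE},push=true,rewrite-timestamp=true" \
        .

    # DRY_RUN=1 prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling build_reproducible with DRY_RUN=1 prints the command line, which
is handy for checking what a CI job would run before it pushes anything.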
On Wed, Jan 28, 2026 at 8:09, cen <imbacen at gmail.com> wrote:
> I've been dabbling in this topic for the past year, weak on theory but
> have some practical experience.
>
>
> The results of the first study do not surprise me at all: unless you
> construct the Dockerfile to be reproducible, it won't magically be
> reproducible, and most downstream developers don't consider it.
>
> Just from personal experience, most of my colleagues haven't even heard
> of reproducible builds in general, let alone reproducible Docker
> images. :)
>
> From that perspective, trying to reproduce images published on Docker
> Hub or ghcr.io seems like a futile experiment.
>
> Bit-for-bit reproducibility is achievable if you strictly follow most of
> these steps:
>
> - the first and obvious condition is that your project/binary that you
> are packaging is reproducible (most Dockerfiles probably already fail
> because of this step)
>
> - pin FROM to a sha256 digest, use multi-stage builds, consider using a
> distroless image for the final stage
>
> - use a reproducible distro as builder base (e.g. Debian with snapshot
> repos turned on)
>
> - normalize all COPY commands with --chown and --chmod (from build
> stage to final stage)
>
> - if you install any additional packages in the final step, clean up
> after package manager (logs, cache) and normalize fs timestamps (touch
> -d "@${SOURCE_DATE_EPOCH}")
>
>
> If you miss any of these steps, your image won't be bit-for-bit
> identical. There is a chance it could be semantically identical though
> (diffoci diff --semantic).
>
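
The timestamp normalization from the last item in the list can be
sketched as below. This is only an illustration on a scratch directory,
assuming GNU touch, find and stat; normalize_mtimes is a hypothetical
helper name and the default epoch is an arbitrary placeholder. In a
real Dockerfile the same find/touch line would run at the end of the
RUN step that installs the packages.

```shell
#!/bin/sh
# Assumed placeholder value; in a real build this comes from the environment.
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1700000000}"

# Hypothetical helper: set the mtime of everything under a directory
# (the directory itself included) to SOURCE_DATE_EPOCH.
normalize_mtimes() {
    # -h updates symlinks themselves rather than their targets.
    find "$1" -exec touch -h -d "@${SOURCE_DATE_EPOCH}" {} +
}

# Demo on a scratch directory standing in for the image root filesystem.
root="$(mktemp -d)"
mkdir -p "$root/var/log"
echo "log" > "$root/var/log/install.log"

normalize_mtimes "$root"

# Print the resulting mtime (seconds since the epoch) and the file name.
stat -c '%Y %n' "$root/var/log/install.log"
```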
>
> When it comes to tooling, ranked from best to worst in actually
> producing bit-for-bit identical images, based on my experience:
>
> - kaniko with SOURCE_DATE_EPOCH and --reproducible flag
>
> - podman with --timestamp "$SOURCE_DATE_EPOCH" --build-arg
> SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH"
>
> - docker - currently unusable, messes up some timestamps when exporting
> layers, not respecting SDE on every level
>
>
> Unfortunately kaniko is currently in maintenance mode (abandoned by
> Google and picked up by Chainguard), but it works and doesn't require
> root, so it's perfect for CI pipelines.
>
>
> I am not sure how big of an issue the cosign situation is.
>
> I can build the same image in CI and locally, push them to two different
> registries and obtain the same digest on both. That seems reproducible
> to me even if not strictly locally.
>
>
> Best regards,
>
> Klemen/cen1
>
>
> On 25/01/2026 20:48, kpcyrd wrote:
> > Dear list,
> >
> > I found this in my news feed and wanted to share:
> >
> > - https://arxiv.org/pdf/2601.12811
> > - https://dl.acm.org/doi/10.1145/3736731.3746146
> >
> > For people reading along who are not super familiar with the topic,
> > note there's a distinction between "Docker image" and "Dockerfile":
> >
> > - the Docker image is the compiled artifact
> > - the Dockerfile is a file with build instructions
> >
> > The Docker image is what you get out of `docker build`, but since this
> > is essentially just a tar file you could also use something like
> > apko[0] to generate them. From what I understand this is a fairly
> > straightforward way to repack your binary, without having to involve
> > yourself with namespaces, kernel capabilities and base images.
> >
> > At that point you only need to worry about reproducible builds for
> > your executable.
> >
> > [0]: https://github.com/chainguard-dev/apko
> >
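
To illustrate the "essentially just a tar file" point: the layer part
of an image can be packed deterministically with plain GNU tar. This is
a sketch of the general technique only, not of what apko does
internally, and a full image additionally needs the JSON config and
manifest. pack_layer is a hypothetical helper name and the default
epoch is an arbitrary placeholder.

```shell
#!/bin/sh
# Assumed placeholder epoch; normally taken from the environment.
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1700000000}"

# Hypothetical helper: pack directory $1 into tarball $2 with sorted
# entries, a fixed mtime and numeric root ownership (needs GNU tar).
pack_layer() {
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf "$2" .
}

# Two directories with the same content, created in a different order
# and with different on-disk timestamps.
a="$(mktemp -d)"
b="$(mktemp -d)"
echo hello > "$a/app";    echo conf  > "$a/config"
echo conf  > "$b/config"; echo hello > "$b/app"
touch -d '2001-01-01' "$b/app"

pack_layer "$a" "$a.tar"
pack_layer "$b" "$b.tar"

# Hash both tarballs from stdin so the file name is not part of the output.
sum_a="$(sha256sum < "$a.tar")"
sum_b="$(sha256sum < "$b.tar")"
echo "$sum_a"
echo "$sum_b"
```

Both lines should show the same digest, because everything that differs
between the two directories (creation order, mtimes, owner) is
normalized away before it reaches the archive.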
> > The Dockerfile is what most people use to build their containers.
> > This technology also notably doesn't have a dependency lockfile like
> > the ones you are used to from modern programming language package
> > managers.
> >
> > This is also what the paper mostly (but not exclusively) focuses on.
> >
> > Lastly, there's also another problem[1] that I see very rarely talked
> > about - if you can build your Docker image on two different computers
> > with bit-for-bit identical outputs, this still does *not* mean you can
> > independently authenticate the contents of a container registry.
> >
> > The image is only fully "built" after it has been published to the
> > registry, since the manifest file is being re-written by the registry
> > (in an undefined/unspecified way). This is, in my opinion, the biggest
> > problem in the Docker/container ecosystem; the other ones we can work
> > around by switching from `docker build` to different tools if we have to.
> >
> > [1]: https://github.com/sigstore/cosign/issues/2516 (2022)
> >
> > ---
> >
> > I would love to get some input on this, especially if I got anything
> > wrong or if there has been progress on authenticating the content of
> > e.g. hub.docker.com (or ghcr.io for that matter).
> >
> > The authors of the paper are also most likely subscribed here (hi!).
> >
> > Very interested,
> > kpcyrd
>