Paper on reproducible Docker images: "Docker Does Not Guarantee Reproducibility"

cen imbacen at gmail.com
Tue Jan 27 23:09:34 UTC 2026


I've been dabbling in this topic for the past year; I'm weak on theory
but have some practical experience.


The results of the first study do not surprise me at all: unless you
construct the Dockerfile to be reproducible, it won't magically be
reproducible, and most downstream developers don't consider it.

Just from personal experience, most of my colleagues haven't even heard
of reproducible builds in general, let alone reproducible Docker
images. :)

From that perspective, trying to reproduce images from Docker Hub or
ghcr.io seems like a futile experiment.

Bit-for-bit reproducibility is achievable if you strictly follow most of 
these steps:

- the first and obvious condition is that the project/binary you are
packaging is itself reproducible (most Dockerfiles probably already fail
at this step)

- pin FROM to a sha256 digest, use multi-stage builds, consider using a
distroless image for the final stage

- use a reproducible distro as builder base (e.g. Debian with snapshot 
repos turned on)

- normalize all COPY commands with --chown and --chmod (from build stage
to final stage)

- if you install any additional packages in the final stage, clean up
after the package manager (logs, cache) and normalize fs timestamps
(touch -d "@${SOURCE_DATE_EPOCH}")
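
As a rough illustration of why the --chown/--chmod and timestamp steps
matter: a layer is essentially a tar archive, and tar records owner,
mode and mtime for every entry, so any of them leaking in from the build
environment changes the layer digest. The Python sketch below is only a
toy model under that assumption, not what any builder does internally;
build_layer, layer_digest and the epoch value are made up for the
example.

```python
import hashlib
import io
import tarfile

# Assumed value for the demo; normally taken from the environment.
SOURCE_DATE_EPOCH = 1700000000


def build_layer(files, mtime=SOURCE_DATE_EPOCH, uid=0, gid=0, mode=0o644):
    """Pack {path: bytes} into an uncompressed tar with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):  # fixed entry order
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = mtime            # same role as touch -d "@$SOURCE_DATE_EPOCH"
            info.uid, info.gid = uid, gid  # same role as COPY --chown
            info.uname = info.gname = ""
            info.mode = mode              # same role as COPY --chmod
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_digest(layer):
    """Return the sha256 digest string of the raw layer bytes."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


if __name__ == "__main__":
    files = {"app/bin/tool": b"binary", "app/etc/conf": b"k=v\n"}
    normalized = layer_digest(build_layer(files))
    drifted = layer_digest(build_layer(files, mtime=SOURCE_DATE_EPOCH + 1))
    print("normalized twice equal:", normalized == layer_digest(build_layer(files)))
    print("one second of mtime drift equal:", normalized == drifted)
```

With everything pinned, two runs give the same digest; changing a single
mtime, uid or mode gives a different one.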


If you miss any of these steps, your image won't be bit-for-bit 
identical. There is a chance it could be semantically identical though 
(diffoci diff --semantic).


When it comes to tooling, ranked from best to worst in actually 
producing bit-for-bit identical images, based on my experience:

- kaniko with SOURCE_DATE_EPOCH and --reproducible flag

- podman with --timestamp "$SOURCE_DATE_EPOCH" --build-arg 
SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH"

- docker: currently unusable for this; it messes up some timestamps when
exporting layers and does not respect SOURCE_DATE_EPOCH (SDE) at every
level
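
For CI scripts, the podman invocation above can be assembled in one
place so the same epoch is passed to both flags. This is a minimal
Python sketch; podman_build_argv, the tag and the fallback epoch are
hypothetical, and only the --timestamp and --build-arg flags come from
the list above.

```python
import os
import shlex


def podman_build_argv(context_dir, tag, epoch):
    """Return the argv for a podman build that uses one fixed timestamp."""
    epoch = str(int(epoch))  # reject anything that is not an integer
    return [
        "podman", "build",
        "--timestamp", epoch,
        "--build-arg", f"SOURCE_DATE_EPOCH={epoch}",
        "--tag", tag,
        context_dir,
    ]


if __name__ == "__main__":
    # 1700000000 is an arbitrary fallback for the demo only.
    epoch = os.environ.get("SOURCE_DATE_EPOCH", "1700000000")
    print(shlex.join(podman_build_argv(".", "example/app:dev", epoch)))
```

The script only prints the command; handing the list to subprocess.run
would execute it.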


Unfortunately kaniko is currently in maintenance mode (abandoned by
Google and picked up by Chainguard), but it works and doesn't require
root, so it's perfect for CI pipelines.


I am not sure how big of an issue the cosign situation is.

I can build the same image in CI and locally, push them to two different
registries and obtain the same digest on both. That seems reproducible
to me, even if it is not verified strictly locally.
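
For context on what a matching digest means: a manifest digest is the
SHA-256 of the manifest's raw bytes, so two registries reporting the
same digest are serving byte-identical manifests, and those in turn pin
the config and layer digests. The flip side is that re-serializing the
same JSON gives a different digest. A small Python sketch with a made-up
manifest (all digests and sizes are placeholders, not from a real
image):

```python
import hashlib
import json


def manifest_digest(raw):
    """Digest of a manifest as served: sha256 over the exact bytes."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def example_manifest():
    """A made-up OCI-style manifest; the digests are placeholders."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {
            "mediaType": "application/vnd.oci.image.config.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 1234,
        },
        "layers": [
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "digest": "sha256:" + "b" * 64,
                "size": 5678,
            }
        ],
    }


if __name__ == "__main__":
    manifest = example_manifest()
    compact = json.dumps(manifest, separators=(",", ":")).encode()
    pretty = json.dumps(manifest, indent=2).encode()
    print("same content:", json.loads(compact) == json.loads(pretty))
    print("same digest: ", manifest_digest(compact) == manifest_digest(pretty))
```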


Best regards,

Klemen/cen1


On 25/01/2026 20:48, kpcyrd wrote:
> Dear list,
>
> I found this in my news feed and wanted to share:
>
> - https://arxiv.org/pdf/2601.12811
> - https://dl.acm.org/doi/10.1145/3736731.3746146
>
> For people reading along who are not super familiar with the topic, 
> note there's a distinction between "Docker image" and "Dockerfile":
>
> - the Docker image is the compiled artifact
> - the Dockerfile is a file with build instructions
>
> The Docker image is what you get out of `docker build`, but since this 
> is essentially just a tar file you could also use something like 
> apko[0] to generate them. From what I understand this is a fairly 
> straight-forward way to repack your binary, without having to involve 
> yourself with namespaces, kernel capabilities and base images.
>
> At that point you only need to worry about reproducible builds for 
> your executable.
>
> [0]: https://github.com/chainguard-dev/apko
>
> The Dockerfile is what most people use to build their containers, this 
> technology also notably doesn't have a dependency lockfile like you 
> are used to with modern programming language package managers.
>
> This is also what the paper mostly (but not exclusively) focuses on.
>
> Lastly, there's also another problem[1] that I see very rarely talked 
> about - if you can build your Docker image on two different computers 
> with bit-for-bit identical outputs, this still does *not* mean you can 
> independently authenticate the contents of a container registry.
>
> The image is only fully "built" after it has been published to the 
> registry, since the manifest file is being re-written by the registry 
> (in an undefined/unspecified way). This is, in my opinion, the biggest 
> problem in the Docker/container ecosystem, the other ones we can work 
> around by switching from `docker build` to different tools if we have to.
>
> [1]: https://github.com/sigstore/cosign/issues/2516 (2022)
>
> ---
>
> I would love to get some input on this, especially if I got anything 
> wrong or if there has been progress on authenticating the content of 
> e.g. hub.docker.com (or ghcr.io for that matter).
>
> The authors of the paper are also most likely subscribed here (hi!).
>
> Very interested,
> kpcyrd

