Grep not actually reproducible on Arch but do we have a bigger issue?
kpcyrd
kpcyrd at archlinux.org
Fri May 1 22:35:26 UTC 2026
On 5/1/26 11:56 PM, cen wrote:
> Hi,
>
> today I found an interesting case of time based non-reproducibility.
>
> Arch rebuilderd has grep as reproducible:
> https://reproducible.archlinux.org/api/v0/builds/787931/log (build was in 2025)
>
> Mine is not: https://rebuilderd.xpam.pl:2096/api/v1/builds/416039/log
>
> Diffoscope shows differences in translations. It seems that translations are
> fetched from a remote location during bootstrap
>
> and it just so happens that remote translations have changed since last year and
> thus introduced a diff.
>
> The obvious fix is that translations should not be dynamically fetched but it
> got me thinking more generally.
This is an ongoing issue with GNU software. There are more details on the page
that tried to address some of this:
https://archlinux.org/todo/unstable-gnu-translations/
The core issue is that their git repository does not contain everything you
need: when building from git, the build downloads the latest translations from
some network server.
This problem started showing up in Arch Linux when we started implementing this
RFC, in response to the Jia Tan XZ autotools incident:
https://rfc.archlinux.page/0046-upstream-package-sources/#transparency
For many GNU projects, the pre-processed tarballs contain both a snapshot of the
git repository (that we can trace through the git tag), and a snapshot from the
translations server, which we cannot trace.
To work around this, we now specify both the git tag, and the pre-processed
tarball as source code inputs, taking the source code from git, and the
(versioned) translation files from tar.
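To illustrate the idea (this is not the actual PKGBUILD logic), here is a minimal Python sketch of "source from git, translations from tar". It assumes both inputs are already unpacked and that the translation catalogs are `*.po` files in a `po/` directory; the function name `overlay_translations` and both directory arguments are made up for this example.

```python
import shutil
from pathlib import Path


def overlay_translations(tarball_dir: Path, git_dir: Path, subdir: str = "po") -> list[str]:
    """Copy *.po catalogs from the unpacked tarball into the git checkout.

    Assumption: catalogs live in `<root>/po/*.po`. Everything else in the
    checkout is left exactly as it came from the git tag.
    """
    src = tarball_dir / subdir
    dst = git_dir / subdir
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    # Sorted so the order of operations is deterministic.
    for catalog in sorted(src.glob("*.po")):
        shutil.copy2(catalog, dst / catalog.name)
        copied.append(catalog.name)
    return copied
```

The point of the design is that the build never talks to the translations server: every input is pinned either by the git tag or by the tarball checksum.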
You can see there's quite a difference between git and tar (also which distro is
using what):
# grep-3.12.tar.xz
https://whatsrc.org/artifact/sha256:2649b27c0e90e632eadcd757be06c6e9a4f48d941de51e7c0f83ff76408a07b9
# grep.git#tag=v3.12
https://whatsrc.org/artifact/sha256:9543190d9ca2201ea46fddaeb39031a0acde1f6aa4351a72f33ef3455e6dd41e
# diff between them
https://whatsrc.org/diff-right-trimmed/sha256:9543190d9ca2201ea46fddaeb39031a0acde1f6aa4351a72f33ef3455e6dd41e/sha256:a1df6ede939dccc06abe879ca55dd1f793042e077165938cf63b26648343a06a
The Arch Linux grep PKGBUILD has been fixed in this commit:
https://gitlab.archlinux.org/archlinux/packaging/packages/grep/-/commit/e7a24b27b5b9a0d3c25c8e140e7f5ba6b6d80bd0
But there has been no release since.
> Perhaps rebuilderd needs a feature where GOOD packages are also periodically
> rebuilt in exponential back-off style and compared against current upstream
> build and also our last GOOD build. This would confirm whether a package is
> reproducible if built in a short time window but also help uncover longer time
> window issues that are currently hidden.
>
> This won't solve the issue entirely but at least uncover them eventually.
That would indeed help discover problems like this, +1.
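A rough sketch of what such a schedule could look like, in Python. This is not an existing rebuilderd feature; the function name, the one-week base interval and the one-year cap are all assumptions picked for illustration.

```python
from datetime import datetime, timedelta


def next_rebuild(
    last_good: datetime,
    consecutive_good: int,
    base: timedelta = timedelta(days=7),    # assumed first re-check interval
    cap: timedelta = timedelta(days=365),   # assumed upper bound on the interval
) -> datetime:
    """Return when a GOOD package should be rebuilt again.

    The interval doubles with every consecutive GOOD result and is capped,
    so long-lived packages still get re-checked at least once per `cap`.
    """
    # Clamp the exponent so the multiplication cannot overflow timedelta.
    exponent = min(max(consecutive_good, 0), 16)
    delay = min(base * (2 ** exponent), cap)
    return last_good + delay
```

With these defaults a package would be re-checked after 7, 14, 28, ... days, and never less often than once a year.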
> Such edge cases will also cause weird issues like all rebuilders agreeing that
> package is GOOD but repro-threshold failing.
repro-threshold only acts as a rebuilderd client. If you configure "I want two
parties to confirm they could reproduce the build server output from the alleged
source code", and two of the rebuilders you've configured as trusted could
indeed reproduce the binaries immediately after they've been released,
repro-threshold would accept the package as valid. Unless there's collusion,
the security guarantees still hold.
The package eventually becoming unreproducible is "only" a problem if you later
want to prove this to an auditor (or yourself), but can't.
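The threshold rule described above boils down to something like the following Python sketch. It is not repro-threshold's actual code; the function name and the data shapes are made up for illustration.

```python
def meets_threshold(confirmations: dict[str, bool], trusted: set[str], required: int) -> bool:
    """Accept a package if enough trusted rebuilders reproduced it.

    `confirmations` maps a rebuilder name to whether it reproduced the
    released binary; results from rebuilders outside `trusted` are ignored.
    """
    confirmed = {name for name, ok in confirmations.items() if ok and name in trusted}
    return len(confirmed) >= required
```

Note that the decision only depends on what the rebuilders attested at the time they rebuilt the package, which is why a package that later stops being reproducible does not retroactively change the outcome.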
> Another feature that might be useful would be to add a list of rebuilders
> repro-threshold style to rebuilderd instance and upon package being marked GOOD, daemon
> could fetch hashes via API from other independent instances and if there is a
> mismatch, flag such packages for further inspection. Right now, even if I
> independently managed to match the exact reproducibility % of another rebuilder,
> I have no idea if the package list is the same between instances.
I did this manually in the past using the *-repro-status[0][1] tools. :) Right
now you need to run the tool once per rebuilder, there's no flag (yet) to query
multiple rebuilders at the same time and compare results.
[0]: https://github.com/archlinux/arch-repro-status
[1]: https://github.com/kpcyrd/debian-repro-status
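Once you have the per-rebuilder results in hand, the comparison itself is simple. A minimal Python sketch, assuming you already collected a package-to-status mapping for each rebuilder (how you fetch it is left out on purpose; the function name and the "MISSING" marker are made up):

```python
def compare_rebuilders(results: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the packages on which the rebuilders do not all agree.

    `results` maps rebuilder name -> {package: status}. A package that one
    rebuilder does not list at all shows up as "MISSING" for that rebuilder,
    which also reveals differing package lists between instances.
    """
    all_packages = set().union(*(pkgs.keys() for pkgs in results.values()))

    mismatches = {}
    for package in sorted(all_packages):
        statuses = {name: pkgs.get(package, "MISSING") for name, pkgs in results.items()}
        if len(set(statuses.values())) > 1:
            mismatches[package] = statuses
    return mismatches
```

The same function works if the values are artifact hashes instead of GOOD/BAD statuses.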
cheers,
kpcyrd