Reproducible tarballs on Github?
richard.purdie at linuxfoundation.org
Sun Oct 24 09:57:50 UTC 2021
On Sat, 2021-10-23 at 21:41 -0400, David A. Wheeler wrote:
> > On Oct 23, 2021, at 3:23 PM, Arthur Gautier <baloo at superbaloo.net> wrote:
> > I would expect Github to use the tar implementation of git-archive (or
> > libgit2). git-archive is specifically designed to be reproducible.
> I don’t know if it does, but that does seem likely.
> > All I'm suggesting is to checksum the inflated version of the archive
> > and not the compressed one.
> Checksumming the inflated version makes sense to me, so that improved/varying
> compression doesn’t matter (since it produces the same result).
> Sounds like maybe GitHub doesn’t need to change anything.
> If someone thinks GitHub *does* need to change something, I’d like to know
> exactly what practical change is desired.
Yocto Project has struggled with this too, FWIW. The tarballs generated by GitHub
are (or were?) dynamically generated and could change checksum over time (as
caches were invalidated and they were rebuilt?). For something like YP, where we
list the checksums of the input source archives to validate that the inputs are
the same, this was an issue. As such, we only support official tarball releases
of GitHub projects and not the dynamically generated tarballs. We added checks
to try to ensure we don't use the "bad" URLs.
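
As a rough illustration of the kind of URL check described above, here is a
minimal Python sketch. It is not the actual Yocto Project check: the function
name and the regular expression are hypothetical, and it only assumes that the
dynamically generated tarballs live under a ".../archive/..." path on
github.com while uploaded release assets live under ".../releases/download/...".

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: /<owner>/<repo>/archive/<anything> is treated as an
# auto-generated snapshot. This is an assumption made for illustration only.
_DYNAMIC_ARCHIVE_PATH = re.compile(r"^/[^/]+/[^/]+/archive/.+")


def is_dynamic_github_tarball(url: str) -> bool:
    """Return True if the URL looks like a dynamically generated GitHub archive."""
    parts = urlparse(url)
    if parts.hostname not in ("github.com", "www.github.com"):
        return False
    return bool(_DYNAMIC_ARCHIVE_PATH.match(parts.path))


if __name__ == "__main__":
    for candidate in (
        "https://github.com/owner/repo/archive/v1.0.tar.gz",
        "https://github.com/owner/repo/releases/download/v1.0/repo-1.0.tar.gz",
    ):
        print(candidate, is_dynamic_github_tarball(candidate))
```

A check like this can only warn about a URL shape; it says nothing about
whether the checksum of a given download will actually stay stable.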
I've not heard of this being an issue for us for a while, but that could be
because we now just don't use the problematic dynamic tarballs. We did raise it
with GitHub and were told they reserved the right to change the output and that
we shouldn't depend on checksums like that. They do not use git-archive, or
certainly didn't at the time.
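
For reference, the suggestion quoted earlier in this message (checksum the
inflated archive rather than the compressed one) could look something like the
Python sketch below. The helper names are hypothetical. Note that it only
removes differences introduced by gzip; if the tar stream itself changes
(ordering, timestamps, a different tar implementation), the checksum still
changes.

```python
import gzip
import hashlib
import io
import tarfile


def inflated_sha256(gz_bytes: bytes) -> str:
    """SHA-256 of the decompressed tar stream, ignoring gzip-level differences."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        # Read in chunks so large archives are not held in memory twice.
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_tar_bytes(payload: bytes) -> bytes:
    """Build a tiny in-memory tar archive with a fixed mtime (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        info.mtime = 0
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    tar_bytes = make_tar_bytes(b"hello\n")
    # Same tar stream, compressed with different settings and gzip timestamps.
    first = gzip.compress(tar_bytes, compresslevel=1, mtime=0)
    second = gzip.compress(tar_bytes, compresslevel=9, mtime=1)
    print("compressed files identical:", first == second)
    print("inflated checksums identical:",
          inflated_sha256(first) == inflated_sha256(second))
```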