Two questions about build-path reproducibility in Debian

John Gilmore gnu at toad.com
Tue Mar 5 16:08:21 UTC 2024


>> But today, if you're building an executable for others, it's common to build using a
>> container/chroot or similar that makes it easy to implement "must compile with these paths",
>> while *fixing* this is often a lot of work.

I know that my opinion is not popular, but let me try again before we lay
this decision to rest.

By avoiding fixing directory dependencies, you can move the complexity
around, but doing so doesn't reduce it.

Our instructions for reproducing any package would have to identify what
container/chroot/namespace/whatever the end-user must set up to be able
to successfully reproduce a package.  Will these be the same for every
package, for every distro, and for every other environment in which we
want to inspire reproducibility?  Do we need to add those constraints to
the Linux Foundation's Filesystem Hierarchy Standard?  Do we need to add
them to the buildinfo files?

Ideally the tools that ordinary people traditionally use to reproduce a
package, such as dpkg-buildpackage or rpmbuild, will have been improved to
do the container/chroot setup automatically.  Otherwise, naive users
will have to figure out what a container is, and why they must grok
this obscure environmental thing, just to tell whether their binary
package was tampered with.  Will they always have to build
software as root, because chroot doesn't and can't work for ordinary users?

If we punt this, there will be an ongoing flow of "my package doesn't
build to the same binary, somebody must be 0wning me" emails from people
who do the obvious thing like type "make" and "cmp".  Do we want
successful reproducibility to depend on setting up servers and virtual
machines and web-servers and databases and build farms and CI-queues and
such?  Yes, to reproduce a whole distro, reproducibility has to WORK
there, but does it have to DEPEND on that complex infrastructure?
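For concreteness: the "obvious thing" above, checking that two builds
produced the same binary, is just a byte-for-byte comparison.  Here is a
minimal Python sketch of that check.  The helper names, file names and
file contents are hypothetical, made up for illustration.  It shows two
outputs that differ only in an embedded build directory failing the
comparison, which is exactly the kind of result that prompts a "somebody
must be 0wning me" email.

```python
import filecmp
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(a: Path, b: Path) -> bool:
    """Byte-for-byte comparison, roughly what `cmp a b` exiting 0 means."""
    # shallow=False makes filecmp compare file contents instead of
    # trusting matching os.stat() signatures (type, size, mtime).
    return filecmp.cmp(a, b, shallow=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Hypothetical outputs of the same source built in two different
        # directories; the only difference is the embedded build path.
        mine = tmp_dir / "hello-mine"
        theirs = tmp_dir / "hello-theirs"
        mine.write_bytes(b"code...\0/home/alice/src/hello\0")
        theirs.write_bytes(b"code...\0/build/hello\0")
        print(same_binary(mine, theirs))                 # False: paths differ
        print(sha256_of(mine) == sha256_of(theirs))      # False
```

Content comparison (rather than size or timestamp) matters here because
two builds can easily match in size while differing in a few bytes.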

I'm an old Unix guy and so are millions of end-users and sysadmins.
Containers are a recent Linux thing.  Namespaces ditto.  I have still
never found a use for containers; I tried using Docker for something and
was bemused to discover that it could calculate all kinds of stuff, but
none of the output of the calculation could come back into my ordinary
Linux filesystem (without some kind of obscure per-invocation JCL-like
configuration setup), so I stopped trying to use it.  Another time, I
tried booting an on-disk, installed copy of Ubuntu inside a virtual
machine, so I could keep running an older service that's hard to port
forward, while migrating the rest of my machine to a newer Ubuntu
release.  VM/360 could do that decades ago, but I discovered that that
use case is not well supported in the Linux VM tools and documentation,
so I gave up on that too.  There are more things in heaven and earth,
Horatio, than spending all of your time doing sysadmin.  These
newfangled tools are just not as well rounded as the stuff that's been
well understood in Unix since the 1970s or 1980s, like "directories".
If only seventeen experts in the world can figure out if a package has
been tampered with, we will have labored mightily but not done much to
improve computer security.

Also recall what pains the full-source bootstrap people are having to go
through after some imho foolish decisions were made about depending on
modern C++ features inside core tools like gcc and gdb.  Reproducible
builds should make the underlying software LESS dependent on the
particular configuration of the build environment; that's kind of the
point.

>>>                  ... it makes reproducibility from around 80-85% of all
>>> packages to >95%, IOW with this shortcut we can have meaningful reproducibility
>>> *many years* sooner, than without.

If we move the goal posts in order to claim victory, who are we fooling
but ourselves?  I'd rather we knew and documented that 57% of
packages are absolutely reproducible, 23% require SOURCE_DATE_EPOCH, and
12% still require a standardized source code directory, than claim
all 95% are "meaningfully reproducible" today.

	John
	


More information about the rb-general mailing list