Introducing: Semantically reproducible builds

Mon May 29 03:15:10 UTC 2023

On Sat, 27 May 2023 15:24:25 +0200, kpcyrd <kpcyrd at archlinux.org> wrote:
> I think semantically reproducible builds is going to be more expensive 
> in the long run.

I think my intended use case is really different from what you're expecting.
In my use case, the "expense" is irrelevant.

I'm primarily trying to deal with the case where the developer has decided
to *not* provide a reproducible build, and I have to estimate the likelihood
of it being maliciously built (presumably as a part of deciding whether or not
the package is safe to install). I'm primarily thinking of
applying this process to mostly-unmanaged repositories
like npm, PyPI, and RubyGems, *not* to managed repositories like
most Linux distributions' repositories.

In this case, it doesn't matter if "semantically reproducible builds"
are more expensive in the long run. I *cannot* make the developer
provide me a reproducible build (I can beg, but that's not the same thing).
I'm trying to make good decisions with the information I have,
not the information I *want* to have.

The threat model is a little different, too. The assumption isn't that
"it is impossible for these differences to cause damage".
The assumption is that "the original source code was benign,
reasonably coded, and did not do damage". The question is,
"is this non-reproducible package likely to have been generated
from it, even though it's not a reproducible build?"

Here's an example that might clarify the threat model.
It's possible that a program could look for ".gitignore" and
run it if present. The source code repo might not have a .gitignore
file, but a malicious package could add one and fill it with
a malicious application. That would cause malicious code to
be executed, but it would also be *highly* suspicious to
run a ".gitignore" file (that's *not* what they are for), so
it's reasonable to assume that the source code didn't do that.
If an attacker can insert a file that *would* cause malicious code
to execute in a reasonably-coded app, then that *would* be a problem.
"What's reasonable" is hard to truly write down, but a
whitelist of specific filenames seems like a reasonable place
to start.
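
As a rough sketch of that starting point (not an existing tool): compare
the file list of the source repo with the file list of the package, and
flag any added file whose name is not on the list. The function name,
the example file names, and the contents of IGNORABLE_NAMES are all
illustrative assumptions; a real list would need careful review.

```python
from pathlib import PurePosixPath

# Hypothetical list of names assumed harmless if a package adds them,
# because reasonably coded software would not execute them.
IGNORABLE_NAMES = {".gitignore", ".gitattributes"}


def suspicious_additions(source_files, package_files, ignorable=IGNORABLE_NAMES):
    """Return package files that are absent from the source repo and
    whose file name is not on the ignorable list."""
    added = set(package_files) - set(source_files)
    return sorted(p for p in added if PurePosixPath(p).name not in ignorable)


# Made-up example: the package adds two files the source repo lacks.
source = ["index.js", "lib/util.js", "package.json"]
package = ["index.js", "lib/util.js", "package.json",
           ".gitignore", "lib/postinstall.js"]

print(suspicious_additions(source, package))  # ['lib/postinstall.js']
```

This only looks at added files by name. It says nothing about files that
exist in both places with different contents, which would need a
separate comparison.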

Sure, ideally everything would have a reproducible build.
Since that day isn't here, what can we do to take piecemeal
steps towards that?

--- David A. Wheeler