Introducing: Semantically reproducible builds
kpcyrd
kpcyrd at archlinux.org
Sat May 27 13:24:25 UTC 2023
> It's much easier (and lower cost) for software
> developers to create a semantically reproducible build instead of always
> creating a fully reproducible build.
> Fully reproducible builds are still a gold standard for verifying
> that a build has not been tampered with.
> However, creating fully reproducible builds often require that package
> creators change their build process, sometimes in substantive ways.
> In many cases a semantically reproducible build requires no changes,
> and even if changes are required, there are typically fewer changes
> required.
I think semantically reproducible builds are going to be more expensive
in the long run.
diffoscope is only reliable if it reports both files as bit-for-bit
identical (exit code 0). If there are _any_ differences (exit code 1) it
generates a semantic diff to help debug the root cause, but it does not
guarantee a complete diff of every byte (and sometimes quite a lot of
bytes are missing from it).
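To illustrate how little tooling the bit-for-bit case needs, here is a
minimal sketch that compares two build artifacts by SHA-256. The function
names are made up for this example and are not part of diffoscope or any
other tool mentioned here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_for_bit_identical(path_a, path_b):
    """True only if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A result of True needs no human judgement. A result of False is where
the debugging (and the opinions) start.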
I found that adding "benign" differences can sometimes help to prevent
diffoscope from revealing my malicious changes. If the semantic diff is
identical, it falls back to a binary diff (that would reveal my
backdoor). If I intentionally introduce some benign difference in the
semantic diff, it picks that up as the reason for the mismatch and
moves on (leaving my non-benign changes unreported).
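The masking effect can be shown with a toy model. This is not diffoscope's
code and says nothing about its internals. The file format (key=value
header lines, a NUL byte, then an opaque payload) and all function names
are invented for illustration. The only behaviour modelled is "report
semantic differences if there are any, otherwise fall back to a binary
diff".

```python
def semantic_view(blob):
    """Parse only the 'key=value' header; the payload after NUL is ignored."""
    header, _, _payload = blob.partition(b"\0")
    lines = header.decode().splitlines()
    return dict(line.split("=", 1) for line in lines if "=" in line)


def first_differing_offset(a, b):
    """Offset of the first differing byte (or the shorter length)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


def toy_diff(a, b):
    """Return a list of report lines; an empty list means bit-for-bit identical."""
    if a == b:
        return []
    sa, sb = semantic_view(a), semantic_view(b)
    report = [
        f"{key}: {sa.get(key)!r} != {sb.get(key)!r}"
        for key in sorted(sa.keys() | sb.keys())
        if sa.get(key) != sb.get(key)
    ]
    if report:
        # A semantic difference "explains" the mismatch, so the toy
        # tool stops here and never looks at the payload bytes.
        return report
    # Only when the semantic views match does the binary fallback run.
    return [f"binary: first difference at offset {first_differing_offset(a, b)}"]


original = b"mtime=1\n\0payload"
backdoored = b"mtime=1\n\0PAYLOAD"  # payload changed, header untouched
masked = b"mtime=2\n\0PAYLOAD"      # same payload change plus a benign header change
```

In this model, toy_diff(original, backdoored) reports a binary difference,
while toy_diff(original, masked) only reports the mtime change.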
https://twitter.com/kpcyrd/status/1575080558572449792
On top of the development cost of a *reliable* semantic diff program, you
would also still continuously depend on humans for their opinions about
each diff.
> OSSGadget <https://github.com/microsoft/OSSGadget/>
> includes a tool that can determine if a given package is
> semantically reproducible.
> It's still helpful to work to make a package a fully reproducible build.
> A fully reproducible build is a somewhat stronger claim, and
> you don't need a complex tool to determine if the package is fully
> reproducible.
> Even given that, it's easier to first create a package that's
> semantically reproducible, and then work on the issues remaining
> to make it a fully reproducible build.
oss-reproducible only seems to repack source code into different
container formats like zip/tar but doesn't deal with any compilation steps.
As soon as you're in a position to manage the compiler infrastructure too
(so your binaries are even remotely close to each other) you're usually
in a good enough position to just go for fully reproducible builds.
This is why it's mostly Linux distributions that are in the reproducible
builds space.
---
I think a better investment would be tooling to mimic the environment of
a given GitHub Actions worker run. The SBOMs GitHub currently generates
are based on *the source code*, but not *the CI run* that generated my
binaries.
For example, this GitHub Actions run generated a binary:
https://github.com/spytrap-org/spytrap-adb/actions/runs/5043916251
GitHub tells me it was built from this commit:
https://github.com/spytrap-org/spytrap-adb/commit/b8f667bf54f47a8c358f01aad6d027a70a6fb61b
But there is no tooling (that I'm aware of) that I can use to set up a
build environment on my own computer that matches the GitHub Actions
worker of the specific job that generated this binary.
It would also need to know which versions these commands resolved to:
- sudo apt-get install musl-tools
- rustup target add x86_64-unknown-linux-musl
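As a rough sketch of what recording those resolved versions could look
like, a CI step could run something like the following right after the
install commands and store the output next to the binary. The query list
and the function name are assumptions made for this example. It is not an
existing GitHub feature, and it does not cover everything a matching
build environment would need.

```python
import subprocess

# Hypothetical set of queries matching the two commands above.
VERSION_QUERIES = {
    "musl-tools": ["dpkg-query", "-W", "--showformat=${Version}", "musl-tools"],
    "rustc": ["rustc", "--version", "--verbose"],
    "rustup": ["rustup", "--version"],
}


def record_versions(queries):
    """Run each query and map its name to the trimmed stdout."""
    resolved = {}
    for name, argv in queries.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=True
            )
            resolved[name] = result.stdout.strip()
        except (OSError, subprocess.CalledProcessError) as err:
            # Keep going: a missing tool is itself worth recording.
            resolved[name] = f"unavailable: {err}"
    return resolved
```

Calling record_versions(VERSION_QUERIES) and dumping the result as JSON
into the build artifacts would at least give a rebuilder something to pin
against.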
cheers,
kpcyrd