Is it possible to make ext4 images reproducible even after filesystem operations?
Jean-Philippe Ouellet
jpo at vt.edu
Fri Jan 24 04:59:42 UTC 2025
Context for folks I've added to CC: The topic of reproducible ext4
filesystem image generation came up on the reproducible builds mailing
list. I was aware of your prior art, and figured I would bring it to
the attention of those working on reproducibility, both to assist the
OP in their pursuit, and moreover in the hopes of further
reproducibility validation / testing / any fixes to the approach you
all have taken.
On Wed, Jan 22, 2025 at 3:49 PM David A. Wheeler via rb-general
<rb-general at lists.reproducible-builds.org> wrote:
>
> > On Jan 9, 2025, at 12:22 AM, Adithya.Balakumar at toshiba-tsip.com wrote:
> >
> > Hi All,
> >
> > I am working towards reproducible builds for a project that I am involved in. We use a few ext4 partitions in our disk images and I am trying to make the ext4 filesystems reproducible.
>
> I'm a little concerned with this as a goal, at least if your goal is to detect & counter builds that don't do what they claim to do.
>
> Different versions of operating systems will generate different bit images for a given partition, and of course, not everyone uses ext4. If you must, I imagine you should use userspace tools (over which you have complete control & can fix version numbers), then run them to make changes in a sequential order.
>
> For most cases I can think of, they should be compared file-by-file and dir-by-dir, ignoring the filesystem. But your use case may be VERY different from what I have in mind.
>
> --- David A. Wheeler
Along exactly these lines,
NixOS uses tooling to construct ext filesystem images entirely in
userspace, which, to quote a comment from their integration thereof
[1]:
> `make-disk-image` has a bit of magic to minimize the amount of work to do in a virtual machine.
>
> It relies on the LKL (Linux Kernel Library) project [2] which provides Linux kernel as userspace library.
Specifically, they use the `cptofs` tool [3] from that project, which
is a convenient single-command "just manipulate the filesystem image
in this file to add the contents from this directory", in a manner
straightforward for use from build scripts, etc.
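For illustration, a build script might assemble the `cptofs` invocation roughly as follows. This is only a minimal sketch: the helper name `cptofs_argv` is hypothetical, and the flags (`-t` for the filesystem type, `-i` for the image file, `-P` for a partition number) are assumptions based on how nixpkgs appears to invoke the tool, so check them against `cptofs --help` for your LKL version. The helper only builds the argument list; actually running it is left to the caller.

```python
from pathlib import Path
from typing import List, Optional


def cptofs_argv(image: Path, fs_type: str, sources: List[Path],
                dest: str = "/", partition: Optional[int] = None) -> List[str]:
    """Build an argv list for cptofs (hypothetical helper; flag names are assumptions)."""
    argv = ["cptofs", "-t", fs_type, "-i", str(image)]
    if partition is not None:
        # Assumed flag: select a partition inside a partitioned disk image.
        argv += ["-P", str(partition)]
    # Sort the sources so the argv is identical no matter what order
    # the caller happened to collect them in.
    argv += [str(p) for p in sorted(sources)]
    argv.append(dest)
    return argv
```

The resulting list could be handed to `subprocess.run(argv, check=True)` from a build step, with the tool itself pinned like any other build dependency.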
I'll note that LKL also includes a FUSE filesystem that calls into the
same Linux-as-a-library filesystem implementation in userspace, for
more fine-grained interactive manipulation of filesystem images, in a
manner which, one would suppose, ought to be more controllable than
"whatever the kernel you're currently running on will do".
In the NixOS use case, I believe the primary motivation was to be able
to build full filesystem images without granting the build sandbox the
authority to do things like mount filesystems, without needing to
expose KVM to it, and without the overhead of running unaccelerated
[tcg] qemu to achieve something similar. Even if
not directly for reproducible builds (though the authors may well have
had that in mind as well), it still aligns the dependencies of
filesystem manipulation to be normal userspace tools that can be
controlled and versioned just like any other tools, in the manner
David rightly recommended above.
I have not validated myself whether the tooling as implemented in
NixOS unconditionally produces reproducible results today, but it's a
much more sound starting point than implicitly relying on the behavior
of the filesystem implementation of whatever kernel you happen to be
running on, which is unlikely to be within the control of your build
system, and which is IMO unwise to make assumptions about with respect
to forward format stability.
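One simple way to do that validation would be to build the image twice (ideally on different hosts or kernels) and compare the results byte for byte. The following is a minimal sketch under that assumption; the function names `sha256_of` and `first_difference` are hypothetical, and it says nothing about why two images differ, only whether and where they first do.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) image file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(a: Path, b: Path, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk_size), fb.read(chunk_size)
            if ca != cb:
                # Locate the differing byte within the common prefix of the chunks.
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i
                # No differing byte found, so one file is simply shorter.
                return offset + min(len(ca), len(cb))
            if not ca:
                return None  # both files ended with no difference seen
            offset += len(ca)
```

Matching digests from two independent builds are evidence of reproducibility for that pair of environments. When they do not match, the offset from `first_difference` gives a starting point for working out which on-disk structure (superblock, inode table, journal, and so on) varies.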
None of the above ideas are mine, I'm just pointing you at prior art
by other fine folks I happen to be aware of.
Hope that sets you on a useful path. If you do end up finding
reproducibility issues, particularly with the above approach (which,
as far as I'm aware, would be the most promising approach if you want
general reproducible ext filesystem images), then fixing them in the
aforementioned tooling would also benefit Nix, and be most appreciated
:)
Adding some folks who have worked on the above to CC, for awareness &
if they should wish to add anything:
- Dan Peebles, who was the one who did the initial work to make use of
LKL in NixOS' image-building infrastructure, in [4] with
discussion at [5].
- Ryan Lahfa, who documented the approach and made the process more
deterministic [reproducible], by doing things like using fixed
(instead of random) UUIDs/GUIDs for partitions, filesystems, etc. in
[6] with discussion at [7].
- Jörg Thalheim, who provided review of the above, worked on LKL's
cptofs tool in a way which also serves reproducible builds (explicit
resulting file ownership within the disk image independent of wherever
they're being supplied from for copying), and was incidentally the
last to touch the cptofs tool in the LKL project.
- lassulus, who maintains disk image generation tooling for NixOS.
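On the fixed-UUID point in the list above: one way to get identifiers that are stable across builds, without hand-maintaining a table of constants, is to derive them from a label with a name-based UUID. This is a sketch of that general technique only, not a description of what the commit in [6] does (it may simply hard-code values); the namespace string and the `stable_uuid` helper are hypothetical.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works,
# as long as it never changes between builds.
IMAGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "images.example.invalid")


def stable_uuid(label: str) -> uuid.UUID:
    """Derive the same UUID for the same label on every build."""
    return uuid.uuid5(IMAGE_NAMESPACE, label)
```

The value for a label such as `"rootfs"` could then be passed to the filesystem creation step (for example via `mke2fs -U`) in place of a randomly generated one.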
Regards,
Jean-Philippe
[1]: https://github.com/NixOS/nixpkgs/blob/936f4e016d49cbc8086961732927a4c297ab7c49/nixos/lib/make-disk-image.nix#L2-L24
[2]: https://github.com/lkl/linux
[3]: https://github.com/lkl/linux/blob/master/tools/lkl/cptofs.c
[4]: https://github.com/NixOS/nixpkgs/commit/f1708a9d7d79e2bf2961fc648625578b23b3460f
[5]: https://github.com/NixOS/nixpkgs/pull/24964
[6]: https://github.com/NixOS/nixpkgs/commit/22adcaa4491dde18442a234252e1d7ed8c098672
[7]: https://github.com/NixOS/nixpkgs/pull/207038