verifiable source-only bootstrap from scratch

ahojlm at 0w.se
Sat Mar 11 12:15:43 UTC 2023


Hello Michael,

On Fri, Mar 10, 2023 at 11:42:56PM +0100, Michael Schierl wrote:
> If I did not misunderstand you, with your definition of "without any
> binary seeds" (i.e. it is sufficient that there are multiple independent
> ways to build the seeds from available systems/software) other
> bootstrappable projects are also "without any binary seeds".

I am not sure which reproducible bootstrap projects without binary seeds
you mean. I have not seen any solution that includes all the parts:
verifiable (by independent repetition, in other words reproducible),
source-only, from scratch.

The term "scratch" may deserve some extra explanation. In this context
it means "without use of any specific pre-existing binaries", regardless
of whether such a binary would exist on the build system or on the
target system. As a counter-example, reproducible Debian packages
have quite specific expectations of the build environment.

> For example, the stage0 project can be built with either
> - minimal size binary seeds
> - sed binary and xxd binary and a way to invoke them
> - a POSIX shell that supports printf builtin
> - a POSIX shell and printf binary
> 
> depending on which you trust most.

Here, as you indicate, we need to trust something (generally, in
addition to the unavoidable, which is the actual hardware that runs the
final "production" software).

> In particular, the bootstrap from stage0-posix (X86 Linux) up to gcc
> 4.9.4 can be performed starting from a source folder that only contains
> 
> - subdirectories (permissions 755)
> - relative symbolic links
> - UTF-8 text files (permissions 644) where each line is either a Form
>   Feed or contains only
>   "\a\t\x{0020}-\x{007E}\x{00A1}-\x{017F}\x{03b1}\x{2010}-\x{2026}"
>   (obviously that regex could be made shorter at the cost of patching
>   more source files)
> 
> and a way to execute xxd, chmod +x and chroot.

Technically, this is undoubtedly impressive work.
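
The constraint on the text files quoted above can be checked
mechanically. A minimal sketch in Python follows; the function name
`line_is_allowed` is made up for illustration, and reading "\a" as the
BEL character (0x07) is an assumption based on the usual Perl/PCRE
meaning of that escape.

```python
import re

# Character class from the quoted message, with "\a" taken as BEL (assumption).
ALLOWED_LINE = re.compile(
    "[\\a\\t\\x20-\\x7E\\u00A1-\\u017F\\u03B1\\u2010-\\u2026]*"
)


def line_is_allowed(line):
    """Check one line (without its newline) against the quoted rule.

    A line passes if it is exactly one Form Feed, or if every character
    in it belongs to the allowed class. An empty line passes as well.
    """
    if line == "\f":
        return True
    return ALLOWED_LINE.fullmatch(line) is not None


def text_is_allowed(text):
    """Apply the rule to every line of an already decoded UTF-8 text."""
    return all(line_is_allowed(line) for line in text.split("\n"))
```

A file would be read with `open(path, encoding="utf-8")` and its content
passed to `text_is_allowed`; a decoding error already means the file is
not valid UTF-8.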

Unfortunately I cannot see stage0-posix as a complete solution for
trustworthiness, nor view its approach as effort-efficient.

It is generally not possible to avoid applying some extra trust while
doing a "from-a-mini-binary-bootstrap".

Also, in reality, one relies on someone else's report: "I did that up to
the point XXX and the result was the same as given on project YYY's
web site".

One can safely rely on such a statement only if there are multiple
independent parties reporting the same.

This is the model that VSOBFS builds upon.
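
To make the model concrete, here is a minimal sketch in Python of
accepting a build result only when enough independent parties report
the same digest. The party names, the digest values and the threshold
`min_parties` are hypothetical, chosen for illustration; this is not
code from VSOBFS.

```python
def consensus_digest(reports, min_parties=3):
    """Return the digest all parties agree on, or None.

    `reports` maps a party name to the hex digest that party obtained
    for the same build artifact. The result is None when there are
    fewer than `min_parties` reports or when any two reports differ.
    """
    if len(reports) < min_parties:
        return None  # too few independent reports to rely on
    digests = set(reports.values())
    if len(digests) != 1:
        return None  # at least one party got a different result
    return next(iter(digests))


# Hypothetical reports from three independent builders.
agreeing = {"party-a": "3f2a", "party-b": "3f2a", "party-c": "3f2a"}
differing = {"party-a": "3f2a", "party-b": "3f2a", "party-c": "9c41"}
```

Whether the parties are really independent (different hardware,
different host systems, different people) cannot be decided by such a
check; it only compares what they report.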

Then we do *not* need to repeat the hard work of
from-the-instruction-codes bootstrapping, done by the pioneers over half
a century ago on the (analyzable) hardware of those times.

Better reliability of our result is achieved if we begin at a higher
level, for example ANSI C and Bourne shell, which is possible when we
expect a diversity-backed consensus.

This level (C and sh) is comprehensible to a wide circle of
professionals, who then do not have to learn the intermediate languages
used in the from-minimalistic-binary bootstraps. This is a reason to
believe that the extent of verification can in fact be higher than for
a bootstrap from a mini-seed.

> > bootstrap path without involving software with the GNU licenses, because
> > they are too restrictive for certain uses or tastes.
> 
> Debatable. Preferring small software to large one definitely helps with
> the auditing work. On the other hand, if at the end you want to run gcc
> on Linux anyway..., you won't get around auditing it.

Not necessarily gcc and Linux. Some people prefer to stay with tcc, or
to run clang or, say, Smaller C, on BSD kernels among others. Gcc and
Linux are very popular but do not define computing as a whole.

> I uploaded the website (start page) to
> <https://ipfs.io/ipfs/QmT2Mo4pcCGSf3iJ6NnU8nFv7yEUiM8mU62ArWbcdikEVn>
> 
> And the archive

Many thanks Michael!

> be careful as the archive is a "tar-bomb", i.e. it extracts its content
> into the current directory and not a subdirectory.

A fair notice.

I view an extra directory level as dead weight; directory naming is
in the realm of the recipient.
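
For recipients who want to know in advance, a minimal sketch in Python
of checking whether an archive puts more than one entry at its top
level. The function names are made up for illustration, and "more than
one top-level entry" is only one possible definition of a tar-bomb.

```python
import tarfile
from pathlib import PurePosixPath


def top_level_entries(names):
    """Return the set of first path components of the member names."""
    tops = set()
    for name in names:
        # Ignore a leading "/" so absolute names are treated like relative ones.
        parts = [p for p in PurePosixPath(name).parts if p != "/"]
        if parts:
            tops.add(parts[0])
    return tops


def is_tar_bomb(names):
    """True if the members would land in more than one top-level entry."""
    return len(top_level_entries(names)) > 1


def archive_is_tar_bomb(path):
    """Open the archive at `path` and apply the check to its member names."""
    with tarfile.open(path) as tar:
        return is_tar_bomb(tar.getnames())
```

If the check says yes, create a directory with a name of your own choice
and extract there, for example with `tar -C that-directory -xf archive`.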

Best regards,
/ an 


