[rb-general] Regarding "Zero Install" manifests
Anders Björklund
anders at ecsit.se
Sat May 6 13:35:18 CEST 2017
Ximin Luo wrote:
>> Oh, when I asked my question I got the impression that there was
>> no standardized output format (that would contain any checksums
>> etc)
>>
>> Looking at the docs, I saw only generic explanations but no
>> formats: https://reproducible-builds.org/docs/checksums/
>> https://reproducible-builds.org/docs/embedded-signatures/ So that
>> is why I gave an example of such a format that does exist ?
>>
>>
>> Looking at
>> https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles it seemed
>> rather specific to Debian and I didn't see any contents ?
>>
>
> Ah right, I understand now. Right, Debian buildinfo files do not
> contain checksums of the build-dependencies nor of
> packages-installed-at-build-time right now, for unfortunate technical
> reasons. I've been meaning to figure out how to do this properly and
> then file a bug to the dpkg maintainer, thanks for the reminder.
I think that went beyond my suggestion, but a good idea nonetheless.
Then you only have to make those lists compatible with all the
other package management formats, and you are good to go... :-)
We spent quite some time with all this in Smart and in PackageKit,
but I don't think there ever was (nor will be) a universal standard.
In the end (2012?), it seemed more like we "agreed to disagree".
> (Of course, they contain checksums of the actual source and binaries
> being built.)
>
> However, just the concept of adding checksums is not such a
> sophisticated nor complex concept, so I am not sure that looking at
> how other people have done it, is worth the time (given other stuff
> that needs to be done). Or did you spot any particular "gotchas" or
> insightful implementation tricks that you could tell us about?
It's not terribly complex, I think the basic problem was with the
deterministic sorting of the input (i.e. same problem as always).
Then there were some problems with "strange" characters in the
paths themselves, and portability issues like symlinks perhaps.
I thought the base-32 was pretty neat, but then again I myself
preferred base-32-hex and everyone else truncated the base-16.
The petty problem being "solved" was that the sha256 checksums
were so long (by comparison) that they almost started line-breaking...
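
To make the length difference concrete, here is a small sketch that
prints one sha256 digest in base-16, base-32 and base-32-hex (RFC 4648
alphabets). The helper name `encodings` and the lower-casing are my own
choices for illustration and are not taken from 0install.

```python
import base64
import hashlib

# Standard RFC 4648 base32 alphabet and its "extended hex" variant.
B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
B32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def encodings(data):
    """Return the sha256 of data in three textual encodings."""
    raw = hashlib.sha256(data).digest()
    # Drop the "=" padding; it carries no information for a fixed-size digest.
    b32 = base64.b32encode(raw).decode("ascii").rstrip("=")
    return {
        "base16": raw.hex(),
        "base32": b32.lower(),
        # Same bit grouping as base32, only the alphabet differs.
        "base32hex": b32.translate(str.maketrans(B32, B32HEX)).lower(),
    }


for name, text in encodings(b"hello\n").items():
    print("%-9s %2d %s" % (name, len(text), text))
```

A 32-byte digest takes 64 characters in base-16 but 52 in either
base-32 variant, once the padding is dropped.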
The end algorithm was not so complex (at least not in Python):
https://github.com/0install/0install/blob/b2.3/zeroinstall/zerostore/manifest.py
The current version is now implemented in OCaml:
https://github.com/0install/0install/blob/master/ocaml/zeroinstall/manifest.ml
Just run "0store manifest", and it will show the format itself.
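
For a rough idea of the shape of such an algorithm, here is a minimal
sketch. It is NOT the real 0install manifest format (see the links
above for that); the line layout and the names `manifest_lines` and
`manifest_digest` are assumptions made for illustration. It only shows
the points mentioned: sorted input, rejecting awkward path characters,
treating symlinks separately, and a base-32 digest of the whole.

```python
import base64
import hashlib
import os


def manifest_lines(root, rel=""):
    """Yield one line per entry below root, in a deterministic order."""
    here = os.path.join(root, rel)
    subdirs = []
    # sorted() on str compares code points, so it is locale-independent.
    for name in sorted(os.listdir(here)):
        if "\n" in name:
            raise ValueError("newline in file name: %r" % name)
        path = os.path.join(here, name)
        if os.path.islink(path):
            # Hash the link target, never the file it points to.
            target = os.readlink(path).encode("utf-8")
            digest = hashlib.sha256(target).hexdigest()
            yield "S %s %d %s" % (digest, len(target), name)
        elif os.path.isdir(path):
            subdirs.append(name)
        else:
            with open(path, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()
            yield "F %s %d %s" % (digest, len(data), name)
    # Files first, then each subdirectory (already in sorted order).
    for name in subdirs:
        sub = name if not rel else rel + "/" + name
        yield "D /" + sub
        yield from manifest_lines(root, sub)


def manifest_digest(root):
    """Hash the manifest text itself and return it as unpadded base32."""
    text = "".join(line + "\n" for line in manifest_lines(root))
    raw = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()
```

Two directories with the same contents then give the same digest,
regardless of the order in which the files were created on disk.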
>> The idea was for a single format that would describe the binaries.
>> Wouldn't hurt if it was something like how git describes the code
>> ?
>>
>
> It wouldn't hurt no. Eventually it would be good to try to
> standardise and unify different packaging standards. This is a very
> hard task though, and in the meantime it's not clear to me what the
> benefit is, if we only have unified buildinfo files but they are
> still produced and consumed by different distros' incompatible
> packaging systems.
Well, the packaging and the compression was mostly for transport.
After that, the payload and the metadata/headers were the same.
There was so very much time spent arguing cpio vs tar, or xar vs ar,
that we lost track of the end goal. So now it's more about "de facto".
> We've already established that one must verify the *whole package*
> for the most confidence; what benefit do you see for users to verify
> each individual file as well as the *whole package*? If you are
> talking about the kernel making sure system files haven't been
> tampered with, that is a separate issue and can be done at
> installation time after verifying the whole package, without needing
> to carry checksums for specific files.
Well, just like with the git format, the whole package is just the
sum of its contents. So checksumming each file is just a step on
the way to getting the checksum of the whole. This is so that when you
need to compare two things, you can just compare their checksums ?
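
A small in-memory sketch of that idea, where a "package" is simply a
dict mapping path to bytes. The function names are hypothetical, and
this is neither git's object format nor any packaging format, only the
principle: per-file checksums feed one checksum for the whole, and they
also tell you *which* file differs when the whole does not match.

```python
import hashlib


def file_digests(files):
    """Map each path to the sha256 of its contents."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in files.items()}


def tree_digest(files):
    """Checksum of the whole: hash the sorted list of per-file checksums."""
    h = hashlib.sha256()
    for name, digest in sorted(file_digests(files).items()):
        h.update(("%s %s\n" % (digest, name)).encode("utf-8"))
    return h.hexdigest()


def changed(a, b):
    """Paths that were added, removed or modified between a and b."""
    da, db = file_digests(a), file_digests(b)
    return sorted(n for n in da.keys() | db.keys() if da.get(n) != db.get(n))


old = {"bin/tool": b"\x7fELF...", "doc/README": b"hello\n"}
new = {"doc/README": b"hello, world\n", "bin/tool": b"\x7fELF..."}
print(tree_digest(old) == tree_digest(new))
print(changed(old, new))
```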
>> But nowadays I'm mostly using Docker images anyway, I suppose that
>> is _another_ standard for describing the binaries... (e.g. the
>> OCI)
>>
>> They just recently changed to a content-addressable format,
>> though.
>> https://github.com/moby/moby/wiki/Engine-v1.10.0-content-addressability-migration
>>
>
>>
> I'm not familiar with Docker, could you explain this in some more
> detail?
Well, you have nameless images (identified by a sha-256 checksum)
and then you have tags that give those more friendly names to use.
So if I pull "debian:jessie", it is currently image "054abe38b1e6".
("sha256:054abe38b1e6f863befa4258cbfaf127b1cc9440d2e2e349b15d22e676b591e7")
If you are familiar with git tag and git objects, it should be similar ?
Slightly overkill for a single package, but gets the job done I suppose.
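
As a toy model of that split between nameless content and friendly
names, here is a sketch of a content-addressed store with movable tags.
The class and method names are made up for illustration, and this says
nothing about what exactly Docker hashes to get the image ID; it only
mirrors the "sha256:..." naming and the 12-character short form above.

```python
import hashlib


class Store:
    """Objects are keyed by their own checksum; tags are just pointers."""

    def __init__(self):
        self.objects = {}  # "sha256:<hex>" -> bytes
        self.tags = {}     # friendly name -> "sha256:<hex>"

    def put(self, data):
        ident = "sha256:" + hashlib.sha256(data).hexdigest()
        self.objects[ident] = data
        return ident

    def tag(self, name, ident):
        if ident not in self.objects:
            raise KeyError(ident)
        self.tags[name] = ident  # a tag may later move to another object

    def resolve(self, name):
        return self.tags[name]


def short_id(ident):
    """Abbreviate "sha256:<64 hex>" to its first 12 hex characters."""
    return ident.split(":", 1)[1][:12]


store = Store()
ident = store.put(b"pretend this is an image")
store.tag("debian:jessie", ident)
print(short_id(store.resolve("debian:jessie")))
```

The same content always gets the same ID, while a tag can be pointed
at a newer object later on, much like a git branch or tag.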
> Buildinfo files not only describe a binary, it describes *how it was
> built*. This is the important part; it means people can try to
> reproduce the build. I'm not aware that Docker has an equivalent.
> Also in this field (i.e. with related tools that try to recreate
> "images") they often use the term "reproducible" in a slightly
> different (weaker) way - they mean a semi-reproducible build
> environment, but the images may or may not be bitwise-identical if
> you run the build twice. The reproducibility of the build environment
> is seen as the primary positive characteristic.
>
Right, this *is* very important. I think the Dockerfile is the
equivalent, but just like with a regular Makefile it doesn't
give any guarantees of a reproducible build. Just the possibility.
It does give you all the options to use "fixed" inputs, though.
* https://docs.docker.com/engine/reference/builder/ (Dockerfile)
> By contrast, we use "reproducible builds" to mean a semi-variable
> build environment, that *despite the variations*, generate
> bitwise-identical end results. The primary positive characteristic is
> the bitwise-identical *end result*.
Yeah, this is important work, and bitwise-identical output in particular
has definitely not always been the case. Similar, but not identical...
That is why I think it's a nice add-on to all the other capabilities,
and having all of them brings us closer to the end goal that I want.
i.e. simple to fetch (and install), simple to build (and tweak)
Now, I need to go see how you managed to solve the relocation problem.
Unix always had this historical problem of encoding prefix *everywhere*.
We had BinReloc (from autopackage.org), but it never seemed to catch on.
ZeroInstall itself came from a tradition of AppDirs, similar to Mac app bundles.
* http://rox.sourceforge.net/desktop/AppDirs.html
This was also one of the major reasons why pkg2zero didn't really work ?
Available binaries either always used /usr (or so), or even worse /home.
/Anders