What should be the proper practice to manage `.dsc` files on Reprepro?

Yaobin Wen yaobin.wen at minevisionsystems.com
Fri May 27 16:26:04 UTC 2022


Hi vagrant,

Thank you for the reply! Because we are still learning the ideas and
practices of reproducible builds, our current practices may not align with
the typical practices of more experienced people. That is probably why what
I described earlier confused you. At the end of your reply, you asked what I
was trying to accomplish. My primary goal is *learning*: to find out whether
we are managing the DSC files and using Reprepro in an appropriate way, so I
can help my company do it better. By "appropriate," I mean "what other code
builders and Debian repository maintainers typically do," and your reply
showed me that we may be doing things in a way that causes our current
problems.

I'll try to clarify what I said and answer your questions right below your
comments. I hope this helps. Thanks again for your time!

Cheers!
Yaobin

On Thu, May 26, 2022 at 12:23 PM Vagrant Cascadian <
vagrant at reproducible-builds.org> wrote:

> On 2022-05-26, Yaobin Wen wrote:
> > In my company, we use *Ubuntu (18.04)* and are practicing reproducible
> > builds. Our code is built into a lot of *.deb* packages using *debuild*
> > (and related tools). We have made a lot of effort to make our builds
> > reproducible by following the Achieve deterministic builds
> > <https://reproducible-builds.org/docs/> and Known issues related to
> > reproducible builds
> > <https://tests.reproducible-builds.org/debian/index_issues.html>. We have
> > made a lot of progress and are still working on it.
>
> It is great to hear of your work on reproducible builds!
>
>
> > We set up a company-wide *Reprepro server to serve the Debian packages
> > that we build regularly*. We publish both *.deb* files and the *GPG-signed
> > .dsc* files to the Reprepro server. BTW, our build system is designed to
> > do a "*change-only build*": if a package has not changed since the last
> > build, i.e., its *changelog* has not changed, the package is not built
> > again this time. *Their .deb and .dsc files are still added to Reprepro,
> > but because these files remain unchanged, Reprepro can successfully "add"
> > them (in fact, they are skipped)*. I figured this point may be important
> > for understanding my questions below.
>
> I don't follow what you mean by "the package is not built again this
> time" and "Their .deb and .dsc files are still added to Reprepro". How
> can a package be added if it is not built?
>

I forgot to mention that *we cache the packages we have already built*.
For example, when we build version 1 (v1) of a package, we first put the
results (including at least the .deb file and the .dsc file) into a "build
cache" and then publish them to our Reprepro instance. The next time we kick
off a build, if this package remains unchanged (i.e., still v1), we use the
cached result directly instead of building it again. However, if the
package has been updated to a newer version, e.g., v1.1 or v2, our build
system builds the package (and caches the new results).
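A minimal sketch of such a cache lookup, in Python. It assumes the cache is
keyed on the source package name and version taken from the top entry of
`debian/changelog`; the `<cache_root>/<source>/<version>/` layout and the
function names are hypothetical, not our actual build system:

```python
import re
from pathlib import Path

# Matches the header of the topmost debian/changelog entry, e.g.
#   "hello (2.10-1) bionic; urgency=medium"
CHANGELOG_HEADER = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\)")


def changelog_key(changelog_text: str) -> tuple[str, str]:
    """Return (source, version) from the first line of debian/changelog."""
    first_line = changelog_text.splitlines()[0]
    match = CHANGELOG_HEADER.match(first_line)
    if match is None:
        raise ValueError(f"unrecognised changelog header: {first_line!r}")
    return match.group("source"), match.group("version")


def cache_dir_for(cache_root: Path, changelog_text: str) -> Path:
    """Hypothetical cache layout: <cache_root>/<source>/<version>/."""
    source, version = changelog_key(changelog_text)
    return cache_root / source / version


def needs_build(cache_root: Path, changelog_text: str) -> bool:
    """Build only if nothing is cached for this exact source version."""
    return not cache_dir_for(cache_root, changelog_text).is_dir()
```

Keying on the changelog version mirrors the "change-only build" rule: a new
changelog entry produces a new key, so the package gets rebuilt.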


> What's unclear to me is why you're uploading the same .deb and .dsc
> files to the same reprepro repository multiple times... ?


We upload the same .deb and .dsc files to the same Reprepro repository
repeatedly probably *because we didn't know* what you pointed out at the
end of your reply: "with a Debian-style repository, it's not expected that
you will ever upload the same version of any given object more than once."

To clarify with more detail: in my original email, when I said "are still
added to Reprepro", I meant that we run the command
"reprepro -b <base-dir> -T deb/dsc -C <component-name>
includedeb/includedsc <distro-name> <path-of-deb/dsc-file>" on the same
files again. Because Reprepro already has them, it skips the inclusion
(with the message "Skipping inclusion of '<package-name>' '<version>' ...
as it has already '<version>'."). *None of the files in Reprepro are
changed. We don't forcibly upload or overwrite (e.g., using `cp` or
`rsync`) any file in the `pool` or any other directory.*
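A sketch of how the redundant include could be skipped altogether, by asking
Reprepro what it already holds before calling `includedsc`/`includedeb`. It
assumes `reprepro -b <base-dir> list <codename> <package>` prints lines of the
form `<codename>|<component>|<architecture>: <package> <version>`; please check
that against your Reprepro version. The helper names are hypothetical:

```python
import subprocess


def parse_reprepro_list(output: str) -> set[tuple[str, str, str]]:
    """Parse lines assumed to look like
    "<codename>|<component>|<architecture>: <package> <version>"
    into (architecture, package, version) tuples."""
    entries = set()
    for line in output.splitlines():
        if not line.strip():
            continue
        location, _, rest = line.partition(": ")
        architecture = location.split("|")[-1]
        package, version = rest.split()[:2]
        entries.add((architecture, package, version))
    return entries


def already_published(base_dir: str, codename: str, package: str,
                      version: str, architecture: str = "source") -> bool:
    """Return True if reprepro already holds this package version."""
    result = subprocess.run(
        ["reprepro", "-b", base_dir, "list", codename, package],
        check=True, capture_output=True, text=True,
    )
    return (architecture, package, version) in parse_reprepro_list(result.stdout)
```

The build script would then call `includedsc` only when `already_published`
returns False, instead of relying on Reprepro's "Skipping inclusion" path.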

>
> > Although we have solved many reproducibility issues in the *.deb* files,
> > *I found the .dsc files were changed when* I rebuilt the packages (by
> > deleting the previously built *.deb* and *.dsc* files) so Reprepro
> > refuses to include them and reports the following error:
> >
> > ERROR: '<path-to-dsc-on-build-machine>' cannot be included as
> >> 'pool/<path-to-dsc-in-reprepro>'.
> >> Already existing files can only be included again, if they are the same,
> >> but:
> >> md5 expected: <md5-1>, got: <md5-2>
> >> sha1 expected: <sha1-1>, got: <sha1-2>
> >> sha256 expected: <sha256-1>, got: <sha256-2>
> >
> >
> > *diffoscope* told me the *.dsc* files *only differ in their GPG
> > signatures*; the related source tarball (<filename>.orig.tar.gz) and
> > debian tarball (<filename>.debian.tar.xz) *have not changed between
> > builds.*
>
> That's to be expected...
>
>
> > I understand why: as this SO answer says
> > <https://security.stackexchange.com/a/78958/80050>, the GPG signature is
> > generated using the creation time as an input. I found the issue
> > cryptographic_signature
> > <https://tests.reproducible-builds.org/debian/issues/unstable/cryptographic_signature_issue.html>
> > that made me think we should not have signed our *.dsc* files, but the
> > Debian Administrator's Handbook
> > <https://www.debian.org/doc/manuals/debian-handbook/sect.source-package-structure.en.html>
> > shows that *.dsc* files are supposed to be signed by the maintainers.
> > In addition, I couldn't find any issue related to *.dsc* files in the
> > Known Issues list
> > <https://tests.reproducible-builds.org/debian/index_issues.html>.
>
> If you want to know which party claims to have built a given .dsc file,
> you need the signatures on them. If you track that information some
> other way, you *could* use unsigned .dsc files...
>
> Or you could re-use the original .dsc files, if all the contents they
> reference are bit-for-bit identical. If you want to store the new ones
> somewhere else as a "proof of having rebuilt it again" you could do
> that, but obviously not in the exact same repository.
>
>
> > *After reading around, I suspect my understanding of reproducible
> > builds may not be totally correct, so I want to ask here:*
> >
> >    1. *Should the .dsc files be reproducible, too?* Because Reprepro can
> >    manage *.dsc* files, I've been thinking that *.dsc* files should be
> >    reproducible, but now it seems not?
>
> If you build them in the same build environment, with the same source
> code, they should be reproducible *minus the signatures*, as you've
> noted...
>
> Generally, from a reproducible builds perspective, the .dsc file is
> considered an input to the build process rather than a result of a build
> process. Though admittedly, .dsc files are themselves artifacts of a
> source-only build, so it is a bit of a grey area.
>
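To check the "reproducible minus the signatures" point mechanically, one
could compare two .dsc files after stripping the OpenPGP cleartext-signature
wrapper. This is a minimal sketch, assuming standard clearsigned input; it
does not verify the signature (tools such as `gpgv` or `dscverify` do that):

```python
BEGIN_SIGNED = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIGNATURE = "-----BEGIN PGP SIGNATURE-----"


def signed_body(dsc_text: str) -> str:
    """Return the content of a clearsigned .dsc without its OpenPGP wrapper.

    Unsigned input is returned unchanged. The signature is NOT verified.
    """
    lines = dsc_text.splitlines()
    if not lines or lines[0] != BEGIN_SIGNED:
        return dsc_text
    # Skip the armor headers (e.g. "Hash: SHA256") up to the first blank line.
    start = lines.index("") + 1
    end = lines.index(BEGIN_SIGNATURE)
    # Undo dash-escaping (RFC 4880, section 7.1): "- -foo" becomes "-foo".
    body = [line[2:] if line.startswith("- ") else line
            for line in lines[start:end]]
    return "\n".join(body) + "\n"


def same_except_signature(a: str, b: str) -> bool:
    """True if two .dsc texts differ at most in their signature blocks."""
    return signed_body(a) == signed_body(b)
```

Two rebuilt .dsc files for which `same_except_signature` is True carry the
same claims about the same sources; only the signer's timestamped signature
differs, which is what diffoscope reported.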

It makes total sense to me that the .dsc files should be used as an input.
I should have realized this earlier. I had glanced at these files before
but never studied them. I'm also aware of the .buildinfo files, and I
think both the .buildinfo and .dsc files should be used as inputs for
rebuilding the code. Am I right? However, browsing the Reproducible Builds
website (the "Tools" page, specifically), I couldn't find any tools that
use these two kinds of files. *Are there such tools somewhere?*
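Following the suggestion above to re-use the original .dsc when everything it
references is bit-for-bit identical, a rebuild could check its fresh tarballs
against the `Checksums-Sha256` field of the already-published .dsc. A minimal
sketch; the function names and the flat `build_dir` layout are assumptions:

```python
import hashlib
from pathlib import Path


def dsc_sha256_entries(dsc_text: str) -> dict[str, tuple[str, int]]:
    """Map filename -> (sha256, size) from the Checksums-Sha256 field."""
    entries: dict[str, tuple[str, int]] = {}
    in_field = False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field and line.startswith(" "):
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
        elif in_field:
            break  # a new field (or the signature) starts; this field ended
    return entries


def artifacts_match(dsc_text: str, build_dir: Path) -> bool:
    """True if every file the .dsc references exists in build_dir unchanged."""
    entries = dsc_sha256_entries(dsc_text)
    if not entries:
        return False
    for name, (digest, size) in entries.items():
        path = build_dir / name
        if not path.is_file() or path.stat().st_size != size:
            return False
        if hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            return False
    return True
```

If `artifacts_match` returns True, the published .dsc (and its signature) can
be kept as-is; if it returns False, the sources really differ and should not
be published under the same version.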


>
> >    2. In my case, since my company maintains both *.deb* files and *.dsc*
> >    files in Reprepro, if one day we need to build the code of an earlier
> >    version, we would inevitably generate different *.dsc* files because of
> >    the GPG signatures. *Am I supposed to publish the .dsc files to the same
> >    Reprepro server that holds our regular builds?* Because I've been
> >    thinking *.dsc* files should also be reproducible, I've been assuming we
> >    should keep using the same Reprepro server. *But now it looks like we
> >    need to prepare a second Reprepro server to hold the packages of the
> >    earlier version.*
>
> So you're looking to be able to recreate your whole repository from
> scratch (maybe from git repositories or some other VCS?) at some future
> date, reproducibly?
>

*Yes, I'm looking to be able to recreate the whole Reprepro repository from
scratch, using the code in git. (Or, more accurately, I want to learn how
to achieve this.)* As I said earlier, we have a company-wide Reprepro
instance. We regularly build the code for day-to-day development
(incrementally, using the cache to avoid rebuilding code that hasn't
changed) and publish the resulting artifacts to Reprepro, where developers
and testers can access them. Recently, we wanted to rebuild an earlier
revision of the codebase for testing. I checked out a new copy of the code
and kicked off the build process without using the cache. Because I didn't
use any cache, packages that our day-to-day builds take from the cache were
rebuilt this time. *When I published the rebuilt packages of the earlier
revision to the same Reprepro instance, I found the .dsc files were
rejected because of the differences in their GPG signatures. This was my
primary motivation for asking these questions.*

But your reply made me realize that we may not be using Reprepro or
practicing reproducible builds correctly. *In general, I'm trying to figure
out the appropriate practices for two major scenarios: day-to-day
development, and the occasions when we need to rebuild an earlier code
revision.* Right now, we use a single Reprepro instance and publish
everything to it. But it looks like we should use at least two Reprepro
instances: one for daily development and one for rebuilt historical
builds.


> >    3. *How does everyone else maintain their Reprepro server?* Do they
> >    keep publishing the build artifacts to the same server after a build?
> >    Or do they delete the previously published artifacts before publishing
> >    the new build? Or do they even recreate the Reprepro server every time
> >    they make a new build?
>
> Typically with a Debian-style repository, it's not expected that you
> will ever upload the same version of any given object more than
> once. This is partly because apt and related tools are designed with
> that assumption in mind, and will behave poorly if you in fact feed a
> repository packages with the same version but different content and then
> try to use those packages in the real world.


> So, still a bit unsure what you're actually trying to accomplish; if you
> spell that out a little more clearly it might be easier to make good
> suggestions.
>
>
> live well,
>   vagrant
>


-- 
Software Engineer
Mine Vision Systems <https://www.minevisionsystems.com/>
5877 Commerce St.
Suite 118
Pittsburgh PA, USA, 15206


More information about the rb-general mailing list