[Git][reproducible-builds/reproducible-website][master] vienna2025: Add D1 workgroup notes

Thu Oct 30 11:47:58 UTC 2025


Robin Candau pushed to branch master at Reproducible Builds / reproducible-website


Commits:
553702a3 by Robin Candau at 2025-10-30T12:24:01+01:00
vienna2025: Add D1 workgroup notes

- - - - -


7 changed files:

- _events/vienna2025/agenda.md
- + _events/vienna2025/agenda/d1-buildsandboxing.md
- + _events/vienna2025/agenda/d1-distributedverification.md
- + _events/vienna2025/agenda/d1-languageecosystems.md
- + _events/vienna2025/agenda/d1-localreproducibility.md
- + _events/vienna2025/agenda/d1-lowlevelverification.md
- + _events/vienna2025/agenda/d1-rbtooling.md


Changes:

=====================================
_events/vienna2025/agenda.md
=====================================
@@ -40,17 +40,17 @@ Participants are encouraged to sit with those who they have not yet met or engag
 14.00 Collaborative Working Sessions
 
 * Distributed verification
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-distributedverification-keep
+** [NOTES](/events/vienna2025/agenda/d1-distributedverification)
 * Language ecosystems
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-languageecosystems-keep
+** [NOTES](/events/vienna2025/agenda/d1-languageecosystems)
 * Low-level verification
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-lowlevelverification-keep
+** [NOTES](/events/vienna2025/agenda/d1-lowlevelverification)
 * Local reproducibility and user perception
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-localreproducibility-keep
+** [NOTES](/events/vienna2025/agenda/d1-localreproducibility)
 * State of RB tooling
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-rbtooling-keep
+** [NOTES](/events/vienna2025/agenda/d1-rbtooling)
 * Build sandboxing
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-buildsandboxing-keep
+** [NOTES](/events/vienna2025/agenda/d1-buildsandboxing)
 
 15.45 Closing Circle
 


=====================================
_events/vienna2025/agenda/d1-buildsandboxing.md
=====================================
@@ -0,0 +1,47 @@
+Build sandboxing
+
+== Summary ==
+
+1. Almost everybody uses isolation for builds, but the details vary a lot (VMs, containers, VMs + containers, raw namespacing, userspace syscall substitution, MACs).
+   Isolation matters for dependency detection and dependency minimization, but conversely also for dependency sharing, as well as for isolation of builds, security (obviously), and resource control.
+2. Network might or might not be allowed. One option is to allow network access in some phases, but require a fixed hash of the downloaded artifact (Nix/Guix solution).
+3. Bootstrapping is an extreme case of dependency minimization.
+
+== Log ==
+
+Gabor: everybody doing RB has some isolation in place.
+
+Why are we doing this? Unknown sources, foil autodetection, prevent spillover between builds, minimize dependencies that are used during build, share dependencies/state between builds and minimize, e.g. glibc version
+
+separate hosts, VMs, namespaces / jails, +selinux optionally
+VMs alone are not enough: they provide isolation from the host, but not a minimal environment; easier and faster to do local builds
+Justin: library os layer to abstract OS differences, maybe ready for next RBS
+        recompile software to substitute system calls with function calls into the library
+        gvisor is similar, the new thing is more portable
+        rr, dettrace
+
+Jochen: Some packages want to load kernel modules during build… (bup)
+nvidia drivers are needed after the build for some things
+
+Gabor: should we substitute syscalls?
+       apparmor? e.g. for a database that is spun up for tests…
+
+Debian packages have nocheck, but not always honoured, and the test dependencies leak into build results and increase the system surface that is accessed during package builds.
+
+Some test stuff could/should be moved out of the build phase.
+Debian: upstream tests in the build phase, then downstream integration tests, also piuparts
+
+Gabor: should resource constraints be introduced into the build? Should this be enforced? This happens to some extent automatically with VMs and containers. Not really a question of BR.
+
+Gabor: Should builds be allowed to access the network? Debian yes, F-Droid no. Proxies can be used. Nix and Guix have two types of derivations: one, a fixed-output derivation identified by the hash of its result, e.g. a downloaded tarball or repo checkout; two, a real build with type one as inputs, but no network access. Then the question of network access doesn't matter, except when the source might go away at some point.
+
+It is more economical to build a system which allows BR for a limited time, e.g. while stuff is available to download from the CDN.
+
+For Nix/Guix the build is just anything that reproduces the output hash, allowing caching. Next step is to abstract hardware, so that builds can be done without access to specific hardware types.
+
+For Nix, any user belonging to a group can trigger builds.
+
+Jochen: bootstrapping? Use a shell script to bootstrap for a new architecture. Bootstrapping is an extreme case of dependency minimization. In Guix, 400 byte bootstrap binary + 25MB guile driver. Debian is interested in cross-arch bootstrapping, because there is a new arch every year or so, but the set of base packages changes every time, so it is hard to reuse the process. Maybe fewer new architectures after RISC-V.
+
+Jochen: reproducibility of cross and native builds is good for trust.
+
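The fixed-output idea from the build sandboxing notes above (network access is allowed only for a step whose result hash is pinned in advance) can be sketched as follows. This is a minimal illustration, not how Nix or Guix implement it: `verify_fixed_output`, `fetch_fixed_output` and `HashMismatch` are hypothetical names, and SHA-256 is assumed as the hash.

```python
import hashlib
from typing import Callable


class HashMismatch(Exception):
    """Raised when fetched content does not match the pinned hash."""


def verify_fixed_output(data: bytes, expected_sha256: str) -> bytes:
    """Return data unchanged if its SHA-256 matches the pinned value."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise HashMismatch(f"expected {expected_sha256}, got {actual}")
    return data


def fetch_fixed_output(fetch: Callable[[], bytes], expected_sha256: str) -> bytes:
    """Run a (possibly networked) fetch step, then check its result.

    `fetch` is any zero-argument callable returning bytes; it stands in
    for the download phase. Only verified bytes are handed on to the
    later, network-less build phase.
    """
    return verify_fixed_output(fetch(), expected_sha256)
```

Because the result is identified purely by its hash, any mirror or cache that yields the same bytes is acceptable, which is why the notes say the question of network access stops mattering as long as the source stays available somewhere.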


=====================================
_events/vienna2025/agenda/d1-distributedverification.md
=====================================
@@ -0,0 +1,108 @@
+Distributed verification
+Notes
+
+Session 1: 
+
+# First Activity: Write down problem statements
+
+Julien: distributed RB monitoring #scale #infrastructure
+
+Julien: trustless binary distribution #scale #infrastructure
+        - get rid of trust / the opposite of what Martin said about changing who you trust over time
+        - Michael: having to pick someone to trust is a hard thing to do
+                   it is hard to do so in a way that is not absolute or muddy
+                   > wanting confidence without having to pick a point of failure
+        - Julien: I don't want to have users pick who to trust because it sounds complicated
+
+Michael: tooling and infra to support multi party reproducibility verification at scale #scale
+        - Michael: we want to fund this
+
+Holger: distributed verification as the second step after making 90+ % of things reproducible #fromPossibleToUsefulOrActionable
+        - explore difference between useful and actionable
+        - make everyone benefit by default (not just a process you seek out)
+        - get away from manual verification
+
+Marc: RB creates resilience across a diverse set of build tools/diversity of build environments  #security #notRelyOnASingleTool
+       - problem of monocultures?
+
+Herve: as a user of a lib, I want to make sure I can do a simple fix #constraintOnSolution
+       - make sure it stays possible to
+
+Martin: I want to be able to change who I trust over time
+       - Michael's reframe: want to be able to reason about my roots of trust (and how they change over time) -> make tampering exponentially expensive for attackers
+       - reducing the number of roots of trust?
+
+Julien: make roots of trust transparent / easy to users
+
+Mark: I want to be able to trust a published artifact because someone I trust reproduced it
+      -  Michael's reframe: I want to be able to choose who I trust
+
+Michael: Reproducible build cost by sharing the load in a pool #cost #practicality
+     - everyone reproducing everything is not the solution
+     - contribute the pool of reproducibility / benefit from the pool of reproducibility
+     - pick whom I trust
+    
+Holger: if the Russians, the Chinese and the US agree, the build output is probably fine
+    - cabal of enemies: PKI infrastructure
+    - Apple reproducing Microsoft's binaries / Debian rebuilding Fedora & Fedora rebuilding Debian
+
+Holger: distros reproducing is silver, users reproducing is gold
+    - enable anyone to verify that source and binary match up
+    - Michael's devil's advocate question: people at the RB summit are not users
+    - broad pool of people participating / it should be easy to be part of the pool / as easy as torrenting a file
+ 
+
+Marc: improve security by linking source and artifacts
+    - source by itself does not help if there is no link between artifact and source
+    - self-generating provenance
+
+Nicolas:
+    - want to make sure that multiple trusted builders agree about output
+
+Holger: we need tools for comparing rebuilder results
+
+Justin: we need a way to do revocations of signatures without revoking keys
+        - Martin: I think only using intuitionistic / constructive logic is sufficient
+        - What about saying "oops I produced a lot of wrong results by mistake"
+
+Justin: we need non-repudiation
+
+Justin: how do we figure out how much consensus is enough and surface that to the end-user?
+
+Michael: debugging failures to reproduce? #tooling
+
+kpcyrd: trust not collude
+       - Justin: explicit collusion that is surfaced is OK <- delegation
+       - Martin: freerider problem
+
+kpcyrd: consensus is subjective / it depends on who you choose to listen to
+
+## First summary
+
+* make it easy for people downstream to benefit / participate
+* tooling
+* have flexibility and ability to reason about roots of trust
+* trustless binary distribution
+* resilience through diversity / use fractal-shaped roots of trust to build resilience
+* reason across a distributed set of verifiers
+
+## Areas
+
+* Trust
+* User Benefits
+* Secure Resilience
+* Tooling
+
+## Interesting points of tension
+
+* consensus vs everyone picks freely who they trust
+* ... what else?
+
+
+Taxonomy:
+* builders
+* verifiers
+* user?
+* maintainer?
+* policies?
+
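Two points from the distributed verification notes above, "multiple trusted builders agree about output" and "consensus is subjective / it depends on who you choose to listen to", can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `agreed_hash`, the shape of the `reports` mapping (rebuilder name to reported artifact hash), and the simple count-based threshold are not taken from any existing rebuilder tooling.

```python
from collections import Counter
from typing import Dict, Optional, Set


def agreed_hash(
    reports: Dict[str, str],
    trusted: Set[str],
    threshold: int,
) -> Optional[str]:
    """Return the artifact hash that enough trusted rebuilders agree on.

    reports   maps a rebuilder name to the hash it reported.
    trusted   is the set of rebuilders this particular user listens to.
    threshold is how many of them must report the same hash.

    Returns None when no hash reaches the threshold. Choose a threshold
    above half of the trusted set if ties must be impossible.
    """
    votes = Counter(h for who, h in reports.items() if who in trusted)
    if not votes:
        return None
    digest, count = votes.most_common(1)[0]
    return digest if count >= threshold else None
```

With the same `reports`, two users with different `trusted` sets can reach different conclusions, which is the tension between global consensus and freely chosen roots of trust noted in the session.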


=====================================
_events/vienna2025/agenda/d1-languageecosystems.md
=====================================
@@ -0,0 +1,62 @@
+Markdowning by: Timo
+
+Language ecosystems
+
+## Takeaways
+- How to find the source?
+- How to find the build instructions?
+- How to make it easy for developers and rebuilders to make the packages reproducible?
+
+## Notes
+
+- Taking lessons learned from one ecosystem and apply to others
+- Java rebuilds / maven gradle and more
+- Python and npm rebuilds
+
+Some issues people have already encountered:
+- Finding the right commit
+    - There might be multiple commits across multiple repositories (multirepo)
+    - There might be multiple projects in a single repo (monorepo)
+- Finding the source material
+    - Intended fields for linking the source are not always used for the actual source
+- Finding the build instructions
+- attached signatures
+- "poor deployment hygiene", i.e. publishing irrelevant files like .DS_Store or .vscode directories
+- Split of internal and public repos, where you publish from an internal repo which may have slight differences
+- unreproducible automation of build processes
+- cache files
+- finding the correct build tool in ecosystems that have multiple build tools available
+- tooling that does not solve existing reproducibility problems, even if solutions are known, so the developers have to create their own implementations of these solutions
+- non-declarative build specifications (e.g. setup.py)
+- custom build scripts
+
+- Nondeterministic obfuscation?
+- Nondeterministic minification? Are they reproducible even if you know the exact build tool versions? Or are they generally non-deterministic?
+- Perhaps more general, optimization techniques for size or for performance, transformations like compilation, transpilation, ...
+- On-the-fly patching of Java versions during the build process; apparently Python sometimes does it as well.
+
+- Sometimes build processes create local commits, use this as the commit sha from which the artifact was produced, and then publish it, but never push the respective commit into the public VCS.
+
+- Lack of automation (or "weird" automation) creates opaque publishing processes.
+- Missing link between source and artifact.
+
+- What trust do we put into a successfully rebuilt project? Do we assume the repository is the correct repo for future versions as well?
+
+- Sometimes there is the same code in different locations, so it's hard to find what is the "real" source.
+
+- Developers often don't care about reproducibility so they don't give rebuilders a lot of hints on how to build their package.
+- Developers have no incentive to make reproducibility happen because they don't see any benefit.
+- Should reproducibility be enforced?
+- Perhaps automation, which may be a good way to facilitate more reproducible tooling, would be an incentive that makes things easier for developers.
+
+-> what would be a good incentive for devs to want to use reproducible builds?
+
+- big players might be able to make the process more reproducible, like people developing the build tools, so that developers don't have to care about it.
+
+- Should perhaps sources have a metadata field pointing to the canonical location of the artifact?
+
+- Sometimes the exact build tool version is important for a build, but at other times any version within some range of the build tool is fine, and the less specific you can be, the better you can scale up.
+- How would you identify which version is acceptable?
+
+- Perhaps the definition needs a bit of work so we can better differentiate between the environment, source and build instructions in order to make clear guidelines how to provide all of these three as a developer.
+


=====================================
_events/vienna2025/agenda/d1-localreproducibility.md
=====================================
@@ -0,0 +1,38 @@
+Local reproducibility and user perception
+
+* How can I check the provenance of the software on my own machine?
+* What does it even mean to show RB status to the user?
+* Arch has a checkbox
+* F-Droid's Reproducibility Status link
+  https://verification.f-droid.org/packages/org.fdroid.fdroid/
+
+
+1. How to give users choices based on RB?
+2. What does RB mean even?
+3. RB is impossible, you can only succeed in trying at the moment
+
+Example for 3: CPU instruction sets can change builds, newer CPUs might make the binaries differ
+
+* Users are probably most interested in whether the binary matches the source.
+
+* for most users, RB is really only part of the brand, and has no other meaning
+* Arch setup is that builds are flagged, then users can repro themselves locally
+
+* RB could be represented as an action rather than a status: run this build process, click to read the diff, etc.
+
+track source tarballs
+https://whatsrc.org/artifact/sha256:5dc926f8306473c33082fc4fdd3356207e5874f91c00c0d76125f26ce35bbe1b
+
+
+* limited orgs care about RB
+* limited number of people care about RB
+* when the general FOSS ecosystem is already RB enough, others will adopt because it's easy and necessary to maintain a trusted brand
+
+* lots of distros don't care about RB because it does not fit in with their business model
+
+* bringing in "trust" to the conversation kills the possibility to improve RB
+
+
+takeaways
+* comms for RB to users can be improved
+* definition for reproducibility can be clarified by letting distros define how they do it


=====================================
_events/vienna2025/agenda/d1-lowlevelverification.md
=====================================
@@ -0,0 +1,33 @@
+Low-level verification
+Notes
+
+Embedded systems typically require cross-compile. 
+
+Embedded systems have hosts, which usually change after their life-cycle (10-20 yrs). How to ensure the synchronization of host information before and after?
+
+What is the minimal reproducible set (e.g.: compiler flags / etc.)?
+
+What are the challenges for compression / size limitations?
+
+What writable FS images does RB need?
+
+How to make RB meaningful for embedded systems? Attestation?
+
+Shim: Microsoft signed a small boot-loader as the root of trust. 
+
+    Reduce shim size
+
+
+How far does UKI satisfy RB?
+
+    Kernel RB: depends on config. (commands / etc. need to be attached to repo)
+
+    Signing key problem
+
+    Load third-party user modules 
+
+
+    Device issue maybe matters maybe not..
+
+
+Trust compiling is hard


=====================================
_events/vienna2025/agenda/d1-rbtooling.md
=====================================
@@ -0,0 +1,64 @@
+State of RB tooling
+
+reprotest
+it builds a package twice and runs diffoscope
+TODO:
+teach about more build types
+make into a github/CI action
+ALSO:
+expand it to also do variance testing, such as: toolchains, time and place, flags and settings, single thread vs multi thread build
+
+diffoscope
+big files make it go OOM
+output can be hard to read
+which flags do we pass to diffoscope?
+TODO:
+output of all the flags, into an html output, with check boxes in that output to show/hide pieces mirroring the CLI flags
+
+trydiffoscope
+slower than running diffoscope locally
+TODO:
+print out the command line arguments used to produce the output
+
+mkosi & systemd-repart
+tiny differences between invocations
+reproducible inputs, non-reproducible output
+
+rebuilderd
+easy to set up
+setting up the backends is difficult, different for each backend
+TODO:
+documentation needs work
+one instance & one database & many distros = RULE THE WORLD
+WANT:
+integration across the tools
+store the build logs outside of the database
+stop the log truncation
+some pieces need optimization: queue, data access, storage
+distribute builds based on ahead-of-time resource allocation
+
+reproduce containers
+diffoci will show the differences between containers
+diffoci can also see and compare the layers of a container
+
+package management
+installing A then B vs installing B then A?
+does the date/time of installation matter? (post-install scripts, etc)
+ALSO:
+reproducible ISOs
+reproducibility of firmware for flashing
+check reproducibility status of packages when installing them
+only allow the installation of packages which are reproducible
+
+data sharing
+who?
+format?
+when?
+inputs?
+
+uncategorized
+check reproducibility as part of the upload process in distros or language repositories
+reproducible status of packages, checked recursively
+is the package still reproducible if cross-built
+strip-nondeterminism with more output, to show the nondeterminism in the package
+
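The "build twice under varied conditions, then compare" idea from the tooling notes above can be sketched in a few lines. This is not reprotest's actual interface: `differing_artifacts`, the `Build` callable type, and the two toy build functions are hypothetical, and a real comparison would hand differing files to diffoscope rather than only listing their names.

```python
import hashlib
from typing import Callable, Dict, List

# A build takes an environment (variable name -> value) and returns its
# output artifacts (file name -> content). Hypothetical interface.
Build = Callable[[Dict[str, str]], Dict[str, bytes]]


def digests(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Hash every artifact so that outputs can be compared cheaply."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}


def differing_artifacts(build: Build, env_a: Dict[str, str], env_b: Dict[str, str]) -> List[str]:
    """Run the build under two environments and list artifacts that differ.

    An artifact present in only one of the two runs also counts as differing.
    """
    first = digests(build(env_a))
    second = digests(build(env_b))
    names = first.keys() | second.keys()
    return sorted(n for n in names if first.get(n) != second.get(n))


def stable_build(env: Dict[str, str]) -> Dict[str, bytes]:
    """Toy build whose output ignores the environment."""
    return {"out.txt": b"hello\n"}


def leaky_build(env: Dict[str, str]) -> Dict[str, bytes]:
    """Toy build that embeds the time zone into its output."""
    return {"out.txt": ("built in " + env.get("TZ", "UTC") + "\n").encode()}
```

Calling `differing_artifacts(leaky_build, {"TZ": "UTC"}, {"TZ": "Europe/Vienna"})` flags `out.txt`, while `stable_build` yields an empty list. The variations mentioned in the notes (toolchains, time and place, flags, thread count) would each be another pair of environments to try.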



View it on GitLab: https://salsa.debian.org/reproducible-builds/reproducible-website/-/commit/553702a332f204b4226d80ab6b2caa389c34b353
