[Git][reproducible-builds/reproducible-website][master] vienna2025: Add d2 working groups notes
Robin Candau (@Antiz)
gitlab at salsa.debian.org
Thu Oct 30 13:16:20 UTC 2025
Robin Candau pushed to branch master at Reproducible Builds / reproducible-website
Commits:
2b787039 by Robin Candau at 2025-10-30T14:16:04+01:00
vienna2025: Add d2 working groups notes
- - - - -
11 changed files:
- _events/vienna2025/agenda.md
- + _events/vienna2025/agenda/d2-community.md
- + _events/vienna2025/agenda/d2-convince-others.md
- + _events/vienna2025/agenda/d2-distributedverification.md
- + _events/vienna2025/agenda/d2-distributedverification3.md
- + _events/vienna2025/agenda/d2-kernel-module-hashes.md
- + _events/vienna2025/agenda/d2-measuringrb.md
- + _events/vienna2025/agenda/d2-native-python-repro.md
- + _events/vienna2025/agenda/d2-rebuilderd.md
- + _events/vienna2025/agenda/d2-successfailure.md
- + _events/vienna2025/agenda/d2-tampering.md
Changes:
=====================================
_events/vienna2025/agenda.md
=====================================
@@ -23,7 +23,6 @@ The 2025 Reproducible Builds Summit will be called to order with a fast-paced ki
10.15 Mapping who is in the room
* Mapping who is in the room
-** NOTES: https://pad.riseup.net/p/rbsummmit2025-d1-mappingprojects-keep
11.00 Break
=====================================
_events/vienna2025/agenda/d2-community.md
=====================================
@@ -0,0 +1,149 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - RB Community
+event: vienna2025
+order: 26
+permalink: /events/vienna2025/agenda/d2-community/
+---
+
+RB Community
+============
+
+
+## high level
+
+- 1: let's build community.
+- 2: how do we carry work on year-round, and how do we get topics to keep advancing with less rediscovery?
+
+
+## discussion
+
+- things that can happen _outside of these summits_:
+ - working groups.
+ - repos where work and notes accumulate.
+
+- centers of gravity.
+
+- is there a repro builds github org?
+ - (yes actually -- empty and unused.)
+ - (wait who owns this?)
+ - the website lives on debian's "salsa" infrastructure.
+    - for which we're grateful for their volunteered services...
+    - but the ease of involvement... could be improved.
+ - will there be objections to this?
+ - ... in our small group? no.
+ - worth raising this but...
+    - seems like everyone values getting in the place that's easiest to access for the most people, and right now... github ranks pretty high for this.
+ - alternatives?
+ - tangled.sh ?
+ - gitlab ? (whose?)
+ - codeberg ?
+  - _the content is git, so we are not critically worried about being forced to move in the future._ Putting it where it's easily engageable _now_ is the priority.
+
+- there are a lot of things that could move into a repro-builds github (Or Whatever) org....?
+ - the website!
+ - EVERYONE WANTS THIS TO BE MORE ENGAGEABLE.
+ - there are other examples?
+ - rebuilderd?
+ - reprotest?
+ - diffoscope?
+ - strip_nondet?
+ - S_D_E spec?
+ - build path spec?
+    - ... it's less clear that these need to be radically more engageable in the same way that the website does.
+ - things that are RB specs? YES, let's move all of those.
+ - agreement that step 1, in any case, is the website. it's the most burning situation.
+
+- object level thing that's not git: dns.
+
+- does the community need to grow? is that really a goal?
+  - analogy: promoting RB is similar to telling people to wash their hands before preparing food. Does that need a large organization? Perhaps not.
+ - another way to grow is in increasing alignment instead of increasing numbers.
+
+- "creating places for alignment to happen"
+ - enabling "vote with your feet"
+ - (we have a lot of this within the summit! where do we make this happen outside of the summit, the rest of the year, for even more people?)
+
+- structure and governance...
+ - we would like a larger committee that's involved in the small set of things that do need governance.
+ - example of a thing that needs governance: there are some small updates desired to the CoC....
+ - we can invite comments on this (and we do that)
+ - but things could move more decisively if a steering committee was a clearer thing.
+
+- interesting study:
+ - "contributor ladder" is a concept developed by Drupal...
+ - it has many levels.
+ - it talks about how people on each rung should be helping reach down and pull more people up.
+  - it would be useful to ask people what they're finding challenging or frictional as they move up in such a legible structure.
+
+- there are some organizations that talk more explicitly about sustainability from a monetary perspective.
+ - doing work without sustainable resources is a big recipe for burnout.
+  - -> not addressing this doesn't make the problem go away.
+
+- CAN WE MAKE WORKING GROUPS? WHY NOT? CAN THEY MEET PERIODICALLY? WHY NOT?
+ - how do you make a working group?
+ - how do you list the working groups for discovery?
+ - -> probably this is just a website page
+ - list the...
+ - mailing list
+ - meeting time and link
+ - things we're really worried about:
+ - enough attendance at a regular cadence.
+      - -> this is a challenge even if we have plenty of engaged people, because most people have a multiple-loyalty situation: and the other binding (usually a distro team or similar) is usually stronger.
+ - platform exhaustion -- which venue is the minimum?
+ - mailing lists can be ... mailing lists.
+ - irc has had limited engagement when tried historically by this group.
+ - video platforms? (which?)
+    - note that **different platform types have different virtues** -- text-based mediums are great for agreeing, not so great for disagreeing or for thinktank-style idea generation. Video has different affordances. (Not saying either is better: just different.)
+ - things we want to *not* get stuck by:
+ - timezones are hard -- you can't pick an infinitely accessible one -- pick something and do it anyway.
+    - if there's enough demand for a time that favors a different timezone -- that's *great*: make *another one*.
+
+- alternative to continuous working groups: "Call for Participation"
+ - blast one of these to the mailing list!
+ - one-time request for time!
+ - focused on some topic space!
+ - this is an option for when a full working group would be a potentially tall ask.
+    - this doesn't require heavy construction in advance -- it just requires community familiarity with the idea that this is a way to ask for engagement.
+ - (arch mentions: we have done this! it works great!)
+ - (example: did this for onboarding package testers. Later: saw explosion of new accounts for package testers!)
+
+- working groups that might be good to have.
+ - RB website and core materials engageability working group.
+ - what it says on the tin.
+ - RB outreach working group.
+ - address issues like where we could make more in-person events (see below).
+ - Diffoscope PR review working group.
+ - help would be welcome!
+
+- getting more in-person events...?
+ - events *around other events* is a way to approach this while amortizing the costs that people would experience.
+ - more event types (beyond our summit of high-engagement people)!
+    - e.g. hosting an IRL AMA, perhaps at Fosdem or other events that are a good nexus for that.
+
+- do we want to create a more explicit decision process for general reuse?
+  - without this, the default is naive consensus -- which is actually bad: it can only proceed in the *absence of a no*, which is very difficult in large groups; and it also lacks a timeout where the decision is resolved... it's very problematic.
+ - can we do this for reuse?
+ - maybe.
+ - in practice? working groups will probably make their own decisions about this.
+ - (if this distills to reusable stuff over time -- great)
+
+summary
+-------
+
+As part of developing a more engageable community:
+- the website is moving to github.
+- we want to form a small working group to do this website migration work (after the summit).
+- this working group will be a live prototype of a "working group" structure, where we're going to have a small, focused group to work on this specific problem.
+- and we'll hope to derive some reusable patterns from that.
+
+We are interested in:
+- incubating that working group concept;
+- incubating *other* ways to reach out to each other for engagement -- for example a community convention of "Call for Participation" on the mailing list;
+- and we'd love to have more human engagement situations in other events, e.g. an RB AMA at Fosdem (and things like that).
+
+And in the recursive theme of making progress and keeping it rolling:
+This is stuff we will start.
+And treat as (community structure!) prototypes.
+And roll forward with.
+
=====================================
_events/vienna2025/agenda/d2-convince-others.md
=====================================
@@ -0,0 +1,87 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Convince others
+event: vienna2025
+order: 18
+permalink: /events/vienna2025/agenda/d2-convince-others/
+---
+
+Take on the problem of spreading the word and convincing the people who should be convinced that reproducible builds are important.
+
+
+# Participants (TODO: self-update full names :) ?):
+
+* Laj /Julien
+* Gabor
+* Alice
+* Andrey
+* Sasa
+* Aman
+* Jarek Potiuk
+* Michael Winser
+
+
+# Initial thoughts!
+
+Audience - and what pain we solve for those audiences without talking about reproducibility
+
+The roles that we can see in the ecosystem:
+
+* App builders - who create software: individual maintainers
+* Distros (composers that rebuild stuff) : Debian/ Suse/ F-Droid
+* Package repos: PyPI/ NPM / F-Droid (multiple roles)
+* Customers & end users: Corporates, SMBs, Individual users
+* Business owners: Proj Managers/Product Manager/ Biz types
+* Science computation on the data: Science community
+* Private and government security focused organizations that are investing into security: Alpha-Omega, Sovereign Tech Agency
+* Governments / regulators / standard bodies: EU Commission, US NSA
+* Developers, contributors, and maintainers of other projects using them - various "technical" users of the apps - other open source projects, OS Rebuild,
+
+* (!) malicious actors - we do not really need to include them
+
+Roles can be mixed. Conclusion: we need to be ready for a crisis to happen (inevitable) - it will happen and we should be ready.
+
+# What's next?
+
+* How can we drive the interest?
+* Did it happen with the XZ utils case? Yes - it did (Google people came to the Reproducible Builds Summit, for example) - but it is not sticky.
+* We need more of those - but in a controlled way.
+* We likely need a chaos-monkey kind of approach where we induce some crises in a more controlled way.
+* Is audit maybe a solution? Security is always at the bottom of the pile - certificates?
+* App builders have a lot of other problems to worry about - and we do not have good tooling
+
+# Why do we suck at marketing RB?
+
+* no turnkey tooling builders can use (*)
+* not clear build attestation status and lack of visibility
+* hard to understand outcomes and benefits (*)
+* no clear and established definition - and communication to different users
+* tampering as a worry is something that humans genuinely understand, and they might have implicit expectations about stuff "not being tampered with" (example: https://en.wikipedia.org/wiki/Chicago_Tylenol_murders, which resulted in drugs having seals that make it obvious when they have been tampered with).
+* Lack of connecting risk to revenues
+* Lack of terminology/language education - reproducibility is a fancy word but it does not have meaning (*)
+* maturity problem - we are very early stage and we do not have yet right marketing, promotion of what we are doing (*)
+
+# Proposed Strategy and tactics
+
+* We need to work on a more "commercial" or "business" way to promote what we want to promote - not necessarily in technical and precise terms, but in a way that can be easily conveyed to a wider audience
+
+* The "Software Tamper Protection League" might be a better term to use for example (*)
+
+* We need some understandable goals and language, and to drive business demand with "Sign up here to be tamper-proof with your software!!!"
+
+* We need to prioritise problems and actors - our goal is to get some people to drink the Kool-Aid and drive the solution (*)
+
+* The good solution is when things are prevented, but the problem is that this makes it difficult to "promote" it and have people drink the "Kool-Aid" (*)
+
+* Reduce the toil to become reproducible and normalize the concept. Productisation of the solution space is needed for scaled adoption
+
+* Do we have some good examples from the past for developers? There are good success stories: tools that automate onboarding from days to minutes; tooling to automate things at scale.
+
+* Do we have some good examples - from the past for Business and Commercial users?
+
+* How can we promote the "tamper-proof" approach better?
+
+* Introduce tooling that injects "Tampering checks" into the engineering pipelines - on all engineering levels and organisational boundaries - those should be "red-team" kind of audits - auditing the whole supply chain and making it "loud" and showing when the "fence" works - when we get to the crisis (which will happen eventually).
+
+We should not let the next good crisis go to waste - we should leverage the outputs of such work
+
=====================================
_events/vienna2025/agenda/d2-distributedverification.md
=====================================
@@ -0,0 +1,91 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Distributed verification II
+event: vienna2025
+order: 20
+permalink: /events/vienna2025/agenda/d2-distributedverification/
+---
+
+# Distributed verification II
+
+**Goal**
+Go through notes of yesterday and last year, and distill actionable ideas
+
+**Recap of notes from yesterday**
+
+ * What should a verification system look like?
+ * Should such a system need to form a consensus, or be configurable?
+ * If configurable, it needs sane defaults
+
+
+**Problem**
+You have dozens of rebuilders with different trust levels; they sometimes fail and sometimes behave maliciously. How can you trust them?
+
+Is this an open consensus problem?
+
+Yes, some specific questions are:
+
+
+ * Are the costs to set up rebuilders high?
+ * Should it be a bitcoin-like system?
+ * Should it be formal, like Certificate Authorities (CAs)?
+ * Should it be centralized?
+
+What you use, what you are, what you produce? This problem is solved with SLSA/in-toto attestations. Caveat: if someone wants to lie, they can lie.
+
+How can you replicate what's guaranteed in an attestation? Is the attestation self-descriptive enough to reproduce it?
+No, the attestation isn't; we need something else.
+
+How can you use it then to build consensus?
+You need to establish some scheme.
+
+Tools to index and search attestations:
+
+ * Guac
+ * Archivista
+
+in-toto attestations are not concerned with rebuilding, but with attesting.
+
+Ecosystems have different tools to allow rebuilds
+
+ * debian: buildinfo
+ * nix: declarative/self-descriptive
+
+
+How do we establish trust in rebuilders?
+
+ * web of trust (doesn't work)
+ * authoritative system (I trust you because I know you)
+ * CA model (formal body)
+
+
+Agree to ignore notes from last year. It is hard to recap them now. Will prepare them in a future session.
+
+Looking at notes from yesterday
+
+**Trust**
+
+"there's a global set of builders a subset of which can be trusted by a verifier"
+The statement misses delegation.
+Delegation: establish trust in curators, to curate list of rebuilders to trust?
+
+There should not be a restriction in how many levels deep you can delegate.
+
+**Resilience**
+
+Ensure resilience against
+
+
+ * outages
+ * disagreement
+ * corruption
+ * malicious rebuilders
+
+Diversity is a means to achieve resilience
+How many must agree?
+Depends on ecosystem.
+Threshold must be configurable
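+
+A minimal sketch of what a configurable-threshold check could look like (Python; the rebuilder names, the attestation shape, and the threshold value are all made up for illustration, not any existing tool's format):
+
+```python
+from collections import Counter
+
+def verify_threshold(attestations: dict, trusted: set, threshold: int) -> bool:
+    """Accept an artifact iff at least `threshold` *trusted* rebuilders
+    agree on a single output hash. `attestations` maps a rebuilder
+    identity to the artifact hash it claims to have reproduced."""
+    # Only count votes from rebuilders this verifier has chosen to trust.
+    votes = Counter(h for rb, h in attestations.items() if rb in trusted)
+    if not votes:
+        return False
+    _, count = votes.most_common(1)[0]
+    return count >= threshold
+
+# Hypothetical "2 out of 3" policy; rebuilder-c disagrees
+# (outage, corruption, or malice -- the check doesn't care which).
+attestations = {
+    "rebuilder-a": "sha256:abc...",
+    "rebuilder-b": "sha256:abc...",
+    "rebuilder-c": "sha256:def...",
+}
+print(verify_threshold(attestations, {"rebuilder-a", "rebuilder-b", "rebuilder-c"}, 2))  # True
+```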
+
+How to move forward?
+Capture more properties of the desired verification system and structure them.
+
=====================================
_events/vienna2025/agenda/d2-distributedverification3.md
=====================================
@@ -0,0 +1,85 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Distributed verification III
+event: vienna2025
+order: 23
+permalink: /events/vienna2025/agenda/d2-distributedverification3/
+---
+
+# Evaluation criteria
+
+## Must Have
+
+### Builds must be discoverable along with useful metadata
+
+### Must scale
+
+ * Reduce the build cost of reproducing everything by sharing over a pool
+ * Tooling and infra to make multi-party reproducibility verification scale
+ * Distributed reproducible monitoring #scale #infra
+
+### Must be resilient against
+
+ * outages
+ * disagreements
+ * corruption
+ * malicious rebuilder
+
+### Must be possible to delegate trust to another party or parties in a limited, controlled manner
+
+ * I want to trust the published artifact because someone trustworthy reproduced it
+ * Delegation implies recursion
+ * Understanding web of trust
+ * Users downloading software want to verify that the software has been built by multiple trusted people
+ * "5 out of 7" is possibly good enough
+ * Distributing / sharing trust
+ * Software updater should check that the available update has been built by multiple trusted builders
+ * I want to be able to change who I trust over time and reason about roots of trust
+ * I should be able to list who I trust to build my software
+ * Consensus is subjective
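+
+Since delegation implies recursion with no fixed depth, "who do I actually trust" can be resolved as a walk over delegation edges. A minimal sketch (all identities hypothetical):
+
+```python
+def resolve_trust(root: str, delegations: dict) -> set:
+    """Expand a root of trust through delegation edges, arbitrarily deep.
+    `delegations` maps an identity to the identities it vouches for;
+    cycles are harmless because each node is visited only once."""
+    trusted, stack = set(), [root]
+    while stack:
+        node = stack.pop()
+        if node in trusted:
+            continue
+        trusted.add(node)
+        stack.extend(delegations.get(node, ()))
+    return trusted
+
+# I trust a curator; the curator curates rebuilders; one delegates further.
+delegations = {
+    "me": {"curator-x"},
+    "curator-x": {"rebuilder-a", "rebuilder-b"},
+    "rebuilder-b": {"rebuilder-c"},
+}
+print(resolve_trust("me", delegations))
+```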
+
+### A rebuilder should be able to indicate what they built in a non-repudiable way
+
+ * Source and dependencies --> binary
+
+### We need to be able to identify a rebuilder securely such as with a public key
+
+ * Trust and trust management requires identity
+
+### UX is essential, especially for non-interested users
+
+ * Making RB actionable/useful to downstream consumers
+ * Making RB benefits easy to access for end users
+ * Trustless binary distribution
+
+## Extra nice to have
+
+### Doing a rebuild should be accessible
+
+ * Distros reproducing is silver, users reproducing is gold
+
+### There should not be properties that discourage institutional diversity in rebuilders.
+
+### Developer experience is important and developers shouldn't be expected to care about RB.
+
+ * The system should be invisible to them.
+ * Ideally, developers don't need to update their workflows (much)
+ * Developers shouldn't be surprised by RB results and the system behavior, aka. understandability
+ * As the developer of a lib I want to make sure I can do a simple fix.
+
+## Nice to have
+
+### Diverse rebuilder support is desirable
+
+ * Comparing rebuilder results
+ * RB creates resilience by having free choice of build system
+ * Diversity is a means to achieve resilience
+
+### Ideally there is a usable means to report irreproducibility, possibly implicitly
+### Ideally closed source / secrets in software should be supported
+
+## Optional
+
+### Support for diverse build inputs would be desirable
+### Ideally there is a way to verify a rebuilder did work
+
=====================================
_events/vienna2025/agenda/d2-kernel-module-hashes.md
=====================================
@@ -0,0 +1,28 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Kernel module hashes
+event: vienna2025
+order: 17
+permalink: /events/vienna2025/agenda/d2-kernel-module-hashes/
+---
+
+LWN article: https://lwn.net/Articles/1012946/
+
+v1: https://lore.kernel.org/all/20241225-module-hashes-v1-0-d710ce7a3fd1@weissschuh.net/
+
+v2: https://lwn.net/ml/all/20250120-module-hashes-v2-0-ba1184e27b7f@weissschuh.net/
+
+v3: https://lore.kernel.org/all/20250429-module-hashes-v3-0-00e9258def9e@weissschuh.net/
+→ IMA preparation patches discussed for separate merging, but this didn't happen
+
+1. makes the kernel package build generally easier --> reduce complexity
+2. safety: the signing key could leak if an attacker gains read-only access to the build environment --> reduce attack surface
+3. less entropy needed: no need to generate ephemeral key and signatures, so the build might be faster on builders without a good source of entropy
+
+ * If this is the last remaining bit of entropy in the kernel build process, that is a good argument to remove it so that builders can become completely deterministic.
+
+4. v4: Maybe we can optimize the load-time checking a bit further: instead of a linear search in the array of hashes, order the hashes by prefix during the kernel build and then do a trie-based search at load time - O(log n) instead of O(n) during load verification. (A sketch of the idea follows below.)
+5. Might make loading modules faster compared to signature verification because hashing of the module has to be done anyways during verification, and we no longer need the ECDSA/RSA signature verification.
+
+5000 modules × 256/8 bytes/module ≈ 160 kB of memory for the hash storage.
+Total storage requirements are not changed, but are moved from the individual modules to the kernel. Unfortunately this means that the memory is always loaded.
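+
+A user-space sketch of the sorted-hash lookup idea from point 4 (the actual series is kernel C; the module contents and hashes here are made up):
+
+```python
+import bisect
+import hashlib
+
+# Hypothetical build step: hash every module and sort the digests.
+module_blobs = [b"module-a contents", b"module-b contents", b"module-c contents"]
+allowed_hashes = sorted(hashlib.sha256(m).digest() for m in module_blobs)
+
+def module_load_allowed(blob: bytes) -> bool:
+    """O(log n) membership test against the sorted table, standing in for
+    the proposed in-kernel check that replaces signature verification."""
+    h = hashlib.sha256(blob).digest()
+    i = bisect.bisect_left(allowed_hashes, h)
+    return i < len(allowed_hashes) and allowed_hashes[i] == h
+
+print(module_load_allowed(b"module-a contents"))  # True
+print(module_load_allowed(b"tampered contents"))  # False
+```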
=====================================
_events/vienna2025/agenda/d2-measuringrb.md
=====================================
@@ -0,0 +1,134 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Measuring RB
+event: vienna2025
+order: 21
+permalink: /events/vienna2025/agenda/d2-measuringrb/
+---
+
+Measuring, Comparing, Organizing of reproducibility issues
+==========================================================
+
+
+## why are we here
+
+- prioritizing projects to improve
+ - sheer package count is fine but do all incremental steps matter equally? Probably not!
+ - download counts? can easily be misleading (one company can download their own package a million times via CI or etc).
+ - package impact by downstream uses?
+ - some ecosystems have stuff like debian popcomp
+ - different kinds of prioritization:
+ - rank by how difficult to fix?
+ - rank by how impactful to fix?
+
+- explaining
+ - understanding what basis we can have for comparison between ecosystems.
+ - it can be a problem for communicating our successes (and our todos and distance ahead) when definitions of reproduciblity don't have a clear rubrik.
+
+- taxonomy & categorization
+ - finding common problems... so we can find common solutions effectively.
+  - would be great to be able to look at some decision tree for what kind of problem you're having and see a flowchart for what a probable solution is.
+
+- comparability
+ - and for competition / incentivizing.
+ - for example, distros comparing to each other ... would be good so there is a collective incentive to improve!
+ - a public leaderboard would be cool -- right now we can't realistically compare things that clearly.
+
+## discussion
+
+- guidance information vs success criteria
+ - we think it is useful to have tools that try to determine what reproducibility failure reasons are.
+ - -> useful for deciding what to work on next
+ - -> useful for guessing what to do next
+ - break it down by factors
+ - (diffoscope already does some of this!)
+ - if it's by post-build analysis: this is heuristic -- meaning we have to be VERY careful how literally we take this. (e.g., don't!)
+ - (diffoscope is here.)
+  - if it's by controlled build environment variation, and a variation shifts between bit-identical and fail -- that's pretty good knowledge. (but this can be a bit expensive to run: and it definitely requires full automation of the build before trying this. see the sketch after this list.)
+ - (reprotest is here!)
+ - don't confuse guidance info with success!
+ - we DON'T want to get to a situation where an organization says something like: "we're 97% reproducible" but what they really mean is "100% of our packages are not bit-for-bit identical... but it's just timestamps, we swear! it's all low priority!".
+ - (see further notes below about how we feel that "severity" cannot be usefully evaluated!)
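+
+A minimal sketch of the controlled-variation approach (the build command, artifact path, and varied factors are placeholders; reprotest is the real tool for this):
+
+```python
+import hashlib
+import os
+import subprocess
+
+def build_and_hash(env_overrides: dict, build_cmd: list, artifact: str) -> str:
+    """Run one build under a varied environment and hash the output."""
+    env = dict(os.environ, **env_overrides)
+    subprocess.run(build_cmd, env=env, check=True)
+    with open(artifact, "rb") as f:
+        return hashlib.sha256(f.read()).hexdigest()
+
+# Flip one factor at a time: if a single variation moves the output away
+# from the baseline hash, that factor is implicated in the failure.
+variations = {
+    "baseline": {},
+    "timezone": {"TZ": "Pacific/Kiritimati"},
+    "locale": {"LC_ALL": "fr_FR.UTF-8"},
+}
+# results = {name: build_and_hash(env, ["./build.sh"], "out/pkg.tar")
+#            for name, env in variations.items()}
+```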
+
+- about comparability...
+ - is the buildinfo made before the first build? or is it just a log?
+    - we like it better if it's made first, because that more clearly captures the stored intention of the original builder.
+    - but maybe the log approach is fine... if it includes a hash (see following).
+ - is the buildinfo including a hash, or just package names and version names?
+    - -> big impact: without a hash, the "build environment" has to be considered to encompass all the services involved in resolving those names to real content!
+ - -> are ecosystems actually including this extended "build environment" management in their models?
+    - things a hash in the buildinfo saves us from (see the sketch after this list):
+ - availability!
+ - servers can disappear.
+ - some of this has very nearly happened recently...
+ - storage can also be trimmed...
+ - if you have a hash, you can get content from somewhere else.
+ - security!
+ - a compromise of the name resolution service becomes irrelevant if hashes are distributed with buildinfo.
+ - safety against mere accidents!
+ - and this is very real and practical: time since this last happened? (a couple days?) (see event on mailing list where a package name accidentally mapped to different content in two different builder staging environments...)
+ - also good to let distros document what parts of controlled variation they _don't_ care about.
+ - e.g. a distro can say they don't care about cross-platform artifact convergence.
+ - e.g. some distros have decided they expect builds to converge on reproducible artifacts *even if* the build path is varied... and other ecosystems have decided they don't care.
+ - controlled variation in general can be seen as _additional_ to reproducibility:
+ - if something can be reproduced with *less specific* build environment: we love it!
+ - if something can be reproduced with fairly specific build environment: as long as that's clearly stated as part of the build environment... the defn of reproducibility we have today says yep, that's repro.
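+
+A minimal sketch of the "hash in the buildinfo" point (the field names and package names are hypothetical, not any ecosystem's actual format):
+
+```python
+import hashlib
+import json
+
+def make_buildinfo(source: str, deps: dict) -> str:
+    """Pin dependency *content* by hash, not just name and version, so
+    name-resolution services and mirrors drop out of the trust equation."""
+    record = {
+        "source": source,
+        "dependencies": {
+            name: "sha256:" + hashlib.sha256(blob).hexdigest()
+            for name, blob in deps.items()
+        },
+    }
+    return json.dumps(record, sort_keys=True, indent=2)
+
+print(make_buildinfo("hello-1.0", {"libfoo-2.3": b"...library bytes..."}))
+```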
+
+- the value of comparability can... vary.
+ - for programming language ecosystems that have only one major build tool:
+ - the value is a little more limited.
+ - cladistic tree stuff can still help them, if they want to improve.
+ - but we can't manifest competition out of thin air :)
+ - for ecosystems with multiple build systems (python; linux distros vs each other; etc):
+ - comparable metrics lets users choose systems based on how much they respect that system's emphasis on reproducibility.
+ - comparable metrics lets distros meaningfully compare their successes.
+
+- (discussion of wish to revisit the "achieve deterministic builds" page on the website)
+ - there's a list of issues there -- great
+ - it's not very categorized -- could be improved
+ - some things are problem descriptions, some things are solutions -- could be improved
+ - possible inputs:
+ - paper by Goswami (study on npm) has some cladistics that might be useful
+ - "SoK: Towards Reproducibility of Software Packages in Scripting Language Ecosystems"
+ - check out Fig 1.
+ - Timo's research
+ - debian has a bunch of bug categories (unclear how much this gets specific to debian's toolchain, but still probably lots of good reference)
+ - ismypackagereproducibleyet (bernhard effort)
+ - more?
+
+- places that comparisons haven't done us so great...
+  - some distros are including tons of packages that... have ancient origins, from before the goal was engaged with! So this gives them "worse" scores -- in a way most would agree is not meaningful.
+
+- package prioritization: sometimes an ecosystem has made a pick *for us*.
+ - e.g. Arch has an overall repro stat... but *also* they have distributed a container image that is *100%* repro, which includes a subset of packages they call a reasonable core set.
+ - some ecosystems have concepts like "build-essential" (by some name or other).
+ - (is this interesting? Maybe.)
+ - (for the purpose of choosing what to prioritize next? limited use.)
+    - (doesn't seem to pop up in language package manager ecosystems (what's "core" is typically an even less clear question there).)
+
+- rubric vs tagging...?
+  - the categories of repro failures that we understand will refine over time...!
+  - so: it may be better to use tags for understood problems with a package... vs using a rubric about what's successful... because the latter won't be revisited as much or as usefully.
+
+- would it be useful to have a "CVE-like" system for repro failure reasons?
+ - a coordinated, public, shared resource.
+ - "CWE" -- common weakness enumeration.
+ - several attributes could be useful:
+ - describing *what* packages have known issues.
+ - tagging what kinds of known problems they have -> can hint towards remediations an ecosystem could apply to that package.
+ - possibly even known exact remediations could be shared.
+ - (more?)
+ - universal agreement in the session: do *not* try to invent a score for "severity".
+ - -> this depends on context of usage, so getting prescriptive about it makes no sense.
+    - (and nobody likes what happened in the CVSS system for this -- numbers are invented and vibe-based.)
+ - very very hard to say even something like "a timestamp variation could never be exploited by an attacker".
+ - -> several examples in the wild where very small info leaks are used to trigger larger more subtle pieces of malicious logic.
+ - there are examples of code changes that are 1 character in source, and *1 bit* in binary output... that are the difference between an ssh server giving you remote root, or not. So the *size* of any diff is clearly inadmissible as a severity heuristic: 1 bit can be everything.
+
+- some package ecosystems have phases of their distribution pipeline that aren't reproducible... even when the contents ultimately *are*.
+ - we're kind of okay with this -- provided that it's clear, and _there exists_ a clearly measurable point.
+ - e.g. signatures tend to cause non-reproducibility in practical ways -- but if the pipeline *had* a phase where there's a reproducible artifact, that's still okay.
+ - e.g. distributing stuff with a gzip wrapping... and that's not considered in the reproducibility -- probably not a problem, as long as the artifact inside was observed.
+ - critical check: the unreproducible part (e.g. gzip header or etc) *must not be **visible*** by the time the package is installed or used.
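+
+A tiny runnable illustration of the gzip example: the wrapped bytes differ (the gzip header embeds an mtime), while the inner artifact stays bit-identical:
+
+```python
+import gzip
+import io
+import time
+
+payload = b"identical package contents"
+
+def wrap(mtime: float) -> bytes:
+    buf = io.BytesIO()
+    # gzip stores an mtime in its header -- a wrapper-level variation.
+    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as f:
+        f.write(payload)
+    return buf.getvalue()
+
+a, b = wrap(0), wrap(time.time())
+print(a == b)  # False: the wrappers differ
+print(gzip.decompress(a) == gzip.decompress(b))  # True: the artifact inside matches
+```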
+
+
=====================================
_events/vienna2025/agenda/d2-native-python-repro.md
=====================================
@@ -0,0 +1,35 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Native python repro
+event: vienna2025
+order: 22
+permalink: /events/vienna2025/agenda/d2-native-python-repro/
+---
+
+# native: python code with compiled extensions and/or non-python processed assets
+
+Backgrounds: the oss-rebuild project, C code compiled into platform-dependent wheels, the pypi registry not having enough traceability, experience with the reproducibility of 700 dependencies of Airflow, whether it is necessary to rebuild wheels for a custom build of Python, the sysadmin perspective of not knowing the provenance of installed software, Fedora packaging experience.
+
+Stuff causing reproducibility problems:
+ - bytecode files – Fedora's add-determinism tooling
+ - shared libraries, e.g. libxml, expected to be preinstalled at deployed systems
+ - postgres client driver, libpq
+
+Is python metadata a separate problem from build reproducibility?
+Metadata in pyproject.toml does not contain enough information to describe all the details needed to rebuild pytorch reproducibly.
+conda is rebuilding packages for different architectures.
+
+To reproduce pypi wheels, maybe build on five popular distros and compare outputs?
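+
+A minimal sketch of what that comparison could look like: per-file digests inside each wheel, so a mismatch can be localized to, say, the compiled extension (the paths and distro names are hypothetical):
+
+```python
+import hashlib
+import zipfile
+
+def wheel_digests(path: str) -> dict:
+    """Per-file sha256 digests of a wheel's contents (a wheel is a zip)."""
+    with zipfile.ZipFile(path) as zf:
+        return {name: hashlib.sha256(zf.read(name)).hexdigest()
+                for name in sorted(zf.namelist())}
+
+# Hypothetical: the same sdist built on several distros.
+# builds = {d: wheel_digests(f"out-{d}/pkg-1.0-cp312-linux_x86_64.whl")
+#           for d in ("debian", "fedora", "alpine", "ubuntu", "arch")}
+# ref = builds["debian"]
+# diffs = {f for dig in builds.values() for f, h in dig.items() if ref.get(f) != h}
+```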
+
+pypi is moving from gpg signatures to trusted publishing. This implies building in public cloud infra, so the environment is known. Can we describe this in metadata if only some common build environments are used?
+
+Trusted publishing is about publishing, not build reproducibility.
+
+pyx – a registry of stuff being rebuilt when appropriate. Reproducibility is easier to achieve within that ecosystem. Google has some similar product.
+
+Should pypi do rebuilds of packages? The resource requirements would be huge. No interest at this point.
+
+pypi sdists might not match the binary wheels. No checking in place, or even requirements. We may need source reproducibility before build reproducibility. Pypi should do a better job of tracking source provenance (a structured field).
+
+On pypi, the top 4000 packages → 98% of downloads.
+
=====================================
_events/vienna2025/agenda/d2-rebuilderd.md
=====================================
@@ -0,0 +1,59 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Rebuilderd
+event: vienna2025
+order: 19
+permalink: /events/vienna2025/agenda/d2-rebuilderd/
+---
+
+rebuilderd
+
+history of rebuilderd - started at 2019 summit
+architecture of rebuilderd - past and present
+ is fedora support in main? not merged yet
+alpine? two problems - they don't document their build env, no archive/snapshots of packages
+ kp considered running their own archive, but was not feasible
+arch mixes archive.org (dependencies on their own servers, redirects to archive.org for the rest)
+fedora's build and rebuild pipelines differ
+no opensuse support for rebuilderd
+how is the source organized? how are new distros added?
+ monolithic repo, multiple binaries
+ rebuildctl connects to rebuilderd, rebuilderd dispatches to rebuilderd-worker
+ {debian,arch}-repro-status to get the state of your installed system
+ arch has a tool to enforce update policies (which rebuilders, how many, etc)
+ written in rust
+ which distros/releases/etc you rebuild are configured in a TOML file
+goals:
+ list problems with current features
+ blockers for pull 184?
+ new tools? policy enforcers, etc
+ system rebuilder vs entire distro rebuilder?
+ scheduling improvements for rebuilderd
+ postgres database
+ we don't keep artifacts right now, but maybe a useful option for BAD?
+delays of builds?
+ 404'ing buildinfo?
+ subcode for FAIL?
+ upload buildinfo before artifact upload? (short answer no, reason is debian politics)
+index of buildinfo files? yes, on debian. has its issues
+-security doesn't get buildinfo files until point releases (we want to highlight the issue)
+pulling data for britney (debian testing/unstable migration and comparison, large payloads)
+individual API keys with permissions/claims?
+BAD package rescheduling? cutoff count/time vs only retry for FAIL?
+option for variance testing / flaky rebuilds?
+resource allocation / "chunky" builds vs "lean" builds, per-package
+test suite failures (default to off?)
+worker tags
+more worker metadata?
+auto-assigning tags to packages? downprioritized
+auto-policies for package size/tarball size/build resource usage? downprioritized
+web frontends? more serverside compute, less frontend?
+ arch frontend - doesn't handle many packages well
+ debian frontend - mostly static generation
+ jarl's experimental frontend - not ready for use, needs updates to v1 API
+openapi - not source of truth, representation of what the code actually does
+https://vulns.xyz/2021/10/rebuilderd-v0.15.0/ has nice notes and explanations - turn into docs?
+
+wrap-up capture
+blocking packages by name and version?
+
=====================================
_events/vienna2025/agenda/d2-successfailure.md
=====================================
@@ -0,0 +1,12 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Success and failures
+event: vienna2025
+order: 24
+permalink: /events/vienna2025/agenda/d2-successfailure/
+---
+
+Standardizing and sharing success and failure
+
+Notes
+
=====================================
_events/vienna2025/agenda/d2-tampering.md
=====================================
@@ -0,0 +1,57 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Attacking and tampering with builds
+event: vienna2025
+order: 25
+permalink: /events/vienna2025/agenda/d2-tampering/
+---
+
+Attacking and tampering with builds
+Notes
+What does tampering even look like?
+
+Collection of possible "backstab"? attacks in 2 papers william mentioned (TODO)
+(One from SAP WG)
+
+
+What are the attacks?
+* Lying about what you are building from (source repo url)
+* Manipulation of build hooks, etc
+
+
+Could reproducible builds have prevented xz and similar attacks?
+
+Argument that reproducible builds can only help if your actual input can be trusted.
+In xz's case, the official dist tarball was malicious to begin with.
+
+BUT Debian's current model would allow a maintainer to manipulate the inputs beforehand ->
+Debian's developers are fully trusted anyway.
+
+Reproducible builds can only protect against tampering with the binary artifacts, not against
+tampering with the input.
+
+But it can also protect against "tampering with the build process", such as the build server building from a different
+source than it claims.
+
+Not all projects have a consistent definition of what the canonical source of a package is (e.g. vcs repo vs dist tarball)
+
+Debian is about to add a new gate for package migrations between unstable and testing: checking that packages that were known
+to be reproducible in the past are still reproducible.
+
+Practical problem for using reproducibility checks to raise suspicion of tampering: there are too many packages that
+are not reliably reproducible in practice.
+That includes the kernel, gcc, etc.
+
+People in the industry seem to view provenance attestation and reproducible builds as alternatives to each other, often preferring the former
+as that can also help in forensics.
+
+Short recap of the discussion, followed by a short reminder of the original goal: ensuring that tampering with the build env in various ways would
+actually sound an alarm.
+
+One possible approach would be tooling to set up a copy of (Debians) build infrastructure with "fault injection".
+Discussion about how rebuilderd might be able to do this.
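+
+A toy sketch of the fault-injection idea (the "build" here is a stand-in function, not real infrastructure; the point is that a tampered input must trip the comparison):
+
+```python
+import hashlib
+
+def rebuild(source: bytes, tamper: bytes = b"") -> bytes:
+    """Stand-in for a real build: deterministic unless a fault is injected
+    (e.g. a swapped dependency or a manipulated fetch location)."""
+    return b"built:" + source + tamper
+
+def alarm_fires(reference: bytes, rebuilt: bytes) -> bool:
+    return hashlib.sha256(reference).digest() != hashlib.sha256(rebuilt).digest()
+
+src = b"package source"
+reference = rebuild(src)
+assert alarm_fires(reference, rebuild(src, tamper=b"\x00evil"))  # red-team run: alarm
+assert not alarm_fires(reference, rebuild(src))                  # honest rebuild: quiet
+print("fault-injection harness behaves as expected")
+```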
+
+Recap:
+* Verification of source -> dist artifacts transformation
+* Tampering with build inputs, such as manipulating the locations where the build server fetches dependencies from, and checking whether alarms fire
+* Discussed ideas on what to sound the alarm for exactly. Probably only "build as such is successful, package was reproducible in the past, but isn't anymore"
View it on GitLab: https://salsa.debian.org/reproducible-builds/reproducible-website/-/commit/2b78703953bbdfb60b1d3cb7392560d83594bc72
--
View it on GitLab: https://salsa.debian.org/reproducible-builds/reproducible-website/-/commit/2b78703953bbdfb60b1d3cb7392560d83594bc72
You're receiving this email because of your account on salsa.debian.org.