[Git][reproducible-builds/reproducible-website][master] hamburg2023: day 1
Arnout Engelen (@raboof-guest)
gitlab at salsa.debian.org
Thu Nov 2 16:15:37 UTC 2023
Arnout Engelen pushed to branch master at Reproducible Builds / reproducible-website
Commits:
f28ef41e by Arnout Engelen at 2023-11-02T17:14:53+01:00
hamburg2023: day 1
- - - - -
10 changed files:
- _events/hamburg2023/agenda.md
- + _events/hamburg2023/big-picture.md
- + _events/hamburg2023/definitions.md
- + _events/hamburg2023/infra.md
- + _events/hamburg2023/language-specific.md
- + _events/hamburg2023/lists.md
- + _events/hamburg2023/projects.md
- + _events/hamburg2023/snapshot-service.md
- + _events/hamburg2023/success-stories.md
- + _events/hamburg2023/users.md
Changes:
=====================================
_events/hamburg2023/agenda.md
=====================================
@@ -30,19 +30,13 @@ Day 1 - Tuesday, October 31
* [minimal installation ISO](https://discourse.nixos.org/t/nixos-reproducible-builds-minimal-installation-iso-successfully-independently-rebuilt/34756)
* 11.15 Break
-* 11.00 Mapping the Big Picture
-TODO links
+* 11.00 [Mapping the Big Picture]({{ "/events/hamburg2023/big-picture/" | relative_url }})
* 13.00 Lunch
* 14.00 Collaborative Working Sessions
-TODO needs the pads as .md
- * Towards a snapshot service
- https://pad.riseup.net/p/rbsummmit2023-d1-snapshots-keep
- * Understanding user-facing needs and personas
- https://pad.riseup.net/p/rbsummmit2023-d1-userfacing-keep
- * Language-specific package managers
- https://pad.riseup.net/p/rbsummmit2023-d1-languagepackages-keep
- * Defining our definitions
- https://pad.riseup.net/p/rbsummmit2023-d1-deftinitions-keep
+ * [Towards a snapshot service]({{ "/events/hamburg2023/snapshot-service/" | relative_url }})
+ * [Understanding user-facing needs and personas]({{ "/events/hamburg2023/users/" | relative_url }})
+ * [Language-specific package managers]({{ "/events/hamburg2023/language-specific/" | relative_url }})
+ * [Defining our definitions]({{ "/events/hamburg2023/definitions/" | relative_url }})
* 15.15 Break
* 15.30 Collaborative Working Sessions/Hack Time
* 16.30 Closing Circle
=====================================
_events/hamburg2023/big-picture.md
=====================================
@@ -0,0 +1,28 @@
+---
+layout: event_detail
+title: Mapping the Big Picture
+event: hamburg2023
+order: 23
+permalink: /events/hamburg2023/big-picture/
+---
+
+Building on the mappings we did at the 2022 Reproducible Builds Summit, the group will use this time to take stock of where things stand for Reproducible Builds across a range of contexts, as of the Summit. We'll identify success stories, exemplars and case studies to be celebrated and amplified, while also mapping challenges, needs and unsolved problems.
+
+Topics, issues and ideas that surface during this session will inform how we structure the rest of the agenda.
+
+[Success stories]({{ "/events/hamburg2023/success-stories/" | relative_url }})
+* Real world success stories: What we know works
+* Real world success stories we need or are searching for
+
+[Projects]({{ "/events/hamburg2023/projects/" | relative_url }})
+* Projects practicing reproducibility
+* What projects/platforms/libraries do we *want*/need to be reproducible?
+
+[Mapping projects infra]({{ "/events/hamburg2023/infra/" | relative_url }})
+* Missing RB infrastructure we need to create
+
+[Mapping lists]({{ "/events/hamburg2023/lists/" | relative_url }})
+* Problems we need to discuss/solve
+* Other topics we need to discuss
+* Other lists we need to make
+* Other projects/people we should get to the next Summit
=====================================
_events/hamburg2023/definitions.md
=====================================
@@ -0,0 +1,60 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Defining our definitions
+event: hamburg2023
+order: 33
+permalink: /events/hamburg2023/definitions/
+---
+
+We feel that definitions of terms are important to coordinate our work across projects and to be able to communicate both our successes and the work still remaining to be done.
+
+The current Reproducible Builds effort has two commonly cited definitions of "reproducible" -- one given on the reproducible-builds.org website, and another (shorter) one which is seen on the group's T-shirts. But perhaps we need more; and perhaps it is time to revisit those and see if they still serve.
+
+Consensus:
+- Definitions are important
+- We only have one (relatively) clear definition -- "reproducible" -- but maybe we need more definitions, or some concept of "levels".
+- The definition we have is evidently *not* clear enough and may have other problems -- evidenced by announcements made by various projects and distributions which recurrently report "X% reproducible", wherein:
+ - percentages do not appear to be meaningfully comparable across distros
+ - the percentages reported by projects vary over time (when the exact definition changes to be more or less strict, or something *not covered by their previous practical definition changes*)
+ - it appears that no systems are actually approaching "100%".
+
+Progress:
+
+Producing new definitions proves difficult.
+
+Brainstorming: potentially relevant terms and concepts mentioned included:
+- diverse compilation
+- environmental randomization
+- insignificant environment bits
+- "once" "I" reproduced it (example of a weak definition that we often see used in practice!)
+- bit-for-bit reproducibility (included in current definition -- we ratify that we still like this because it is specific and clear)
+- late-discovered un-reproducibility (an unavoidable phenomenon that causes percentages to backslide)
+- circumstantial reproducibility
+- independent reproducibility
+- should we consider different Levels for Outcome Equality?
+- should we consider different Levels for Input Variation?
+- "only 100% reproducibility is useful" (several people agree with this, while acknowledging the irony that no project has attained it)
+- deterministic
+- spurious vs tampering vs unreproducible -- degrees (and reasons) for unreproducibility events
+- "transparently reproducible" (vs "blackbox"?)
+- reliable reproducibility
+- several notes contain drafts of functions...
+ - one contains "f(S)=B" -- meaning: a function consumes source and produces a binary
+ - a later note contains "f(S,SE,I)=As" -- meaning: a function consumes source, source environment, (?unknown?), and produces Artifacts (perhaps multiple).
+- Draft of levels?
+ - Level 0: unreproducible
+ - Level 1: Build at least twice with matching initial conditions, on the same machine, by the same person
+ - Level 2: Level 1 plus at least one build varying "X" things ("X" not specified)
+
+Observations, following brainstorming:
+
+- As the discussion oriented around function sketches continued, things started with one parameter, and then people tended to want to factor out more and more parameters.
+ - The distinguishing trait for what got factored out tended to be roughly "which things are difficult to change".
+- Participants wanted to steer the world by changing the definitions -- in two very different directions:
+ - Some participants specifically identified wanting to make the definition more concrete in ways that would encourage readers to pick narrower, more attainable smaller steps towards the goal of reproducibility.
+ - Other participants wished to make the definition as broad and aspirational as possible (for example immediately encouraging "diverse" compilation, instead of merely repeatable setup and verification of deterministic steps from identical setup conditions).
+- In this session, we were unable to immediately identify clear "levels".
+ - The general idea seems to be that higher levels of reproducibility should involve more variation injection...
+ - ... but there are many different potential specific axes for this,
+ - ... and there is no clear ordering in which the different classes of variation could be said to matter more than others (so, ordinal "levels" seem difficult to map to this).
+
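The draft levels and the "f(S,SE,I)=As" function notes in definitions.md can be made concrete with a small sketch. This is only an illustration under stated assumptions: `draft_level` and the three toy build functions are hypothetical names invented here, the environment is reduced to a plain dictionary, and "varying X things" is modelled as a list of alternative environments, since the notes leave "X" unspecified.

```python
import hashlib
import os


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest used for the bit-for-bit comparison."""
    return hashlib.sha256(artifact).hexdigest()


def draft_level(build, source, base_env, varied_envs):
    """Classify a build function against the draft levels (hypothetical helper).

    `build(source, env)` stands in for f(S, SE, ...) and returns artifact bytes.
    """
    first = digest(build(source, base_env))
    second = digest(build(source, base_env))
    if first != second:
        return 0  # Level 0: differs even with matching initial conditions
    if varied_envs and all(
        digest(build(source, env)) == first for env in varied_envs
    ):
        return 2  # Level 2: also stable under the variations that were tried
    return 1  # Level 1: repeatable only under matching conditions


# Toy build functions, invented purely for illustration.
def clean_build(source, env):
    return source.upper()


def path_leaking_build(source, env):
    # Embeds the build path, a common source of unreproducibility.
    return source + env["build_path"].encode()


def random_build(source, env):
    # Embeds random bytes, so no two builds match.
    return source + os.urandom(8)
```

The sketch also shows why ordinal levels were hard to pin down in the session: the result of `draft_level` depends entirely on which variations are put into `varied_envs`, and nothing in the notes ranks one class of variation above another.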
=====================================
_events/hamburg2023/infra.md
=====================================
@@ -0,0 +1,79 @@
+---
+layout: event_detail
+title: Mapping the Big Picture - Mapping projects infra
+event: hamburg2023
+order: 26
+permalink: /events/hamburg2023/infra/
+---
+
+* Projects practicing reproducibility
+ * Arch Linux
+ * Distrust Toolchain
+ * Buildroot
+ * Coreboot
+ * GNU Guix
+ * NixOS
+ * Warpforge
+ * ElectroBSD
+ * OpenWRT
+ * Fedora Linux
+ * OpenSUSE
+ * F-Droid
+ * Java jar Archives
+ * Debian
+ * TOR
+ * TAILS
+ * so toolchain
+ * Apache Maven
+ * Qubes OS
+ * Scala+sbt
+* What projects/platforms/libraries do we *want*/need to be reproducible?
+ * Maven (artifacts without sources)
+ * NPM
+ * Rust crates
+ * PyPI
+ * Docker Directory Timestamps
+ * ElectroBSD, FreeBSD Ports/Packages
+ * Binutils
+ * Qt
+ * Python sphinx
+ * ar (static .a libraries)
+ * Gradle
+ * DPkg database
+ * PureOS/Mobian
+ * R-B in Ubuntu
+ * Compilers for embedded systems
+ * Aurix (Infineon)
+ * ARM/KEIL
+ * GCC (its own build and the binaries it builds)
+ * Python 3.9
+ * Filesystems
+ * Clojure
+ * Build Tools
+ * Lein
+ * Flatpak
+ * Docker Images
+* Missing RB infrastructure we need to create
+ * [The N commandments of reproducible builds]({{ "/events/hamburg2023/rb-commandments/" | relative_url }})
+ * Alpine pkg archive
+ * Recording Diversity
+ * Debian debuginfod.debian.net should also provide sources (stripped paths seems to be part of the problem)
+ * Rebuild infra for custom projects
+ * Standard for build location like we have for SOURCE_DATE_EPOCH for time
+ * Debian Snapshot server
+ * PyPI Repository
+ * Firmware hashes
+ * Cross target reproducibility
+ * Guix QA testing reproducibility
+ * Shared tooling for reporting
+ * Sharing + reporting on build results
+ * Comparing reproducibility stats Guix packages with other projects
+ * Crowd sourced reproducibility status information
+ * Diffoscope but I can install it (fewer deps)
+ * Merkle tree of upstream source releases
+ * Reproducible vs PGO
+ * Build infrastructure for ElectroBSD
+ * Standard BUILDINFO file format
+ * GCC 4.7.4 bootstrapped on more arches for bootstrappable builds
+
+https://salsa.debian.org/reproducible-builds/reproducible-website/-/merge_requests/105
=====================================
_events/hamburg2023/language-specific.md
=====================================
@@ -0,0 +1,34 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Language-specific package managers
+event: hamburg2023
+order: 32
+permalink: /events/hamburg2023/language-specific/
+---
+
+* Packaging, source or binary? Python and Rust (crates) support both
+* Crates have immutable tags, git usually does not
+* Source provenance is important but usually hard to get
+* Score card can help
+* GOSST (Google) scans packages and tries to rebuild their _content_ with
+ some success, results are not yet published
+ * If results were published, could be used to add badges to the
+ packages in the repositories that the package was rebuilt and
+ verified by a third party builder
+ * Could be used for cli integration to only allow install of
+ packages being rebuilt/verified by a third party
+ * Compare here is the binary/source artifact, not all metadata
+ * Makes it easier to adopt as maintainers do not have to change
+ all their CI/CD workflows
+* Discussed hosted vs local builders and trustworthiness: if the build
+ is being reproduced, both hosted and local builders can be trusted
+ * Having developers managing key materials can still be hard
+* Can we tie Scorecard data into the package registry?
+* Can we have workflows that trigger a rebuild on a release, and gate
+ the publish step with a verified rebuild?
+* First action point:
+ * Have third-party builders rebuild packages, similar to what Herve
+ is doing for Maven Central?
+
+
+
=====================================
_events/hamburg2023/lists.md
=====================================
@@ -0,0 +1,70 @@
+---
+layout: event_detail
+title: Mapping the Big Picture - Mapping lists
+event: hamburg2023
+order: 27
+permalink: /events/hamburg2023/lists/
+---
+
+* Problems we need to discuss/solve
+ * NetBSD: LTO test in tests.tgz
+ * NetBSD: "-O bigdir" breaks it completely :(
+ * Reproducible Filesystem images
+ * Users caring
+ * Easy to use/understand tools for end-users
+ * Generated sources from graphic presentation
+ * comparability of graphic sources
+ * snapshots.debian.org for OpenWRT package source code
+ * Getting all sources eg. `go mod download`
+ * cross-compilation (even a different CPU feature is)
+ * Further improving knowledge sharing between distros
+ * Reproduce & challenge
+ * Diffoscope to (opt-in) ignore embedded signatures
+ * missing archive of build dependencies
+ * Github/Gitlab patches (git hash abbreviation) changing in length
+ * "Provenance" (Don't embed!)
+ * Build Artifact Retention
+ * Avoid fragmentation in signing formats PGP JWT WAC VC ZWK SIGSTORE NOTARY
+ * Reproducible profile-guided-optimization
+ * Undocumented build environments
+* Other topics we need to discuss
+ * t-shirts and other swag
+ * Binary transparency
+ * Too many dependencies...?
+ * Bootstrappability
+ * What work can be Distro-Agnostic?
+ * Potential blog posts for reproducible-builds.org
+ * diffoscope improvements
+ * index of binaries of a distro, keyed by hash (i.e. map binary -> src)
+* Other lists we need to make
+ * "sister" projects of reproducibility
+ * SLSA
+ * Bootstrappable Builds
+ * R-B hackathon organize
+ * List of our RB-related tooling
+ * List of Existing RB Infrastructure
+ * List of Reasons for investing in R-B
+ * Similar to buy-in page
+* [Other projects/people we should get to the next Summit]({{ "/events/hamburg2023/projects/" | relative_url }})
+
+# Other Projects / People to invite to next summit
+
+* Nuget Gallery
+* Language Package managers (all of them!)
+* Language registries (repositories) managers
+* Martin Monperrus Professor @ KTH Royal Institute of Technology
+* GHC devs = Haskell
+* "python people" (pypi pip...)
+
+* Cargo
+* chainguard (wolfi)
+* go toolchain team
+* Alpine & postmarket OS
+* Software heritage
+* QEMU
+* Red Hat
+* Google Android team
+* iOS
+* Yocto
+* UEFI-maybe on Arm/RISC-V
+
=====================================
_events/hamburg2023/projects.md
=====================================
@@ -0,0 +1,30 @@
+---
+layout: event_detail
+title: Mapping the Big Picture - Projects
+event: hamburg2023
+order: 25
+permalink: /events/hamburg2023/projects/
+---
+
+Projects we want/need to be reproducible:
+
+- binutils
+- PureOS/Mobian; RB images for phones
+- filesystems
+- npm
+- Rust crates
+- PyPI
+- Docker directory timestamps
+- ElectroBSD/FreeBSD Ports/Packages
+- Qt
+- Python Sphinx
+- ar embeds mtime, uid, gid
+- Gradle
+- dpkg database
+- RB in Ubuntu
+- Compiler for embed, Aurix (Infineon), ARM/Keil
+- Gcc
+- Python 3.9
+- Clojure, build-tools dependencies
+- Flatpak
+- Docker images
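The projects.md entry "ar embeds mtime, uid, gid" names a general archive problem: per-file metadata leaks the build environment into the output. Python's standard library has no `ar` writer, so the sketch below uses `tarfile` as an analogous stand-in rather than anything discussed at the session. The function name `deterministic_tar` is hypothetical, and clamping every field to a fixed value is one possible policy among several.

```python
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents.

    `files` maps member names to their byte contents.
    """
    buffer = io.BytesIO()
    with tarfile.open(
        fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT
    ) as archive:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # fixed timestamp instead of the file's mtime
            info.uid = info.gid = 0  # no builder uid/gid
            info.uname = info.gname = ""  # no builder user/group names
            info.mode = 0o644
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Two calls with the same contents return identical bytes regardless of dictionary order, which is the property the `ar` entry asks for.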
=====================================
_events/hamburg2023/snapshot-service.md
=====================================
@@ -0,0 +1,65 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Towards a snapshot service
+event: hamburg2023
+order: 30
+permalink: /events/hamburg2023/snapshot-service/
+---
+
+## binary archives:
+
+* Debian snapshot.debian.org - slow and unstable
+* Arch (daily snapshot)
+* Notalpine
+* openSUSE (daily snapshot)
+
+## source archives
+openwrt needs source tarballs with specific hash
+others are mostly interested in latest sources + older binaries
+
+## use-cases:
+* verify latest binaries
+* track down supply-chain dependency problems
+
+
+Arch: sends a month worth to internet archive, keeps index
+
+openSUSE: keeps archive of published x86_64 binaries (some unpublished build deps missing) in IPFS on two machines on a 16TB HDD
+
+Software heritage keeps sources - only git?
+
+pristine-tar could help to track tarballs in git
+
+
+Debian:
+Vagrant did some more work on capturing current deps
+
+
+Need index by SHA-sum
+snapshot.debian.org is fast in delivering SHA-sum
+
+Packages list includes SHA-sum for all packages. buildinfo only lists name+version but not SHA-sum, because dpkg-build does not have hashes.
+
+Frederic had a copy of snapshot.debian.org ; but operational problems
+
+metasnap FIXME
+
+build-time from buildinfo file can tell what snapshot to use.
+
+Need DB of name+version => SHA-sum
+
+Debian build-env may be partially outdated at time of build. Makes it harder to find the right versions.
+
+Is it possible to make snapshot.debian.org faster? Uses FUSE filesystem; uses SHA1 internally while Debian uses MD5+SHA256 so mapping needs effort
+100TB archive; 80 GB per snapshot ; 1M files
+need only a small subset that is used for builds.
+Also needed for reproducing images.
+
+need more new faster servers? With distributed indexed servers.
+
+
+Need URL that gives a specific repo state at a time.
+
+Fedora does not do snapshots, but koji API to fetch past name+version ; not sure how long it is kept.
+
+Qubes has few Debian packages ; one repo with latest versions ; another repo with all old versions ; scales OK there.
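The snapshot-service.md notes ask for a "DB of name+version => SHA-sum" and observe that the Packages list already carries a SHA-sum for every package. A minimal sketch of building such a mapping from Packages-style text follows. The function name `index_packages` and the sample stanzas (including the placeholder hash values) are invented for illustration, and the parser deliberately ignores continuation lines rather than implementing the full control-file syntax.

```python
def index_packages(packages_text: str) -> dict:
    """Map (name, version) to SHA256 from Debian Packages-style stanzas."""
    index = {}
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (leading whitespace) and malformed lines.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if {"Package", "Version", "SHA256"} <= fields.keys():
            index[(fields["Package"], fields["Version"])] = fields["SHA256"]
    return index


# Invented sample data; real Packages files carry many more fields.
SAMPLE = """\
Package: hello
Version: 2.10-3
SHA256: aaaa1111

Package: foo
Version: 1:2.0-1
Description: example package
 with a continuation line
SHA256: bbbb2222
"""

INDEX = index_packages(SAMPLE)
```

With an index like this, the name+version pairs recorded in a buildinfo file could be resolved to hashes, assuming the matching Packages file for that point in time is still available.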
=====================================
_events/hamburg2023/success-stories.md
=====================================
@@ -0,0 +1,62 @@
+---
+layout: event_detail
+title: Mapping the Big Picture - Success Stories
+event: hamburg2023
+order: 24
+permalink: /events/hamburg2023/success-stories/
+---
+
+## Real world success stories: What we know works
+- clear cases
+ - `SOURCE_DATE_EPOCH` *widely* honored and standardized
+ - ElectroBSD (distribution tarballs amd64)
+ - nearly 500 Java projects produce RB releases (see Reproducible Central)
+ - Yocto base
+ - NixOS minimal installation ISO reproducible
+ - Tails ISO
+ - Tor Browser is reproducible
+ - Debian docker images
+ - diverse double compilation to counter Trusting Trust attacks
+ - Debian policy
+ - bitcoin core
+ - find+fix corruption bugs
+- nobody knows (missing docs etc)
+ - zig
+ - go toolchain?
+ - android
+ - [Spoon](https://github.com/INRIA/spoon): An AST parsing and transformation library for Java
+ - openSUSE at 97%
+ - GitHub reproducible build badge
+ - spytrap-adb
+ - F-Droid 90% new apps included are reproducible
+ - bootstrappable.org
+ - coreboot
+ - developer stories
+
+## Real world success stories we need or are searching for
+
+(RB Success Stories Desired)
+
+## column 1: < 100%
+
+* 96% binary pkg reproducible debian cloud image
+* K && N trust in GNU Guix substitutes by default
+* Debug Packages
+* Debian Install Images reproducible
+
+## column 2
+
+* 150 java projects try fail at getting RB release (really hard to read) J
+* f-droid: older apps RB verify but not yet switched
+* only release reproducible packages (stage until verified)
+
+## column 3
+
+* Reproducible compilers built with other compilers. e.g. gcc built via clang then rebuild gcc with that
+* setting up CI RB system for Fedora RPMs
+* Install images rebuilt on different distro
+* Install images rebuilt by different organisations
+* Repo reproducible as opposed to deterministic build
+* Debian snapshot scalable
+
+
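The first "clear case" in success-stories.md is `SOURCE_DATE_EPOCH`. As a minimal sketch of what honoring it looks like in a build tool, the function below uses the variable, when set, in place of the current time. The name `build_timestamp` and the optional `environ` parameter (added so the behaviour can be exercised without touching the real environment) are assumptions of this example.

```python
import os
import time


def build_timestamp(environ=None) -> int:
    """Return the timestamp a build should embed, as seconds since the epoch.

    Uses SOURCE_DATE_EPOCH when set, otherwise falls back to the current time.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())
```

A tool that embeds `build_timestamp()` instead of calling `time.time()` directly produces the same timestamp on every rebuild for which the variable is set to the same value.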
=====================================
_events/hamburg2023/users.md
=====================================
@@ -0,0 +1,58 @@
+---
+layout: event_detail
+title: Collaborative Working Sessions - Understanding user-facing needs and personas
+event: hamburg2023
+order: 31
+permalink: /events/hamburg2023/users/
+---
+
+Clusters of stakeholders:
+- 'end users'
+ - 'distro' end-users
+ - 'direct' (non-distro) end-users
+ - 'normies'
+ - administrators
+- organizations that want to use reproducible software
+ - software vendors
+ - (oss) developer communities
+- intermediaries
+ - distro/package managers
+ - verifiers
+ - managers/teamleaders
+
+Goals:
+- even developers are not aware of reproducible builds. Expected much less so to end-users, but already
+- initiatives such as Debian mandating reproducibility
+
+- example: f-droid built an apk with malware from package repository, while the original developer had a cached non-backdoored version.
+- policy: most build pipelines nowadays have security compliance features, reproducibility might become a part of that. that helps developers care.
+- even if source is available it can be hard to rebuild in practice.
+
+- integration in package managers, so you can set a policy to only install reproducible software
+
+- what about software not found in distro packages
+ - repro-env: makes it easier to rebuild 3rd-party packages
+ - important that software is reproduced by people unaffiliated with the project
+
+- Levels of trustworthiness:
+ - low: source unknown, distributed by 'authority'
+ - medium: open source
+ - high: reproducible open source
+
+- In the case of F-Droid, F-Droid itself takes the role of the 3rd party reproducing/verifying the software
+ - an extra advantage is that, in the case of F-Droid, the APK built by F-Droid is compared
+ to the APK built by upstream.
+ This is infeasible for distros, though, since distros provide value by building
+ packages in a particular way to provide a consistent experience to their users
+
+- registry where independent 3rd party rebuilders/verifiers can upload their build results
+ - in-toto plugin for arch and debian would be an interesting inspiration
+ - how to organize/fund such rebuilders?
+ - integrate rebuilding functionality into distro/package managers?
+ - reproduce probabilistically?
+ - some large organizations may want to rebuild for their own use anyway
+ - if we make it easy for them, and entice them to share their results,
+ the rest of the community could piggy-back on that?
+ - rebuilderd? results queryable over http api
+
+
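The users.md notes combine two ideas: a registry where independent rebuilders upload results, and a package-manager policy that only installs reproducible software. A rough sketch of how such a policy check might look is given below. Everything here is hypothetical: the `RebuildReport` record, the `may_install` function and the `required` threshold are invented for illustration and do not describe rebuilderd, in-toto, or any existing package manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RebuildReport:
    """One rebuilder's result for one package version (hypothetical record)."""
    rebuilder: str
    package: str
    version: str
    sha256: str


def may_install(package, version, published_sha256, reports, required=2):
    """Allow installation only if enough distinct rebuilders match the hash."""
    confirming = {
        report.rebuilder
        for report in reports
        if report.package == package
        and report.version == version
        and report.sha256 == published_sha256
    }
    return len(confirming) >= required
```

Counting distinct rebuilder names, rather than reports, reflects the note that software should be reproduced by people unaffiliated with the project; how a registry would establish that rebuilders really are independent is left open here, as it was in the session.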
View it on GitLab: https://salsa.debian.org/reproducible-builds/reproducible-website/-/commit/f28ef41e20f61f7e48fdaf740c5e5743a4e5ac35