[Git][reproducible-builds/reproducible-website][master] Document non-reproducibility arising from abbreviated Git hashes. (Closes:...

Chris Lamb (@lamby) gitlab at salsa.debian.org
Fri Jun 18 10:18:37 UTC 2021



Chris Lamb pushed to branch master at Reproducible Builds / reproducible-website


Commits:
a444890b by Chris Lamb at 2021-06-18T11:18:21+01:00
Document non-reproducibility arising from abbreviated Git hashes. (Closes: reproducible-builds/reproducible-website#31)

- - - - -


1 changed file:

- _docs/version_information.md


Changes:

=====================================
_docs/version_information.md
=====================================
@@ -5,18 +5,28 @@ permalink: /docs/version-information/
 ---
 
 Version information embedded in the software needs to be made
-deterministic. Counter-examples are using the current date or an
-incremental build counter.
+deterministic. Counter-examples are using the current date or an incremental
+build counter.
 
-The date and time of the build itself is hardly of value as an old
-source code can always be compiled long after it has been released.
-It's best when version information gives a good indication of what source
-code has been built.
+The date and time of the build itself is of little value, as old source
+code can always be compiled long after it has been released. It is best
+when version information gives a good indication of which source code has
+been built.
 
-The version number can come from a dedicated source file, a *changelog*,
-or from a version control system. If a date is needed, it can be
-extracted from the *changelog* or the version control system. A
-cryptographic checksum can also help to pinpoint the exact source
-content. This makes [Git](https://git-scm.com/) commit ids good
-candidates as part of version information.
+The version number can come from a dedicated source file, a *changelog*, or
+from a version control system. If a date is needed, it can be extracted from
+the *changelog* or the version control system.
 
+## Git checksums
+
+Cryptographic checksums from revision control systems can be used to pinpoint the exact source content. [Git](https://git-scm.com/) commit IDs are therefore good candidates for inclusion in version information.
+
+However, abbreviated Git hash identifiers (such as those obtained via `git describe` or `git rev-parse --short`) can be a source of non-reproducibility. This is because, by default, the number of hexadecimal characters in an abbreviated hash is *dependent on the number of objects in the Git repository*.
+
+The number of objects not only changes over time (due to other commits, even those not on the primary development branch) but also changes dramatically when a 'shallow' clone is made (see `git-clone(1)`), as shallow clones have fewer objects by design. To quote from the `git-config(1)` manual page:
+
+> `core.abbrev`
+>
+> Set the length object names are abbreviated to. If unspecified or set to "`auto`", an appropriate value is computed based on the approximate number of packed objects in your repository, which hopefully is enough for abbreviated object names to stay unique for some time. If set to "`no`", no abbreviation is made and the object names are shown in their full length. The minimum length is 4.
+
+Therefore, it is recommended to specify a fixed abbreviation length (or "`no`" abbreviation) when obtaining identifiers, for example via `git describe --abbrev=12`, `git rev-parse --short=12` or the `core.abbrev` config setting.
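
The following is a minimal sketch of one way to apply this advice. It assumes a POSIX shell with `git` on the PATH. The helper name `git_fixed_id` and the length of 12 are hypothetical, illustrative choices. One caveat applies to the documented options: `git rev-parse --short=12` and `git describe --abbrev=12` both treat 12 as a minimum, and Git may print more characters if 12 are not enough to keep the prefix unique. The sketch therefore also shows a stricter alternative: print the full hash and truncate it yourself, which always yields exactly the requested number of characters. For a self-contained demo, it creates a throwaway repository.

```shell
#!/bin/sh
# Sketch only: derive a commit identifier whose length does not depend on
# how many objects the local clone (full or shallow) contains.
set -eu

# Hypothetical helper; the default length of 12 is an arbitrary choice.
git_fixed_id() {
    length="${1:-12}"
    # 'git rev-parse HEAD' prints the full, unabbreviated object name, so
    # cutting it ourselves gives the same prefix length in every clone.
    git rev-parse HEAD | cut -c1-"$length"
}

# Demo setup: a temporary repository containing a single empty commit.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git -c user.name=Example -c user.email=example@example.org \
    commit -q --allow-empty -m "initial commit"

id="$(git_fixed_id 12)"
echo "truncated full hash: $id"

# The documented options request *at least* 12 characters; Git may emit
# more if needed to keep the abbreviation unique.
echo "rev-parse --short=12: $(git rev-parse --short=12 HEAD)"
echo "describe --abbrev=12: $(git describe --always --abbrev=12)"
echo "core.abbrev=12:       $(git -c core.abbrev=12 rev-parse --short HEAD)"
```

Truncating the full hash trades a very small risk of an ambiguous prefix for a length that never varies. That trade-off is usually acceptable for version strings meant for human identification.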



View it on GitLab: https://salsa.debian.org/reproducible-builds/reproducible-website/-/commit/a444890bc68f4e38a50a30e2d5019f0c3177193e

-- 
You're receiving this email because of your account on salsa.debian.org.




More information about the rb-commits mailing list