Need Help with duperemove in Debian

Chris Lamb chris at reproducible-builds.org
Thu Sep 4 21:24:04 UTC 2025


Hi Marc,

> > While doing sort, you could use 'LC_ALL="C"', to ensure ASCII-based 
> > sorting (some locales place e.g. u-umlaut after z, other locales between 
> > u and v)

Sorry for the delay in getting to this thread, especially as I could
have saved you a bunch of effort in getting to a solution.

The TL;DR is that kpcyrd correctly identified the root cause as an
issue due to the order of the object linking and Roland was pretty
close in his remark about locales.

The real causes, however, are that GNU Make has some interesting and
nonintuitive sorting tripwires, and that you were caught out by them
because of some particular filenames chosen by the upstream
duperemove developers.

You perhaps wouldn't have thought to look for this specifically, but
in the reprotest log that you linked to, the linking order was
reversed between two files only: filerec.o and file_scan.o.

Cutting out a huge amount of noise from the log, cc was being called
like this between the two builds used for the comparison:

  cc […] filerec.o file_scan.o […] -o duperemove
  cc […] file_scan.o filerec.o […] -o duperemove

This was because, in the Estonian et_EE.UTF-8 locale that reprotest
uses, underscores sort after the numbers and letters. In the C locale,
by contrast, underscores sort *before* the letters: "file_scan.o"
is therefore placed before "filerec.o".

This is more of a slight weirdness in the C locale rather than
something in et_EE. (et_EE is still a good choice for reprotest, however,
because whilst en_US and friends sort the lower case before the upper
case, et_EE.UTF-8 and the "C" locale sort capital letters before the
lower case. Also, Estonian sorts the letter 'z' between 's' and 't'. 
Also, I'm fun at parties.) 
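
A minimal sketch of the "C" ordering described above, using sort(1)
rather than Make. It relies only on the two object names from the log,
and the variable name sorted_c is just an illustration:

```shell
# Sort the two object names using plain byte comparison ("C" collation).
# '_' is byte 0x5F and 'r' is byte 0x72, so "file_scan.o" comes first.
sorted_c=$(printf '%s\n' filerec.o file_scan.o | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
```

This should print file_scan.o followed by filerec.o. Swapping in
LC_ALL=et_EE.UTF-8 would be expected to give the opposite order, but
only on a system where that locale has actually been generated.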

If that wasn't confusing enough, GNU Make then makes it even harder to
make sorting deterministic in practice.

The first confusing thing about GNU Make's sorting is that its
behaviour has changed over time. (kpcyrd hinted at this in his reply.)
The short version of the history is that the wildcard function sorted
prior to version 3.82 although that behaviour was not documented.
From 3.82 onwards, however, the wildcard function was explicitly *not*
sorted and the documentation/changelog was updated to reflect that.
However, the original behaviour was probably being relied upon by too
many maintainers so this change was reverted in GNU Make version 4.3,
leading to the status quo: "the results of the wildcard function are
sorted".

But sorted how, precisely? This is the second confusing thing about
Make's sorting. Adding "LC_ALL=C" or "export LC_ALL=C" to the top
of a Makefile will not change the sorting behaviour/collation of the
wildcard function within that file. To me, this is *kinda* expected
because Make's wildcard function is not a subshell command, but it
can trip folks up who just want to make their package reproducible
so I explicitly mention it here. I mention it especially because
calling the Make binary via "LC_ALL=C make" or its many equivalents
*will* change the sorting to what you ask.

(Therefore, one solution for the duperemove Debian package would
be to put an "export LC_ALL=C" in your debian/rules. This
ensures that the child Make process is invoked with the correct
collation setting. However, I would recommend a different fix; see
below.)

The third confusing thing about sorting files in Make is that
attempting to naively pipe the wildcard function to sort isn't
actually valid Make code, and then, rather unhelpfully, you don't get
any message that you've made a mistake. Your first attempt to fix
the package: 

   $(wildcard *.c | sort)

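… looks like it should do something useful, but the result still varies
depending on the current collation. This is because it's essentially
bogus code: Make treats everything inside the parentheses as a list of
glob patterns, so "|" and "sort" are just extra patterns that (usually)
match no files and are silently dropped: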

   $(wildcard *.c | you can actually put anything here and receive no error surprise hahahaha)

This is why, if you tried it, Roland's suggestion to use LC_ALL=C
(quoted at the top of this email) would not have worked for you. That
is to say, the following snippet wouldn't have worked for the same
reason as my "you can actually put anything here" example immediately
above:

   $(wildcard *.c | LC_ALL=C sort)  # not really 'valid' Make and doesn't sort using C

To me, this is particularly ironic, as this snippet superficially
appears to explicitly override the current collation but still varies
in practice.

The only way to reliably get simple, byte-comparison, "C" sorting is
via:

   $(sort $(wildcard *.c))
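
A rough way to check this locally, assuming GNU Make is installed. The
scratch directory, the two .c filenames (named after the objects in the
log) and the variable names are all made up for illustration:

```shell
# Hypothetical scratch directory holding two empty source files.
workdir=$(mktemp -d)
touch "$workdir/filerec.c" "$workdir/file_scan.c"

# Minimal Makefile with no recipes, so no tab characters are needed.
cat > "$workdir/Makefile" <<'EOF'
CFILES := $(sort $(wildcard *.c))
$(info $(CFILES))
all: ;
EOF

# $(info) prints while the Makefile is parsed, so the sorted list is
# the first line of output.
sorted_list=$(make -s --no-print-directory -C "$workdir" | head -n 1)
printf '%s\n' "$sorted_list"

rm -rf "$workdir"
```

This should print "file_scan.c filerec.c" whatever LC_ALL is set to in
the calling environment, because Make's sort function compares bytes
rather than consulting the locale.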


Indeed, I can confirm that the following patch makes the proposed
duperemove package (i.e. cloned from your debian/latest branch)
reproducible:

  -CFILES = $(filter-out tests.c,$(wildcard *.c))
  +CFILES = $(filter-out tests.c,$(sort $(wildcard *.c)))

No other changes are required. (This is better than changing the
debian/rules file as mentioned above, as it can be sent upstream.)

Anyway, hope this helps you and others in the future when they hit
similar underscore-related or GNU Make sorting foo.


Best wishes,

-- 
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org 💠
    ⬊   ⬋
      o

