[rb-general] SOURCE_PREFIX_MAP and Occam's Razor

John Gilmore gnu at toad.com
Tue Jan 24 03:47:35 CET 2017


> talk is cheap, show me the code

Arrogance is often a useful attitude for a young person, but it helps
to first determine whether one is in the wrong.  When working on code
that's decades old, reading ChangeLog files is often educational.

>> At Cygnus in the 1990s we made the GNU tools fully reproducible, even
>> for cross-compilation.  This required dealing with many byte-order
>> issues and floating-point representations and such -- a set of issues
>> that you don't have.  We built the byte-order-independent BFD tools,
>> and the new GNU linker, pretty much from scratch, for just that purpose.

Ximin Luo said:
>Do you have this documented in detail somewhere? There is a difference
>between 100% reproducible and 92% reproducible. We are covering the
>final few %, so forgive me that I am not interested in people who
>claim their stuff is "reproducible" but actually do need things like
>SOURCE_DATE_EPOCH in corner cases that they didn't think about because
>they were too busy telling everyone how they set timestamps to 0
>everywhere.

I thank you for covering the final few %.  It is important work that
needs to be done.

We did not make every program in the universe reproducible.  We made
the GNU tools that we were shipping and supporting -- and all of our
test cases compiled by them -- reproducible.  That includes gcc, gdb,
gas, binutils, gnu make, and a few other things.  Plus several hundred
compiler test cases.  We didn't even have any Linux systems then
(Yggdrasil, the first distro, had its first release in late 1992;
Debian had just started in late 1993); we and all of our customers
were running on proprietary Unix and MSDOS machines.  See:

  "The Daemon, the GNU and the Penguin" by Peter Salus
  http://www.groklaw.net/articlebasic.php?story=20051031235811490

What we did is documented in the code.  I've done some poking around
in early documentation, and it's interesting that I haven't found any
place yet where we specifically and publicly said we made the output
files fully determined by the source files.  We just treated it as a
bug, and fixed it, anytime the object files didn't match bit-for-bit.
It wasn't something to shout about; it was just good engineering.

Originally the GNU tools only worked native; we improved 'em.  I made
the first GCC cross-compiler in about March 1990, but it was a
one-shot release for a single customer.  We made six or eight of those
one-shots for various platforms.  It got hard to support each customer
out of a unique source tree, so we invested the time to make the whole
thing configurable in one tree.  Cygnus was the first to build and
ship and support a suite of GNU cross-compilers, from a single source
tree, in October 1992.  These did have object-file differences in the
timestamps, and in some floating point constants, i.e. they weren't
fully deterministic, but the cross-compiler bear was finally starting
to dance.

Nine months later, we shipped our fourth major multi-platform
supported GNU compiler release (93q2) in June 1993.  Compiling the
same source files on 8 of the 9 different host systems, for the
Motorola 68000 target (among 10 target environments using 5 different
CPU chips), produced exactly bit-for-bit identical object files.  Part
of making this happen was adding a "-gnodir" switch to gcc that
suppressed the directory from the debug information.  (RMS, who was
maintaining GCC at the time, suggested merely dropping the directory
name in all cases.  We left that issue unresolved pending a later
merge with his version of the code.)

The only difference between the files produced on different platforms
was a minor "release notes" issue about excess precision in floating
point numbers when cross-compiling from the 9th host platform, the IBM
RS/6000 (see below; it got resolved in later releases).  We verified
this with the DejaGnu test framework, which we designed and built as
part of this effort.  Just to get the GNU tools to build cleanly on
all those platforms, and to test them in both native UNIX and MSDOS
systems, and in embedded boards without an operating system, took a
team of 6 or 8 talented people many months.  We had to design and
build the first "configure" script, also.  (My apologies to everyone
for how complicated autoconf has become in the intervening 25 years.)
Emails documenting some of this are below.

I have a copy of the GNU binutils-2.18.tar.bz2 from 2007.  In the file
binutils-2.18/bfd/ChangeLog-9193 is this entry:

  Wed Oct 28 13:42:09 1992  John Gilmore  (gnu at cygnus.com)

          * coffcode.h (coff_write_object_contents):  Zero timestamp field.

One of the emails below (by Ian Lance Taylor) suggests searching in
bfd/coffcode.h for the string "timestamp".  There you will see:

  /* We will NOT put a fucking timestamp in the header here. Every time you
     put it back, I will come in and take it out again.  I'm sorry.  This
     field does not belong here.  We fill it with a 0 so it compares the
     same but is not a reasonable time. -- gnu at cygnus.com  */
  internal_f.f_timdat = 0;

The comment was addressed to Steve Chamberlain, a great and prolific
Cygnus programmer who was the original author of the BFD library.
Note that this code only applied to the COFF object file format.  A
similar change was made for ECOFF.  There is similar code in bfd/som.c
for the HP PA-RISC SOM file format, setting timestamps to zero.
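
A minimal sketch of what zeroing that field buys, written in Python
rather than BFD's C.  It assumes the classic 20-byte COFF file header,
in which f_timdat is the 4-byte field at offset 4; the helper names and
the placeholder magic number are hypothetical and are not taken from
binutils.

```python
import struct

# Assumed layout: the classic 20-byte COFF file header, in which
# f_magic (2 bytes) and f_nscns (2 bytes) precede the 4-byte f_timdat.
TIMDAT_OFFSET = 4
TIMDAT_SIZE = 4
FILHDR_SIZE = 20


def zero_coff_timestamp(obj: bytes) -> bytes:
    """Return a copy of a COFF object image with f_timdat set to 0."""
    if len(obj) < FILHDR_SIZE:
        raise ValueError("too short to hold a COFF file header")
    out = bytearray(obj)
    # Zero has the same encoding in either byte order, so no swap is needed.
    out[TIMDAT_OFFSET:TIMDAT_OFFSET + TIMDAT_SIZE] = bytes(TIMDAT_SIZE)
    return bytes(out)


def fake_header(timestamp: int) -> bytes:
    """Build a hypothetical big-endian header that varies only in f_timdat."""
    # Fields: f_magic, f_nscns, f_timdat, f_symptr, f_nsyms, f_opthdr, f_flags.
    # 0x0150 is an arbitrary placeholder magic number for this illustration.
    return struct.pack(">HHIIIHH", 0x0150, 1, timestamp, 0, 0, 0, 0)
```

Two headers built a minute apart differ as raw bytes, but compare equal
once both have passed through zero_coff_timestamp().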

That code and those ChangeLog entries have survived right up through the
current binutils release, binutils-2.27 from 2016.  In that code you
will also see Chris Demetriou's 2009 changes for
BFD_DETERMINISTIC_OUTPUT in archive (ar(1)) files, which avoid putting
each file's uid, gid, and timestamps in object file archives.  (This
is now the compiled default in ubuntu-16.04, though it wasn't default
in ubuntu-14.04.  Back in 1992 we needed those archive timestamps for
compatibility with Sun's and BSD's proprietary UNIX linker.)
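
The effect on an archive member header can be sketched as follows.
This is an illustration in Python, not the binutils code: it assumes
the common 60-byte ar member header (name 16, date 12, uid 6, gid 6,
mode 8, size 10, then a two-byte terminator), and both function names
are hypothetical.

```python
AR_HEADER_SIZE = 60
AR_FMAG = b"`\n"  # terminator that ends every ar member header


def make_ar_header(name: str, mtime: int, uid: int, gid: int,
                   mode: int, size: int) -> bytes:
    """Build a 60-byte ar member header from space-padded ASCII fields."""
    fields = (
        name.encode("ascii").ljust(16),
        str(mtime).encode("ascii").ljust(12),
        str(uid).encode("ascii").ljust(6),
        str(gid).encode("ascii").ljust(6),
        format(mode, "o").encode("ascii").ljust(8),  # mode is stored in octal
        str(size).encode("ascii").ljust(10),
        AR_FMAG,
    )
    return b"".join(fields)


def deterministic_ar_header(header: bytes) -> bytes:
    """Replace the date, uid and gid fields with 0; keep name, mode, size."""
    if len(header) != AR_HEADER_SIZE or header[58:60] != AR_FMAG:
        raise ValueError("not an ar member header")
    name, mode, size = header[0:16], header[40:48], header[48:58]
    zero12 = b"0".ljust(12)
    zero6 = b"0".ljust(6)
    return name + zero12 + zero6 + zero6 + mode + size + AR_FMAG
```

Headers for the same member written by different users at different
times become byte-identical after normalization.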

The closest thing to a published, broad summary of what we were doing
back in 1992 and 1993 is from _Inside Cygnus Engineering_ issues
1992-09 and 1992-12.  In general see:

http://www.toad.com/gnu/cygnus/

http://www.toad.com/gnu/cygnus/ice/ice-1992-09.txt

  PRODUCTS AND RELEASES
  ---------------------

  1.  Progressive Update

  We are still hard at work on the next progressive release.  In the
  process, we have done significant cleanup and reorganization of the
  libraries (libgcc, libc and libm, libiberty, libg++) to make the
  toolchain easily configurable for many cross-development platforms.  
  Our test framework has been ported and enhanced to increase 
  platform-independence of test cases, particularly for cross development.  
  And then there has been the very large task of building and testing 
  each platform, fixing bugs, and starting the cycle again.

  As expected, logistics is proving to be the greatest challenge.
  Although we have built and verified most of the platforms below at one 
  time or another in engineering, getting all of them into product boxes 
  has proved to be our undoing.  We will not be able to final- and 
  installation-test all of these in a timely fashion.  Thus we have decided 
  to delay the final ship date and decrease the number of platforms that we 
  will package and put on the shelf.

  The original ship date was set for September 30.  We are moving this
  back a month to October 31.  Core and Leveraged customers who need
  updates at the end of the month will receive a beta tape and documentation.  
  If you need an update, please contact us (at engnews at cygnus.com) as soon 
  as possible.  We anticipate that the only difference between such a tape
  and the stock tapes will be in the installation procedure, although,
  of course, late-breaking bugs may be discovered.  We apologize for the
  inconvenience this causes.

  Our matrix of supported platforms covers both the needs of our current 
  customers and inputs from our sales and marketing groups.  To reduce
  the number of 'stock' platforms, we've decided to support certain
  platforms on a custom basis only.  Stock  platforms are shown with X's
  in the table below.  Platforms that we will not productize are shown with
  O's.  If there is sufficient customer interest in any of these, we will
  make a prerelease tape available, or make it a stock item for the 
  Q4 progressive release.


         \ HOST |                   DEC   IBM   SGI
  TARGET  \     | SUN3  SUN4  SOL2  STN   RS6K  IRIS  DOS  HP300  HP700
  --------------+--------------------------------------------------------
  Native        |  X     X     X     X     X     X     
  68k VxWorks   |  X     X           X     X     X           X      X
  68k a.out     |  X     X           X                 X            X
  68k coff      |        X           O                 O
  29k UDI       |        X                             X
  ix86 a.out    |        X           O                 O
  i960 VxWorks  |        O           O     X                        O
  i960 Nindy    |        O           O
  SPARC VxWorks |        X                                               
  SPARC a.out   |        O            

  X     = on the shelf                     
  O     = may be custom built
  68k   = 68000, 68010, 68020, 68030, 68040
  ix86  = 386, 486
  i960  = KA, KB, CA
  SPARC = SPARC, SPARClite

[End of excerpt from Inside Cygnus Engineering; you can read many more
 issues of it at the above URLs.]

We shipped that "P3" progressive release in October 1992.  Then we
released a stability update in December, and further updates every
three months for years thereafter.  The 93Q2 release of June 1993 is
the first release in which we made the results bit-for-bit identical
on all platforms except the RS/6000.  That is documented in the emails
below.  Note the progression of dates.

	John

Date: Sun, 11 Oct 92 18:27:12 PDT
From: david d `zoo' zuhn <zoo at cygnus.com>
Organization: Cygnus Support -- +1 415 322 3816
To: p3
Subject: comparison results on p3 testing (GNBN)

[ After losing an hour of testing, I realized that -gnodir was required for
  this exercise.  Sigh. ]

Good news on this front.

With the exception of m68k-coff targets, all hosts generate identical
object files for the test case of GNU hello v1.1.  The m68k-coff objects
appear to be different, beyond the timestamp (which also appears to be
different, in byte 8, right?).  I don't know enough about coff or object
files in general to try to find the differences.

These objects all live in ~zoo/testing/{host}-{target}-objdir, in case some
enterprising hacker wishes to bang on this problem.

For some bad news, the final executable for m68k-aout targets is different
for each host:

 -rwxrwxr-x 1 zoo 146247 Oct 11 17:58 hppa1.1-hp-hpux-m68k-aout-objdir/hello
 -rwxrwxr-x 1 zoo 146315 Oct 11 18:03 mips-dec-ultrix-m68k-aout-objdir/hello
 -rwxrwxr-x 1 zoo 146348 Oct 11 18:01 rs6000-ibm-aix-m68k-aout-objdir/hello
 -rwxrwxr-x 1 zoo 146484 Oct 11 18:00 sparc-sun-sunos411-m68k-aout-objdir/hello

This is the only toolchain that is built on several hosts that is complete
(vxworks toolchains don't produce executables).   Well, okay, we built
m68k-coff for a couple of hosts, and those don't compare identically
either, but that shouldn't be a surprise.


From: ian at cygnus.com (Ian Lance Taylor)
Date: Mon, 12 Oct 92 19:15:19 EDT
To: zoo at cygnus.com
Cc: ian at cygnus.com, p3 at cygnus.com
Subject: comparison results on p3 testing (GNBN)

   Date: Sun, 11 Oct 92 18:27:12 PDT
   From: david d `zoo' zuhn <zoo at cygnus.com>

   With the exception of m68k-coff targets, all hosts generate identical
   object files for the test case of GNU hello v1.1.  The m68k-coff objects
   appear to be different, beyond the timestamp (which also appears to be
   different, in byte 8, right?).  I don't know enough about coff or object
   files in general to try to find the differences.

Well, after considerable tedium I tracked down the first of five
differences between the files created by mips-dec-ultrix and by
sparc-sun-sunos411.  The coff_swap_aux_out function only sets the
first eight or so bytes of the outgoing auxent for the C_FILE case, so
the remaining bytes are random garbage.  Perhaps a bzero at the start
of coff_swap_aux_out in coffcode.h would be appropriate.  I'm leaving
now, though.

Ian
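
The failure mode Ian describes, and the effect of clearing the entry
first, can be sketched like this.  It is a Python illustration, not
the BFD code: the 18-byte entry size and 14-byte file name limit are
assumptions about the COFF auxiliary entry, and the two writer
functions are hypothetical.

```python
AUXENT_SIZE = 18  # assumed size of one COFF auxiliary entry
FILNMLEN = 14     # assumed maximum length of the file name stored in it


def write_file_aux_unzeroed(buf: bytearray, offset: int, name: bytes) -> None:
    """Buggy pattern: only the bytes covered by the name are written."""
    field = name[:FILNMLEN]
    buf[offset:offset + len(field)] = field


def write_file_aux(buf: bytearray, offset: int, name: bytes) -> None:
    """Fixed pattern: clear the whole entry (the 'bzero'), then fill it in."""
    buf[offset:offset + AUXENT_SIZE] = bytes(AUXENT_SIZE)
    field = name[:FILNMLEN]
    buf[offset:offset + len(field)] = field


def garbage(fill: int) -> bytearray:
    """Stand-in for whatever a host happened to leave in the output buffer."""
    return bytearray([fill] * AUXENT_SIZE)
```

With the unzeroed writer, two hosts whose buffers start with different
leftovers emit different entries for the same file name; with the
zeroing writer the entries are identical.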


Date: Tue, 13 Oct 92 11:33:36 PDT
From: david d `zoo' zuhn <zoo at cygnus.com>
Organization: Cygnus Support -- +1 415 322 3816
To: ian at cygnus.com (Ian Lance Taylor)
Cc: p3
Subject: comparison results on p3 testing (GNBN)

I think the intention in our tools is to not have the time stamp differ.
I'm not certain of this though.... anyone else?


To: david d `zoo' zuhn <zoo at cygnus.com>
Cc: ian at cygnus.com (Ian Lance Taylor), p3
Subject: Re: comparison results on p3 testing (GNBN) 
Date: Tue, 13 Oct 92 11:56:10 -0700
From: gnu at cygnus.com

> I think the intention in our tools is to not have the time stamp differ.
> I'm not certain of this though.... anyone else?

I strongly agree that our object files should not contain timestamps.
If you compile the same sources with the same compiler, you should get
the same result -- down to the bit.

	John


From: ian at cygnus.com (Ian Lance Taylor)
Date: Wed, 28 Oct 92 13:40:40 EST
To: gumby at cygnus.com
Subject: -gnodir
Cc: p3 at cygnus.com, ian at cygnus.com

The -gnodir hack I put into p3 gcc just removes the current directory
name.  If the file name is passed to gcc with directories, then those
do appear in the debugging file.  For example, gcc -gnodir
/foo/bar/file.c will put /foo/bar/file.c in the debugging information.
This turns out not to matter for p3, so far as I can tell, because the
only times pathnames are given to gcc they are always of the form
../p3/foo.

Do you think I should improve it to use the basename of the argument,
or do you think that since this is just an undocumented hack anyhow
there's no point?

Ian


To: ian at cygnus.com (Ian Lance Taylor)
Cc: gumby at cygnus.com, p3 at cygnus.com, gnu
Subject: Re: -gnodir 
Date: Wed, 28 Oct 92 13:09:17 -0800
From: gnu at cygnus.com

> Do you think I should improve it to use the basename of the argument,
> or do you think that since this is just an undocumented hack anyhow
> there's no point?

I do not think that this should be changed.  If the user passes in an
absolute pathname, they should get what they get.  Cutting it down to
the basename would be the wrong thing; it would not enable the
debugger to tell the difference between ../bfd/core.c and
../gdb/core.c, for example.

I think that -gnodir should become a supported feature, but we should get
some further experience with it, first.  Did it accomplish its intended
purpose of making it possible to compare all the P3 .o's easily?

	John
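
The trade-off discussed in this exchange can be modeled in a few
lines.  This is only a sketch of the behaviour described in the
emails, not gcc's implementation; DebugSource, source_record and the
use_basename parameter are hypothetical names.

```python
import posixpath
from typing import NamedTuple, Optional


class DebugSource(NamedTuple):
    comp_dir: Optional[str]  # compilation directory, or None if suppressed
    name: str                # file name recorded in the debug information


def source_record(cwd: str, arg: str, gnodir: bool = False,
                  use_basename: bool = False) -> DebugSource:
    """Model what ends up in the debug info for one source file argument."""
    # use_basename models the alternative that was proposed and rejected.
    name = posixpath.basename(arg) if use_basename else arg
    # -gnodir, as described above, drops only the current directory name.
    return DebugSource(None if gnodir else cwd, name)
```

With gnodir=True, the same relative argument compiled in two different
build directories gives equal records.  With use_basename=True,
../bfd/core.c and ../gdb/core.c collapse to the same name, which is the
ambiguity John objects to.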

Date: Wed, 28 Oct 92 13:20:19 PST
From: david d `zoo' zuhn <zoo at cygnus.com>
Organization: Cygnus Support -- +1 415 322 3858
To: gnu at cygnus.com
Cc: p3
Subject: Re: -gnodir 

   I think that -gnodir should become a supported feature, but we should get
   some further experience with it, first.  Did it accomplish its intended
   purpose of making it possible to compare all the P3 .o's easily?

Yes, it did help here, quite a bit.  


From: ian at cygnus.com (Ian Lance Taylor)
Date: Wed, 28 Oct 92 16:27:47 EST
To: gnu at cygnus.com
Cc: gumby at cygnus.com, p3 at cygnus.com, ian at cygnus.com
Subject: -gnodir 

   Date: Wed, 28 Oct 92 13:09:17 -0800
   From: gnu at cygnus.com

   I think that -gnodir should become a supported feature, but we should get
   some further experience with it, first.  Did it accomplish its intended
   purpose of making it possible to compare all the P3 .o's easily?

Yes, it did.  I did not do an exhaustive comparison of everything (too
much other stuff to do, but this should definitely be done for p4).
For the ones that I did compare the only differences were the floating
point differences and the COFF timestamps.

In the collection of mail you sent me, rms suggested never generating
the directory name at all instead of adding the -gnodir option.  I'm
not sure I know how to find out whether that would ever cause any
problems.  dbx on the Sun4 seems to be able to cope.

We could resolve the floating point differences by making gcc use the
newlib atof, as you suggested.  [copyright discussion elided...--gnu]
Not a problem for us, but this would seem to be an obstacle to getting
the modification accepted by the FSF.  I don't know if rms has any
opinions on the issue.

As far as the COFF timestamps go, yes, we are still generating them.
Take a look at the comment in bfd/coffcode.h (search for the string
"timestamp") and argue it with Steve.

Ian


To: wilson, zoo, progressive
Subject: Stage2/3 differences in gdb caused by stage1/2 diff in libgcc2
Date: Tue, 06 Apr 93 23:58:22 -0700
From: gnu at cygnus.com

In comparing the directories built in stage 2 and 3 of a Progressive
build, in /cygint/gnu/cdware/sparc-sun-sunos4.1.1-objdir.[23], I found
that the GDB binary differed.  The difference comes from a module in
libgcc2.a in the previous stage [objdir.1 and objdir.2], which is linked
in to the GDB binary by the previous stage's compiler.

Note that the libgcc2.a in stage 1 is built with the freshly built
stage1 gcc, so it should produce identical executables to the stage2
gcc.

The difference is that the debug symbol for __gnuc_va_list:t21=*2
occurs in stage2, but not in stage1.  This happens in both _eprintf.o
and _new_handler.o in the built gcc/libgcc2 directories.

I don't have time to chase this further (CDware release to cut), but I
thought you-all should know.  My guess is there's still some
misadventures in the Makefile regarding include file paths while
building libgcc2.

	John

PS:  The stage2/3 comparisons in test-build.mk do not catch this,
because they only compare the .o files.  The difference occurs only
in the linked executables (at the stage2/3 level).  *All* files
need to be compared, not just .o files, and all bytes of these files
need comparing; the first 10 bytes can't be ignored.
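
A check along the lines the PS asks for might look like the sketch
below: every file in both trees, every byte, no skipped prefix.  It is
a Python illustration and not the test-build.mk logic; the function
names are hypothetical.

```python
import hashlib
import os


def tree_digests(root: str) -> dict:
    """Map each file's path (relative to root) to a SHA-256 of all its bytes."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for fname in sorted(filenames):
            path = os.path.join(dirpath, fname)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_files(stage_a: str, stage_b: str) -> list:
    """List files that are missing from one tree or differ in any byte."""
    a, b = tree_digests(stage_a), tree_digests(stage_b)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

Because nothing is filtered by extension, a linked executable that
differs between the two stages shows up in the result even when every
.o file matches.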


From: cassidy (Jeffrey Wheat)
Subject: q2 testing
To: progressive, testing
Date: Thu, 24 Jun 1993 21:38:57 -0700 (PDT)

Hi

	I've completed testing for the m68k-aout cross targets. The systems
that I tested on are: solaris, sun3, sun4, decstation, rs6000, hp300 and 700.
I was unable to test on the sco box (no account there), and the sgi (no gcc).

The only problem I encountered was a big difference in the code that was
generated by the rs6000. All other hosts generated identical code. The test
in question is the gcc.execute/920715-1.c test. The diff's between the srec
output are:

[pluto] [gcc/rs6000-ibm-aix3.2] % diff 920715-1 ../mips-dec-ultrix/920715-1
1112,1115c1112,1115
< S31A0000510EA94A5135F294001AF22E5400FFE8F23C54383F666053
< S31A000051235AA96D8068F2920004600661FF00000448F22E54000B
< S31A00005138FFE0F23C54383F5995369285B1F4F294001AF22E5490
< S31A0000514D00FFE0F23C54383F59953692CC105AF2920004600695
---
> S31A0000510EA94A514CF294001AF22E5400FFE8F23C54383F66603C
> S31A000051235AA96D807FF2920004600661FF00000448F22E5400F4
> S31A00005138FFE0F23C54383F5995369285B222F294001AF22E5461
> S31A0000514D00FFE0F23C54383F59953692CC1088F2920004600667

The differences between the code generated when linked to a binary and then
dumped with objdump are too many to post (90k diff file), so it's in the file
/offsite/cassidy/rs6000-others.diff. I don't know whether it matters that the
rs6000 is generating different code than all the other hosts tested, so I
wanted to let those who would know, know.

cassidy


From: cassidy at darkstar.cygnus.com (Jeffrey Wheat)
Subject: rs6000 differences (920715-1.s)
To: progressive at darkstar.cygnus.com, testing at darkstar.cygnus.com,
        gumby at darkstar.cygnus.com
Date: Fri, 25 Jun 1993 16:11:26 -0600 (MDT)

[pluto] [gcc/rs6000-ibm-aix] [pts/3] % diff 920715-1.s ../sparc-sun-sunos4.1.1/920715-1.s
23c23
<       fcmpd #0r0.0013165056730000000,fp0
---
>       fcmpd #0r0.001316505673000000062,fp0
26c26
<       fcmpd #0r0.0013165056740000000,fp0
---
>       fcmpd #0r0.001316505674000000058,fp0
33c33
<       fcmpd #0r0.0027314921119999900,fp0
---
>       fcmpd #0r0.0027314921119999998472,fp0
36c36
<       fcmpd #0r0.0027314921129999900,fp0
---
>       fcmpd #0r0.0027314921129999998432,fp0
43c43
<       fcmpd #0r0.0015614540989999900,fp0
---
>       fcmpd #0r0.0015614540989999999183,fp0
46c46
<       fcmpd #0r0.0015614540999999900,fp0
---
>       fcmpd #0r0.0015614540999999999143,fp0
204c204
<       fmuld #0r7.4373772832748258e-06,fp2
---
>       fmuld #0r7.4373772832748257769e-06,fp2
207c207
<       fmuld #0r3.8580246913580248e-06,fp3
---
>       fmuld #0r3.8580246913580247821e-06,fp3
299c299
<       fmuld #0r7.4373772832748258e-06,fp2
---
>       fmuld #0r7.4373772832748257769e-06,fp2
302c302
<       fmuld #0r3.8580246913580248e-06,fp3
---
>       fmuld #0r3.8580246913580247821e-06,fp3
[pluto] [gcc/rs6000-ibm-aix] [pts/3] % 

Note that the code that is produced *does* execute cleanly on the target board.

cassidy
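
One way to tell a harmless difference in decimal spelling from a real
difference in value is to compare the encoded constants rather than
the assembler text.  The sketch below assumes the target stores these
operands as big-endian IEEE 754 doubles and relies on Python's
correctly rounded float(); it is an illustration, not how gcc or gas
handled the conversion, and the function names are hypothetical.

```python
import struct


def target_bits(value: float) -> bytes:
    """Encode a value as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", value)


def same_constant(text_a: str, text_b: str) -> bool:
    """True if two decimal spellings round to the same double."""
    return target_bits(float(text_a)) == target_bits(float(text_b))


# The same value printed with 17 and with 20 significant digits gives
# different text but identical encoded bytes.
tenth_17 = format(0.1, ".17g")
tenth_20 = format(0.1, ".20g")
spelling_only = tenth_17 != tenth_20 and same_constant(tenth_17, tenth_20)

# The pair from line 33 of the diff above is further apart than a
# rounding error, so it encodes to different bytes.
line_33_matches = same_constant("0.0027314921119999900",
                                "0.0027314921119999998472")
```

Here spelling_only comes out True and line_33_matches comes out False,
so at least that constant differs in value and not only in how it was
printed.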


From: gumby at cygnus.com (D. V. Henkel-Wallace)
Date: Mon, 28 Jun 93 12:14:03 EDT
To: gnu at cygnus.com
Cc: jeffrey at cygnus.com, progressive at cygnus.com
Subject: rs6000 differences (920715-1.s) 

   Date: Mon, 28 Jun 93 09:04:11 -0700
   From: gnu at cygnus.com

   >      Code generated on the IBM RS/6000 by the compiler in this release of
   >      the Developer's Kit may differ slightly from that of previous
   >      releases.  Specifically, this is with respect to the "fcmpd" and
   >      "fmuld" commands.  Note that the code produced does execute cleanly.
   > 
   > "Floating-point numbers for certain cross targets may have excess
   > precision when generated by a compiler running on an RS/6000."

   excess precision?!  surely you jest...

No, there are too many bits in the mantissa (oops, it's now called
something else these days).

   We still need to investigate why this is happening.

Mark's already scheduled to fix it for devo, but it won't happen
before Q2 ships.


