[diffoscope] New feature discussion

Jean-Romain Garnier salsa at jean-romain.com
Fri Jul 10 12:14:03 UTC 2020


Hi,

Thanks for your detailed answer! I should be subscribed to the mailing
list, so no need to keep me in CC.

To try to give a more concrete example of the types of changes I made,
without entering into implementation details, here are a few modifications
I made to how ELF files are handled:
• Ignore some sections (essentially debugging sections, as well as those
prone to changing if a file is recompiled). This could be done using the
--exclude-command option, so it's not entirely necessary to add a new
feature. It does make for a pretty long command line, though.
• Filter out some strings in specific sections of readelf's output, for
example to ignore offsets in relocs.
• The same goes for objdump, for example to ignore offsets referencing
functions that resolve to a name (among other things).

These last two changes could be handled using the new --diff-mask option,
but it might mask details in other files or sections that I would rather
keep (I can't, for example, just mask every 0xXXX value or I'll lose a lot
of valuable information).
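
As a rough illustration of the difference between a global mask and a
section-scoped filter, here is a minimal Python sketch. It is not the code
from the merge requests: the function name mask_reloc_offsets and the
assumed layout of "readelf --relocs" output (a "Relocation section" header,
entries starting with a hex offset, a blank line ending each table) are
assumptions made for the example only.

```python
import re

# Assumption: each relocation table starts with a header line like this.
SECTION_HEADER = re.compile(r"^Relocation section '(?P<name>[^']+)'")
# Assumption: entry lines start with a hex offset in the first column.
ENTRY_OFFSET = re.compile(r"^(?P<offset>[0-9a-f]{8,16})(?=\s)")


def mask_reloc_offsets(lines):
    """Mask the offset column of relocation entries; leave other lines intact."""
    in_reloc_section = False
    for line in lines:
        if SECTION_HEADER.match(line):
            in_reloc_section = True
        elif not line.strip():
            # A blank line ends the current relocation table.
            in_reloc_section = False
        elif in_reloc_section:
            # Only the leading offset is masked; the Info column is kept.
            line = ENTRY_OFFSET.sub(
                lambda m: "X" * len(m.group("offset")), line
            )
        yield line
```

Because the mask only applies between a relocation header and the next
blank line, hex values anywhere else in the output are left untouched,
which a single global pattern cannot guarantee.
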
To illustrate how implementing those could impact the codebase, I opened
some merge requests to serve as examples:
• Filtering the output of readelf:
https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/59
• Filtering the output of objdump:
https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/60
• Filtering the output of strings(1) and removing binary diff fallback:
https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/61

As you can see, the overall idea is to remove as much "noise" (with regard
to my use case) as possible. Since I only want to focus on changes related
to features or fixes, I ignore essentially everything that makes a build
non-reproducible.
I realize these changes could be beneficial for a subset of diffoscope
users, and I think that shows in some issues that have been opened. Since
it goes pretty much against the flow of diffoscope's original purpose, I
wasn't very optimistic about finding a clean way of merging these upstream.
It would require either adding an option for each feature (which we don't
really want), or having an overall "detail-level" option to manage several
of those features. I believe this is similar to what is described in issue
#129.
You definitely know better than I do, however, so you can check out the
mentioned merge requests if you wish.

Going back to the feature proposal, I must say your insight and experience
are very valuable.
Right now, my implementation basically consists of cloning diffoscope and
replacing the comparator files with my own tweaked versions, which is what
led me to this proposal.
I understand that it would add another layer of complexity, which might not
be beneficial for the project. This is the main reason why I decided to
send an email about it, rather than trying to implement something that
doesn't fit the project's objectives.

Thanks again for your feedback, and let me know if you have any further
ideas or comments.

Best,
Jean-Romain Garnier


On Thu, Jul 9, 2020 at 9:30 AM Chris Lamb <chris at reproducible-builds.org>
wrote:

> [keeping Jean in CC; let me know if you are on this list]
>
> Hi Jean-Romain,
>
> I think we share the same concerns about the scope of diffoscope.
>
> There are already a lot of moving parts, comparators and code that
> could do with rewriting, let alone optimising.
>
> However, I will add the general comment that there is a tendency in
> free software development to prematurely add layers of abstraction. I
> have nothing against plugin systems and similar abstractions — as
> ideas, they are very satisfying to my geek mind as well. But, in
> my experience, they sometimes do not improve the codebase as a whole
> as the wider and longer-term negatives are not fully appreciated or
> even considered. Also, if they are added prematurely they often don't
> really fix the original issue to begin with as they are not understood
> properly.
>
> >  This use case is very different from the need to spot all
> >  differences in the context of reproducible builds, so simply
> >  merging it upstream doesn't seem straightforward.
>
> I think specifics might be better addressed in the MR rather than this
> email but what, in general terms, is preventing you from merging this
> functionality? It may not be Platonically ideal, but merging it in
> some form may be the best step for diffoscope as a whole.
>
> Even if it hacks around existing functionality (e.g. --exclude-command
> or whatever… again, I won't engage in the specifics in this thread, I'm
> afraid), if we *directly* experience what problems we see with
> this approach in real life we can make a real judgement about where we
> go from there and possibly even see patterns across different
> features/problems/whatever.
>
> Compare this to imagining these problems and jumping to the conclusion
> that "just adding" abstraction X, plugin system Y or autodetection
> system Z would magically fix it … all requiring delicate, nuanced and
> exhausting responses by the maintainers that would have to maintain
> them.
>
> Hope that makes sense.
>
>
> Best wishes,
>
> --
>       o
>     ⬋   ⬊      Chris Lamb
>    o     o     reproducible-builds.org 💠
>     ⬊   ⬋
>       o
>