[diffoscope] [Reproducible-builds] Support for --ignore-profile flag in diffoscope

Ximin Luo infinity0 at debian.org
Thu May 12 15:10:30 CEST 2016

Satyam Zode:
> Hi all!
> I am Satyam Zode, a GSoC student intern (http://satyamz.github.io/blog/2016/05/08/google-summer-of-code-2016-with-debian-reproducible-builds-introduction/).
> I am trying to understand the problem "Allow users to ignore arbitrary
> differences" (--ignore-profiles flag), in which diffoscope users will be
> able to ignore arbitrary differences. This problem is also described in
> the diffoscope wishlist
> (https://reproducible-builds.org/events/athens2015/diffoscope-wishlist/).
> I have started thinking about a solution to this problem, and for that I
> want to know: as a diffoscope user, what kind of things would you like to
> ignore? For example, irrelevant differences that just add noise.
> Currently, I am looking at the pkg-diff code (provided by OBS) because it
> is able to ignore some kinds of differences (Lunar suggested I look at
> pkg-diff to get an idea about this problem). I kindly request you all to
> express your views and expectations regarding this particular problem :)

Hi Satyam,

This is quite an open-ended problem and there is no single "correct" answer. At this stage, I don't even know myself what would be best.

What would help you (or at least, how *I* would begin to tackle this problem) is to look at lots of real data yourself: browse through https://tests.reproducible-builds.org/ and read plenty of diffoscope output, then, based on what you see, think through ways to classify differences into categories.
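As a rough illustration of what "classifying differences into categories" could mean, here is a small Python sketch. The category names and regular expressions are hypothetical examples, not anything diffoscope currently provides. A category counts as involved in a differing pair of lines when its matches differ between them; if the lines still differ after masking every involved category, the leftover is tagged "unclassified".

```python
import re

# Hypothetical difference categories, each recognised by a regular expression.
# These names and patterns are illustrative only; diffoscope has no such table.
CATEGORY_PATTERNS = {
    "timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\b"),
    "build-path": re.compile(r"/build/[^/\s]+/"),
    "file-permissions": re.compile(r"^[-dl][rwxsStT-]{9}(?=\s|$)"),
}


def classify(line_a: str, line_b: str) -> set[str]:
    """Return the set of categories that a pair of differing lines falls into.

    A category is "involved" when its matches differ between the two lines.
    If the lines still differ after masking every involved category, some
    part of the difference is unexplained and "unclassified" is added too.
    """
    involved = {
        name
        for name, pattern in CATEGORY_PATTERNS.items()
        if pattern.findall(line_a) != pattern.findall(line_b)
    }
    masked_a, masked_b = line_a, line_b
    for name in sorted(involved):
        pattern = CATEGORY_PATTERNS[name]
        masked_a = pattern.sub(f"<{name}>", masked_a)
        masked_b = pattern.sub(f"<{name}>", masked_b)
    if masked_a != masked_b:
        involved.add("unclassified")
    return involved
```

For instance, classify("built 2016-05-12 15:10:30", "built 2016-05-13 09:00:00") gives {"timestamp"}. Running something like this over many line pairs from real diffoscope output would show which categories are common and how much falls through as "unclassified".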

Once you have a list of categories to detect, you could then perhaps design a mini-language that lets users express combinations of these categories.
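One possible shape for such a mini-language, sketched in Python under the assumption that every difference has already been tagged with a set of category names: a boolean expression over those names using | (or), & (and), ! (not) and parentheses, compiled into a predicate that says whether to ignore a difference. The syntax, the category names and the compile_filter function are all hypothetical.

```python
import re
from typing import AbstractSet, Callable, List, Optional

# A predicate decides, from the categories a difference belongs to,
# whether the user wants that difference ignored.
Predicate = Callable[[AbstractSet[str]], bool]

# Tokens: category names (lower-case letters, digits, hyphens) and | & ! ( ).
TOKEN_RE = re.compile(r"\s*(?:([a-z][a-z0-9-]*)|([|&!()]))")


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


def _either(a: Predicate, b: Predicate) -> Predicate:
    return lambda categories: a(categories) or b(categories)


def _both(a: Predicate, b: Predicate) -> Predicate:
    return lambda categories: a(categories) and b(categories)


def _negate(a: Predicate) -> Predicate:
    return lambda categories: not a(categories)


class _Parser:
    """Recursive-descent parser for:
    expr := term ('|' term)*;  term := factor ('&' factor)*;
    factor := '!' factor | '(' expr ')' | NAME
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self) -> Predicate:
        left = self.term()
        while self.peek() == "|":
            self.take("|")
            left = _either(left, self.term())
        return left

    def term(self) -> Predicate:
        left = self.factor()
        while self.peek() == "&":
            self.take("&")
            left = _both(left, self.factor())
        return left

    def factor(self) -> Predicate:
        tok = self.take()
        if tok == "!":
            return _negate(self.factor())
        if tok == "(":
            inner = self.expr()
            self.take(")")
            return inner
        if tok in {"|", "&", ")"}:
            raise ValueError(f"unexpected operator {tok!r}")
        # A bare name is true when the difference carries that category.
        return lambda categories: tok in categories


def compile_filter(text: str) -> Predicate:
    parser = _Parser(tokenize(text))
    predicate = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"trailing input starting at {parser.peek()!r}")
    return predicate
```

For example, compile_filter("timestamp | (build-path & !file-content)") yields a predicate that is true for a difference tagged {"timestamp"} and false for one tagged {"build-path", "file-content"}. Plain boolean operators keep the grammar tiny, but they also force an explicit decision about differences carrying several categories, which ties directly into how the categories are named and documented.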

One issue is that you should be careful how you name and document these categories. For example, many people would like an --ignore-timestamp option. Whilst you could have diffoscope hide certain timestamp differences, it is generally unsafe to conclude from this that the two inputs "behave the same": code contained in a file could, for instance, examine itself and behave differently depending on what the timestamp says.
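A contrived Python illustration of that hazard: the two hypothetical "build outputs" below differ only in an embedded date (the BUILD_DATE name and both contents are made up for this example), so a naive date-masking filter would call them identical, yet the code branches on its own timestamp and behaves differently.

```python
import re

# Two hypothetical build outputs that differ only in an embedded build date.
HEADER_A = 'BUILD_DATE = "2016-05-11"\n'
HEADER_B = 'BUILD_DATE = "2016-05-12"\n'
BODY = '''
def behaviour():
    # The code inspects its own embedded timestamp and branches on it.
    if BUILD_DATE >= "2016-05-12":
        return "new code path"
    return "old code path"
'''

# Masking dates, as a naive "ignore timestamps" filter might do.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def mask_dates(source: str) -> str:
    return DATE_RE.sub("<date>", source)


def run(source: str) -> str:
    namespace: dict = {}
    exec(source, namespace)  # acceptable here: the source is a fixed string above
    return namespace["behaviour"]()


source_a, source_b = HEADER_A + BODY, HEADER_B + BODY
print(mask_dates(source_a) == mask_dates(source_b))  # True: "same" after masking
print(run(source_a), "/", run(source_b))  # old code path / new code path
```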


GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
