<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Garamond,Georgia,serif">
<p>Hi Chris,</p>
<p><br>
</p>
<p>> <font size="2"><span style="font-size:10pt">The "source1" and "source2" fields are essentially free-form text<br>
descriptions </span></font></p>
<p><br>
</p>
<p>Good to know! Thanks!</p>
<p><br>
</p>
<p>> <font size="2"><span style="font-size:10pt">Is there a particular problem you are trying to solve here? Your<br>
question suggests there might be.</span></font></p>
<p><br>
</p>
<p>Nice of you to ask. I have thousands of diffoscope files for Maven Central artifacts that I am analysing, and I want to understand the reasons for the differences in each file. I can't do that manually given how many there are, so I thought I would cluster them into groups.
For example, if source1 is "javap", I can be sure that the diff is in JVM bytecode, and I would cluster all such diffs under "Difference in JVM bytecode". Then I would manually analyse some of them to learn the reason for the difference and, eventually, the root cause. However,
source1/source2 is not a tool name in every diff, so I could not categorize all diffs that way. In the end, I went with regular expressions to cluster them. The "comments" JSON attribute in diffoscope files also helped here :)</p>
<p><br>
</p>
<p>For example, <a href="https://github.com/algomaster99/reproducible-central/issues/16" class="x_OWAAutoLink">
there are cases where some files are missing from, or additional in, the rebuilt version</a>. I created a regular expression to capture that pattern and classify which Maven releases are non-reproducible for this reason.
<br>
</p>
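<p>A minimal sketch of this kind of clustering, in Python. It assumes each diff node may carry "source1", "source2", "comments" and a nested "details" list; since there is no formal schema, every key is read defensively. Only the "javap" rule comes from the description above. The rule list, the function names and the sample report are hypothetical and written by hand for illustration.</p>
<pre>
```python
import json
import re
from collections import Counter

# Hypothetical rule table: (compiled pattern, cluster label).
# Only the "javap" rule is taken from the email; add your own patterns.
RULES = [
    (re.compile(r"\bjavap\b"), "Difference in JVM bytecode"),
]


def iter_nodes(node):
    """Yield a diff node and every nested node (assumes a 'details' list)."""
    yield node
    for child in node.get("details") or []:
        yield from iter_nodes(child)


def classify(node):
    """Return the first matching cluster label, or 'uncategorized'."""
    parts = [str(node.get("source1", "")), str(node.get("source2", ""))]
    parts += [str(comment) for comment in node.get("comments") or []]
    text = " ".join(parts)
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "uncategorized"


def cluster(report):
    """Count how many nodes of one report fall into each cluster."""
    return Counter(classify(node) for node in iter_nodes(report))


# Hand-written sample, not real diffoscope output.
sample = json.loads("""
{
  "source1": "reference.jar",
  "source2": "rebuild.jar",
  "details": [
    {"source1": "javap -verbose", "source2": "javap -verbose"}
  ]
}
""")
counts = cluster(sample)
```
</pre>
<p>Nodes that match no rule end up under "uncategorized", which gives a running list of diffs that still need a pattern or a manual look.</p>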
<p><br>
</p>
<div id="x_Signature">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,"EmojiFont","Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols">
<div id="x_m_4935352394101912768Signature">
<div name="x_divtagdefaultwrapper"><font size="2" color="#808080"><span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)"><span id="x_divtagdefaultwrapper" style="font-size:12pt">
<div style="margin-top:0; margin-bottom:0"><span style="color:rgb(0,0,0); font-family:Garamond,Georgia,serif">Regards,</span></div>
<div style="margin-top:0; margin-bottom:0"><span style="color:rgb(0,0,0); font-family:Garamond,Georgia,serif">Aman Sharma</span></div>
</span><br>
</span></font></div>
<div name="x_divtagdefaultwrapper"><font size="2" color="#808080"><span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)"></span><span class="x_im">PhD Student<br style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif">
<span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)">KTH Royal Institute of Technology</span><br style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif">
</span><span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)">School of Electrical Engineering and Computer Science (EECS)</span><br style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif">
<span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)">Department of Theoretical Computer Science (TCS)</span><br style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif">
<span style="font-family:Arial,"Helvetica Neue",helvetica,sans-serif; background-color:rgb(255,255,255)"><a href="http://www.kth.se" target="_blank" id="LPNoLP"></a><a href="https://www.kth.se/profile/amansha" class="x_OWAAutoLink" id="LPNoLP"></a><a href="https://www.kth.se/profile/amansha" class="x_OWAAutoLink" id="LPNoLP"></a></span></font></div>
</div>
<a href="https://www.kth.se/profile/amansha" class="x_OWAAutoLink" id="LPNoLP"><span style="font-size:10pt"></span></a><a href="https://algomaster99.github.io/" class="x_OWAAutoLink" id="LPNoLP">https://algomaster99.github.io/</a><br>
</div>
</div>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Chris Lamb <chris@reproducible-builds.org><br>
<b>Sent:</b> Wednesday, January 15, 2025 7:00:15 PM<br>
<b>To:</b> diffoscope<br>
<b>Cc:</b> Aman Sharma<br>
<b>Subject:</b> Re: [diffoscope] Schema of diffoscope JSON output</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">Hello Aman,<br>
<br>
> I want to know if there is a schema for JSON output from diffoscope. I <br>
> have understood that it always contains 'source1' and 'source2'. <br>
> However, they can either mean the actual source files that are being <br>
> diff-ed or the name of the tool that is run on the files before being <br>
> diff-ed.<br>
<br>
No, there is not a defined JSON schema á la json-schema.org (or<br>
similar) beyond what you have observed. :)<br>
<br>
The "source1" and "source2" fields are essentially free-form text<br>
descriptions — as you outline, sometimes they are filenames and<br>
sometimes they are descriptions of the tool being used to format some<br>
difference.<br>
<br>
Do remember that because of the way that diffoscope recursively<br>
unpacks archives, you cannot rely on any filenames listed in these<br>
fields being resolvable on the filesystem anyway, so it is unclear<br>
what would be gained if this was somehow more... 'strict'.<br>
<br>
Is there a particular problem you are trying to solve here? Your<br>
question suggests there might be.<br>
<br>
<br>
Best wishes,<br>
<br>
-- <br>
o<br>
⬋ ⬊ Chris Lamb<br>
o o reproducible-builds.org 💠<br>
⬊ ⬋<br>
o<br>
</div>
</span></font>
</body>
</html>