[diffoscope] What file to diff on after root paths are compared?

Aman Sharma amansha at kth.se
Thu Dec 26 20:42:54 UTC 2024


Hi,


I have a question about how diffoscope decides which files to diff after it has compared the root paths.


From the description of diffoscope, it is clear that it first calls compare_root_paths<https://salsa.debian.org/reproducible-builds/diffoscope/-/blob/master/diffoscope/main.py?ref_type=heads#L718> and then, if the input is an archive, recursively unpacks it to find the differences between the files inside. My question is: how does it know which files to diff? For example, consider the diff output below between two JARs. It first runs some zip-related tools and then shows the diffs of the files (the MANIFEST and a properties file) that actually caused the archives to differ. Does diffoscope learn about these files from the zipinfo/zipnote/zipdetails output, or does it simply diff every file in the archive?
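
To make the second possibility concrete, here is a minimal sketch of what I mean by "diff over all files in the archive". It is only an illustration built on Python's standard zipfile module and is not taken from diffoscope's source; the function name differing_members is hypothetical. It pairs members by name, compares their contents, and reports only the ones that differ, without consulting any zipinfo-style listing.

```python
import zipfile


def differing_members(path1, path2):
    """Pair archive members by name and report the ones that differ.

    Hypothetical helper for illustration only; this is an assumption
    about one possible strategy, not diffoscope's actual code.
    """
    with zipfile.ZipFile(path1) as z1, zipfile.ZipFile(path2) as z2:
        # Collect regular-file entries; directory entries carry no content.
        names1 = {i.filename for i in z1.infolist() if not i.is_dir()}
        names2 = {i.filename for i in z2.infolist() if not i.is_dir()}

        # Members present in both archives whose bytes are not identical.
        changed = [
            name
            for name in sorted(names1 & names2)
            if z1.read(name) != z2.read(name)
        ]
        # Members that exist in only one of the two archives.
        unmatched = sorted(names1 ^ names2)
    return changed, unmatched
```

Under this strategy every member is compared and identical ones are dropped from the report, which would also produce output listing only the changed files. Whether diffoscope does this or something else is exactly what I am asking.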



```json

{
"diffoscope-json-version": 1,
"source1": "/target/reference/ch.qos.logback.db/logback-access-db-1.2.11.1.jar",
"source2": "/logback-access-db/target/logback-access-db-1.2.11.1.jar",
"unified_diff": null,
"details": [
{
"source1": "zipinfo {}",
"source2": "zipinfo {}",
"unified_diff": "@@ -1,23 +1,23 @@
-Zip file size: 10358 bytes, number of entries: 21
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 META-INF/
--rw-r--r-- 2.0 unx 130 bl defN 22-Apr-20 20:27 META-INF/MANIFEST.MF
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/qos/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/qos/logback/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/qos/logback/access/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/qos/logback/access/db/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 ch/qos/logback/access/db/script/
--rw-r--r-- 2.0 unx 4952 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/DBAppender.class
--rw-r--r-- 2.0 unx 1378 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/db2.sql
--rw-r--r-- 2.0 unx 1001 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/db2l.sql
--rw-r--r-- 2.0 unx 1011 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/hsqldb.sql
--rw-r--r-- 2.0 unx 1145 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/msSQLServer.sql
--rw-r--r-- 2.0 unx 1291 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/mysql.sql
--rw-r--r-- 2.0 unx 1663 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/oracle.sql
--rw-r--r-- 2.0 unx 1283 bl defN 22-Apr-20 20:27 ch/qos/logback/access/db/script/postgresql.sql
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 META-INF/maven/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 META-INF/maven/ch.qos.logback.db/
-drwxr-xr-x 2.0 unx 0 b- stor 22-Apr-20 20:27 META-INF/maven/ch.qos.logback.db/logback-access-db/
--rw-r--r-- 2.0 unx 2897 bl defN 22-Apr-20 20:23 META-INF/maven/ch.qos.logback.db/logback-access-db/pom.xml
--rw-r--r-- 2.0 unx 135 bl defN 22-Apr-20 20:27 META-INF/maven/ch.qos.logback.db/logback-access-db/pom.properties
-21 files, 16886 bytes uncompressed, 7178 bytes compressed: 57.5%
+Zip file size: 10359 bytes, number of entries: 21
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 META-INF/
+-rw-r--r-- 2.0 unx 130 bl defN 24-Oct-18 03:03 META-INF/MANIFEST.MF
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/qos/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/qos/logback/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/qos/logback/access/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/qos/logback/access/db/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 ch/qos/logback/access/db/script/
+-rw-r--r-- 2.0 unx 4952 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/DBAppender.class
+-rw-r--r-- 2.0 unx 1663 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/oracle.sql
+-rw-r--r-- 2.0 unx 1145 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/msSQLServer.sql
+-rw-r--r-- 2.0 unx 1291 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/mysql.sql
+-rw-r--r-- 2.0 unx 1378 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/db2.sql
+-rw-r--r-- 2.0 unx 1001 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/db2l.sql
+-rw-r--r-- 2.0 unx 1011 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/hsqldb.sql
+-rw-r--r-- 2.0 unx 1283 bl defN 24-Oct-18 03:03 ch/qos/logback/access/db/script/postgresql.sql
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 META-INF/maven/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 META-INF/maven/ch.qos.logback.db/
+drwxr-xr-x 2.0 unx 0 b- stor 24-Oct-18 03:03 META-INF/maven/ch.qos.logback.db/logback-access-db/
+-rw-rw-r-- 2.0 unx 2897 bl defN 24-Oct-18 03:03 META-INF/maven/ch.qos.logback.db/logback-access-db/pom.xml
+-rw-r--r-- 2.0 unx 134 bl defN 24-Oct-18 03:03 META-INF/maven/ch.qos.logback.db/logback-access-db/pom.properties
+21 files, 16885 bytes uncompressed, 7179 bytes compressed: 57.5%
"
},
{
"source1": "zipnote \u00abTEMP\u00bb/diffoscope_8jqzrch0_target/tmpa4tdz64e_.zip",
"source2": "zipnote \u00abTEMP\u00bb/diffoscope_8jqzrch0_target/tmpa4tdz64e_.zip",
"unified_diff": "[REMOVED FOR CONCISENESS]"
},
{
"source1": "zipdetails --redact --scan --utc {}",
"source2": "zipdetails --redact --scan --utc {}",
"unified_diff": "[REMOVED FOR CONCISENESS]"
},
{
"source1": "META-INF/MANIFEST.MF",
"source2": "META-INF/MANIFEST.MF",
"unified_diff": "[REMOVED FOR CONCISENESS]"
},
{
"source1": "META-INF/maven/ch.qos.logback.db/logback-access-db/pom.properties",
"source2": "META-INF/maven/ch.qos.logback.db/logback-access-db/pom.properties",
"unified_diff": "[REMOVED FOR CONCISENESS]"
}
]
}
```



Regards,
Aman Sharma

PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se>
<https://www.kth.se/profile/amansha>
https://algomaster99.github.io/