[Git][reproducible-builds/diffoscope][master] 3 commits: Fix missing diff output on large diffs.
Chris Lamb (@lamby)
gitlab at salsa.debian.org
Mon Nov 15 19:02:58 UTC 2021

Chris Lamb pushed to branch master at Reproducible Builds / diffoscope
Commits:
6790469f by Brandon Maier at 2021-11-15T11:01:53-08:00
Fix missing diff output on large diffs.
When there is a large diff chunk, match_lines() skips the call to
difflib.Differ.compare(). However, this causes the following issues:
- It does not empty the `self.buf` buffer. This means that every later
  call to match_lines() for that file also sees an oversized buffer and
  is skipped, so effectively no more diffs from the file are output.
- It outputs a debug message, but does not output anything to the
  side-by-side diff, so a user looking at the side-by-side diff may be
  misled into thinking the rest of the file has no differences.
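
A toy model of the first issue, for illustration only. The class, the
LIMIT value and the way the buffer is filled are assumptions and do not
mirror diffoscope's SideBySideDiff; only the idea of a buffer that is
left untouched when a chunk is skipped comes from the description above.

```python
class ToyMatcher:
    """Hypothetical stand-in for the buffering behaviour (not diffoscope code)."""

    LIMIT = 4  # illustrative threshold; the real check in the diff uses 750

    def __init__(self):
        self.buf = []  # lines waiting to be matched

    def match_lines(self, clear_on_skip):
        """Return how many buffered lines were emitted by this call."""
        if len(self.buf) > self.LIMIT:
            # Oversized chunk: skip the expensive comparison.
            if clear_on_skip:
                self.buf = []
            return 0
        emitted = len(self.buf)
        self.buf = []  # normal path always empties the buffer
        return emitted


def run(clear_on_skip):
    """Feed three chunks (6, 2 and 1 lines) and record what each call emits."""
    matcher = ToyMatcher()
    results = []
    for chunk_size in (6, 2, 1):
        matcher.buf.extend(["line"] * chunk_size)
        results.append(matcher.match_lines(clear_on_skip))
    return results


stale = run(clear_on_skip=False)    # buffer keeps growing after the skip
cleared = run(clear_on_skip=True)   # later small chunks are emitted again
```

With the buffer left in place, the two small chunks that follow the
oversized one are also skipped; once the buffer is emptied on the skip
path, they are emitted normally.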
We can fix these issues by falling back to a lazy line-by-line diff. This
produces suboptimal output, but it runs in linear O(n) time and still
provides some form of output. We include a comment in the diff so the
user knows that the output that follows uses a lazy diff algorithm.
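
A minimal sketch of such a positional fallback, assuming plain lists of
strings as input. The function name lazy_line_diff and the "=" / "!"
tags are hypothetical; the actual change yields through
self.yield_line() as shown in the diff further down. Only
itertools.zip_longest() with fillvalue="" and the leading "C" comment
are taken from the commit.

```python
import itertools


def lazy_line_diff(old_lines, new_lines):
    """Pair lines by position in O(n); pad the shorter side with ''."""
    # Leading comment so the reader knows a cheaper algorithm was used.
    yield ("C", "Diff chunk too large, falling back to line-by-line diff")
    for old, new in itertools.zip_longest(old_lines, new_lines, fillvalue=""):
        # Hypothetical tags: "=" for identical lines, "!" for differing ones.
        tag = "=" if old == new else "!"
        yield (tag, old, new)


same_prefix = list(lazy_line_diff(["a", "b", "c"], ["a", "x"]))
shifted = list(lazy_line_diff(["a", "b"], ["x", "a", "b"]))
```

The second call shows why the output is suboptimal: a single line
inserted at the top shifts every later pair, so all three rows are
reported as different even though "a" and "b" are unchanged.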
- - - - -
11cdb97c by Chris Lamb at 2021-11-15T11:02:02-08:00
Apply Black to previous commit.
Gbp-dch: ignore
- - - - -
592c401b by Chris Lamb at 2021-11-15T11:02:38-08:00
Import itertools top-level directly.
Gbp-dch: ignore
- - - - -
1 changed file:
- diffoscope/diff.py
Changes:
=====================================
diffoscope/diff.py
=====================================
@@ -24,6 +24,7 @@ import errno
 import fcntl
 import hashlib
 import logging
+import itertools
 import threading
 import subprocess
 
@@ -551,11 +552,11 @@ class SideBySideDiff:
         if len(l0) + len(l1) > 750:
             # difflib.Differ.compare is at least O(n^2), so don't call it if
             # our inputs are too large.
-            logger.debug(
-                "Not calling difflib.Differ.compare(x, y) with len(x) == %d and len(y) == %d",
-                len(l0),
-                len(l1),
+            yield "C", "Diff chunk too large, falling back to line-by-line diff ({} lines added, {} lines removed)".format(
+                self.add_cpt, self.del_cpt
             )
+            for line0, line1 in itertools.zip_longest(l0, l1, fillvalue=""):
+                yield from self.yield_line(line0, line1)
             return
 
         saved_line = None
View it on GitLab: https://salsa.debian.org/reproducible-builds/diffoscope/-/compare/3ab6acb816fa5e38cc58e6ad69515eef1ae4fe61...592c401bcad2ebffff195e23640031539fdf3a94