[Git][reproducible-builds/diffoscope][master] 3 commits: Fix missing diff output on large diffs.
Chris Lamb (@lamby)
gitlab at salsa.debian.org
Mon Nov 15 19:02:58 UTC 2021
Chris Lamb pushed to branch master at Reproducible Builds / diffoscope
Commits:
6790469f by Brandon Maier at 2021-11-15T11:01:53-08:00
Fix missing diff output on large diffs.
When there is a large diff chunk, match_lines() skips the call to
difflib.Differ.compare(). However, this causes the following issues:
- It does not empty the `self.buf` buffer. This means that all future
calls to match_lines() for that file will always be too large. So
effectively no more diffs from the file get output.
- It outputs a debug message, but does not output anything to the
side-by-side diff, so a user looking at the side-by-side diff may be
misled into thinking the rest of the file has no differences.
We can fix these issues by falling back to a lazy line-by-line diff. This
produces suboptimal output, but it runs in linear O(n) time while
providing some form of output. We include a comment in the diff so the
user knows the following output is using a lazy diff algorithm.
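
The fallback described above can be sketched in isolation as follows. This
is a minimal illustration, not diffoscope's actual code: the names
lazy_pairs() and diff_chunk(), the tuple layout, and the "C"/"L"/"D" tags
used here are assumptions made for the example. Only the 750-line
threshold and the use of itertools.zip_longest() come from the change
itself.

```python
import difflib
import itertools


def lazy_pairs(removed, added):
    """Pair lines purely by position; O(n), no similarity matching."""
    # zip_longest pads the shorter side with "" so no line is dropped.
    return list(itertools.zip_longest(removed, added, fillvalue=""))


def diff_chunk(removed, added, limit=750):
    """Hypothetical helper: compare one chunk of removed/added lines.

    Returns a list of tagged entries. Chunks above `limit` total lines
    skip difflib (which is at least quadratic) and are paired lazily.
    """
    if len(removed) + len(added) > limit:
        # Tell the reader that the following pairs are positional only.
        note = "chunk too large, falling back to line-by-line diff"
        return [("C", note)] + [("L", p) for p in lazy_pairs(removed, added)]
    # Small chunk: let difflib align similar lines.
    return [("D", line) for line in difflib.Differ().compare(removed, added)]


if __name__ == "__main__":
    # A tiny limit forces the fallback path for demonstration.
    for entry in diff_chunk(["a", "b", "c"], ["x"], limit=3):
        print(entry)
```

Because lines are paired by position, an insertion near the top of a large
chunk shifts every later pair, which is why the output is suboptimal. The
trade-off is that every line still appears in the output and nothing is
left behind for the next chunk.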
- - - - -
11cdb97c by Chris Lamb at 2021-11-15T11:02:02-08:00
Apply Black to previous commit.
Gbp-dch: ignore
- - - - -
592c401b by Chris Lamb at 2021-11-15T11:02:38-08:00
Import itertools top-level directly.
Gbp-dch: ignore
- - - - -
1 changed file:
- diffoscope/diff.py
Changes:
=====================================
diffoscope/diff.py
=====================================
@@ -24,6 +24,7 @@ import errno
import fcntl
import hashlib
import logging
+import itertools
import threading
import subprocess
@@ -551,11 +552,11 @@ class SideBySideDiff:
if len(l0) + len(l1) > 750:
# difflib.Differ.compare is at least O(n^2), so don't call it if
# our inputs are too large.
- logger.debug(
- "Not calling difflib.Differ.compare(x, y) with len(x) == %d and len(y) == %d",
- len(l0),
- len(l1),
+ yield "C", "Diff chunk too large, falling back to line-by-line diff ({} lines added, {} lines removed)".format(
+ self.add_cpt, self.del_cpt
)
+ for line0, line1 in itertools.zip_longest(l0, l1, fillvalue=""):
+ yield from self.yield_line(line0, line1)
return
saved_line = None
View it on GitLab: https://salsa.debian.org/reproducible-builds/diffoscope/-/compare/3ab6acb816fa5e38cc58e6ad69515eef1ae4fe61...592c401bcad2ebffff195e23640031539fdf3a94