[diffoscope] 01/01: Don't show a difference for identical but long diff inputs

Jérémy Bobbio lunar at moszumanska.debian.org
Wed Jan 20 00:39:48 CET 2016


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository diffoscope.

commit 21b04d4065c57e4ff868a56213835a06e36b6e2a
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Wed Jan 20 00:24:47 2016 +0100

    Don't show a difference for identical but long diff inputs
    
    If the input for diff hit the maximum number of lines, we would always output
    a difference, even if both inputs were actually the same. This usually shows
    up when comparing ELF files and creates confusing reports.
    
    So let's compute a hash while we feed lines to diff and write it in the final
    line in case we hit the limit. In case there were differences in the hidden
    part, the hash will be different and will appear as such in the report.
    
    This should make diffoscope run longer for large diff inputs, but more
    correct reports are worth it.
---
 diffoscope/difference.py | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/diffoscope/difference.py b/diffoscope/difference.py
index d6d49e1..0fedfdc 100644
--- a/diffoscope/difference.py
+++ b/diffoscope/difference.py
@@ -18,6 +18,7 @@
 # along with diffoscope.  If not, see <http://www.gnu.org/licenses/>.
 
 from contextlib import contextmanager
+import hashlib
 from io import StringIO
 import os
 import os.path
@@ -224,17 +225,22 @@ def empty_file_feeder():
 
 def make_feeder_from_raw_reader(in_file, filter=lambda buf: buf):
     def feeder(out_file):
+        max_lines = Config.general.max_diff_input_lines
         line_count = 0
         end_nl = False
+        if max_lines > 0:
+            h = hashlib.sha1()
         for buf in in_file:
             line_count += 1
-            out_file.write(filter(buf))
-            max_lines = Config.general.max_diff_input_lines
-            if max_lines > 0 and line_count >= max_lines:
-                out_file.write('[ Too much input for diff ]{}\n'.format(' ' * out_file.fileno()).encode('utf-8'))
-                end_nl = True
-                break
+            out = filter(buf)
+            if h:
+                h.update(out)
+            if line_count < max_lines:
+                out_file.write(out)
             end_nl = buf[-1] == '\n'
+        if h and line_count >= max_lines:
+            out_file.write('[ Too much input for diff (SHA1: {}) ]\n'.format(h.hexdigest()).encode('utf-8'))
+            end_nl = True
         return end_nl
     return feeder
 
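
The hashing approach in the hunk above can be tried outside diffoscope with
a minimal standalone sketch. The names `make_feeder`, `feed` and the
`max_lines` parameter are hypothetical stand-ins for
`make_feeder_from_raw_reader` and `Config.general.max_diff_input_lines`; they
are not diffoscope's API. The sketch also assumes byte-string input lines and
initialises the hash to `None` when no limit is configured, which is a guess
at the intended behaviour for `max_lines <= 0`, not something the commit
states.

```python
import hashlib
import io


def make_feeder(in_file, max_lines, filter=lambda buf: buf):
    """Return a feeder copying byte lines from in_file to out_file.

    max_lines is a hypothetical stand-in for
    Config.general.max_diff_input_lines; 0 or less means "no limit".
    """
    def feeder(out_file):
        line_count = 0
        end_nl = False
        # Assumption: only hash when a limit is configured.
        h = hashlib.sha1() if max_lines > 0 else None
        for buf in in_file:
            line_count += 1
            out = filter(buf)
            if h is not None:
                # Hash every line, including the ones hidden from diff.
                h.update(out)
            if h is None or line_count < max_lines:
                out_file.write(out)
            end_nl = buf.endswith(b'\n')
        if h is not None and line_count >= max_lines:
            # The final line carries the digest of the whole input, so
            # truncated inputs only differ here if their content differs.
            marker = '[ Too much input for diff (SHA1: {}) ]\n'.format(h.hexdigest())
            out_file.write(marker.encode('utf-8'))
            end_nl = True
        return end_nl
    return feeder


def feed(lines, max_lines):
    """Hypothetical helper: run a feeder and return what diff would see."""
    out = io.BytesIO()
    make_feeder(iter(lines), max_lines)(out)
    return out.getvalue()


same_a = [b'line %d\n' % i for i in range(10)]
same_b = list(same_a)
changed = same_a[:9] + [b'changed\n']

# Identical long inputs produce identical text for diff.
print(feed(same_a, 4) == feed(same_b, 4))
# A change in the hidden part surfaces through the SHA1 in the last line.
print(feed(same_a, 4) == feed(changed, 4))
```

Under these assumptions, two identical inputs that exceed the limit yield the
same truncated text and the same digest, so diff reports nothing, while a
change past the limit alters only the final marker line.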

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/diffoscope.git