[Git][reproducible-builds/diffoscope][master] 2 commits: Allow user to mask/filter reader output via --diff-mask=REGEX. (MR:...
Chris Lamb
gitlab at salsa.debian.org
Fri Jun 26 15:04:17 UTC 2020
Chris Lamb pushed to branch master at Reproducible Builds / diffoscope
Commits:
354445a9 by Chris Lamb at 2020-06-26T10:59:57+01:00
Allow user to mask/filter reader output via --diff-mask=REGEX. (MR: reproducible-builds/diffoscope!51)
Signed-off-by: Chris Lamb <lamby at debian.org>
- - - - -
958206f2 by Chris Lamb at 2020-06-26T15:57:44+01:00
releasing package diffoscope version 149
- - - - -
8 changed files:
- debian/changelog
- debian/zsh-completion/_diffoscope
- diffoscope/__init__.py
- diffoscope/config.py
- diffoscope/feeders.py
- diffoscope/main.py
- + tests/test_diff_mask.py
- + tests/test_exclude_substrings.py
Changes:
=====================================
debian/changelog
=====================================
@@ -1,8 +1,33 @@
-diffoscope (149) UNRELEASED; urgency=medium
+diffoscope (149) unstable; urgency=medium
- * WIP (generated upon release).
+ [ Chris Lamb ]
+ * Update tests for file 5.39. (Closes: reproducible-builds/diffoscope#179)
+ * Downgrade the tlsh warning message to an "info" level warning.
+ (Closes: #888237, reproducible-builds/diffoscope#29)
+ * Use the CSS "word-break" property over manually adding U+200B zero-width
+ spaces that make copy-pasting cumbersome.
+ (Closes: reproducible-builds/diffoscope!53)
+
+ * Codebase improvements:
+ - Drop some unused imports from the previous commit.
+ - Prevent an unnecessary .format() when rendering difference comments.
+ - Use a semantic "AbstractMissingType" type instead of remembering to check
+ for both "missing" files and missing containers.
+
+ [ Jean-Romain Garnier ]
+ * Allow user to mask/filter reader output via --diff-mask=REGEX.
+ (MR: reproducible-builds/diffoscope!51)
+ * Make --html-dir child pages open in new window to accommodate new web
+ browser content security policies.
+ * Fix the --new-file option when comparing directories by merging
+ DirectoryContainer.compare and Container.compare.
+ (Closes: reproducible-builds/diffoscope#180)
+ * Fix zsh completion for --max-page-diff-block-lines.
+
+ [ Mattia Rizzolo ]
+ * Do not warn about missing tlsh during tests.
- -- Chris Lamb <lamby at debian.org> Fri, 19 Jun 2020 11:39:15 +0100
+ -- Chris Lamb <lamby at debian.org> Fri, 26 Jun 2020 15:57:41 +0100
diffoscope (148) unstable; urgency=medium
=====================================
debian/zsh-completion/_diffoscope
=====================================
@@ -31,6 +31,7 @@ _diffoscope() {
'--exclude=[Exclude files whose names (including any directory part) match %(metavar)s. Use this option to ignore files based on their names.]:' \
'--exclude-command=[Exclude commands that match %(metavar)s. For example "^readelf.*\s--debug-dump=info" can take a long time and differences here are likely secondary differences caused by something represented elsewhere. Use this option to disable commands that use a lot of resources.]:' \
'--exclude-directory-metadata=[Exclude directory metadata. Useful if comparing files whose filesystem-level metadata is not intended to be distributed to other systems. This is true for most distributions package builders, but not true for the output of commands such as `make install`. Metadata of archive members remain un-excluded except if "recursive" choice is set. Use this option to ignore permissions, timestamps, xattrs etc. Default: False if comparing two directories, else True. Note that "file" metadata actually a property of its containing directory, and is not relevant when distributing the file across systems.]:--exclude-directory-metadata :(auto yes no recursive)' \
+ '--diff-mask=[Replace/unify substrings that match regular expression %(metavar)s from output strings before applying diff. For example, to filter out a version number or changed path.]:' \
'--fuzzy-threshold=[Threshold for fuzzy-matching (0 to disable, %(default)s is default, 400 is high fuzziness)]:' \
'--tool-prefix-binutils=[Prefix for binutils program names, e.g. "aarch64-linux-gnu-" for a foreign-arch binary or "g" if you"re on a non-GNU system.]:' \
'--max-diff-input-lines=[Maximum number of lines fed to diff(1) (0 to disable, default: 4194304)]:' \
=====================================
diffoscope/__init__.py
=====================================
@@ -18,4 +18,4 @@
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
-VERSION = "148"
+VERSION = "149"
=====================================
diffoscope/config.py
=====================================
@@ -55,6 +55,7 @@ class Config:
self.max_text_report_size = 0
self.difftool = None
+ self.diff_masks = ()
self.new_file = False
self.fuzzy_threshold = 60
self.enforce_constraints = True
=====================================
diffoscope/feeders.py
=====================================
@@ -18,10 +18,12 @@
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
+import re
import signal
import hashlib
import logging
import subprocess
+import functools
from .config import Config
from .profiling import profile
@@ -31,6 +33,39 @@ logger = logging.getLogger(__name__)
DIFF_CHUNK = 4096
+ at functools.lru_cache(maxsize=128)
+def compile_string_regex(regex_str):
+ return re.compile(regex_str)
+
+
+ at functools.lru_cache(maxsize=128)
+def compile_bytes_regex(regex_str):
+ return re.compile(regex_str.encode("utf-8"))
+
+
+def filter_reader(buf, additional_filter=None):
+ # Apply the passed filter first, for example Command.filter
+ if additional_filter:
+ buf = additional_filter(buf)
+
+ # No need to work on empty lines
+ if not buf:
+ return buf
+
+ # Use either str or bytes objects depending on buffer type
+ if isinstance(buf, str):
+ compile_func = compile_string_regex
+ replace = "[filtered]"
+ else:
+ compile_func = compile_bytes_regex
+ replace = b"[filtered]"
+
+ for regex in Config().diff_masks:
+ buf = compile_func(regex).sub(replace, buf)
+
+ return buf
+
+
def from_raw_reader(in_file, filter=None):
def feeder(out_file):
max_lines = Config().max_diff_input_lines
@@ -45,7 +80,7 @@ def from_raw_reader(in_file, filter=None):
for buf in in_file:
line_count += 1
- out = buf if filter is None else filter(buf)
+ out = filter_reader(buf, filter)
if h is not None:
h.update(out)
@@ -113,7 +148,9 @@ def from_command(command):
def from_text(content):
def feeder(f):
for offset in range(0, len(content), DIFF_CHUNK):
- f.write(content[offset : offset + DIFF_CHUNK].encode("utf-8"))
+ buf = filter_reader(content[offset : offset + DIFF_CHUNK])
+ f.write(buf.encode("utf-8"))
+
return content and content[-1] == "\n"
return feeder
=====================================
diffoscope/main.py
=====================================
@@ -312,6 +312,16 @@ def create_parser():
"and is not relevant when distributing the file across "
"systems.",
)
+ group3.add_argument(
+ "--diff-mask",
+ metavar="REGEX_PATTERN",
+ dest="diff_masks",
+ action="append",
+ default=[],
+ help="Replace/unify substrings that match regular expression "
+ "%(metavar)s from output strings before applying diff. For example, to "
+ "filter out a version number or changed path.",
+ )
group3.add_argument(
"--fuzzy-threshold",
type=int,
@@ -612,6 +622,7 @@ def configure(parsed_args):
Config().exclude_directory_metadata = (
parsed_args.exclude_directory_metadata
)
+ Config().diff_masks = parsed_args.diff_masks
Config().compute_visual_diffs = PresenterManager().compute_visual_diffs()
=====================================
tests/test_diff_mask.py
=====================================
@@ -0,0 +1,71 @@
+# -*- coding: utf-8 -*-
+#
+# diffoscope: in-depth comparison of files, archives, and directories
+#
+# Copyright © 2020 Chris Lamb <lamby at debian.org>
+#
+# diffoscope is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# diffoscope is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
+
+import os
+import re
+import pytest
+
+from diffoscope.main import main
+
+
+def run(capsys, *args):
+ with pytest.raises(SystemExit) as exc:
+ main(
+ args
+ + tuple(
+ os.path.join(os.path.dirname(__file__), "data", x)
+ for x in ("test1.tar", "test2.tar")
+ )
+ )
+
+ out, err = capsys.readouterr()
+
+ assert err == ""
+
+ return exc.value.code, out
+
+
+def test_none(capsys):
+ ret, out = run(capsys)
+ # Make sure the output doesn't contain any [filtered]
+ assert re.search(r"\[filtered\]", out) is None
+ assert ret == 1
+
+
+def test_all(capsys):
+ ret, out = run(capsys, "--diff-mask=.*")
+
+ # Make sure the correct sections were filtered
+ assert "file list" not in out
+ assert "dir/link" not in out
+
+ # Make sure the output contains only [filtered]
+ # Lines of content start with "│ ", and then either have a +, a - or a space
+ # depending on the type of change
+ # It should then only contain "[filtered]" until the end of the string
+ assert re.search(r"│\s[\s\+\-](?!(\[filtered\])+)", out) is None
+ assert ret == 1
+
+
+def test_specific(capsys):
+ ret, out = run(capsys, "--diff-mask=^Lorem")
+ # Make sure only the Lorem ipsum at the start of the line was filtered
+ assert "[filtered] ipsum dolor sit amet" in out
+ assert '"Lorem ipsum"' in out
+ assert ret == 1
=====================================
tests/test_exclude_substrings.py
=====================================
@@ -0,0 +1,71 @@
+# -*- coding: utf-8 -*-
+#
+# diffoscope: in-depth comparison of files, archives, and directories
+#
+# Copyright © 2020 Chris Lamb <lamby at debian.org>
+#
+# diffoscope is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# diffoscope is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
+
+import os
+import re
+import pytest
+
+from diffoscope.main import main
+
+
+def run(capsys, *args):
+ with pytest.raises(SystemExit) as exc:
+ main(
+ args
+ + tuple(
+ os.path.join(os.path.dirname(__file__), "data", x)
+ for x in ("test1.tar", "test2.tar")
+ )
+ )
+
+ out, err = capsys.readouterr()
+
+ assert err == ""
+
+ return exc.value.code, out
+
+
+def test_none(capsys):
+ ret, out = run(capsys)
+ # Make sure the output doesn't contain any [filtered]
+ assert re.search(r"\[filtered\]", out) is None
+ assert ret == 1
+
+
+def test_all(capsys):
+ ret, out = run(capsys, "--diff-mask=.*")
+
+ # Make sure the correct sections were filtered
+ assert "file list" not in out
+ assert "dir/link" not in out
+
+ # Make sure the output contains only [filtered]
+ # Lines of content start with "│ ", and then either have a +, a - or a space
+ # depending on the type of change
+ # It should then only contain "[filtered]" until the end of the string
+ assert re.search(r"│\s[\s\+\-](?!(\[filtered\])+)", out) is None
+ assert ret == 1
+
+
+def test_specific(capsys):
+ ret, out = run(capsys, "--diff-mask=^Lorem")
+ # Make sure only the Lorem ipsum at the start of the line was filtered
+ assert "[filtered] ipsum dolor sit amet" in out
+ assert '"Lorem ipsum"' in out
+ assert ret == 1
View it on GitLab: https://salsa.debian.org/reproducible-builds/diffoscope/-/compare/b2f9aa3e600d6f18edd7581df968d27be488af6a...958206f24fb8898ec824fa61983087bec3a936a1
--
View it on GitLab: https://salsa.debian.org/reproducible-builds/diffoscope/-/compare/b2f9aa3e600d6f18edd7581df968d27be488af6a...958206f24fb8898ec824fa61983087bec3a936a1
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.reproducible-builds.org/pipermail/rb-commits/attachments/20200626/d2e8e22e/attachment.htm>
More information about the rb-commits
mailing list