[Git][reproducible-builds/diffoscope][master] 2 commits: Rewrite the calculation of a file's "fuzzy hash" to make the control flow cleaner.

Chris Lamb (@lamby) gitlab at salsa.debian.org
Fri Jul 16 14:18:26 UTC 2021



Chris Lamb pushed to branch master at Reproducible Builds / diffoscope


Commits:
15590583 by Chris Lamb at 2021-07-16T15:16:04+01:00
Rewrite the calculation of a file's "fuzzy hash" to make the control flow cleaner.

For next commits.

- - - - -
8b673b26 by Chris Lamb at 2021-07-16T15:18:02+01:00
Don't traceback on a broken symlink in a directory. (Closes: reproducible-builds/diffoscope#269)

- - - - -


1 changed file:

- diffoscope/comparators/utils/file.py


Changes:

=====================================
diffoscope/comparators/utils/file.py
=====================================
@@ -337,21 +337,29 @@ class File(metaclass=abc.ABCMeta):
 
         @property
         def fuzzy_hash(self):
-            if not hasattr(self, "_fuzzy_hash"):
+            def calc():
                 # tlsh is not meaningful with files smaller than 512 bytes
-                if os.stat(self.path).st_size >= 512:
-                    h = tlsh.Tlsh()
-                    with open(self.path, "rb") as f:
-                        for buf in iter(lambda: f.read(32768), b""):
-                            h.update(buf)
-                    h.final()
-                    try:
-                        self._fuzzy_hash = h.hexdigest()
-                    except ValueError:
-                        # File must contain a certain amount of randomness.
-                        self._fuzzy_hash = None
-                else:
-                    self._fuzzy_hash = None
+                try:
+                    if os.stat(self.path).st_size < 512:
+                        return None
+                except FileNotFoundError:
+                    # e.g. a dangling (broken) symlink
+                    return None
+
+                h = tlsh.Tlsh()
+                with open(self.path, "rb") as f:
+                    for buf in iter(lambda: f.read(32768), b""):
+                        h.update(buf)
+                h.final()
+
+                try:
+                    return h.hexdigest()
+                except ValueError:
+                    # File must contain a certain amount of randomness.
+                    return None
+
+            if not hasattr(self, "_fuzzy_hash"):
+                self._fuzzy_hash = calc()
             return self._fuzzy_hash
 
     @abc.abstractmethod
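
The control flow in the hunk above (an inner helper with early returns, plus a lazily cached attribute) can be sketched in isolation. This is a minimal illustration under stated assumptions, not diffoscope's actual code: the `File` class here is a hypothetical stand-in, `demo()` is a helper invented for the example, and `hashlib.sha256` replaces `tlsh.Tlsh()` so that no third-party module is needed (it is not a fuzzy hash, and the `ValueError` handling specific to tlsh is therefore left out). The helper returns the digest rather than assigning the attribute itself, so the caching line at the bottom is the only place `_fuzzy_hash` is written.

```python
import hashlib
import os
import tempfile


class File:
    """Hypothetical stand-in for the File class touched by the diff."""

    def __init__(self, path):
        self.path = path

    @property
    def fuzzy_hash(self):
        def calc():
            try:
                # os.stat() follows symlinks, so a dangling link raises
                # FileNotFoundError here, before the file is opened.
                if os.stat(self.path).st_size < 512:
                    return None
            except FileNotFoundError:
                return None

            # Assumption: sha256 stands in for tlsh.Tlsh() to keep the
            # sketch dependency-free; it is NOT a fuzzy hash.
            h = hashlib.sha256()
            with open(self.path, "rb") as f:
                for buf in iter(lambda: f.read(32768), b""):
                    h.update(buf)
            return h.hexdigest()

        # Compute once, then reuse the cached value (which may be None).
        if not hasattr(self, "_fuzzy_hash"):
            self._fuzzy_hash = calc()
        return self._fuzzy_hash


def demo():
    """Return the hashes of a dangling symlink, a small file, a large file."""
    with tempfile.TemporaryDirectory() as tmp:
        dangling = os.path.join(tmp, "dangling")
        os.symlink(os.path.join(tmp, "missing"), dangling)

        small = os.path.join(tmp, "small")
        with open(small, "wb") as f:
            f.write(b"x" * 10)

        large = os.path.join(tmp, "large")
        with open(large, "wb") as f:
            f.write(os.urandom(1024))

        return (
            File(dangling).fuzzy_hash,
            File(small).fuzzy_hash,
            File(large).fuzzy_hash,
        )
```

Calling `demo()` on a platform that permits creating symlinks yields `None` for the dangling link and for the file under 512 bytes, and a hex digest string for the larger file.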



View it on GitLab: https://salsa.debian.org/reproducible-builds/diffoscope/-/compare/9aefdb654df57c4e4c6fdeb37ba135492a64d623...8b673b26d07df9184241a64429695f65d78b205f
