[diffoscope] 01/01: When extracting archives, try to keep directory sizes small

Ximin Luo infinity0 at debian.org
Mon Feb 13 14:31:42 CET 2017


This is an automated email from the git hooks/post-receive script.

infinity0 pushed a commit to branch master
in repository diffoscope.

commit 33267c20e7ee884a54b5ebe955c1e5d92200bb17
Author: Ximin Luo <infinity0 at debian.org>
Date:   Mon Feb 13 14:31:04 2017 +0100

    When extracting archives, try to keep directory sizes small
---
 diffoscope/comparators/utils/libarchive.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/diffoscope/comparators/utils/libarchive.py b/diffoscope/comparators/utils/libarchive.py
index 4924a31..4206efd 100644
--- a/diffoscope/comparators/utils/libarchive.py
+++ b/diffoscope/comparators/utils/libarchive.py
@@ -219,11 +219,17 @@ class LibarchiveContainer(Archive):
 
                 # Maintain a mapping of archive path to the extracted path,
                 # avoiding the need to sanitise filenames.
-                dst = os.path.join(tmpdir, str(idx))
+                not_first = idx % 4096
+                # keep directory sizes small. could be improved but should be
+                # good enough for "ordinary" large archives.
+                basename = os.path.join(str(idx // 4096), str(not_first))
+                dst = os.path.join(tmpdir, basename)
                 self._members[entry.pathname] = dst
 
                 logger.debug("Extracting %s to %s", entry.pathname, dst)
 
+                if not not_first:
+                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                 with open(dst, 'wb') as f:
                     for block in entry.get_blocks():
                         f.write(block)
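
The bucketing scheme in the hunk above can be sketched on its own as follows. This is an illustration, not the code in libarchive.py. The names sharded_path, extract_all and BUCKET_SIZE are hypothetical. Unlike the commit, which calls os.makedirs only when idx % 4096 is zero, this sketch calls it for every member and relies on exist_ok=True, which is simpler but costs one extra system call per file.

```python
import os
import tempfile

# Same bucket size as the constant used in the commit.
BUCKET_SIZE = 4096


def sharded_path(tmpdir, idx, bucket_size=BUCKET_SIZE):
    """Map a member index to <tmpdir>/<idx // bucket_size>/<idx % bucket_size>."""
    bucket, offset = divmod(idx, bucket_size)
    return os.path.join(tmpdir, str(bucket), str(offset))


def extract_all(tmpdir, payloads):
    """Write each bytes payload to its sharded path and return the paths.

    `payloads` stands in for the archive entries; a real caller would
    stream blocks from the archive instead of holding bytes in memory.
    """
    paths = []
    for idx, data in enumerate(payloads):
        dst = sharded_path(tmpdir, idx)
        # Assumption: always ensure the bucket directory exists rather
        # than only on the first index of each bucket, as the commit does.
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        with open(dst, "wb") as f:
            f.write(data)
        paths.append(dst)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        for path in extract_all(tmpdir, [b"first", b"second"]):
            print(path)
```

With this layout no bucket directory holds more than 4096 files, although the top-level directory still grows by one entry per 4096 members, which matches the "could be improved" note in the commit's comment.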

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/diffoscope.git

