[diffoscope] 01/02: Ensure that we always get path names from libarchive as str

Jérémy Bobbio lunar at moszumanska.debian.org
Sun Dec 20 23:15:58 CET 2015


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository diffoscope.

commit 140071f6868ec625c4b55cf687e2cc29dd49d122
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Sun Dec 20 22:01:11 2015 +0000

    Ensure that we always get path names from libarchive as str
    
    libarchive-c will return the pathname as str, or as bytes if it can't be
    decoded in the current locale. Sadly, this makes the result very hard to
    predict, especially as we might encounter weird archives.
    
    As a solution, we monkeypatch the libarchive binding so we always get a str
    object, escaping undecodable bytes.
    
    Closes: #808541
---
 diffoscope/comparators/libarchive.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/diffoscope/comparators/libarchive.py b/diffoscope/comparators/libarchive.py
index 2a49d95..f67e310 100644
--- a/diffoscope/comparators/libarchive.py
+++ b/diffoscope/comparators/libarchive.py
@@ -37,6 +37,10 @@ if not hasattr(libarchive.ffi, 'entry_rdevminor'):
     libarchive.ffi.ffi('entry_rdevminor', [libarchive.ffi.c_archive_entry_p], ctypes.c_uint)
     libarchive.ArchiveEntry.rdevminor = property(lambda self: libarchive.ffi.entry_rdevminor(self._entry_p))
 
+# Monkeypatch libarchive-c so we always get pathname as (Unicode) str
+# Otherwise, we'll get sometimes str and sometimes bytes and always pain.
+libarchive.ArchiveEntry.pathname = property(lambda self: libarchive.ffi.entry_pathname(self._entry_p).decode('utf-8', errors='surrogateescape'))
+
 
 class LibarchiveMember(ArchiveMember):
     def __init__(self, archive, entry):
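A minimal sketch of what the patched property does, runnable without libarchive installed. `FakeEntry` and its `raw_pathname` attribute are hypothetical stand-ins for `libarchive.ArchiveEntry` and the bytes returned by `libarchive.ffi.entry_pathname()`. Its original property simplifies libarchive-c's locale-dependent behaviour to a plain UTF-8 attempt, assumed here only to reproduce the str-or-bytes ambiguity.

```python
class FakeEntry:
    """Hypothetical stand-in for libarchive.ArchiveEntry."""

    def __init__(self, raw_pathname):
        # Raw bytes as stored in the archive (assumption for this sketch).
        self.raw_pathname = raw_pathname

    @property
    def pathname(self):
        # Mimics the unpredictable behaviour: str when the bytes decode,
        # bytes when they don't.
        try:
            return self.raw_pathname.decode('utf-8')
        except UnicodeDecodeError:
            return self.raw_pathname


# Monkeypatch the class in the same style as the commit: replace the
# property so pathname is always a str. Undecodable bytes become lone
# surrogates (byte 0xXX -> U+DCXX) instead of raising an error.
FakeEntry.pathname = property(
    lambda self: self.raw_pathname.decode('utf-8', errors='surrogateescape'))

good = FakeEntry('café.txt'.encode('utf-8'))
bad = FakeEntry(b'caf\xe9.txt')  # Latin-1 e-acute, invalid as UTF-8

assert isinstance(good.pathname, str)
assert isinstance(bad.pathname, str)
assert bad.pathname == 'caf\udce9.txt'

# The escaping is reversible: encoding with the same error handler
# restores the exact original bytes.
assert bad.pathname.encode('utf-8', errors='surrogateescape') == b'caf\xe9.txt'
```

Because each undecodable byte maps to its own surrogate code point, two different byte strings never decode to the same str, so pathnames from two archives can still be compared reliably.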

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/diffoscope.git

