UnicodeEncodeError when logging improperly encoded filenames
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Duplicity | Fix Released | Medium | Unassigned |
duplicity (Ubuntu) | Fix Committed | Low | Unassigned |
Bug Description
Attempting to log a message that contains Unicode surrogate characters triggers a UnicodeEncodeError.
(These surrogate characters arise, for example, when handling files whose names are not properly encoded as UTF-8.)
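The sketch below shows where the surrogates come from. It assumes a POSIX system whose filesystem encoding is UTF-8, where `os.fsdecode` decodes with the `surrogateescape` error handler; the explicit `decode` call mimics that. The variable names are illustrative only.

```python
# Sketch: how a Latin-1 filename byte turns into a lone surrogate, and why
# writing it through a strict UTF-8 encoder fails. Assumes a UTF-8
# filesystem encoding, where os.fsdecode() behaves like the decode below.

raw_name = b"F\xfc"  # "Fü" encoded as Latin-1; 0xFC alone is invalid UTF-8

# surrogateescape maps each undecodable byte 0xXX to the code point U+DCXX.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # 'F\udcfc'

# The mapping is reversible, so the original bytes can be recovered.
assert name.encode("utf-8", "surrogateescape") == raw_name

# A strict UTF-8 encode (what a text stream opened with errors='strict'
# does on write) refuses the lone surrogate.
reason = None
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)  # surrogates not allowed
```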
NOTE: I have no idea whether this is an issue when running on Python 2. (If it is, the fixes suggested below probably won't work.)
Duplicity version: 0.8.15
Python version: 3.8.5
Target filesystem: Linux
Example log output:
```
--- Logging error ---
Traceback (most recent call last):
  File "/opt/Python-
    stream.
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 45: surrogates not allowed
Call stack:
  File "/root/
    with_
  File "/root/
    fn()
  File "/root/
    do_
  File "/root/
    full_
  File "/root/
    bytes_written = write_multivol(
  File "/root/
    at_end = gpg.GPGWriteFil
  File "/root/
    data = block_iter.
  File "/root/
    result = self.process(
  File "/root/
    log_
  File "/root/
    log.Info(_(u"A %s") %
  File "/root/
    Log(s, INFO, code, extra)
  File "/root/
    _logger.
Message: 'A home/dairiki/
Arguments: ()
```
Steps to reproduce:
- Create a file whose name contains non-ASCII characters encoded in Latin-1 rather than UTF-8, e.g. a file named "Fü" encoded to Latin-1 (b'F\xfc'). When duplicity handles this file, the improperly encoded byte is replaced with a Unicode surrogate character.
- Attempt to create an archive containing this file with verbosity set to 5. Duplicity tries to log each file it processes; when it reaches this file, an exception is reported (and the file does not make it into the archive).
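The failure can be reproduced without duplicity by logging a surrogate-containing path through the standard `logging` module to a UTF-8 stream opened with `errors='strict'`. This is a minimal sketch: the logger name, the path and the `RecordingHandler` class are hypothetical, and the class exists only to capture the error rather than print the "--- Logging error ---" report.

```python
import io
import logging
import sys


class RecordingHandler(logging.StreamHandler):
    """A StreamHandler that records emit() failures instead of printing them.

    By default, handleError() prints a "--- Logging error ---" report to
    sys.stderr; this subclass just keeps the exception type for inspection.
    """

    def __init__(self, stream):
        super().__init__(stream)
        self.failures = []

    def handleError(self, record):
        # Called from inside emit()'s except block, so exc_info is populated.
        self.failures.append(sys.exc_info()[0])


# A UTF-8 text stream with the 'strict' error handler.
stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", errors="strict")
handler = RecordingHandler(stream)

logger = logging.getLogger("surrogate-demo")  # hypothetical logger name
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Log "A <path>" for a (hypothetical) file whose name held the byte 0xC4.
logger.info("A %s", "home/user/F\udcc4")
print(handler.failures)  # [<class 'UnicodeEncodeError'>]
```

Note that `logging` swallows the exception and only reports it, which matches the "--- Logging error ---" output above rather than a crash.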
Alternative steps to reproduce:
- If the archive is created with verbosity less than 5, the file will make it into the archive. However, if an attempt is made to list files using 'duplicity list-current-
Workaround
==========
A simple workaround is to set the environment variable PYTHONIOENCODIN
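The exact value suggested above is cut off. The sketch below only illustrates how the `encodingname:errorhandler` syntax of `PYTHONIOENCODING` changes what happens when a surrogate reaches stdout; the handler names passed to the hypothetical `run_child` helper are examples, not necessarily the ones intended in this report. It assumes a POSIX system.

```python
import os
import subprocess
import sys

# Child program: print a string holding a lone surrogate. The \udcfc escape
# is resolved by the child, so the command line itself stays plain ASCII.
CHILD = "print('F\\udcfc')"


def run_child(io_encoding):
    """Run CHILD with PYTHONIOENCODING set; return (returncode, stdout bytes)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout


strict_rc, _ = run_child("utf-8:strict")          # child dies: UnicodeEncodeError
_, escaped = run_child("utf-8:backslashreplace")  # surrogate written as text
_, raw = run_child("utf-8:surrogateescape")       # original byte written back

print(strict_rc != 0)  # True
print(escaped)         # b'F\\udcfc\n'
print(raw)             # b'F\xfc\n'
```

`surrogateescape` reproduces the original filename bytes, while `backslashreplace` produces a readable ASCII escape; either avoids the exception.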
Possible Fix
============
A possible fix, at least for Py3K, is probably for duplicity to explicitly set the encoding error strategy for stdin and stdout.
For Python >= 3.7, this is simple:
sys.
sys.
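As a sketch of the Python >= 3.7 approach, `io.TextIOWrapper.reconfigure()` can change the error handler of an existing text stream in place. The helper name `relax_encode_errors` and the choice of `backslashreplace` are assumptions for illustration; an in-memory stream stands in for `sys.stdout`.

```python
import io


def relax_encode_errors(stream, errors="backslashreplace"):
    """Switch a text stream's encode error handler in place (Python >= 3.7).

    Hypothetical helper: in a real program it would be called on
    sys.stdout and sys.stderr early during start-up.
    """
    stream.reconfigure(errors=errors)
    return stream


# Stand-in for sys.stdout: a strict UTF-8 text stream over an in-memory buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", errors="strict")

relax_encode_errors(out)
out.write("F\udcfc\n")  # would raise UnicodeEncodeError before reconfiguring
out.flush()
print(raw.getvalue())  # b'F\\udcfc\n'
```

Reconfiguring in place keeps the same stream object, so anything that already holds a reference to it (for example a `logging.StreamHandler` created with `sys.stdout`) picks up the new error handler too.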
For earlier Python 3 versions, the best option might be:
sys.stdin = codecs.
(and similarly for stderr)
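Before 3.7, `reconfigure()` does not exist. A minimal sketch of the `codecs`-based alternative, assuming the writer is layered on the binary buffer underneath the text stream (in real code something like `sys.stdout.buffer`; here an in-memory `BytesIO`). The `backslashreplace` handler is an illustrative choice.

```python
import codecs
import io

# Stand-in for sys.stdout.buffer: the binary layer underneath a text stream.
raw = io.BytesIO()

# codecs.getwriter() returns a StreamWriter class; instantiating it with an
# errors argument gives a writer that encodes using that handler.
Utf8Writer = codecs.getwriter("utf-8")
out = Utf8Writer(raw, errors="backslashreplace")

out.write("F\udcfc\n")
print(raw.getvalue())  # b'F\\udcfc\n'
```

Because this replaces the stream object rather than modifying it, it has to happen before anything (such as a logging handler) captures a reference to the old `sys.stdout`.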
Note that python 2 doesn't know about errors=
Possible Similar Issue
=======
I didn't actually verify that this fails, but it appears that there might be a similar issue when using the --log-fd command line option. Function duplicity.
handler = logging.
In Python 3, os.fdopen (an alias for open) opens the stream with errors='strict' by default.
handler = logging.
or
handler = logging.
is probably a better choice. (But neither will work in python 2.)
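A sketch of the `--log-fd` case, assuming the handler is a `logging.StreamHandler` wrapped around a stream from `os.fdopen` (the line showing duplicity's actual call is cut off above). A pipe stands in for the user-supplied descriptor; the logger name and path are hypothetical, and `backslashreplace` is one illustrative handler choice.

```python
import logging
import os

# A pipe stands in for the file descriptor passed via --log-fd.
read_fd, write_fd = os.pipe()

# Open the descriptor with an explicit, lenient error handler. Without
# errors=..., os.fdopen() (like open()) encodes with 'strict'.
log_stream = os.fdopen(write_fd, "w", encoding="utf-8", errors="backslashreplace")
handler = logging.StreamHandler(log_stream)

logger = logging.getLogger("log-fd-demo")  # hypothetical logger name
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("A %s", "home/user/F\udcc4")

# Detach and close so the write end of the pipe is released.
logger.removeHandler(handler)
handler.close()     # StreamHandler.close() does not close the stream itself
log_stream.close()

data = os.read(read_fd, 1024)
os.close(read_fd)
print(data)  # b'A home/user/F\\udcc4\n'
```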
Changed in duplicity:
- milestone: 0.8.16 → 0.8.17

Changed in duplicity:
- assignee: Kenneth Loafman (kenneth-loafman) → nobody
- status: Confirmed → Fix Committed

Changed in duplicity (Ubuntu):
- importance: Undecided → Low
- status: New → Fix Committed

Changed in duplicity:
- status: Fix Committed → Fix Released
I apologize. I only just noticed that the duplicity project is apparently moving to GitLab.
Let me know if you'd like me to re-file this bug there.