Merge lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug into lp:bzr

Proposed by Eric Moritz on 2010-06-08
Status: Merged
Approved by: Vincent Ladeuil on 2010-06-14
Approved revision: 5294
Merged at revision: 5302
Proposed branch: lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug
Merge into: lp:bzr
Diff against target: 194 lines (+74/-13)
4 files modified
bzrlib/errors.py (+3/-3)
bzrlib/osutils.py (+25/-9)
bzrlib/tests/__init__.py (+11/-0)
bzrlib/tests/test_osutils.py (+35/-1)
To merge this branch: bzr merge lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug
Reviewer Review Type Date Requested Status
Martin Packman (community) 2010-06-08 Approve on 2010-06-10
John A Meinel 2010-06-08 Approve on 2010-06-08
Martin Pool 2010-06-08 Pending
Eric Moritz Pending
Review via email: mp+27006@code.launchpad.net

This proposal supersedes a proposal from 2010-06-08.

Description of the change

A fix for bug #488519

To post a comment you must log in.
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> You have been requested to review the proposed merge of lp:~ericmoritz/bzr/488519-walkdirs-encoding-bug into lp:bzr.
>
> A fix for bug #488519
>
>

This is the 'walkdirs' code and not the '_walkdirs_utf8' code, the
latter being the more performance critical code. And he is correct that
we are already casting everything to Unicode, except for what Python
cannot cast based on fs_encoding. And the only way we can determine that
is by checking the type of the string. As such, I think his change is
reasonable.

I will note that:

a) I prefer "if type(filename) is unicode" versus the isinstance check,
but it shouldn't be a huge difference. (isinstance has to check the
inheritance heirarchy, but I would assume that being exact 99.9% of the
time means it would still be fast.)

b) I expect that if we get a string, then the decode will fail. That is
sort of the crux of the problem (python listdir() failed to decode it,
so it just returns a string, I don't know why we would expect to find
differently.)

c) It would be better to put ByteStreamedNamedFilesystem into
bzrlib/tests/features.py and call it "byte_stream_named_filesystem". But
maybe not since the other ones like UTF8Filesystem are still in this
location.

The rest seems ok as long as the other reviewers are happy.

 review: approve
John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwOn90ACgkQJdeBCYSNAAObeACgqq60DJ3Jy3T3MnZ4ZSERL4gK
X4YAoMH9WydE+dCoYL1c4MEmEQzie42y
=Ctuq
-----END PGP SIGNATURE-----

review: Approve
Martin Packman (gz) wrote :

There's still bzrlib.errors spurious trailing whitespace change, and now there are more in bzrlib.osutils as well...

Looks to me like all the substantive issues have been addressed though, let's try this out.

review: Approve
Eric Moritz (ericmoritz) wrote :

Ok I'll delete the trailing whitespace and resubmit

Martin Packman (gz) wrote :

> Ok I'll delete the trailing whitespace and resubmit

You don't need to resubmit for every minor change, just push the new revision up to you branch and launchpad will update this merge proposal and the diff.

5292. By Eric Moritz on 2010-06-11

merge in lp:bzr.dev

5293. By Eric Moritz on 2010-06-11

Deleted trailing whitespace

5294. By Eric Moritz on 2010-06-11

used "type(filename) == unicode" per John A Meinel's suggestion. This will
be slightly faster than isinstance

Eric Moritz (ericmoritz) wrote :

Ok, I've applied the new suggestion

Vincent Ladeuil (vila) wrote :

The NEWS entry was missing, I added it and submitted to pqm.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/errors.py'
2--- bzrlib/errors.py 2010-05-28 06:04:57 +0000
3+++ bzrlib/errors.py 2010-06-11 01:51:29 +0000
4@@ -1923,12 +1923,12 @@
5 class CantMoveRoot(BzrError):
6
7 _fmt = "Moving the root directory is not supported at this time"
8-
9-
10+
11+
12 class TransformRenameFailed(BzrError):
13
14 _fmt = "Failed to rename %(from_path)s to %(to_path)s: %(why)s"
15-
16+
17 def __init__(self, from_path, to_path, why, errno):
18 self.from_path = from_path
19 self.to_path = to_path
20
21=== modified file 'bzrlib/osutils.py'
22--- bzrlib/osutils.py 2010-05-27 17:59:18 +0000
23+++ bzrlib/osutils.py 2010-06-11 01:51:29 +0000
24@@ -931,7 +931,7 @@
25
26 def parent_directories(filename):
27 """Return the list of parent directories, deepest first.
28-
29+
30 For example, parent_directories("a/b/c") -> ["a/b", "a"].
31 """
32 parents = []
33@@ -961,7 +961,7 @@
34 # NB: This docstring is just an example, not a doctest, because doctest
35 # currently can't cope with the use of lazy imports in this namespace --
36 # mbp 20090729
37-
38+
39 # This currently doesn't report the failure at the time it occurs, because
40 # they tend to happen very early in startup when we can't check config
41 # files etc, and also we want to report all failures but not spam the user
42@@ -1037,8 +1037,8 @@
43
44
45 def delete_any(path):
46- """Delete a file, symlink or directory.
47-
48+ """Delete a file, symlink or directory.
49+
50 Will delete even if readonly.
51 """
52 try:
53@@ -1233,6 +1233,22 @@
54 # but for now, we haven't optimized...
55 return [canonical_relpath(base, p) for p in paths]
56
57+
58+def decode_filename(filename):
59+ """Decode the filename using the filesystem encoding
60+
61+ If it is unicode, it is returned.
62+ Otherwise it is decoded from the the filesystem's encoding. If decoding
63+ fails, a errors.BadFilenameEncoding exception is raised.
64+ """
65+ if type(filename) is unicode:
66+ return filename
67+ try:
68+ return filename.decode(_fs_enc)
69+ except UnicodeDecodeError:
70+ raise errors.BadFilenameEncoding(filename, _fs_enc)
71+
72+
73 def safe_unicode(unicode_or_utf8_string):
74 """Coerce unicode_or_utf8_string into unicode.
75
76@@ -1644,7 +1660,7 @@
77 dirblock = []
78 append = dirblock.append
79 try:
80- names = sorted(_listdir(top))
81+ names = sorted(map(decode_filename, _listdir(top)))
82 except OSError, e:
83 if not _is_error_enotdir(e):
84 raise
85@@ -2020,14 +2036,14 @@
86
87 def send_all(sock, bytes, report_activity=None):
88 """Send all bytes on a socket.
89-
90+
91 Breaks large blocks in smaller chunks to avoid buffering limitations on
92 some platforms, and catches EINTR which may be thrown if the send is
93 interrupted by a signal.
94
95 This is preferred to socket.sendall(), because it avoids portability bugs
96 and provides activity reporting.
97-
98+
99 :param report_activity: Call this as bytes are read, see
100 Transport._report_activity
101 """
102@@ -2121,7 +2137,7 @@
103
104 def until_no_eintr(f, *a, **kw):
105 """Run f(*a, **kw), retrying if an EINTR error occurs.
106-
107+
108 WARNING: you must be certain that it is safe to retry the call repeatedly
109 if EINTR does occur. This is typically only true for low-level operations
110 like os.read. If in any doubt, don't use this.
111@@ -2258,7 +2274,7 @@
112 if sys.platform == 'win32':
113 def open_file(filename, mode='r', bufsize=-1):
114 """This function is used to override the ``open`` builtin.
115-
116+
117 But it uses O_NOINHERIT flag so the file handle is not inherited by
118 child processes. Deleting or renaming a closed file opened with this
119 function is not blocking child processes.
120
121=== modified file 'bzrlib/tests/__init__.py'
122--- bzrlib/tests/__init__.py 2010-05-25 17:27:52 +0000
123+++ bzrlib/tests/__init__.py 2010-06-11 01:51:29 +0000
124@@ -4345,6 +4345,17 @@
125 UnicodeFilename = _UnicodeFilename()
126
127
128+class _ByteStringNamedFilesystem(Feature):
129+ """Is the filesystem based on bytes?"""
130+
131+ def _probe(self):
132+ if os.name == "posix":
133+ return True
134+ return False
135+
136+ByteStringNamedFilesystem = _ByteStringNamedFilesystem()
137+
138+
139 class _UTF8Filesystem(Feature):
140 """Is the filesystem UTF-8?"""
141
142
143=== modified file 'bzrlib/tests/test_osutils.py'
144--- bzrlib/tests/test_osutils.py 2010-05-27 17:59:18 +0000
145+++ bzrlib/tests/test_osutils.py 2010-06-11 01:51:29 +0000
146@@ -1083,6 +1083,40 @@
147 # Ensure the message contains the file name
148 self.assertContainsRe(str(e), "\./test-unreadable")
149
150+
151+ def test_walkdirs_encoding_error(self):
152+ # <https://bugs.launchpad.net/bzr/+bug/488519>
153+ # walkdirs didn't raise a useful message when the filenames
154+ # are not using the filesystem's encoding
155+
156+ # require a bytestring based filesystem
157+ self.requireFeature(tests.ByteStringNamedFilesystem)
158+
159+ tree = [
160+ '.bzr',
161+ '0file',
162+ '1dir/',
163+ '1dir/0file',
164+ '1dir/1dir/',
165+ '1file'
166+ ]
167+
168+ self.build_tree(tree)
169+
170+ # rename the 1file to a latin-1 filename
171+ os.rename("./1file", "\xe8file")
172+
173+ self._save_platform_info()
174+ win32utils.winver = None # Avoid the win32 detection code
175+ osutils._fs_enc = 'UTF-8'
176+
177+ # this should raise on error
178+ def attempt():
179+ for dirdetail, dirblock in osutils.walkdirs('.'):
180+ pass
181+
182+ self.assertRaises(errors.BadFilenameEncoding, attempt)
183+
184 def test__walkdirs_utf8(self):
185 tree = [
186 '.bzr',
187@@ -1921,7 +1955,7 @@
188 def restore_osutils_globals(self):
189 osutils._terminal_size_state = self._orig_terminal_size_state
190 osutils._first_terminal_size = self._orig_first_terminal_size
191-
192+
193 def replace_stdout(self, new):
194 self.overrideAttr(sys, 'stdout', new)
195