Merge lp:~gz/bzr/path_from_environ_832028 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6369
Proposed branch: lp:~gz/bzr/path_from_environ_832028
Merge into: lp:bzr
Diff against target: 372 lines (+87/-99)
4 files modified
bzrlib/osutils.py (+31/-24)
bzrlib/tests/test_lockdir.py (+2/-1)
bzrlib/tests/test_osutils.py (+0/-6)
bzrlib/win32utils.py (+54/-68)
To merge this branch: bzr merge lp:~gz/bzr/path_from_environ_832028
Reviewer Review Type Date Requested Status
Vincent Ladeuil Needs Fixing
Review via email: mp+85388@code.launchpad.net

Commit message

Add path_from_environ and clean ups to osutils and win32utils

Description of the change

The basic idea behind this branch is to add an interface for retrieving a path from the environment, it's just a little bogged down in osutils/win32utils tech debt.

The main drawback apart from that is that paths really are bytestrings on nix, and in a couple of places can get away with that (at least until they wind up in an exception object or similar). Without a bigger fix for lots of apis, this is pretty intractable unfortunately.

Because a few things need working out still, I've not pushed the bzrlib.config changes, this probably wants coordinating with Jelmer's changes:

<https://code.launchpad.net/~jelmer/bzr/commit-uses-config-stacks/+merge/85134>

The background is this merge proposal:

<https://code.launchpad.net/~vila/bzr/822571-bzr-home-unicode/+merge/70870>

Also want to unblock this one by fixing the various underlying api issues:

<https://code.launchpad.net/~jelmer/bzr/cachedir/+merge/71486>

Proposing despite my confusion for some feedback.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

20 + # GZ 2011-12-12:Ideally want to include `key` in the exception message
21 + raise errors.BadFilenameEncoding(val, _fs_enc)

Any reason for not defining an exception derived from BadFilenameEncoding or
just make it a bit more customizable ?

24 +def _posix_getuser_unicode():
25 + """Get username from environment or password database as unicode"""
26 + name = getpass.getuser()
27 + user_encoding = get_user_encoding()
28 + try:
29 + return name.decode(user_encoding)
30 + except UnicodeDecodeError:
31 + raise errors.BzrError("Encoding of username %r is unsupported by %s "
32 + "application locale." % (name, user_encoding))

Beware here that we had at least one case (savannah ?) where the /etc/passwd
contained utf-8 names while running under C locale.

The weird thing is that I kind of remember specific code for that... but
can't find it with a quick search. Hints includes: poolie, locks (?) and
pwd...

95 +# GZ 2011-12-12: These tests escape isolation and look at the GlobalConfig
96 +# of the user through lockdir.get_username_for_lock_info
97 class TestLockHeldInfo(TestCase):

Right, use TestCaseInTempDir then ;)

On the other hand, if you have a trick to better detect this kind of issue
(how did you find it in this case ?), let's generalize the check (if
possible, I know I encountered cases in the past where this wasn't
possible).

108 - def test_no_username_bug_660174(self):

Can you just briefly explain why the bug (hence this test) is not relevant
anymore ?

223 - windir = os.environ.get('windir')
224 + windir = get_environ_unicode('WINDIR')

Env vars are not case sensitive on windows ? Worth a reminding comment ;)
Say, in the docstring of the win32utils module ?

Your proposal deletes a lot of 'Returned value can be unicode or plain
string.' which is a huge win ! Thanks.

What's the status now ? We can safely get unicode strings from the
environment, distinguishing paths (fs_encoding) from the rest
(user_encoding), trapping illegal byte strings (good).

From there we chase all remaining call to os.environ.get ?

About "illegal" byte strings, I see two cases:

- paths inside a bzr controlled tree (I don't know a case where such a path
  is used in an env var, so we don't have to care),

- paths containing a bzr controlled tree: sorry guys, we don't support that
  and we abort as soon as we encounter an env var with such a path.

Is that correct ?

If yes, then pending the minor tweaks mentioned above, that's a significant improvement that I'd like to see landed ASAP !

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :
Download full text (3.8 KiB)

> Any reason for not defining an exception derived from BadFilenameEncoding or
> just make it a bit more customizable ?

Mostly because there are still a few cases to be written that could do with a custom error, and I wanted to see what they all look like before working out how to structure the exception classes. So, the current _posix_getuser_unicode exception could do with being something more specific than BzrError, and we need a few more functions like that as well.

> Beware here that we had at least one case (savannah ?) where the /etc/passwd
> contained utf-8 names while running under C locale.
>
> The weird thing is that I kind of remember specific code for that... but
> can't find it with a quick search. Hints includes: poolie, locks (?) and
> pwd...

Yep, it's just moving that existing code, so there's no behaviour change on that front.

> Right, use TestCaseInTempDir then ;)

That's why I threw in the comment, couldn't remember what change is needed so wanted your advice. :)

> On the other hand, if you have a trick to better detect this kind of issue
> (how did you find it in this case ?), let's generalize the check (if
> possible, I know I encountered cases in the past where this wasn't
> possible).

Manually I'm afraid, was seeing if I could replace the deleted test with one that checked the lock breaking code did the right thing if no username could be determined - and was very surprised to see selftest turn up a value from a config file I'd completely forgotten existed. We should be able to isolate, or at least fail on, access to GlobalConfig with some code in TestCase though.

> Can you just briefly explain why the bug (hence this test) is not relevant
> anymore ?

The username is now obtained via the existing win32utils function, which unlike the python getpass module does the right thing when run as a service. Testing that root bug is unfortunately harder than the superficial test added with the previous workaround, we don't have infrastructure in python to run processes as different users on windows.

> Env vars are not case sensitive on windows ? Worth a reminding comment ;)
> Say, in the docstring of the win32utils module ?

Will do. Would still be nice to have envvar code that was aware of this fact for the override in different case issue.

> What's the status now ? We can safely get unicode strings from the
> environment, distinguishing paths (fs_encoding) from the rest
> (user_encoding), trapping illegal byte strings (good).
>
> From there we chase all remaining call to os.environ.get ?

Right, a bunch are just getting a bool or int or something which works fine as is (bytes on nix, lossy downcoding on windows that still preserves the basic ascii characters). The remaining issues are things like BZR_EMAIL which contain text unrelated to filesystem paths but likely in the locale of the user, and os.path.expanduser("~"), which currently need remembering to decode afterwards.

> About "illegal" byte strings, I see two cases:
>
> - paths inside a bzr controlled tree (I don't know a case where such a path
> is used in an env var, so we don't have to care),
>
> - paths containing a bzr controlled tree: sorry guys, w...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :

> Mostly because there are still a few cases to be written that could do
> with a custom error, and I wanted to see what they all look like before
> working out how to structure the exception classes. So, the current
> _posix_getuser_unicode exception could do with being something more
> specific than BzrError, and we need a few more functions like that as
> well.

Right, nothing forbidding creating a specific exception for your use case
here, make it private if you think you'll need to fix it soon.

>
> > Beware here that we had at least one case (savannah ?) where the /etc/passwd
> > contained utf-8 names while running under C locale.
> >
> > The weird thing is that I kind of remember specific code for that... but
> > can't find it with a quick search. Hints includes: poolie, locks (?) and
> > pwd...
>
> Yep, it's just moving that existing code, so there's no behaviour change on
> that front.
>

Ghaa, if I read the code correctly then the bug is still there for
savannah... I cannot find a bug for it :-/

> > Right, use TestCaseInTempDir then ;)
>
> That's why I threw in the comment, couldn't remember what change is needed so
> wanted your advice. :)

Here you are then :)

The rule is: TestCase can only be used for tests that requires no disk
resources. As soon as you start testing stuff sufficiently high in the
stack, you're bound to encounter some config or log access.

>
> > On the other hand, if you have a trick to better detect this kind of issue
> > (how did you find it in this case ?), let's generalize the check (if
> > possible, I know I encountered cases in the past where this wasn't
> > possible).
>

> Manually I'm afraid, was seeing if I could replace the deleted test with one
> that checked the lock breaking code did the right thing if no username could
> be determined - and was very surprised to see selftest turn up a value from a
> config file I'd completely forgotten existed. We should be able to isolate, or
> at least fail on, access to GlobalConfig with some code in TestCase though.

Ha right, the infamous 'HOME is set to None but python is helpful by still
expanding ~'.

Adding configs to TestCase is a double-edge sword, this wil mask some
failures (this one included). Isolation is better achieved by using
TestCaseInTempDir or even TestCaseWithTransport when you should.

Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/osutils.py'
2--- bzrlib/osutils.py 2011-12-13 17:10:47 +0000
3+++ bzrlib/osutils.py 2011-12-14 18:30:32 +0000
4@@ -320,6 +320,33 @@
5 return path
6
7
8+def _posix_path_from_environ(key):
9+ """Get unicode path from `key` in environment or None if not present
10+
11+ Note that posix systems use arbitrary byte strings for filesystem objects,
12+ so a path that raises BadFilenameEncoding here may still be accessible.
13+ """
14+ val = os.environ.get(key, None)
15+ if val is None:
16+ return val
17+ try:
18+ return val.decode(_fs_enc)
19+ except UnicodeDecodeError:
20+ # GZ 2011-12-12:Ideally want to include `key` in the exception message
21+ raise errors.BadFilenameEncoding(val, _fs_enc)
22+
23+
24+def _posix_getuser_unicode():
25+ """Get username from environment or password database as unicode"""
26+ name = getpass.getuser()
27+ user_encoding = get_user_encoding()
28+ try:
29+ return name.decode(user_encoding)
30+ except UnicodeDecodeError:
31+ raise errors.BzrError("Encoding of username %r is unsupported by %s "
32+ "application locale." % (name, user_encoding))
33+
34+
35 def _win32_fixdrive(path):
36 """Force drive letters to be consistent.
37
38@@ -414,6 +441,8 @@
39 realpath = _posix_realpath
40 pathjoin = os.path.join
41 normpath = _posix_normpath
42+path_from_environ = _posix_path_from_environ
43+getuser_unicode = _posix_getuser_unicode
44 getcwd = os.getcwdu
45 rename = os.rename
46 dirname = os.path.dirname
47@@ -475,6 +504,8 @@
48 f = win32utils.get_unicode_argv # special function or None
49 if f is not None:
50 get_unicode_argv = f
51+ path_from_environ = win32utils.get_environ_unicode
52+ getuser_unicode = win32utils.get_user_name
53
54 elif sys.platform == 'darwin':
55 getcwd = _mac_getcwd
56@@ -2431,30 +2462,6 @@
57 open_file = open
58
59
60-def getuser_unicode():
61- """Return the username as unicode.
62- """
63- try:
64- user_encoding = get_user_encoding()
65- username = getpass.getuser().decode(user_encoding)
66- except UnicodeDecodeError:
67- raise errors.BzrError("Can't decode username as %s." % \
68- user_encoding)
69- except ImportError, e:
70- if sys.platform != 'win32':
71- raise
72- if str(e) != 'No module named pwd':
73- raise
74- # https://bugs.launchpad.net/bzr/+bug/660174
75- # getpass.getuser() is unable to return username on Windows
76- # if there is no USERNAME environment variable set.
77- # That could be true if bzr is running as a service,
78- # e.g. running `bzr serve` as a service on Windows.
79- # We should not fail with traceback in this case.
80- username = u'UNKNOWN'
81- return username
82-
83-
84 def available_backup_name(base, exists):
85 """Find a non-existing backup file name.
86
87
88=== modified file 'bzrlib/tests/test_lockdir.py'
89--- bzrlib/tests/test_lockdir.py 2011-08-19 22:34:02 +0000
90+++ bzrlib/tests/test_lockdir.py 2011-12-14 18:30:32 +0000
91@@ -43,6 +43,7 @@
92 from bzrlib.tests import (
93 features,
94 TestCase,
95+ TestCaseInTempDir,
96 TestCaseWithTransport,
97 )
98
99@@ -654,7 +655,7 @@
100 self.assertEqual([], self._calls)
101
102
103-class TestLockHeldInfo(TestCase):
104+class TestLockHeldInfo(TestCaseInTempDir):
105 """Can get information about the lock holder, and detect whether they're
106 still alive."""
107
108
109=== modified file 'bzrlib/tests/test_osutils.py'
110--- bzrlib/tests/test_osutils.py 2011-12-14 17:07:11 +0000
111+++ bzrlib/tests/test_osutils.py 2011-12-14 18:30:32 +0000
112@@ -2115,12 +2115,6 @@
113 self.overrideEnv('LOGNAME', u'jrandom\xb6'.encode(ue))
114 self.assertEqual(u'jrandom\xb6', osutils.getuser_unicode())
115
116- def test_no_username_bug_660174(self):
117- self.requireFeature(features.win32_feature)
118- for name in ('LOGNAME', 'USER', 'LNAME', 'USERNAME'):
119- self.overrideEnv(name, None)
120- self.assertEqual(u'UNKNOWN', osutils.getuser_unicode())
121-
122
123 class TestBackupNames(tests.TestCase):
124
125
126=== modified file 'bzrlib/win32utils.py'
127--- bzrlib/win32utils.py 2011-12-12 13:31:08 +0000
128+++ bzrlib/win32utils.py 2011-12-14 18:30:32 +0000
129@@ -20,11 +20,15 @@
130 """
131
132 import glob
133+import operator
134 import os
135 import struct
136 import sys
137
138-from bzrlib import cmdline
139+from bzrlib import (
140+ cmdline,
141+ symbol_versioning,
142+ )
143 from bzrlib.i18n import gettext
144
145 # Windows version
146@@ -63,9 +67,12 @@
147 else:
148 if winver == 'Windows 98':
149 create_buffer = ctypes.create_string_buffer
150+ def extract_buffer(buf):
151+ return buf.value.decode("mbcs")
152 suffix = 'A'
153 else:
154 create_buffer = ctypes.create_unicode_buffer
155+ extract_buffer = operator.attrgetter("value")
156 suffix = 'W'
157 try:
158 import pywintypes
159@@ -248,28 +255,12 @@
160 one that moves with the user as they logon to different machines, and
161 a 'local' one that stays local to the machine. This returns the 'roaming'
162 directory, and thus is suitable for storing user-preferences, etc.
163-
164- Returned value can be unicode or plain string.
165- To convert plain string to unicode use
166- s.decode(osutils.get_user_encoding())
167- (XXX - but see bug 262874, which asserts the correct encoding is 'mbcs')
168 """
169 appdata = _get_sh_special_folder_path(CSIDL_APPDATA)
170 if appdata:
171 return appdata
172- # from env variable
173- appdata = os.environ.get('APPDATA')
174- if appdata:
175- return appdata
176- # if we fall to this point we on win98
177- # at least try C:/WINDOWS/Application Data
178- windir = os.environ.get('windir')
179- if windir:
180- appdata = os.path.join(windir, 'Application Data')
181- if os.path.isdir(appdata):
182- return appdata
183- # did not find anything
184- return None
185+ # Use APPDATA if defined, will return None if not
186+ return get_environ_unicode('APPDATA')
187
188
189 def get_local_appdata_location():
190@@ -281,17 +272,12 @@
191 a 'local' one that stays local to the machine. This returns the 'local'
192 directory, and thus is suitable for caches, temp files and other things
193 which don't need to move with the user.
194-
195- Returned value can be unicode or plain string.
196- To convert plain string to unicode use
197- s.decode(osutils.get_user_encoding())
198- (XXX - but see bug 262874, which asserts the correct encoding is 'mbcs')
199 """
200 local = _get_sh_special_folder_path(CSIDL_LOCAL_APPDATA)
201 if local:
202 return local
203 # Vista supplies LOCALAPPDATA, but XP and earlier do not.
204- local = os.environ.get('LOCALAPPDATA')
205+ local = get_environ_unicode('LOCALAPPDATA')
206 if local:
207 return local
208 return get_appdata_location()
209@@ -302,33 +288,27 @@
210 Assume on win32 it's the <My Documents> folder.
211 If location cannot be obtained return system drive root,
212 i.e. C:\
213-
214- Returned value can be unicode or plain string.
215- To convert plain string to unicode use
216- s.decode(osutils.get_user_encoding())
217 """
218 home = _get_sh_special_folder_path(CSIDL_PERSONAL)
219 if home:
220 return home
221- # try for HOME env variable
222- home = os.path.expanduser('~')
223- if home != '~':
224+ home = get_environ_unicode('HOME')
225+ if home is not None:
226 return home
227+ homepath = get_environ_unicode('HOMEPATH')
228+ if homepath is not None:
229+ return os.path.join(get_environ_unicode('HOMEDIR', ''), home)
230 # at least return windows root directory
231- windir = os.environ.get('windir')
232+ windir = get_environ_unicode('WINDIR')
233 if windir:
234 return os.path.splitdrive(windir)[0] + '/'
235 # otherwise C:\ is good enough for 98% users
236- return 'C:/'
237+ return unicode('C:/')
238
239
240 def get_user_name():
241 """Return user name as login name.
242 If name cannot be obtained return None.
243-
244- Returned value can be unicode or plain string.
245- To convert plain string to unicode use
246- s.decode(osutils.get_user_encoding())
247 """
248 if has_ctypes:
249 try:
250@@ -340,9 +320,9 @@
251 buf = create_buffer(UNLEN+1)
252 n = ctypes.c_int(UNLEN+1)
253 if GetUserName(buf, ctypes.byref(n)):
254- return buf.value
255+ return extract_buffer(buf)
256 # otherwise try env variables
257- return os.environ.get('USERNAME', None)
258+ return get_environ_unicode('USERNAME')
259
260
261 # 1 == ComputerNameDnsHostname, which returns "The DNS host name of the local
262@@ -353,8 +333,7 @@
263 """Return host machine name.
264 If name cannot be obtained return None.
265
266- :return: A unicode string representing the host name. On win98, this may be
267- a plain string as win32 api doesn't support unicode.
268+ :return: A unicode string representing the host name.
269 """
270 if has_win32api:
271 try:
272@@ -377,7 +356,7 @@
273 if (GetComputerNameEx is not None
274 and GetComputerNameEx(_WIN32_ComputerNameDnsHostname,
275 buf, ctypes.byref(n))):
276- return buf.value
277+ return extract_buffer(buf)
278
279 # Try GetComputerName in case GetComputerNameEx wasn't found
280 # It returns the NETBIOS name, which isn't as good, but still ok.
281@@ -387,18 +366,12 @@
282 None)
283 if (GetComputerName is not None
284 and GetComputerName(buf, ctypes.byref(n))):
285- return buf.value
286- # otherwise try env variables, which will be 'mbcs' encoded
287- # on Windows (Python doesn't expose the native win32 unicode environment)
288- # According to this:
289- # http://msdn.microsoft.com/en-us/library/aa246807.aspx
290- # environment variables should always be encoded in 'mbcs'.
291- try:
292- return os.environ['COMPUTERNAME'].decode("mbcs")
293- except KeyError:
294- return None
295-
296-
297+ return extract_buffer(buf)
298+ return get_environ_unicode('COMPUTERNAME')
299+
300+
301+@symbol_versioning.deprecated_method(
302+ symbol_versioning.deprecated_in((2, 5, 0)))
303 def _ensure_unicode(s):
304 if s and type(s) != unicode:
305 from bzrlib import osutils
306@@ -406,17 +379,17 @@
307 return s
308
309
310-def get_appdata_location_unicode():
311- return _ensure_unicode(get_appdata_location())
312-
313-def get_home_location_unicode():
314- return _ensure_unicode(get_home_location())
315-
316-def get_user_name_unicode():
317- return _ensure_unicode(get_user_name())
318-
319-def get_host_name_unicode():
320- return _ensure_unicode(get_host_name())
321+get_appdata_location_unicode = symbol_versioning.deprecated_method(
322+ symbol_versioning.deprecated_in((2, 5, 0)))(get_appdata_location)
323+
324+get_home_location_unicode = symbol_versioning.deprecated_method(
325+ symbol_versioning.deprecated_in((2, 5, 0)))(get_home_location)
326+
327+get_user_name_unicode = symbol_versioning.deprecated_method(
328+ symbol_versioning.deprecated_in((2, 5, 0)))(get_user_name)
329+
330+get_host_name_unicode = symbol_versioning.deprecated_method(
331+ symbol_versioning.deprecated_in((2, 5, 0)))(get_host_name)
332
333
334 def _ensure_with_dir(path):
335@@ -433,7 +406,6 @@
336 return path
337
338
339-
340 def glob_one(possible_glob):
341 """Same as glob.glob().
342
343@@ -585,6 +557,11 @@
344 def get_environ_unicode(key, default=None):
345 """Get `key` from environment as unicode or `default` if unset
346
347+ The environment is natively unicode on modern windows versions but
348+ Python 2 only accesses it through the legacy bytestring api.
349+
350+ Environmental variable names are case insenstive on Windows.
351+
352 A large enough buffer will be allocated to retrieve the value, though
353 it may take two calls to the underlying library function.
354
355@@ -609,7 +586,16 @@
356 return buffer[:length]
357 buffer_size = length
358 else:
359- get_unicode_argv = get_environ_unicode = None
360+ get_unicode_argv = None
361+ def get_environ_unicode(key, default=None):
362+ """Get `key` from environment as unicode or `default` if unset
363+
364+ Fallback version that should basically never be needed.
365+ """
366+ try:
367+ return os.environ[key].decode("mbcs")
368+ except KeyError:
369+ return default
370
371
372 if has_win32api: