Merge lp:~gz/bzr/get_environ_unicode_262874 into lp:bzr

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6358
Proposed branch: lp:~gz/bzr/get_environ_unicode_262874
Merge into: lp:bzr
Diff against target: 111 lines (+83/-2)
2 files modified
bzrlib/tests/test_win32utils.py (+53/-0)
bzrlib/win32utils.py (+30/-2)
To merge this branch: bzr merge lp:~gz/bzr/get_environ_unicode_262874
Reviewer Review Type Date Requested Status
Vincent Ladeuil Approve
Review via email: mp+84311@code.launchpad.net

Commit message

Add get_environ_unicode for accessing variables without mbcs mangling on windows

Description of the change

Adds a mechanism for getting unicode values from the environment directly on windows without the lossy encode/decode used by Python 2 environ access. For most uses, just doing correct encoding and decoding is enough, but this should be an easier interface to get right with less risk of corrupting the environment block by mistake.

An equivalent function for putting unicode values has not been added. While it's nice to cope with all inputs, it's also best not to export values we know may cause problems for other applications. Also, bypassing the os.environ dict cache is fine in the getenv direction, but in the putenv direction leads means future access will get old values.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

Hmm, warming up for the fix-the-oldest-bug challenge are you ?

79 + A large enough buffer will be allocated to retrieve the value, unless
80 + unless `_size` is set which will act as a hard limit.

s/unless unless/unless/

84 + f = getattr(get_environ_unicode, "_func", None)

I have a few concerns here:
- 'f' is too short :)
- using function attributes is unusal and could be explained,
- this is addressing a lot of issues that aren't documented ;)

Fix as best as you can so some unaware hacker won't go crazy searching for a mythical '_func' attribute on docs.python.org tricked by the resemblance with say, 'im_func' ;)

Also, teach your win editor about not mixing line endings in test_win32utils ;-p

review: Needs Fixing
Revision history for this message
Martin Packman (gz) wrote :

> Hmm, warming up for the fix-the-oldest-bug challenge are you ?

The next one is actually nearly a month newer than this bug. :)

> s/unless unless/unless/

Done... and then pulled anyway, looking at it _size isn't really helpful now I know the buffer logic is correct.

> 84 + f = getattr(get_environ_unicode, "_func", None)
>
> I have a few concerns here:
> - 'f' is too short :)

Changed to `cfunc`.

> - using function attributes is unusal and could be explained,
> - this is addressing a lot of issues that aren't documented ;)
>
> Fix as best as you can so some unaware hacker won't go crazy searching for a
> mythical '_func' attribute on docs.python.org tricked by the resemblance with
> say, 'im_func' ;)

And changed to `._c_function`. I agree some of these issues over how to write ctypes code need documenting properly. I think HACKING may be the best place, but hopefully a general cleanup can happen and make the current code somewhat consistent. I'll leave that to a later branch for now.

> Also, teach your win editor about not mixing line endings in test_win32utils
> ;-p

I have no idea what you're talking about! ...ehem...

Revision history for this message
Vincent Ladeuil (vila) wrote :

nudge, nudge, land, land

review: Approve
Revision history for this message
Martin Packman (gz) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/test_win32utils.py'
2--- bzrlib/tests/test_win32utils.py 2011-06-14 01:26:41 +0000
3+++ bzrlib/tests/test_win32utils.py 2011-12-12 12:26:25 +0000
4@@ -328,3 +328,56 @@
5 self.assertCommandLine([u"rm", u"x*"], "-m pdb rm x*", ["rm", u"x*"])
6 self.assertCommandLine([u"add", u"d/f1", u"d/f2"], "-m pdb add d/*",
7 ["add", u"d/*"])
8+
9+
10+class TestGetEnvironUnicode(tests.TestCase):
11+ """Tests for accessing the environment via the windows wide api"""
12+
13+ _test_needs_features = [CtypesFeature, features.win32_feature]
14+
15+ def setUp(self):
16+ super(TestGetEnvironUnicode, self).setUp()
17+ self.overrideEnv("TEST", "1")
18+
19+ def test_get(self):
20+ """In the normal case behaves the same as os.environ access"""
21+ self.assertEqual("1", win32utils.get_environ_unicode("TEST"))
22+
23+ def test_unset(self):
24+ """A variable not present in the environment gives None by default"""
25+ del os.environ["TEST"]
26+ self.assertIs(None, win32utils.get_environ_unicode("TEST"))
27+
28+ def test_unset_default(self):
29+ """A variable not present in the environment gives passed default"""
30+ del os.environ["TEST"]
31+ self.assertIs("a", win32utils.get_environ_unicode("TEST", "a"))
32+
33+ def test_unicode(self):
34+ """A non-ascii variable is returned as unicode"""
35+ unicode_val = u"\xa7" # non-ascii character present in many encodings
36+ try:
37+ bytes_val = unicode_val.encode(osutils.get_user_encoding())
38+ except UnicodeEncodeError:
39+ self.skip("Couldn't encode non-ascii string to place in environ")
40+ os.environ["TEST"] = bytes_val
41+ self.assertEqual(unicode_val, win32utils.get_environ_unicode("TEST"))
42+
43+ def test_long(self):
44+ """A variable bigger than heuristic buffer size is still accessible"""
45+ big_val = "x" * (2<<10)
46+ os.environ["TEST"] = big_val
47+ self.assertEqual(big_val, win32utils.get_environ_unicode("TEST"))
48+
49+ def test_unexpected_error(self):
50+ """An error from the underlying platform function is propogated"""
51+ ERROR_INVALID_PARAMETER = 87
52+ SetLastError = win32utils.ctypes.windll.kernel32.SetLastError
53+ def failer(*args, **kwargs):
54+ SetLastError(ERROR_INVALID_PARAMETER)
55+ return 0
56+ self.overrideAttr(win32utils.get_environ_unicode, "_c_function",
57+ failer)
58+ e = self.assertRaises(WindowsError,
59+ win32utils.get_environ_unicode, "TEST")
60+ self.assertEqual(e.winerror, ERROR_INVALID_PARAMETER)
61
62=== modified file 'bzrlib/win32utils.py'
63--- bzrlib/win32utils.py 2011-12-05 11:06:16 +0000
64+++ bzrlib/win32utils.py 2011-12-12 12:26:25 +0000
65@@ -569,7 +569,7 @@
66 return args
67
68
69-if has_ctypes and winver != 'Windows 98':
70+if has_ctypes and winver == 'Windows NT':
71 def get_unicode_argv():
72 prototype = ctypes.WINFUNCTYPE(ctypes.c_wchar_p)
73 GetCommandLineW = prototype(("GetCommandLineW",
74@@ -580,8 +580,36 @@
75 # Skip the first argument, since we only care about parameters
76 argv = _command_line_to_argv(command_line, sys.argv)[1:]
77 return argv
78+
79+
80+ def get_environ_unicode(key, default=None):
81+ """Get `key` from environment as unicode or `default` if unset
82+
83+ A large enough buffer will be allocated to retrieve the value, though
84+ it may take two calls to the underlying library function.
85+
86+ This needs ctypes because pywin32 does not expose the wide version.
87+ """
88+ cfunc = getattr(get_environ_unicode, "_c_function", None)
89+ if cfunc is None:
90+ from ctypes.wintypes import DWORD, LPCWSTR, LPWSTR
91+ cfunc = ctypes.WINFUNCTYPE(DWORD, LPCWSTR, LPWSTR, DWORD)(
92+ ("GetEnvironmentVariableW", ctypes.windll.kernel32))
93+ get_environ_unicode._c_function = cfunc
94+ buffer_size = 256 # heuristic, 256 characters often enough
95+ while True:
96+ buffer = ctypes.create_unicode_buffer(buffer_size)
97+ length = cfunc(key, buffer, buffer_size)
98+ if not length:
99+ code = ctypes.GetLastError()
100+ if code == 203: # ERROR_ENVVAR_NOT_FOUND
101+ return default
102+ raise ctypes.WinError(code)
103+ if buffer_size > length:
104+ return buffer[:length]
105+ buffer_size = length
106 else:
107- get_unicode_argv = None
108+ get_unicode_argv = get_environ_unicode = None
109
110
111 if has_win32api: