Merge lp:~gz/bzr/require_unicode_committer_614593 into lp:bzr
| Status: | Merged |
|---|---|
| Approved by: | John A Meinel on 2010-10-15 |
| Approved revision: | 5487 |
| Merged at revision: | 5510 |
| Proposed branch: | lp:~gz/bzr/require_unicode_committer_614593 |
| Merge into: | lp:bzr |
| Diff against target: |
46 lines (+13/-1) 3 files modified
bzrlib/repository.py (+2/-0) bzrlib/tests/per_repository/test_commit_builder.py (+10/-0) bzrlib/tests/test_testament.py (+1/-1) |
| To merge this branch: | bzr merge lp:~gz/bzr/require_unicode_committer_614593 |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Vincent Ladeuil | 2010-10-13 | Approve on 2010-10-14 | |
| John A Meinel | 2010-10-14 | Pending | |
|
Review via email:
|
|||
Commit Message
Check committer values are ascii or unicode and fix a test where it was not
Description of the Change
Currently commit can be passed a non-ascii str value for committer and the value makes it all the way through to the repository serialisation code where it potentially outputs bogus data. As generally the input will be decoded to unicode already from the command line or config this isn't generally fatal, but is laying a bit of a trap for test and plugin authors to do the wrong thing without realising. With Python 2.7 it happens that this is now caught by the xml escaping function, and a misspelt test started failing.
As well as fixing that test, I've added a guard in commit that raises UnicodeDecodeError for all non-ascii str values. I think that's better than decoding with user_encoding or similar as we don't actually have a good basis for believing a str passed is one encoding or another, so it's better to leave that up to the caller. We could trap the decode error and raise some other flavour of exception instead, but as this will mostly be for bzrlib coders rather than users that's probably no more informative.
| John A Meinel (jameinel) wrote : | # |
If I were doing this, I would probably have done it deeper. Specifically "committer" is just passed into Repository.
if committer is None:
else:
That is probably the point where I would assert that it is properly Unicode. That may be a bit later than you would like (since the commit is pretty much finished then), but it should help prevent injection via some other path.
You could also potentially check in the Revision constructor.
Either way, we are expecting a unicode string here.
| Martin Packman (gz) wrote : | # |
Thanks John, that's just the kind of feedback I wanted. I'll move the guard further in as you suggest.
Doing that, just realised the test I added doesn't actually fail unless I add a format=
- 5487. By Martin Packman on 2010-10-15
-
Move guard to CommitBuilder.
__init_ _ and test to bt.per_repository
| Martin Packman (gz) wrote : | # |
Implemented those changes, but am now worrying over some new things.
Using per_repository seems wasteful as the guard triggers before the repo actually gets touched, so the 22 variations aren't interestingly different.
Also, CommitBuilder has an existing _validate_
| John A Meinel (jameinel) wrote : | # |
I'm fine with it being per-repository, since each repository can have a custom CommitBuilder. Even if 99% of them are the same, catching the failure on a new repo format is good.
I would be ok if you wanted to add something like "if not isinstance(text, unicode)" to the _validate check. The code exists because revision property values (vs keys) are meant to be unicode content. And the set of unicode strings has some additional restrictions (not having '\r' in them).
(Just because it doesn't check everything it could *today* doesn't mean we can't add more checks :)
| Martin Packman (gz) wrote : | # |
Thanks John, I'll land this then and file a followup bug on tightening up the existing _validate methods.
| Martin Packman (gz) wrote : | # |
sent to pqm by email

This sounds fine to me, but I'd like John to confirm that we are indeed expecting Unicode there.