Merge lp:~mbp/bzr/220464-stale-locks into lp:bzr
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 5970
Proposed branch: lp:~mbp/bzr/220464-stale-locks
Merge into: lp:bzr
Diff against target: 1034 lines (+488/-135), 10 files modified
  bzrlib/commit.py (+0/-1)
  bzrlib/config.py (+7/-4)
  bzrlib/help_topics/en/configuration.txt (+8/-0)
  bzrlib/lockdir.py (+246/-97)
  bzrlib/osutils.py (+27/-0)
  bzrlib/tests/__init__.py (+7/-3)
  bzrlib/tests/test_lockdir.py (+139/-27)
  bzrlib/ui/__init__.py (+5/-3)
  bzrlib/win32utils.py (+38/-0)
  doc/en/release-notes/bzr-2.4.txt (+11/-0)
To merge this branch: bzr merge lp:~mbp/bzr/220464-stale-locks
Related bugs:

Reviewer | Review Type | Date Requested | Status
---|---|---|---
John A Meinel | | | Approve
Martin Packman (community) | | | Approve
Andrew Bennetts | | | Approve

Review via email: mp+62582@code.launchpad.net
Commit message
optionally detect and steal dead locks from the same machine and user (bug 220464)
Description of the change
bzr automatically detects stale locks originating from other processes on the same machine.
This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.
I think the config variable needs to be renamed to match our current convention.
Martin Pool (mbp) wrote:
This is a little tricky to hit interactively from the command line, but you can get it pretty well from the Python shell:
>>> b.lock_write()
Unable to obtain lock file://
at grace [process #7658], acquired 42 seconds ago.
Will continue to try until 10:35:52, unless you press Ctrl-C.
See "bzr help break-lock" for more.
^CTraceback (most recent call last):
....
KeyboardInterrupt
now release lock from another window
>>> b.lock_write()
Stole lock file://
BranchWriteLock
One thing that demonstrates is that it should probably print the dead process info in a somewhat more baked form...
Andrew Bennetts (spiv) wrote:
I like this a lot. You seem to have taken a lot of care with the various possible cases, and you've done some nice refactoring and tidying too.
My only real question is if it's appropriate for is_lock_
Apart from that I just have unimportant pedantry:
In test_auto_
646 + self.assertRais
647 + l1.unlock)
That would fit comfortably on just one line :)
Ditto in test_auto_
713 +* Information about held lockdir locks returned from eg `LockDir.peek` is
714 + now represented as a `LockHeldInfo` object, rather than a plain Python
715 + dict.
I think to be pedantically correct ReST you want double-backticks for an inline literal. Single backticks mean interpreted text with the default role, which I think is unset in our sphinx conf. Hmm, I suppose it would be reasonable to set it to something that automatically links to our API docs; I think pydoctor provides an “api” role for this…
Andrew Bennetts (spiv) wrote:
I agree btw that the output isn't particularly pretty. Something formatted more like the usual LockContention output would be nicer. But I'd much rather have this with ugly output than not have it at all.
Martin Pool (mbp) wrote:
> I like this a lot. You seem to have taken a lot of care with the various
> possible cases, and you've done some nice refactoring and tidying too.
Thanks!
> My only real question is if it's appropriate for is_lock_
> raise if os.kill raises something other than ESRCH or EPERM? On one hand
> other errnos shouldn't happen, at least on Linux, but even if they do aborting
> the bzr command entirely is probably a bit unfriendly to the user. Emitting a
> warning and then returning False might be better?
Historically we have not tended to do that; I think Rob argued that these warnings are not normally going to be checked by tests and so could either cause false passes, or could have latent problems where the warning is emitted all the time.
However in this case I think users would almost certainly not thank us for having bzr stop there unnecessarily. If we do want to test it, there are ways.
> I think to be pedantically correct ReST you want double-backticks for an
> inline literal. Single backticks mean interpreted text with the default
> role, which I think is unset in our sphinx conf. Hmm, I suppose it would be
> reasonable to set it to something that automatically links to our API docs, I
> think pydoctor provides an “api” role for this…
I think we should do that. Using double backticks for shell or computer output, and single for API references seems like the best way to spend our backtick budget. (Unless we want to get all docbooky and specifically mark roles on computer input and output, but that seems too longwinded.)
Martin Packman (gz) wrote:
See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really; the pywin32 one needs testing) of process deadness detection for windows.
The current os.kill detection gets broken pretty thoroughly by Python 2.7, which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.
John A Meinel (jameinel) wrote:
On 05/27/2011 02:06 AM, Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/220464-stale-locks into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #220464 in Bazaar: "Bazaar doesn't detect its own stale locks"
> https:/
>
> For more details, see:
> https:/
>
> bzr automatically detects stale locks originating from other processes on the same machine.
>
> This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.
>
> I think the config variable needs to be renamed to match our current convention.
Unfortunately on newer Pythons os.kill *is* available on windows, but
doesn't work like you want it to. So we probably need a "sys.platform ==
'windows'" check as well.
+ if getattr(os, 'kill', None) is None:
+ # Probably not available on Windows.
+ # XXX: How should we check for process liveness there?
+ return False
review: needsfixing
Otherwise it looks good to me.
John
=:->
John A Meinel (jameinel) wrote:
On 05/27/2011 04:29 AM, Martin [gz] wrote:
> Review: Needs Fixing
> See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really; the pywin32 one needs testing) of process deadness detection for windows.
>
> The current os.kill detection gets broken pretty thoroughly by Python 2.7 which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.
>
>
I thought that was also in python2.6. Regardless, it needs to be done
differently.
John
=:->
Martin Pool (mbp) wrote:
@jam: gz kindly provided what looks like a working Windows version of a factored-out is_local_pid_dead, so I think we're now ok there. Please let me know if you spot anything else.
Martin Pool (mbp) wrote:
I'd like some rereview on this, at least just so John can look at gz's windows code.
Robert Collins (lifeless) wrote:
A small question - without reading the diff (sorry, but it seemed a
little large).
Will this result in Launchpad smart servers thinking a lock is held by
a dead process, when there are two smart server *machines* with one backend
store mounted over (NFS|OCFS) and the active process is on a different
machine?
-Rob
Martin Pool (mbp) wrote:
On 3 June 2011 18:56, Robert Collins <email address hidden> wrote:
> A small question - without reading the diff (sorry, but it seemed a
> little large).
>
> Will this result in Launchpad smart servers thinking a lock is held by
> a dead process, when there are two smart server *machines* with one backend
> store mounted over (NFS|OCFS) and the active process is on a different
> machine?
Only if they have the same hostname, which they shouldn't.
More of a worry is if someone manages to have a client machine with the
same hostname as the ssh server, and the names are stored
not-fully-
Perhaps we need something stronger than that. We could put a random
per-user key into their home directory and into the lock file, though
that would be a larger change.
Robert Collins (lifeless) wrote:
The impact of a mistake could be pretty bad, so perhaps we should look
at something stronger before doing this?
Martin Packman (gz) wrote:
As windows and posix should cover everything, I think this is fine now.
> I'd like some rereview on this, at least just so John can look at gz's windows
> code.
I'd like that too; it was coded a little too late at night.
A few notes on it:
* I did a ctypes and a pywin32 version out of habit. Now that we're 2.6 only, there's not much point in having pywin32 code around (apart from where it's easier to spell).
* As Martin changed the posix version to never raise an exception, should the windows versions change too? Really, I think I'd prefer the caller to catch and log rather than the function, but pywin32 makes that annoying by using a different exception type.
* OpenProcess requests the PROCESS_TERMINATE process access right. It might make more sense to ask for PROCESS_
* Could add code later to get the process executable name too with GetProcessImage
Vincent Ladeuil (vila) wrote:
sent to pqm by email
Martin Pool (mbp) wrote:
On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?
I wrote some background about other types of locking improvements we
can do, but we probably all know that. This patch is just about
trying to detect locks belonging to processes that are sure to be dead.
As I said above, the main risk is that we'll see a lock actually from
another machine and assume it's actually local. It is a little hard
because on unix there is no obvious guaranteed-unique machine
identifier.
Things we could do to guard against that:
0- don't use this approach at all
1- put in the host default ip (also not guaranteed unique; likely to
often be seen as 127.0.0.1; also arguably a privacy problem to send
this to lp)
2- disable this on the smart server; punt back to the client to decide
if the process is live or not (maybe a good idea anyhow)
3- make up a unique per-user nonce; store it in .bazaar; put it into
lock files (not a perfect solution for home on NFS for instance)
4- don't automatically break it; just ask the user if they want to do
so, or point out that it probably is dead
5- grab some other random data that might both be stable across
multiple processes on a single machine, and across reboots, and also
unique to machines (this does exist on windows, and might be available
in Python)
John A Meinel (jameinel) wrote:
On 6/6/2011 12:02 PM, Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/220464-stale-locks into lp:bzr has been updated.
>
> Status: Approved => Needs review
>
> For more details, see:
> https:/
Vila, you changed this to needs-review after approving it and sending it
to pqm. Did it bounce?
I'll try to take a look at it tomorrow, but my WinAPI isn't all that strong.
I do think gz's comment about access to kill vs access to get info is
relevant. Specifically, if User A does the push, User B cannot kill
their processes. Would that lead to us thinking the process is already
dead, and unlocking things underneath them?
John
=:->
review: needsinfo
John A Meinel (jameinel) wrote:
On 6/3/2011 11:34 AM, Robert Collins wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?
>
I think the potential for trouble is moderate here, mostly because of
the lifetimes of locks. If two people race on the *repository* lock, you
could get strange corruption. (Both feel they uploaded their content,
but only one of them ends up with their file mentioned in pack-names.)
That, however is meant to be a very fast lock, and shouldn't be opened
by VFS anymore. (I'm sure it could still happen, but we probably would
be more productive fixing those cases rather than spending too much time
on the locking code.)
If a Branch/WT lock gets out of sync, it should be a lot easier to
recover. The locks there seem a bit more advisory. (I'm pushing to this
branch, so don't try to push to it until I'm done.) Because the actual
mutations should be pretty close to atomic. (overwrite
.bzr/branch/
gz saying that we may not be able to detect other people's processes as
being alive is worrying.
Having the shared host be named "ubuntu" and your local host be named
"ubuntu" could be a problem. Would it be reasonable as a first step to
only support local host transports? (So we auto-unlock
file://
I specifically mention "ubuntu" because I believe all ubuntu installs
default to naming the machine "ubuntu" and someone has to actively
change the machine name. (May not be as true anymore.) I think some of
this stems from using a live-install where the machine needs a name
before actually installing.
What about a compromise, and auto-unlocking local objects only, until we
get something like the unique random id in .bazaar, etc. (Note that you
could still get bitten if someone copies their .bazaar folder...)
John
=:->
Vincent Ladeuil (vila) wrote:
>>>>> John Arbash Meinel <email address hidden> writes:
<snip/>
> Vila, you changed this to needs-review after approving it and sending it
> to pqm. Did it bounce?
Yes (mail forwarded) and poolie said on IRC he wanted more feedback on
it before landing anyway.
Robert Collins (lifeless) wrote:
On Mon, Jun 6, 2011 at 9:52 PM, Martin Pool <email address hidden> wrote:
> On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
>> The impact of a mistake could be pretty bad, so perhaps we should look
>> at something stronger before doing this?
>
> I wrote some background about other types of locking improvements we
> can do, but we probably all know that. This patch is just about
> trying to detect locks belonging to processes that are sure to be dead.
> As I said above, the main risk is that we'll see a lock actually from
> another machine and assume it's actually local. It is a little hard
> because on unix there is no obvious guaranteed-unique machine
> identifier.
>
> Things we could do to guard against that:
>
> 0- don't use this approach at all
> 1- put in the host default ip (also not guaranteed unique; likely to
> often be seen as 127.0.0.1; also arguably a privacy problem to send
> this to lp)
> 2- disable this on the smart server; punt back to the client to decide
> if the process is live or not (maybe a good idea anyhow)
> 3- make up a unique per-user nonce; store it in .bazaar; put it into
> lock files (not a perfect solution for home on NFS for instance)
> 4- don't automatically break it; just ask the user if they want to do
> so, or point out that it probably is dead
> 5- grab some other random data that might both be stable across
> multiple processes on a single machine, and across reboots, and also
> unique to machines (this does exist on windows, and might be available
> in Python)
Right. So I'm worried about one of these cases incorrectly happening
to Launchpad:
- Launchpad smart server 1 thinking smart server 2 is dead
- Launchpad smart server 1 thinking sftp client 2 is dead
- sftp client 1 thinking Launchpad smart server 2 is dead
- sftp client 1 thinking sftp client 2 is dead
However, these are sorted in what I think is most likely frequency of
use - that is we use the smart server a lot.
Having this go wrong on a repository would be pretty bad. Having it go
wrong on a branch (which has a greater window of opportunity per
John's mail) would probably be less destructive.
For the first case, where Launchpad is running both smart servers, we
can configure a smart server nonce so the servers can detect each
other, if thats how you'd like to prevent misdetected dead processes.
Or we can use a network aware lock inside the data centre (e.g. in
memcache/a db/ ...).
-Rob
Martin Pool (mbp) wrote:
On 7 June 2011 06:03, Robert Collins <email address hidden> wrote:
> Right. So I'm worried about one of these cases incorrectly happening
> to Launchpad:
> - Launchpad smart server 1 thinking smart server 2 is dead
> - Launchpad smart server 1 thinking sftp client 2 is dead
> - sftp client 1 thinking Launchpad smart server 2 is dead
> - sftp client 1 thinking sftp client 2 is dead
>
> However, these are sorted in what I think is most likely frequency of
> use - that is we use the smart server a lot.
>
> Having this go wrong on a repository would be pretty bad. Having it go
> wrong on a branch (which has a greater window of opportunity per
> John's mail) would probably be less destructive.
>
> For the first case, where Launchpad is running both smart servers, we
> can configure a smart server nonce so the servers can detect each
> other, if thats how you'd like to prevent misdetected dead processes.
>
> Or we can use a network aware lock inside the data centre (e.g. in
> memcache/a db/ ...).
When the smart server takes a lock, it puts its own (ie the server
side) hostname and pid into the lock file. (I checked in the source
and by taking a lock on lp.) So the first case should never happen
unless Canonical IS starts giving multiple servers the same name, or
there is some other kind of bizarre split-brain case.
hostname: crowberry
nonce: x4bgo6vp5iixo3o
pid: 16838
start_time: 1307421845
user: <email address hidden>
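The holder info shown above is plain `key: value` lines, so reading it back needs nothing more than the following sketch (bzrlib's real code goes through its rio serializer and the new `LockHeldInfo` class; this is just to show the shape of the data):

```python
def parse_lock_info(text):
    """Parse the 'key: value' lines of a lockdir info file into a dict."""
    info = {}
    for line in text.splitlines():
        if ':' in line:
            key, value = line.split(':', 1)
            info[key.strip()] = value.strip()
    return info


# The sample values come from the lock taken on Launchpad above.
sample = """\
hostname: crowberry
nonce: x4bgo6vp5iixo3o
pid: 16838
start_time: 1307421845
"""
```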
I don't think using a network aware lock is especially useful; backing
the whole repository storage on something other than a disk might be,
but I don't see much reason to move just the locks.
The second two cases can potentially happen if someone configures
their machine to have the same hostname as one of the Launchpad
servers, and they access the repo over sftp (or nosmart ssh). This is
probably unlikely considering how obscure Canonical's host names are
but it's possible.
I'd like to use the fqdn but I can't see an easy reliable way to get
it in Python. (It really wants to say localhost.
that's just making things worse.) We could put this in configuration
but that will take some work to roll out to lp.
The last case should only happen when someone has configured two
machines with the same hostname and they're both accessing the same
branch at the same time, which is kind of asking for trouble.
We need to bear in mind that at the moment people are encouraged to
break locks by hand and they may well make worse decisions than bzr
can.
One easy step would be to check the user name too; I'll try that now.
Andrew Bennetts (spiv) wrote:
Martin Pool wrote:
[...]
> We need to bear in mind that at the moment people are encouraged to
> break locks by hand and they may well make worse decisions than bzr
> can.
Especially in the presence of a smart server, where it's not clear to
users whether the server or the client “owns” the lock. Confusingly,
the client does (via the nonce handed back from the server) but it's the
server's details that appear in the lock info. I've certainly seen
users confused about this on IRC.
I suppose ideally the lock info generated by the server would indicate
“on behalf of bzr smart client from 1.2.3.4:9876” or something like
that. I think LP's server already tweaks BZR_EMAIL to at least use the
LP username rather than codehosting system user.
Martin Pool (mbp) wrote:
It does seem to use the user's email address which I guess comes from
the lp account.
Also, I think lock breaking at the moment is done over vfs and it
would be good to abstract that.
Martin
Martin Packman (gz) wrote:
> I do think gz's comment about access to kill vs access to get info is
> relevant. Specifically, if User A does the push, User B cannot kill
> their processes. Would that lead to us thinking the process is already
> dead, and unlocking things underneath them?
No, it should fail safe. In this circumstance, you get ERROR_ACCESS_DENIED which we take to mean the process may exist, so don't break the lock. That's just slightly worse information than if we can get a process handle.
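The fail-safe rule described here could look roughly like the following ctypes sketch. This is untested on Windows: the constants are standard WinAPI values, but whether to request PROCESS_TERMINATE or, as suggested above, a query-only right is still an open question, and ERROR_INVALID_PARAMETER is what OpenProcess is documented to set for a nonexistent pid:

```python
import sys

# Named constants, per John's review comment.
PROCESS_QUERY_INFORMATION = 0x0400
ERROR_ACCESS_DENIED = 5
ERROR_INVALID_PARAMETER = 87


def _win32_is_local_pid_dead(pid):
    """Windows sketch: dead only when OpenProcess says the pid is invalid."""
    import ctypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    handle = kernel32.OpenProcess(PROCESS_QUERY_INFORMATION, False, pid)
    if handle:
        kernel32.CloseHandle(handle)
        return False  # we got a handle, so the process exists
    err = ctypes.get_last_error()
    if err == ERROR_ACCESS_DENIED:
        return False  # process exists but isn't ours: fail safe
    return err == ERROR_INVALID_PARAMETER  # no such pid: dead


def is_local_pid_dead(pid):
    """Dispatch on platform; anywhere we cannot check, assume alive."""
    if sys.platform == 'win32':
        return _win32_is_local_pid_dead(pid)
    return False  # conservative fallback (posix would use os.kill instead)
```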
John A Meinel (jameinel) wrote:
I would like to see the constants in the win32 functions pre-defined as constants, so
PROCESS_TERMINATE = 1
ERROR_ACCESS_DENIED = 5
etc.
But that is a pretty trivial cleanup.
if self.get(
I wonder if we should handle "get()" returning None? what about "get('hostname'
Anyway, overall I think I've balanced enough to be okay with this. I think the checks Martin added are pretty strict, so they should be enough.
Martin Pool (mbp) wrote:
Based on the irc discussion, we might start this by just warning in these cases.
Martin Pool (mbp) wrote:
sent to pqm by email
Martin Pool (mbp) wrote:
sent to pqm by email
Martin Pool (mbp) wrote:
sent to pqm by email
Martin Pool (mbp) wrote:
sent to pqm by email
Preview Diff
1 | === modified file 'bzrlib/commit.py' |
2 | --- bzrlib/commit.py 2011-05-17 14:27:30 +0000 |
3 | +++ bzrlib/commit.py 2011-06-13 21:24:39 +0000 |
4 | @@ -52,7 +52,6 @@ |
5 | from bzrlib import ( |
6 | debug, |
7 | errors, |
8 | - revision, |
9 | trace, |
10 | tree, |
11 | ui, |
12 | |
13 | === modified file 'bzrlib/config.py' |
14 | --- bzrlib/config.py 2011-06-01 12:31:04 +0000 |
15 | +++ bzrlib/config.py 2011-06-13 21:24:39 +0000 |
16 | @@ -75,7 +75,6 @@ |
17 | import re |
18 | from cStringIO import StringIO |
19 | |
20 | -import bzrlib |
21 | from bzrlib import ( |
22 | atomicfile, |
23 | bzrdir, |
24 | @@ -372,16 +371,18 @@ |
25 | value = self._expand_options_in_string(value) |
26 | return value |
27 | |
28 | - def get_user_option_as_bool(self, option_name, expand=None): |
29 | - """Get a generic option as a boolean - no special process, no default. |
30 | + def get_user_option_as_bool(self, option_name, expand=None, default=None): |
31 | + """Get a generic option as a boolean. |
32 | |
33 | + :param expand: Allow expanding references to other config values. |
34 | + :param default: Default value if nothing is configured |
35 | :return None if the option doesn't exist or its value can't be |
36 | interpreted as a boolean. Returns True or False otherwise. |
37 | """ |
38 | s = self.get_user_option(option_name, expand=expand) |
39 | if s is None: |
40 | # The option doesn't exist |
41 | - return None |
42 | + return default |
43 | val = ui.bool_from_string(s) |
44 | if val is None: |
45 | # The value can't be interpreted as a boolean |
46 | @@ -2701,6 +2702,8 @@ |
47 | ' the configuration file'), |
48 | ] |
49 | |
50 | + _see_also = ['configuration'] |
51 | + |
52 | @commands.display_command |
53 | def run(self, name=None, all=False, directory=None, scope=None, |
54 | remove=False): |
55 | |
56 | === modified file 'bzrlib/help_topics/en/configuration.txt' |
57 | --- bzrlib/help_topics/en/configuration.txt 2011-05-24 14:40:29 +0000 |
58 | +++ bzrlib/help_topics/en/configuration.txt 2011-06-13 21:24:39 +0000 |
59 | @@ -462,6 +462,14 @@ |
60 | These settings are only needed if the SMTP server requires authentication |
61 | to send mail. |
62 | |
63 | +locks.steal_dead |
64 | +~~~~~~~~~~~~~~~~ |
65 | + |
66 | +If set to true, bzr will automatically break locks held by processes from |
67 | +the same machine and user that are no longer alive. Otherwise, it will |
68 | +print a message and you can break the lock manually, if you are satisfied |
69 | +the object is no longer in use. |
70 | + |
71 | mail_client |
72 | ~~~~~~~~~~~ |
73 | |
74 | |
75 | === modified file 'bzrlib/lockdir.py' |
76 | --- bzrlib/lockdir.py 2011-05-16 13:39:39 +0000 |
77 | +++ bzrlib/lockdir.py 2011-06-13 21:24:39 +0000 |
78 | @@ -92,6 +92,10 @@ |
79 | >>> # do something here |
80 | >>> l.unlock() |
81 | |
82 | +Some classes of stale locks can be predicted by checking: the host name is the |
83 | +same as the local host name; the user name is the same as the local user; the |
84 | +process id no longer exists. The check on user name is not strictly necessary |
85 | +but helps protect against colliding host names. |
86 | """ |
87 | |
88 | |
89 | @@ -103,16 +107,19 @@ |
90 | # the existing locking code and needs a new format of the containing object. |
91 | # -- robertc, mbp 20070628 |
92 | |
93 | +import errno |
94 | import os |
95 | import time |
96 | |
97 | from bzrlib import ( |
98 | + config, |
99 | debug, |
100 | errors, |
101 | lock, |
102 | osutils, |
103 | + ui, |
104 | + urlutils, |
105 | ) |
106 | -import bzrlib.config |
107 | from bzrlib.decorators import only_raises |
108 | from bzrlib.errors import ( |
109 | DirectoryNotEmpty, |
110 | @@ -130,7 +137,6 @@ |
111 | ) |
112 | from bzrlib.trace import mutter, note |
113 | from bzrlib.osutils import format_delta, rand_chars, get_host_name |
114 | -import bzrlib.ui |
115 | |
116 | from bzrlib.lazy_import import lazy_import |
117 | lazy_import(globals(), """ |
118 | @@ -162,7 +168,8 @@ |
119 | |
120 | __INFO_NAME = '/info' |
121 | |
122 | - def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755): |
123 | + def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755, |
124 | + extra_holder_info=None): |
125 | """Create a new LockDir object. |
126 | |
127 | The LockDir is initially unlocked - this just creates the object. |
128 | @@ -171,6 +178,10 @@ |
129 | |
130 | :param path: Path to the lock within the base directory of the |
131 | transport. |
132 | + |
133 | + :param extra_holder_info: If passed, {str:str} dict of extra or |
134 | + updated information to insert into the info file when the lock is |
135 | + taken. |
136 | """ |
137 | self.transport = transport |
138 | self.path = path |
139 | @@ -181,8 +192,9 @@ |
140 | self._held_info_path = self._held_dir + self.__INFO_NAME |
141 | self._file_modebits = file_modebits |
142 | self._dir_modebits = dir_modebits |
143 | - |
144 | self._report_function = note |
145 | + self.extra_holder_info = extra_holder_info |
146 | + self._warned_about_lock_holder = None |
147 | |
148 | def __repr__(self): |
149 | return '%s(%s%s)' % (self.__class__.__name__, |
150 | @@ -203,7 +215,6 @@ |
151 | except (TransportError, PathError), e: |
152 | raise LockFailed(self, e) |
153 | |
154 | - |
155 | def _attempt_lock(self): |
156 | """Make the pending directory and attempt to rename into place. |
157 | |
158 | @@ -216,8 +227,8 @@ |
159 | |
160 | :returns: The nonce of the lock, if it was successfully acquired. |
161 | |
162 | - :raises LockContention: If the lock is held by someone else. The exception |
163 | - contains the info of the current holder of the lock. |
164 | + :raises LockContention: If the lock is held by someone else. The |
165 | + exception contains the info of the current holder of the lock. |
166 | """ |
167 | self._trace("lock_write...") |
168 | start_time = time.time() |
169 | @@ -226,17 +237,24 @@ |
170 | except (errors.TransportError, PathError), e: |
171 | self._trace("... failed to create pending dir, %s", e) |
172 | raise LockFailed(self, e) |
173 | - try: |
174 | - self.transport.rename(tmpname, self._held_dir) |
175 | - except (errors.TransportError, PathError, DirectoryNotEmpty, |
176 | - FileExists, ResourceBusy), e: |
177 | - self._trace("... contention, %s", e) |
178 | - self._remove_pending_dir(tmpname) |
179 | - raise LockContention(self) |
180 | - except Exception, e: |
181 | - self._trace("... lock failed, %s", e) |
182 | - self._remove_pending_dir(tmpname) |
183 | - raise |
184 | + while True: |
185 | + try: |
186 | + self.transport.rename(tmpname, self._held_dir) |
187 | + break |
188 | + except (errors.TransportError, PathError, DirectoryNotEmpty, |
189 | + FileExists, ResourceBusy), e: |
190 | + self._trace("... contention, %s", e) |
191 | + other_holder = self.peek() |
192 | + self._trace("other holder is %r" % other_holder) |
193 | + try: |
194 | + self._handle_lock_contention(other_holder) |
195 | + except: |
196 | + self._remove_pending_dir(tmpname) |
197 | + raise |
198 | + except Exception, e: |
199 | + self._trace("... lock failed, %s", e) |
200 | + self._remove_pending_dir(tmpname) |
201 | + raise |
202 | # We must check we really got the lock, because Launchpad's sftp |
203 | # server at one time had a bug were the rename would successfully |
204 | # move the new directory into the existing directory, which was |
205 | @@ -262,6 +280,33 @@ |
206 | (time.time() - start_time) * 1000) |
207 | return self.nonce |
208 | |
209 | + def _handle_lock_contention(self, other_holder): |
210 | + """A lock we want to take is held by someone else. |
211 | + |
212 | + This function can: tell the user about it; possibly detect that it's |
213 | + safe or appropriate to steal the lock, or just raise an exception. |
214 | + |
215 | + If this function returns (without raising an exception) the lock will |
216 | + be attempted again. |
217 | + |
218 | + :param other_holder: A LockHeldInfo for the current holder; note that |
219 | + it might be None if the lock can be seen to be held but the info |
220 | + can't be read. |
221 | + """ |
222 | + if (other_holder is not None): |
223 | + if other_holder.is_lock_holder_known_dead(): |
224 | + if self.get_config().get_user_option_as_bool( |
225 | + 'locks.steal_dead', |
226 | + default=False): |
227 | + ui.ui_factory.show_user_warning( |
228 | + 'locks_steal_dead', |
229 | + lock_url=urlutils.join(self.transport.base, self.path), |
230 | + other_holder_info=unicode(other_holder)) |
231 | + self.force_break(other_holder) |
232 | + self._trace("stole lock from dead holder") |
233 | + return |
234 | + raise LockContention(self) |
235 | + |
236 | def _remove_pending_dir(self, tmpname): |
237 | """Remove the pending directory |
238 | |
239 | @@ -287,14 +332,14 @@ |
240 | self.create(mode=self._dir_modebits) |
241 | # After creating the lock directory, try again |
242 | self.transport.mkdir(tmpname) |
243 | - self.nonce = rand_chars(20) |
244 | - info_bytes = self._prepare_info() |
245 | + info = LockHeldInfo.for_this_process(self.extra_holder_info) |
246 | + self.nonce = info.get('nonce') |
247 | # We use put_file_non_atomic because we just created a new unique |
248 | # directory so we don't have to worry about files existing there. |
249 | # We'll rename the whole directory into place to get atomic |
250 | # properties |
251 | self.transport.put_bytes_non_atomic(tmpname + self.__INFO_NAME, |
252 | - info_bytes) |
253 | + info.to_bytes()) |
254 | return tmpname |
255 | |
256 | @only_raises(LockNotHeld, LockBroken) |
257 | @@ -344,9 +389,10 @@ |
258 | def break_lock(self): |
259 | """Break a lock not held by this instance of LockDir. |
260 | |
261 | - This is a UI centric function: it uses the bzrlib.ui.ui_factory to |
262 | + This is a UI centric function: it uses the ui.ui_factory to |
263 | prompt for input if a lock is detected and there is any doubt about |
264 | - it possibly being still active. |
265 | + it possibly being still active. force_break is the non-interactive |
266 | + version. |
267 | |
268 | :returns: LockResult for the broken lock. |
269 | """ |
270 | @@ -355,16 +401,16 @@ |
271 | holder_info = self.peek() |
272 | except LockCorrupt, e: |
273 | # The lock info is corrupt. |
274 | - if bzrlib.ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)): |
275 | + if ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)): |
276 | self.force_break_corrupt(e.file_data) |
277 | return |
278 | if holder_info is not None: |
279 | - lock_info = '\n'.join(self._format_lock_info(holder_info)) |
280 | - if bzrlib.ui.ui_factory.confirm_action( |
281 | - "Break %(lock_info)s", 'bzrlib.lockdir.break', |
282 | - dict(lock_info=lock_info)): |
283 | + if ui.ui_factory.confirm_action( |
284 | + u"Break %(lock_info)s", |
285 | + 'bzrlib.lockdir.break', |
286 | + dict(lock_info=unicode(holder_info))): |
287 | result = self.force_break(holder_info) |
288 | - bzrlib.ui.ui_factory.show_message( |
289 | + ui.ui_factory.show_message( |
290 | "Broke lock %s" % result.lock_url) |
291 | |
292 | def force_break(self, dead_holder_info): |
293 | @@ -374,18 +420,19 @@ |
294 | it still thinks it has the lock there will be two concurrent writers. |
295 | In general the user's approval should be sought for lock breaks. |
296 | |
297 | - dead_holder_info must be the result of a previous LockDir.peek() call; |
298 | - this is used to check that it's still held by the same process that |
299 | - the user decided was dead. If this is not the current holder, |
300 | - LockBreakMismatch is raised. |
301 | - |
302 | After the lock is broken it will not be held by any process. |
303 | It is possible that another process may sneak in and take the |
304 | lock before the breaking process acquires it. |
305 | |
306 | + :param dead_holder_info: |
307 | + Must be the result of a previous LockDir.peek() call; this is used |
308 | + to check that it's still held by the same process that the user |
309 | + decided was dead. If this is not the current holder, |
310 | + LockBreakMismatch is raised. |
311 | + |
312 | :returns: LockResult for the broken lock. |
313 | """ |
314 | - if not isinstance(dead_holder_info, dict): |
315 | + if not isinstance(dead_holder_info, LockHeldInfo): |
316 | raise ValueError("dead_holder_info: %r" % dead_holder_info) |
317 | self._check_not_locked() |
318 | current_info = self.peek() |
319 | @@ -413,10 +460,10 @@ |
320 | |
321 | def force_break_corrupt(self, corrupt_info_lines): |
322 | """Release a lock that has been corrupted. |
323 | - |
324 | + |
325 | This is very similar to force_break, it except it doesn't assume that |
326 | self.peek() can work. |
327 | - |
328 | + |
329 | :param corrupt_info_lines: the lines of the corrupted info file, used |
330 | to check that the lock hasn't changed between reading the (corrupt) |
331 | info file and calling force_break_corrupt. |
332 | @@ -470,7 +517,8 @@ |
333 | |
334 | peek() reads the info file of the lock holder, if any. |
335 | """ |
336 | - return self._parse_info(self.transport.get_bytes(path)) |
337 | + return LockHeldInfo.from_info_file_bytes( |
338 | + self.transport.get_bytes(path)) |
339 | |
340 | def peek(self): |
341 | """Check if the lock is held by anyone. |
342 | @@ -489,35 +537,6 @@ |
343 | def _prepare_info(self): |
344 | """Write information about a pending lock to a temporary file. |
345 | """ |
346 | - # XXX: is creating this here inefficient? |
347 | - config = bzrlib.config.GlobalConfig() |
348 | - try: |
349 | - user = config.username() |
350 | - except errors.NoWhoami: |
351 | - user = osutils.getuser_unicode() |
352 | - s = rio.Stanza(hostname=get_host_name(), |
353 | - pid=str(os.getpid()), |
354 | - start_time=str(int(time.time())), |
355 | - nonce=self.nonce, |
356 | - user=user, |
357 | - ) |
358 | - return s.to_string() |
359 | - |
360 | - def _parse_info(self, info_bytes): |
361 | - lines = osutils.split_lines(info_bytes) |
362 | - try: |
363 | - stanza = rio.read_stanza(lines) |
364 | - except ValueError, e: |
365 | - mutter('Corrupt lock info file: %r', lines) |
366 | - raise LockCorrupt("could not parse lock info file: " + str(e), |
367 | - lines) |
368 | - if stanza is None: |
369 | - # see bug 185013; we fairly often end up with the info file being |
370 | - # empty after an interruption; we could log a message here but |
371 | - # there may not be much we can say |
372 | - return {} |
373 | - else: |
374 | - return stanza.as_dict() |
375 | |
376 | def attempt_lock(self): |
377 | """Take the lock; fail if it's already held. |
378 | @@ -598,24 +617,16 @@ |
379 | else: |
380 | start = 'Lock owner changed for' |
381 | last_info = new_info |
382 | - formatted_info = self._format_lock_info(new_info) |
383 | + msg = u'%s lock %s %s.' % (start, lock_url, new_info) |
384 | if deadline_str is None: |
385 | deadline_str = time.strftime('%H:%M:%S', |
386 | - time.localtime(deadline)) |
387 | - user, hostname, pid, time_ago = formatted_info |
388 | - msg = ('%s lock %s ' # lock_url |
389 | - 'held by ' # start |
390 | - '%s\n' # user |
391 | - 'at %s ' # hostname |
392 | - '[process #%s], ' # pid |
393 | - 'acquired %s.') # time ago |
394 | - msg_args = [start, lock_url, user, hostname, pid, time_ago] |
395 | + time.localtime(deadline)) |
396 | if timeout > 0: |
397 | msg += ('\nWill continue to try until %s, unless ' |
398 | - 'you press Ctrl-C.') |
399 | - msg_args.append(deadline_str) |
400 | + 'you press Ctrl-C.' |
401 | + % deadline_str) |
402 | msg += '\nSee "bzr help break-lock" for more.' |
403 | - self._report_function(msg, *msg_args) |
404 | + self._report_function(msg) |
405 | if (max_attempts is not None) and (attempt_count >= max_attempts): |
406 | self._trace("exceeded %d attempts") |
407 | raise LockContention(self) |
408 | @@ -676,23 +687,6 @@ |
409 | raise LockContention(self) |
410 | self._fake_read_lock = True |
411 | |
412 | - def _format_lock_info(self, info): |
413 | - """Turn the contents of peek() into something for the user""" |
414 | - start_time = info.get('start_time') |
415 | - if start_time is None: |
416 | - time_ago = '(unknown)' |
417 | - else: |
418 | - time_ago = format_delta(time.time() - int(info['start_time'])) |
419 | - user = info.get('user', '<unknown>') |
420 | - hostname = info.get('hostname', '<unknown>') |
421 | - pid = info.get('pid', '<unknown>') |
422 | - return [ |
423 | - user, |
424 | - hostname, |
425 | - pid, |
426 | - time_ago, |
427 | - ] |
428 | - |
429 | def validate_token(self, token): |
430 | if token is not None: |
431 | info = self.peek() |
432 | @@ -710,3 +704,158 @@ |
433 | if 'lock' not in debug.debug_flags: |
434 | return |
435 | mutter(str(self) + ": " + (format % args)) |
436 | + |
437 | + def get_config(self): |
438 | + """Get the configuration that governs this lockdir.""" |
439 | + # XXX: This really should also use the locationconfig at least, but |
440 | + # that seems a bit hard to hook up at the moment. -- mbp 20110329 |
441 | + return config.GlobalConfig() |
442 | + |
443 | + |
444 | +class LockHeldInfo(object): |
445 | + """The information recorded about a held lock. |
446 | + |
447 | + This information is recorded into the lock when it's taken, and it can be |
448 | + read back by any process with access to the lockdir. It can be used, for |
449 | + example, to tell the user who holds the lock, or to try to detect whether |
450 | + the lock holder is still alive. |
451 | + |
452 | + Prior to bzr 2.4 a simple dict was used instead of an object. |
453 | + """ |
454 | + |
455 | + def __init__(self, info_dict): |
456 | + self.info_dict = info_dict |
457 | + |
458 | + def __repr__(self): |
459 | + """Return a debugging representation of this object.""" |
460 | + return "%s(%r)" % (self.__class__.__name__, self.info_dict) |
461 | + |
462 | + def __unicode__(self): |
463 | + """Return a user-oriented description of this object.""" |
464 | + d = self.to_readable_dict() |
465 | + return ( |
466 | + u'held by %(user)s on %(hostname)s (process #%(pid)s), ' |
467 | + u'acquired %(time_ago)s' % d) |
468 | + |
469 | + def to_readable_dict(self): |
470 | + """Turn the holder info into a dict of human-readable attributes. |
471 | + |
472 | + For example, the start time is presented relative to the current time, |
473 | + rather than as seconds since the epoch. |
474 | + |
475 | + Returns a dict mapping user, hostname, pid and time_ago to |
476 | + readable strings. |
477 | + """ |
478 | + start_time = self.info_dict.get('start_time') |
479 | + if start_time is None: |
480 | + time_ago = '(unknown)' |
481 | + else: |
482 | + time_ago = format_delta( |
483 | + time.time() - int(self.info_dict['start_time'])) |
484 | + user = self.info_dict.get('user', '<unknown>') |
485 | + hostname = self.info_dict.get('hostname', '<unknown>') |
486 | + pid = self.info_dict.get('pid', '<unknown>') |
487 | + return dict( |
488 | + user=user, |
489 | + hostname=hostname, |
490 | + pid=pid, |
491 | + time_ago=time_ago) |
492 | + |
493 | + def get(self, field_name): |
494 | + """Return the contents of a field from the lock info, or None.""" |
495 | + return self.info_dict.get(field_name) |
496 | + |
497 | + @classmethod |
498 | + def for_this_process(cls, extra_holder_info): |
499 | + """Return a new LockHeldInfo for a lock taken by this process. |
500 | + """ |
501 | + info = dict( |
502 | + hostname=get_host_name(), |
503 | + pid=str(os.getpid()), |
504 | + nonce=rand_chars(20), |
505 | + start_time=str(int(time.time())), |
506 | + user=get_username_for_lock_info(), |
507 | + ) |
508 | + if extra_holder_info is not None: |
509 | + info.update(extra_holder_info) |
510 | + return cls(info) |
511 | + |
512 | + def to_bytes(self): |
513 | + s = rio.Stanza(**self.info_dict) |
514 | + return s.to_string() |
515 | + |
516 | + @classmethod |
517 | + def from_info_file_bytes(cls, info_file_bytes): |
518 | + """Construct from the contents of the held file.""" |
519 | + lines = osutils.split_lines(info_file_bytes) |
520 | + try: |
521 | + stanza = rio.read_stanza(lines) |
522 | + except ValueError, e: |
523 | + mutter('Corrupt lock info file: %r', lines) |
524 | + raise LockCorrupt("could not parse lock info file: " + str(e), |
525 | + lines) |
526 | + if stanza is None: |
527 | + # see bug 185013; we fairly often end up with the info file being |
528 | + # empty after an interruption; we could log a message here but |
529 | + # there may not be much we can say |
530 | + return cls({}) |
531 | + else: |
532 | + return cls(stanza.as_dict()) |
533 | + |
534 | + def __cmp__(self, other): |
535 | + """Value comparison of lock holders.""" |
536 | + return ( |
537 | + cmp(type(self), type(other)) |
538 | + or cmp(self.info_dict, other.info_dict)) |
539 | + |
540 | + def is_locked_by_this_process(self): |
541 | + """True if this process seems to be the current lock holder.""" |
542 | + return ( |
543 | + self.get('hostname') == get_host_name() |
544 | + and self.get('pid') == str(os.getpid()) |
545 | + and self.get('user') == get_username_for_lock_info()) |
546 | + |
547 | + def is_lock_holder_known_dead(self): |
548 | + """True if the lock holder process is known to be dead. |
549 | + |
550 | + False if it's either known to be still alive, or if we just can't tell. |
551 | + |
552 | + We can be fairly sure the lock holder is dead if it declared the same |
553 | + hostname and there is no process with the given pid alive. If people |
554 | + have multiple machines with the same hostname this may cause trouble. |
555 | + |
556 | + This doesn't check whether the lock holder is in fact the same process |
557 | + calling this method. (In that case it will return False, as the process is alive.) |
558 | + """ |
559 | + if self.get('hostname') != get_host_name(): |
560 | + return False |
561 | + if self.get('hostname') == 'localhost': |
562 | + # Too ambiguous. |
563 | + return False |
564 | + if self.get('user') != get_username_for_lock_info(): |
565 | + # Could well be another local process by a different user, but |
566 | + # just to be safe we won't conclude about this either. |
567 | + return False |
568 | + pid_str = self.info_dict.get('pid', None) |
569 | + if not pid_str: |
570 | + mutter("no pid recorded in %r" % (self, )) |
571 | + return False |
572 | + try: |
573 | + pid = int(pid_str) |
574 | + except ValueError: |
575 | + mutter("can't parse pid %r from %r" |
576 | + % (pid_str, self)) |
577 | + return False |
578 | + return osutils.is_local_pid_dead(pid) |
579 | + |
580 | + |
581 | +def get_username_for_lock_info(): |
582 | + """Get a username suitable for putting into a lock. |
583 | + |
584 | + It's ok if what's written here is not a proper email address as long |
585 | + as it gives some clue who the user is. |
586 | + """ |
587 | + try: |
588 | + return config.GlobalConfig().username() |
589 | + except errors.NoWhoami: |
590 | + return osutils.getuser_unicode() |
591 | |
592 | === modified file 'bzrlib/osutils.py' |
593 | --- bzrlib/osutils.py 2011-05-26 08:05:45 +0000 |
594 | +++ bzrlib/osutils.py 2011-06-13 21:24:39 +0000 |
595 | @@ -2448,3 +2448,30 @@ |
596 | if os.access(f, os.X_OK): |
597 | return f |
598 | return None |
599 | + |
600 | + |
601 | +def _posix_is_local_pid_dead(pid): |
602 | + """True if pid doesn't correspond to live process on this machine""" |
603 | + try: |
604 | + # Special meaning of unix kill: just check if it's there. |
605 | + os.kill(pid, 0) |
606 | + except OSError, e: |
607 | + if e.errno == errno.ESRCH: |
608 | + # On this machine, and really not found: as sure as we can be |
609 | + # that it's dead. |
610 | + return True |
611 | + elif e.errno == errno.EPERM: |
612 | + # exists, though not ours |
613 | + return False |
614 | + else: |
615 | + mutter("os.kill(%d, 0) failed: %s" % (pid, e)) |
616 | + # Don't really know. |
617 | + return False |
618 | + else: |
619 | + # Exists and our process: not dead. |
620 | + return False |
621 | + |
622 | +if sys.platform == "win32": |
623 | + is_local_pid_dead = win32utils.is_local_pid_dead |
624 | +else: |
625 | + is_local_pid_dead = _posix_is_local_pid_dead |
626 | |
627 | === modified file 'bzrlib/tests/__init__.py' |
628 | --- bzrlib/tests/__init__.py 2011-06-09 15:13:54 +0000 |
629 | +++ bzrlib/tests/__init__.py 2011-06-13 21:24:39 +0000 |
630 | @@ -1070,9 +1070,13 @@ |
631 | # break some locks on purpose and should be taken into account by |
632 | # considering that breaking a lock is just a dirty way of releasing it. |
633 | if len(acquired_locks) != (len(released_locks) + len(broken_locks)): |
634 | - message = ('Different number of acquired and ' |
635 | - 'released or broken locks. (%s, %s + %s)' % |
636 | - (acquired_locks, released_locks, broken_locks)) |
637 | + message = ( |
638 | + 'Different number of acquired and ' |
639 | + 'released or broken locks.\n' |
640 | + 'acquired=%s\n' |
641 | + 'released=%s\n' |
642 | + 'broken=%s\n' % |
643 | + (acquired_locks, released_locks, broken_locks)) |
644 | if not self._lock_check_thorough: |
645 | # Rather than fail, just warn |
646 | print "Broken test %s: %s" % (self, message) |
647 | |
648 | === modified file 'bzrlib/tests/test_lockdir.py' |
649 | --- bzrlib/tests/test_lockdir.py 2011-05-13 12:51:05 +0000 |
650 | +++ bzrlib/tests/test_lockdir.py 2011-06-13 21:24:39 +0000 |
651 | @@ -17,6 +17,7 @@ |
652 | """Tests for LockDir""" |
653 | |
654 | import os |
655 | +import sys |
656 | import time |
657 | |
658 | import bzrlib |
659 | @@ -24,6 +25,7 @@ |
660 | config, |
661 | errors, |
662 | lock, |
663 | + lockdir, |
664 | osutils, |
665 | tests, |
666 | transport, |
667 | @@ -35,9 +37,13 @@ |
668 | LockFailed, |
669 | LockNotHeld, |
670 | ) |
671 | -from bzrlib.lockdir import LockDir |
672 | +from bzrlib.lockdir import ( |
673 | + LockDir, |
674 | + LockHeldInfo, |
675 | + ) |
676 | from bzrlib.tests import ( |
677 | features, |
678 | + TestCase, |
679 | TestCaseWithTransport, |
680 | ) |
681 | from bzrlib.trace import note |
682 | @@ -48,6 +54,7 @@ |
683 | # implementation are tested separately. (The main requirement is just that |
684 | # they don't allow overwriting nonempty directories.) |
685 | |
686 | + |
687 | class TestLockDir(TestCaseWithTransport): |
688 | """Test LockDir operations""" |
689 | |
690 | @@ -141,8 +148,8 @@ |
691 | self.addCleanup(lf1.unlock) |
692 | # lock is held, should get some info on it |
693 | info1 = lf1.peek() |
694 | - self.assertEqual(set(info1.keys()), |
695 | - set(['user', 'nonce', 'hostname', 'pid', 'start_time'])) |
696 | + self.assertEqual(set(info1.info_dict.keys()), |
697 | + set(['user', 'nonce', 'hostname', 'pid', 'start_time'])) |
698 | # should get the same info if we look at it through a different |
699 | # instance |
700 | info2 = LockDir(t, 'test_lock').peek() |
701 | @@ -161,7 +168,7 @@ |
702 | self.addCleanup(lf1.unlock) |
703 | info2 = lf2.peek() |
704 | self.assertTrue(info2) |
705 | - self.assertEqual(info2['nonce'], lf1.nonce) |
706 | + self.assertEqual(info2.get('nonce'), lf1.nonce) |
707 | |
708 | def test_30_lock_wait_fail(self): |
709 | """Wait on a lock, then fail |
710 | @@ -184,24 +191,16 @@ |
711 | # it should only take about 0.4 seconds, but we allow more time in |
712 | # case the machine is heavily loaded |
713 | self.assertTrue(after - before <= 8.0, |
714 | - "took %f seconds to detect lock contention" % (after - before)) |
715 | + "took %f seconds to detect lock contention" % (after - before)) |
716 | finally: |
717 | lf1.unlock() |
718 | self.assertEqual(1, len(self._logged_reports)) |
719 | - self.assertEqual(self._logged_reports[0][0], |
720 | - '%s lock %s held by %s\n' |
721 | - 'at %s [process #%s], acquired %s.\n' |
722 | - 'Will continue to try until %s, unless ' |
723 | - 'you press Ctrl-C.\n' |
724 | - 'See "bzr help break-lock" for more.') |
725 | - start, lock_url, user, hostname, pid, time_ago, deadline_str = \ |
726 | - self._logged_reports[0][1] |
727 | - self.assertEqual(start, u'Unable to obtain') |
728 | - self.assertEqual(user, u'jrandom@example.com') |
729 | - # skip hostname |
730 | - self.assertContainsRe(pid, r'\d+') |
731 | - self.assertContainsRe(time_ago, r'.* ago') |
732 | - self.assertContainsRe(deadline_str, r'\d{2}:\d{2}:\d{2}') |
733 | + self.assertContainsRe(self._logged_reports[0][0], |
734 | + r'Unable to obtain lock .* held by jrandom@example\.com on .*' |
735 | + r' \(process #\d+\), acquired .* ago\.\n' |
736 | + r'Will continue to try until \d{2}:\d{2}:\d{2}, unless ' |
737 | + r'you press Ctrl-C.\n' |
738 | + r'See "bzr help break-lock" for more.') |
739 | |
740 | def test_31_lock_wait_easy(self): |
741 | """Succeed when waiting on a lock with no contention. |
742 | @@ -349,12 +348,15 @@ |
743 | ld.create() |
744 | ld.lock_write() |
745 | ld.transport.put_bytes_non_atomic('test_lock/held/info', '\0') |
746 | + |
747 | class LoggingUIFactory(bzrlib.ui.SilentUIFactory): |
748 | def __init__(self): |
749 | self.prompts = [] |
750 | + |
751 | def get_boolean(self, prompt): |
752 | self.prompts.append(('boolean', prompt)) |
753 | return True |
754 | + |
755 | ui = LoggingUIFactory() |
756 | self.overrideAttr(bzrlib.ui, 'ui_factory', ui) |
757 | ld2.break_lock() |
758 | @@ -372,12 +374,15 @@ |
759 | ld.create() |
760 | ld.lock_write() |
761 | ld.transport.delete('test_lock/held/info') |
762 | + |
763 | class LoggingUIFactory(bzrlib.ui.SilentUIFactory): |
764 | def __init__(self): |
765 | self.prompts = [] |
766 | + |
767 | def get_boolean(self, prompt): |
768 | self.prompts.append(('boolean', prompt)) |
769 | return True |
770 | + |
771 | ui = LoggingUIFactory() |
772 | orig_factory = bzrlib.ui.ui_factory |
773 | bzrlib.ui.ui_factory = ui |
774 | @@ -416,18 +421,17 @@ |
775 | lf1.unlock() |
776 | self.assertFalse(t.has('test_lock/held/info')) |
777 | |
778 | - def test__format_lock_info(self): |
779 | + def test_display_form(self): |
780 | ld1 = self.get_lock() |
781 | ld1.create() |
782 | ld1.lock_write() |
783 | try: |
784 | - info_list = ld1._format_lock_info(ld1.peek()) |
785 | + info_list = ld1.peek().to_readable_dict() |
786 | finally: |
787 | ld1.unlock() |
788 | - self.assertEqual(info_list[0], u'jrandom@example.com') |
789 | - # info_list[1] is hostname. we skip this. |
790 | - self.assertContainsRe(info_list[2], '^\d+$') # pid |
791 | - self.assertContainsRe(info_list[3], r'^\d+ seconds? ago$') # time_ago |
792 | + self.assertEqual(info_list['user'], u'jrandom@example.com') |
793 | + self.assertContainsRe(info_list['pid'], '^\d+$') |
794 | + self.assertContainsRe(info_list['time_ago'], r'^\d+ seconds? ago$') |
795 | |
796 | def test_lock_without_email(self): |
797 | global_config = config.GlobalConfig() |
798 | @@ -481,8 +485,10 @@ |
799 | # should be nothing before we start |
800 | ld1.create() |
801 | t = self.get_transport().clone('test_lock') |
802 | + |
803 | def check_dir(a): |
804 | self.assertEquals(a, t.list_dir('.')) |
805 | + |
806 | check_dir([]) |
807 | # when held, that's all we see |
808 | ld1.attempt_lock() |
809 | @@ -505,9 +511,10 @@ |
810 | t.put_bytes('test_lock/held/info', '') |
811 | lf = LockDir(t, 'test_lock') |
812 | info = lf.peek() |
813 | - formatted_info = lf._format_lock_info(info) |
814 | + formatted_info = info.to_readable_dict() |
815 | self.assertEquals( |
816 | - ['<unknown>', '<unknown>', '<unknown>', '(unknown)'], |
817 | + dict(user='<unknown>', hostname='<unknown>', pid='<unknown>', |
818 | + time_ago='(unknown)'), |
819 | formatted_info) |
820 | |
821 | def test_corrupt_lockdir_info(self): |
822 | @@ -646,3 +653,108 @@ |
823 | ld2.force_break(holder_info) |
824 | lock_path = ld.transport.abspath(ld.path) |
825 | self.assertEqual([], self._calls) |
826 | + |
827 | + |
828 | +class TestLockHeldInfo(TestCase): |
829 | + """Can get information about the lock holder, and detect whether they're |
830 | + still alive.""" |
831 | + |
832 | + def test_repr(self): |
833 | + info = LockHeldInfo.for_this_process(None) |
834 | + self.assertContainsRe(repr(info), r"LockHeldInfo\(.*\)") |
835 | + |
836 | + def test_unicode(self): |
837 | + info = LockHeldInfo.for_this_process(None) |
838 | + self.assertContainsRe(unicode(info), |
839 | + r'held by .* on .* \(process #\d+\), acquired .* ago') |
840 | + |
841 | + def test_is_locked_by_this_process(self): |
842 | + info = LockHeldInfo.for_this_process(None) |
843 | + self.assertTrue(info.is_locked_by_this_process()) |
844 | + |
845 | + def test_is_not_locked_by_this_process(self): |
846 | + info = LockHeldInfo.for_this_process(None) |
847 | + info.info_dict['pid'] = '123123123123123' |
848 | + self.assertFalse(info.is_locked_by_this_process()) |
849 | + |
850 | + def test_lock_holder_live_process(self): |
851 | + """Detect that the holder (this process) is still running.""" |
852 | + info = LockHeldInfo.for_this_process(None) |
853 | + self.assertFalse(info.is_lock_holder_known_dead()) |
854 | + |
855 | + def test_lock_holder_dead_process(self): |
856 | + """Detect that the holder (this process) is still running.""" |
857 | + info = LockHeldInfo.for_this_process(None) |
858 | + info.info_dict['pid'] = '123123123' |
859 | + if sys.platform == 'win32': |
860 | + self.knownFailure( |
861 | + 'live lock holder detection not implemented yet on win32') |
862 | + self.assertTrue(info.is_lock_holder_known_dead()) |
863 | + |
864 | + def test_lock_holder_other_machine(self): |
865 | + """The lock holder isn't here so we don't know if they're alive.""" |
866 | + info = LockHeldInfo.for_this_process(None) |
867 | + info.info_dict['hostname'] = 'egg.example.com' |
868 | + info.info_dict['pid'] = '123123123' |
869 | + self.assertFalse(info.is_lock_holder_known_dead()) |
870 | + |
871 | + def test_lock_holder_other_user(self): |
872 | + """Only auto-break locks held by this user.""" |
873 | + info = LockHeldInfo.for_this_process(None) |
874 | + info.info_dict['user'] = 'notme@example.com' |
875 | + info.info_dict['pid'] = '123123123' |
876 | + self.assertFalse(info.is_lock_holder_known_dead()) |
877 | + |
878 | + def test_no_good_hostname(self): |
879 | + """Correctly handle ambiguous hostnames. |
880 | + |
881 | + If the lock's recorded with just 'localhost' we can't really trust |
882 | + it's the same 'localhost'. (There are quite a few of them. :-) |
883 | + So even if the process is known not to be alive, we can't say that's |
884 | + known for sure. |
885 | + """ |
886 | + self.overrideAttr(lockdir, 'get_host_name', |
887 | + lambda: 'localhost') |
888 | + info = LockHeldInfo.for_this_process(None) |
889 | + info.info_dict['pid'] = '123123123' |
890 | + self.assertFalse(info.is_lock_holder_known_dead()) |
891 | + |
892 | + |
893 | +class TestStaleLockDir(TestCaseWithTransport): |
894 | + """Can automatically break stale locks. |
895 | + |
896 | + :see: https://bugs.launchpad.net/bzr/+bug/220464 |
897 | + """ |
898 | + |
899 | + def test_auto_break_stale_lock(self): |
900 | + """Locks safely known to be stale are just cleaned up. |
901 | + |
902 | + This generates a warning but no other user interaction. |
903 | + """ |
904 | + # This is off by default at present; see the discussion in the bug. |
905 | + # If you change the default, don't forget to update the docs. |
906 | + config.GlobalConfig().set_user_option('locks.steal_dead', True) |
907 | + # Create a lock pretending to come from a different nonexistent |
908 | + # process on the same machine. |
909 | + l1 = LockDir(self.get_transport(), 'a', |
910 | + extra_holder_info={'pid': '12312313'}) |
911 | + token_1 = l1.attempt_lock() |
912 | + l2 = LockDir(self.get_transport(), 'a') |
913 | + token_2 = l2.attempt_lock() |
914 | + # l1 will notice its lock was stolen. |
915 | + self.assertRaises(errors.LockBroken, |
916 | + l1.unlock) |
917 | + l2.unlock() |
918 | + |
919 | + def test_auto_break_stale_lock_configured_off(self): |
920 | + """Automatic breaking can be turned off""" |
921 | + l1 = LockDir(self.get_transport(), 'a', |
922 | + extra_holder_info={'pid': '12312313'}) |
923 | + token_1 = l1.attempt_lock() |
924 | + self.addCleanup(l1.unlock) |
925 | + l2 = LockDir(self.get_transport(), 'a') |
926 | + # This fails now, because dead lock breaking is off by default. |
927 | + self.assertRaises(LockContention, |
928 | + l2.attempt_lock) |
929 | + # and it's in fact not broken |
930 | + l1.confirm() |
931 | |
932 | === modified file 'bzrlib/ui/__init__.py' |
933 | --- bzrlib/ui/__init__.py 2011-05-27 05:16:48 +0000 |
934 | +++ bzrlib/ui/__init__.py 2011-06-13 21:24:39 +0000 |
935 | @@ -154,6 +154,8 @@ |
936 | "It is recommended that you upgrade by " |
937 | "running the command\n" |
938 | " bzr upgrade %(basedir)s"), |
939 | + locks_steal_dead=( |
940 | + u"Stole dead lock %(lock_url)s %(other_holder_info)s."), |
941 | ) |
942 | |
943 | def __init__(self): |
944 | @@ -304,13 +306,13 @@ |
945 | try: |
946 | template = self._user_warning_templates[warning_id] |
947 | except KeyError: |
948 | - fail = "failed to format warning %r, %r" % (warning_id, message_args) |
949 | - warnings.warn(fail) # so tests will fail etc |
950 | + fail = "bzr warning: %r, %r" % (warning_id, message_args) |
951 | + warnings.warn("no template for warning: " + fail) # so tests will fail etc |
952 | return fail |
953 | try: |
954 | return template % message_args |
955 | except ValueError, e: |
956 | - fail = "failed to format warning %r, %r: %s" % ( |
957 | + fail = "bzr unprintable warning: %r, %r, %s" % ( |
958 | warning_id, message_args, e) |
959 | warnings.warn(fail) # so tests will fail etc |
960 | return fail |
961 | |
962 | === modified file 'bzrlib/win32utils.py' |
963 | --- bzrlib/win32utils.py 2011-05-18 16:42:48 +0000 |
964 | +++ bzrlib/win32utils.py 2011-06-13 21:24:39 +0000 |
965 | @@ -576,3 +576,41 @@ |
966 | return argv |
967 | else: |
968 | get_unicode_argv = None |
969 | + |
970 | + |
971 | +if has_win32api: |
972 | + def _pywin32_is_local_pid_dead(pid): |
973 | + """True if pid doesn't correspond to live process on this machine""" |
974 | + try: |
975 | + handle = win32api.OpenProcess(1, False, pid) # PROCESS_TERMINATE |
976 | + except pywintypes.error, e: |
977 | + if e[0] == 5: # ERROR_ACCESS_DENIED |
978 | + # Probably something alive we're not allowed to kill |
979 | + return False |
980 | + elif e[0] == 87: # ERROR_INVALID_PARAMETER |
981 | + return True |
982 | + raise |
983 | + handle.close() |
984 | + return False |
985 | + is_local_pid_dead = _pywin32_is_local_pid_dead |
986 | +elif has_ctypes and sys.platform == 'win32': |
987 | + from ctypes.wintypes import BOOL, DWORD, HANDLE |
988 | + _kernel32 = ctypes.windll.kernel32 |
989 | + _CloseHandle = ctypes.WINFUNCTYPE(BOOL, HANDLE)( |
990 | + ("CloseHandle", _kernel32)) |
991 | + _OpenProcess = ctypes.WINFUNCTYPE(HANDLE, DWORD, BOOL, DWORD)( |
992 | + ("OpenProcess", _kernel32)) |
993 | + def _ctypes_is_local_pid_dead(pid): |
994 | + """True if pid doesn't correspond to live process on this machine""" |
995 | + handle = _OpenProcess(1, False, pid) # PROCESS_TERMINATE |
996 | + if not handle: |
997 | + errorcode = ctypes.GetLastError() |
998 | + if errorcode == 5: # ERROR_ACCESS_DENIED |
999 | + # Probably something alive we're not allowed to kill |
1000 | + return False |
1001 | + elif errorcode == 87: # ERROR_INVALID_PARAMETER |
1002 | + return True |
1003 | + raise ctypes.WinError(errorcode) |
1004 | + _CloseHandle(handle) |
1005 | + return False |
1006 | + is_local_pid_dead = _ctypes_is_local_pid_dead |
1007 | |
1008 | === modified file 'doc/en/release-notes/bzr-2.4.txt' |
1009 | --- doc/en/release-notes/bzr-2.4.txt 2011-06-13 17:18:13 +0000 |
1010 | +++ doc/en/release-notes/bzr-2.4.txt 2011-06-13 21:24:39 +0000 |
1011 | @@ -43,6 +43,13 @@ |
1012 | .. Fixes for situations where bzr would previously crash or give incorrect |
1013 | or undesirable results. |
1014 | |
1015 | +* Bazaar can now detect when a lock file is held by a dead process |
1016 | + originating from the same machine, and steal the lock after printing a |
1017 | + message to the user. This is off by default, for safety, but can be |
1018 | + turned on by setting the configuration variable ``locks.steal_dead`` to |
1019 | + ``True``. |
1020 | + (Martin Pool, #220464) |
1021 | + |
1022 | * Credentials in the log output produced by ``-Dhttp`` are masked so users |
1023 | can more freely post them in bug reports. (Vincent Ladeuil, #723074) |
1024 | |
1025 | @@ -89,6 +96,10 @@ |
1026 | .. Changes that may require updates in plugins or other code that uses |
1027 | bzrlib. |
1028 | |
1029 | +* Information about held lockdir locks returned from eg `LockDir.peek` is |
1030 | + now represented as a `LockHeldInfo` object, rather than a plain Python |
1031 | + dict. |
1032 | + |
1033 | Internals |
1034 | ********* |
1035 |
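The heart of the change above is a two-part test: the recorded holder must match this machine and user, and its pid must not correspond to a live process. A minimal standalone sketch of that logic in plain Python 3 (not bzrlib itself — `socket.gethostname()` and `getpass.getuser()` stand in for bzrlib's `get_host_name()` and its config-based username, and `info` is a plain dict like the one stored in the lock's info file):

```python
import errno
import getpass
import os
import socket


def is_local_pid_dead(pid):
    """True only if pid is certainly not a live process on this machine."""
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is actually sent
    except OSError as e:
        if e.errno == errno.ESRCH:
            return True   # no such process: as sure as we can be that it's dead
        if e.errno == errno.EPERM:
            return False  # exists, just owned by someone else
        return False      # unknown failure: don't guess
    return False          # exists and is ours: not dead


def is_lock_holder_known_dead(info):
    """True if the recorded holder is known dead; False if alive or unsure."""
    if info.get('hostname') != socket.gethostname():
        return False  # some other machine: can't tell
    if info.get('hostname') == 'localhost':
        return False  # too ambiguous to trust
    if info.get('user') != getpass.getuser():
        return False  # be conservative about other users' processes
    try:
        pid = int(info.get('pid', ''))
    except ValueError:
        return False  # unparseable pid recorded in the info file
    return is_local_pid_dead(pid)
```

Note the deliberately asymmetric answers: every uncertain case collapses to False, so a lock is only ever stolen when all three fields point at a process this user could have owned here and that process is demonstrably gone.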
A few more points here:
* In general, this won't apply to hpss smart server locks, because they are supposed to be held by the server process. (Having the server hold them should make it more likely they'll be cleaned up if the client abruptly disconnects.) In that case you see something like this:
mbp@grace% ./bzr push //bazaar.launchpad.net/~mbp/bzr/220464-stale-locks/
Using saved push location: lp:~mbp/bzr/220464-stale-locks
Unable to obtain lock held by <email address hidden>
at crowberry [process #27519], acquired 0 seconds ago.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)": bzr+ssh:
* We could do an interactive "do you want to break this" UI, but that's out of scope.
* In related bug 257217 I'm quite inclined to say a crash-only design will be safer and simpler: if something tries to kill bzr just let it happen and we'll break the lock next time.
* I should test this interactively too...
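For reference, the holder information this branch reads and writes is just a small stanza of key-value lines. A minimal standalone sketch of the round trip — not bzrlib's actual rio implementation, just an illustration with the same field names as the diff, including tolerating the empty info file of bug 185013:

```python
import os
import time


def lock_info_to_bytes(info):
    """Serialize holder info as simple 'key: value' lines, one per field."""
    return ''.join(
        '%s: %s\n' % (k, info[k]) for k in sorted(info)).encode('utf-8')


def lock_info_from_bytes(data):
    """Parse the info file; an empty file (bug 185013) yields an empty dict."""
    info = {}
    for line in data.decode('utf-8').splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(': ')
        info[key] = value
    return info


info = {
    'hostname': 'grace',                     # illustrative values
    'pid': str(os.getpid()),
    'start_time': str(int(time.time())),
    'user': 'jrandom@example.com',
}
assert lock_info_from_bytes(lock_info_to_bytes(info)) == info
assert lock_info_from_bytes(b'') == {}       # interrupted writer leaves nothing
```

Because an interrupted writer often leaves a zero-length info file behind, the parser must treat "no stanza" as an empty record rather than an error — which is exactly what `from_info_file_bytes` in the diff does by returning `cls({})`.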