Merge lp:~mbp/bzr/220464-stale-locks into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 5970
Proposed branch: lp:~mbp/bzr/220464-stale-locks
Merge into: lp:bzr
Diff against target: 1034 lines (+488/-135)
10 files modified
bzrlib/commit.py (+0/-1)
bzrlib/config.py (+7/-4)
bzrlib/help_topics/en/configuration.txt (+8/-0)
bzrlib/lockdir.py (+246/-97)
bzrlib/osutils.py (+27/-0)
bzrlib/tests/__init__.py (+7/-3)
bzrlib/tests/test_lockdir.py (+139/-27)
bzrlib/ui/__init__.py (+5/-3)
bzrlib/win32utils.py (+38/-0)
doc/en/release-notes/bzr-2.4.txt (+11/-0)
To merge this branch: bzr merge lp:~mbp/bzr/220464-stale-locks
Reviewer Review Type Date Requested Status
John A Meinel Approve
Martin Packman (community) Approve
Andrew Bennetts Approve
Review via email: mp+62582@code.launchpad.net

Commit message

optionally detect and steal dead locks from the same machine and user (bug 220464)

Description of the change

bzr automatically detects stale locks originating from other processes on the same machine.

This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.

I think the config variable needs to be renamed to match our current convention.

Revision history for this message
Martin Pool (mbp) wrote :

A few more points here:

* In general, this won't apply to hpss smart server locks, because they are supposed to be held by the server process. (Doing so should make it more likely they'll be cleaned up if the client abruptly disconnects.) In that case you see something like this:

mbp@grace% ./bzr push
Using saved push location: lp:~mbp/bzr/220464-stale-locks
Unable to obtain lock held by <email address hidden>
at crowberry [process #27519], acquired 0 seconds ago.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)": bzr+ssh://bazaar.launchpad.net/~mbp/bzr/220464-stale-locks/

* We could do an interactive "do you want to break this" ui but that's out of scope.

* In related bug 257217 I'm quite inclined to say a crash-only design will be safer and simpler: if something tries to kill bzr just let it happen and we'll break the lock next time.

* I should test this interactively too....

Revision history for this message
Martin Pool (mbp) wrote :

This is a little tricky to hit interactively from the command line, but you can get it pretty well from the Python shell:

>>> b.lock_write()
Unable to obtain lock file:///home/mbp/bzr/work/ held by Martin Pool <email address hidden>
at grace [process #7658], acquired 42 seconds ago.
Will continue to try until 10:35:52, unless you press Ctrl-C.
See "bzr help break-lock" for more.
^CTraceback (most recent call last):
....
KeyboardInterrupt

now release lock from another window

>>> b.lock_write()
Stole lock file:///home/mbp/bzr/work/.bzr/branches/220464-stale-locks/.bzr/branch/lock from dead process LockHeldInfo({'nonce': u'gxkq1gpt3t36aketelkt', 'start_time': u'1306456480', 'hostname': u'grace', 'pid': u'7658', 'user': u'Martin Pool <email address hidden>'}).
BranchWriteLockResult(0uqc649u5zeziqwzni06, <bound method BzrBranch7.unlock of BzrBranch7(file:///home/mbp/bzr/work/.bzr/branches/220464-stale-locks/)>)

One thing that demonstrates is that it should probably print the dead process info in a somewhat more baked form...

Revision history for this message
Andrew Bennetts (spiv) wrote :

I like this a lot. You seem to have taken a lot of care with the various possible cases, and you've done some nice refactoring and tidying too.

My only real question is whether it's appropriate for is_lock_holder_known_dead to raise if os.kill raises something other than ESRCH or EPERM? On one hand other errnos shouldn't happen, at least on Linux, but even if they do, aborting the bzr command entirely is probably a bit unfriendly to the user. Emitting a warning and then returning False might be better?
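For reference, the os.kill-based probe under discussion can be sketched like this (a sketch only; the helper name is illustrative, and the fail-safe handling of unexpected errnos follows Andrew's suggestion rather than any particular bzrlib revision):

```python
import errno
import os


def is_local_pid_dead(pid):
    """Return True only when we are certain the process is gone.

    Sending signal 0 checks existence and permissions without actually
    delivering a signal to the target process.
    """
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No such process: certainly dead.
            return True
        # EPERM means the pid exists but belongs to someone else; any
        # other errno is unexpected, so fail safe and treat it as alive.
        return False
    else:
        # Signal 0 was delivered: the process is alive.
        return False
```

Note that returning False on an unexpected errno means bzr merely declines to steal the lock, which is the safe direction to fail in.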

Apart from that I just have unimportant pedantry:

In test_auto_break_stale_lock:

646 + self.assertRaises(errors.LockBroken,
647 + l1.unlock)

That would fit comfortably on just one line :)

Ditto in test_auto_break_stale_lock_configured_off.

713 +* Information about held lockdir locks returned from eg `LockDir.peek` is
714 + now represented as a `LockHeldInfo` object, rather than a plain Python
715 + dict.

I think to be pedantically correct ReST you want double-backticks for an inline literal. Single backticks means interpreted text with the default role, which I think is unset in our sphinx conf. Hmm, I suppose it would be reasonable to set it to something that automatically links to our API docs, I think pydoctor provides an “api” role for this…

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

I agree btw that the output isn't particularly pretty. Something formatted more like the usual LockContention output would be nicer. But I'd much rather have this with ugly output than not have it at all.

Revision history for this message
Martin Pool (mbp) wrote :

> I like this a lot. You seem to have taken a lot of care with the various
> possible cases, and you've done some nice refactoring and tidying too.

Thanks!

> My only real question is if it's appropriate for is_lock_holder_known_dead to
> raise if os.kill raises something other than ESRCH or EPERM? On one hand
> other errnos shouldn't happen, at least on Linux, but even if they do aborting
> the bzr command entirely is probably a bit unfriendly to the user. Emitting a
> warning and then returning False might be better?

Historically we have not tended to do that; I think Rob argued that these warnings are not normally going to be checked by tests and so could either cause false passes, or could have latent problems where the warning is emitted all the time.

However in this case I think users would almost certainly not thank us for having bzr stop there unnecessarily. If we do want to test it, there are ways.

> I think to be pedantically correct ReST you want double-backticks for an
> inline literal. Single backticks means interpreted text with the default
> role, which I think is unset in our sphinx conf. Hmm, I suppose it would be
> reasonable to set it to something that automatically links to our API docs, I
> think pydoctor provides an “api” role for this…

I think we should do that. Using double backticks for shell or computer output, and single for API references seems like the best way to spend our backtick budget. (Unless we want to get all docbooky and specifically mark roles on computer input and output, but that seems too longwinded.)

Revision history for this message
Martin Packman (gz) wrote :

See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really, the pywin32 one needs testing) of process deadness detection for windows.

The current os.kill detection gets broken pretty thoroughly by Python 2.7 which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :


On 05/27/2011 02:06 AM, Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/220464-stale-locks into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #220464 in Bazaar: "Bazaar doesn't detect its own stale locks"
> https://bugs.launchpad.net/bzr/+bug/220464
>
> For more details, see:
> https://code.launchpad.net/~mbp/bzr/220464-stale-locks/+merge/62582
>
> bzr automatically detects stale locks originating from other processes on the same machine.
>
> This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.
>
> I think the config variable needs to be renamed to match our current convention.

Unfortunately on newer Pythons os.kill *is* available on windows, but
doesn't work like you want it to. So we probably need a "sys.platform ==
'windows'" check as well.
+ if getattr(os, 'kill', None) is None:
+ # Probably not available on Windows.
+ # XXX: How should we check for process liveness there?
+ return False
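A platform check along the lines John suggests might look like the following sketch (note that sys.platform on Windows is actually 'win32', not 'windows'; the Windows branch here is just the conservative `lambda pid: False` fallback gz mentions, not bzrlib's eventual implementation):

```python
import errno
import os
import sys


def _posix_is_local_pid_dead(pid):
    # os.kill(pid, 0) probes for existence without sending a signal.
    try:
        os.kill(pid, 0)
    except OSError as e:
        return e.errno == errno.ESRCH
    return False


if sys.platform == 'win32':
    # Python 2.7 grew an os.kill on Windows with very different
    # semantics, so checking getattr(os, 'kill', None) is not a
    # reliable guard.  Until a real OpenProcess-based check exists,
    # fall back to "never certain it's dead".
    is_local_pid_dead = lambda pid: False
else:
    is_local_pid_dead = _posix_is_local_pid_dead
```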

 review: needsfixing

Otherwise it looks good to me.

John
=:->

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :


On 05/27/2011 04:29 AM, Martin [gz] wrote:
> Review: Needs Fixing
> See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really, the pywin32 ones needs testing) of process deadness detection for windows.
>
> The current os.kill detection gets broken pretty thoroughly by Python 2.7 which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.
>
>

I thought that was also in python2.6. Regardless, it needs to be done
differently.

John
=:->

Revision history for this message
Martin Pool (mbp) wrote :

@jam: gz kindly provided what looks like a working Windows version of a factored-out is_local_pid_dead, so I think we're now ok there. Please let me know if you spot anything else.

Revision history for this message
Martin Pool (mbp) wrote :

I'd like some rereview on this, at least just so John can look at gz's windows code.

Revision history for this message
Robert Collins (lifeless) wrote :

A small question - without reading the diff (sorry, but it seemed a
little large).

Will this result in Launchpad smart servers thinking a lock is held by
a dead process, when there are two smart server *machines* with one
backend store mounted over (NFS|OCFS) and the active process is on a
different machine?

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

On 3 June 2011 18:56, Robert Collins <email address hidden> wrote:
> A small question - without reading the diff (sorry, but it seemed a
> little large).
>
> Will this result in Launchpad smart servers thinking a lock is held by
> a dead, when there are two smart server *machines* with one backend
> store mounted over (NFS|OCFS) and the active process is on a different
> machine?

Only if they have the same hostname, which they shouldn't.

More of a worry is if someone manages to have a client machine with the
same hostname as the ssh server, and the names are stored
not-fully-qualified, and therefore they get confused about identity.

Perhaps we need something stronger than that. We could put a random
per-user key into their home directory and into the lock file, though
that would be a larger change.
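To illustrate, the per-user key idea might amount to something like this (purely hypothetical; no such lock-nonce file or helper exists in bzr today):

```python
import os


def get_user_lock_nonce(config_dir):
    """Return a stable random per-user key, creating it on first use.

    Hypothetical sketch: the key would also be written into lock info
    files, so that a matching hostname alone is never trusted when
    deciding a lock holder is local.
    """
    path = os.path.join(config_dir, 'lock-nonce')
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        # First use: generate and persist a random key.
        nonce = os.urandom(16).hex()
        with open(path, 'w') as f:
            f.write(nonce)
        return nonce
```

As noted, this is imperfect when the home directory itself is on NFS and shared between machines.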

Revision history for this message
Robert Collins (lifeless) wrote :

The impact of a mistake could be pretty bad, so perhaps we should look
at something stronger before doing this?

Revision history for this message
Martin Packman (gz) wrote :

As windows and posix should cover everything, I think this is fine now.

> I'd like some rereview on this, at least just so John can look at gz's windows
> code.

I'd like that too, was coded a little too late at night.

A few notes on it:

* I did a ctypes and pywin32 version out of habit. Now we're 2.6 only, there's not much point in having pywin32 code around (apart from where it's easier to spell).
* As Martin changed the posix version to never raise an exception, should the windows versions change too? Really, I think I'd prefer the caller to catch and log rather than the function, but pywin32 makes that annoying by using a different exception type.
* OpenProcess requests the PROCESS_TERMINATE process access right. It might make more sense to ask for PROCESS_QUERY_INFORMATION or something else.
* Could add code later to get the process executable name too with GetProcessImageFileName, as we have the handle open anyway. Being able to say there's a running bzr.exe holding the lock rather than some other process that's inherited the pid may be useful.

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?

I wrote some background about other types of locking improvements we
can do, but we probably all know that. This patch is just about
trying to detect locks belonging to processes that are sure to be dead.
As I said above, the main risk is that we'll see a lock that actually
comes from another machine and assume it's local. It is a little hard
because on unix there is no obvious guaranteed-unique machine
identifier.

Things we could do to guard against that:

0- don't use this approach at all
1- put in the host default ip (also not guaranteed unique; likely to
often be seen as 127.0.0.1; also arguably a privacy problem to send
this to lp)
2- disable this on the smart server; punt back to the client to decide
if the process is live or not (maybe a good idea anyhow)
3- make up a unique per-user nonce; store it in .bazaar; put it into
lock files (not a perfect solution for home on NFS for instance)
4- don't automatically break it; just ask the user if they want to do
so, or point out that it probably is dead
5- grab some other random data that might both be stable across
multiple processes on a single machine, and across reboots, and also
unique to machines (this does exist on windows, and might be available
in Python)

Revision history for this message
John A Meinel (jameinel) wrote :


On 6/6/2011 12:02 PM, Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/220464-stale-locks into lp:bzr has been updated.
>
> Status: Approved => Needs review
>
> For more details, see:
> https://code.launchpad.net/~mbp/bzr/220464-stale-locks/+merge/62582

Vila, you changed this to needs-review after approving it and sending it
to pqm. Did it bounce?

I'll try to take a look at it tomorrow, but my WinAPI isn't all that strong.

I do think gz's comment about access to kill vs access to get info is
relevant. Specifically, if User A does the push, User B cannot kill
their processes. Would that lead to us thinking the process is already
dead, and unlocking things underneath them?

John
=:->
 review: needsinfo

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :


On 6/3/2011 11:34 AM, Robert Collins wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?
>

I think the potential for trouble is moderate here, mostly because of
the lifetimes of locks. If two people race on the *repository* lock, you
could get strange corruption. (Both feel they uploaded their content,
but only one of them ends up with their file mentioned in pack-names.)

That, however is meant to be a very fast lock, and shouldn't be opened
by VFS anymore. (I'm sure it could still happen, but we probably would
be more productive fixing those cases rather than spending too much time
on the locking code.)

If a Branch/WT lock gets out of sync, it should be a lot easier to
recover. The locks there seem a bit more advisory. (I'm pushing to this
branch, so don't try to push to it until I'm done.) Because the actual
mutations should be pretty close to atomic. (overwrite
.bzr/branch/last_revision, etc.)

gz saying that we may not be able to detect other people's processes as
being alive is worrying.

Having the shared host be named "ubuntu" and your local host be named
"ubuntu" could be a problem. Would it be reasonable as a first step to
only support local host transports? (So we auto-unlock
file://path/to/branch, but not bzr+ssh://remote-host/path/to/branch.)

I specifically mention "ubuntu" because I believe all ubuntu installs
default to naming the machine "ubuntu" and someone has to actively
change the machine name. (May not be as true anymore.) I think some of
this stems from using a live-install where the machine needs a name
before actually installing.

What about a compromise, and auto-unlocking local objects only, until we
get something like the unique random id in .bazaar, etc. (Note that you
could still get bitten if someone copies their .bazaar folder...)

John
=:->

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> John Arbash Meinel <email address hidden> writes:

<snip/>

    > Vila, you changed this to needs-review after approving it and sending it
    > to pqm. Did it bounce?

Yes (mail forwarded) and poolie said on IRC he wanted more feedback on
it before landing anyway.

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, Jun 6, 2011 at 9:52 PM, Martin Pool <email address hidden> wrote:
> On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
>> The impact of a mistake could be pretty bad, so perhaps we should look
>> at something stronger before doing this?
>
> I wrote some background about other types of locking improvements we
> can do, but we probably all know that.  This patch is just about
> trying to detect locks belong to processes that are sure to be dead.
> As I said above, the main risk is that we'll see a lock actually from
> another machine and assume it's actually local.  It is a little hard
> because there is on unix no obvious guaranteed-unique machine
> identifier.
>
> Things we could do to guard against that:
>
> 0- don't use this approach at all
> 1- put in the host default ip (also not guaranteed unique; likely to
> often be seen as 127.0.0.1; also arguably a privacy problem to send
> this to lp)
> 2- disable this on the smart server; punt back to the client to decide
> if the process is live or not (maybe a good idea anyhow)
> 3- make up a unique per-user nonce; store it in .bazaar; put it into
> lock files (not a perfect solution for home on NFS for instance)
> 4- don't automatically break it; just ask the user if they want to do
> so, or point out that it probably is dead
> 5- grab some other random data that might both be stable across
> multiple processes on a single machine, and across reboots, and also
> unique to machines (this does exist on windows, and might be available
> in Python)

Right. So I'm worried about one of these cases incorrectly happening
to Launchpad:
 - Launchpad smart server 1 thinking smart server 2 is dead
 - Launchpad smart server 1 thinking sftp client 2 is dead
 - sftp client 1 thinking Launchpad smart server 2 is dead
 - sftp client 1 thinking sftp client 2 is dead

However, these are sorted in what I think is most likely frequency of
use - that is, we use the smart server a lot.

Having this go wrong on a repository would be pretty bad. Having it go
wrong on a branch (which has a greater window of opportunity per
John's mail) would probably be less destructive.

For the first case, where Launchpad is running both smart servers, we
can configure a smart server nonce so the servers can detect each
other, if thats how you'd like to prevent misdetected dead processes.

Or we can use a network aware lock inside the data centre (e.g. in
memcache/a db/ ...).

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

On 7 June 2011 06:03, Robert Collins <email address hidden> wrote:
> Right. So I'm worried about one of these cases incorrectly happening
> to Launchpad:
>  - Launchpad smart server 1 thinking smart server 2 is dead
>  - Launchpad smart server 1 thinking sftp client 2 is dead
>  - sftp client 1 thinking Launchpad smart server 2 is dead
>  - sftp client 1 thinking sftp client 2 is dead
>
> However, These are sorted in what I think is most likely frequency of
> use - that is we use the smart server a lot.
>
> Having this go wrong on a repository would be pretty bad. Having it go
> wrong on a branch (which has a greater window of opportunity per
> John's mail) would probably be less destructive.
>
> For the first case, where Launchpad is running both smart servers, we
> can configure a smart server nonce so the servers can detect each
> other, if thats how you'd like to prevent misdetected dead processes.
>
> Or we can use a network aware lock inside the data centre (e.g. in
> memcache/a db/ ...).

When the smart server takes a lock, it puts its own (ie the server
side) hostname and pid into the lock file. (I checked in the source
and by taking a lock on lp.) So the first case should never happen
unless Canonical IS starts giving multiple servers the same name, or
there is some other kind of bizarre split-brain case.

hostname: crowberry
nonce: x4bgo6vp5iixo3o8sl4o
pid: 16838
start_time: 1307421845
user: <email address hidden>

I don't think using a network aware lock is especially useful; backing
the whole repository storage on something other than a disk might be
but I don't see much reason to move just the locks.

The second two cases can potentially happen if someone configures
their machine to have the same hostname as one of the Launchpad
servers, and they access the repo over sftp (or nosmart ssh). This is
probably unlikely considering how obscure Canonical's host names are
but it's possible.

I'd like to use the fqdn but I can't see an easy reliable way to get
it in Python. (It really wants to say localhost.localdomain and
that's just making things worse.) We could put this in configuration
but that will take some work to roll out to lp.

The last case should only happen when someone has configured two
machines with the same hostname and they're both accessing the same
branch at the same time, which is kind of asking for trouble.

We need to bear in mind that at the moment people are encouraged to
break locks by hand and they may well make worse decisions than bzr
can.

One easy step would be to check the user name too; I'll try that now.
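Putting the pieces together, the staleness test sketched in this thread amounts to something like the following (illustrative only; the real logic is LockHeldInfo.is_lock_holder_known_dead, and the explicit parameters here are just for testability):

```python
def lock_holder_known_dead(info, local_host, local_user, pid_is_dead):
    """Decide whether the lock holder described by ``info`` is surely dead.

    ``info`` is a dict shaped like the lock file shown above
    ({'hostname': ..., 'pid': ..., 'user': ...}); ``pid_is_dead`` is a
    callable such as a platform-specific liveness probe.
    """
    if info.get('hostname') != local_host:
        # Lock taken on a different machine: we can't judge its pid.
        return False
    if info.get('hostname') == 'localhost':
        # An unconfigured hostname proves nothing about locality.
        return False
    if info.get('user') != local_user:
        # Matching the user guards against colliding host names.
        return False
    try:
        pid = int(info.get('pid'))
    except (TypeError, ValueError):
        return False
    return pid_is_dead(pid)
```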

Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin Pool wrote:
[...]
> We need to bear in mind that at the moment people are encouraged to
> break locks by hand and they may well make worse decisions than bzr
> can.

Especially in the presence of a smart server, where it's not clear to
users whether the server or the client “owns” the lock. Confusingly,
the client does (via the nonce handed back from the server) but it's the
server's details that appear in the lock info. I've certainly seen
users confused about this on IRC.

I suppose ideally the lock info generated by the server would indicate
“on behalf of bzr smart client from 1.2.3.4:9876” or something like
that. I think LP's server already tweaks BZR_EMAIL to at least use the
LP username rather than codehosting system user.

Revision history for this message
Martin Pool (mbp) wrote :

It does seem to use the user's email address which I guess comes from
the lp account.

Also, I think lock breaking at the moment is done over vfs and it
would be good to abstract that.

Martin

Revision history for this message
Martin Packman (gz) wrote :

> I do think gz's comment about access to kill vs access to get info is
> relevant. Specifically, if User A does the push, User B cannot kill
> their processes. Would that lead to us thinking the process is already
> dead, and unlocking things underneath them?

No, it should fail safe. In this circumstance, you get ERROR_ACCESS_DENIED which we take to mean the process may exist, so don't break the lock. That's just slightly worse information than if we can get a process handle.
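That fail-safe interpretation can be isolated as a pure decision function (a sketch; the constant values come from the Windows SDK, and the function name is illustrative rather than bzrlib's):

```python
# Windows SDK constant values, written out by hand as named constants.
ERROR_ACCESS_DENIED = 5
ERROR_INVALID_PARAMETER = 87


def interpret_open_process(handle, last_error):
    """Map an OpenProcess() result to "is this pid known dead?".

    Only ERROR_INVALID_PARAMETER proves there is no such process; an
    access-denied failure means some process (perhaps another user's)
    owns the pid, so the lock must not be broken.
    """
    if handle:
        return False  # got a handle: the process certainly exists
    if last_error == ERROR_ACCESS_DENIED:
        return False  # exists, but we may not touch it: fail safe
    if last_error == ERROR_INVALID_PARAMETER:
        return True   # no process with that pid
    return False      # any other error: assume it may be alive
```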

Revision history for this message
John A Meinel (jameinel) wrote :

I would like to see the magic numbers in the win32 functions pre-defined as named constants, so

PROCESS_TERMINATE = 1
ERROR_ACCESS_DENIED = 5

etc.

But that is a pretty trivial cleanup.
if self.get('hostname') == 'localhost':

I wonder if we should handle "get()" returning None? What about "get('hostname').startswith('localhost')", in case it returns "localhost.localdomain"?

Anyway, overall I think I've weighed this enough to be okay with it. The checks Martin added are pretty strict, so they should be enough.

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

Based on the irc discussion, we might start this by just warning in these cases.

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

1=== modified file 'bzrlib/commit.py'
2--- bzrlib/commit.py 2011-05-17 14:27:30 +0000
3+++ bzrlib/commit.py 2011-06-13 21:24:39 +0000
4@@ -52,7 +52,6 @@
5 from bzrlib import (
6 debug,
7 errors,
8- revision,
9 trace,
10 tree,
11 ui,
12
13=== modified file 'bzrlib/config.py'
14--- bzrlib/config.py 2011-06-01 12:31:04 +0000
15+++ bzrlib/config.py 2011-06-13 21:24:39 +0000
16@@ -75,7 +75,6 @@
17 import re
18 from cStringIO import StringIO
19
20-import bzrlib
21 from bzrlib import (
22 atomicfile,
23 bzrdir,
24@@ -372,16 +371,18 @@
25 value = self._expand_options_in_string(value)
26 return value
27
28- def get_user_option_as_bool(self, option_name, expand=None):
29- """Get a generic option as a boolean - no special process, no default.
30+ def get_user_option_as_bool(self, option_name, expand=None, default=None):
31+ """Get a generic option as a boolean.
32
33+ :param expand: Allow expanding references to other config values.
34+ :param default: Default value if nothing is configured
35 :return None if the option doesn't exist or its value can't be
36 interpreted as a boolean. Returns True or False otherwise.
37 """
38 s = self.get_user_option(option_name, expand=expand)
39 if s is None:
40 # The option doesn't exist
41- return None
42+ return default
43 val = ui.bool_from_string(s)
44 if val is None:
45 # The value can't be interpreted as a boolean
46@@ -2701,6 +2702,8 @@
47 ' the configuration file'),
48 ]
49
50+ _see_also = ['configuration']
51+
52 @commands.display_command
53 def run(self, name=None, all=False, directory=None, scope=None,
54 remove=False):
55
56=== modified file 'bzrlib/help_topics/en/configuration.txt'
57--- bzrlib/help_topics/en/configuration.txt 2011-05-24 14:40:29 +0000
58+++ bzrlib/help_topics/en/configuration.txt 2011-06-13 21:24:39 +0000
59@@ -462,6 +462,14 @@
60 These settings are only needed if the SMTP server requires authentication
61 to send mail.
62
63+locks.steal_dead
64+~~~~~~~~~~~~~~~~
65+
66+If set to true, bzr will automatically break locks held by processes from
67+the same machine and user that are no longer alive. Otherwise, it will
68+print a message and you can break the lock manually, if you are satisfied
69+the object is no longer in use.
70+
71 mail_client
72 ~~~~~~~~~~~
73
74
75=== modified file 'bzrlib/lockdir.py'
76--- bzrlib/lockdir.py 2011-05-16 13:39:39 +0000
77+++ bzrlib/lockdir.py 2011-06-13 21:24:39 +0000
78@@ -92,6 +92,10 @@
79 >>> # do something here
80 >>> l.unlock()
81
82+Some classes of stale locks can be predicted by checking: the host name is the
83+same as the local host name; the user name is the same as the local user; the
84+process id no longer exists. The check on user name is not strictly necessary
85+but helps protect against colliding host names.
86 """
87
88
89@@ -103,16 +107,19 @@
90 # the existing locking code and needs a new format of the containing object.
91 # -- robertc, mbp 20070628
92
93+import errno
94 import os
95 import time
96
97 from bzrlib import (
98+ config,
99 debug,
100 errors,
101 lock,
102 osutils,
103+ ui,
104+ urlutils,
105 )
106-import bzrlib.config
107 from bzrlib.decorators import only_raises
108 from bzrlib.errors import (
109 DirectoryNotEmpty,
110@@ -130,7 +137,6 @@
111 )
112 from bzrlib.trace import mutter, note
113 from bzrlib.osutils import format_delta, rand_chars, get_host_name
114-import bzrlib.ui
115
116 from bzrlib.lazy_import import lazy_import
117 lazy_import(globals(), """
118@@ -162,7 +168,8 @@
119
120 __INFO_NAME = '/info'
121
122- def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755):
123+ def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755,
124+ extra_holder_info=None):
125 """Create a new LockDir object.
126
127 The LockDir is initially unlocked - this just creates the object.
128@@ -171,6 +178,10 @@
129
130 :param path: Path to the lock within the base directory of the
131 transport.
132+
133+ :param extra_holder_info: If passed, {str:str} dict of extra or
134+ updated information to insert into the info file when the lock is
135+ taken.
136 """
137 self.transport = transport
138 self.path = path
139@@ -181,8 +192,9 @@
140 self._held_info_path = self._held_dir + self.__INFO_NAME
141 self._file_modebits = file_modebits
142 self._dir_modebits = dir_modebits
143-
144 self._report_function = note
145+ self.extra_holder_info = extra_holder_info
146+ self._warned_about_lock_holder = None
147
148 def __repr__(self):
149 return '%s(%s%s)' % (self.__class__.__name__,
150@@ -203,7 +215,6 @@
151 except (TransportError, PathError), e:
152 raise LockFailed(self, e)
153
154-
155 def _attempt_lock(self):
156 """Make the pending directory and attempt to rename into place.
157
158@@ -216,8 +227,8 @@
159
160 :returns: The nonce of the lock, if it was successfully acquired.
161
162- :raises LockContention: If the lock is held by someone else. The exception
163- contains the info of the current holder of the lock.
164+ :raises LockContention: If the lock is held by someone else. The
165+ exception contains the info of the current holder of the lock.
166 """
167 self._trace("lock_write...")
168 start_time = time.time()
169@@ -226,17 +237,24 @@
170 except (errors.TransportError, PathError), e:
171 self._trace("... failed to create pending dir, %s", e)
172 raise LockFailed(self, e)
173- try:
174- self.transport.rename(tmpname, self._held_dir)
175- except (errors.TransportError, PathError, DirectoryNotEmpty,
176- FileExists, ResourceBusy), e:
177- self._trace("... contention, %s", e)
178- self._remove_pending_dir(tmpname)
179- raise LockContention(self)
180- except Exception, e:
181- self._trace("... lock failed, %s", e)
182- self._remove_pending_dir(tmpname)
183- raise
184+ while True:
185+ try:
186+ self.transport.rename(tmpname, self._held_dir)
187+ break
188+ except (errors.TransportError, PathError, DirectoryNotEmpty,
189+ FileExists, ResourceBusy), e:
190+ self._trace("... contention, %s", e)
191+ other_holder = self.peek()
192+ self._trace("other holder is %r" % other_holder)
193+ try:
194+ self._handle_lock_contention(other_holder)
195+ except:
196+ self._remove_pending_dir(tmpname)
197+ raise
198+ except Exception, e:
199+ self._trace("... lock failed, %s", e)
200+ self._remove_pending_dir(tmpname)
201+ raise
202 # We must check we really got the lock, because Launchpad's sftp
203 # server at one time had a bug where the rename would successfully
204 # move the new directory into the existing directory, which was
205@@ -262,6 +280,33 @@
206 (time.time() - start_time) * 1000)
207 return self.nonce
208
209+ def _handle_lock_contention(self, other_holder):
210+ """A lock we want to take is held by someone else.
211+
212+ This function can tell the user about it, detect that it's safe or
213+ appropriate to steal the lock, or just raise an exception.
214+
215+ If this function returns (without raising an exception) the lock will
216+ be attempted again.
217+
218+ :param other_holder: A LockHeldInfo for the current holder; note that
219+ it might be None if the lock can be seen to be held but the info
220+ can't be read.
221+ """
222+ if (other_holder is not None):
223+ if other_holder.is_lock_holder_known_dead():
224+ if self.get_config().get_user_option_as_bool(
225+ 'locks.steal_dead',
226+ default=False):
227+ ui.ui_factory.show_user_warning(
228+ 'locks_steal_dead',
229+ lock_url=urlutils.join(self.transport.base, self.path),
230+ other_holder_info=unicode(other_holder))
231+ self.force_break(other_holder)
232+ self._trace("stole lock from dead holder")
233+ return
234+ raise LockContention(self)
235+
236 def _remove_pending_dir(self, tmpname):
237 """Remove the pending directory
238
239@@ -287,14 +332,14 @@
240 self.create(mode=self._dir_modebits)
241 # After creating the lock directory, try again
242 self.transport.mkdir(tmpname)
243- self.nonce = rand_chars(20)
244- info_bytes = self._prepare_info()
245+ info = LockHeldInfo.for_this_process(self.extra_holder_info)
246+ self.nonce = info.get('nonce')
247 # We use put_file_non_atomic because we just created a new unique
248 # directory so we don't have to worry about files existing there.
249 # We'll rename the whole directory into place to get atomic
250 # properties
251 self.transport.put_bytes_non_atomic(tmpname + self.__INFO_NAME,
252- info_bytes)
253+ info.to_bytes())
254 return tmpname
255
256 @only_raises(LockNotHeld, LockBroken)
257@@ -344,9 +389,10 @@
258 def break_lock(self):
259 """Break a lock not held by this instance of LockDir.
260
261- This is a UI centric function: it uses the bzrlib.ui.ui_factory to
262+ This is a UI centric function: it uses the ui.ui_factory to
263 prompt for input if a lock is detected and there is any doubt about
264- it possibly being still active.
265+ it possibly being still active. force_break is the non-interactive
266+ version.
267
268 :returns: LockResult for the broken lock.
269 """
270@@ -355,16 +401,16 @@
271 holder_info = self.peek()
272 except LockCorrupt, e:
273 # The lock info is corrupt.
274- if bzrlib.ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)):
275+ if ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)):
276 self.force_break_corrupt(e.file_data)
277 return
278 if holder_info is not None:
279- lock_info = '\n'.join(self._format_lock_info(holder_info))
280- if bzrlib.ui.ui_factory.confirm_action(
281- "Break %(lock_info)s", 'bzrlib.lockdir.break',
282- dict(lock_info=lock_info)):
283+ if ui.ui_factory.confirm_action(
284+ u"Break %(lock_info)s",
285+ 'bzrlib.lockdir.break',
286+ dict(lock_info=unicode(holder_info))):
287 result = self.force_break(holder_info)
288- bzrlib.ui.ui_factory.show_message(
289+ ui.ui_factory.show_message(
290 "Broke lock %s" % result.lock_url)
291
292 def force_break(self, dead_holder_info):
293@@ -374,18 +420,19 @@
294 it still thinks it has the lock there will be two concurrent writers.
295 In general the user's approval should be sought for lock breaks.
296
297- dead_holder_info must be the result of a previous LockDir.peek() call;
298- this is used to check that it's still held by the same process that
299- the user decided was dead. If this is not the current holder,
300- LockBreakMismatch is raised.
301-
302 After the lock is broken it will not be held by any process.
303 It is possible that another process may sneak in and take the
304 lock before the breaking process acquires it.
305
306+ :param dead_holder_info:
307+ Must be the result of a previous LockDir.peek() call; this is used
308+ to check that it's still held by the same process that the user
309+ decided was dead. If this is not the current holder,
310+ LockBreakMismatch is raised.
311+
312 :returns: LockResult for the broken lock.
313 """
314- if not isinstance(dead_holder_info, dict):
315+ if not isinstance(dead_holder_info, LockHeldInfo):
316 raise ValueError("dead_holder_info: %r" % dead_holder_info)
317 self._check_not_locked()
318 current_info = self.peek()
319@@ -413,10 +460,10 @@
320
321 def force_break_corrupt(self, corrupt_info_lines):
322 """Release a lock that has been corrupted.
323-
324+
325 This is very similar to force_break, except that it doesn't assume that
326 self.peek() can work.
327-
328+
329 :param corrupt_info_lines: the lines of the corrupted info file, used
330 to check that the lock hasn't changed between reading the (corrupt)
331 info file and calling force_break_corrupt.
332@@ -470,7 +517,8 @@
333
334 peek() reads the info file of the lock holder, if any.
335 """
336- return self._parse_info(self.transport.get_bytes(path))
337+ return LockHeldInfo.from_info_file_bytes(
338+ self.transport.get_bytes(path))
339
340 def peek(self):
341 """Check if the lock is held by anyone.
342@@ -489,35 +537,6 @@
343 def _prepare_info(self):
344 """Write information about a pending lock to a temporary file.
345 """
346- # XXX: is creating this here inefficient?
347- config = bzrlib.config.GlobalConfig()
348- try:
349- user = config.username()
350- except errors.NoWhoami:
351- user = osutils.getuser_unicode()
352- s = rio.Stanza(hostname=get_host_name(),
353- pid=str(os.getpid()),
354- start_time=str(int(time.time())),
355- nonce=self.nonce,
356- user=user,
357- )
358- return s.to_string()
359-
360- def _parse_info(self, info_bytes):
361- lines = osutils.split_lines(info_bytes)
362- try:
363- stanza = rio.read_stanza(lines)
364- except ValueError, e:
365- mutter('Corrupt lock info file: %r', lines)
366- raise LockCorrupt("could not parse lock info file: " + str(e),
367- lines)
368- if stanza is None:
369- # see bug 185013; we fairly often end up with the info file being
370- # empty after an interruption; we could log a message here but
371- # there may not be much we can say
372- return {}
373- else:
374- return stanza.as_dict()
375
376 def attempt_lock(self):
377 """Take the lock; fail if it's already held.
378@@ -598,24 +617,16 @@
379 else:
380 start = 'Lock owner changed for'
381 last_info = new_info
382- formatted_info = self._format_lock_info(new_info)
383+ msg = u'%s lock %s %s.' % (start, lock_url, new_info)
384 if deadline_str is None:
385 deadline_str = time.strftime('%H:%M:%S',
386- time.localtime(deadline))
387- user, hostname, pid, time_ago = formatted_info
388- msg = ('%s lock %s ' # lock_url
389- 'held by ' # start
390- '%s\n' # user
391- 'at %s ' # hostname
392- '[process #%s], ' # pid
393- 'acquired %s.') # time ago
394- msg_args = [start, lock_url, user, hostname, pid, time_ago]
395+ time.localtime(deadline))
396 if timeout > 0:
397 msg += ('\nWill continue to try until %s, unless '
398- 'you press Ctrl-C.')
399- msg_args.append(deadline_str)
400+ 'you press Ctrl-C.'
401+ % deadline_str)
402 msg += '\nSee "bzr help break-lock" for more.'
403- self._report_function(msg, *msg_args)
404+ self._report_function(msg)
405 if (max_attempts is not None) and (attempt_count >= max_attempts):
406 self._trace("exceeded %d attempts", attempt_count)
407 raise LockContention(self)
408@@ -676,23 +687,6 @@
409 raise LockContention(self)
410 self._fake_read_lock = True
411
412- def _format_lock_info(self, info):
413- """Turn the contents of peek() into something for the user"""
414- start_time = info.get('start_time')
415- if start_time is None:
416- time_ago = '(unknown)'
417- else:
418- time_ago = format_delta(time.time() - int(info['start_time']))
419- user = info.get('user', '<unknown>')
420- hostname = info.get('hostname', '<unknown>')
421- pid = info.get('pid', '<unknown>')
422- return [
423- user,
424- hostname,
425- pid,
426- time_ago,
427- ]
428-
429 def validate_token(self, token):
430 if token is not None:
431 info = self.peek()
432@@ -710,3 +704,158 @@
433 if 'lock' not in debug.debug_flags:
434 return
435 mutter(str(self) + ": " + (format % args))
436+
437+ def get_config(self):
438+ """Get the configuration that governs this lockdir."""
439+ # XXX: This really should also use the locationconfig at least, but
440+ # that seems a bit hard to hook up at the moment. -- mbp 20110329
441+ return config.GlobalConfig()
442+
443+
444+class LockHeldInfo(object):
445+ """The information recorded about a held lock.
446+
447+ This information is recorded into the lock when it's taken, and it can be
448+ read back by any process with access to the lockdir. It can be used, for
449+ example, to tell the user who holds the lock, or to try to detect whether
450+ the lock holder is still alive.
451+
452+ Prior to bzr 2.4 a simple dict was used instead of an object.
453+ """
454+
455+ def __init__(self, info_dict):
456+ self.info_dict = info_dict
457+
458+ def __repr__(self):
459+ """Return a debugging representation of this object."""
460+ return "%s(%r)" % (self.__class__.__name__, self.info_dict)
461+
462+ def __unicode__(self):
463+ """Return a user-oriented description of this object."""
464+ d = self.to_readable_dict()
465+ return (
466+ u'held by %(user)s on %(hostname)s (process #%(pid)s), '
467+ u'acquired %(time_ago)s' % d)
468+
469+ def to_readable_dict(self):
470+ """Turn the holder info into a dict of human-readable attributes.
471+
472+ For example, the start time is presented relative to the current time,
473+ rather than as seconds since the epoch.
474+
475+ Returns a dict containing user, hostname, pid and time_ago, all as
476+ readable strings.
477+ """
478+ start_time = self.info_dict.get('start_time')
479+ if start_time is None:
480+ time_ago = '(unknown)'
481+ else:
482+ time_ago = format_delta(
483+ time.time() - int(self.info_dict['start_time']))
484+ user = self.info_dict.get('user', '<unknown>')
485+ hostname = self.info_dict.get('hostname', '<unknown>')
486+ pid = self.info_dict.get('pid', '<unknown>')
487+ return dict(
488+ user=user,
489+ hostname=hostname,
490+ pid=pid,
491+ time_ago=time_ago)
492+
493+ def get(self, field_name):
494+ """Return the contents of a field from the lock info, or None."""
495+ return self.info_dict.get(field_name)
496+
497+ @classmethod
498+ def for_this_process(cls, extra_holder_info):
499+ """Return a new LockHeldInfo for a lock taken by this process.
500+ """
501+ info = dict(
502+ hostname=get_host_name(),
503+ pid=str(os.getpid()),
504+ nonce=rand_chars(20),
505+ start_time=str(int(time.time())),
506+ user=get_username_for_lock_info(),
507+ )
508+ if extra_holder_info is not None:
509+ info.update(extra_holder_info)
510+ return cls(info)
511+
512+ def to_bytes(self):
513+ s = rio.Stanza(**self.info_dict)
514+ return s.to_string()
515+
516+ @classmethod
517+ def from_info_file_bytes(cls, info_file_bytes):
518+ """Construct from the contents of the held file."""
519+ lines = osutils.split_lines(info_file_bytes)
520+ try:
521+ stanza = rio.read_stanza(lines)
522+ except ValueError, e:
523+ mutter('Corrupt lock info file: %r', lines)
524+ raise LockCorrupt("could not parse lock info file: " + str(e),
525+ lines)
526+ if stanza is None:
527+ # see bug 185013; we fairly often end up with the info file being
528+ # empty after an interruption; we could log a message here but
529+ # there may not be much we can say
530+ return cls({})
531+ else:
532+ return cls(stanza.as_dict())
533+
534+ def __cmp__(self, other):
535+ """Value comparison of lock holders."""
536+ return (
537+ cmp(type(self), type(other))
538+ or cmp(self.info_dict, other.info_dict))
539+
540+ def is_locked_by_this_process(self):
541+ """True if this process seems to be the current lock holder."""
542+ return (
543+ self.get('hostname') == get_host_name()
544+ and self.get('pid') == str(os.getpid())
545+ and self.get('user') == get_username_for_lock_info())
546+
547+ def is_lock_holder_known_dead(self):
548+ """True if the lock holder process is known to be dead.
549+
550+ False if it's either known to be still alive, or if we just can't tell.
551+
552+ We can be fairly sure the lock holder is dead if it declared the same
553+ hostname and there is no process with the given pid alive. If people
554+ have multiple machines with the same hostname this may cause trouble.
555+
556+ This doesn't check whether the lock holder is in fact the caller;
557+ if it is, the caller is clearly alive, so this returns False.
558+ """
559+ if self.get('hostname') != get_host_name():
560+ return False
561+ if self.get('hostname') == 'localhost':
562+ # Too ambiguous.
563+ return False
564+ if self.get('user') != get_username_for_lock_info():
565+ # Could well be another local process by a different user, but
566+ # just to be safe we won't conclude about this either.
567+ return False
568+ pid_str = self.info_dict.get('pid', None)
569+ if not pid_str:
570+ mutter("no pid recorded in %r" % (self, ))
571+ return False
572+ try:
573+ pid = int(pid_str)
574+ except ValueError:
575+ mutter("can't parse pid %r from %r"
576+ % (pid_str, self))
577+ return False
578+ return osutils.is_local_pid_dead(pid)
579+
580+
581+def get_username_for_lock_info():
582+ """Get a username suitable for putting into a lock.
583+
584+ It's ok if what's written here is not a proper email address as long
585+ as it gives some clue who the user is.
586+ """
587+ try:
588+ return config.GlobalConfig().username()
589+ except errors.NoWhoami:
590+ return osutils.getuser_unicode()
591
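The stale-lock policy that `is_lock_holder_known_dead` implements above is a chain of deliberately conservative checks. A standalone sketch of that policy (the function name and the `pid_is_dead` probe are hypothetical stand-ins, not bzrlib API):

```python
def is_holder_known_dead(info, local_hostname, local_user, pid_is_dead):
    """True only when the recorded holder is provably a dead local process.

    info: dict with 'hostname', 'user' and 'pid' as recorded in the lock.
    pid_is_dead: callable(pid) -> True if no such process exists locally.
    """
    if info.get('hostname') != local_hostname:
        return False              # other machine: we can't probe its pids
    if info.get('hostname') == 'localhost':
        return False              # too ambiguous to trust
    if info.get('user') != local_user:
        return False              # never auto-break another user's lock
    try:
        pid = int(info.get('pid') or '')
    except ValueError:
        return False              # unparseable pid: play safe
    return pid_is_dead(pid)

record = {'hostname': 'grace', 'user': 'mbp@example.com', 'pid': '4242'}
# Same host and user, process gone: the only case where stealing is safe.
print(is_holder_known_dead(record, 'grace', 'mbp@example.com', lambda p: True))
# Different user: never concluded dead, even though the pid probe says gone.
print(is_holder_known_dead(record, 'grace', 'other@example.com', lambda p: True))
```

In the real code, even a True result only leads to stealing when the `locks.steal_dead` option is also enabled.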
592=== modified file 'bzrlib/osutils.py'
593--- bzrlib/osutils.py 2011-05-26 08:05:45 +0000
594+++ bzrlib/osutils.py 2011-06-13 21:24:39 +0000
595@@ -2448,3 +2448,30 @@
596 if os.access(f, os.X_OK):
597 return f
598 return None
599+
600+
601+def _posix_is_local_pid_dead(pid):
602+ """True if pid doesn't correspond to live process on this machine"""
603+ try:
604+ # Special meaning of unix kill: just check if it's there.
605+ os.kill(pid, 0)
606+ except OSError, e:
607+ if e.errno == errno.ESRCH:
608+ # On this machine, and really not found: as sure as we can be
609+ # that it's dead.
610+ return True
611+ elif e.errno == errno.EPERM:
612+ # exists, though not ours
613+ return False
614+ else:
615+ mutter("os.kill(%d, 0) failed: %s" % (pid, e))
616+ # Don't really know.
617+ return False
618+ else:
619+ # Exists and our process: not dead.
620+ return False
621+
622+if sys.platform == "win32":
623+ is_local_pid_dead = win32utils.is_local_pid_dead
624+else:
625+ is_local_pid_dead = _posix_is_local_pid_dead
626
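The POSIX liveness probe in `_posix_is_local_pid_dead` relies on the signal-0 convention: `kill(pid, 0)` performs the permission and existence checks but delivers nothing. A self-contained version of the same errno handling (Python 3 syntax):

```python
import errno
import os

def is_local_pid_dead(pid):
    """Return True only when we are sure no such local process exists."""
    try:
        os.kill(pid, 0)           # signal 0: existence check, sends nothing
    except OSError as e:
        if e.errno == errno.ESRCH:
            return True           # no such process: as dead as we can tell
        if e.errno == errno.EPERM:
            return False          # exists, just not ours to signal
        return False              # unknown failure: don't claim it's dead
    return False                  # exists and is signalable: alive

print(is_local_pid_dead(os.getpid()))  # → False: we are clearly alive
```

Note the asymmetry: every uncertain outcome maps to False, so callers only steal a lock on a definite ESRCH.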
627=== modified file 'bzrlib/tests/__init__.py'
628--- bzrlib/tests/__init__.py 2011-06-09 15:13:54 +0000
629+++ bzrlib/tests/__init__.py 2011-06-13 21:24:39 +0000
630@@ -1070,9 +1070,13 @@
631 # break some locks on purpose and should be taken into account by
632 # considering that breaking a lock is just a dirty way of releasing it.
633 if len(acquired_locks) != (len(released_locks) + len(broken_locks)):
634- message = ('Different number of acquired and '
635- 'released or broken locks. (%s, %s + %s)' %
636- (acquired_locks, released_locks, broken_locks))
637+ message = (
638+ 'Different number of acquired and '
639+ 'released or broken locks.\n'
640+ 'acquired=%s\n'
641+ 'released=%s\n'
642+ 'broken=%s\n' %
643+ (acquired_locks, released_locks, broken_locks))
644 if not self._lock_check_thorough:
645 # Rather than fail, just warn
646 print "Broken test %s: %s" % (self, message)
647
648=== modified file 'bzrlib/tests/test_lockdir.py'
649--- bzrlib/tests/test_lockdir.py 2011-05-13 12:51:05 +0000
650+++ bzrlib/tests/test_lockdir.py 2011-06-13 21:24:39 +0000
651@@ -17,6 +17,7 @@
652 """Tests for LockDir"""
653
654 import os
655+import sys
656 import time
657
658 import bzrlib
659@@ -24,6 +25,7 @@
660 config,
661 errors,
662 lock,
663+ lockdir,
664 osutils,
665 tests,
666 transport,
667@@ -35,9 +37,13 @@
668 LockFailed,
669 LockNotHeld,
670 )
671-from bzrlib.lockdir import LockDir
672+from bzrlib.lockdir import (
673+ LockDir,
674+ LockHeldInfo,
675+ )
676 from bzrlib.tests import (
677 features,
678+ TestCase,
679 TestCaseWithTransport,
680 )
681 from bzrlib.trace import note
682@@ -48,6 +54,7 @@
683 # implementation are tested separately. (The main requirement is just that
684 # they don't allow overwriting nonempty directories.)
685
686+
687 class TestLockDir(TestCaseWithTransport):
688 """Test LockDir operations"""
689
690@@ -141,8 +148,8 @@
691 self.addCleanup(lf1.unlock)
692 # lock is held, should get some info on it
693 info1 = lf1.peek()
694- self.assertEqual(set(info1.keys()),
695- set(['user', 'nonce', 'hostname', 'pid', 'start_time']))
696+ self.assertEqual(set(info1.info_dict.keys()),
697+ set(['user', 'nonce', 'hostname', 'pid', 'start_time']))
698 # should get the same info if we look at it through a different
699 # instance
700 info2 = LockDir(t, 'test_lock').peek()
701@@ -161,7 +168,7 @@
702 self.addCleanup(lf1.unlock)
703 info2 = lf2.peek()
704 self.assertTrue(info2)
705- self.assertEqual(info2['nonce'], lf1.nonce)
706+ self.assertEqual(info2.get('nonce'), lf1.nonce)
707
708 def test_30_lock_wait_fail(self):
709 """Wait on a lock, then fail
710@@ -184,24 +191,16 @@
711 # it should only take about 0.4 seconds, but we allow more time in
712 # case the machine is heavily loaded
713 self.assertTrue(after - before <= 8.0,
714- "took %f seconds to detect lock contention" % (after - before))
715+ "took %f seconds to detect lock contention" % (after - before))
716 finally:
717 lf1.unlock()
718 self.assertEqual(1, len(self._logged_reports))
719- self.assertEqual(self._logged_reports[0][0],
720- '%s lock %s held by %s\n'
721- 'at %s [process #%s], acquired %s.\n'
722- 'Will continue to try until %s, unless '
723- 'you press Ctrl-C.\n'
724- 'See "bzr help break-lock" for more.')
725- start, lock_url, user, hostname, pid, time_ago, deadline_str = \
726- self._logged_reports[0][1]
727- self.assertEqual(start, u'Unable to obtain')
728- self.assertEqual(user, u'jrandom@example.com')
729- # skip hostname
730- self.assertContainsRe(pid, r'\d+')
731- self.assertContainsRe(time_ago, r'.* ago')
732- self.assertContainsRe(deadline_str, r'\d{2}:\d{2}:\d{2}')
733+ self.assertContainsRe(self._logged_reports[0][0],
734+ r'Unable to obtain lock .* held by jrandom@example\.com on .*'
735+ r' \(process #\d+\), acquired .* ago\.\n'
736+ r'Will continue to try until \d{2}:\d{2}:\d{2}, unless '
737+ r'you press Ctrl-C.\n'
738+ r'See "bzr help break-lock" for more.')
739
740 def test_31_lock_wait_easy(self):
741 """Succeed when waiting on a lock with no contention.
742@@ -349,12 +348,15 @@
743 ld.create()
744 ld.lock_write()
745 ld.transport.put_bytes_non_atomic('test_lock/held/info', '\0')
746+
747 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
748 def __init__(self):
749 self.prompts = []
750+
751 def get_boolean(self, prompt):
752 self.prompts.append(('boolean', prompt))
753 return True
754+
755 ui = LoggingUIFactory()
756 self.overrideAttr(bzrlib.ui, 'ui_factory', ui)
757 ld2.break_lock()
758@@ -372,12 +374,15 @@
759 ld.create()
760 ld.lock_write()
761 ld.transport.delete('test_lock/held/info')
762+
763 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
764 def __init__(self):
765 self.prompts = []
766+
767 def get_boolean(self, prompt):
768 self.prompts.append(('boolean', prompt))
769 return True
770+
771 ui = LoggingUIFactory()
772 orig_factory = bzrlib.ui.ui_factory
773 bzrlib.ui.ui_factory = ui
774@@ -416,18 +421,17 @@
775 lf1.unlock()
776 self.assertFalse(t.has('test_lock/held/info'))
777
778- def test__format_lock_info(self):
779+ def test_display_form(self):
780 ld1 = self.get_lock()
781 ld1.create()
782 ld1.lock_write()
783 try:
784- info_list = ld1._format_lock_info(ld1.peek())
785+ info_list = ld1.peek().to_readable_dict()
786 finally:
787 ld1.unlock()
788- self.assertEqual(info_list[0], u'jrandom@example.com')
789- # info_list[1] is hostname. we skip this.
790- self.assertContainsRe(info_list[2], '^\d+$') # pid
791- self.assertContainsRe(info_list[3], r'^\d+ seconds? ago$') # time_ago
792+ self.assertEqual(info_list['user'], u'jrandom@example.com')
793+ self.assertContainsRe(info_list['pid'], '^\d+$')
794+ self.assertContainsRe(info_list['time_ago'], r'^\d+ seconds? ago$')
795
796 def test_lock_without_email(self):
797 global_config = config.GlobalConfig()
798@@ -481,8 +485,10 @@
799 # should be nothing before we start
800 ld1.create()
801 t = self.get_transport().clone('test_lock')
802+
803 def check_dir(a):
804 self.assertEquals(a, t.list_dir('.'))
805+
806 check_dir([])
807 # when held, that's all we see
808 ld1.attempt_lock()
809@@ -505,9 +511,10 @@
810 t.put_bytes('test_lock/held/info', '')
811 lf = LockDir(t, 'test_lock')
812 info = lf.peek()
813- formatted_info = lf._format_lock_info(info)
814+ formatted_info = info.to_readable_dict()
815 self.assertEquals(
816- ['<unknown>', '<unknown>', '<unknown>', '(unknown)'],
817+ dict(user='<unknown>', hostname='<unknown>', pid='<unknown>',
818+ time_ago='(unknown)'),
819 formatted_info)
820
821 def test_corrupt_lockdir_info(self):
822@@ -646,3 +653,108 @@
823 ld2.force_break(holder_info)
824 lock_path = ld.transport.abspath(ld.path)
825 self.assertEqual([], self._calls)
826+
827+
828+class TestLockHeldInfo(TestCase):
829+ """Can get information about the lock holder, and detect whether they're
830+ still alive."""
831+
832+ def test_repr(self):
833+ info = LockHeldInfo.for_this_process(None)
834+ self.assertContainsRe(repr(info), r"LockHeldInfo\(.*\)")
835+
836+ def test_unicode(self):
837+ info = LockHeldInfo.for_this_process(None)
838+ self.assertContainsRe(unicode(info),
839+ r'held by .* on .* \(process #\d+\), acquired .* ago')
840+
841+ def test_is_locked_by_this_process(self):
842+ info = LockHeldInfo.for_this_process(None)
843+ self.assertTrue(info.is_locked_by_this_process())
844+
845+ def test_is_not_locked_by_this_process(self):
846+ info = LockHeldInfo.for_this_process(None)
847+ info.info_dict['pid'] = '123123123123123'
848+ self.assertFalse(info.is_locked_by_this_process())
849+
850+ def test_lock_holder_live_process(self):
851+ """Detect that the holder (this process) is still running."""
852+ info = LockHeldInfo.for_this_process(None)
853+ self.assertFalse(info.is_lock_holder_known_dead())
854+
855+ def test_lock_holder_dead_process(self):
856+ """Detect that a holder whose recorded pid no longer exists is dead."""
857+ info = LockHeldInfo.for_this_process(None)
858+ info.info_dict['pid'] = '123123123'
859+ if sys.platform == 'win32':
860+ self.knownFailure(
861+ 'live lock holder detection not implemented yet on win32')
862+ self.assertTrue(info.is_lock_holder_known_dead())
863+
864+ def test_lock_holder_other_machine(self):
865+ """The lock holder isn't here so we don't know if they're alive."""
866+ info = LockHeldInfo.for_this_process(None)
867+ info.info_dict['hostname'] = 'egg.example.com'
868+ info.info_dict['pid'] = '123123123'
869+ self.assertFalse(info.is_lock_holder_known_dead())
870+
871+ def test_lock_holder_other_user(self):
872+ """Only auto-break locks held by this user."""
873+ info = LockHeldInfo.for_this_process(None)
874+ info.info_dict['user'] = 'notme@example.com'
875+ info.info_dict['pid'] = '123123123'
876+ self.assertFalse(info.is_lock_holder_known_dead())
877+
878+ def test_no_good_hostname(self):
879+ """Correctly handle ambiguous hostnames.
880+
881+ If the lock's recorded with just 'localhost' we can't really trust
882+ it's the same 'localhost'. (There are quite a few of them. :-)
883+ So even if the process is known not to be alive, we can't say that's
884+ known for sure.
885+ """
886+ self.overrideAttr(lockdir, 'get_host_name',
887+ lambda: 'localhost')
888+ info = LockHeldInfo.for_this_process(None)
889+ info.info_dict['pid'] = '123123123'
890+ self.assertFalse(info.is_lock_holder_known_dead())
891+
892+
893+class TestStaleLockDir(TestCaseWithTransport):
894+ """Can automatically break stale locks.
895+
896+ :see: https://bugs.launchpad.net/bzr/+bug/220464
897+ """
898+
899+ def test_auto_break_stale_lock(self):
900+ """Locks safely known to be stale are just cleaned up.
901+
902+ This generates a warning but no other user interaction.
903+ """
904+ # This is off by default at present; see the discussion in the bug.
905+ # If you change the default, don't forget to update the docs.
906+ config.GlobalConfig().set_user_option('locks.steal_dead', True)
907+ # Create a lock pretending to come from a different nonexistent
908+ # process on the same machine.
909+ l1 = LockDir(self.get_transport(), 'a',
910+ extra_holder_info={'pid': '12312313'})
911+ token_1 = l1.attempt_lock()
912+ l2 = LockDir(self.get_transport(), 'a')
913+ token_2 = l2.attempt_lock()
914+ # l1 will notice its lock was stolen.
915+ self.assertRaises(errors.LockBroken,
916+ l1.unlock)
917+ l2.unlock()
918+
919+ def test_auto_break_stale_lock_configured_off(self):
920+ """Automatic breaking can be turned off"""
921+ l1 = LockDir(self.get_transport(), 'a',
922+ extra_holder_info={'pid': '12312313'})
923+ token_1 = l1.attempt_lock()
924+ self.addCleanup(l1.unlock)
925+ l2 = LockDir(self.get_transport(), 'a')
926+ # This fails now, because dead lock breaking is off by default.
927+ self.assertRaises(LockContention,
928+ l2.attempt_lock)
929+ # and it's in fact not broken
930+ l1.confirm()
931
932=== modified file 'bzrlib/ui/__init__.py'
933--- bzrlib/ui/__init__.py 2011-05-27 05:16:48 +0000
934+++ bzrlib/ui/__init__.py 2011-06-13 21:24:39 +0000
935@@ -154,6 +154,8 @@
936 "It is recommended that you upgrade by "
937 "running the command\n"
938 " bzr upgrade %(basedir)s"),
939+ locks_steal_dead=(
940+ u"Stole dead lock %(lock_url)s %(other_holder_info)s."),
941 )
942
943 def __init__(self):
944@@ -304,13 +306,13 @@
945 try:
946 template = self._user_warning_templates[warning_id]
947 except KeyError:
948- fail = "failed to format warning %r, %r" % (warning_id, message_args)
949- warnings.warn(fail) # so tests will fail etc
950+ fail = "bzr warning: %r, %r" % (warning_id, message_args)
951+ warnings.warn("no template for warning: " + fail) # so tests will fail etc
952 return fail
953 try:
954 return template % message_args
955 except ValueError, e:
956- fail = "failed to format warning %r, %r: %s" % (
957+ fail = "bzr unprintable warning: %r, %r, %s" % (
958 warning_id, message_args, e)
959 warnings.warn(fail) # so tests will fail etc
960 return fail
961
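The template lookup changed above degrades gracefully: a missing or unprintable template produces a fallback string and a `warnings.warn` so tests fail, rather than crashing the UI. A reduced sketch of the same pattern (names hypothetical, and this sketch also catches `KeyError` from missing interpolation keys):

```python
import warnings

templates = {
    'locks_steal_dead': u"Stole dead lock %(lock_url)s %(other_holder_info)s.",
}

def format_user_warning(warning_id, message_args):
    """Format a warning from its template, falling back on any error."""
    try:
        template = templates[warning_id]
    except KeyError:
        fail = "bzr warning: %r, %r" % (warning_id, message_args)
        warnings.warn("no template for warning: " + fail)
        return fail
    try:
        return template % message_args
    except (ValueError, KeyError) as e:
        fail = "bzr unprintable warning: %r, %r, %s" % (
            warning_id, message_args, e)
        warnings.warn(fail)
        return fail

print(format_user_warning(
    'locks_steal_dead',
    dict(lock_url='http://example.com/lock', other_holder_info='held by x')))
```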
962=== modified file 'bzrlib/win32utils.py'
963--- bzrlib/win32utils.py 2011-05-18 16:42:48 +0000
964+++ bzrlib/win32utils.py 2011-06-13 21:24:39 +0000
965@@ -576,3 +576,41 @@
966 return argv
967 else:
968 get_unicode_argv = None
969+
970+
971+if has_win32api:
972+ def _pywin32_is_local_pid_dead(pid):
973+ """True if pid doesn't correspond to live process on this machine"""
974+ try:
975+ handle = win32api.OpenProcess(1, False, pid) # PROCESS_TERMINATE
976+ except pywintypes.error, e:
977+ if e[0] == 5: # ERROR_ACCESS_DENIED
978+ # Probably something alive we're not allowed to kill
979+ return False
980+ elif e[0] == 87: # ERROR_INVALID_PARAMETER
981+ return True
982+ raise
983+ handle.close()
984+ return False
985+ is_local_pid_dead = _pywin32_is_local_pid_dead
986+elif has_ctypes and sys.platform == 'win32':
987+ from ctypes.wintypes import BOOL, DWORD, HANDLE
988+ _kernel32 = ctypes.windll.kernel32
989+ _CloseHandle = ctypes.WINFUNCTYPE(BOOL, HANDLE)(
990+ ("CloseHandle", _kernel32))
991+ _OpenProcess = ctypes.WINFUNCTYPE(HANDLE, DWORD, BOOL, DWORD)(
992+ ("OpenProcess", _kernel32))
993+ def _ctypes_is_local_pid_dead(pid):
994+ """True if pid doesn't correspond to live process on this machine"""
995+ handle = _OpenProcess(1, False, pid) # PROCESS_TERMINATE
996+ if not handle:
997+ errorcode = ctypes.GetLastError()
998+ if errorcode == 5: # ERROR_ACCESS_DENIED
999+ # Probably something alive we're not allowed to kill
1000+ return False
1001+ elif errorcode == 87: # ERROR_INVALID_PARAMETER
1002+ return True
1003+ raise ctypes.WinError(errorcode)
1004+ _CloseHandle(handle)
1005+ return False
1006+ is_local_pid_dead = _ctypes_is_local_pid_dead
1007
1008=== modified file 'doc/en/release-notes/bzr-2.4.txt'
1009--- doc/en/release-notes/bzr-2.4.txt 2011-06-13 17:18:13 +0000
1010+++ doc/en/release-notes/bzr-2.4.txt 2011-06-13 21:24:39 +0000
1011@@ -43,6 +43,13 @@
1012 .. Fixes for situations where bzr would previously crash or give incorrect
1013 or undesirable results.
1014
1015+* Bazaar can now detect when a lock file is held by a dead process
1016+ originating from the same machine, and steal the lock after printing a
1017+ message to the user. This is off by default, for safety, but can be
1018+ turned on by setting the configuration variable ``locks.steal_dead`` to
1019+ ``True``.
1020+ (Martin Pool, #220464)
1021+
1022 * Credentials in the log output produced by ``-Dhttp`` are masked so users
1023 can more freely post them in bug reports. (Vincent Ladeuil, #723074)
1024
1025@@ -89,6 +96,10 @@
1026 .. Changes that may require updates in plugins or other code that uses
1027 bzrlib.
1028
1029+* Information about held lockdir locks returned from eg `LockDir.peek` is
1030+ now represented as a `LockHeldInfo` object, rather than a plain Python
1031+ dict.
1032+
1033 Internals
1034 *********
1035