Merge lp:~mbp/bzr/220464-stale-locks into lp:bzr

Proposed by Martin Pool
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 5970
Proposed branch: lp:~mbp/bzr/220464-stale-locks
Merge into: lp:bzr
Diff against target: 1034 lines (+488/-135)
10 files modified
bzrlib/commit.py (+0/-1)
bzrlib/config.py (+7/-4)
bzrlib/help_topics/en/configuration.txt (+8/-0)
bzrlib/lockdir.py (+246/-97)
bzrlib/osutils.py (+27/-0)
bzrlib/tests/__init__.py (+7/-3)
bzrlib/tests/test_lockdir.py (+139/-27)
bzrlib/ui/__init__.py (+5/-3)
bzrlib/win32utils.py (+38/-0)
doc/en/release-notes/bzr-2.4.txt (+11/-0)
To merge this branch: bzr merge lp:~mbp/bzr/220464-stale-locks
Reviewer Review Type Date Requested Status
John A Meinel Approve
Martin Packman (community) Approve
Andrew Bennetts Approve
Review via email: mp+62582@code.launchpad.net

Commit message

optionally detect and steal dead locks from the same machine and user (bug 220464)

Description of the change

bzr automatically detects stale locks originating from other processes on the same machine.

This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.

I think the config variable needs to be renamed to match our current convention.

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

A few more points here:

* In general, this won't apply to hpss smart server locks, because they are supposed to be held by the server process. (Doing so should make it more likely they'll be cleaned up if the client abruptly disconnects.) In that case you see something like this:

mbp@grace% ./bzr push
Using saved push location: lp:~mbp/bzr/220464-stale-locks
Unable to obtain lock held by <email address hidden>
at crowberry [process #27519], acquired 0 seconds ago.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)": bzr+ssh://bazaar.launchpad.net/~mbp/bzr/220464-stale-locks/

* We could do an interactive "do you want to break this" ui but that's out of scope.

* In related bug 257217 I'm quite inclined to say a crash-only design will be safer and simpler: if something tries to kill bzr just let it happen and we'll break the lock next time.

* I should test this interactively too....

Revision history for this message
Martin Pool (mbp) wrote :

This is a little tricky to hit interactively from the command line, but you can get it pretty well from the Python shell:

>>> b.lock_write()
Unable to obtain lock file:///home/mbp/bzr/work/ held by Martin Pool <email address hidden>
at grace [process #7658], acquired 42 seconds ago.
Will continue to try until 10:35:52, unless you press Ctrl-C.
See "bzr help break-lock" for more.
^CTraceback (most recent call last):
....
KeyboardInterrupt

now release lock from another window

>>> b.lock_write()
Stole lock file:///home/mbp/bzr/work/.bzr/branches/220464-stale-locks/.bzr/branch/lock from dead process LockHeldInfo({'nonce': u'gxkq1gpt3t36aketelkt', 'start_time': u'1306456480', 'hostname': u'grace', 'pid': u'7658', 'user': u'Martin Pool <email address hidden>'}).
BranchWriteLockResult(0uqc649u5zeziqwzni06, <bound method BzrBranch7.unlock of BzrBranch7(file:///home/mbp/bzr/work/.bzr/branches/220464-stale-locks/)>)

One thing that demonstrates is that it should probably print the dead process info in a somewhat more baked form...

Revision history for this message
Andrew Bennetts (spiv) wrote :

I like this a lot. You seem to have taken a lot of care with the various possible cases, and you've done some nice refactoring and tidying too.

My only real question is if it's appropriate for is_lock_holder_known_dead to raise if os.kill raises something other than ESRCH or EPERM? On one hand other errnos shouldn't happen, at least on Linux, but even if they do aborting the bzr command entirely is probably a bit unfriendly to the user. Emitting a warning and then returning False might be better?

Apart from that I just have unimportant pedantry:

In test_auto_break_stale_lock:

646 + self.assertRaises(errors.LockBroken,
647 + l1.unlock)

That would fit comfortably on just one line :)

Ditto in test_auto_break_stale_lock_configured_off.

713 +* Information about held lockdir locks returned from eg `LockDir.peek` is
714 + now represented as a `LockHeldInfo` object, rather than a plain Python
715 + dict.

I think to be pendantically correct ReST you want double-backticks for an inline literal. Single backticks means interpreted text with the default role, which I think is unset in our sphinx conf. Hmm, I suppose it would be reasonable to set it to something that automatically links to our API docs, I think pydoctor provides an “api” role for this…

review: Approve
Revision history for this message
Andrew Bennetts (spiv) wrote :

I agree btw that the output isn't particularly pretty. Something formatted more like the usual LockContention output would be nicer. But I'd much rather have this with ugly output than not have it at all.

Revision history for this message
Martin Pool (mbp) wrote :

> I like this a lot. You seem to have taken a lot of care with the various
> possible cases, and you've done some nice refactoring and tidying too.

Thanks!

> My only real question is if it's appropriate for is_lock_holder_known_dead to
> raise if os.kill raises something other than ESRCH or EPERM? On one hand
> other errnos shouldn't happen, at least on Linux, but even if they do aborting
> the bzr command entirely is probably a bit unfriendly to the user. Emitting a
> warning and then returning False might be better?

Historically we have not tended to do that; I think Rob argued that these warnings are not normally going to be checked by tests and so could either cause false passes, or could have latent problems where the warning is emitted all the time.

However in this case I think users would almost certainly not thank us for having bzr stop there unnecessarily. If we do want to test it, there are ways.

> I think to be pendantically correct ReST you want double-backticks for an
> inline literal. Single backticks means interpreted text with the default
> role, which I think is unset in our sphinx conf. Hmm, I suppose it would be
> reasonable to set it to something that automatically links to our API docs, I
> think pydoctor provides an “api” role for this…

I think we should do that. Using double backticks for shell or computer output, and single for API references seems like the best way to spend our backtick budget. (Unless we want to get all docbooky and specifically mark roles on computer input and output, but that seems too longwinded.)

Revision history for this message
Martin Packman (gz) wrote :

See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really, the pywin32 ones needs testing) of process deadness detection for windows.

The current os.kill detection gets broken pretty thoroughly by Python 2.7 which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/27/2011 02:06 AM, Martin Pool wrote:
> Martin Pool has proposed merging lp:~mbp/bzr/220464-stale-locks into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #220464 in Bazaar: "Bazaar doesn't detect its own stale locks"
> https://bugs.launchpad.net/bzr/+bug/220464
>
> For more details, see:
> https://code.launchpad.net/~mbp/bzr/220464-stale-locks/+merge/62582
>
> bzr automatically detects stale locks originating from other processes on the same machine.
>
> This is a somewhat old branch, but I think it's actually all finished, aside from not handling detection of live/dead processes on Windows. There was a list thread about that but it looks a bit nontrivial, so I'd like to handle that separately.
>
> I think the config variable needs to be renamed to match our current convention.

Unfortunately on newer Python's os.kill *is* available on windows, but
doesn't work like you want it to. So we probably need a "sys.platform ==
'windows'" check as well.
+ if getattr(os, 'kill', None) is None:
+ # Probably not available on Windows.
+ # XXX: How should we check for process liveness there?
+ return False

 review: needsfixing

Otherwise it looks good to me.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3fW3wACgkQJdeBCYSNAAP/IwCeJ2ATSFX6BWF7zWoTxA9ztlV7
jMAAnRiKL5tYKqJsOtt7h9o7duA/zqjB
=fBtG
-----END PGP SIGNATURE-----

review: Needs Fixing
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/27/2011 04:29 AM, Martin [gz] wrote:
> Review: Needs Fixing
> See <lp:~gz/bzr/220464-stale-locks> for some changes built on this branch. It includes fixes for some trivial things, and an implementation (well, two really, the pywin32 ones needs testing) of process deadness detection for windows.
>
> The current os.kill detection gets broken pretty thoroughly by Python 2.7 which includes a function of that name on windows that doesn't behave anything like the posix one. Currently my branch doesn't have any fallback (`lambda pid: False` would be reasonable) as posix+windows should cover everything.
>
>

I thought that was also in python2.6. Regardless, it needs to be done
differently.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3fW+UACgkQJdeBCYSNAAOMswCeNmvO/1LuTFKZjMGGZON+1YBb
dQcAoLq19easU8FMCfcEDJxqcLCZHzAi
=kwJ1
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote :

@jam: gz kindly provided what looks like a working Windows version of a factored-out is_local_pid_dead, so I think we're now ok there. Please let me know if you spot anything else.

Revision history for this message
Martin Pool (mbp) wrote :

I'd like some rereview on this, at least just so John can look at gz's windows code.

Revision history for this message
Robert Collins (lifeless) wrote :

A small question - without reading the diff (sorry, but it seemed a
little large).

Will this result in Launchpad smart servers thinking a lock is held by
a dead, when there are two smart server *machines* with one backend
store mounted over (NFS|OCFS) and the active process is on a different
machine?

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

On 3 June 2011 18:56, Robert Collins <email address hidden> wrote:
> A small question - without reading the diff (sorry, but it seemed a
> little large).
>
> Will this result in Launchpad smart servers thinking a lock is held by
> a dead, when there are two smart server *machines* with one backend
> store mounted over (NFS|OCFS) and the active process is on a different
> machine?

Only if they have the same hostname, which they shouldn't.

More of a worry is if some manages to have a client machine with the
same hostname as the ssh server, and the names are stored
not-fully-qualified, and therefore they get confused about identity.

Perhaps we need something stronger than that. We could put a random
per-user key into their home directory and into the lock file, though
that would be a larger change.

Revision history for this message
Robert Collins (lifeless) wrote :

The impact of a mistake could be pretty bad, so perhaps we should look
at something stronger before doing this?

Revision history for this message
Martin Packman (gz) wrote :

As windows and posix should cover everything, I think this is fine now.

> I'd like some rereview on this, at least just so John can look at gz's windows
> code.

I'd like that too, was coded a little too late at night.

A few notes it:

* I did a ctypes and pywin32 version out of habit. Now we're 2.6 only, there's not much point in having pywin32 code around (apart from where it's easier to spell).
* As Martin changed the posix version to never raise an exception, should the windows versions change too? Really, I think I'd prefer the caller to catch and log rather than the function, but pywin32 makes that annoying by using a different exception type.
* OpenProcess requests the PROCESS_TERMINATE process access right. It might make more sense to ask for PROCESS_QUERY_INFORMATION or something else.
* Could add code later to get the process executable name too with GetProcessImageFileName, as we have the handle open anyway. Being able to say there's a running bzr.exe holding the lock rather than some other process that's inherited the pid may be useful.

review: Approve
Revision history for this message
Vincent Ladeuil (vila) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?

I wrote some background about other types of locking improvements we
can do, but we probably all know that. This patch is just about
trying to detect locks belong to processes that are sure to be dead.
As I said above, the main risk is that we'll see a lock actually from
another machine and assume it's actually local. It is a little hard
because there is on unix no obvious guaranteed-unique machine
identifier.

Things we could do to guard against that:

0- don't use this approach at all
1- put in the host default ip (also not guaranteed unique; likely to
often be seen as 127.0.0.1; also arguably a privacy problem to send
this to lp)
2- disable this on the smart server; punt back to the client to decide
if the process is live or not (maybe a good idea anyhow)
3- make up a unique per-user nonce; store it in .bazaar; put it into
lock files (not a perfect solution for home on NFS for instance)
4- don't automatically break it; just ask the user if they want to do
so, or point out that it probably is dead
5- grab some other random data that might both be stable across
multiple processes on a single machine, and across reboots, and also
unique to machines (this does exist on windows, and might be available
in Python)

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/6/2011 12:02 PM, Vincent Ladeuil wrote:
> The proposal to merge lp:~mbp/bzr/220464-stale-locks into lp:bzr has been updated.
>
> Status: Approved => Needs review
>
> For more details, see:
> https://code.launchpad.net/~mbp/bzr/220464-stale-locks/+merge/62582

Vila, you changed this to needs-review after approving it and sending it
to pqm. Did it bounce?

I'll try to take a look at it tomorrow, but my WinAPI isn't all that strong.

I do think gz's comment about access to kill vs access to get info is
relevant. Specifically, if User A does the push, User B cannot kill
their processes. Would that lead to us thinking the process is already
dead, and unlocking things underneath them?

John
=:->
 review: needsinfo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3s5IYACgkQJdeBCYSNAAObsACg0rZOrcMV24l6VkfbXQf6TtBw
rmcAn0OtBiyVbXAvzGp1Ldburw9E0CP5
=PEAW
-----END PGP SIGNATURE-----

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/3/2011 11:34 AM, Robert Collins wrote:
> The impact of a mistake could be pretty bad, so perhaps we should look
> at something stronger before doing this?
>

I think the potential for trouble is moderate here, mostly because of
the lifetimes of locks. If two people race on the *repository* lock, you
could get strange corruption. (Both feel they uploaded their content,
but only one of them ends up with their file mentioned in pack-names.)

That, however is meant to be a very fast lock, and shouldn't be opened
by VFS anymore. (I'm sure it could still happen, but we probably would
be more productive fixing those cases rather than spending too much time
on the locking code.)

If a Branch/WT lock gets out of sync, it should be a lot easier to
recover. The locks there seem a bit more advisory. (I'm pushing to this
branch, so don't try to push to it until I'm done.) Because the actual
mutations should be pretty close to atomic. (overwrite
.bzr/branch/last_revision, etc.)

gz saying that we may not be able to detect other people's processes as
being alive is worrying.

Having the shared host be named "ubuntu" and your local host be named
"ubuntu" could be a problem. Would it be reasonable as a first step to
only support local host transports? (So we auto-unlock
file://path/to/branch, but not bzr+ssh://remote-host/path/to/branch.)

I specifically mention "ubuntu" because I believe all ubuntu installs
default to naming the machine "ubuntu" and someone has to actively
change the machine name. (May not be as true anymore.) I think some of
this stems from using a live-install where the machine needs a name
before actually installing.

What about a compromise, and auto-unlocking local objects only, until we
get something like the unique random id in .bazaar, etc. (Note that you
could still get bitten if someone copies their .bazaar folder...)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3s5hsACgkQJdeBCYSNAAM7vACfSMvettIMp/bIGOpf/Ki7zm+d
yeMAoMEnK2O/8IpiABRLcS67TxlJD+eD
=Rcq8
-----END PGP SIGNATURE-----

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> John Arbash Meinel <email address hidden> writes:

<snip/>

    > Vila, you changed this to needs-review after approving it and sending it
    > to pqm. Did it bounce?

Yes (mail forwarded) and poolie said on IRC he wanted more feedback on
it before landing anyway.

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, Jun 6, 2011 at 9:52 PM, Martin Pool <email address hidden> wrote:
> On 3 June 2011 19:34, Robert Collins <email address hidden> wrote:
>> The impact of a mistake could be pretty bad, so perhaps we should look
>> at something stronger before doing this?
>
> I wrote some background about other types of locking improvements we
> can do, but we probably all know that.  This patch is just about
> trying to detect locks belong to processes that are sure to be dead.
> As I said above, the main risk is that we'll see a lock actually from
> another machine and assume it's actually local.  It is a little hard
> because there is on unix no obvious guaranteed-unique machine
> identifier.
>
> Things we could do to guard against that:
>
> 0- don't use this approach at all
> 1- put in the host default ip (also not guaranteed unique; likely to
> often be seen as 127.0.0.1; also arguably a privacy problem to send
> this to lp)
> 2- disable this on the smart server; punt back to the client to decide
> if the process is live or not (maybe a good idea anyhow)
> 3- make up a unique per-user nonce; store it in .bazaar; put it into
> lock files (not a perfect solution for home on NFS for instance)
> 4- don't automatically break it; just ask the user if they want to do
> so, or point out that it probably is dead
> 5- grab some other random data that might both be stable across
> multiple processes on a single machine, and across reboots, and also
> unique to machines (this does exist on windows, and might be available
> in Python)

Right. So I'm worried about one of these cases incorrectly happening
to Launchpad:
 - Launchpad smart server 1 thinking smart server 2 is dead
 - Launchpad smart server 1 thinking sftp client 2 is dead
 - sftp client 1 thinking Launchpad smart server 2 is dead
 - sftp client 1 thinking sftp client 2 is dead

However, These are sorted in what I think is most likely frequency of
use - that is we use the smart server a lot.

Having this go wrong on a repository would be pretty bad. Having it go
wrong on a branch (which has a greater window of opportunity per
John's mail) would probably be less destructive.

For the first case, where Launchpad is running both smart servers, we
can configure a smart server nonce so the servers can detect each
other, if thats how you'd like to prevent misdetected dead processes.

Or we can use a network aware lock inside the data centre (e.g. in
memcache/a db/ ...).

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

On 7 June 2011 06:03, Robert Collins <email address hidden> wrote:
> Right. So I'm worried about one of these cases incorrectly happening
> to Launchpad:
>  - Launchpad smart server 1 thinking smart server 2 is dead
>  - Launchpad smart server 1 thinking sftp client 2 is dead
>  - sftp client 1 thinking Launchpad smart server 2 is dead
>  - sftp client 1 thinking sftp client 2 is dead
>
> However, These are sorted in what I think is most likely frequency of
> use - that is we use the smart server a lot.
>
> Having this go wrong on a repository would be pretty bad. Having it go
> wrong on a branch (which has a greater window of opportunity per
> John's mail) would probably be less destructive.
>
> For the first case, where Launchpad is running both smart servers, we
> can configure a smart server nonce so the servers can detect each
> other, if thats how you'd like to prevent misdetected dead processes.
>
> Or we can use a network aware lock inside the data centre (e.g. in
> memcache/a db/ ...).

When the smart server takes a lock, it puts its own (ie the server
side) hostname and pid into the lock file. (I checked in the source
and by taking a lock on lp.) So the first case should never happen
unless Canonical IS starts giving multiple servers the same name, or
there is some other kind of bizarre split-brain case.

hostname: crowberry
nonce: x4bgo6vp5iixo3o8sl4o
pid: 16838
start_time: 1307421845
user: <email address hidden>

I don't think using a network aware lock is especially useful; backing
the whole repository storage on something other than a disk might be
but I don't see much reason to move just the locks.

The second two cases can potentially happen if someone configures
their machine to have the same hostname as one of the Launchpad
servers, and they access the repo over sftp (or nosmart ssh). This is
probably unlikely considering how obscure Canonical's host names are
but it's possible.

I'd like to use the fqdn but I can't see an easy reliable way to get
it in Python. (It really wants to say localhost.localdomain and
that's just making things worse.) We could put this in configuration
but that will take some work to roll out to lp.

The last case should only happen when someone has configured two
machines with the same hostname and they're both accessing the same
branch at the same time, which is kind of asking for trouble.

We need to bear in mind that at the moment people are encouraged to
break locks by hand and they may well make worse decisions than bzr
can.

One easy step would be to check the user name too; I'll try that now.

Revision history for this message
Andrew Bennetts (spiv) wrote :

Martin Pool wrote:
[...]
> We need to bear in mind that at the moment people are encouraged to
> break locks by hand and they may well make worse decisions than bzr
> can.

Especially in the presence of a smart server, where it's not clear to
users whether the server or the client “owns” the lock. Confusingly,
the client does (via the nonce handed back from the server) but it's the
server's details that appear in the lock info. I've certainly seen
users confused about this on IRC.

I suppose ideally the lock info generated by the server would indicate
“on behalf of bzr smart client from 1.2.3.4:9876” or something like
that. I think LP's server already tweaks BZR_EMAIL to at least use the
LP username rather than codehosting system user.

Revision history for this message
Martin Pool (mbp) wrote :

It does seem to use the user's email address which I guess comes from
the lp account.

Also, I think lock breaking at the moment is done over vfs and it
would be good to abstract that.

Martin

Revision history for this message
Martin Packman (gz) wrote :

> I do think gz's comment about access to kill vs access to get info is
> relevant. Specifically, if User A does the push, User B cannot kill
> their processes. Would that lead to us thinking the process is already
> dead, and unlocking things underneath them?

No, it should fail safe. In this circumstance, you get ERROR_ACCESS_DENIED which we take to mean the process may exist, so don't break the lock. That's just slightly worse information than if we can get a process handle.

Revision history for this message
John A Meinel (jameinel) wrote :

I would like to see the constants in the win32 functions pre-defined as constants, so

PROCESS_TERMINATE = 1
ERROR_ACCESS_DENIED = 5

etc.

But that is a pretty trivial cleanup.
if self.get('hostname') == 'localhost':

I wonder if we should handle "get()" returning None? what about "get('hostname').startswith('localhost') In case it returns "localhost.localdomain".

Anyway, overall I think I've balanced enough to be okay with this. I think the checks Martin added is pretty strict, so should be enough.

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

Based on the irc discussion, we might start this by just warning in these cases.

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Revision history for this message
Martin Pool (mbp) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/commit.py'
--- bzrlib/commit.py 2011-05-17 14:27:30 +0000
+++ bzrlib/commit.py 2011-06-13 21:24:39 +0000
@@ -52,7 +52,6 @@
52from bzrlib import (52from bzrlib import (
53 debug,53 debug,
54 errors,54 errors,
55 revision,
56 trace,55 trace,
57 tree,56 tree,
58 ui,57 ui,
5958
=== modified file 'bzrlib/config.py'
--- bzrlib/config.py 2011-06-01 12:31:04 +0000
+++ bzrlib/config.py 2011-06-13 21:24:39 +0000
@@ -75,7 +75,6 @@
75import re75import re
76from cStringIO import StringIO76from cStringIO import StringIO
7777
78import bzrlib
79from bzrlib import (78from bzrlib import (
80 atomicfile,79 atomicfile,
81 bzrdir,80 bzrdir,
@@ -372,16 +371,18 @@
372 value = self._expand_options_in_string(value)371 value = self._expand_options_in_string(value)
373 return value372 return value
374373
375 def get_user_option_as_bool(self, option_name, expand=None):374 def get_user_option_as_bool(self, option_name, expand=None, default=None):
376 """Get a generic option as a boolean - no special process, no default.375 """Get a generic option as a boolean.
377376
377 :param expand: Allow expanding references to other config values.
378 :param default: Default value if nothing is configured
378 :return None if the option doesn't exist or its value can't be379 :return None if the option doesn't exist or its value can't be
379 interpreted as a boolean. Returns True or False otherwise.380 interpreted as a boolean. Returns True or False otherwise.
380 """381 """
381 s = self.get_user_option(option_name, expand=expand)382 s = self.get_user_option(option_name, expand=expand)
382 if s is None:383 if s is None:
383 # The option doesn't exist384 # The option doesn't exist
384 return None385 return default
385 val = ui.bool_from_string(s)386 val = ui.bool_from_string(s)
386 if val is None:387 if val is None:
387 # The value can't be interpreted as a boolean388 # The value can't be interpreted as a boolean
@@ -2701,6 +2702,8 @@
2701 ' the configuration file'),2702 ' the configuration file'),
2702 ]2703 ]
27032704
2705 _see_also = ['configuration']
2706
2704 @commands.display_command2707 @commands.display_command
2705 def run(self, name=None, all=False, directory=None, scope=None,2708 def run(self, name=None, all=False, directory=None, scope=None,
2706 remove=False):2709 remove=False):
27072710
=== modified file 'bzrlib/help_topics/en/configuration.txt'
--- bzrlib/help_topics/en/configuration.txt 2011-05-24 14:40:29 +0000
+++ bzrlib/help_topics/en/configuration.txt 2011-06-13 21:24:39 +0000
@@ -462,6 +462,14 @@
462These settings are only needed if the SMTP server requires authentication462These settings are only needed if the SMTP server requires authentication
463to send mail.463to send mail.
464464
465locks.steal_dead
466~~~~~~~~~~~~~~~~
467
468If set to true, bzr will automatically break locks held by processes from
469the same machine and user that are no longer alive. Otherwise, it will
470print a message and you can break the lock manually, if you are satisfied
471the object is no longer in use.
472
465mail_client473mail_client
466~~~~~~~~~~~474~~~~~~~~~~~
467475
468476
=== modified file 'bzrlib/lockdir.py'
--- bzrlib/lockdir.py 2011-05-16 13:39:39 +0000
+++ bzrlib/lockdir.py 2011-06-13 21:24:39 +0000
@@ -92,6 +92,10 @@
92>>> # do something here92>>> # do something here
93>>> l.unlock()93>>> l.unlock()
9494
95Some classes of stale locks can be predicted by checking: the host name is the
96same as the local host name; the user name is the same as the local user; the
97process id no longer exists. The check on user name is not strictly necessary
98but helps protect against colliding host names.
95"""99"""
96100
97101
@@ -103,16 +107,19 @@
103# the existing locking code and needs a new format of the containing object.107# the existing locking code and needs a new format of the containing object.
104# -- robertc, mbp 20070628108# -- robertc, mbp 20070628
105109
110import errno
106import os111import os
107import time112import time
108113
109from bzrlib import (114from bzrlib import (
115 config,
110 debug,116 debug,
111 errors,117 errors,
112 lock,118 lock,
113 osutils,119 osutils,
120 ui,
121 urlutils,
114 )122 )
115import bzrlib.config
116from bzrlib.decorators import only_raises123from bzrlib.decorators import only_raises
117from bzrlib.errors import (124from bzrlib.errors import (
118 DirectoryNotEmpty,125 DirectoryNotEmpty,
@@ -130,7 +137,6 @@
130 )137 )
131from bzrlib.trace import mutter, note138from bzrlib.trace import mutter, note
132from bzrlib.osutils import format_delta, rand_chars, get_host_name139from bzrlib.osutils import format_delta, rand_chars, get_host_name
133import bzrlib.ui
134140
135from bzrlib.lazy_import import lazy_import141from bzrlib.lazy_import import lazy_import
136lazy_import(globals(), """142lazy_import(globals(), """
@@ -162,7 +168,8 @@
162168
163 __INFO_NAME = '/info'169 __INFO_NAME = '/info'
164170
165 def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755):171 def __init__(self, transport, path, file_modebits=0644, dir_modebits=0755,
172 extra_holder_info=None):
166 """Create a new LockDir object.173 """Create a new LockDir object.
167174
168 The LockDir is initially unlocked - this just creates the object.175 The LockDir is initially unlocked - this just creates the object.
@@ -171,6 +178,10 @@
171178
172 :param path: Path to the lock within the base directory of the179 :param path: Path to the lock within the base directory of the
173 transport.180 transport.
181
182 :param extra_holder_info: If passed, {str:str} dict of extra or
183 updated information to insert into the info file when the lock is
184 taken.
174 """185 """
175 self.transport = transport186 self.transport = transport
176 self.path = path187 self.path = path
@@ -181,8 +192,9 @@
181 self._held_info_path = self._held_dir + self.__INFO_NAME192 self._held_info_path = self._held_dir + self.__INFO_NAME
182 self._file_modebits = file_modebits193 self._file_modebits = file_modebits
183 self._dir_modebits = dir_modebits194 self._dir_modebits = dir_modebits
184
185 self._report_function = note195 self._report_function = note
196 self.extra_holder_info = extra_holder_info
197 self._warned_about_lock_holder = None
186198
187 def __repr__(self):199 def __repr__(self):
188 return '%s(%s%s)' % (self.__class__.__name__,200 return '%s(%s%s)' % (self.__class__.__name__,
@@ -203,7 +215,6 @@
203 except (TransportError, PathError), e:215 except (TransportError, PathError), e:
204 raise LockFailed(self, e)216 raise LockFailed(self, e)
205217
206
207 def _attempt_lock(self):218 def _attempt_lock(self):
208 """Make the pending directory and attempt to rename into place.219 """Make the pending directory and attempt to rename into place.
209220
@@ -216,8 +227,8 @@
216227
217 :returns: The nonce of the lock, if it was successfully acquired.228 :returns: The nonce of the lock, if it was successfully acquired.
218229
219 :raises LockContention: If the lock is held by someone else. The exception230 :raises LockContention: If the lock is held by someone else. The
220 contains the info of the current holder of the lock.231 exception contains the info of the current holder of the lock.
221 """232 """
222 self._trace("lock_write...")233 self._trace("lock_write...")
223 start_time = time.time()234 start_time = time.time()
@@ -226,17 +237,24 @@
226 except (errors.TransportError, PathError), e:237 except (errors.TransportError, PathError), e:
227 self._trace("... failed to create pending dir, %s", e)238 self._trace("... failed to create pending dir, %s", e)
228 raise LockFailed(self, e)239 raise LockFailed(self, e)
229 try:240 while True:
230 self.transport.rename(tmpname, self._held_dir)241 try:
231 except (errors.TransportError, PathError, DirectoryNotEmpty,242 self.transport.rename(tmpname, self._held_dir)
232 FileExists, ResourceBusy), e:243 break
233 self._trace("... contention, %s", e)244 except (errors.TransportError, PathError, DirectoryNotEmpty,
234 self._remove_pending_dir(tmpname)245 FileExists, ResourceBusy), e:
235 raise LockContention(self)246 self._trace("... contention, %s", e)
236 except Exception, e:247 other_holder = self.peek()
237 self._trace("... lock failed, %s", e)248 self._trace("other holder is %r" % other_holder)
238 self._remove_pending_dir(tmpname)249 try:
239 raise250 self._handle_lock_contention(other_holder)
251 except:
252 self._remove_pending_dir(tmpname)
253 raise
254 except Exception, e:
255 self._trace("... lock failed, %s", e)
256 self._remove_pending_dir(tmpname)
257 raise
240 # We must check we really got the lock, because Launchpad's sftp258 # We must check we really got the lock, because Launchpad's sftp
241 # server at one time had a bug were the rename would successfully259 # server at one time had a bug were the rename would successfully
242 # move the new directory into the existing directory, which was260 # move the new directory into the existing directory, which was
@@ -262,6 +280,33 @@
262 (time.time() - start_time) * 1000)280 (time.time() - start_time) * 1000)
263 return self.nonce281 return self.nonce
264282
283 def _handle_lock_contention(self, other_holder):
284 """A lock we want to take is held by someone else.
285
286 This function can: tell the user about it; possibly detect that it's
287 safe or appropriate to steal the lock, or just raise an exception.
288
289 If this function returns (without raising an exception) the lock will
290 be attempted again.
291
292 :param other_holder: A LockHeldInfo for the current holder; note that
293 it might be None if the lock can be seen to be held but the info
294 can't be read.
295 """
296 if (other_holder is not None):
297 if other_holder.is_lock_holder_known_dead():
298 if self.get_config().get_user_option_as_bool(
299 'locks.steal_dead',
300 default=False):
301 ui.ui_factory.show_user_warning(
302 'locks_steal_dead',
303 lock_url=urlutils.join(self.transport.base, self.path),
304 other_holder_info=unicode(other_holder))
305 self.force_break(other_holder)
306 self._trace("stole lock from dead holder")
307 return
308 raise LockContention(self)
309
265 def _remove_pending_dir(self, tmpname):310 def _remove_pending_dir(self, tmpname):
266 """Remove the pending directory311 """Remove the pending directory
267312
@@ -287,14 +332,14 @@
287 self.create(mode=self._dir_modebits)332 self.create(mode=self._dir_modebits)
288 # After creating the lock directory, try again333 # After creating the lock directory, try again
289 self.transport.mkdir(tmpname)334 self.transport.mkdir(tmpname)
290 self.nonce = rand_chars(20)335 info = LockHeldInfo.for_this_process(self.extra_holder_info)
291 info_bytes = self._prepare_info()336 self.nonce = info.get('nonce')
292 # We use put_file_non_atomic because we just created a new unique337 # We use put_file_non_atomic because we just created a new unique
293 # directory so we don't have to worry about files existing there.338 # directory so we don't have to worry about files existing there.
294 # We'll rename the whole directory into place to get atomic339 # We'll rename the whole directory into place to get atomic
295 # properties340 # properties
296 self.transport.put_bytes_non_atomic(tmpname + self.__INFO_NAME,341 self.transport.put_bytes_non_atomic(tmpname + self.__INFO_NAME,
297 info_bytes)342 info.to_bytes())
298 return tmpname343 return tmpname
299344
300 @only_raises(LockNotHeld, LockBroken)345 @only_raises(LockNotHeld, LockBroken)
@@ -344,9 +389,10 @@
344 def break_lock(self):389 def break_lock(self):
345 """Break a lock not held by this instance of LockDir.390 """Break a lock not held by this instance of LockDir.
346391
347 This is a UI centric function: it uses the bzrlib.ui.ui_factory to392 This is a UI centric function: it uses the ui.ui_factory to
348 prompt for input if a lock is detected and there is any doubt about393 prompt for input if a lock is detected and there is any doubt about
349 it possibly being still active.394 it possibly being still active. force_break is the non-interactive
395 version.
350396
351 :returns: LockResult for the broken lock.397 :returns: LockResult for the broken lock.
352 """398 """
@@ -355,16 +401,16 @@
355 holder_info = self.peek()401 holder_info = self.peek()
356 except LockCorrupt, e:402 except LockCorrupt, e:
357 # The lock info is corrupt.403 # The lock info is corrupt.
358 if bzrlib.ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)):404 if ui.ui_factory.get_boolean(u"Break (corrupt %r)" % (self,)):
359 self.force_break_corrupt(e.file_data)405 self.force_break_corrupt(e.file_data)
360 return406 return
361 if holder_info is not None:407 if holder_info is not None:
362 lock_info = '\n'.join(self._format_lock_info(holder_info))408 if ui.ui_factory.confirm_action(
363 if bzrlib.ui.ui_factory.confirm_action(409 u"Break %(lock_info)s",
364 "Break %(lock_info)s", 'bzrlib.lockdir.break', 410 'bzrlib.lockdir.break',
365 dict(lock_info=lock_info)):411 dict(lock_info=unicode(holder_info))):
366 result = self.force_break(holder_info)412 result = self.force_break(holder_info)
367 bzrlib.ui.ui_factory.show_message(413 ui.ui_factory.show_message(
368 "Broke lock %s" % result.lock_url)414 "Broke lock %s" % result.lock_url)
369415
370 def force_break(self, dead_holder_info):416 def force_break(self, dead_holder_info):
@@ -374,18 +420,19 @@
374 it still thinks it has the lock there will be two concurrent writers.420 it still thinks it has the lock there will be two concurrent writers.
375 In general the user's approval should be sought for lock breaks.421 In general the user's approval should be sought for lock breaks.
376422
377 dead_holder_info must be the result of a previous LockDir.peek() call;
378 this is used to check that it's still held by the same process that
379 the user decided was dead. If this is not the current holder,
380 LockBreakMismatch is raised.
381
382 After the lock is broken it will not be held by any process.423 After the lock is broken it will not be held by any process.
383 It is possible that another process may sneak in and take the424 It is possible that another process may sneak in and take the
384 lock before the breaking process acquires it.425 lock before the breaking process acquires it.
385426
427 :param dead_holder_info:
428 Must be the result of a previous LockDir.peek() call; this is used
429 to check that it's still held by the same process that the user
430 decided was dead. If this is not the current holder,
431 LockBreakMismatch is raised.
432
386 :returns: LockResult for the broken lock.433 :returns: LockResult for the broken lock.
387 """434 """
388 if not isinstance(dead_holder_info, dict):435 if not isinstance(dead_holder_info, LockHeldInfo):
389 raise ValueError("dead_holder_info: %r" % dead_holder_info)436 raise ValueError("dead_holder_info: %r" % dead_holder_info)
390 self._check_not_locked()437 self._check_not_locked()
391 current_info = self.peek()438 current_info = self.peek()
@@ -413,10 +460,10 @@
413460
414 def force_break_corrupt(self, corrupt_info_lines):461 def force_break_corrupt(self, corrupt_info_lines):
415 """Release a lock that has been corrupted.462 """Release a lock that has been corrupted.
416 463
417 This is very similar to force_break, it except it doesn't assume that464 This is very similar to force_break, it except it doesn't assume that
418 self.peek() can work.465 self.peek() can work.
419 466
420 :param corrupt_info_lines: the lines of the corrupted info file, used467 :param corrupt_info_lines: the lines of the corrupted info file, used
421 to check that the lock hasn't changed between reading the (corrupt)468 to check that the lock hasn't changed between reading the (corrupt)
422 info file and calling force_break_corrupt.469 info file and calling force_break_corrupt.
@@ -470,7 +517,8 @@
470517
471 peek() reads the info file of the lock holder, if any.518 peek() reads the info file of the lock holder, if any.
472 """519 """
473 return self._parse_info(self.transport.get_bytes(path))520 return LockHeldInfo.from_info_file_bytes(
521 self.transport.get_bytes(path))
474522
475 def peek(self):523 def peek(self):
476 """Check if the lock is held by anyone.524 """Check if the lock is held by anyone.
@@ -489,35 +537,6 @@
489 def _prepare_info(self):537 def _prepare_info(self):
490 """Write information about a pending lock to a temporary file.538 """Write information about a pending lock to a temporary file.
491 """539 """
492 # XXX: is creating this here inefficient?
493 config = bzrlib.config.GlobalConfig()
494 try:
495 user = config.username()
496 except errors.NoWhoami:
497 user = osutils.getuser_unicode()
498 s = rio.Stanza(hostname=get_host_name(),
499 pid=str(os.getpid()),
500 start_time=str(int(time.time())),
501 nonce=self.nonce,
502 user=user,
503 )
504 return s.to_string()
505
506 def _parse_info(self, info_bytes):
507 lines = osutils.split_lines(info_bytes)
508 try:
509 stanza = rio.read_stanza(lines)
510 except ValueError, e:
511 mutter('Corrupt lock info file: %r', lines)
512 raise LockCorrupt("could not parse lock info file: " + str(e),
513 lines)
514 if stanza is None:
515 # see bug 185013; we fairly often end up with the info file being
516 # empty after an interruption; we could log a message here but
517 # there may not be much we can say
518 return {}
519 else:
520 return stanza.as_dict()
521540
522 def attempt_lock(self):541 def attempt_lock(self):
523 """Take the lock; fail if it's already held.542 """Take the lock; fail if it's already held.
@@ -598,24 +617,16 @@
598 else:617 else:
599 start = 'Lock owner changed for'618 start = 'Lock owner changed for'
600 last_info = new_info619 last_info = new_info
601 formatted_info = self._format_lock_info(new_info)620 msg = u'%s lock %s %s.' % (start, lock_url, new_info)
602 if deadline_str is None:621 if deadline_str is None:
603 deadline_str = time.strftime('%H:%M:%S',622 deadline_str = time.strftime('%H:%M:%S',
604 time.localtime(deadline))623 time.localtime(deadline))
605 user, hostname, pid, time_ago = formatted_info
606 msg = ('%s lock %s ' # lock_url
607 'held by ' # start
608 '%s\n' # user
609 'at %s ' # hostname
610 '[process #%s], ' # pid
611 'acquired %s.') # time ago
612 msg_args = [start, lock_url, user, hostname, pid, time_ago]
613 if timeout > 0:624 if timeout > 0:
614 msg += ('\nWill continue to try until %s, unless '625 msg += ('\nWill continue to try until %s, unless '
615 'you press Ctrl-C.')626 'you press Ctrl-C.'
616 msg_args.append(deadline_str)627 % deadline_str)
617 msg += '\nSee "bzr help break-lock" for more.'628 msg += '\nSee "bzr help break-lock" for more.'
618 self._report_function(msg, *msg_args)629 self._report_function(msg)
619 if (max_attempts is not None) and (attempt_count >= max_attempts):630 if (max_attempts is not None) and (attempt_count >= max_attempts):
620 self._trace("exceeded %d attempts")631 self._trace("exceeded %d attempts")
621 raise LockContention(self)632 raise LockContention(self)
@@ -676,23 +687,6 @@
676 raise LockContention(self)687 raise LockContention(self)
677 self._fake_read_lock = True688 self._fake_read_lock = True
678689
679 def _format_lock_info(self, info):
680 """Turn the contents of peek() into something for the user"""
681 start_time = info.get('start_time')
682 if start_time is None:
683 time_ago = '(unknown)'
684 else:
685 time_ago = format_delta(time.time() - int(info['start_time']))
686 user = info.get('user', '<unknown>')
687 hostname = info.get('hostname', '<unknown>')
688 pid = info.get('pid', '<unknown>')
689 return [
690 user,
691 hostname,
692 pid,
693 time_ago,
694 ]
695
696 def validate_token(self, token):690 def validate_token(self, token):
697 if token is not None:691 if token is not None:
698 info = self.peek()692 info = self.peek()
@@ -710,3 +704,158 @@
710 if 'lock' not in debug.debug_flags:704 if 'lock' not in debug.debug_flags:
711 return705 return
712 mutter(str(self) + ": " + (format % args))706 mutter(str(self) + ": " + (format % args))
707
708 def get_config(self):
709 """Get the configuration that governs this lockdir."""
710 # XXX: This really should also use the locationconfig at least, but
711 # that seems a bit hard to hook up at the moment. -- mbp 20110329
712 return config.GlobalConfig()
713
714
715class LockHeldInfo(object):
716 """The information recorded about a held lock.
717
718 This information is recorded into the lock when it's taken, and it can be
719 read back by any process with access to the lockdir. It can be used, for
720 example, to tell the user who holds the lock, or to try to detect whether
721 the lock holder is still alive.
722
723 Prior to bzr 2.4 a simple dict was used instead of an object.
724 """
725
726 def __init__(self, info_dict):
727 self.info_dict = info_dict
728
729 def __repr__(self):
730 """Return a debugging representation of this object."""
731 return "%s(%r)" % (self.__class__.__name__, self.info_dict)
732
733 def __unicode__(self):
734 """Return a user-oriented description of this object."""
735 d = self.to_readable_dict()
736 return (
737 u'held by %(user)s on %(hostname)s (process #%(pid)s), '
738 u'acquired %(time_ago)s' % d)
739
740 def to_readable_dict(self):
741 """Turn the holder info into a dict of human-readable attributes.
742
743 For example, the start time is presented relative to the current time,
744 rather than as seconds since the epoch.
745
746 Returns a list of [user, hostname, pid, time_ago] all as readable
747 strings.
748 """
749 start_time = self.info_dict.get('start_time')
750 if start_time is None:
751 time_ago = '(unknown)'
752 else:
753 time_ago = format_delta(
754 time.time() - int(self.info_dict['start_time']))
755 user = self.info_dict.get('user', '<unknown>')
756 hostname = self.info_dict.get('hostname', '<unknown>')
757 pid = self.info_dict.get('pid', '<unknown>')
758 return dict(
759 user=user,
760 hostname=hostname,
761 pid=pid,
762 time_ago=time_ago)
763
764 def get(self, field_name):
765 """Return the contents of a field from the lock info, or None."""
766 return self.info_dict.get(field_name)
767
768 @classmethod
769 def for_this_process(cls, extra_holder_info):
770 """Return a new LockHeldInfo for a lock taken by this process.
771 """
772 info = dict(
773 hostname=get_host_name(),
774 pid=str(os.getpid()),
775 nonce=rand_chars(20),
776 start_time=str(int(time.time())),
777 user=get_username_for_lock_info(),
778 )
779 if extra_holder_info is not None:
780 info.update(extra_holder_info)
781 return cls(info)
782
783 def to_bytes(self):
784 s = rio.Stanza(**self.info_dict)
785 return s.to_string()
786
787 @classmethod
788 def from_info_file_bytes(cls, info_file_bytes):
789 """Construct from the contents of the held file."""
790 lines = osutils.split_lines(info_file_bytes)
791 try:
792 stanza = rio.read_stanza(lines)
793 except ValueError, e:
794 mutter('Corrupt lock info file: %r', lines)
795 raise LockCorrupt("could not parse lock info file: " + str(e),
796 lines)
797 if stanza is None:
798 # see bug 185013; we fairly often end up with the info file being
799 # empty after an interruption; we could log a message here but
800 # there may not be much we can say
801 return cls({})
802 else:
803 return cls(stanza.as_dict())
804
805 def __cmp__(self, other):
806 """Value comparison of lock holders."""
807 return (
808 cmp(type(self), type(other))
809 or cmp(self.info_dict, other.info_dict))
810
811 def is_locked_by_this_process(self):
812 """True if this process seems to be the current lock holder."""
813 return (
814 self.get('hostname') == get_host_name()
815 and self.get('pid') == str(os.getpid())
816 and self.get('user') == get_username_for_lock_info())
817
818 def is_lock_holder_known_dead(self):
819 """True if the lock holder process is known to be dead.
820
821 False if it's either known to be still alive, or if we just can't tell.
822
823 We can be fairly sure the lock holder is dead if it declared the same
824 hostname and there is no process with the given pid alive. If people
825 have multiple machines with the same hostname this may cause trouble.
826
827 This doesn't check whether the lock holder is in fact the same process
828 calling this method. (In that case it will return true.)
829 """
830 if self.get('hostname') != get_host_name():
831 return False
832 if self.get('hostname') == 'localhost':
833 # Too ambiguous.
834 return False
835 if self.get('user') != get_username_for_lock_info():
836 # Could well be another local process by a different user, but
837 # just to be safe we won't conclude about this either.
838 return False
839 pid_str = self.info_dict.get('pid', None)
840 if not pid_str:
841 mutter("no pid recorded in %r" % (self, ))
842 return False
843 try:
844 pid = int(pid_str)
845 except ValueError:
846 mutter("can't parse pid %r from %r"
847 % (pid_str, self))
848 return False
849 return osutils.is_local_pid_dead(pid)
850
851
852def get_username_for_lock_info():
853 """Get a username suitable for putting into a lock.
854
855 It's ok if what's written here is not a proper email address as long
856 as it gives some clue who the user is.
857 """
858 try:
859 return config.GlobalConfig().username()
860 except errors.NoWhoami:
861 return osutils.getuser_unicode()
713862
=== modified file 'bzrlib/osutils.py'
--- bzrlib/osutils.py 2011-05-26 08:05:45 +0000
+++ bzrlib/osutils.py 2011-06-13 21:24:39 +0000
@@ -2448,3 +2448,30 @@
2448 if os.access(f, os.X_OK):2448 if os.access(f, os.X_OK):
2449 return f2449 return f
2450 return None2450 return None
2451
2452
2453def _posix_is_local_pid_dead(pid):
2454 """True if pid doesn't correspond to live process on this machine"""
2455 try:
2456 # Special meaning of unix kill: just check if it's there.
2457 os.kill(pid, 0)
2458 except OSError, e:
2459 if e.errno == errno.ESRCH:
2460 # On this machine, and really not found: as sure as we can be
2461 # that it's dead.
2462 return True
2463 elif e.errno == errno.EPERM:
2464 # exists, though not ours
2465 return False
2466 else:
2467 mutter("os.kill(%d, 0) failed: %s" % (pid, e))
2468 # Don't really know.
2469 return False
2470 else:
2471 # Exists and our process: not dead.
2472 return False
2473
2474if sys.platform == "win32":
2475 is_local_pid_dead = win32utils.is_local_pid_dead
2476else:
2477 is_local_pid_dead = _posix_is_local_pid_dead
24512478
=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2011-06-09 15:13:54 +0000
+++ bzrlib/tests/__init__.py 2011-06-13 21:24:39 +0000
@@ -1070,9 +1070,13 @@
1070 # break some locks on purpose and should be taken into account by1070 # break some locks on purpose and should be taken into account by
1071 # considering that breaking a lock is just a dirty way of releasing it.1071 # considering that breaking a lock is just a dirty way of releasing it.
1072 if len(acquired_locks) != (len(released_locks) + len(broken_locks)):1072 if len(acquired_locks) != (len(released_locks) + len(broken_locks)):
1073 message = ('Different number of acquired and '1073 message = (
1074 'released or broken locks. (%s, %s + %s)' %1074 'Different number of acquired and '
1075 (acquired_locks, released_locks, broken_locks))1075 'released or broken locks.\n'
1076 'acquired=%s\n'
1077 'released=%s\n'
1078 'broken=%s\n' %
1079 (acquired_locks, released_locks, broken_locks))
1076 if not self._lock_check_thorough:1080 if not self._lock_check_thorough:
1077 # Rather than fail, just warn1081 # Rather than fail, just warn
1078 print "Broken test %s: %s" % (self, message)1082 print "Broken test %s: %s" % (self, message)
10791083
=== modified file 'bzrlib/tests/test_lockdir.py'
--- bzrlib/tests/test_lockdir.py 2011-05-13 12:51:05 +0000
+++ bzrlib/tests/test_lockdir.py 2011-06-13 21:24:39 +0000
@@ -17,6 +17,7 @@
17"""Tests for LockDir"""17"""Tests for LockDir"""
1818
19import os19import os
20import sys
20import time21import time
2122
22import bzrlib23import bzrlib
@@ -24,6 +25,7 @@
24 config,25 config,
25 errors,26 errors,
26 lock,27 lock,
28 lockdir,
27 osutils,29 osutils,
28 tests,30 tests,
29 transport,31 transport,
@@ -35,9 +37,13 @@
35 LockFailed,37 LockFailed,
36 LockNotHeld,38 LockNotHeld,
37 )39 )
38from bzrlib.lockdir import LockDir40from bzrlib.lockdir import (
41 LockDir,
42 LockHeldInfo,
43 )
39from bzrlib.tests import (44from bzrlib.tests import (
40 features,45 features,
46 TestCase,
41 TestCaseWithTransport,47 TestCaseWithTransport,
42 )48 )
43from bzrlib.trace import note49from bzrlib.trace import note
@@ -48,6 +54,7 @@
48# implementation are tested separately. (The main requirement is just that54# implementation are tested separately. (The main requirement is just that
49# they don't allow overwriting nonempty directories.)55# they don't allow overwriting nonempty directories.)
5056
57
51class TestLockDir(TestCaseWithTransport):58class TestLockDir(TestCaseWithTransport):
52 """Test LockDir operations"""59 """Test LockDir operations"""
5360
@@ -141,8 +148,8 @@
141 self.addCleanup(lf1.unlock)148 self.addCleanup(lf1.unlock)
142 # lock is held, should get some info on it149 # lock is held, should get some info on it
143 info1 = lf1.peek()150 info1 = lf1.peek()
144 self.assertEqual(set(info1.keys()),151 self.assertEqual(set(info1.info_dict.keys()),
145 set(['user', 'nonce', 'hostname', 'pid', 'start_time']))152 set(['user', 'nonce', 'hostname', 'pid', 'start_time']))
146 # should get the same info if we look at it through a different153 # should get the same info if we look at it through a different
147 # instance154 # instance
148 info2 = LockDir(t, 'test_lock').peek()155 info2 = LockDir(t, 'test_lock').peek()
@@ -161,7 +168,7 @@
161 self.addCleanup(lf1.unlock)168 self.addCleanup(lf1.unlock)
162 info2 = lf2.peek()169 info2 = lf2.peek()
163 self.assertTrue(info2)170 self.assertTrue(info2)
164 self.assertEqual(info2['nonce'], lf1.nonce)171 self.assertEqual(info2.get('nonce'), lf1.nonce)
165172
166 def test_30_lock_wait_fail(self):173 def test_30_lock_wait_fail(self):
167 """Wait on a lock, then fail174 """Wait on a lock, then fail
@@ -184,24 +191,16 @@
184 # it should only take about 0.4 seconds, but we allow more time in191 # it should only take about 0.4 seconds, but we allow more time in
185 # case the machine is heavily loaded192 # case the machine is heavily loaded
186 self.assertTrue(after - before <= 8.0,193 self.assertTrue(after - before <= 8.0,
187 "took %f seconds to detect lock contention" % (after - before))194 "took %f seconds to detect lock contention" % (after - before))
188 finally:195 finally:
189 lf1.unlock()196 lf1.unlock()
190 self.assertEqual(1, len(self._logged_reports))197 self.assertEqual(1, len(self._logged_reports))
191 self.assertEqual(self._logged_reports[0][0],198 self.assertContainsRe(self._logged_reports[0][0],
192 '%s lock %s held by %s\n'199 r'Unable to obtain lock .* held by jrandom@example\.com on .*'
193 'at %s [process #%s], acquired %s.\n'200 r' \(process #\d+\), acquired .* ago\.\n'
194 'Will continue to try until %s, unless '201 r'Will continue to try until \d{2}:\d{2}:\d{2}, unless '
195 'you press Ctrl-C.\n'202 r'you press Ctrl-C.\n'
196 'See "bzr help break-lock" for more.')203 r'See "bzr help break-lock" for more.')
197 start, lock_url, user, hostname, pid, time_ago, deadline_str = \
198 self._logged_reports[0][1]
199 self.assertEqual(start, u'Unable to obtain')
200 self.assertEqual(user, u'jrandom@example.com')
201 # skip hostname
202 self.assertContainsRe(pid, r'\d+')
203 self.assertContainsRe(time_ago, r'.* ago')
204 self.assertContainsRe(deadline_str, r'\d{2}:\d{2}:\d{2}')
205204
206 def test_31_lock_wait_easy(self):205 def test_31_lock_wait_easy(self):
207 """Succeed when waiting on a lock with no contention.206 """Succeed when waiting on a lock with no contention.
@@ -349,12 +348,15 @@
349 ld.create()348 ld.create()
350 ld.lock_write()349 ld.lock_write()
351 ld.transport.put_bytes_non_atomic('test_lock/held/info', '\0')350 ld.transport.put_bytes_non_atomic('test_lock/held/info', '\0')
351
352 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):352 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
353 def __init__(self):353 def __init__(self):
354 self.prompts = []354 self.prompts = []
355
355 def get_boolean(self, prompt):356 def get_boolean(self, prompt):
356 self.prompts.append(('boolean', prompt))357 self.prompts.append(('boolean', prompt))
357 return True358 return True
359
358 ui = LoggingUIFactory()360 ui = LoggingUIFactory()
359 self.overrideAttr(bzrlib.ui, 'ui_factory', ui)361 self.overrideAttr(bzrlib.ui, 'ui_factory', ui)
360 ld2.break_lock()362 ld2.break_lock()
@@ -372,12 +374,15 @@
372 ld.create()374 ld.create()
373 ld.lock_write()375 ld.lock_write()
374 ld.transport.delete('test_lock/held/info')376 ld.transport.delete('test_lock/held/info')
377
375 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):378 class LoggingUIFactory(bzrlib.ui.SilentUIFactory):
376 def __init__(self):379 def __init__(self):
377 self.prompts = []380 self.prompts = []
381
378 def get_boolean(self, prompt):382 def get_boolean(self, prompt):
379 self.prompts.append(('boolean', prompt))383 self.prompts.append(('boolean', prompt))
380 return True384 return True
385
381 ui = LoggingUIFactory()386 ui = LoggingUIFactory()
382 orig_factory = bzrlib.ui.ui_factory387 orig_factory = bzrlib.ui.ui_factory
383 bzrlib.ui.ui_factory = ui388 bzrlib.ui.ui_factory = ui
@@ -416,18 +421,17 @@
416 lf1.unlock()421 lf1.unlock()
417 self.assertFalse(t.has('test_lock/held/info'))422 self.assertFalse(t.has('test_lock/held/info'))
418423
419 def test__format_lock_info(self):424 def test_display_form(self):
420 ld1 = self.get_lock()425 ld1 = self.get_lock()
421 ld1.create()426 ld1.create()
422 ld1.lock_write()427 ld1.lock_write()
423 try:428 try:
424 info_list = ld1._format_lock_info(ld1.peek())429 info_list = ld1.peek().to_readable_dict()
425 finally:430 finally:
426 ld1.unlock()431 ld1.unlock()
427 self.assertEqual(info_list[0], u'jrandom@example.com')432 self.assertEqual(info_list['user'], u'jrandom@example.com')
428 # info_list[1] is hostname. we skip this.433 self.assertContainsRe(info_list['pid'], '^\d+$')
429 self.assertContainsRe(info_list[2], '^\d+$') # pid434 self.assertContainsRe(info_list['time_ago'], r'^\d+ seconds? ago$')
430 self.assertContainsRe(info_list[3], r'^\d+ seconds? ago$') # time_ago
431435
432 def test_lock_without_email(self):436 def test_lock_without_email(self):
433 global_config = config.GlobalConfig()437 global_config = config.GlobalConfig()
@@ -481,8 +485,10 @@
481 # should be nothing before we start485 # should be nothing before we start
482 ld1.create()486 ld1.create()
483 t = self.get_transport().clone('test_lock')487 t = self.get_transport().clone('test_lock')
488
484 def check_dir(a):489 def check_dir(a):
485 self.assertEquals(a, t.list_dir('.'))490 self.assertEquals(a, t.list_dir('.'))
491
486 check_dir([])492 check_dir([])
487 # when held, that's all we see493 # when held, that's all we see
488 ld1.attempt_lock()494 ld1.attempt_lock()
@@ -505,9 +511,10 @@
505 t.put_bytes('test_lock/held/info', '')511 t.put_bytes('test_lock/held/info', '')
506 lf = LockDir(t, 'test_lock')512 lf = LockDir(t, 'test_lock')
507 info = lf.peek()513 info = lf.peek()
508 formatted_info = lf._format_lock_info(info)514 formatted_info = info.to_readable_dict()
509 self.assertEquals(515 self.assertEquals(
510 ['<unknown>', '<unknown>', '<unknown>', '(unknown)'],516 dict(user='<unknown>', hostname='<unknown>', pid='<unknown>',
517 time_ago='(unknown)'),
511 formatted_info)518 formatted_info)
512519
513 def test_corrupt_lockdir_info(self):520 def test_corrupt_lockdir_info(self):
@@ -646,3 +653,108 @@
646 ld2.force_break(holder_info)653 ld2.force_break(holder_info)
647 lock_path = ld.transport.abspath(ld.path)654 lock_path = ld.transport.abspath(ld.path)
648 self.assertEqual([], self._calls)655 self.assertEqual([], self._calls)
656
657
658class TestLockHeldInfo(TestCase):
659 """Can get information about the lock holder, and detect whether they're
660 still alive."""
661
662 def test_repr(self):
663 info = LockHeldInfo.for_this_process(None)
664 self.assertContainsRe(repr(info), r"LockHeldInfo\(.*\)")
665
666 def test_unicode(self):
667 info = LockHeldInfo.for_this_process(None)
668 self.assertContainsRe(unicode(info),
669 r'held by .* on .* \(process #\d+\), acquired .* ago')
670
671 def test_is_locked_by_this_process(self):
672 info = LockHeldInfo.for_this_process(None)
673 self.assertTrue(info.is_locked_by_this_process())
674
675 def test_is_not_locked_by_this_process(self):
676 info = LockHeldInfo.for_this_process(None)
677 info.info_dict['pid'] = '123123123123123'
678 self.assertFalse(info.is_locked_by_this_process())
679
680 def test_lock_holder_live_process(self):
681 """Detect that the holder (this process) is still running."""
682 info = LockHeldInfo.for_this_process(None)
683 self.assertFalse(info.is_lock_holder_known_dead())
684
685 def test_lock_holder_dead_process(self):
686 """Detect that the holder (this process) is still running."""
687 info = LockHeldInfo.for_this_process(None)
688 info.info_dict['pid'] = '123123123'
689 if sys.platform == 'win32':
690 self.knownFailure(
691 'live lock holder detection not implemented yet on win32')
692 self.assertTrue(info.is_lock_holder_known_dead())
693
694 def test_lock_holder_other_machine(self):
695 """The lock holder isn't here so we don't know if they're alive."""
696 info = LockHeldInfo.for_this_process(None)
697 info.info_dict['hostname'] = 'egg.example.com'
698 info.info_dict['pid'] = '123123123'
699 self.assertFalse(info.is_lock_holder_known_dead())
700
701 def test_lock_holder_other_user(self):
702 """Only auto-break locks held by this user."""
703 info = LockHeldInfo.for_this_process(None)
704 info.info_dict['user'] = 'notme@example.com'
705 info.info_dict['pid'] = '123123123'
706 self.assertFalse(info.is_lock_holder_known_dead())
707
708 def test_no_good_hostname(self):
709 """Correctly handle ambiguous hostnames.
710
711 If the lock's recorded with just 'localhost' we can't really trust
712 it's the same 'localhost'. (There are quite a few of them. :-)
713 So even if the process is known not to be alive, we can't say that's
714 known for sure.
715 """
716 self.overrideAttr(lockdir, 'get_host_name',
717 lambda: 'localhost')
718 info = LockHeldInfo.for_this_process(None)
719 info.info_dict['pid'] = '123123123'
720 self.assertFalse(info.is_lock_holder_known_dead())
721
722
723class TestStaleLockDir(TestCaseWithTransport):
724 """Can automatically break stale locks.
725
726 :see: https://bugs.launchpad.net/bzr/+bug/220464
727 """
728
729 def test_auto_break_stale_lock(self):
730 """Locks safely known to be stale are just cleaned up.
731
732 This generates a warning but no other user interaction.
733 """
734 # This is off by default at present; see the discussion in the bug.
735 # If you change the default, don't forget to update the docs.
736 config.GlobalConfig().set_user_option('locks.steal_dead', True)
737 # Create a lock pretending to come from a different nonexistent
738 # process on the same machine.
739 l1 = LockDir(self.get_transport(), 'a',
740 extra_holder_info={'pid': '12312313'})
741 token_1 = l1.attempt_lock()
742 l2 = LockDir(self.get_transport(), 'a')
743 token_2 = l2.attempt_lock()
744 # l1 will notice its lock was stolen.
745 self.assertRaises(errors.LockBroken,
746 l1.unlock)
747 l2.unlock()
748
749 def test_auto_break_stale_lock_configured_off(self):
750 """Automatic breaking can be turned off"""
751 l1 = LockDir(self.get_transport(), 'a',
752 extra_holder_info={'pid': '12312313'})
753 token_1 = l1.attempt_lock()
754 self.addCleanup(l1.unlock)
755 l2 = LockDir(self.get_transport(), 'a')
756 # This fails now, because dead lock breaking is off by default.
757 self.assertRaises(LockContention,
758 l2.attempt_lock)
759 # and it's in fact not broken
760 l1.confirm()
649761
=== modified file 'bzrlib/ui/__init__.py'
--- bzrlib/ui/__init__.py 2011-05-27 05:16:48 +0000
+++ bzrlib/ui/__init__.py 2011-06-13 21:24:39 +0000
@@ -154,6 +154,8 @@
154 "It is recommended that you upgrade by "154 "It is recommended that you upgrade by "
155 "running the command\n"155 "running the command\n"
156 " bzr upgrade %(basedir)s"),156 " bzr upgrade %(basedir)s"),
157 locks_steal_dead=(
158 u"Stole dead lock %(lock_url)s %(other_holder_info)s."),
157 )159 )
158160
159 def __init__(self):161 def __init__(self):
@@ -304,13 +306,13 @@
304 try:306 try:
305 template = self._user_warning_templates[warning_id]307 template = self._user_warning_templates[warning_id]
306 except KeyError:308 except KeyError:
307 fail = "failed to format warning %r, %r" % (warning_id, message_args)309 fail = "bzr warning: %r, %r" % (warning_id, message_args)
308 warnings.warn(fail) # so tests will fail etc310 warnings.warn("no template for warning: " + fail) # so tests will fail etc
309 return fail311 return fail
310 try:312 try:
311 return template % message_args313 return template % message_args
312 except ValueError, e:314 except ValueError, e:
313 fail = "failed to format warning %r, %r: %s" % (315 fail = "bzr unprintable warning: %r, %r, %s" % (
314 warning_id, message_args, e)316 warning_id, message_args, e)
315 warnings.warn(fail) # so tests will fail etc317 warnings.warn(fail) # so tests will fail etc
316 return fail318 return fail
317319
=== modified file 'bzrlib/win32utils.py'
--- bzrlib/win32utils.py 2011-05-18 16:42:48 +0000
+++ bzrlib/win32utils.py 2011-06-13 21:24:39 +0000
@@ -576,3 +576,41 @@
576 return argv576 return argv
577else:577else:
578 get_unicode_argv = None578 get_unicode_argv = None
579
580
581if has_win32api:
582 def _pywin32_is_local_pid_dead(pid):
583 """True if pid doesn't correspond to live process on this machine"""
584 try:
585 handle = win32api.OpenProcess(1, False, pid) # PROCESS_TERMINATE
586 except pywintypes.error, e:
587 if e[0] == 5: # ERROR_ACCESS_DENIED
588 # Probably something alive we're not allowed to kill
589 return False
590 elif e[0] == 87: # ERROR_INVALID_PARAMETER
591 return True
592 raise
593 handle.close()
594 return False
595 is_local_pid_dead = _pywin32_is_local_pid_dead
596elif has_ctypes and sys.platform == 'win32':
597 from ctypes.wintypes import BOOL, DWORD, HANDLE
598 _kernel32 = ctypes.windll.kernel32
599 _CloseHandle = ctypes.WINFUNCTYPE(BOOL, HANDLE)(
600 ("CloseHandle", _kernel32))
601 _OpenProcess = ctypes.WINFUNCTYPE(HANDLE, DWORD, BOOL, DWORD)(
602 ("OpenProcess", _kernel32))
603 def _ctypes_is_local_pid_dead(pid):
604 """True if pid doesn't correspond to live process on this machine"""
605 handle = _OpenProcess(1, False, pid) # PROCESS_TERMINATE
606 if not handle:
607 errorcode = ctypes.GetLastError()
608 if errorcode == 5: # ERROR_ACCESS_DENIED
609 # Probably something alive we're not allowed to kill
610 return False
611 elif errorcode == 87: # ERROR_INVALID_PARAMETER
612 return True
613 raise ctypes.WinError(errorcode)
614 _CloseHandle(handle)
615 return False
616 is_local_pid_dead = _ctypes_is_local_pid_dead
579617
=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- doc/en/release-notes/bzr-2.4.txt 2011-06-13 17:18:13 +0000
+++ doc/en/release-notes/bzr-2.4.txt 2011-06-13 21:24:39 +0000
@@ -43,6 +43,13 @@
43.. Fixes for situations where bzr would previously crash or give incorrect43.. Fixes for situations where bzr would previously crash or give incorrect
44 or undesirable results.44 or undesirable results.
4545
46* Bazaar can now detect when a lock file is held by a dead process
47 originating from the same machine, and steal the lock after printing a
48 message to the user. This is off by default, for safety, but can be
49 turned on by setting the configuration variable ``locks.steal_dead`` to
50 ``True``.
51 (Martin Pool, #220464)
52
46* Credentials in the log output produced by ``-Dhttp`` are masked so users53* Credentials in the log output produced by ``-Dhttp`` are masked so users
47 can more freely post them in bug reports. (Vincent Ladeuil, #723074)54 can more freely post them in bug reports. (Vincent Ladeuil, #723074)
4855
@@ -89,6 +96,10 @@
89.. Changes that may require updates in plugins or other code that uses96.. Changes that may require updates in plugins or other code that uses
90 bzrlib.97 bzrlib.
9198
99* Information about held lockdir locks returned from eg `LockDir.peek` is
100 now represented as a `LockHeldInfo` object, rather than a plain Python
101 dict.
102
92Internals103Internals
93*********104*********
94105