Bazaar

timeout for LockContention retry is always 0 for smart server

Bug #592148 reported by Parth Malwankar on 2010-06-10

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Won't Fix	Undecided	Unassigned

Bug Description

The timeout for LockContention retry is always 0 for smart server. Due to this, the retry does not happen.
I put some checks in bzrlib.lockdir.wait_lock.
if timeout is None:
timeout = _DEFAULT_TIMEOUT_SECONDS
While lockdir.py initialized _DEFAULT_TIMEOUT_SECONDS to 300, the actual value at runtime is 0.

I am using trunk (2.2b3) on Lucid.

Output for trunk
=============
[foo]% date
Thu Jun 10 17:15:28 IST 2010
[foo]% ~/src/bzr.dev/trunk/bzr push bzr+ssh://parthm@localhost/~/tmp/lock-path/trunk
parthm@localhost's password:
Unable to obtain lock filtered-161494092:///~/tmp/lock-path/trunk/.bzr/branch/lock
held by Parth Malwankar <email address hidden> on host parthm-laptop [process #9606]
locked 26 hours, 27 minutes ago
Will continue to try until 17:15:33, unless you press Ctrl-C.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)":
[foo]% date
Thu Jun 10 17:15:34 IST 2010
[foo]% ssh parthm@localhost "which bzr"
parthm@localhost's password:
/home/parthm/src/bzr.dev/trunk/bzr
[foo]%

Output with debug info
==================
[foo]% which bzr
/home/parthm/src/bzr.dev/250451-better-url-for-break-lock/bzr
[foo]% ssh parthm@localhost "which bzr"
parthm@localhost's password:
/home/parthm/src/bzr.dev/250451-better-url-for-break-lock/bzr
[foo]% date
Thu Jun 10 17:23:40 IST 2010
[foo]% bzr push bzr+ssh://parthm@localhost/~/tmp/lock-path/trunk
parthm@localhost's password:
Unable to obtain lock held by Parth Malwankar <email address hidden> at parthm-laptop [process #9606], acquired 26 hours, 35 minutes ago.
Will continue to try until 17:23:51#current time: 17:23:51#timeout: 0, unless you press Ctrl-C.
See "bzr help break-lock" for more.
bzr: ERROR: Could not acquire lock "(remote lock)": bzr+ssh://parthm@localhost/~/tmp/lock-path/trunk
[foo]% date
Thu Jun 10 17:23:54 IST 2010
[foo]%

Note that in the "Will continue.." line, "try until" is the same as "current time" as timeout is 0.

Revision history for this message

John A Meinel (jameinel) wrote on 2010-06-10: Re: [Bug 592148] [NEW] timeout for LockContention retry is always 0 for smart server

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Parth Malwankar wrote:
> Public bug reported:
>
> The timeout for LockContention retry is always 0 for smart server. Due to this, the retry does not happen.
> I put some checks in bzrlib.lockdir.wait_lock.
> if timeout is None:
> timeout = _DEFAULT_TIMEOUT_SECONDS
> While lockdir.py initialized _DEFAULT_TIMEOUT_SECONDS to 300, the actual value at runtime is 0.
>
> I am using trunk (2.2b3) on Lucid.
>

This is actually by design. The problem is that it was hanging on the
server side, without having the user know what they could do.

If you look closely, the default is overridden inside 'cmd_serve'.

status: wontfix

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwQ9VoACgkQJdeBCYSNAAP5UQCg1Lz7/BR8p01UqXUs1bLMjk7H
EioAn3HwMscsPL49RnNkwCuNtfO/2S4L
=WT3D
-----END PGP SIGNATURE-----

Changed in bzr:
status:	New → Won't Fix

Revision history for this message

Parth Malwankar (parthm) wrote on 2010-06-10:

So is the message "Will continue to try until XXXX, unless you press Ctrl-C." (and the related timeout code) actually used for any specific case or should this be removed?

Revision history for this message

John A Meinel (jameinel) wrote on 2010-06-10: Re: [Bug 592148] Re: timeout for LockContention retry is always 0 for smart server

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Parth Malwankar wrote:
> So is the message "Will continue to try until XXXX, unless you press
> Ctrl-C." (and the related timeout code) actually used for any specific
> case or should this be removed?
>

"Will continue to try" is bogus, which I mentioned (I believe) on the
other bug that you posted. That should be removed.

Though if I think about it, I think a *short* timeout would be
reasonable (10s). I'm not entirely sure why we picked 5minutes as the
global default.

I'm pretty sure it is rare that people encounter a branch on Launchpad
that is locked by real contention (such that waiting for the lock to
release is what you want to do). If you are pushing, then only one of
the two people pushing can succeed (eg. even if you wait on the lock,
then your basis *should* be out of date and you'll need to pull &&
merge/update before you can push again anyway.)

As such, if there is a lock it is:

a) Stale, and will never be released anyway
b) Something is being updated, and you'll probably need to serialize on
that being completed before you can do what you wanted anyway.

The only caveat is that updating branch.conf currently requires a write
lock, so there are a couple things that could be done 'in parallel',
where waiting for a lock could be reasonable. But rare enough that I
don't think we need to focus on supporting it.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwQ/OAACgkQJdeBCYSNAAPt/ACeOt/XZrlJ7iaFV7e0oS7ajUJa
6yMAn3mK7s89QnxhhPBWh/YXLSY0fDMl
=NbmV
-----END PGP SIGNATURE-----

Revision history for this message

Parth Malwankar (parthm) wrote on 2010-06-10:

> "Will continue to try" is bogus, which I mentioned (I believe) on the
> other bug that you posted. That should be removed.

That makes sense. There is another mp[1] about showing a valid
url for break-lock. The problem I was facing (in case we put a timeout)
is that if the user does a ^C, its on the client, and the client can't be
sure its a LockContention error, so we can't really be sure if a message
regarding "break-lock" with url should be displayed. If we get rid of timeout
entirely and raise LockContention immediately, the client can show
a clear message to the user and the user can try manually. I will do this
change as a part of the mp[1].

Thanks for clarifying this.

[1] https://code.launchpad.net/~parthm/bzr/250451-better-url-for-break-lock/+merge/27187

Revision history for this message

Robert Collins (lifeless) wrote on 2010-06-14: Re: [Bug 592148] Re: timeout for LockContention retry is always 0 for smart server

The timeout we chose was designed to permit pushes to branches to
finish - which can be longer than the near-instant insertion-to-repo
locks. That said, 5 minutes is long with todays code base.

I'm not sure 10 seconds is high enough though.

-Rob

Revision history for this message

Parth Malwankar (parthm) wrote on 2010-06-14:

Hi Robert,

The timeout doesn't really come into play for remote operations which just raise a LockContention error and exits immediately. I would think local operation is fast enough. Personally, my usage pattern has been to ^C (as 300s was too high) and try the operation later, so I would prefer a short timeout.
Do you have any suggestion for the timeout, maybe 20s?

Revision history for this message

John A Meinel (jameinel) wrote on 2010-06-14:

I would probably recommend something like 30s. I think the original motivation was based on contested repositories (pre-pack formats), which would allow us to have 2 fetches succeed. However pack+ (everything since 1.0) allows concurrent data copying into the repository, so that it won't prevent 2 people updating different branches in the same shared repository.

30s just 'feels' reasonable to me, without having an exact answer.

Revision history for this message

Robert Collins (lifeless) wrote on 2010-06-15:

John, you are ignoring the Branch lock, which we take out *at the
start of the fetch*.

How long does an average bound commit to a bzr+ssh server take ? ;)

However, a more important question is, what operations do we have that
a delayed lock would actually succeed on?

If someone does a push or commit to a branch, and a second users spins
on the lock, they are almost certainly going to have to merge first
(or supply --overwrite).

With this added into the reasoning , I'd suggest setting the timeout
to 5 seconds or less.

-Rob

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.