Merge lp:~robru/cupstream2distro/fix-locking into lp:cupstream2distro

Proposed by Robert Bruce Park
Status: Merged
Merged at revision: 1183
Proposed branch: lp:~robru/cupstream2distro/fix-locking
Merge into: lp:cupstream2distro
Diff against target: 142 lines (+25/-15)
9 files modified
citrain/build.py (+3/-2)
citrain/merge_clean.py (+2/-2)
citrain/migration.py (+2/-2)
citrain/publisher.py (+2/-2)
cupstream2distro/silomanager.py (+1/-4)
cupstream2distro/utils.py (+5/-1)
tests/unit/test_script_migration.py (+1/-1)
tests/unit/test_silomanager.py (+1/-1)
tests/unit/test_utils.py (+8/-0)
To merge this branch: bzr merge lp:~robru/cupstream2distro/fix-locking
Reviewer Review Type Date Requested Status
PS Jenkins bot continuous-integration Approve
Robert Bruce Park (community) Approve
Barry Warsaw Pending
Review via email: mp+275931@code.launchpad.net

Commit message

Fix silo locking properly.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

FAILED: Continuous integration, rev:1180
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/855/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/cu2d-choo-choo-ci/855/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1181
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/856/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/cu2d-choo-choo-ci/856/rebuild

review: Approve (continuous-integration)
Revision history for this message
Barry Warsaw (barry) wrote :

High level comment: while it's not unreasonable to use the low-level flock
interface to do simple locking, you might want to consider using one of
several library alternatives. Locking can be tricky and there may be corner
cases lurking, and the libraries have generally been more thoroughly tested.
Totally up to you of course, so I'm just putting this out there.

The two I like are flufl.lock and lockfile, both of which have Python 2 and 3
versions in the archive. Obviously I wrote flufl.lock and it's the only one
that I'm aware of that is safe across NFS mounts

http://pythonhosted.org/flufl.lock/
http://pythonhosted.org/lockfile/

(We should be up-to-date with flufl.lock, but I think lockfile is a minor rev
behind upstream - I'll ping the maintainer.)

Take a look at the semantics of the two packages; they differ so one might be
more appropriate to your use case than the other.

Aside from that...

On Oct 27, 2015, at 10:42 PM, Robert Bruce Park wrote:

>=== modified file 'cupstream2distro/silomanager.py'
>--- cupstream2distro/silomanager.py 2015-10-22 18:10:30 +0000
>+++ cupstream2distro/silomanager.py 2015-10-27 22:41:56 +0000
>@@ -22,6 +22,7 @@
> import re
> import os
> import json
>+import fcntl
> import logging
> import requests
> import signal
>@@ -47,7 +48,6 @@
> STABLE_OVERLAY_PPA,
> )
> from cupstream2distro.utils import (
>- atomic_write,
> env,
> memoize,
> log_value_of,
>@@ -197,8 +197,7 @@
> if self.strict:
> signal.signal(signal.SIGTERM, self.cleanup_on_kill)
> sys.excepthook = self.cleanup_on_exception
>- with atomic_write(self.lockfile, 'w') as lock:
>- lock.write(str(os.getpid()))
>+ self.enforce_lock()

The trickiest part of locks IMHO is ensuring you don't get deadlocked in the
face of exceptions and errors. The context manager syntax (i.e. the `with`
statement) pretty much guarantees that the locked resource will be freed if
anything goes wrong. Of course, the process could still be kill -9'd, etc,
but that's why for example flufl.lock has a timeout feature. Eventually, all
locks will auto-expire.

How confident are you that there aren't corner cases which would deadlock the
silo?

Revision history for this message
Robert Bruce Park (robru) wrote :

On Oct 28, 2015 8:36 AM, "Barry Warsaw" <email address hidden> wrote:
> How confident are you that there aren't corner cases which would deadlock
the
> silo?

Very confident, because this isn't long running code. Any error or
exception will cause the process to exit, which as far as i can tell,
guarantees the file handle be closed and the lock released.

Also NFS mounts aren't a concern but I'll take a look at flufl in a sec.

Revision history for this message
Robert Bruce Park (robru) wrote :

Ok I've just tested this further, and not even os._exit() can achieve a deadlock. It seems flock is smart enough to clear the lock when the process exits even if it doesn't exit cleanly, so I'm gonna go ahead and merge this.

review: Approve
Revision history for this message
Robert Bruce Park (robru) wrote :

(also confirmed kill -9 releases the lock too)

Revision history for this message
Barry Warsaw (barry) wrote :

Stevens confirms flocks are released at process exit and fd closure.

1182. By Robert Bruce Park

Clean up behavior when trying to run two jobs simultaneously.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1182
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/857/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/cu2d-choo-choo-ci/857/rebuild

review: Approve (continuous-integration)
1183. By Robert Bruce Park

Unify redundant BlockingIOError handling.

1184. By Robert Bruce Park

Drop problematic finally clause.

1185. By Robert Bruce Park

finally: clause triggers even on unhandled exceptions and I do not want that.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1185
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/858/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/cu2d-choo-choo-ci/858/rebuild

review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'citrain/build.py'
2--- citrain/build.py 2015-10-27 19:35:44 +0000
3+++ citrain/build.py 2015-10-28 19:46:37 +0000
4@@ -88,10 +88,11 @@
5 except CITrainError as err:
6 logging.error(str(err))
7 silo_state.status = 'Build failed: ' + str(err)
8- return 1
9- finally:
10 silo_state.mark_dirty()
11 silo_state.save_config()
12+ return 1
13+ silo_state.mark_dirty()
14+ silo_state.save_config()
15 return 0
16
17
18
19=== modified file 'citrain/merge_clean.py'
20--- citrain/merge_clean.py 2015-10-27 19:35:44 +0000
21+++ citrain/merge_clean.py 2015-10-28 19:46:37 +0000
22@@ -52,9 +52,9 @@
23 except CITrainError as err:
24 logging.error(str(err))
25 silo_state.status = 'Merge&Clean failed: ' + str(err)
26+ silo_state.save_config()
27 return 1
28- finally:
29- silo_state.save_config()
30+ silo_state.save_config()
31 return 0
32
33
34
35=== modified file 'citrain/migration.py'
36--- citrain/migration.py 2015-10-27 19:35:44 +0000
37+++ citrain/migration.py 2015-10-28 19:46:37 +0000
38@@ -45,8 +45,8 @@
39 BuildBase.failures.clear()
40 try:
41 silo_state.enforce_lock()
42- except CITrainError as err:
43- logging.info(str(err) + '\n')
44+ except BlockingIOError:
45+ logging.info('Silo is in use, skipping.\n')
46 continue
47
48 if 'dirty' not in silo_state.tokenize():
49
50=== modified file 'citrain/publisher.py'
51--- citrain/publisher.py 2015-10-27 19:35:44 +0000
52+++ citrain/publisher.py 2015-10-28 19:46:37 +0000
53@@ -60,9 +60,9 @@
54 logging.error(str(err))
55 if type(err) is not NoStatusError:
56 silo_state.status = 'Publish failed: ' + str(err)
57+ silo_state.save_config()
58 return 1
59- finally:
60- silo_state.save_config()
61+ silo_state.save_config()
62 # This bit needs to be in a separate block because it's so slow that it
63 # makes the last call to save_config() end up losing races with
64 # check-publication-migration. Then you get logs that say
65
66=== modified file 'cupstream2distro/silomanager.py'
67--- cupstream2distro/silomanager.py 2015-10-28 19:05:41 +0000
68+++ cupstream2distro/silomanager.py 2015-10-28 19:46:37 +0000
69@@ -231,10 +231,7 @@
70
71 def enforce_lock(self):
72 """Raise PrepError if this silo is locked by a running process."""
73- try:
74- fcntl.flock(self.lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
75- except BlockingIOError:
76- raise PrepError('Silo {} is locked.'.format(self))
77+ fcntl.flock(self.lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
78
79 def assign(self, requestid):
80 """Assign a new requestid to an available silo."""
81
82=== modified file 'cupstream2distro/utils.py'
83--- cupstream2distro/utils.py 2015-10-19 21:34:21 +0000
84+++ cupstream2distro/utils.py 2015-10-28 19:46:37 +0000
85@@ -187,7 +187,11 @@
86 if set(['--help', '-h']) & set(sys.argv):
87 sys.stdout.write(doc)
88 else:
89- sys.exit(main())
90+ try:
91+ sys.exit(main())
92+ except BlockingIOError:
93+ logging.error('Another job is already running in this silo.')
94+ sys.exit(2)
95
96
97 def call(*args, **kwargs):
98
99=== modified file 'tests/unit/test_script_migration.py'
100--- tests/unit/test_script_migration.py 2015-10-21 23:12:36 +0000
101+++ tests/unit/test_script_migration.py 2015-10-28 19:46:37 +0000
102@@ -75,7 +75,7 @@
103 def test_main_skip_locked_silos(self):
104 """Skip over silos that are currently used by other processes."""
105 states = self.script.SiloState.iterate.return_value = [Mock()]
106- states[0].enforce_lock.side_effect = self.script.CITrainError
107+ states[0].enforce_lock.side_effect = BlockingIOError
108 self.assertEqual(self.script.main(), 0)
109 self.assertEqual(self.script.Manager.mock_calls, [])
110 self.assertEqual(self.script.merge_main.mock_calls, [])
111
112=== modified file 'tests/unit/test_silomanager.py'
113--- tests/unit/test_silomanager.py 2015-10-27 22:40:32 +0000
114+++ tests/unit/test_silomanager.py 2015-10-28 19:46:37 +0000
115@@ -126,7 +126,7 @@
116 for silo in ('one', 'two', 'three'):
117 os.mkdir(join(self.tempdir, silo))
118 states.append(SiloState(silo, primary=True))
119- with self.assertRaisesRegexp(PrepError, 'Silo two is locked'):
120+ with self.assertRaises(BlockingIOError):
121 SiloState('two', primary=True)
122 del states[1]
123 SiloState('two', primary=True) # No exception raised.
124
125=== modified file 'tests/unit/test_utils.py'
126--- tests/unit/test_utils.py 2015-09-24 05:06:40 +0000
127+++ tests/unit/test_utils.py 2015-10-28 19:46:37 +0000
128@@ -136,6 +136,14 @@
129 self.assertEqual(sys_mock.exit.mock_calls, [])
130 sys_mock.stdout.write.assert_called_with(helper * 5)
131
132+ @patch('cupstream2distro.utils.sys')
133+ def test_run_script_locked(self, sys_mock):
134+ """Log & exit when silo is locked."""
135+ main_mock = Mock(side_effect=BlockingIOError)
136+ sys_mock.argv = []
137+ run_script('__main__', 'docbrown', main_mock)
138+ sys_mock.exit.assert_called_once_with(2)
139+
140 def test_singleton(self):
141 """There Can Be Only One."""
142 @singleton

Subscribers

People subscribed via source and target branches