Merge lp:~robru/cupstream2distro/fix-locking into lp:cupstream2distro

Proposed by Robert Bruce Park
Status: Merged
Merged at revision: 1183
Proposed branch: lp:~robru/cupstream2distro/fix-locking
Merge into: lp:cupstream2distro
Diff against target: 142 lines (+25/-15)
9 files modified
citrain/build.py (+3/-2)
citrain/merge_clean.py (+2/-2)
citrain/migration.py (+2/-2)
citrain/publisher.py (+2/-2)
cupstream2distro/silomanager.py (+1/-4)
cupstream2distro/utils.py (+5/-1)
tests/unit/test_script_migration.py (+1/-1)
tests/unit/test_silomanager.py (+1/-1)
tests/unit/test_utils.py (+8/-0)
To merge this branch: bzr merge lp:~robru/cupstream2distro/fix-locking
Reviewer                        Review Type             Date Requested  Status
PS Jenkins bot                  continuous-integration                  Approve
Robert Bruce Park (community)                                           Approve
Barry Warsaw                                                            Pending
Review via email: mp+275931@code.launchpad.net

Commit message

Fix silo locking properly.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

FAILED: Continuous integration, rev:1180
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/855/

review: Needs Fixing (continuous-integration)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1181
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/856/

review: Approve (continuous-integration)
Revision history for this message
Barry Warsaw (barry) wrote :

High level comment: while it's not unreasonable to use the low-level flock
interface to do simple locking, you might want to consider using one of
several library alternatives. Locking can be tricky and there may be corner
cases lurking, and the libraries have generally been more thoroughly tested.
Totally up to you of course, so I'm just putting this out there.

The two I like are flufl.lock and lockfile, both of which have Python 2 and 3
versions in the archive. Obviously I wrote flufl.lock, and it's the only one
that I'm aware of that is safe across NFS mounts.

http://pythonhosted.org/flufl.lock/
http://pythonhosted.org/lockfile/

(We should be up-to-date with flufl.lock, but I think lockfile is a minor rev
behind upstream - I'll ping the maintainer.)

Take a look at the semantics of the two packages; they differ so one might be
more appropriate to your use case than the other.
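For reference, the low-level flock approach this branch takes can be sketched in a few lines (helper name and lock path here are illustrative, not from the branch). The lock belongs to the open file description, so it is released when the descriptor is closed or the process exits:

```python
import fcntl


def try_lock(path):
    """Take an exclusive, non-blocking flock on a lock file.

    Returns the open file object; keep a reference, since closing it
    (or exiting) releases the lock. Raises BlockingIOError if another
    process already holds the lock.
    """
    fd = open(path, 'w')
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fd.close()
        raise
    return fd
```

Note that flock treats descriptors from separate open() calls independently, so even a second attempt from the same process is denied while the first descriptor holds the lock.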

Aside from that...

On Oct 27, 2015, at 10:42 PM, Robert Bruce Park wrote:

>=== modified file 'cupstream2distro/silomanager.py'
>--- cupstream2distro/silomanager.py 2015-10-22 18:10:30 +0000
>+++ cupstream2distro/silomanager.py 2015-10-27 22:41:56 +0000
>@@ -22,6 +22,7 @@
> import re
> import os
> import json
>+import fcntl
> import logging
> import requests
> import signal
>@@ -47,7 +48,6 @@
> STABLE_OVERLAY_PPA,
> )
> from cupstream2distro.utils import (
>- atomic_write,
> env,
> memoize,
> log_value_of,
>@@ -197,8 +197,7 @@
> if self.strict:
> signal.signal(signal.SIGTERM, self.cleanup_on_kill)
> sys.excepthook = self.cleanup_on_exception
>- with atomic_write(self.lockfile, 'w') as lock:
>- lock.write(str(os.getpid()))
>+ self.enforce_lock()

The trickiest part of locks IMHO is ensuring you don't get deadlocked in the
face of exceptions and errors. The context manager syntax (i.e. the `with`
statement) pretty much guarantees that the locked resource will be freed if
anything goes wrong. Of course, the process could still be kill -9'd, etc,
but that's why for example flufl.lock has a timeout feature. Eventually, all
locks will auto-expire.
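As an illustration of that point (a hypothetical helper, not code from this branch), flock can be wrapped in a context manager so the `with` statement guarantees release even when the guarded code raises:

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def silo_lock(path):
    """Hold an exclusive, non-blocking flock for the with block.

    Closing the descriptor in the finally clause drops the lock
    whether the body completes or raises.
    """
    fd = open(path, 'w')
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        fd.close()
```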

How confident are you that there aren't corner cases which would deadlock the
silo?

Revision history for this message
Robert Bruce Park (robru) wrote :

On Oct 28, 2015 8:36 AM, "Barry Warsaw" <email address hidden> wrote:
> How confident are you that there aren't corner cases which would deadlock
> the silo?

Very confident, because this isn't long-running code. Any error or
exception will cause the process to exit, which, as far as I can tell,
guarantees the file handle is closed and the lock released.

Also NFS mounts aren't a concern but I'll take a look at flufl in a sec.

Revision history for this message
Robert Bruce Park (robru) wrote :

Ok I've just tested this further, and not even os._exit() can achieve a deadlock. It seems flock is smart enough to clear the lock when the process exits even if it doesn't exit cleanly, so I'm gonna go ahead and merge this.
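That experiment can be reproduced with a small script (the lock path is illustrative): a child process takes the flock and dies via os._exit(), yet the parent can immediately reacquire it, because the kernel drops flock locks at process exit:

```python
import fcntl
import subprocess
import sys

LOCKFILE = '/tmp/flock_exit_demo.lock'  # illustrative path

# Child grabs the lock, then exits hard: no cleanup handlers run.
child = (
    "import fcntl\n"
    "import os\n"
    "fd = open(%r, 'w')\n"
    "fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
    "os._exit(1)\n"
) % LOCKFILE

subprocess.call([sys.executable, '-c', child])

# The kernel released the child's lock at exit, so this succeeds
# instead of raising BlockingIOError.
fd = open(LOCKFILE, 'w')
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
fd.close()
```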

review: Approve
Revision history for this message
Robert Bruce Park (robru) wrote :

(Also confirmed that kill -9 releases the lock.)

Revision history for this message
Barry Warsaw (barry) wrote :

Stevens confirms flocks are released at process exit and fd closure.

1182. By Robert Bruce Park

Clean up behavior when trying to run two jobs simultaneously.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1182
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/857/

review: Approve (continuous-integration)
1183. By Robert Bruce Park

Unify redundant BlockingIOError handling.

1184. By Robert Bruce Park

Drop problematic finally clause.

1185. By Robert Bruce Park

finally: clause triggers even on unhandled exceptions and I do not want that.
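A quick illustration of the behavior that revision avoids (function and list names are hypothetical): a finally: clause runs even when the exception escapes unhandled, which is exactly what made the earlier save_config()-in-finally pattern fire on errors:

```python
events = []


def job():
    """Hypothetical stand-in for a citrain job body."""
    try:
        raise BlockingIOError('silo is locked')
    finally:
        # Runs whether or not anything catches the exception above.
        events.append('finally ran')


try:
    job()
except BlockingIOError:
    pass

print(events)  # the finally clause fired despite the raise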

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:1185
http://jenkins.qa.ubuntu.com/job/cu2d-choo-choo-ci/858/

review: Approve (continuous-integration)

Preview Diff

=== modified file 'citrain/build.py'
--- citrain/build.py 2015-10-27 19:35:44 +0000
+++ citrain/build.py 2015-10-28 19:46:37 +0000
@@ -88,10 +88,11 @@
         except CITrainError as err:
             logging.error(str(err))
             silo_state.status = 'Build failed: ' + str(err)
-            return 1
-        finally:
             silo_state.mark_dirty()
             silo_state.save_config()
+            return 1
+        silo_state.mark_dirty()
+        silo_state.save_config()
         return 0
 
 
=== modified file 'citrain/merge_clean.py'
--- citrain/merge_clean.py 2015-10-27 19:35:44 +0000
+++ citrain/merge_clean.py 2015-10-28 19:46:37 +0000
@@ -52,9 +52,9 @@
         except CITrainError as err:
             logging.error(str(err))
             silo_state.status = 'Merge&Clean failed: ' + str(err)
+            silo_state.save_config()
             return 1
-        finally:
-            silo_state.save_config()
+        silo_state.save_config()
         return 0
 
 
=== modified file 'citrain/migration.py'
--- citrain/migration.py 2015-10-27 19:35:44 +0000
+++ citrain/migration.py 2015-10-28 19:46:37 +0000
@@ -45,8 +45,8 @@
         BuildBase.failures.clear()
         try:
             silo_state.enforce_lock()
-        except CITrainError as err:
-            logging.info(str(err) + '\n')
+        except BlockingIOError:
+            logging.info('Silo is in use, skipping.\n')
             continue
 
         if 'dirty' not in silo_state.tokenize():
=== modified file 'citrain/publisher.py'
--- citrain/publisher.py 2015-10-27 19:35:44 +0000
+++ citrain/publisher.py 2015-10-28 19:46:37 +0000
@@ -60,9 +60,9 @@
             logging.error(str(err))
             if type(err) is not NoStatusError:
                 silo_state.status = 'Publish failed: ' + str(err)
+            silo_state.save_config()
             return 1
-        finally:
-            silo_state.save_config()
+        silo_state.save_config()
         # This bit needs to be in a separate block because it's so slow that it
         # makes the last call to save_config() end up losing races with
         # check-publication-migration. Then you get logs that say
=== modified file 'cupstream2distro/silomanager.py'
--- cupstream2distro/silomanager.py 2015-10-28 19:05:41 +0000
+++ cupstream2distro/silomanager.py 2015-10-28 19:46:37 +0000
@@ -231,10 +231,7 @@
 
     def enforce_lock(self):
         """Raise PrepError if this silo is locked by a running process."""
-        try:
-            fcntl.flock(self.lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
-        except BlockingIOError:
-            raise PrepError('Silo {} is locked.'.format(self))
+        fcntl.flock(self.lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
 
     def assign(self, requestid):
         """Assign a new requestid to an available silo."""
=== modified file 'cupstream2distro/utils.py'
--- cupstream2distro/utils.py 2015-10-19 21:34:21 +0000
+++ cupstream2distro/utils.py 2015-10-28 19:46:37 +0000
@@ -187,7 +187,11 @@
     if set(['--help', '-h']) & set(sys.argv):
         sys.stdout.write(doc)
     else:
-        sys.exit(main())
+        try:
+            sys.exit(main())
+        except BlockingIOError:
+            logging.error('Another job is already running in this silo.')
+            sys.exit(2)
 
 
 def call(*args, **kwargs):
=== modified file 'tests/unit/test_script_migration.py'
--- tests/unit/test_script_migration.py 2015-10-21 23:12:36 +0000
+++ tests/unit/test_script_migration.py 2015-10-28 19:46:37 +0000
@@ -75,7 +75,7 @@
     def test_main_skip_locked_silos(self):
         """Skip over silos that are currently used by other processes."""
         states = self.script.SiloState.iterate.return_value = [Mock()]
-        states[0].enforce_lock.side_effect = self.script.CITrainError
+        states[0].enforce_lock.side_effect = BlockingIOError
         self.assertEqual(self.script.main(), 0)
         self.assertEqual(self.script.Manager.mock_calls, [])
         self.assertEqual(self.script.merge_main.mock_calls, [])
 
=== modified file 'tests/unit/test_silomanager.py'
--- tests/unit/test_silomanager.py 2015-10-27 22:40:32 +0000
+++ tests/unit/test_silomanager.py 2015-10-28 19:46:37 +0000
@@ -126,7 +126,7 @@
         for silo in ('one', 'two', 'three'):
             os.mkdir(join(self.tempdir, silo))
             states.append(SiloState(silo, primary=True))
-        with self.assertRaisesRegexp(PrepError, 'Silo two is locked'):
+        with self.assertRaises(BlockingIOError):
             SiloState('two', primary=True)
         del states[1]
         SiloState('two', primary=True)  # No exception raised.
 
=== modified file 'tests/unit/test_utils.py'
--- tests/unit/test_utils.py 2015-09-24 05:06:40 +0000
+++ tests/unit/test_utils.py 2015-10-28 19:46:37 +0000
@@ -136,6 +136,14 @@
         self.assertEqual(sys_mock.exit.mock_calls, [])
         sys_mock.stdout.write.assert_called_with(helper * 5)
 
+    @patch('cupstream2distro.utils.sys')
+    def test_run_script_locked(self, sys_mock):
+        """Log & exit when silo is locked."""
+        main_mock = Mock(side_effect=BlockingIOError)
+        sys_mock.argv = []
+        run_script('__main__', 'docbrown', main_mock)
+        sys_mock.exit.assert_called_once_with(2)
+
     def test_singleton(self):
         """There Can Be Only One."""
         @singleton
