Merge lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering into lp:~dpb/charms/precise/storage/trunk

Proposed by Chad Smith
Status: Merged
Merged at revision: 34
Proposed branch: lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering
Merge into: lp:~dpb/charms/precise/storage/trunk
Diff against target: 272 lines (+69/-65)
5 files modified
hooks/common_util.py (+18/-18)
hooks/storage-provider.d/nfs/data-relation-changed (+2/-1)
hooks/storage-provider.d/nfs/nfs-relation-changed (+0/-23)
hooks/storage-provider.d/nfs/nfs-relation-departed (+0/-7)
hooks/test_common_util.py (+49/-16)
To merge this branch: bzr merge lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering
Reviewer Review Type Date Requested Status
David Britton Approve
Fernando Correa Neto (community) Approve
Review via email: mp+210892@code.launchpad.net

Description of the change

Because we cannot rely on the order in which the nfs and data relation hooks fire, we need to handle both cases: the data relation with the principal charm firing before the nfs relation with the nfs charm, or the other way around.

This branch allows either hook to call mount_volume(), which now attempts to read the required attributes from the designated "data" or "nfs" relation. This way either hook can determine whether all required information is present before proceeding to mount the NFS share.
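
As a rough illustration of the flow (a sketch, not the charm's exact code: the _get_from_relation helper name appears in the diff below, but its body here is an assumption):

    from charmhelpers.core import hookenv


    def _get_from_relation(relation_name, attribute):
        """Return the first value of attribute found on relation_name, else None."""
        for relid in hookenv.relation_ids(relation_name):
            for unit in hookenv.related_units(relid):
                value = hookenv.relation_get(attribute, unit=unit, rid=relid)
                if value:
                    return value
        return None


    def mount_volume():
        # The principal charm publishes the mountpoint on the "data" relation.
        mountpoint = _get_from_relation("data", "mountpoint")
        if not mountpoint:
            hookenv.log("No mountpoint defined by relation. Cannot mount volume yet.")
            return  # NOOP; the other relation hook will call us again later.

        # The nfs charm publishes server details on the "nfs" relation.
        nfs_server = _get_from_relation("nfs", "private-address")
        nfs_path = _get_from_relation("nfs", "mountpoint")
        fstype = _get_from_relation("nfs", "fstype")
        if None in (nfs_server, nfs_path, fstype):
            hookenv.log("Missing required relation values. "
                        "nfs-relation-changed hook needs to run first.")
            return  # NOOP; whichever hook fires next will try again.

        # ... proceed to mount "%s:%s" % (nfs_server, nfs_path) at mountpoint ...

Whichever relation completes last triggers the hook run that finally has all the data, so the mount happens exactly once without depending on hook ordering.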

Revision history for this message
David Britton (dpb) wrote :

[0] Test failures... Of the form:

https://pastebin.canonical.com/106789/

  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/dist-packages/mocker.py", line 146, in test_method_wrapper
    result = test_method()
  File "/home/dpb/src/canonical/juju/charms/blockstoragebroker/bsb-ec2-support/hooks/test_util.py", line 1332, in test_wb_ec2_describe_volumes_with_attac
hed_instances
    self.storage._ec2_describe_volumes(), expected)
  File "/home/dpb/src/canonical/juju/charms/blockstoragebroker/bsb-ec2-support/hooks/util.py", line 250, in _ec2_describe_volumes
    command = describevolumes.DescribeVolumes()

[...]

requestbuilder.exceptions.ServiceInitError: No ec2 endpoint to connect to was given. ec2 endpoints may be specified in a config file with "ec2-url".

test_util.TestEC2Util.test_wb_ec2_describe_volumes_with_attached_instances


Revision history for this message
David Britton (dpb) wrote :

Failure while testing:

2014-03-20 17:02:47 INFO worker.uniter.jujuc server.go:108 running hook tool "relation-set" ["-r" "data:1" "mountpoint=/srv/data"]
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:356 ran "block-storage-relation-changed" hook
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:363 committing "relation-changed" hook
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:381 committed "relation-changed" hook
2014-03-20 17:46:08 INFO juju fslock.go:144 attempted lock failed "uniter-hook-execution", storage/0: running hook "data-relation-departed", currently held: postgresql/0: running hook "data-relation-departed"
2014-03-20 17:46:11 INFO juju fslock.go:144 attempted lock failed "uniter-hook-execution", storage/0: running hook "data-relation-departed", currently held: postgresql/0: running hook "data-relation-broken"
2014-03-20 17:46:13 INFO juju.worker.uniter uniter.go:348 running "data-relation-departed" hook
2014-03-20 17:46:13 INFO worker.uniter.jujuc server.go:108 running hook tool "config-get" ["provider"]
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK Storage provider: block-storage-broker calling data-relation-departed hook
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK umount: /srv/data: device is busy.
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK (In some cases useful info about processes that use
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK the device is found by lsof(8) or fuser(1))
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK Traceback (most recent call last):
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "hooks/storage-provider.d/block-storage-broker/data-relation-departed", line 6, in <module>
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK common_util.unmount_volume()
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/var/lib/juju/agents/unit-storage-0/charm/hooks/common_util.py", line 110, in unmount_volume
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK "lsof %s | awk '(NR == 2){print $1}'" % mountpoint)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/python2.7/subprocess.py", line 537, in check_output
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK process = Popen(stdout=PIPE, *popenargs, **kwargs)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK errread, errwrite)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK raise child_exception
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK OSError: [Errno 2] No such file or directory
2014-03-20 17:46:13 ERROR juju.worker.uniter uniter.go:350 hook failed: exit status 1
2014-03-20 17:46:13 INFO juju.worker.uniter modes.go:421 ModeHookError starting
2014-03-20 17:46:13 INFO juju.worker.uniter.filter f...


Revision history for this message
David Britton (dpb) wrote :

[0] can be ignored... wrong MP

37. By Chad Smith

Don't use awk; use Python to parse lsof output. Add a unit test for lsof subprocess command failure.

38. By Chad Smith

use lsof -b

Revision history for this message
Chad Smith (chad.smith) wrote :

>
> [0] Test failures... Of the form:
>

Okay, the failures above were block-storage-broker charm issues on trusty. They are resolved in a follow-on branch: https://code.launchpad.net/~chad.smith/charms/precise/block-storage-broker/bsb-trusty-support

Revision history for this message
Chad Smith (chad.smith) wrote :

[1] The lsof | awk failure when the volume is unmounted is resolved by dropping the lsof | awk shell pipeline from the subprocess command and instead calling subprocess.check_output(["lsof", mountpoint, "-b"]).

The fixes and unit tests are pushed in revisions 37 and 38.
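
For reference, a minimal sketch of that approach (the actual change is in the diff below; the function name report_umount_blocker is illustrative only, and mountpoint/log are passed in rather than taken from the charm's module scope):

    import subprocess


    def report_umount_blocker(mountpoint, log):
        """Log which process is holding mountpoint open after a failed umount."""
        message = "WARNING: umount %s failed. Retrying." % mountpoint
        try:
            # -b tells lsof to avoid blocking kernel calls; the output is a
            # header line followed by one line per process using the mountpoint.
            output = subprocess.check_output(["lsof", mountpoint, "-b"])
        except subprocess.CalledProcessError as error:
            # lsof exits non-zero when nothing holds the mountpoint open.
            log("WARNING: %s" % str(error))
        else:
            # First column of the second line is the offending process name.
            message += " Device in use by (%s)" % output.split("\n")[1].split()[0]
        log(message)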

Revision history for this message
Fernando Correa Neto (fcorrea) wrote :

Works great! +1

I went through the process of deploying landscape, nfs, and this storage branch.
I established the landscape<->storage relation first and verified that it did not go into an error state and that it logged the expected message.
As soon as I established the storage<->nfs relation, the hooks started firing again and landscape ended up with its NFS share mounted.

review: Approve
Revision history for this message
David Britton (dpb) wrote :

I'm deploying and removing OK now. +1

review: Approve

Preview Diff

1=== modified file 'hooks/common_util.py'
2--- hooks/common_util.py 2014-02-25 23:09:20 +0000
3+++ hooks/common_util.py 2014-03-20 21:20:18 +0000
4@@ -106,10 +106,15 @@
5 log("Unmounted %s, Done" % mountpoint)
6 break
7 else: # Then device is in use let's report the offending process
8- process_name = subprocess.check_output(
9- "lsof %s | awk '(NR == 2){print $1}'" % mountpoint)
10- log("WARNING: umount %s failed. Device in use by (%s)" %
11- (mountpoint, process_name.strip()))
12+ message = "WARNING: umount %s failed. Retrying." % mountpoint
13+ try:
14+ lsof = subprocess.check_output(["lsof", mountpoint, "-b"])
15+ except subprocess.CalledProcessError, e:
16+ log("WARNING: %s" % str(e), hookenv.WARNING)
17+ else:
18+ process_name = lsof.split("\n")[1].split()[0]
19+ message += " Device in use by (%s)" % process_name
20+ log(message, hookenv.WARNING)
21 sleep(5)
22
23 if not success:
24@@ -216,17 +221,12 @@
25 issue and exit. Otherwise attempt to initialize and mount the blockstorage
26 or nfs device.
27 """
28- provider = get_provider()
29- if provider == "block-storage-broker":
30- # Called as part of a block storage hook, but mountpoint comes
31- # from the data relation.
32- mountpoint = _get_from_relation("data", "mountpoint")
33- else:
34- mountpoint = hookenv.relation_get("mountpoint")
35+ # mountpoint defined by our related principal in the "data" relation
36+ mountpoint = _get_from_relation("data", "mountpoint")
37
38 if not mountpoint:
39 log("No mountpoint defined by relation. Cannot mount volume yet.")
40- sys.exit(0)
41+ return
42
43 _persist_data("mountpoint", mountpoint) # Saved for our departed hook
44 if is_mounted(mountpoint):
45@@ -235,9 +235,10 @@
46 for relid in hookenv.relation_ids("data"):
47 hookenv.relation_set(
48 relid, relation_settings={"mountpoint": mountpoint})
49- sys.exit(0)
50+ return
51
52 options = "default"
53+ provider = get_provider()
54 if provider == "nfs":
55 relids = hookenv.relation_ids("nfs")
56 for relid in relids:
57@@ -250,11 +251,10 @@
58 options = relation.get("options", "default")
59 break
60
61- if None in [nfs_server, nfs_path, fstype]:
62- log("ERROR: Missing required relation values. "
63- "nfs-relation-changed hook needs to run first.",
64- hookenv.ERROR)
65- sys.exit(1)
66+ if len(relids) == 0 or None in [nfs_server, nfs_path, fstype]:
67+ log("Missing required relation values. "
68+ "nfs-relation-changed hook needs to run first.")
69+ return
70
71 device_path = "%s:%s" % (nfs_server, nfs_path)
72 elif provider == "block-storage-broker":
73
74=== modified file 'hooks/storage-provider.d/nfs/data-relation-changed'
75--- hooks/storage-provider.d/nfs/data-relation-changed 2014-02-10 18:34:02 +0000
76+++ hooks/storage-provider.d/nfs/data-relation-changed 2014-03-20 21:20:18 +0000
77@@ -2,4 +2,5 @@
78
79 import common_util
80
81-common_util.mount_volume() # Will wait until mountpoint is set by principal
82+# Will NOOP until mountpoint is set by principal and relation setup with nfs
83+common_util.mount_volume()
84
85=== added symlink 'hooks/storage-provider.d/nfs/nfs-relation-changed'
86=== target is u'data-relation-changed'
87=== removed file 'hooks/storage-provider.d/nfs/nfs-relation-changed'
88--- hooks/storage-provider.d/nfs/nfs-relation-changed 2014-01-31 23:49:37 +0000
89+++ hooks/storage-provider.d/nfs/nfs-relation-changed 1970-01-01 00:00:00 +0000
90@@ -1,23 +0,0 @@
91-#!/usr/bin/python
92-
93-from charmhelpers.core import hookenv
94-import common_util
95-import os
96-import subprocess
97-import sys
98-
99-current_dir = os.path.dirname(os.path.realpath(__file__))
100-config_changed_path = "%s/config-changed" % current_dir
101-if os.path.exists(config_changed_path):
102- subprocess.check_call(config_changed_path)
103-
104-common_util.log("nfs: We've got an nfs mount")
105-
106-options = hookenv.relation_get("options")
107-mountpoint = hookenv.relation_get("mountpoint")
108-fstype = hookenv.relation_get("fstype")
109-host = hookenv.relation_get("private-address")
110-
111-if not fstype:
112- common_util.log("nfs: No fstype defined. Waiting for some real data")
113- sys.exit(0)
114
115=== added symlink 'hooks/storage-provider.d/nfs/nfs-relation-departed'
116=== target is u'data-relation-departed'
117=== removed file 'hooks/storage-provider.d/nfs/nfs-relation-departed'
118--- hooks/storage-provider.d/nfs/nfs-relation-departed 2014-01-14 18:53:58 +0000
119+++ hooks/storage-provider.d/nfs/nfs-relation-departed 1970-01-01 00:00:00 +0000
120@@ -1,7 +0,0 @@
121-#!/usr/bin/python
122-
123-import common_util
124-
125-common_util.log("nfs: We've lost our shared NFS mount")
126-common_util.unmount_volume(remove_persistent_data=True)
127-common_util.log("nfs: Fairwell nfs mount, we hardly knew you")
128
129=== modified file 'hooks/test_common_util.py'
130--- hooks/test_common_util.py 2014-02-25 23:09:20 +0000
131+++ hooks/test_common_util.py 2014-03-20 21:20:18 +0000
132@@ -260,6 +260,10 @@
133 """
134 util._persist_data("mountpoint", "/mnt/this")
135
136+ lsof_output = (
137+ "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\n"
138+ "postgresql 30544 csmith 10r CHR 1,9 0t0 2056 /mnt/this")
139+
140 ismount = self.mocker.replace(os.path.ismount)
141 ismount("/mnt/this")
142 self.mocker.result(True)
143@@ -268,8 +272,8 @@
144 self.mocker.result(2) # Failure from the shell cmd
145 self.mocker.count(20)
146 lsof = self.mocker.replace(subprocess.check_output)
147- lsof("lsof /mnt/this | awk '(NR == 2){print $1}'",)
148- self.mocker.result("postgresql") # Process name using the filesystem
149+ lsof(["lsof", "/mnt/this", "-b"])
150+ self.mocker.result(lsof_output)
151 self.mocker.count(20)
152 sleep_mock = self.mocker.replace(sleep)
153 sleep_mock(5)
154@@ -281,6 +285,36 @@
155 self.assertTrue(
156 util.hookenv.is_logged(message), "Not logged- %s" % message)
157
158+ def test_unmount_lsof_command_failure(self):
159+ """Log a warning if lsof fails during L{unmount_volume} retries"""
160+ util._persist_data("mountpoint", "/mnt/this")
161+
162+ ismount = self.mocker.replace(os.path.ismount)
163+ ismount("/mnt/this")
164+ self.mocker.result(True)
165+ umount = self.mocker.replace(subprocess.call)
166+ umount(["umount", "/mnt/this"])
167+ self.mocker.result(2) # Failure from the umount shell cmd
168+ self.mocker.count(20)
169+ lsof = self.mocker.replace(subprocess.check_output)
170+ lsof(["lsof", "/mnt/this", "-b"])
171+ self.mocker.count(20)
172+ self.mocker.throw(
173+ subprocess.CalledProcessError(1, "lsof /mnt/this")) # lsof fail
174+ sleep_mock = self.mocker.replace(sleep)
175+ sleep_mock(5)
176+ self.mocker.count(20)
177+ self.mocker.replay()
178+
179+ self.assertRaises(SystemExit, util.unmount_volume)
180+ messages = [
181+ "ERROR: Unmount failed, leaving relation in errored state",
182+ "WARNING: umount /mnt/this failed. Retrying.",
183+ "WARNING: Command 'lsof /mnt/this' returned non-zero exit status 1"]
184+ for message in messages:
185+ self.assertTrue(
186+ util.hookenv.is_logged(message), "Not logged- %s" % message)
187+
188 def test_unmount_volume_success(self):
189 """
190 L{unmount_volume} will unmount a mounted filesystem calling the shell
191@@ -519,8 +553,7 @@
192 has not yet defined by the relation principal yet.
193 """
194 self.assertIsNone(util.hookenv.relation_get("mountpoint"))
195- result = self.assertRaises(SystemExit, util.mount_volume)
196- self.assertEqual(result.code, 0)
197+ util.mount_volume()
198 message = "No mountpoint defined by relation. Cannot mount volume yet."
199 self.assertTrue(
200 util.hookenv.is_logged(message), "Not logged- %s" % message)
201@@ -533,15 +566,14 @@
202 set juju relation C{mountpoint} in response.
203 """
204 util.hookenv._incoming_relation_data = (
205- (None, "mountpoint", "/mnt/this"),)
206+ ("data:1", "mountpoint", "/mnt/this"),)
207
208 ismount = self.mocker.replace(os.path.ismount)
209 ismount("/mnt/this")
210 self.mocker.result(True)
211 self.mocker.replay()
212
213- result = self.assertRaises(SystemExit, util.mount_volume)
214- self.assertEqual(result.code, 0)
215+ util.mount_volume()
216 self.assertIn(("mountpoint", "/mnt/this"),
217 util.hookenv._outgoing_relation_data)
218 message = "Volume (/mnt/this) already mounted"
219@@ -559,16 +591,16 @@
220 already mounted.
221 """
222 util.hookenv._incoming_relation_data = (
223- (None, "mountpoint", "/mnt/this"),)
224- self.assertEqual(util.hookenv.relation_get("mountpoint"), "/mnt/this")
225+ ("data:1", "mountpoint", "/mnt/this"),)
226+ self.assertEqual(
227+ util.hookenv.relation_get("mountpoint", rid="data:1"), "/mnt/this")
228
229 ismount = self.mocker.replace(os.path.ismount)
230 ismount("/mnt/this")
231 self.mocker.result(True)
232 self.mocker.replay()
233
234- result = self.assertRaises(SystemExit, util.mount_volume)
235- self.assertEqual(result.code, 0)
236+ util.mount_volume()
237 message = "Volume (/mnt/this) already mounted"
238 self.assertTrue(
239 util.hookenv.is_logged(message), "Not logged- %s" % message)
240@@ -581,7 +613,7 @@
241 C{mount_type}.
242 """
243 data = (
244- (None, "mountpoint", "/mnt/this"),
245+ ("data:1", "mountpoint", "/mnt/this"),
246 ("nfs:1", "private-address", "me.com"),
247 ("nfs:1", "mountpoint", "/nfs/server/path"),
248 ("nfs:1", "fstype", "nfs"),)
249@@ -595,10 +627,11 @@
250 partial_data = partial_data + (item,)
251 util.hookenv._incoming_relation_data = partial_data
252 self.assertEqual(
253- util.hookenv.relation_get("mountpoint"), "/mnt/this")
254- self.assertRaises(SystemExit, util.mount_volume)
255+ util.hookenv.relation_get("mountpoint", rid="data:1"),
256+ "/mnt/this")
257+ util.mount_volume()
258 message = (
259- "ERROR: Missing required relation values. "
260+ "Missing required relation values. "
261 "nfs-relation-changed hook needs to run first.")
262 self.assertTrue(
263 util.hookenv.is_logged(message), "Not logged- %s" % message)
264@@ -614,7 +647,7 @@
265 to report the successful initialization to the principal.
266 """
267 util.hookenv._incoming_relation_data = (
268- (None, "mountpoint", "/mnt/this"),
269+ ("data:1", "mountpoint", "/mnt/this"),
270 ("nfs:1", "private-address", "me.com"),
271 ("nfs:1", "options", "someopts"),
272 ("nfs:1", "mountpoint", "/nfs/server/path"),
