Merge lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering into lp:~dpb/charms/precise/storage/trunk
Status: | Merged |
---|---|
Merged at revision: | 34 |
Proposed branch: | lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering |
Merge into: | lp:~dpb/charms/precise/storage/trunk |
Diff against target: |
272 lines (+69/-65) 5 files modified
hooks/common_util.py (+18/-18) hooks/storage-provider.d/nfs/data-relation-changed (+2/-1) hooks/storage-provider.d/nfs/nfs-relation-changed (+0/-23) hooks/storage-provider.d/nfs/nfs-relation-departed (+0/-7) hooks/test_common_util.py (+49/-16) |
To merge this branch: | bzr merge lp:~chad.smith/charms/precise/storage/storage-fix-nfs-relation-ordering |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
David Britton | Approve | ||
Fernando Correa Neto (community) | Approve | ||
Review via email:
Commit message
Description of the change
Because we cannot rely on the order in which the nfs and data relation hooks fire, we need to handle either case: the data relation with the principal charm firing before the nfs relation with the nfs charm, or the reverse.
This branch allows either hook to call mount_volume(), which now attempts to read the required attributes from the named "data" or "nfs" relation. Either hook can therefore determine whether all required information is present before attempting to mount the NFS share.
David Britton (dpb) wrote : | # |
David Britton (dpb) wrote : | # |
Failure while testing:
2014-03-20 17:02:47 INFO worker.uniter.jujuc server.go:108 running hook tool "relation-set" ["-r" "data:1" "mountpoint=
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:356 ran "block-
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:363 committing "relation-changed" hook
2014-03-20 17:02:47 INFO juju.worker.uniter uniter.go:381 committed "relation-changed" hook
2014-03-20 17:46:08 INFO juju fslock.go:144 attempted lock failed "uniter-
2014-03-20 17:46:11 INFO juju fslock.go:144 attempted lock failed "uniter-
2014-03-20 17:46:13 INFO juju.worker.uniter uniter.go:348 running "data-relation-
2014-03-20 17:46:13 INFO worker.uniter.jujuc server.go:108 running hook tool "config-get" ["provider"]
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK Storage provider: block-storage-
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK umount: /srv/data: device is busy.
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK (In some cases useful info about processes that use
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK the device is found by lsof(8) or fuser(1))
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK Traceback (most recent call last):
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "hooks/
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK common_
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/var/lib/
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK "lsof %s | awk '(NR == 2){print $1}'" % mountpoint)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK process = Popen(stdout=PIPE, *popenargs, **kwargs)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK errread, errwrite)
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK File "/usr/lib/
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK raise child_exception
2014-03-20 17:46:13 INFO juju.worker.uniter context.go:255 HOOK OSError: [Errno 2] No such file or directory
2014-03-20 17:46:13 ERROR juju.worker.uniter uniter.go:350 hook failed: exit status 1
2014-03-20 17:46:13 INFO juju.worker.uniter modes.go:421 ModeHookError starting
2014-03-20 17:46:13 INFO juju.worker.
David Britton (dpb) wrote : | # |
[0] can be ignored... wrong MP
- 37. By Chad Smith

  Don't use awk; use Python to parse the lsof output. Add a unit test for lsof subprocess command failure.

- 38. By Chad Smith

  Use lsof -b.
Chad Smith (chad.smith) wrote : | # |
>
> [0] Test failures... Of the form:
>
Okay the above was block-storage-
Chad Smith (chad.smith) wrote : | # |
[1] lsof | awk issues when the volume is unmounted. This is resolved by dropping the "lsof | awk" shell pipeline from the subprocess command and instead parsing the lsof output in Python.
Fixes and unit tests are pushed in revs 37 & 38.
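The rev 37/38 change can be illustrated with a small sketch: invoke lsof directly with an argument list (no shell, no awk pipeline) and pick the process name out of the output in Python. The function names here are illustrative, not the charm's actual helpers.

```python
import subprocess


def parse_lsof_output(output):
    """Return the COMMAND column of the first data row, or None.

    Row 0 of lsof output is the header; row 1 is the first process
    holding the path open.
    """
    lines = output.splitlines()
    if len(lines) < 2:  # header only, or empty: nothing holds the mount
        return None
    return lines[1].split()[0]


def offending_process(mountpoint):
    """Name the first process holding mountpoint open, or None on failure."""
    try:
        # -b tells lsof to avoid kernel functions that could block
        output = subprocess.check_output(["lsof", mountpoint, "-b"])
    except (subprocess.CalledProcessError, OSError):
        # lsof exits non-zero when nothing has the path open,
        # and OSError covers lsof not being installed at all
        return None
    if isinstance(output, bytes):
        output = output.decode()
    return parse_lsof_output(output)
```

Passing an argument list instead of a shell string also avoids the OSError seen in the failure log above, where the combined "lsof ... | awk ..." string was treated as a single executable name.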
Fernando Correa Neto (fcorrea) wrote : | # |
Works great! +1
Went through the process of deploying landscape, nfs, and this storage branch.
Tried establishing landscape<->storage and verified that it didn't go into an error state and that it logged the message.
As soon as I established storage<->nfs, the hooks started firing again and landscape ended up with its NFS share mounted.
David Britton (dpb) wrote : | # |
I'm deploying and removing OK now. +1
Preview Diff
1 | === modified file 'hooks/common_util.py' |
2 | --- hooks/common_util.py 2014-02-25 23:09:20 +0000 |
3 | +++ hooks/common_util.py 2014-03-20 21:20:18 +0000 |
4 | @@ -106,10 +106,15 @@ |
5 | log("Unmounted %s, Done" % mountpoint) |
6 | break |
7 | else: # Then device is in use let's report the offending process |
8 | - process_name = subprocess.check_output( |
9 | - "lsof %s | awk '(NR == 2){print $1}'" % mountpoint) |
10 | - log("WARNING: umount %s failed. Device in use by (%s)" % |
11 | - (mountpoint, process_name.strip())) |
12 | + message = "WARNING: umount %s failed. Retrying." % mountpoint |
13 | + try: |
14 | + lsof = subprocess.check_output(["lsof", mountpoint, "-b"]) |
15 | + except subprocess.CalledProcessError, e: |
16 | + log("WARNING: %s" % str(e), hookenv.WARNING) |
17 | + else: |
18 | + process_name = lsof.split("\n")[1].split()[0] |
19 | + message += " Device in use by (%s)" % process_name |
20 | + log(message, hookenv.WARNING) |
21 | sleep(5) |
22 | |
23 | if not success: |
24 | @@ -216,17 +221,12 @@ |
25 | issue and exit. Otherwise attempt to initialize and mount the blockstorage |
26 | or nfs device. |
27 | """ |
28 | - provider = get_provider() |
29 | - if provider == "block-storage-broker": |
30 | - # Called as part of a block storage hook, but mountpoint comes |
31 | - # from the data relation. |
32 | - mountpoint = _get_from_relation("data", "mountpoint") |
33 | - else: |
34 | - mountpoint = hookenv.relation_get("mountpoint") |
35 | + # mountpoint defined by our related principal in the "data" relation |
36 | + mountpoint = _get_from_relation("data", "mountpoint") |
37 | |
38 | if not mountpoint: |
39 | log("No mountpoint defined by relation. Cannot mount volume yet.") |
40 | - sys.exit(0) |
41 | + return |
42 | |
43 | _persist_data("mountpoint", mountpoint) # Saved for our departed hook |
44 | if is_mounted(mountpoint): |
45 | @@ -235,9 +235,10 @@ |
46 | for relid in hookenv.relation_ids("data"): |
47 | hookenv.relation_set( |
48 | relid, relation_settings={"mountpoint": mountpoint}) |
49 | - sys.exit(0) |
50 | + return |
51 | |
52 | options = "default" |
53 | + provider = get_provider() |
54 | if provider == "nfs": |
55 | relids = hookenv.relation_ids("nfs") |
56 | for relid in relids: |
57 | @@ -250,11 +251,10 @@ |
58 | options = relation.get("options", "default") |
59 | break |
60 | |
61 | - if None in [nfs_server, nfs_path, fstype]: |
62 | - log("ERROR: Missing required relation values. " |
63 | - "nfs-relation-changed hook needs to run first.", |
64 | - hookenv.ERROR) |
65 | - sys.exit(1) |
66 | + if len(relids) == 0 or None in [nfs_server, nfs_path, fstype]: |
67 | + log("Missing required relation values. " |
68 | + "nfs-relation-changed hook needs to run first.") |
69 | + return |
70 | |
71 | device_path = "%s:%s" % (nfs_server, nfs_path) |
72 | elif provider == "block-storage-broker": |
73 | |
74 | === modified file 'hooks/storage-provider.d/nfs/data-relation-changed' |
75 | --- hooks/storage-provider.d/nfs/data-relation-changed 2014-02-10 18:34:02 +0000 |
76 | +++ hooks/storage-provider.d/nfs/data-relation-changed 2014-03-20 21:20:18 +0000 |
77 | @@ -2,4 +2,5 @@ |
78 | |
79 | import common_util |
80 | |
81 | -common_util.mount_volume() # Will wait until mountpoint is set by principal |
82 | +# Will NOOP until mountpoint is set by principal and relation setup with nfs |
83 | +common_util.mount_volume() |
84 | |
85 | === added symlink 'hooks/storage-provider.d/nfs/nfs-relation-changed' |
86 | === target is u'data-relation-changed' |
87 | === removed file 'hooks/storage-provider.d/nfs/nfs-relation-changed' |
88 | --- hooks/storage-provider.d/nfs/nfs-relation-changed 2014-01-31 23:49:37 +0000 |
89 | +++ hooks/storage-provider.d/nfs/nfs-relation-changed 1970-01-01 00:00:00 +0000 |
90 | @@ -1,23 +0,0 @@ |
91 | -#!/usr/bin/python |
92 | - |
93 | -from charmhelpers.core import hookenv |
94 | -import common_util |
95 | -import os |
96 | -import subprocess |
97 | -import sys |
98 | - |
99 | -current_dir = os.path.dirname(os.path.realpath(__file__)) |
100 | -config_changed_path = "%s/config-changed" % current_dir |
101 | -if os.path.exists(config_changed_path): |
102 | - subprocess.check_call(config_changed_path) |
103 | - |
104 | -common_util.log("nfs: We've got an nfs mount") |
105 | - |
106 | -options = hookenv.relation_get("options") |
107 | -mountpoint = hookenv.relation_get("mountpoint") |
108 | -fstype = hookenv.relation_get("fstype") |
109 | -host = hookenv.relation_get("private-address") |
110 | - |
111 | -if not fstype: |
112 | - common_util.log("nfs: No fstype defined. Waiting for some real data") |
113 | - sys.exit(0) |
114 | |
115 | === added symlink 'hooks/storage-provider.d/nfs/nfs-relation-departed' |
116 | === target is u'data-relation-departed' |
117 | === removed file 'hooks/storage-provider.d/nfs/nfs-relation-departed' |
118 | --- hooks/storage-provider.d/nfs/nfs-relation-departed 2014-01-14 18:53:58 +0000 |
119 | +++ hooks/storage-provider.d/nfs/nfs-relation-departed 1970-01-01 00:00:00 +0000 |
120 | @@ -1,7 +0,0 @@ |
121 | -#!/usr/bin/python |
122 | - |
123 | -import common_util |
124 | - |
125 | -common_util.log("nfs: We've lost our shared NFS mount") |
126 | -common_util.unmount_volume(remove_persistent_data=True) |
127 | -common_util.log("nfs: Fairwell nfs mount, we hardly knew you") |
128 | |
129 | === modified file 'hooks/test_common_util.py' |
130 | --- hooks/test_common_util.py 2014-02-25 23:09:20 +0000 |
131 | +++ hooks/test_common_util.py 2014-03-20 21:20:18 +0000 |
132 | @@ -260,6 +260,10 @@ |
133 | """ |
134 | util._persist_data("mountpoint", "/mnt/this") |
135 | |
136 | + lsof_output = ( |
137 | + "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\n" |
138 | + "postgresql 30544 csmith 10r CHR 1,9 0t0 2056 /mnt/this") |
139 | + |
140 | ismount = self.mocker.replace(os.path.ismount) |
141 | ismount("/mnt/this") |
142 | self.mocker.result(True) |
143 | @@ -268,8 +272,8 @@ |
144 | self.mocker.result(2) # Failure from the shell cmd |
145 | self.mocker.count(20) |
146 | lsof = self.mocker.replace(subprocess.check_output) |
147 | - lsof("lsof /mnt/this | awk '(NR == 2){print $1}'",) |
148 | - self.mocker.result("postgresql") # Process name using the filesystem |
149 | + lsof(["lsof", "/mnt/this", "-b"]) |
150 | + self.mocker.result(lsof_output) |
151 | self.mocker.count(20) |
152 | sleep_mock = self.mocker.replace(sleep) |
153 | sleep_mock(5) |
154 | @@ -281,6 +285,36 @@ |
155 | self.assertTrue( |
156 | util.hookenv.is_logged(message), "Not logged- %s" % message) |
157 | |
158 | + def test_unmount_lsof_command_failure(self): |
159 | + """Log a warning if lsof fails during L{unmount_volume} retries""" |
160 | + util._persist_data("mountpoint", "/mnt/this") |
161 | + |
162 | + ismount = self.mocker.replace(os.path.ismount) |
163 | + ismount("/mnt/this") |
164 | + self.mocker.result(True) |
165 | + umount = self.mocker.replace(subprocess.call) |
166 | + umount(["umount", "/mnt/this"]) |
167 | + self.mocker.result(2) # Failure from the umount shell cmd |
168 | + self.mocker.count(20) |
169 | + lsof = self.mocker.replace(subprocess.check_output) |
170 | + lsof(["lsof", "/mnt/this", "-b"]) |
171 | + self.mocker.count(20) |
172 | + self.mocker.throw( |
173 | + subprocess.CalledProcessError(1, "lsof /mnt/this")) # lsof fail |
174 | + sleep_mock = self.mocker.replace(sleep) |
175 | + sleep_mock(5) |
176 | + self.mocker.count(20) |
177 | + self.mocker.replay() |
178 | + |
179 | + self.assertRaises(SystemExit, util.unmount_volume) |
180 | + messages = [ |
181 | + "ERROR: Unmount failed, leaving relation in errored state", |
182 | + "WARNING: umount /mnt/this failed. Retrying.", |
183 | + "WARNING: Command 'lsof /mnt/this' returned non-zero exit status 1"] |
184 | + for message in messages: |
185 | + self.assertTrue( |
186 | + util.hookenv.is_logged(message), "Not logged- %s" % message) |
187 | + |
188 | def test_unmount_volume_success(self): |
189 | """ |
190 | L{unmount_volume} will unmount a mounted filesystem calling the shell |
191 | @@ -519,8 +553,7 @@ |
192 | has not yet defined by the relation principal yet. |
193 | """ |
194 | self.assertIsNone(util.hookenv.relation_get("mountpoint")) |
195 | - result = self.assertRaises(SystemExit, util.mount_volume) |
196 | - self.assertEqual(result.code, 0) |
197 | + util.mount_volume() |
198 | message = "No mountpoint defined by relation. Cannot mount volume yet." |
199 | self.assertTrue( |
200 | util.hookenv.is_logged(message), "Not logged- %s" % message) |
201 | @@ -533,15 +566,14 @@ |
202 | set juju relation C{mountpoint} in response. |
203 | """ |
204 | util.hookenv._incoming_relation_data = ( |
205 | - (None, "mountpoint", "/mnt/this"),) |
206 | + ("data:1", "mountpoint", "/mnt/this"),) |
207 | |
208 | ismount = self.mocker.replace(os.path.ismount) |
209 | ismount("/mnt/this") |
210 | self.mocker.result(True) |
211 | self.mocker.replay() |
212 | |
213 | - result = self.assertRaises(SystemExit, util.mount_volume) |
214 | - self.assertEqual(result.code, 0) |
215 | + util.mount_volume() |
216 | self.assertIn(("mountpoint", "/mnt/this"), |
217 | util.hookenv._outgoing_relation_data) |
218 | message = "Volume (/mnt/this) already mounted" |
219 | @@ -559,16 +591,16 @@ |
220 | already mounted. |
221 | """ |
222 | util.hookenv._incoming_relation_data = ( |
223 | - (None, "mountpoint", "/mnt/this"),) |
224 | - self.assertEqual(util.hookenv.relation_get("mountpoint"), "/mnt/this") |
225 | + ("data:1", "mountpoint", "/mnt/this"),) |
226 | + self.assertEqual( |
227 | + util.hookenv.relation_get("mountpoint", rid="data:1"), "/mnt/this") |
228 | |
229 | ismount = self.mocker.replace(os.path.ismount) |
230 | ismount("/mnt/this") |
231 | self.mocker.result(True) |
232 | self.mocker.replay() |
233 | |
234 | - result = self.assertRaises(SystemExit, util.mount_volume) |
235 | - self.assertEqual(result.code, 0) |
236 | + util.mount_volume() |
237 | message = "Volume (/mnt/this) already mounted" |
238 | self.assertTrue( |
239 | util.hookenv.is_logged(message), "Not logged- %s" % message) |
240 | @@ -581,7 +613,7 @@ |
241 | C{mount_type}. |
242 | """ |
243 | data = ( |
244 | - (None, "mountpoint", "/mnt/this"), |
245 | + ("data:1", "mountpoint", "/mnt/this"), |
246 | ("nfs:1", "private-address", "me.com"), |
247 | ("nfs:1", "mountpoint", "/nfs/server/path"), |
248 | ("nfs:1", "fstype", "nfs"),) |
249 | @@ -595,10 +627,11 @@ |
250 | partial_data = partial_data + (item,) |
251 | util.hookenv._incoming_relation_data = partial_data |
252 | self.assertEqual( |
253 | - util.hookenv.relation_get("mountpoint"), "/mnt/this") |
254 | - self.assertRaises(SystemExit, util.mount_volume) |
255 | + util.hookenv.relation_get("mountpoint", rid="data:1"), |
256 | + "/mnt/this") |
257 | + util.mount_volume() |
258 | message = ( |
259 | - "ERROR: Missing required relation values. " |
260 | + "Missing required relation values. " |
261 | "nfs-relation-changed hook needs to run first.") |
262 | self.assertTrue( |
263 | util.hookenv.is_logged(message), "Not logged- %s" % message) |
264 | @@ -614,7 +647,7 @@ |
265 | to report the successful initialization to the principal. |
266 | """ |
267 | util.hookenv._incoming_relation_data = ( |
268 | - (None, "mountpoint", "/mnt/this"), |
269 | + ("data:1", "mountpoint", "/mnt/this"), |
270 | ("nfs:1", "private-address", "me.com"), |
271 | ("nfs:1", "options", "someopts"), |
272 | ("nfs:1", "mountpoint", "/nfs/server/path"), |
[0] Test failures... Of the form:
https://pastebin.canonical.com/106789/
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/dist-packages/mocker.py", line 146, in test_method_wrapper
    result = test_method()
  File "/home/dpb/src/canonical/juju/charms/blockstoragebroker/bsb-ec2-support/hooks/test_util.py", line 1332, in test_wb_ec2_describe_volumes_with_attached_instances
    self.assertEqual(storage._ec2_describe_volumes(), expected)
  File "/home/dpb/src/canonical/juju/charms/blockstoragebroker/bsb-ec2-support/hooks/util.py", line 250, in _ec2_describe_volumes
    command = describevolumes.DescribeVolumes()
[...]
requestbuilder.exceptions.ServiceInitError: No ec2 endpoint to connect to was given. ec2 endpoints may be specified in a config file with "ec2-url".
test_util.TestEC2Util.test_wb_ec2_describe_volumes_with_attached_instances
--
David Britton <email address hidden>