Merge lp:~rvb/gwacl/delete-disk-fix into lp:gwacl

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: 94
Merged at revision: 91
Proposed branch: lp:~rvb/gwacl/delete-disk-fix
Merge into: lp:gwacl
Diff against target: 318 lines (+195/-15)
8 files modified
.bzrignore (+1/-0)
deletedisk.go (+64/-0)
deletedisk_test.go (+103/-0)
example/live_example_managementapi.go (+0/-4)
managementapi.go (+8/-0)
managementapi_test.go (+5/-0)
poller.go (+7/-7)
poller_test.go (+7/-4)
To merge this branch: bzr merge lp:~rvb/gwacl/delete-disk-fix
Reviewer Review Type Date Requested Status
Gavin Panella Approve
Review via email: mp+159150@code.launchpad.net

Commit message

Use a poller in ManagementAPI.DeleteDisk() to work around Windows Azure's bug.

Description of the change

This branch adds a poller used to delete a disk in order to work around a Windows Azure bug: it takes an indeterminate time for a disk previously attached to a deleted VM to become "not in use" and thus be available for deletion.

I refactored the poller: isDone() now also takes the error returned by poll(). This was done because I needed it for this branch obviously but this simply makes the poller more generic: isDone() can also decide whether or not the polling should stop using the error returned by poll() (and not only using the 'response' returned by poll()).

I've made all I could to keep this workaround contained. The goal was that, once the bug will be fixed in Windows Azure, only minimal changes will be required to revert to the old implementation. That's why the test TestManagementAPIDeleteDiskPolls is in deletedisk_test.go file. All we will have to do when the Azure bug is fixed is:
- remove the existing DeleteDisk() method and rename _DeleteDisk() into DeleteDisk()
- delete the files deletedisk.go and deletedisk_test.go.
- remove the tweak to deleteDiskInterval in managementapi_test.go (the variable deleteDiskInterval is in deletedisk.go so once that file is removed, the need to remove its usage in managementapi_test.go will become clear (i.e. managementapi_test.go won't compile))

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good :)

[1]

+func isInUseError(errString string, diskName string) bool {
+    pattern := fmt.Sprintf("BadRequest - A disk with name %s is currently in use by virtual machine.*", regexp.QuoteMeta(diskName))
+    res, err := regexp.MatchString(pattern, errString)

Do you think it's necessary to check the disk name in the error message?

It might be clearer to check that the response status is 400, and that
the body contains the phrases "A disk with name" and "is currently in
use by virtual machine".

[2]

+    res, err := regexp.MatchString(pattern, errString)
+    if err != nil {
+        panic(fmt.Sprintf("can not compile regular expression: %v", pattern))
+    }

Fwiw, there's regexp.MustCompile for these kinds of situations; saves
having to do the 3-line-dance.

review: Approve
lp:~rvb/gwacl/delete-disk-fix updated
94. By Raphaël Badin

Use regexp.MustCompile.

Revision history for this message
Raphaël Badin (rvb) wrote :

Thanks for the review!

> [1]
>
> +func isInUseError(errString string, diskName string) bool {
> +    pattern := fmt.Sprintf("BadRequest - A disk with name %s is currently in
> use by virtual machine.*", regexp.QuoteMeta(diskName))
> +    res, err := regexp.MatchString(pattern, errString)
>
> Do you think it's necessary to check the disk name in the error message?

I think it's better to keep the disk name in the error message just to be 100% certain that the error is about the disk that we're trying to delete. This might look a bit like "belt and braces" but I'd like to be on the safe side with Azure.

>
> [2]
>
> +    res, err := regexp.MatchString(pattern, errString)
> +    if err != nil {
> +        panic(fmt.Sprintf("can not compile regular expression: %v", pattern))
> +    }
>
> Fwiw, there's regexp.MustCompile for these kinds of situations; saves
> having to do the 3-line-dance.

Good point, done.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file '.bzrignore'
2--- .bzrignore 2013-04-10 10:40:01 +0000
3+++ .bzrignore 2013-04-17 08:28:24 +0000
4@@ -1,2 +1,3 @@
5 ./example/live_example_managementapi
6 ./example/storage_example
7+./tags
8
9=== added file 'deletedisk.go'
10--- deletedisk.go 1970-01-01 00:00:00 +0000
11+++ deletedisk.go 2013-04-17 08:28:24 +0000
12@@ -0,0 +1,64 @@
13+// Copyright 2013 Canonical Ltd. This software is licensed under the
14+// GNU Lesser General Public License version 3 (see the file COPYING).
15+
16+// A poller object used to delete a disk.
17+//
18+// It takes an indeterminate time for a disk previously attached to a
19+// deleted VM to become "not in use" and thus be available for deletion.
20+// When we receive the "disk is still attached" error, we try again every
21+// 10 seconds until it succeeds, with a timeout of 30 minutes).
22+// This bug might be related to:
23+// http://social.msdn.microsoft.com/Forums/en-US/WAVirtualMachinesforWindows/thread/4394c75d-59ff-4634-8212-2ad71bf6fbd5/
24+//
25+// Once this bug is fixed in Windows Azure, this file and the related tests
26+// can safely be removed, and ManagementAPI._DeleteDisk() can replace the
27+// current implementation of ManagementAPI.DeleteDisk() (which uses this
28+// poller).
29+
30+package gwacl
31+
32+import (
33+ "fmt"
34+ "regexp"
35+ "time"
36+)
37+
38+var deleteDiskTimeout = 30 * time.Minute
39+var deleteDiskInterval = 10 * time.Second
40+
41+type diskDeletePoller struct {
42+ api *ManagementAPI
43+ diskName string
44+}
45+
46+var _ poller = &diskDeletePoller{}
47+
48+func (poller diskDeletePoller) poll() (*x509Response, error) {
49+ return nil, poller.api._DeleteDisk(poller.diskName)
50+}
51+
52+// isInUseError returns whether or not the given string is of the "disk in use"
53+// type.
54+// Here is a real-world example of the error in question:
55+// "BadRequest - A disk with name gwacldiske5w7lkj is currently in use
56+// by virtual machine gwaclrolemvo1yab running within hosted service
57+// gwacl623yosxtppsa9577xy5, deployment gwaclmachinewes4n64f. (http
58+// code 400: Bad Request)"
59+func isInUseError(errString string, diskName string) bool {
60+ pattern := fmt.Sprintf("BadRequest - A disk with name %s is currently in use by virtual machine.*", regexp.QuoteMeta(diskName))
61+ reg := regexp.MustCompile(pattern)
62+ return reg.MatchString(errString)
63+}
64+
65+func (poller diskDeletePoller) isDone(response *x509Response, pollerErr error) (bool, error) {
66+ if pollerErr == nil {
67+ return true, nil
68+ }
69+ if isInUseError(pollerErr.Error(), poller.diskName) {
70+ // The error is of the "disk in use" type: continue polling.
71+ return false, nil
72+ }
73+ // The error is *not* of the "disk in use" type: stop polling and return
74+ // the error.
75+ return true, pollerErr
76+}
77
78=== added file 'deletedisk_test.go'
79--- deletedisk_test.go 1970-01-01 00:00:00 +0000
80+++ deletedisk_test.go 2013-04-17 08:28:24 +0000
81@@ -0,0 +1,103 @@
82+// Copyright 2013 Canonical Ltd. This software is licensed under the
83+// GNU Lesser General Public License version 3 (see the file COPYING).
84+
85+package gwacl
86+
87+import (
88+ "fmt"
89+ . "launchpad.net/gocheck"
90+ "net/http"
91+ "time"
92+)
93+
94+type deleteDiskSuite struct{}
95+
96+var _ = Suite(&deleteDiskSuite{})
97+
98+// Real-world error messages and names.
99+const (
100+ diskInUseErrorTemplate = "BadRequest - A disk with name %s is currently in use by virtual machine gwaclrolemvo1yab running within hosted service gwacl623yosxtppsa9577xy5, deployment gwaclmachinewes4n64f. (http code 400: Bad Request)"
101+ diskName = "gwacldiske5w7lkj"
102+ diskDoesNotExistError = "DELETE request failed: ResourceNotFound - The disk with the specified name does not exist. (http code 404: Not Found)"
103+)
104+
105+func (suite *deleteDiskSuite) TestIsInUseError(c *C) {
106+ var testValues = []struct {
107+ errorString string
108+ diskName string
109+ expectedResult bool
110+ }{
111+ {fmt.Sprintf(diskInUseErrorTemplate, diskName), diskName, true},
112+ {fmt.Sprintf(diskInUseErrorTemplate, diskName), "another-disk", false},
113+ {"unknown error", diskName, false},
114+ {diskDoesNotExistError, diskName, false},
115+ }
116+ for _, test := range testValues {
117+ c.Check(isInUseError(test.errorString, test.diskName), Equals, test.expectedResult)
118+ }
119+}
120+
121+func (suite *deleteDiskSuite) TestIsDoneReturnsTrueIfNilError(c *C) {
122+ poller := diskDeletePoller{nil, ""}
123+ randomResponse := x509Response{StatusCode: http.StatusAccepted}
124+ done, err := poller.isDone(&randomResponse, nil)
125+ c.Check(done, Equals, true)
126+ c.Check(err, IsNil)
127+}
128+
129+func (suite *deleteDiskSuite) TestIsDoneReturnsFalseIfDiskInUseError(c *C) {
130+ diskName := "gwacldiske5w7lkj"
131+ diskInUseError := fmt.Errorf(diskInUseErrorTemplate, diskName)
132+ poller := diskDeletePoller{nil, diskName}
133+ done, err := poller.isDone(nil, diskInUseError)
134+ c.Check(done, Equals, false)
135+ c.Check(err, IsNil)
136+}
137+
138+func (suite *deleteDiskSuite) TestIsDoneReturnsTrueIfAnotherError(c *C) {
139+ anotherError := fmt.Errorf("Unknown error")
140+ poller := diskDeletePoller{nil, "disk-name"}
141+ done, err := poller.isDone(nil, anotherError)
142+ c.Check(done, Equals, true)
143+ c.Check(err, Equals, anotherError)
144+}
145+
146+func (suite *deleteDiskSuite) TestPollCallsDeleteDisk(c *C) {
147+ api := makeAPI(c)
148+ recordedRequests := setUpDispatcher("operationID")
149+ diskName := "gwacldiske5w7lkj"
150+ poller := diskDeletePoller{api, diskName}
151+
152+ response, err := poller.poll()
153+
154+ c.Assert(response, IsNil)
155+ c.Assert(err, IsNil)
156+ expectedURL := api.session.composeURL("services/disks/" + diskName)
157+ checkOneRequest(c, recordedRequests, expectedURL, []byte{}, "DELETE")
158+}
159+
160+func (suite *deleteDiskSuite) TestManagementAPIDeleteDiskPolls(c *C) {
161+ firstResponse := DispatcherResponse{
162+ response: &x509Response{},
163+ errorObject: fmt.Errorf(diskInUseErrorTemplate, diskName)}
164+ secondResponse := DispatcherResponse{
165+ response: &x509Response{StatusCode: http.StatusOK},
166+ errorObject: nil}
167+ responses := []DispatcherResponse{firstResponse, secondResponse}
168+ rigPreparedResponseDispatcher(responses)
169+ recordedRequests := make([]*x509Request, 0)
170+ rigRecordingDispatcher(&recordedRequests)
171+
172+ api := makeAPI(c)
173+ diskName := "gwacldiske5w7lkj"
174+ poller := diskDeletePoller{api, diskName}
175+
176+ response, err := performPolling(poller, time.Nanosecond, time.Minute)
177+
178+ c.Assert(response, IsNil)
179+ c.Assert(err, IsNil)
180+ expectedURL := api.session.composeURL("services/disks/" + diskName)
181+ c.Check(len(recordedRequests), Equals, 2)
182+ checkRequest(c, recordedRequests[0], expectedURL, []byte{}, "DELETE")
183+ checkRequest(c, recordedRequests[1], expectedURL, []byte{}, "DELETE")
184+}
185
186=== modified file 'example/live_example_managementapi.go'
187--- example/live_example_managementapi.go 2013-04-02 03:44:55 +0000
188+++ example/live_example_managementapi.go 2013-04-17 08:28:24 +0000
189@@ -147,10 +147,6 @@
190 checkError(err)
191 fmt.Println("Done deleting deployment\n")
192
193- // TODO: it takes some time for a disk previously attached to a deleted VM
194- // to become "not in use" and thus be available for deletion.
195- time.Sleep(3 * time.Minute)
196-
197 fmt.Println("Deleting VM disk...")
198 err = api.DeleteDisk(diskName)
199 checkError(err)
200
201=== modified file 'managementapi.go'
202--- managementapi.go 2013-04-10 15:14:32 +0000
203+++ managementapi.go 2013-04-17 08:28:24 +0000
204@@ -225,6 +225,14 @@
205 }
206
207 func (api *ManagementAPI) DeleteDisk(diskName string) error {
208+ // Use the disk deletion poller to work around a bug in Windows Azure.
209+ // See the documentation in the file deletedisk.go for details.
210+ poller := diskDeletePoller{api, diskName}
211+ _, err := performPolling(poller, deleteDiskInterval, deleteDiskTimeout)
212+ return err
213+}
214+
215+func (api *ManagementAPI) _DeleteDisk(diskName string) error {
216 response, err := api.session.delete("services/disks/" + diskName)
217 if err != nil {
218 return err
219
220=== modified file 'managementapi_test.go'
221--- managementapi_test.go 2013-04-03 11:20:20 +0000
222+++ managementapi_test.go 2013-04-17 08:28:24 +0000
223@@ -507,6 +507,11 @@
224 }
225
226 func (suite *managementAPISuite) TestDeleteDisk(c *C) {
227+ // The current implementation of DeleteDisk() works around a bug in
228+ // Windows Azure by polling the server. See the documentation in the file
229+ // deletedisk.go for details.
230+ // Change the polling interval to speed up the tests:
231+ deleteDiskInterval = time.Nanosecond
232 api := makeAPI(c)
233 recordedRequests := setUpDispatcher("operationID")
234 diskName := "diskName"
235
236=== modified file 'poller.go'
237--- poller.go 2013-04-03 11:20:20 +0000
238+++ poller.go 2013-04-17 08:28:24 +0000
239@@ -14,7 +14,7 @@
240 // the response given by the server means that the polling is finished.
241 type poller interface {
242 poll() (*x509Response, error)
243- isDone(*x509Response) (bool, error)
244+ isDone(*x509Response, error) (bool, error)
245 }
246
247 // performPolling calls the poll() method of the given 'poller' object every
248@@ -29,11 +29,8 @@
249 case <-timeoutChannel:
250 return nil, fmt.Errorf("polling timed out waiting for an asynchronous operation")
251 default:
252- response, err := poller.poll()
253- if err != nil {
254- return nil, err
255- }
256- done, err := poller.isDone(response)
257+ response, pollerErr := poller.poll()
258+ done, err := poller.isDone(response, pollerErr)
259 if err != nil {
260 return nil, err
261 }
262@@ -89,9 +86,12 @@
263 // IsDone returns true if the given response has a status code indicating
264 // success and if the returned XML response corresponds to a valid Operation
265 // with a status indicating that the operation is completed.
266-func (poller operationPoller) isDone(response *x509Response) (bool, error) {
267+func (poller operationPoller) isDone(response *x509Response, pollerError error) (bool, error) {
268 // TODO: Add a timeout so that polling won't continue forever if the
269 // server cannot be reached.
270+ if pollerError != nil {
271+ return true, pollerError
272+ }
273 if response.StatusCode >= 200 && response.StatusCode < 300 {
274 operation := Operation{}
275 err := operation.Deserialize(response.Body)
276
277=== modified file 'poller_test.go'
278--- poller_test.go 2013-04-03 11:20:20 +0000
279+++ poller_test.go 2013-04-17 08:28:24 +0000
280@@ -51,7 +51,10 @@
281 return &testPollerResponse, nil
282 }
283
284-func (poller *testPoller) isDone(response *x509Response) (bool, error) {
285+func (poller *testPoller) isDone(response *x509Response, pollerError error) (bool, error) {
286+ if pollerError != nil {
287+ return true, pollerError
288+ }
289 if poller.notDoneCalls == 0 {
290 return true, nil
291 }
292@@ -128,7 +131,7 @@
293 StatusCode: http.StatusOK,
294 }
295
296- isDone, err := poller.isDone(&response)
297+ isDone, err := poller.isDone(&response, nil)
298 c.Assert(err, IsNil)
299 c.Assert(isDone, Equals, true)
300 }
301@@ -148,7 +151,7 @@
302 {StatusCode: http.StatusInternalServerError},
303 }
304 for _, response := range notDoneResponses {
305- isDone, _ := poller.isDone(&response)
306+ isDone, _ := poller.isDone(&response, nil)
307 c.Assert(isDone, Equals, false)
308 }
309 }
310@@ -160,7 +163,7 @@
311 Body: []byte("><invalid XML"),
312 StatusCode: http.StatusOK,
313 }
314- _, err := poller.isDone(&response)
315+ _, err := poller.isDone(&response, nil)
316 c.Assert(err, NotNil)
317 c.Check(err, FitsTypeOf, new(xml.SyntaxError))
318 }

Subscribers

People subscribed via source and target branches

to all changes: