Merge lp:~julian-edwards/gwacl/removeall into lp:gwacl

Proposed by Julian Edwards
Status: Merged
Approved by: Julian Edwards
Approved revision: 152
Merged at revision: 150
Proposed branch: lp:~julian-edwards/gwacl/removeall
Merge into: lp:gwacl
Diff against target: 157 lines (+120/-1)
3 files modified
Makefile (+6/-1)
storage.go (+25/-0)
storage_test.go (+89/-0)
To merge this branch: bzr merge lp:~julian-edwards/gwacl/removeall
Reviewer: Gavin Panella (Approve)
Review via email: mp+172276@code.launchpad.net

Commit message

New storage function DeleteAllBlobs to delete everything in a container.

Description of the change

I got fed up with looking up the right incantations and elf runes, so I also added a new Makefile target to generate debug builds.

Revision history for this message
Raphaël Badin (rvb) wrote :

+    err := context.DeleteBlob(request.Container, blob.Name)
+    if err != nil {
+        return err
+    }

Don't we want to continue trying to remove the other blobs even if one deletion fails? I mean, shouldn't we continue iterating over the blobs, try to delete them all, and then, at the end of the loop, return the first error encountered (if any)?
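
Something like this rough sketch is what I have in mind (illustrative only; it reuses the branch's own helpers, and the function name is made up):

    // Illustrative variant: attempt every blob, return the first error at the end.
    func (context *StorageContext) deleteAllBlobsBestEffort(request *DeleteAllBlobsRequest) error {
        blobs, err := context.ListAllBlobs(&ListBlobsRequest{
            Container: request.Container})
        if err != nil {
            return err
        }
        var firstErr error
        for _, blob := range blobs.Blobs {
            // Keep going on failure; remember only the first error.
            err := context.DeleteBlob(request.Container, blob.Name)
            if err != nil && firstErr == nil {
                firstErr = err
            }
        }
        return firstErr
    }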

Revision history for this message
Gavin Panella (allenap) wrote :

> Don't we want to continue trying to remove the other blobs even if
> one deletion fails? I mean, shouldn't we continue iterating over the
> blobs, try to delete them all, and then, at the end of the loop,
> return the first error encountered (if any)?

I think we should keep it simple.

Three reasons:

- Continuing in spite of an error has a bad failure mode. If there
  are, say, 10k blobs and none of them can be deleted - because of a
  lease or an internal server error, for example - a long time would
  be spent doing useless work before giving the user the chance to
  retry or do something else.

- One common reason to call DeleteAllBlobs is so that you can delete
  the container straight afterwards. If even one blob fails to delete
  it's the same as all the blobs not deleting.

- The code is simpler.

Revision history for this message
Raphaël Badin (rvb) wrote :

> - Continuing in spite of an error has a bad failure mode. If there
> are, say, 10k blobs and none of them can be deleted - because of a
> lease or an internal server error, for example - a long time would
> be spent doing useless work before giving the user the chance to
> retry or do something else.

True, that's why I think we should use goroutines to remove the blobs in parallel rather than in sequence (same as what we've done in the MAAS provider in juju-core http://bazaar.launchpad.net/~go-bot/juju-core/trunk/view/head:/environs/maas/storage.go#L1).
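
Roughly this, as a sketch only (it assumes the Blob type that ListAllBlobs returns; the function name is made up):

    // Sketch: delete the blobs concurrently, report the first failure.
    func deleteBlobsParallel(context *StorageContext, container string, blobs []Blob) error {
        errc := make(chan error, len(blobs))
        for _, blob := range blobs {
            // Pass the name as a parameter to avoid capturing the loop variable.
            go func(name string) {
                errc <- context.DeleteBlob(container, name)
            }(blob.Name)
        }
        var firstErr error
        for i := 0; i < len(blobs); i++ {
            if err := <-errc; err != nil && firstErr == nil {
                firstErr = err
            }
        }
        return firstErr
    }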

> - One common reason to call DeleteAllBlobs is so that you can delete
> the container straight afterwards. If even one blob fails to delete
> it's the same as all the blobs not deleting.

Well, that's definitely not the case when using DeleteAllBlobs() from the provider when destroying an environment (which is DeleteAllBlobs's raison d'être). The goal here is to be able to clean up an existing container so that it can be re-used to re-deploy a new environment.

>
> - The code is simpler.

That one is undisputed :)

Revision history for this message
Gavin Panella (allenap) wrote :

>> - Continuing in spite of an error has a bad failure mode. If there
>> are, say, 10k blobs and none of them can be deleted - because of a
>> lease or an internal server error, for example - a long time would
>> be spent doing useless work before giving the user the chance to
>> retry or do something else.
>
> True, that's why I think we should use goroutines to remove the
> blobs in parallel rather than in sequence

In the situation above it just means that more useless work will be
done, but concurrently.

What if the server is busy? It's going to get several concurrent
requests for blob deletion, instead of one at a time, and this may
make it overall *slower*. It may also serialise those requests anyway.

Keep it simple until we can prove it's better. Which we don't really
have time for now.

> (same as what we've done in the MAAS provider in juju-core
> http://bazaar.launchpad.net/~go-bot/juju-core/trunk/view/head:/environs/maas/storage.go#L1).

There's a bug in that code: if stor.Remove() panics, no error will be
returned, but deleteAll will complete and return nil.

Istm that Go, by having two competing paradigms for signalling errors,
makes concurrency harder.
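
If we did go concurrent, each worker would need something like this sketch (fmt assumed imported; deleteBlob stands in for the real call) to guarantee exactly one message on the channel even when the call panics:

    // Sketch: turn a panic in a worker into an error on the channel, so the
    // collecting loop always receives exactly one result per worker.
    func deleteOne(deleteBlob func(string) error, name string, errc chan<- error) {
        defer func() {
            if r := recover(); r != nil {
                errc <- fmt.Errorf("deleting %q panicked: %v", name, r)
            }
        }()
        errc <- deleteBlob(name)
    }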

>> - One common reason to call DeleteAllBlobs is so that you can delete
>> the container straight afterwards. If even one blob fails to delete
>> it's the same as all the blobs not deleting.
>
> Well, that's definitely not the case when using DeleteAllBlobs()
> from the provider when destroying an environment (which is
> DeleteAllBlobs's raison d'être). The goal here is to be able to
> clean up an existing container so that it can be re-used to re-deploy
> a new environment.

It may be DeleteAllBlobs's raison d'être for us, but GWACL is meant to
be a general purpose Azure library in Go too, which is why I said
"*One* common reason".

>>
>> - The code is simpler.
>
> That one is undisputed :)

Excellent :) I think this is really important, especially right now. I
don't want to be chasing concurrency bugs before a deadline.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Monday 01 Jul 2013 08:47:22 you wrote:
> > - Continuing in spite of an error has a bad failure mode. If there
> >   are, say, 10k blobs and none of them can be deleted - because of a
> >   lease or an internal server error, for example - a long time would
> >   be spent doing useless work before giving the user the chance to
> >   retry or do something else.

I thought this might come up as I wrote that code. My reasoning is the same as
Gavin's here - most of the time you want this to be atomic, and the error
reporting for partial failures gets tricky.
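
For example, reporting every partial failure would need an aggregate error
type along these lines (hypothetical; assumes the strings package):

    // Hypothetical aggregate error for reporting partial failures.
    type multiError []error

    func (m multiError) Error() string {
        msgs := make([]string, len(m))
        for i, err := range m {
            msgs[i] = err.Error()
        }
        return strings.Join(msgs, "; ")
    }

...and then every caller that wants the individual failures has to know to
type-assert back to multiError.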

> True, that's why I think we should use goroutines to remove the blobs in
> parallel rather than in sequence (same as what we've done in the MAAS
> provider in juju-core
> http://bazaar.launchpad.net/~go-bot/juju-core/trunk/view/head:/environs/maas/storage.go#L1).

Premature optimisation. There is no use case to do that. If and when there
is, we can improve it.

> > - One common reason to call DeleteAllBlobs is so that you can delete
> >   the container straight afterwards. If even one blob fails to delete
> >   it's the same as all the blobs not deleting.
>
> Well, that's definitely not the case when using DeleteAllBlobs() from the
> provider when destroying an environment (which is DeleteAllBlobs's raison
> d'être). The goal here is to be able to clean up an existing container so
> that it can be re-used to re-deploy a new environment.

And if you leave undeleted things behind then it's still not cleaned up,
regardless of how many there are.

> > - The code is simpler.
>
> That one is undisputed :)

And that is the best reason :)

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Monday 01 Jul 2013 09:25:18 you wrote:
> Excellent :) I think this is really important, especially right now. I
> don't want to be chasing concurrency bugs before a deadline.

So if anyone wants to +1 this, that'd be great :)

Revision history for this message
Gavin Panella (allenap) wrote :

Looks good. Some minor points, that's all.

[1]

+debug_check:

Fwiw, make is fine with hyphens, so to save an extra press of Shift
you could make this debug-check.</bikeshed>

[2]

+// RemoveAllBlobs requests a deletion of all the blobs in a container.
+// Azure only marks the blobs for deletion; they are garbage collected later.
+func (context *StorageContext) DeleteAllBlobs(request *DeleteAllBlobsRequest) error {

Fwiw, is the line about garbage collection relevant here? It sounds
like a bit of Azure trivia, an implementation detail that could go
stale. If it is relevant, could you add a comment explaining why?

[3]

+func (s *testDeleteAllBlobs) getListingResponse() *http.Response {

s/get/make/

[4]

+    c.Check(
+        transport.Exchanges[0].Request.URL.String(),
+        Matches, context.getContainerURL("container")+"?.*")

The ? at the end means something in a regex. How about:

+        Matches, context.getContainerURL("container")+"[?].*")

Fwiw, I prefer to use square brackets instead of \ escapes, because I
can't remember if the escape will be interpreted by the Go compiler or
the regex compiler. Square brackets make it unambiguous.
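
A standalone sketch of the difference (hypothetical URLs):

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Unescaped: the '?' makes the preceding 'r' optional, so the
        // pattern also matches a URL with no query string at all.
        fmt.Println(regexp.MustCompile(`^http://host/container?.*$`).
            MatchString("http://host/containe")) // true

        // Inside a character class, '[?]' matches only a literal '?'.
        fmt.Println(regexp.MustCompile(`^http://host/container[?].*$`).
            MatchString("http://host/container?comp=list")) // true
    }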

[5]

+func (s *testDeleteAllBlobs) TestErrorsListing(c *C) {
...
+    err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "c"})
+    c.Assert(err, NotNil)

Check that err is the error you expected, so that this test doesn't
mask future mistakes. Here and in one other place.
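
e.g. something like this (the exact message is illustrative):

+    err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "c"})
+    c.Assert(err, ErrorMatches, `.*Azure request failed \(404: Not Found\)`)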

review: Approve
Revision history for this message
Raphaël Badin (rvb) wrote :

> > Well, that's definitely not the case when using DeleteAllBlobs()
> > from the provider when destroying an environment (which is
> > DeleteAllBlobs's raison d'être). The goal here is to be able to
> > clean up an existing container so that it can be re-used to re-deploy
> > a new environment.
>
> It may be DeleteAllBlobs's raison d'être for us, but GWACL is meant to
> be a general purpose Azure library in Go too, which is why I said
> "*One* common reason".

Trying to remove the other blobs if the deletion of one blob fails is not more generic, it's just a different behavior. It turns out it's closer to the behavior we need for the provider, and thus it should be preferred IMHO. We're not implementing random methods just for the fun of it :).

My main point is this: Azure can be quite unreliable. A container typically contains a bunch of files: the state of the whole environment, and then one file (with a name that contains a random part) per deployed node (plus probably a bunch of other things). Destroying an environment without removing the file that contains the state of the env would be catastrophic, but if we fail to remove one of the files used to store the info about one node, it's really not that bad. Hence my suggestion: always try to remove all the files, because we really don't want to leave the "state file" behind.

Revision history for this message
Raphaël Badin (rvb) wrote :

> On Monday 01 Jul 2013 08:47:22 you wrote:
> > > - Continuing in spite of an error has a bad failure mode. If there
> > >   are, say, 10k blobs and none of them can be deleted - because of a
> > >   lease or an internal server error, for example - a long time would
> > >   be spent doing useless work before giving the user the chance to
> > >   retry or do something else.
>
> I thought this might come up as I wrote that code. My reasoning is as Gavin's
> here - most of the time you want this to be atomic, and the error reporting
> for partial failures gets tricky.

I'm still not sure about this, see my other comment.

> > True, that's why I think we should use goroutines to remove the blobs in
> > parallel rather than in sequence (same as what we've done in the MAAS
> > provider in juju-core
> > http://bazaar.launchpad.net/~go-bot/juju-core/trunk/view/head:/environs/maas/storage.go#L1).
>
> Premature optimisation. There is no use case to do that. If and when there
> is, we can improve it.

Very true, it's premature. I just mentioned it because of Gavin's remark.

Revision history for this message
Gavin Panella (allenap) wrote :

On 1 July 2013 11:41, Raphaël Badin <email address hidden> wrote:
> Trying to remove the other blobs if the deletion of one blob fails
> is not more generic, it's just a different behavior. Turns out it's
> closer to the behavior we need for the provider and thus it should
> be preferred IMH. We're not implementing random methods just for
> the fun of it :).

The method is DeleteAllBlobs, not DeleteMostBlobs ;)

>
> My main point is this: Azure can be quite unreliable. A container
> typically contains a bunch of files: the state of the whole
> environment and then one file (with a name that contains a random
> part) per deployed node (plus probably a bunch of other things).
> Destroying an environment without removing the file that contains
> the state of the env would be catastrophic but if we fail to remove
> one of the files used to store the info about one node, it's really
> not that bad. Hence my suggestion: always try to remove all the
> files, because we really don't want to leave the "state file"
> behind.

I don't want heuristics in the provider, and especially not GWACL,
unless we absolutely must have them. If we can't delete all the files
that we want to delete, that's an error and we stop there.

We might be happy to be relaxed about success or failure for some
operations at the moment. We're in the midst of understanding Azure.
We're learning about things like the hostname length issue. Remember,
that looked like a reliability issue until you figured it out.

As we progress we will tighten things up and be less forgiving. We may
have to make exceptions, but I'd like those to be well understood and
narrow, rather than just saying that Azure can be unreliable.

If we do need to say things like "file X must go, but files Y and Z
can stay around" then DeleteAllBlobs is not the right thing to use.

A simple contract like "delete all the blobs, stop on the first error"
is not only simpler to code, it's simpler for callers to use.

A contract like "delete all the blobs, continue when there are errors,
but return the first error" or "delete all the blobs, continue when
there are errors, but stop after 5 errors, unless the file is called Y
or Z in which case ignore it" is much more complex to code. It also
adds a cognitive load to the programmer who wants to use it.

In any case, if there's an error from DeleteAllBlobs, what will most
programmers do next? I suspect the answer is: try again, or pass the
error up.
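
For instance, a hypothetical caller:

    // Hypothetical: retry a bounded number of times, then pass the last
    // error up.
    func deleteAllWithRetry(context *StorageContext, container string, attempts int) error {
        var err error
        for i := 0; i < attempts; i++ {
            err = context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: container})
            if err == nil {
                return nil
            }
        }
        return err
    }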

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Monday 01 Jul 2013 10:43:00 you wrote:
> > I thought this might come up as I wrote that code. My reasoning is as
> > Gavin's here - most of the time you want this to be atomic, and the error
> > reporting for partial failures gets tricky.
>
> I'm still not sure about this, see my other comment.

(Gah I didn't mean atomic)

As Gavin astutely pointed out, this method is called DeleteAllBlobs not
DeleteMostBlobs :)

If any of them fail to get deleted then it's an error. The way in which they
get deleted (or not) is an implementation detail.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On 01/07/13 20:38, Gavin Panella wrote:
> Review: Approve
>
> Looks good. Some minor points, that's all.
>
>
> [1]
>
> +debug_check:
>
> Fwiw, make is fine with hyphens, so to save an extra press of Shift
> you could make this debug-check.</bikeshed>

I made it "debug-test" as that better reflects what it's really doing.

>
>
> [2]
>
> +// RemoveAllBlobs requests a deletion of all the blobs in a container.
> +// Azure only marks the blobs for deletion; they are garbage collected later.
> +func (context *StorageContext) DeleteAllBlobs(request *DeleteAllBlobsRequest) error {
>
> Fwiw, is the line about garbage collection relevant here? It sounds
> just like a bit of Azure trivia, implementation detail that could go
> stale. If it is relevant, could you add a comment explaining why?

It's pertinent to setting user expectations, really: the blobs won't be
in a deleted state when the call returns. I'll try to reword the comment
to remove the trivia and state the results of the call.

>
>
> [3]
>
> +func (s *testDeleteAllBlobs) getListingResponse() *http.Response {
>
> s/get/make/

Fixed.

>
>
> [4]
>
> +    c.Check(
> +        transport.Exchanges[0].Request.URL.String(),
> +        Matches, context.getContainerURL("container")+"?.*")
>
> The ? at the end means something in a regex. How about:
>
> +        Matches, context.getContainerURL("container")+"[?].*")
>
> Fwiw, I prefer to use square brackets instead of \ escapes, because I
> can't remember if the escape will be interpreted by the Go compiler or
> the regex compiler. Square brackets makes it unambiguous.

I think I copied this from one of your tests :)

BTW we desperately need to fix the name of TestTransport2; it's
proliferating and I cringe every time I use it. I've added a card on
the board.

>
>
> [5]
>
> +func (s *testDeleteAllBlobs) TestErrorsListing(c *C) {
> ...
> + err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "c"})
> + c.Assert(err, NotNil)
>
> Check that err is the error you expected, so that this test doesn't
> mask future mistakes. Here and in one other place.
>

Wilco.

Thanks for the review.

lp:~julian-edwards/gwacl/removeall updated
151. By Julian Edwards

merge trunk

152. By Julian Edwards

tweaks and improvements from Gavin's review

Preview Diff

=== modified file 'Makefile'
--- Makefile	2013-06-21 04:01:58 +0000
+++ Makefile	2013-07-02 02:31:26 +0000
@@ -2,6 +2,11 @@
 check: examples
 	go test -gocheck.v=true ./...
 
+debug-test:
+	go test -c -gcflags "-N -l"
+	gdb gwacl.test
+	$(RM) gwacl.test
+
 all_source := $(shell find . -name '*.go' ! -name '*_test.go')
 
 example_source := $(wildcard example/*/run.go)
@@ -22,4 +27,4 @@
 %: %.go $(all_source)
 	go build -o $@ $<
 
-.PHONY: check clean format examples
+.PHONY: check clean format examples debug-test

=== modified file 'storage.go'
--- storage.go	2013-06-26 13:42:49 +0000
+++ storage.go	2013-07-02 02:31:26 +0000
@@ -83,6 +83,31 @@
     return batch, nil
 }
 
+type DeleteAllBlobsRequest struct {
+    Container string
+    // Other params possible, add later.
+}
+
+// DeleteAllBlobs requests the deletion of all the blobs in a container.
+// The blobs are not deleted immediately, so when this call returns they
+// may still be present for a while.
+func (context *StorageContext) DeleteAllBlobs(request *DeleteAllBlobsRequest) error {
+    blobs, err := context.ListAllBlobs(&ListBlobsRequest{
+        Container: request.Container})
+    if err != nil {
+        return err
+    }
+
+    for _, blob := range blobs.Blobs {
+        err := context.DeleteBlob(request.Container, blob.Name)
+        if err != nil {
+            return err
+        }
+    }
+
+    return nil
+}
+
 // ListAllContainers requests from the storage service a list of containers
 // in the storage account.
 func (context *StorageContext) ListAllContainers() (*ContainerEnumerationResults, error) {

=== modified file 'storage_test.go'
--- storage_test.go	2013-06-27 11:53:50 +0000
+++ storage_test.go	2013-07-02 02:31:26 +0000
@@ -344,6 +344,95 @@
     c.Check(containers.Containers[1].Name, Equals, lastContainer)
 }
 
+type testDeleteAllBlobs struct{}
+
+var _ = Suite(&testDeleteAllBlobs{})
+
+func (s *testDeleteAllBlobs) makeListingResponse() *http.Response {
+    return &http.Response{
+        Status:     fmt.Sprintf("%d", http.StatusOK),
+        StatusCode: http.StatusOK,
+        Body: makeResponseBody(`
+            <?xml version="1.0" encoding="utf-8"?>
+            <EnumerationResults ContainerName="http://myaccount.blob.core.windows.net/mycontainer">
+              <Prefix>prefix</Prefix>
+              <Marker>marker</Marker>
+              <MaxResults>maxresults</MaxResults>
+              <Delimiter>delimiter</Delimiter>
+              <Blobs>
+                <Blob>
+                  <Name>blob-name</Name>
+                </Blob>
+                <Blob>
+                  <Name>blob-name2</Name>
+                </Blob>
+              </Blobs>
+              <NextMarker />
+            </EnumerationResults>`),
+    }
+}
+
+func (s *testDeleteAllBlobs) TestHappyPath(c *C) {
+    listResponse := s.makeListingResponse()
+    deleteResponse := &http.Response{
+        Status:     fmt.Sprintf("%d", http.StatusAccepted),
+        StatusCode: http.StatusAccepted,
+    }
+
+    transport := &TestTransport2{}
+    transport.AddExchange(listResponse, nil)
+    transport.AddExchange(deleteResponse, nil)
+    transport.AddExchange(deleteResponse, nil)
+    context := makeStorageContext(transport)
+
+    err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "container"})
+    c.Assert(err, IsNil)
+    c.Assert(transport.ExchangeCount, Equals, 3)
+
+    // Check the ListAllBlobs exchange.
+    c.Check(
+        transport.Exchanges[0].Request.URL.String(),
+        Matches, context.getContainerURL("container")+"[?].*")
+    c.Check(transport.Exchanges[0].Request.URL.Query(),
+        DeepEquals, url.Values{
+            "restype": {"container"},
+            "comp":    {"list"},
+        })
+
+    // Check the DeleteBlob exchanges.
+    c.Check(
+        transport.Exchanges[1].Request.URL.String(),
+        Equals, context.GetFileURL("container", "blob-name"))
+    c.Check(transport.Exchanges[1].Request.Method, Equals, "DELETE")
+
+    c.Check(
+        transport.Exchanges[2].Request.URL.String(),
+        Equals, context.GetFileURL("container", "blob-name2"))
+    c.Check(transport.Exchanges[2].Request.Method, Equals, "DELETE")
+}
+
+func (s *testDeleteAllBlobs) TestErrorsListing(c *C) {
+    transport := &TestTransport2{}
+    transport.AddExchange(&http.Response{
+        Status:     fmt.Sprintf("%d", http.StatusNotFound),
+        StatusCode: http.StatusNotFound}, nil)
+    context := makeStorageContext(transport)
+    err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "c"})
+    c.Assert(err, ErrorMatches, `request for blobs list failed: Azure request failed \(404: Not Found\)`)
+}
+
+func (s *testDeleteAllBlobs) TestErrorsDeleting(c *C) {
+    transport := &TestTransport2{}
+    listResponse := s.makeListingResponse()
+    transport.AddExchange(listResponse, nil)
+    transport.AddExchange(&http.Response{
+        Status:     fmt.Sprintf("%d", http.StatusBadRequest),
+        StatusCode: http.StatusBadRequest}, nil)
+    context := makeStorageContext(transport)
+    err := context.DeleteAllBlobs(&DeleteAllBlobsRequest{Container: "c"})
+    c.Assert(err, ErrorMatches, `failed to delete blob blob-name: Azure request failed \(400: Bad Request\)`)
+}
+
 type testCreateInstanceDataVHD struct{}
 
 var _ = Suite(&testCreateInstanceDataVHD{})
