Merge lp:~jameinel/juju-core/read-before-download into lp:~go-bot/juju-core/trunk

Proposed by John A Meinel
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 2011
Proposed branch: lp:~jameinel/juju-core/read-before-download
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 62 lines (+26/-2)
3 files modified
worker/upgrader/export_test.go (+8/-0)
worker/upgrader/upgrader.go (+6/-2)
worker/upgrader/upgrader_test.go (+12/-0)
To merge this branch: bzr merge lp:~jameinel/juju-core/read-before-download
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+193008@code.launchpad.net

Commit message

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

Description of the change

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.5 KiB)

Reviewers: mp+193008_code.launchpad.net,

Message:
Please take a look.

Description:
worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://code.launchpad.net/~jameinel/juju-core/read-before-download/+merge/193008

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/19000043/

Affected files (+28, -2 lines):
   A [revision details]
   M worker/upgrader/export_test.go
   M worker/upgrader/upgrader.go
   M worker/upgrader/upgrader_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20131026004354-fc2et5acdg4ntbe4
+New revision: <email address hidden>

Index: worker/upgrader/export_test.go
=== modified file 'worker/upgrader/export_test.go'
--- worker/upgrader/export_test.go 2013-07-29 09:06:43 +0000
+++ worker/upgrader/export_test.go 2013-10-29 06:08:08 +0000
@@ -3,4 +3,12 @@

  package upgrader

+import (
+ "launchpad.net/juju-core/tools"
+)
+
  var RetryAfter = &retryAfter
+
+func EnsureTools(u *Upgrader, agentTools *tools.Tools,
disableSSLHostnameVerification bool) error {
+ return u.ensureTools(agentTools, disableSSLHostnameVerification)
+}

Index: worker/upgrader/upgrader.go
=== modified file 'worker/upgrader/upgrader.go'
--- worker/upgrader/upgrader.go 2013-10-17 22:48:19 +0000
+++ worker/upgrader/upgrader.go 2013-10-29 06:05:03 +0000
@@ -152,7 +152,7 @@
     // repeatedly (causing the agent to be stopped), as long
     // as we have got as far as this, we will still be able to
     // upgrade the agent.
- err := u.fetchTools(wantTools, disableSSLHostnameVerification)
+ err := u.ensureTools(wantTools, disableSSLHostnameVerification)
     if err == nil {
      return &UpgradeReadyError{
       OldTools: currentTools,
@@ -167,13 +167,17 @@
   }
  }

-func (u *Upgrader) fetchTools(agentTools *coretools.Tools,
disableSSLHostnameVerification bool) error {
+func (u *Upgrader) ensureTools(agentTools *coretools.Tools,
disableSSLHostnameVerif...

Read more...

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

LGTM. At first I thought this fix is not really beneficial except for a
very narrow edge case (hulk-smashing multiple unit agents on the same
machine), but after a discussion with William, I realized we actually
have this more often than not (i.e. 2-3 agents running on the same
machine - a machine and a unit agent, possibly even a subordinate unit
agent).

https://codereview.appspot.com/19000043/

Revision history for this message
Roger Peppe (rogpeppe) wrote :

LGTM with one small thought

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
This could probably be at the start of this function (the "fetching" log
message might be confusing when we're not actually fetching anything)

https://codereview.appspot.com/19000043/

Revision history for this message
John A Meinel (jameinel) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

On 2013/10/30 13:46:03, dimitern wrote:
> LGTM. At first I thought this fix is not really beneficial except for
a very
> narrow edge case (hulk-smashing multiple unit agents on the same
machine), but
> after a discussion with William, I realized we actually have this more
often
> than not (i.e. 2-3 agents running on the same machine - a machine and
a unit
> agent, possibly even a subordinate unit agent).

Right. We also have a use case where we would stage upgrades by asking
the Machine agent to upgrade first, and then the units, etc. Which would
save some bandwidth downloading the tools multiple times for each
machine. But it doesn't actually help if you always download before
checking if the results are going to be used :).

https://codereview.appspot.com/19000043/

Revision history for this message
John A Meinel (jameinel) wrote :

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
On 2013/10/30 18:03:30, rog wrote:
> This could probably be at the start of this function (the "fetching"
log message
> might be confusing when we're not actually fetching anything)

Done.

https://codereview.appspot.com/19000043/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'worker/upgrader/export_test.go'
--- worker/upgrader/export_test.go 2013-07-29 09:06:43 +0000
+++ worker/upgrader/export_test.go 2013-10-31 05:25:34 +0000
@@ -3,4 +3,12 @@
33
4package upgrader4package upgrader
55
6import (
7 "launchpad.net/juju-core/tools"
8)
9
6var RetryAfter = &retryAfter10var RetryAfter = &retryAfter
11
12func EnsureTools(u *Upgrader, agentTools *tools.Tools, disableSSLHostnameVerification bool) error {
13 return u.ensureTools(agentTools, disableSSLHostnameVerification)
14}
715
=== modified file 'worker/upgrader/upgrader.go'
--- worker/upgrader/upgrader.go 2013-10-17 22:48:19 +0000
+++ worker/upgrader/upgrader.go 2013-10-31 05:25:34 +0000
@@ -152,7 +152,7 @@
152 // repeatedly (causing the agent to be stopped), as long152 // repeatedly (causing the agent to be stopped), as long
153 // as we have got as far as this, we will still be able to153 // as we have got as far as this, we will still be able to
154 // upgrade the agent.154 // upgrade the agent.
155 err := u.fetchTools(wantTools, disableSSLHostnameVerification)155 err := u.ensureTools(wantTools, disableSSLHostnameVerification)
156 if err == nil {156 if err == nil {
157 return &UpgradeReadyError{157 return &UpgradeReadyError{
158 OldTools: currentTools,158 OldTools: currentTools,
@@ -167,7 +167,11 @@
167 }167 }
168}168}
169169
170func (u *Upgrader) fetchTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {170func (u *Upgrader) ensureTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {
171 if _, err := agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
172 // Tools have already been downloaded
173 return nil
174 }
171 client := http.DefaultClient175 client := http.DefaultClient
172 logger.Infof("fetching tools from %q", agentTools.URL)176 logger.Infof("fetching tools from %q", agentTools.URL)
173 if disableSSLHostnameVerification {177 if disableSSLHostnameVerification {
174178
=== modified file 'worker/upgrader/upgrader_test.go'
--- worker/upgrader/upgrader_test.go 2013-10-24 20:58:30 +0000
+++ worker/upgrader/upgrader_test.go 2013-10-31 05:25:34 +0000
@@ -216,3 +216,15 @@
216 c.Assert(err, gc.IsNil)216 c.Assert(err, gc.IsNil)
217 c.Assert(link, gc.Equals, newTools.Version.String())217 c.Assert(link, gc.Equals, newTools.Version.String())
218}218}
219
220func (s *UpgraderSuite) TestEnsureToolsChecksBeforeDownloading(c *gc.C) {
221 stor := s.Conn.Environ.Storage()
222 newTools := envtesting.PrimeTools(c, stor, s.DataDir(), version.MustParseBinary("5.4.3-precise-amd64"))
223 // We've already downloaded the tools, so change the URL to be
224 // something invalid and ensure we don't actually get an error, because
225 // it doesn't actually do an HTTP request
226 u := s.makeUpgrader()
227 newTools.URL = "http://localhost:999999/invalid/path/tools.tgz"
228 err := upgrader.EnsureTools(u, newTools, true)
229 c.Assert(err, gc.IsNil)
230}

Subscribers

People subscribed via source and target branches

to status/vote changes: