Merge lp:~jameinel/juju-core/read-before-download into lp:~go-bot/juju-core/trunk

Proposed by John A Meinel
Status: Merged
Approved by: John A Meinel
Approved revision: no longer in the source branch.
Merged at revision: 2011
Proposed branch: lp:~jameinel/juju-core/read-before-download
Merge into: lp:~go-bot/juju-core/trunk
Diff against target: 62 lines (+26/-2)
3 files modified
worker/upgrader/export_test.go (+8/-0)
worker/upgrader/upgrader.go (+6/-2)
worker/upgrader/upgrader_test.go (+12/-0)
To merge this branch: bzr merge lp:~jameinel/juju-core/read-before-download
Reviewer Review Type Date Requested Status
Juju Engineering Pending
Review via email: mp+193008@code.launchpad.net

Commit message

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

Description of the change

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.5 KiB)

Reviewers: mp+193008_code.launchpad.net,

Message:
Please take a look.

Description:
worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://code.launchpad.net/~jameinel/juju-core/read-before-download/+merge/193008

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/19000043/

Affected files (+28, -2 lines):
   A [revision details]
   M worker/upgrader/export_test.go
   M worker/upgrader/upgrader.go
   M worker/upgrader/upgrader_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20131026004354-fc2et5acdg4ntbe4
+New revision: <email address hidden>

Index: worker/upgrader/export_test.go
=== modified file 'worker/upgrader/export_test.go'
--- worker/upgrader/export_test.go 2013-07-29 09:06:43 +0000
+++ worker/upgrader/export_test.go 2013-10-29 06:08:08 +0000
@@ -3,4 +3,12 @@

  package upgrader

+import (
+ "launchpad.net/juju-core/tools"
+)
+
  var RetryAfter = &retryAfter
+
+func EnsureTools(u *Upgrader, agentTools *tools.Tools,
disableSSLHostnameVerification bool) error {
+ return u.ensureTools(agentTools, disableSSLHostnameVerification)
+}

Index: worker/upgrader/upgrader.go
=== modified file 'worker/upgrader/upgrader.go'
--- worker/upgrader/upgrader.go 2013-10-17 22:48:19 +0000
+++ worker/upgrader/upgrader.go 2013-10-29 06:05:03 +0000
@@ -152,7 +152,7 @@
     // repeatedly (causing the agent to be stopped), as long
     // as we have got as far as this, we will still be able to
     // upgrade the agent.
- err := u.fetchTools(wantTools, disableSSLHostnameVerification)
+ err := u.ensureTools(wantTools, disableSSLHostnameVerification)
     if err == nil {
      return &UpgradeReadyError{
       OldTools: currentTools,
@@ -167,13 +167,17 @@
   }
  }

-func (u *Upgrader) fetchTools(agentTools *coretools.Tools,
disableSSLHostnameVerification bool) error {
+func (u *Upgrader) ensureTools(agentTools *coretools.Tools,
disableSSLHostnameVerif...

Read more...

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

LGTM. At first I thought this fix is not really beneficial except for a
very narrow edge case (hulk-smashing multiple unit agents on the same
machine), but after a discussion with William, I realized we actually
have this more often than not (i.e. 2-3 agents running on the same
machine - a machine and a unit agent, possibly even a subordinate unit
agent).

https://codereview.appspot.com/19000043/

Revision history for this message
Roger Peppe (rogpeppe) wrote :

LGTM with one small thought

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
This could probably be at the start of this function (the "fetching" log
message might be confusing when we're not actually fetching anything)

https://codereview.appspot.com/19000043/

Revision history for this message
John A Meinel (jameinel) wrote :
Revision history for this message
John A Meinel (jameinel) wrote :

On 2013/10/30 13:46:03, dimitern wrote:
> LGTM. At first I thought this fix is not really beneficial except for
a very
> narrow edge case (hulk-smashing multiple unit agents on the same
machine), but
> after a discussion with William, I realized we actually have this more
often
> than not (i.e. 2-3 agents running on the same machine - a machine and
a unit
> agent, possibly even a subordinate unit agent).

Right. We also have a use case where we would stage upgrades by asking
the Machine agent to upgrade first, and then the units, etc. Which would
save some bandwidth downloading the tools multiple times for each
machine. But it doesn't actually help if you always download before
checking if the results are going to be used :).

https://codereview.appspot.com/19000043/

Revision history for this message
John A Meinel (jameinel) wrote :

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
On 2013/10/30 18:03:30, rog wrote:
> This could probably be at the start of this function (the "fetching"
log message
> might be confusing when we're not actually fetching anything)

Done.

https://codereview.appspot.com/19000043/

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'worker/upgrader/export_test.go'
2--- worker/upgrader/export_test.go 2013-07-29 09:06:43 +0000
3+++ worker/upgrader/export_test.go 2013-10-31 05:25:34 +0000
4@@ -3,4 +3,12 @@
5
6 package upgrader
7
8+import (
9+ "launchpad.net/juju-core/tools"
10+)
11+
12 var RetryAfter = &retryAfter
13+
14+func EnsureTools(u *Upgrader, agentTools *tools.Tools, disableSSLHostnameVerification bool) error {
15+ return u.ensureTools(agentTools, disableSSLHostnameVerification)
16+}
17
18=== modified file 'worker/upgrader/upgrader.go'
19--- worker/upgrader/upgrader.go 2013-10-17 22:48:19 +0000
20+++ worker/upgrader/upgrader.go 2013-10-31 05:25:34 +0000
21@@ -152,7 +152,7 @@
22 // repeatedly (causing the agent to be stopped), as long
23 // as we have got as far as this, we will still be able to
24 // upgrade the agent.
25- err := u.fetchTools(wantTools, disableSSLHostnameVerification)
26+ err := u.ensureTools(wantTools, disableSSLHostnameVerification)
27 if err == nil {
28 return &UpgradeReadyError{
29 OldTools: currentTools,
30@@ -167,7 +167,11 @@
31 }
32 }
33
34-func (u *Upgrader) fetchTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {
35+func (u *Upgrader) ensureTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {
36+ if _, err := agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
37+ // Tools have already been downloaded
38+ return nil
39+ }
40 client := http.DefaultClient
41 logger.Infof("fetching tools from %q", agentTools.URL)
42 if disableSSLHostnameVerification {
43
44=== modified file 'worker/upgrader/upgrader_test.go'
45--- worker/upgrader/upgrader_test.go 2013-10-24 20:58:30 +0000
46+++ worker/upgrader/upgrader_test.go 2013-10-31 05:25:34 +0000
47@@ -216,3 +216,15 @@
48 c.Assert(err, gc.IsNil)
49 c.Assert(link, gc.Equals, newTools.Version.String())
50 }
51+
52+func (s *UpgraderSuite) TestEnsureToolsChecksBeforeDownloading(c *gc.C) {
53+ stor := s.Conn.Environ.Storage()
54+ newTools := envtesting.PrimeTools(c, stor, s.DataDir(), version.MustParseBinary("5.4.3-precise-amd64"))
55+ // We've already downloaded the tools, so change the URL to be
56+ // something invalid and ensure we don't actually get an error, because
57+ // it doesn't actually do an HTTP request
58+ u := s.makeUpgrader()
59+ newTools.URL = "http://localhost:999999/invalid/path/tools.tgz"
60+ err := upgrader.EnsureTools(u, newTools, true)
61+ c.Assert(err, gc.IsNil)
62+}

Subscribers

People subscribed via source and target branches

to status/vote changes: