Merge into trunk : read-before-download : Code : juju-core

Status:

Merged

Approved by:

John A Meinel on 2013-10-31

Approved revision:

no longer in the source branch.

Merged at revision:

2011

Proposed branch:

lp:~jameinel/juju-core/read-before-download

Merge into:

lp:~go-bot/juju-core/trunk

Diff against target:

62 lines (+26/-2)

3 files modified

worker/upgrader/export_test.go (+8/-0)
worker/upgrader/upgrader.go (+6/-2)
worker/upgrader/upgrader_test.go (+12/-0)

To merge this branch:

bzr merge lp:~jameinel/juju-core/read-before-download

High

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
Juju Engineering		2013-10-29	Pending
Review via email: mp+193008@code.launchpad.net

Commit message

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

Description of the change

worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://codereview.appspot.com/19000043/

Revision history for this message

John A Meinel (jameinel) wrote on 2013-10-29:

#

Download full text (4.5 KiB)

Reviewers: mp+193008_code.launchpad.net,

Message:
Please take a look.

Description:
worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://code.launchpad.net/~jameinel/juju-core/read-before-download/+merge/193008

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/19000043/

Affected files (+28, -2 lines):
   A [revision details]
   M worker/upgrader/export_test.go
   M worker/upgrader/upgrader.go
   M worker/upgrader/upgrader_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details] 2012-01-01 00:00:00 +0000
+++ [revision details] 2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20131026004354-fc2et5acdg4ntbe4
+New revision: <email address hidden>

Index: worker/upgrader/export_test.go
=== modified file 'worker/upgrader/export_test.go'
--- worker/upgrader/export_test.go 2013-07-29 09:06:43 +0000
+++ worker/upgrader/export_test.go 2013-10-29 06:08:08 +0000
@@ -3,4 +3,12 @@

package upgrader

+import (
+ "launchpad.net/juju-core/tools"
+)
+
var RetryAfter = &retryAfter
+
+func EnsureTools(u *Upgrader, agentTools *tools.Tools,
disableSSLHostnameVerification bool) error {
+ return u.ensureTools(agentTools, disableSSLHostnameVerification)
+}

Index: worker/upgrader/upgrader.go
=== modified file 'worker/upgrader/upgrader.go'
--- worker/upgrader/upgrader.go 2013-10-17 22:48:19 +0000
+++ worker/upgrader/upgrader.go 2013-10-29 06:05:03 +0000
@@ -152,7 +152,7 @@
     // repeatedly (causing the agent to be stopped), as long
     // as we have got as far as this, we will still be able to
     // upgrade the agent.
- err := u.fetchTools(wantTools, disableSSLHostnameVerification)
+ err := u.ensureTools(wantTools, disableSSLHostnameVerification)
     if err == nil {
      return &UpgradeReadyError{
       OldTools: currentTools,
@@ -167,13 +167,17 @@
   }
  }

-func (u *Upgrader) fetchTools(agentTools *coretools.Tools,
disableSSLHostnameVerification bool) error {
+func (u *Upgrader) ensureTools(agentTools *coretools.Tools,
disableSSLHostnameVerif...

Reviewers: mp+193008_code.launchpad.net,

Message:
Please take a look.

Description:
worker/upgrader: check for tools before getting

When we moved Upgrader behind the API, we lost one of the checks that
it was doing. Namely, when it saw that an upgrade was requested, it
used to check if we already had that version of the tools available.
This restores that check.

I noticed this while doing scale testing. If you have 1000 units all
trying to upgrade at the same time on a machine, you end up consuming
4.5MB*1000 = 4.5GB of temporary Tar data, plus 20MB*1000=40GB of
'unpacking-' files. Which is more than my VMs have available :).

There is still a race condition, if you have all of the Unit agents
notice at the same time, they will still start trying to download
concurrently. The only thing I could really do there would be to add a
filesystem lock (like uniter-hook-execution). I looked into it, but it
isn't very trivial to set up a fs hook correctly (in uniter.go there
is a lot of code around checking the Tomb to see if we are stopping,
if we held the lock before so we can break it now, etc.) I'd rather
not get the fs locking wrong and end up in a situation where we
deadlock trying to upgrade.

https://code.launchpad.net/~jameinel/juju-core/read-before-download/+merge/193008

(do not edit description out of merge proposal)

Please review this at https://codereview.appspot.com/19000043/

Affected files (+28, -2 lines):
   A [revision details]
   M worker/upgrader/export_test.go
   M worker/upgrader/upgrader.go
   M worker/upgrader/upgrader_test.go

Index: [revision details]
=== added file '[revision details]'
--- [revision details]	2012-01-01 00:00:00 +0000
+++ [revision details]	2012-01-01 00:00:00 +0000
@@ -0,0 +1,2 @@
+Old revision: tarmac-20131026004354-fc2et5acdg4ntbe4
+New revision: john@arbash-meinel.com-20131029060808-lacqx34fi7fu20ie

Index: worker/upgrader/export_test.go
=== modified file 'worker/upgrader/export_test.go'
--- worker/upgrader/export_test.go	2013-07-29 09:06:43 +0000
+++ worker/upgrader/export_test.go	2013-10-29 06:08:08 +0000
@@ -3,4 +3,12 @@

package upgrader

+import (
+	"launchpad.net/juju-core/tools"
+)
+
  var RetryAfter = &retryAfter
+
+func EnsureTools(u *Upgrader, agentTools *tools.Tools,  
disableSSLHostnameVerification bool) error {
+	return u.ensureTools(agentTools, disableSSLHostnameVerification)
+}

Index: worker/upgrader/upgrader.go
=== modified file 'worker/upgrader/upgrader.go'
--- worker/upgrader/upgrader.go	2013-10-17 22:48:19 +0000
+++ worker/upgrader/upgrader.go	2013-10-29 06:05:03 +0000
@@ -152,7 +152,7 @@
  			// repeatedly (causing the agent to be stopped), as long
  			// as we have got as far as this, we will still be able to
  			// upgrade the agent.
-			err := u.fetchTools(wantTools, disableSSLHostnameVerification)
+			err := u.ensureTools(wantTools, disableSSLHostnameVerification)
  			if err == nil {
  				return &UpgradeReadyError{
  					OldTools:  currentTools,
@@ -167,13 +167,17 @@
  	}
  }

-func (u *Upgrader) fetchTools(agentTools *coretools.Tools,  
disableSSLHostnameVerification bool) error {
+func (u *Upgrader) ensureTools(agentTools *coretools.Tools,  
disableSSLHostnameVerification bool) error {
  	client := http.DefaultClient
  	logger.Infof("fetching tools from %q", agentTools.URL)
  	if disableSSLHostnameVerification {
  		logger.Infof("hostname SSL verification disabled")
  		client = utils.GetNonValidatingHTTPClient()
  	}
+	if _, err := agenttools.ReadTools(u.dataDir, agentTools.Version); err ==  
nil {
+		// Tools have already been downloaded
+		return nil
+	}
  	resp, err := client.Get(agentTools.URL)
  	if err != nil {
  		return err

Index: worker/upgrader/upgrader_test.go
=== modified file 'worker/upgrader/upgrader_test.go'
--- worker/upgrader/upgrader_test.go	2013-10-16 04:40:57 +0000
+++ worker/upgrader/upgrader_test.go	2013-10-29 06:05:03 +0000
@@ -216,3 +216,15 @@
  	c.Assert(err, gc.IsNil)
  	c.Assert(link, gc.Equals, newTools.Version.String())
  }
+
+func (s *UpgraderSuite) TestEnsureToolsChecksBeforeDownloading(c *gc.C) {
+	stor := s.Conn.Environ.Storage()
+	newTools := envtesting.PrimeTools(c, stor, s.DataDir(),  
version.MustParseBinary("5.4.3-precise-amd64"))
+	// We've already downloaded the tools, so change the URL to be
+	// something invalid and ensure we don't actually get an error, because
+	// it doesn't actually do an HTTP request
+	u := s.makeUpgrader()
+	newTools.URL = "http://localhost:999999/invalid/path/tools.tgz"
+	err := upgrader.EnsureTools(u, newTools, true)
+	c.Assert(err, gc.IsNil)
+}

Revision history for this message

Dimiter Naydenov (dimitern) wrote on 2013-10-30:

#

LGTM. At first I thought this fix is not really beneficial except for a
very narrow edge case (hulk-smashing multiple unit agents on the same
machine), but after a discussion with William, I realized we actually
have this more often than not (i.e. 2-3 agents running on the same
machine - a machine and a unit agent, possibly even a subordinate unit
agent).

https://codereview.appspot.com/19000043/

Revision history for this message

Roger Peppe (rogpeppe) wrote on 2013-10-30:

#

LGTM with one small thought

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
This could probably be at the start of this function (the "fetching" log
message might be confusing when we're not actually fetching anything)

https://codereview.appspot.com/19000043/

Revision history for this message

John A Meinel (jameinel) wrote on 2013-10-31:

#

Please take a look.

https://codereview.appspot.com/19000043/

Revision history for this message

John A Meinel (jameinel) wrote on 2013-10-31:

#

On 2013/10/30 13:46:03, dimitern wrote:
> LGTM. At first I thought this fix is not really beneficial except for
a very
> narrow edge case (hulk-smashing multiple unit agents on the same
machine), but
> after a discussion with William, I realized we actually have this more
often
> than not (i.e. 2-3 agents running on the same machine - a machine and
a unit
> agent, possibly even a subordinate unit agent).

Right. We also have a use case where we would stage upgrades by asking
the Machine agent to upgrade first, and then the units, etc. Which would
save some bandwidth downloading the tools multiple times for each
machine. But it doesn't actually help if you always download before
checking if the results are going to be used :).

https://codereview.appspot.com/19000043/

Revision history for this message

John A Meinel (jameinel) wrote on 2013-10-31:

#

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go
File worker/upgrader/upgrader.go (right):

https://codereview.appspot.com/19000043/diff/1/worker/upgrader/upgrader.go#newcode177
worker/upgrader/upgrader.go:177: if _, err :=
agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
On 2013/10/30 18:03:30, rog wrote:
> This could probably be at the start of this function (the "fetching"
log message
> might be confusing when we're not actually fetching anything)

Done.

https://codereview.appspot.com/19000043/

juju-core

Merge lp:~jameinel/juju-core/read-before-download into lp:~go-bot/juju-core/trunk

Commit message

Description of the change

Preview Diff

Subscribers

 === modified file 'worker/upgrader/export_test.go'
 --- worker/upgrader/export_test.go	2013-07-29 09:06:43 +0000
 +++ worker/upgrader/export_test.go	2013-10-31 05:25:34 +0000
@@ -3,4 +3,12 @@
  package upgrader
++import (
++	"launchpad.net/juju-core/tools"
++)
++
  var RetryAfter = &retryAfter
++
++func EnsureTools(u *Upgrader, agentTools *tools.Tools, disableSSLHostnameVerification bool) error {
++	return u.ensureTools(agentTools, disableSSLHostnameVerification)
++}
 === modified file 'worker/upgrader/upgrader.go'
 --- worker/upgrader/upgrader.go	2013-10-17 22:48:19 +0000
 +++ worker/upgrader/upgrader.go	2013-10-31 05:25:34 +0000
@@ -152,7 +152,7 @@
  			// repeatedly (causing the agent to be stopped), as long
  			// as we have got as far as this, we will still be able to
  			// upgrade the agent.
--			err := u.fetchTools(wantTools, disableSSLHostnameVerification)
++			err := u.ensureTools(wantTools, disableSSLHostnameVerification)
  			if err == nil {
  				return &UpgradeReadyError{
  					OldTools:  currentTools,
@@ -167,7 +167,11 @@
+ 	}
+ }
--func (u *Upgrader) fetchTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {
++func (u *Upgrader) ensureTools(agentTools *coretools.Tools, disableSSLHostnameVerification bool) error {
++	if _, err := agenttools.ReadTools(u.dataDir, agentTools.Version); err == nil {
++		// Tools have already been downloaded
++		return nil
++	}
  	client := http.DefaultClient
  	logger.Infof("fetching tools from %q", agentTools.URL)
  	if disableSSLHostnameVerification {
 === modified file 'worker/upgrader/upgrader_test.go'
 --- worker/upgrader/upgrader_test.go	2013-10-24 20:58:30 +0000
 +++ worker/upgrader/upgrader_test.go	2013-10-31 05:25:34 +0000
@@ -216,3 +216,15 @@
  	c.Assert(err, gc.IsNil)
  	c.Assert(link, gc.Equals, newTools.Version.String())
+ }
++
++func (s *UpgraderSuite) TestEnsureToolsChecksBeforeDownloading(c *gc.C) {
++	stor := s.Conn.Environ.Storage()
++	newTools := envtesting.PrimeTools(c, stor, s.DataDir(), version.MustParseBinary("5.4.3-precise-amd64"))
++	// We've already downloaded the tools, so change the URL to be
++	// something invalid and ensure we don't actually get an error, because
++	// it doesn't actually do an HTTP request
++	u := s.makeUpgrader()
++	newTools.URL = "http://localhost:999999/invalid/path/tools.tgz"
++	err := upgrader.EnsureTools(u, newTools, true)
++	c.Assert(err, gc.IsNil)
++}