Code review comment for lp:~thumper/juju-core/provisioner-reget-state-info

William Reade (fwereade) wrote :

Re what we do without a valid environment... hmm. Thinking aloud:

* we certainly start off with an invalid environ config, and only get a
valid one once whoever started the environment has connected for the
first time.

* today, on first startup, there will be one machine present; but it'll
be marked as provisioned already, so it won't cause env access directly;
and if we're tolerant of list-instances failures, we'll be able to
complete the first processMachines without difficulty. But I don't think
we can necessarily depend on that forever...

* in theory, it is impossible to set an invalid environ config in state,
except via the special mechanism at bootstrap time.

* in practice, an environ config can most certainly become invalid --
you can (1) replace the config with *any* valid config, which might of
course be invalid in this context (a changed environ type, for example,
will screw us most comprehensively); and (2) config update is racy
anyway, such that two users making valid changes concurrently can end up
with an environment config that reflects neither of their desires... and
I bet there's a way for that to cause invalidity. (Oh, and, on
inspection, (3): it looks like values can't be dependably unset anyway.
We probably need that too. Grar.)

* BUT the consequences of an invalid environ affect many more things
than just the provisioner.

* AND SO it would be folly to spend time and effort hardening the
provisioner against catastrophic environ failure when we could be fixing
the root cause.

* AND THUS you should (1) wait for a valid config at startup before
processing machines and (2) proceed otherwise as though the environment
configuration were always guaranteed consistent and valid from a juju
perspective and (3) use the time thereby saved to fix SetEnvironConfig.

* BUT ALSO remember that a valid *environ config* does not necessarily
contain valid *provider credentials*. So issues can always arise at
runtime anyway, and should, I think, be dealt with as follows:

   * StartInstance: record the error on the machine via SetInstanceError.
   * AllInstances: log and ignore -- I don't think there are negative
     consequences (an out-of-date instance list is always possible anyway).
   * StopInstances: log and ignore; they'll be tried again the next time
     we processMachines. (We should add a periodic processMachines call
     so that we at least handle them in something resembling a timely
     fashion. Not too hard, I think?)

https://codereview.appspot.com/9824047/
